Karsun Solutions · 19 hours ago

Site Reliability Engineer Lead

Herndon, VA, US

Full-time

Onsite

Senior Level, Lead/Staff

8+ years exp

Karsun Solutions is seeking a Site Reliability Engineer Lead to build and support production environments while driving service level objectives for multiple applications. The role involves automating operations, managing infrastructure, and collaborating with development teams to ensure system reliability and performance.

Consulting · Government · Information Technology

No H1B

U.S. Citizen Only

Responsibilities

Develop and maintain applications on the Kubernetes container platform using Helm charts, K8s configurations, and GitOps workflows for repeatable and consistent deployments

Monitor and troubleshoot complex issues involving container networking, zero-downtime availability, scaling behavior, and cluster reliability

Architect, deploy, and optimize resilient cloud-native systems in AWS using services such as EKS, Lambda, RDS, Aurora, S3, and VPC networking components

Build self-service deployment capabilities for development teams, enabling application deployments through standardized pipelines

Integrate security scanning tools (SAST, SCA, secrets detection, container scanning) into the build pipeline to ensure DevSecOps alignment

Implement automated release strategies using blue/green, canary, feature flags, and zero-downtime deployment patterns

Implement all infrastructure and configuration using Terraform, CloudFormation, CDK, or Ansible, ensuring consistent and repeatable deployments

Develop robust Python or Bash scripts to streamline operational tasks

Implement and manage observability stacks using CloudWatch, Prometheus, Grafana, ELK/OpenSearch, Jaeger/Zipkin, or Datadog for full-stack visibility

Develop proactive alerting strategies to minimize false positives and ensure actionable notifications; establish performance dashboards to measure system reliability and drive continuous improvement

Conduct root cause analysis (RCA) for production incidents and drive long-term remediation through automated guardrails

Partner with security teams to implement vulnerability management, patch automation, and continuous compliance monitoring

Lead blameless post-incident reviews and drive implementation of resilient engineering patterns such as retries, graceful degradation, chaos testing, and redundancy strategies

Work closely with software engineers, architects, product owners, and security stakeholders to design reliable systems that support mission-critical government applications

Coach development teams on cloud-native principles, observability, performance tuning, and infrastructure best practices

Advocate for SRE/DevOps culture, driving an automation-first mindset and continuous improvement across the engineering organization

Qualifications

Kubernetes · AWS · Python · Terraform · CI/CD · Observability tools · Service mesh technologies · Infrastructure as code · Analytical skills · Problem-solving skills · Attention to detail

Required

Bachelor's degree in Computer Science, Engineering, or a related field and 8-10 years of relevant experience

5+ years in SRE, Platform Engineering, and DevOps supporting operations and maintenance for cloud-native, scalable, and highly available applications

Expertise in scripting (Python, Bash, Go preferred)

Deep understanding of cloud computing platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Kubernetes)

Experience with monitoring, logging, and observability tools such as Datadog, AWS CloudWatch, and Splunk

Experience with service mesh technologies (Istio, Linkerd) and GitOps platforms (ArgoCD/FluxCD)

Knowledge of infrastructure as code tools (e.g., Terraform, Ansible) and CI/CD pipelines

Experience deploying enterprise software on AWS services such as EKS, RDS, EC2, Elastic Load Balancers, and Lambda

Strong problem-solving and analytical skills, with keen attention to detail

Ability to obtain and maintain a Public Trust clearance

Company

Karsun Solutions

Karsun Solutions LLC specializes in Enterprise Modernization and Transformation solutions for the Federal Government.

Founded in 2009

Herndon, Virginia, USA

201-500 employees

http://www.karsun-llc.com/

Funding

Current Stage

Growth Stage

Leadership Team

Kelly Demaitre

Chief People Officer

Company data provided by Crunchbase