ECS · 9 hours ago

Senior Site Reliability Engineer

ECS is a leading mid-sized provider of technology services to the United States Federal Government. They are seeking a Senior Site Reliability Engineer to define, implement, and grow their SRE practice, ensuring the reliability and performance of critical production environments.

Artificial Intelligence (AI) · Cloud Infrastructure · Compliance · Consulting · Cyber Security · Information Technology · Machine Learning · Security · Software
No H1B · U.S. Citizen Only

Responsibilities

Play a key role in defining, implementing, and growing our SRE practice to ensure the reliability, availability, and performance of our critical production environments
Contribute to a culture of continuous improvement by identifying areas for enhancement and driving initiatives to improve system reliability, scalability, and efficiency
Design, implement, and maintain solutions that keep systems, including infrastructure and applications, resilient, highly available, and performant
Play a critical role in defining and measuring the Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for our solution
Set up comprehensive logging, monitoring, and alerting solutions using the Elastic stack and other tools as necessary to ensure the continuous performance of services
Respond to incidents, perform root cause analyses, and implement solutions to prevent recurrence
Work in close collaboration with other SRE team members, developers, testers, infrastructure engineers, DevOps engineers, and other stakeholders to integrate reliability and observability into the software development lifecycle

Qualifications

Site Reliability Engineering · SLOs · SLIs · SRE tools · Observability solutions · Cloud platforms · Programming · Scripting · Microservices · CI/CD principles · Configuration management · Infrastructure as code · Analytical skills · Problem-solving skills · Collaboration skills · Detail-oriented

Required

Must be a US citizen with the ability to obtain Public Trust Suitability
6+ years of experience as a Site Reliability Engineer (SRE) or equivalent
6+ years of demonstrated experience designing, implementing, and maintaining observability solutions to include logging, monitoring, and alerting
6+ years of hands-on experience with SRE tools (e.g., Elastic, Prometheus, Grafana, Splunk)
3+ years defining and measuring SLOs and SLIs
3+ years of relevant experience using cloud platforms (AWS GovCloud preferred)
3+ years of hands-on programming or scripting (e.g., Python, Bash)
Strong knowledge of microservices, containerization, and orchestration tools (Docker, Kubernetes)
Proven ability to collaborate with cross-functional teams (development, testing, and product) to integrate reliability and observability into the software development lifecycle
Strong problem-solving and analytical skills
Proactive, detail-oriented approach to identifying inefficiencies and implementing improvements
Proficient in developing synthetic monitoring scripts using TypeScript

Preferred

Bachelor's degree in Computer Science, Engineering, or a related field (or 4 additional years of related experience)
Experience working in an Agile/SAFe environment using ALM tools (Jira, Confluence, or similar)
Strong understanding of CI/CD principles and platforms (Jenkins, CircleCI, GitLab, GitHub Actions, Argo, Travis CI, etc.)
Expertise in configuration management tools (Ansible, Puppet, Chef)
Experience with infrastructure as code (Terraform, CloudFormation)
In-depth understanding of networking, security, and system administration of Linux operating systems
Knowledge of version control platforms and branching strategies
Knowledge of disaster recovery planning, backup strategies, and data replication
Experience supporting large Federal programs ($200M+)

Company

ECS is a fast-growing 4,000-person, $1.2B provider of advanced technology solutions for federal civilian, defense, intelligence, and commercial customers.

Funding

Current Stage: Late Stage
Total Funding: unknown
2018-01-31: Acquired
2015-04-10: Private Equity

Leadership Team

Keith McCloskey
VP / Chief Technology Officer
Ryan Garner
Chief Financial Officer
Company data provided by Crunchbase