Site Reliability Engineer - Remote jobs in United States

ICF · 3 hours ago

Site Reliability Engineer - Remote

ICF is a mission-driven company focused on improving lives and making the world a better place. The role of Site Reliability Engineer involves establishing a culture of improvement in observability and reliability while working closely with software engineering teams to ensure the reliability of applications and services supporting critical programs for the Centers for Medicare & Medicaid Services (CMS).

Consulting, Information Technology, Professional Services
No H1B · U.S. Citizen Only

Responsibilities

Define and maintain SLIs, SLOs, and SLAs for the Internet-based Quality Improvement and Evaluation System (iQIES) application
Tune performance by modeling load scenarios, forecasting capacity, and optimizing scaling strategies
Design and optimize the observability stack through New Relic, CloudWatch, and Jenkins CI/CD pipelines
Participate in root cause analysis for operational issues and improve the incident response process
Participate in creating, monitoring, and optimizing actionable alerts to respond to issues in a timely manner
Develop tools and scripts
Develop and maintain Jenkins CI/CD pipelines, using declarative Jenkinsfiles and foundational Groovy for pipeline logic and enhancements
Deploy services to Fargate, EKS, Lambda, Airflow, and databases
Manage security groups and access controls, with a thorough understanding of fundamentals such as security groups, IAM, and RDS management
Apply patch management and hardening practices
Align with DevOps and technical leads on overall strategy
Actively participate in releases and product launches, with the expectation of being online during release windows

Qualifications

AWS, SRE, CI/CD, Kubernetes, Terraform, Docker, SQL, New Relic, CloudWatch, Jenkins, Analytical skills, Problem-solving skills, Communication skills, Time management, Organizational skills

Required

5+ years of experience in a software development environment and a Bachelor's degree; OR 3+ years of experience in a software development environment and a Master's degree
5+ years supporting a high‑availability production environment (cloud or on‑prem)
3+ years working in an SRE role in a large-scale cloud environment, implementing high availability and scalability
3+ years of experience focused on SRE, DevOps, or Platform Engineering
Must be able to obtain and maintain a public trust clearance
Candidate must reside in the US, be authorized to work in the US, and work must be performed in the US
Must have lived in the US 3 full years out of the last 5 years

Preferred

Previous work in a regulated healthcare or federal agency environment
Full stack web development experience
Expertise in deployment techniques that minimize downtime, such as blue-green, canary, and A/B testing approaches, and zero-downtime deployments
Understanding of security groups and access controls
Experience with Atlassian tooling such as Jira and Confluence

Company

ICF is a global consulting and technology services provider focused on making big things possible for our clients.

Funding

Current Stage
Public Company
Total Funding
$59M
Key Investors
New York State Department of Transportation, U.S. Environmental Protection Agency
2023-02-13 · Grant · $29M
2021-03-15 · Grant · $30M
2006-09-28 · IPO

Leadership Team

John Wasson
Chairman, President and Chief Executive Officer
James Morgan
Chief Operating Officer and EVP
Company data provided by crunchbase