
LMI · 3 hours ago

Site Reliability Engineer

LMI is a digital solutions provider focused on enhancing government outcomes through technology and innovation. The company is seeking a Site Reliability Engineer to ensure the reliability and operational integrity of the Holistic Health & Fitness Management System (H2FMS) hosted in Army GovCloud, while collaborating with various technical teams to implement automation and monitoring solutions.

Analytics · Consulting · Information Technology · Logistics · Management Consulting · Professional Services
No H1B · Security Clearance Required · U.S. Citizen Only

Responsibilities

Monitor the health, performance, and availability of H2FMS applications, services, APIs, and data services in Army GovCloud
Troubleshoot system issues across application, data, and infrastructure layers
Implement reliability patterns such as redundancy, graceful degradation, and failover strategies
Support performance optimization activities based on monitoring metrics and trends
Manage user access controls, role-based permissions, and environment access configurations
Maintain, monitor, and archive system logs, audit logs, and access logs to support RMF and cATO requirements
Support ISSO and Cybersecurity teams in log retrieval, incident investigations, and audit preparation
Develop and maintain automation scripts to improve environment stability, operational workflows, and deployment reliability
Collaborate with DevSecOps engineers to integrate automated runtime checks, monitoring, and health checks within CI/CD pipelines
Assist in implementing automated scaling, alerting, and self-healing mechanisms
Participate in incident response activities, including detection, diagnosis, escalation, mitigation, and documentation
Coordinate with cybersecurity teams during security events or anomalies
Conduct root-cause analysis and contribute to long-term corrective actions
Maintain environment configuration inventories related to access, logging, monitoring, and deployment parameters
Support configuration management, patch activities, and version control for infrastructure and application components
Collaborate with the Cloud Architect on environment design updates and capacity planning
Document system configurations, access processes, log retention procedures, and environment health dashboards
Support the ISSM and ISSO teams in continuous monitoring package updates and RMF documentation
Maintain audit-ready artifacts related to reliability operations and environment management

Qualifications

Cloud operations · Site Reliability Engineering · Automation tools · Cloud monitoring tools · CI/CD pipelines · Incident response · Bachelor’s degree · Cybersecurity familiarity · Container orchestration · Certifications

Required

Bachelor's degree in Information Technology, Computer Science, Engineering, Cybersecurity, or a related field
3–6 years of experience in cloud operations, SRE, DevOps, or system administration roles
Hands-on experience with cloud monitoring, logging, and performance management tools (AWS CloudWatch, Azure Monitor, ELK/Splunk, Prometheus/Grafana, etc.)
Experience with automation tools (Python, Bash, Terraform, Ansible, etc.)
Familiarity with RMF, Zero Trust, and DoW cloud security requirements
Understanding of CI/CD pipelines and deployment processes
Ability to obtain and maintain a DoD Secret clearance
Location: Remote
Travel: 1–2 trips per quarter to Fort Eustis, VA or LMI HQ in Tysons, VA

Preferred

Experience supporting DoW programs or operating in secure cloud environments (AWS GovCloud, Azure IL4/IL5, cARMY)
Experience with container orchestration (Kubernetes/EKS/AKS)
Familiarity with incident response processes and SRE best practices (SLOs, SLIs, error budgets)
Certifications such as AWS SysOps, AWS Cloud Practitioner, Azure Administrator, or equivalent

Company

LMI is a consulting firm dedicated to improving the management of government.

Funding

Current Stage: Late Stage
Total Funding: $0.25M
Key Investors: Mission Daybreak
2022-09-19 · Grant · $0.25M
2022-07-12 · Private Equity
2020-12-21 · Acquired

Leadership Team

Doug Wagoner
Chief Executive Officer
Joshua Wilson
President - Markets, Growth and Technology
Company data provided by Crunchbase