LMI · 3 hours ago
Site Reliability Engineer
LMI is a digital solutions provider focused on enhancing government outcomes through technology and innovation. They are seeking a Site Reliability Engineer to ensure the reliability and operational integrity of the Holistic Health & Fitness Management System (H2FMS) hosted in Army GovCloud, while collaborating with various technical teams to implement automation and monitoring solutions.
Analytics · Consulting · Information Technology · Logistics · Management Consulting · Professional Services
Responsibilities
Monitor the health, performance, and availability of H2FMS applications, services, APIs, and data services in Army GovCloud
Troubleshoot system issues across application, data, and infrastructure layers
Implement reliability patterns such as redundancy, graceful degradation, and failover strategies
Support performance optimization activities based on monitoring metrics and trends
Manage user access controls, role-based permissions, and environment access configurations
Maintain, monitor, and archive system logs, audit logs, and access logs to support RMF and cATO requirements
Support ISSO and Cybersecurity teams in log retrieval, incident investigations, and audit preparation
Develop and maintain automation scripts to improve environment stability, operational workflows, and deployment reliability
Collaborate with DevSecOps engineers to integrate automated runtime checks, monitoring, and health checks within CI/CD pipelines
Assist in implementing automated scaling, alerting, and self-healing mechanisms
Participate in incident response activities, including detection, diagnosis, escalation, mitigation, and documentation
Coordinate with cybersecurity teams during security events or anomalies
Conduct root-cause analysis and contribute to long-term corrective actions
Maintain environment configuration inventories related to access, logging, monitoring, and deployment parameters
Support configuration management, patch activities, and version control for infrastructure and application components
Collaborate with the Cloud Architect on environment design updates and capacity planning
Document system configurations, access processes, log retention procedures, and environment health dashboards
Support the ISSM and ISSO teams in continuous monitoring package updates and RMF documentation
Maintain audit-ready artifacts related to reliability operations and environment management
Qualifications
Required
Bachelor's degree in Information Technology, Computer Science, Engineering, Cybersecurity, or a related field
3–6 years of experience in cloud operations, SRE, DevOps, or system administration roles
Hands-on experience with cloud monitoring, logging, and performance management tools (AWS CloudWatch, Azure Monitor, ELK/Splunk, Prometheus/Grafana, etc.)
Experience with automation tools (Python, Bash, Terraform, Ansible, etc.)
Familiarity with RMF, Zero Trust, and DoW cloud security requirements
Understanding of CI/CD pipelines and deployment processes
Ability to obtain and maintain a DoD Secret clearance
Location: Remote
Travel: 1–2 trips per quarter to Fort Eustis, VA or LMI HQ in Tysons, VA
Preferred
Experience supporting DoW programs or operating in secure cloud environments (AWS GovCloud, Azure IL4/IL5, cARMY)
Experience with container orchestration (Kubernetes/EKS/AKS)
Familiarity with incident response processes and SRE best practices (SLOs, SLIs, error budgets)
Certifications such as AWS SysOps, AWS Cloud Practitioner, Azure Administrator, or equivalent
Company
LMI
LMI is a consulting firm dedicated to improving the management of government.
Funding
Current Stage: Late Stage
Total Funding: $0.25M
Key Investors: Mission Daybreak
2022-09-19 · Grant · $0.25M
2022-07-12 · Private Equity
2020-12-21 · Acquired
Company data provided by Crunchbase