Lead Site Reliability Engineer - Infrastructure jobs in United States

Milestone Technologies, Inc. · 5 hours ago

Lead Site Reliability Engineer - Infrastructure

Milestone Systems, Inc. is seeking a Lead Site Reliability Engineer (Infrastructure) to join their fast-moving VSaaS engineering organization. This role is responsible for technical leadership and operational execution of the Infrastructure SRE team, focusing on the reliability, scalability, and operability of shared platforms and production systems while mentoring senior engineers.

Application Performance Management, Consumer Electronics, Information Technology
H1B Sponsor Likely

Responsibilities

Operate and evolve large-scale distributed systems, anticipating failure modes and proactively mitigating risks across production environments; own day-to-day production operations, including monitoring, alert triage, incident response, post-incident analysis, and coordination and documentation of critical incidents
Lead the design, build, and implementation of automation, orchestration, and operational tooling that improves efficiency, reliability, and alert signal-to-noise ratio, reduces recurring issues, and minimizes service-impacting events
Set technical direction and influence platform strategy by defining platform architecture, system design, and documentation to guide development, testing, deployment, and long-term maintenance of complex distributed systems
Establish and enforce standards, operational rigor, and best practices for deploying, monitoring, managing, and operating cloud-native and distributed infrastructure environments
Lead the adoption and execution of modern CI/CD, GitOps, and cloud-native infrastructure practices, ensuring reliable, scalable, and traceable software and infrastructure releases
Mentor and develop senior and staff engineers, reinforcing SRE principles, DevOps practices, accountability, and operational excellence across the Infrastructure SRE team
Collaborate closely with product and engineering stakeholders, advocating for an SRE mindset and system-level thinking to maximize reliability, performance, availability, security, and scalability across shared platforms and services
Other duties as assigned, within the scope of the ownership and operational responsibilities above

Qualifications

Site Reliability Engineering, Cloud Infrastructure, Automation Tooling, Golang, Python, Distributed Systems, CI/CD Pipelines, GitOps, Terraform, Docker, Kubernetes, Observability Tools, Soft Skills

Required

10+ years of experience in site reliability engineering, infrastructure, or systems engineering, with deep ownership of large-scale production systems and demonstrated leadership of SRE or infrastructure teams, including setting technical direction and mentoring senior engineers
Strong hands-on experience designing and building automation and operational tooling using Golang and/or Python, with expert-level proficiency in Linux/Unix systems, shell scripting, and production troubleshooting
Advanced expertise in cloud-native and IaaS architectures, distributed systems, and container orchestration in production environments, including compliance, security, and network considerations
Expertise in architecting modular Terraform frameworks and Infrastructure-as-Code (IaC) design patterns
Deep understanding of SRE and DevOps principles, including incident management, SLA/SLO ownership, automation, and reliability engineering practices, as well as leading incident response through post-incident analysis and preventive improvements
Strong experience with CI/CD pipelines, GitOps workflows, release tooling, and modern cloud-native infrastructure practices, ensuring reliable and traceable software and infrastructure changes
Hands-on experience operating Docker and Kubernetes environments, observability platforms (logging, monitoring, alerting), and SQL/NoSQL databases (e.g., Postgres, MongoDB, Graph DB), including performance tuning and operational troubleshooting

Preferred

Subject matter expertise in Google Cloud; experience with other public cloud providers is also valuable
Demonstrated expertise in microservices lifecycle management, including integration, testing, deployment, and operational best practices, supported by advanced knowledge of software release tooling and CI/CD platforms such as GitLab, Jenkins, Cloud Build, ArgoCD, and Spinnaker
Deep understanding of the Docker and Kubernetes ecosystem, including orchestration, cluster management, and image lifecycle optimization
Strong experience with observability, logging, and monitoring tools such as ELK Stack, Prometheus, Stackdriver, Datadog, New Relic, or Dynatrace
Hands-on experience with algorithms, data structures, complexity analysis, and software/system design for large-scale distributed environments
Experience driving automation for operational efficiency, signal noise reduction, recurring issue mitigation, performance testing, capacity planning, and system optimization in production environments
Experience implementing security best practices and compliance considerations in infrastructure and platform design, along with the ability to influence cross-functional teams, evangelize SRE and DevOps practices, and foster a culture of reliability and operational excellence

Benefits

Medical/dental benefits
FSA or HSA
401(k) with 6% Safe Harbor employer match
Paid parental leave
Generous PTO (20 days' vacation, 10 days paid sick time, and 12 company holidays)
Fully paid short-term disability policy
Fully paid long-term disability policy
Life insurance

Company

Milestone Technologies, Inc.

Milestone Technologies is a global IT Services and Digital Solutions company based in Silicon Valley that helps hundreds of leading corporations deliver technology around the globe.

H1B Sponsorship

Milestone Technologies, Inc. has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (4)
2024 (4)
2022 (2)
2021 (3)
2020 (5)

Funding

Current Stage: Late Stage
Total Funding: $42.5M
Key Investors: H.I.G. Capital
2022-12-13 · Acquired
2015-08-11 · Private Equity · $42.5M

Leadership Team

Sameer Kishore
President and Chief Executive Officer
Mayank K Agrawal
Chief Financial Officer
Company data provided by Crunchbase