Milestone Systems · 20 hours ago

Lead Site Reliability Engineer - Infrastructure

United States

Full-time

Remote

Lead/Staff

$160K/yr - $180K/yr

10+ years exp

Milestone Systems is seeking a Lead Site Reliability Engineer (Infrastructure) to join its fast-moving VSaaS engineering organization. The role combines technical leadership of the Infrastructure SRE team with hands-on operational execution, ensuring the reliability, scalability, and operability of the platform and production systems while mentoring senior and staff engineers.

Events · Information Services · Software · Training · Video

H1B Sponsor Likely

Responsibilities

Operate and evolve large-scale distributed systems, anticipating failure modes and proactively mitigating risks across production environments, while owning day-to-day production operations, including monitoring, alert triage, incident response, post-incident analysis, and critical incident coordination and documentation

Lead the design, build, and implementation of automation, orchestration, and operational tooling to improve efficiency, reliability, and signal-to-noise ratio, reduce recurring issues, and minimize service-impacting events

Set technical direction and influence platform strategy by defining platform architecture, system design, and documentation to guide development, testing, deployment, and long-term maintenance of complex distributed systems

Establish and enforce standards, operational rigor, and best practices for deploying, monitoring, managing, and operating cloud-native and distributed infrastructure environments

Lead the adoption and execution of modern CI/CD, GitOps, and cloud-native infrastructure practices, ensuring reliable, scalable, and traceable software and infrastructure releases

Mentor and develop senior and staff engineers, reinforcing SRE principles, DevOps practices, accountability, and operational excellence across the Infrastructure SRE team

Collaborate closely with product and engineering stakeholders, advocating for an SRE mindset and system-level thinking to maximize reliability, performance, availability, security, and scalability across shared platforms and services

Qualifications

Site Reliability Engineering · Cloud Infrastructure · Golang · Python · CI/CD · GitOps · Terraform · Docker · Kubernetes · Linux/Unix Systems · Observability Tools · SQL/NoSQL Databases · Soft Skills

Required

10+ years of experience in site reliability engineering, infrastructure, or systems engineering, with deep ownership of large-scale production systems and demonstrated leadership of SRE or infrastructure teams, including setting technical direction and mentoring senior engineers

Strong hands-on experience designing and building automation and operational tooling using Golang and/or Python, with expert-level proficiency in Linux/Unix systems, shell scripting, and production troubleshooting

Advanced expertise in cloud-native and IaaS architectures, distributed systems, and container orchestration in production environments, including compliance, security, and network considerations

Expertise in architecting modular Terraform frameworks and Infrastructure-as-Code (IaC) design patterns

Deep understanding of SRE and DevOps principles, including incident management, SLA/SLO ownership, automation, and reliability engineering practices, as well as experience leading incident response with post-incident analysis and preventive improvements

Strong experience with CI/CD pipelines, GitOps workflows, release tooling, and modern cloud-native infrastructure practices, ensuring reliable and traceable software and infrastructure changes

Hands-on experience operating Docker and Kubernetes environments, observability platforms (logging, monitoring, alerting), and SQL/NoSQL databases (e.g., Postgres, MongoDB, Graph DB), including performance tuning and operational troubleshooting

Preferred

Subject matter expertise in Google Cloud; experience with other public cloud providers is also valuable

Demonstrated expertise in microservices lifecycle management, including integration, testing, deployment, and operational best practices, supported by advanced knowledge of software release tooling and CI/CD platforms such as GitLab, Jenkins, Cloud Build, ArgoCD, and Spinnaker

Deep understanding of the Docker and Kubernetes ecosystem, including orchestration, cluster management, and image lifecycle optimization

Strong experience with observability, logging, and monitoring tools such as ELK Stack, Prometheus, Stackdriver, Datadog, New Relic, or Dynatrace

Hands-on experience with algorithms, data structures, complexity analysis, and software/system design for large-scale distributed environments

Experience driving automation for operational efficiency, signal-to-noise improvement, recurring issue mitigation, performance testing, capacity planning, and system optimization in production environments

Experience implementing security best practices and compliance considerations in infrastructure and platform design, along with the ability to influence cross-functional teams, evangelize SRE and DevOps practices, and foster a culture of reliability and operational excellence

Benefits

Medical/dental benefits

FSA or HSA

401k with 6% Safe Harbor employer match

Paid parental leave

Generous PTO (20 vacation days, 10 paid sick days, and 12 company holidays)

Fully paid short-term disability policy

Fully paid long-term disability policy

Life Insurance

Company

Milestone Systems

Glassdoor rating: 3.8

Milestone Systems develops open platform IP video management software, delivering easy-to-manage surveillance solutions for enterprises.

Founded in 1998

Brøndby Strand, Hovedstaden, DNK

1001-5000 employees

http://www.milestonesys.com

H1B Sponsorship

Milestone Systems has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data from the US Department of Labor)

Distribution of Different Job Fields Receiving Sponsorship (chart not shown)

Trends of Total Sponsorships

2025: 2

Funding

Current Stage

Late Stage

Total Funding

$27.01M

Key Investors

Index Ventures

2014-06-12 · Acquired

2014-02-01 · Seed · $0.01M

2008-07-07 · Series A · $27M

Leadership Team

Thomas Jensen

Chief Executive Officer

Henrik Friborg Jacobsen

Co-Founder

Recent News

PR Newswire

Genesis Security Reduces False Alarms by 62% with Milestone and Actuate AI Integration

2026-01-14

Zawya.com

Milestone Systems sets the stage for AI-driven safety innovation at Intersec Dubai 2026

2026-01-09

Zawya.com

Milestone Systems to launch generative AI plug-in for XProtect, Streamlining Video Review and Response

2025-11-27

Company data provided by Crunchbase