Associate Principal, Site Reliability Engineer jobs in United States

Chamberlain Advisors · 1 day ago

Associate Principal, Site Reliability Engineer

Chamberlain Advisors is partnering with a leading equity derivatives clearing organization to hire a highly skilled Senior Site Reliability Engineer (SRE) to support the reliability, availability, and performance of their next-generation cloud platforms. This role is critical to ensuring systems operate at scale with high resiliency while enabling development teams to deliver features quickly and safely.

Staffing & Recruiting

Responsibilities

Ensure the availability, performance, scalability, and reliability of production systems supporting the organization's cloud-based platforms
Partner with software development, operations, and infrastructure teams to design and operate production-ready services
Design and implement automation to improve incident response, reduce manual effort, and prevent recurring issues
Develop, maintain, and continuously improve runbooks and operational documentation for service outages and degradations
Assess production readiness of services by evaluating reliability, observability, scalability, and operational risk
Define, implement, and monitor key operational metrics related to system health, performance, and capacity
Architect, develop, and maintain shared reliability services and tooling used across the organization
Participate in incident management, root cause analysis, and post-incident reviews with a focus on long-term remediation
Contribute to continuous improvement through retrospectives, technical research, code reviews, and design discussions
Influence delivery timelines and technical expectations by identifying reliability risks and improvement opportunities
Mentor junior engineers and share knowledge through documentation and collaborative team engagement
Support Agile/Scrum delivery by contributing to sprint planning, backlog refinement, and story development

Qualifications

Site Reliability Engineering, Cloud Platforms, Observability & AIOps, Programming & Automation, Containers & Orchestration, Distributed Systems, CI/CD & DevOps, Resilience Engineering, Agile/Scrum, Analytical Skills, Communication Skills, Documentation Skills, Team Collaboration

Required

Bachelor's degree in Management Information Systems, Computer Science, or a related field
Minimum of 4 years of experience in Site Reliability Engineering, DevOps, or a related engineering discipline
Proven experience supporting large-scale, distributed, production systems
Experience working in Agile/Scrum environments
Cloud Platforms: Public cloud experience with AWS (preferred), Azure, or GCP
Observability & AIOps: Monitoring, logging, alerting, and predictive analytics using tools such as Splunk, Datadog, AppDynamics, Prometheus, Grafana, Sysdig, or Stackdriver
Programming & Automation: Proficiency in Python, Java, Go, or Bash for automation and tooling
Containers & Orchestration: Experience with Kubernetes and container platforms such as Docker, Rancher, or Mesos
Distributed Systems: Messaging and event-driven platforms including Kafka, RabbitMQ, or ActiveMQ
CI/CD & DevOps: Pipeline and deployment tools such as Jenkins, Harness, Travis CI, AWS CodeBuild/CodePipeline, or AppVeyor
AI Enablement: Familiarity with using Large Language Models (LLMs) to automate SRE workflows (e.g., scripting, incident analysis, reporting)
Resilience Engineering: Foundational exposure to Chaos Engineering and fault-injection tools (e.g., Gremlin, Chaos Monkey, AWS FIS)

Benefits

Comprehensive medical, dental, vision, PTO, paid holidays, 401(k) with match, professional development, collaborative culture, work-life balance

Company

Chamberlain Advisors

Chamberlain is a talent acquisition firm for organizations that seek human capital differentiation.

Funding

Current Stage
Growth Stage

Leadership Team

Brad Thomas
Managing Partner, Executive Search