Director, Site Reliability Engineering jobs in the United States

Leadership Triangle · 7 hours ago

Director, Site Reliability Engineering

Fidelity is a privately held company focused on making financial expertise broadly accessible. The Director of Site Reliability Engineering will manage and lead a team of SREs and Production Support Engineers, ensuring the reliability and availability of Fidelity’s systems through automation and best practices in resiliency engineering.

Consulting · Non Profit · Training
No H1B

Responsibilities

Help define and execute a comprehensive reliability and observability strategy, ensuring that Fidelity’s systems are always available when our customers need them
Bring together technical, procedural, and financial data to reduce toil and increase efficiency
Execute plans for technical standardization and process refinement within the engineering organization, especially for Site Reliability Engineers and Production Support Engineers
Troubleshoot stack-wide engineering issues related to hardware, software, network, applications, and cloud service providers

Qualifications

AWS · Kubernetes · Cloud Infrastructure · Observability · Infrastructure as Code · DevOps · Scripting Languages · Root Cause Analysis · Communication Skills · Team Collaboration

Required

Bachelor's degree or higher in a technology-related field (e.g., Engineering or Computer Science)
10+ years of hands-on experience deploying and/or supporting highly distributed multi-tiered systems at scale
5+ years of experience with AWS
3+ years of experience with Kubernetes container orchestration (EKS)
Experience operating and implementing distributed & highly concurrent service-based architectures, including microservices, containerized services, and/or serverless architectures
Thought leadership and an ability to plan and drive complex initiatives using agile principles
Ability to triage, execute root cause analysis, and be decisive under pressure
Strong understanding of cloud infrastructure components (server, storage, network, data, and applications) to deliver end-to-end cloud infrastructure architectures and designs
Solid understanding of Cloud Computing and DevOps concepts including CI/CD pipelines
Proven experience in implementing advanced observability practices and techniques at scale
Demonstrated ability to utilize modern monitoring tools (Datadog, Prometheus, Splunk)
Strong communication skills, with the ability to reach both technical and non-technical audiences
Ability to work constructively and collaboratively with a variety of individuals and groups, both in person and virtually, and to build and maintain effective relationships

Preferred

Master's degree
AWS Certifications
Ability to automate with various scripting languages (Python, Shell scripting, etc.)
Experience managing systems using infrastructure as code tools (IAM, Terraform)

Benefits

401(k) with company match
Medical, dental, vision and prescription drug coverage
16-week maternity leave & 12-week parental leave
Student loan assistance

Company

Leadership Triangle

Leadership Triangle educates and promotes regionalism across the separate communities.

Funding

Current Stage
Early Stage
Company data provided by Crunchbase