Senior Lead Site Reliability Engineer jobs in United States

JPMorganChase · 7 hours ago

Senior Lead Site Reliability Engineer

JPMorgan Chase is one of the oldest financial institutions, providing innovative financial solutions to a wide range of clients. The Lead Site Reliability Engineer will solve complex business problems, optimize applications and their infrastructure, and drive the adoption of site reliability engineering best practices within the team.

Asset Management, Banking, Financial Services
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Guides and assists others in producing designs at the appropriate level of detail and in gaining consensus from peers where appropriate
Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
Collaborates with technical experts, key stakeholders, and team members to resolve complex problems
Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers
Leads the adoption of site reliability engineering best practices within your team
Provides 24/7 production support for business-critical applications as part of a rotational on-call rota

Qualifications

Site Reliability Engineering, Programming Languages, Observability Tools, Continuous Integration/Delivery, Cloud Technologies, Container Orchestration, Networking Technologies, Event Streaming Platforms, Agile Practices, Critical Incident Management, Linux Performance Tuning, Certifications, Collaborative Leadership, Communication Skills, Mentoring, Problem Solving

Required

Formal training or certification in software engineering concepts with 10+ years of applied experience
Proficient in site reliability culture and principles, and familiar with how to implement site reliability within an application or platform
Proficient in at least one programming language, such as Python, Java/Spring Boot, or shell scripting
Proficient in software engineering and technical processes within a given technology discipline (e.g., public cloud, artificial intelligence)
Experience in observability, such as white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
Experience with continuous integration and continuous delivery tools such as Jenkins, Spinnaker, or Terraform, and with configuration management tools such as SaltStack or Ansible
Experience managing, administering, and supporting large-scale, enterprise-level Splunk and ELK deployments that provide application monitoring and observability for a large number of applications
Experience managing, administering, and supporting vendor products such as Netcool, Grafana, and SCOM
Familiarity with containers and container orchestration, such as ECS, Kubernetes, and Docker
Experience troubleshooting performance issues and common networking technologies and issues
Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision
Ability to proactively recognize roadblocks, and a demonstrated interest in learning technology that facilitates innovation
Experience with large-scale, enterprise-level event streaming platforms such as Kafka
Experience handling critical incidents and change management, including participation in critical incident task force calls
Familiarity with agile practices, preferably Scrum and Kanban

Preferred

Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
Proven track record of initiating and executing ideas that address complex business challenges
Networking and systems
+ Deep understanding of TCP/IP, DNS, load balancing, firewalls, and VPN technologies
+ Proficient in Linux performance tuning and troubleshooting system-level issues
Collaborative leadership
+ Proven track record of mentoring junior engineers and promoting SRE best-practice adoption across teams
+ Strong written and verbal communication skills; comfortable presenting to technical and non-technical stakeholders
Certifications (a plus)
+ AWS Certified SysOps Administrator or Professional, Certified Kubernetes Administrator (CKA), Terraform Associate, or equivalent

Benefits

Comprehensive health care coverage
On-site health and wellness centers
A retirement savings plan
Backup childcare
Tuition reimbursement
Mental health support
Financial coaching

Company

JPMorganChase

With a history tracing its roots to 1799 in New York City, JPMorganChase is one of the world's oldest, largest, and best-known financial institutions, carrying forth the innovative spirit of its heritage firms in global operations across 100 markets.

H1B Sponsorship

JPMorganChase has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Trends of Total Sponsorships
2025 (3471)
2024 (3469)
2023 (3395)
2022 (3594)
2021 (2515)
2020 (2495)

Funding

Current Stage: Public Company
Total Funding: Unknown
IPO: 1998-02-01

Leadership Team

Allison Beer
CEO of Card Services and Connected Commerce
Dan Mendelson
CEO, Morgan Health
Company data provided by Crunchbase