Lead Site Reliability Engineer, AI/ML Platform jobs in United States

JPMorganChase · 2 weeks ago

Lead Site Reliability Engineer, AI/ML Platform

JPMorganChase, one of the oldest financial institutions, offers innovative financial solutions to millions of consumers and businesses. They are seeking a Lead Site Reliability Engineer to improve the reliability, scalability, and performance of their AI/ML platforms and to mentor junior engineers.

Asset Management · Banking · Financial Services
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Design and implement solutions to enhance the reliability and scalability of AI/ML platforms and applications to accommodate fast-growing demand
Partner with product engineering teams to ensure the AI/ML systems are reliable and high-performing
Develop tools and orchestration for observability, security, automation, and FinOps
Provide strategic technology leadership by defining and evaluating standards and architecture for reliability, observability and automation frameworks
Build strong cross-functional relationships that foster engagements across the organization and deliver solutions to user problems
Debug and resolve issues in a production environment, identify root causes, and remediate them
Participate in on-call rotations, incident management, and escalation workflows
Take full ownership of problems, develop solutions, and acquire new knowledge to complete the task
Mentor and guide junior engineers

Qualifications

SRE principles · Cloud platforms · Observability tools · Distributed systems architecture · IaC tools · Problem-solving skills · Communication · Self-motivated · Mentoring skills

Required

Bachelor's degree in Computer Science, Information Technology, or an equivalent technical qualification, with 5+ years of professional experience
Expertise in SRE principles and in the reliability, scalability, and performance of applications and infrastructure
Hands-on experience with cloud platforms (AWS, GCP, Azure) and IaC tools (Terraform, Ansible)
Extensive experience implementing advanced observability using tools like OpenTelemetry, Dynatrace, Grafana, and/or cloud-native services
Experience in architecting distributed systems and cloud-native architecture in AWS
Systematic problem-solving and troubleshooting skills in complex systems
Excellent communication skills and ability to represent and present business and technical concepts to stakeholders
Self-managed and self-motivated, with a strong sense of ownership, urgency, and drive

Preferred

Prior experience working in AI, ML, or Data engineering
Prior experience developing AIOps tooling or AI agents
Multi-cloud experience (AWS, GCP, Azure) is a plus

Benefits

Comprehensive health care coverage
On-site health and wellness centers
A retirement savings plan
Backup childcare
Tuition reimbursement
Mental health support
Financial coaching

Company

JPMorganChase

With a history tracing its roots to 1799 in New York City, JPMorganChase is one of the world's oldest, largest, and best-known financial institutions—carrying forth the innovative spirit of our heritage firms in global operations across 100 markets.

H1B Sponsorship

JPMorganChase has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (3471)
2024 (3469)
2023 (3395)
2022 (3594)
2021 (2515)
2020 (2495)

Funding

Current Stage: Public Company
Total Funding: unknown
IPO: 1998-02-01

Leadership Team

Allison Beer
CEO of Card Services and Connected Commerce
Dan Mendelson
CEO, Morgan Health
Company data provided by Crunchbase