PayPal · 5 hours ago

Sr Site Reliability Engineer

Scottsdale, Arizona, United States of America

Full-time

Hybrid

Senior Level

$112K/yr - $166K/yr

5+ years exp

PayPal has been revolutionizing commerce globally for more than 25 years. They are seeking a Sr Site Reliability Engineer to lead and manage the response to complex incidents, ensuring system performance and reliability while driving continuous improvement initiatives. The role involves directing teams during incidents and enhancing operational efficiencies through advanced automation frameworks.

E-Commerce Platforms, FinTech, Mobile Payments, Transaction Processing

H1B Sponsor Likely

Responsibilities

Take ownership of system performance monitoring, identify inefficiencies, and lead initiatives to improve the overall availability and reliability of digital platforms and applications

Lead and manage the response to complex, high-priority incidents, ensuring prompt resolution and a thorough root cause analysis to prevent future occurrences

Design and implement advanced automation frameworks to improve operational efficiency, streamline processes, and reduce human error

Lead reliability-focused initiatives, ensuring systems are highly available, resilient, and scalable, and promote best practices across engineering teams

Enhance the monitoring infrastructure by identifying key metrics, optimizing alerting, and improving system observability to ensure the reliability of large-scale systems

Forecast resource requirements and lead capacity planning activities to ensure systems can scale effectively to meet growing user demand

Ensure robust disaster recovery strategies are in place and conduct regular testing to ensure systems can recover quickly from failures

Partner with engineering and product teams to identify opportunities for improving system architecture, focusing on scalability, reliability, and fault tolerance

Provide mentorship and technical guidance to junior site reliability engineers, fostering skill development and knowledge sharing

Drive continuous improvement across operational workflows, identifying areas for optimization, cost reduction, and performance enhancement

Proactively identify and address vulnerabilities in cloud (AWS, GCP, Azure) and on-premises infrastructure

Review Infrastructure as Code changes for reliability risks as part of change approval process

Identify architectural anti-patterns in Kubernetes deployments and cloud migrations

Conduct regular disaster recovery drills and readiness tests before major events (Thanksgiving, Cyber 5, peak shopping seasons)

Participate in situation room activities for new product rollouts

Drive site resilience projects to enhance system reliability and uptime

Serve as incident commander with final decision authority -- directing engineering teams, authorizing rollbacks, and commanding regional failovers

Direct application and infrastructure teams during incidents by making work assignments and prioritizing troubleshooting paths

Rapidly assess incidents by reading Infrastructure as Code (Terraform, CloudFormation), Kubernetes manifests, and CI/CD configurations

Give final authorization for critical actions including production rollbacks, regional failovers, and emergency changes

Interface with executive leadership during critical incidents and post-mortems to provide technical guidance and impact assessments

Identify when incidents stem from teams deviating from established cloud-native patterns

Command cross-functional teams during high-severity incidents affecting PayPal core and brand platforms (Venmo, Xoom, Zettle, Braintree)

Lead blameless postmortem sessions and contribute to Root Cause Analysis (RCA) processes

Drive continuous improvement initiatives based on incident learnings

Serve as the primary technical escalation point during critical incidents

Accelerate incident response times through standardized playbooks and automated workflows

Manage multiple concurrent incidents during peak periods with efficiency and precision

Serve as final approver for emergency changes and provide expert guidance on all production changes

Act as advisor and technical authority during change approval processes, identifying potential reliability risks

Provide training and guidance to engineering teams on change management best practices

Maintain change audit documentation and compliance requirements

Review and approve changes to production systems, ensuring comprehensive risk assessment

Automate change validation and rollback procedures to minimize service disruptions

Streamline change management processes to reduce manual errors and bottlenecks

Leverage deep expertise in cloud platforms (AWS, GCP, Azure) to drive incident resolution

Support Braintree and Venmo cloud infrastructure operations

Guide teams toward solutions by providing architectural direction during incidents

Stay current with emerging cloud technologies and best practices

Mentor team members on cloud technologies and incident management techniques

Implement automation, dashboards, and tooling to enhance the team's incident response capabilities

Build runbooks and playbooks for cloud-native incident scenarios

Develop internal tools and scripts to improve TDO operational efficiency

Drive projects that advance the Command Center's operational capabilities

Qualifications

Cloud platforms (AWS, GCP, Azure), Infrastructure as Code (IaC), Kubernetes, Incident management, Monitoring tools (Splunk, Datadog), Scripting (Python, Bash), Disaster recovery strategies, Technical leadership, Communication skills, Problem-solving abilities, Collaboration skills, Documentation skills

Required

3+ years of relevant experience and a Bachelor's degree, or any equivalent combination of education and experience

Significant hands-on experience with at least one major cloud provider (AWS or GCP required; multi-cloud experience preferred)

Strong proficiency with Infrastructure as Code tools (Terraform, CloudFormation, Pulumi, or equivalent) including ability to read, review, and troubleshoot IaC configurations during incidents

Significant hands-on experience with Kubernetes and CNCF ecosystem tools, including troubleshooting K8s deployments, manifests, and cluster issues

Ability to quickly read and review code across multiple languages (Python, Go, Bash) and configuration formats (YAML, HCL, JSON) essential for effective incident troubleshooting

Proven experience managing critical incidents in Infrastructure-as-Code driven environments, including troubleshooting IaC state issues, GitOps failures, and cloud-native deployment problems

Professional-level certification in at least one major cloud platform (AWS Solutions Architect Professional, Google Cloud Professional Cloud Architect, or equivalent)

Experience with monitoring and observability tools (Splunk, Datadog, Prometheus, Grafana)

Strong knowledge of networking, load balancing, CDN technologies, and DNS management

Proficiency in scripting for operational automation (Python, Bash, PowerShell)

5+ years of experience in site reliability engineering, infrastructure operations, or similar technical operations roles

Strong expertise in cloud platforms (AWS, GCP, and/or Azure)

Proficiency in infrastructure automation tools (Terraform, Ansible, CloudFormation, etc.)

Deep understanding of distributed systems, microservices architecture, and containerization (Docker, Kubernetes)

Proficiency in scripting languages (Python, Bash, PowerShell, etc.)

Exceptional communication skills with ability to articulate complex technical issues to both technical and non-technical stakeholders

Executive presence and ability to effectively communicate with senior leadership during high-pressure incidents and post-mortems

Strong analytical and problem-solving abilities with a systematic approach to troubleshooting

Ability to remain calm under pressure and make critical decisions during incidents

Excellent collaboration skills with experience working across global, cross-functional teams

Strong documentation skills and attention to detail

Preferred

Experience with multiple cloud providers (AWS + GCP + Azure)

Broader toolset expertise across multiple IaC tools, CI/CD platforms, or GitOps solutions

Experience with payment processing systems or fintech platforms

ITIL Foundation or higher certification

Background in security operations or compliance (PCI-DSS, SOC 2, etc.)

Experience mentoring or leading technical teams

Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)

Certifications in cloud platforms (AWS Solutions Architect, Google Cloud Professional, Azure Administrator, etc.)

Experience with infrastructure as code and GitOps practices

Benefits

Medical, dental, vision, life and disability insurance

Parental and family leave

401(k) savings plan

Paid time off

Flexible work environment

Employee share options

Health and life insurance

Company

PayPal

Glassdoor rating: 3.8

PayPal is a financial services company that provides online payment solutions to users worldwide.

Founded in 1998

San Jose, California, USA

10,001+ employees

https://www.paypal.com/home

H1B Sponsorship

PayPal has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).

Trends of Total Sponsorships

2025 (1144)

2024 (917)

2023 (775)

2022 (921)

2021 (1051)

2020 (1049)

Funding

Current Stage

Public Company

Total Funding

$12.17B

Key Investors

Kohlberg Kravis Roberts, BlueRun Ventures

2025-11-17 · Post-IPO Debt · $6.95B

2023-06-07 · Post-IPO Debt · $5B

2015-07-20 · IPO

Leadership Team

Simon Bladon

PayPal UK CEO

Chaloem Khompitoon

President & CEO

Recent News

Finovate

Fintech Rundown: A Rapid Review of Weekly News

2026-01-14

PYMNTS.com

AI Rewrites Lending for Borrowers FICO Scores Miss

2026-01-13

The Real Deal

PayPal inks deal for Hudson Square office

2026-01-13

Company data provided by Crunchbase