
Shutterfly · 2 months ago

Senior Site Reliability Engineer II

Shutterfly is a company that helps customers create products and capture moments reflecting their unique selves. They are looking for a Senior Site Reliability Engineer II to ensure the reliability, availability, and performance of their consumer systems while collaborating with development and operations teams.

Gift · Home Decor · Internet
H1B Sponsor Likely

Responsibilities

Perform advanced performance analysis and troubleshooting across distributed systems to ensure optimal availability, scalability, and cost efficiency
Implement and maintain monitoring, alerting, and observability solutions to provide proactive visibility into application and infrastructure health
Partner with development teams to influence service design and architecture so that new features meet high standards for reliability and scalability
Participate in incident response, including root cause analysis and long-term reliability improvements
Contribute to capacity planning, cost optimization, and performance tuning of large-scale systems
Build and maintain automation and tooling that reduces manual effort, accelerates delivery, and minimizes human error
Explore and apply AI/ML technologies (e.g., anomaly detection, predictive scaling, automated alerting) to enhance SRE practices
Share expertise with peers by documenting best practices, solutions, and troubleshooting methodologies
Collaborate across infrastructure, development, and business teams to align on standards and reliability goals
Provide technical depth and decisive action during critical incidents

Qualifications

Performance troubleshooting · Distributed system optimization · Programming languages · Observability platforms · AWS services · Infrastructure as Code · Incident management · AI/ML technologies · Bachelor's degree · Effective communication

Required

5-7+ years of experience in software engineering, SRE, or DevOps roles supporting large-scale, highly available systems
Strong skills in performance troubleshooting, root cause analysis, and distributed system optimization
Proficiency in at least one programming language (Python, Go, Java, or similar) with ability to write production-quality code
Hands-on experience with observability platforms (e.g., Splunk, Datadog, SignalFx, Prometheus, OpenTelemetry)
Strong knowledge of AWS services, cloud deployment models, and cost optimization strategies
Experience with Infrastructure as Code (Terraform, CloudFormation) and configuration management (Ansible, Chef, Puppet)
Solid understanding of distributed systems concepts (scalability, high availability, fault tolerance)
Experience in incident management and driving operational improvements
Effective communication skills with ability to work across engineering and business teams
Bachelor's degree in Computer Science, Engineering, or equivalent experience

Preferred

Exposure to AI/ML or AIOps tools for anomaly detection, predictive analytics, or automated incident response

Benefits

Bonus incentive
Health benefits
401(k) program
Other employee perks

Company

Shutterfly

Shutterfly is a photography company that provides products and services for preserving memories and sharing stories.

H1B Sponsorship

Shutterfly has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (36)
2024 (45)
2023 (59)
2022 (83)
2021 (53)
2020 (50)

Funding

Current Stage
Late Stage

Leadership Team

Sally Pofcher
Chief Executive Officer
Zac Bauman
Chief of Staff to CFO at Shutterfly
Company data provided by Crunchbase