Shutterfly · 2 months ago
Senior Site Reliability Engineer II
Shutterfly is a company that helps customers create products and capture moments reflecting their unique selves. They are looking for a Senior Site Reliability Engineer II to ensure the reliability, availability, and performance of their consumer systems while collaborating with development and operations teams.
Gift · Home Decor · Internet
Responsibilities
Perform advanced performance analysis and troubleshooting across distributed systems to ensure optimal availability, scalability, and cost efficiency
Implement and maintain monitoring, alerting, and observability solutions to provide proactive visibility into application and infrastructure health
Partner with development teams to influence service design and architecture so that new features meet high standards for reliability and scalability
Participate in incident response, including root cause analysis and long-term reliability improvements
Contribute to capacity planning, cost optimization, and performance tuning of large-scale systems
Build and maintain automation and tooling that reduces manual effort, accelerates delivery, and minimizes human error
Explore and apply AI/ML technologies (e.g., anomaly detection, predictive scaling, automated alerting) to enhance SRE practices
Share expertise with peers by documenting best practices, solutions, and troubleshooting methodologies
Collaborate across infrastructure, development, and business teams to align on standards and reliability goals
Provide technical depth and decisive action during critical incidents
Qualifications
Required
5-7+ years of experience in software engineering, SRE, or DevOps roles supporting large-scale, highly available systems
Strong skills in performance troubleshooting, root cause analysis, and distributed system optimization
Proficiency in at least one programming language (Python, Go, Java, or similar) with ability to write production-quality code
Hands-on experience with observability platforms (e.g., Splunk, Datadog, SignalFx, Prometheus, OpenTelemetry)
Strong knowledge of AWS services, cloud deployment models, and cost optimization strategies
Experience with Infrastructure as Code (Terraform, CloudFormation) and configuration management (Ansible, Chef, Puppet)
Solid understanding of distributed systems concepts (scalability, high availability, fault tolerance)
Experience in incident management and driving operational improvements
Effective communication skills with ability to work across engineering and business teams
Bachelor's degree in Computer Science, Engineering, or equivalent experience
Preferred
Exposure to AI/ML or AIOps tools for anomaly detection, predictive analytics, or automated incident response
Benefits
Bonus incentive
Health benefits
401K program
Other employee perks
Company
Shutterfly
Shutterfly is a photography company that provides products and services for preserving memories and sharing stories.
H1B Sponsorship
Shutterfly has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships
2025 (36)
2024 (45)
2023 (59)
2022 (83)
2021 (53)
2020 (50)
Funding
Current Stage
Late Stage
Recent News
2025-12-22
2025-10-23
Company data provided by Crunchbase