Shutterfly · 2 months ago

Senior Site Reliability Engineer II

Plano, TX

Full-time

Onsite

Senior Level

$106K/yr - $151K/yr

5+ years exp

Shutterfly helps customers create products and capture moments that reflect their unique selves. The company is looking for a Senior Site Reliability Engineer II to ensure the reliability, availability, and performance of its consumer systems while collaborating with development and operations teams.

Gift · Home Decor · Internet

H1B Sponsor Likely

Responsibilities

Perform advanced performance analysis and troubleshooting across distributed systems to ensure optimal availability, scalability, and cost efficiency

Implement and maintain monitoring, alerting, and observability solutions to provide proactive visibility into application and infrastructure health

Partner with development teams to influence service design and architecture so that new features meet high standards for reliability and scalability

Participate in incident response, including root cause analysis and long-term reliability improvements

Contribute to capacity planning, cost optimization, and performance tuning of large-scale systems

Build and maintain automation and tooling that reduces manual effort, accelerates delivery, and minimizes human error

Explore and apply AI/ML technologies (e.g., anomaly detection, predictive scaling, automated alerting) to enhance SRE practices

Share expertise with peers by documenting best practices, solutions, and troubleshooting methodologies

Collaborate across infrastructure, development, and business teams to align on standards and reliability goals

Provide technical depth and decisive action during critical incidents

Qualifications

Performance troubleshooting · Distributed system optimization · Programming languages · Observability platforms · AWS services · Infrastructure as Code · Incident management · AI/ML technologies · Bachelor's degree · Effective communication

Required

5-7+ years of experience in software engineering, SRE, or DevOps roles supporting large-scale, highly available systems

Strong skills in performance troubleshooting, root cause analysis, and distributed system optimization

Proficiency in at least one programming language (Python, Go, Java, or similar) with ability to write production-quality code

Hands-on experience with observability platforms (e.g., Splunk, Datadog, SignalFx, Prometheus, OpenTelemetry)

Strong knowledge of AWS services, cloud deployment models, and cost optimization strategies

Experience with Infrastructure as Code (Terraform, CloudFormation) and configuration management (Ansible, Chef, Puppet)

Solid understanding of distributed systems concepts (scalability, high availability, fault tolerance)

Experience in incident management and driving operational improvements

Effective communication skills with ability to work across engineering and business teams

Bachelor's degree in Computer Science, Engineering, or equivalent experience

Preferred

Exposure to AI/ML or AIOps tools for anomaly detection, predictive analytics, or automated incident response

Benefits

Bonus incentive

Health benefits

401K program

Other employee perks

Company

Shutterfly

Glassdoor rating: 3.2

Shutterfly is a photography company that provides products and services to help customers preserve their memories and share their stories.

Founded in 1999

Redwood City, California, USA

10,001+ employees

http://www.shutterflyinc.com

H1B Sponsorship

Shutterfly has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).

Distribution of Different Job Fields Receiving Sponsorship (chart)

Trends of Total Sponsorships

2025 (36)

2024 (45)

2023 (59)

2022 (83)

2021 (53)

2020 (50)