
Stord · 11 hours ago

Staff Site Reliability Engineer

Stord is The Consumer Experience Company, powering seamless checkout through delivery for today's leading brands. They are seeking a Staff Site Reliability Engineer to lead architecture decisions, implement infrastructure as code, and enhance system reliability and performance within their production systems.

E-Commerce · Freight Service · Logistics · SaaS · Supply Chain Management
H1B Sponsor Likely

Responsibilities

Lead architecture decisions to deliver scalable and reliable infrastructure, primarily on Google Cloud Platform (GCP)
Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, Pulumi, or similar
Manage containerized environments with Docker and Kubernetes
Drive system performance tuning, capacity planning, and resource optimization
Define and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
Build robust monitoring, alerting, and observability solutions using Prometheus, Grafana, Datadog, or New Relic
Develop and maintain disaster recovery and business continuity strategies
Design and maintain CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions, etc.)
Automate operational workflows and infrastructure provisioning
Implement configuration management with Ansible, Chef, Puppet, or similar tools
Develop custom tooling and scripts to enhance operational efficiency
Partner with engineering teams to improve deployment practices and application reliability
Provide escalation support for production incidents and lead post-incident reviews
Conduct technical design reviews and offer architectural guidance
Mentor junior engineers on SRE and infrastructure best practices
Participate in on-call rotations for critical systems

Qualifications

Google Cloud Platform · Infrastructure as Code · Containerization · Monitoring/Observability · CI/CD Pipelines · Python · Kubernetes · Terraform · Networking · Troubleshooting · Incident Management · Communication · Problem-solving

Required

8+ years of experience in site reliability, platform engineering, or infrastructure roles with leadership exposure
Proficiency in at least one programming language (Python, Go, Java, etc.)
Strong hands-on experience with GCP and its core services
Expertise in containerization (Docker) and orchestration (Kubernetes)
Deep knowledge of Infrastructure as Code (Terraform, CloudFormation, etc.)
Skilled in monitoring/observability (Prometheus, Grafana, ELK, etc.)
Solid understanding of networking, load balancing, and distributed systems
Experience with Git and collaborative development workflows
Exceptional troubleshooting and problem-solving abilities
Strong grasp of system design principles and scalability patterns
Experience with incident management and post-mortem practices
Familiarity with security best practices and compliance standards
Excellent communication skills and ability to work cross-functionally

Preferred

Database administration experience (PostgreSQL, MySQL, Redis, etc.)
Familiarity with event-driven systems and platforms (Kafka, Pub/Sub, etc.)
Experience with log aggregation tools (ELK, Splunk, Fluentd)
Exposure to chaos engineering and resilience testing
Performance testing and optimization experience
Relevant GCP certifications (Cloud Architect, Cloud DevOps Engineer)
Knowledge of GCP-specific services (Cloud Run, GKE, Cloud Functions, BigQuery, etc.)
Experience with multi-cloud or hybrid architectures
Background in functional programming (Elixir, Haskell, F#, Clojure, etc.)
Strong DevOps background and mindset

Company

Stord is a commerce enablement platform that operates a distributed fulfillment network for various brands across multiple channels.

H1B Sponsorship

Stord has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (5)
2024 (4)
2023 (2)
2022 (7)
2021 (2)
2020 (2)

Funding

Current Stage: Late Stage
Total Funding: $525.04M
Key Investors: Strike Capital, Franklin Templeton, Kleiner Perkins
2025-05-15 · Series E · $80M
2025-05-15 · Debt Financing · $120M
2022-05-03 · Series D · $120M

Leadership Team

Sean Henry, Founder & CEO
Jacob Boudreau, Co-Founder / CTO
Company data provided by Crunchbase