Senior Site Reliability Engineer jobs in United States

Transcend · 15 hours ago

Senior Site Reliability Engineer

Transcend is building a privacy platform that integrates privacy into technology stacks. They are seeking a Senior Site Reliability Engineer to ensure the reliability, scalability, and performance of their privacy infrastructure while collaborating with various teams to improve system resilience and incident management practices.
Cloud Infrastructure · Compliance · Privacy
No H1B

Responsibilities

Lead reliability-focused design and readiness reviews for new and existing services, ensuring production readiness, clear rollout and rollback strategies, and strong observability for every launch
Build, operate, and continuously improve our observability stack (e.g., logging, metrics, tracing) to provide meaningful dashboards, alerts, and runbooks that enable fast, high-quality incident response across engineering teams
Own and evolve incident management practices, including on-call participation, incident response processes, and post-incident reviews that drive long-term remediation and learning across teams
Plan and execute disaster recovery exercises and game days to validate our resilience posture, test failover and backup strategies, and systematically reduce single points of failure
Perform capacity planning and cost optimization for our cloud infrastructure, helping ensure we run a cost-effective environment that meets performance and availability goals as usage grows
Identify and drive down systemic reliability risks across application, infrastructure, and process layers—owning cross-team projects that significantly reduce incident frequency and severity over time
Collaborate closely with Developer Experience, Security, and product engineering to embed reliability best practices—testing, rollout patterns, guardrails, and "golden paths"—into shared tools and CI/CD pipelines
Participate in and help continuously improve the on-call rotation, using real incidents and near-misses to prioritize automation, better alerting, and clearer documentation

Qualifications

Site Reliability Engineering · AWS · Infrastructure-as-Code · Observability Systems · Incident Management · Programming Language · CI/CD Tooling · Disaster Recovery · Capacity Planning · Data Privacy Regulations · Technical Certifications · Communication Skills · Collaboration Skills

Required

5+ years of experience in Site Reliability Engineering, Production Engineering, Infrastructure Engineering, or a closely related role, including hands-on ownership of production systems
Strong experience operating modern cloud infrastructure, ideally on AWS, including core services for compute, networking, storage, and security primitives
Proficiency with at least one programming language used at Transcend (e.g., JavaScript, TypeScript, or Python), and comfort reading and reviewing application code for reliability and performance concerns
Hands-on experience with infrastructure-as-code and CI/CD tooling (e.g., Terraform, CloudFormation, or similar; modern build/deploy pipelines) to reliably provision and change infrastructure
Deep familiarity with observability and monitoring systems (e.g., Datadog or equivalent), including designing alerts that balance coverage and noise to avoid alert fatigue while protecting customer experience
Proven track record running incident response and post-incident analysis, including root cause identification, clear documentation, and driving follow-through on remediation work
Excellent communication and collaboration skills, with experience working across multiple engineering teams to align on reliability goals, share context, and influence technical direction without formal authority
Comfort participating in an on-call rotation, and experience helping to design or improve on-call processes, runbooks, and escalation paths
Minimum level of education: Bachelor's degree in Computer Science, Engineering, Information Systems, or a related technical field, or equivalent practical experience
Demonstrated ability to thrive in a remote-first, high-autonomy environment, managing priorities, communicating asynchronously, and driving projects to completion with limited oversight

Preferred

Experience working in a high-growth B2B SaaS environment, ideally on security, data, or privacy-focused products
Experience designing, operating, and tuning Docker-based or serverless architectures on AWS or another major cloud provider
Familiarity with data privacy regulations and practices (e.g., GDPR, CPRA) and how they inform system design, reliability expectations, and incident response requirements
Experience defining and rolling out SRE frameworks such as SLOs/SLIs, error budgets, incident management processes, and production-readiness checklists across multiple teams
Experience working closely with Developer Experience / Platform teams to create paved roads, tooling, and documentation that make it easy for product teams to build reliable services by default
Relevant technical certifications (e.g., AWS Certified Solutions Architect or DevOps Engineer, CKA/CKAD, or equivalent SRE training) are a plus

Benefits

Flexible PTO
Parental leave
A 401(k) match
A competitive compensation package that includes employee equity

Company

Transcend

Transcend is the compliance layer for customer data, enabling enterprises to activate AI responsibly and at scale.

Funding

Current Stage
Growth Stage
Total Funding
$68.95M
Key Investors
StepStone Group, Accel, Index Ventures
2024-05-28 · Series B · $40M
2020-06-10 · Series A · $25M
2019-04-04 · Seed · $3.95M

Leadership Team

Ben Brook
Co-Founder, CEO @ Transcend
Michael Farrell
Co-Founder and CTO
Company data provided by Crunchbase