Staff Site Reliability Engineer jobs in the United States

Altana · 7 hours ago

Staff Site Reliability Engineer

Altana is the network for trusted trade, empowering governments and businesses to build a more resilient and secure global economy. As a Staff Site Reliability Engineer, you will ensure the availability, performance, and scalability of Altana’s critical production services, focusing on cloud-native environments and data pipelines.

Data Integration · Logistics · Software · Supply Chain Management
H1B Sponsor Likely

Responsibilities

Reliability Engineering: Champion and implement SRE principles, including establishing and monitoring Service Level Objectives (SLOs) and error budgets for critical services. Drive initiatives to improve system reliability, availability, performance, and efficiency
Observability & Monitoring: Design, implement, and maintain advanced monitoring, logging, and tracing solutions for our cloud-native applications and infrastructure (e.g., Kubernetes, microservices). Develop dashboards, alerts, and runbooks that provide deep insights into system health and behavior
Automation & Toil Reduction: Identify and automate repetitive operational tasks and manual processes across our production environment. Develop tools and scripts to enhance system operations, deployment pipelines, and incident response
Incident Management & Postmortems: Actively participate in the incident response lifecycle, including detection, triage, mitigation, and resolution of production issues. Lead thorough blameless postmortems to identify root causes and implement preventative measures and lasting improvements
System Design & Optimization: Collaborate closely with development teams to influence the design of new services, ensuring they are built for operability, reliability, and cost-efficiency. Proactively identify and address performance bottlenecks and architectural weaknesses
On-Call Rotation: Participate in a periodic on-call rotation, responding to critical alerts and ensuring rapid resolution of production incidents
Data Reliability: Implement and maintain reliability and observability for critical data pipelines and data infrastructure, ensuring data integrity, availability, and timely processing

Qualifications

Site Reliability Engineering · Observability platforms · Cloud platforms · Containerization technologies · Infrastructure as Code · Programming/scripting languages · Incident management · Data engineering concepts · Problem-solving skills · Communication skills · Collaboration skills

Required

5+ years of hands-on experience in a Site Reliability Engineering (SRE), DevOps, or equivalent role focusing on production system reliability and operations
Strong understanding and practical application of Site Reliability Engineering (SRE) principles, including SLOs, error budgets, toil reduction, and blameless culture
Expertise in designing, implementing, and managing observability platforms for cloud-native environments (e.g., Prometheus, Grafana, Datadog, ELK stack, OpenTelemetry, Jaeger)
Proficiency in at least one programming/scripting language (e.g., Python, Go) for automation and tool development
Extensive hands-on experience with cloud platforms (AWS, Azure, or GCP), including their compute, networking, and database services
Demonstrated experience with containerization technologies (Docker) and container orchestration platforms (Kubernetes)
Experience with Infrastructure as Code (IaC) tools (e.g., Terraform, OpenTofu, CloudFormation) for managing cloud resources
Proven experience participating in and improving incident management processes for critical systems
Knowledge of modern software delivery paradigms, including microservices architectures and CI/CD pipelines
Excellent problem-solving, analytical, and troubleshooting skills in complex distributed systems
Strong communication and collaboration skills, with the ability to work effectively across engineering teams
Experience with data engineering concepts, including building or operating reliable data pipelines, data streaming technologies, or managing large-scale data infrastructure

Benefits

Flexible Time Off
Paid Parental Leave
Health Benefits
Supplemental Benefits
401(k) Savings
Commuter Benefits
Wellness
Pet Insurance
Employee Assistance Program
Dependent Care FSA

Company

Altana

Altana is the only Product Network connecting buyers, suppliers, logistics providers & government agencies across the global supply chain.

H1B Sponsorship

Altana has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. The additional information below is provided for reference (data from the US Department of Labor).
[Chart: Distribution of different job fields receiving sponsorship; highlighted fields are similar to this job]
Trends of Total Sponsorships
2025 (8)
2024 (4)
2023 (3)
2022 (1)

Funding

Current Stage: Growth Stage
Total Funding: $322M
Key Investors: US Innovative Technology Fund · Activate Capital Partners · Google Ventures
2024-07-29 · Series C · $200M
2022-10-03 · Series B · $100M
2021-09-20 · Series A · $15M

Leadership Team

Evan Smith
CEO and Co-Founder
Peter Swartz
Chief Science Officer and Co-Founder
Company data provided by Crunchbase.