Altana · 7 hours ago

Staff Site Reliability Engineer

San Francisco, CA

Full-time

Hybrid

Senior Level, Lead/Staff

$170K/yr - $220K/yr

5+ years exp

Altana is the network for trusted trade, empowering governments and businesses to build a more resilient and secure global economy. As a Staff Site Reliability Engineer, you will ensure the availability, performance, and scalability of Altana’s critical production services, focusing on cloud-native environments and data pipelines.

Data Integration · Logistics · Software · Supply Chain Management

H1B Sponsor Likely

Responsibilities

Reliability Engineering: Champion and implement SRE principles, including establishing and monitoring Service Level Objectives (SLOs) and error budgets for critical services. Drive initiatives to improve system reliability, availability, performance, and efficiency

Observability & Monitoring: Design, implement, and maintain advanced monitoring, logging, and tracing solutions for our cloud-native applications and infrastructure (e.g., Kubernetes, microservices). Develop dashboards, alerts, and runbooks that provide deep insights into system health and behavior

Automation & Toil Reduction: Identify and automate repetitive operational tasks and manual processes across our production environment. Develop tools and scripts to enhance system operations, deployment pipelines, and incident response

Incident Management & Postmortems: Actively participate in the incident response lifecycle, including detection, triage, mitigation, and resolution of production issues. Lead thorough blameless postmortems to identify root causes and implement preventative measures and lasting improvements

System Design & Optimization: Collaborate closely with development teams to influence the design of new services, ensuring they are built for operability, reliability, and cost-efficiency. Proactively identify and address performance bottlenecks and architectural weaknesses

On-Call Rotation: Participate in a periodic on-call rotation, responding to critical alerts and ensuring rapid resolution of production incidents

Data Reliability: Implement and maintain reliability and observability for critical data pipelines and data infrastructure, ensuring data integrity, availability, and timely processing

Qualifications

Site Reliability Engineering · Observability platforms · Cloud platforms · Containerization technologies · Infrastructure as Code · Programming/scripting language · Incident management · Data engineering concepts · Problem-solving skills · Communication skills · Collaboration skills

Required

5+ years of hands-on experience in a Site Reliability Engineering (SRE), DevOps, or equivalent role focusing on production system reliability and operations

Strong understanding and practical application of Site Reliability Engineering (SRE) principles, including SLOs, error budgets, toil reduction, and blameless culture

Expertise in designing, implementing, and managing observability platforms for cloud-native environments (e.g., Prometheus, Grafana, Datadog, ELK stack, OpenTelemetry, Jaeger)

Proficiency in at least one programming/scripting language (e.g., Python, Go) for automation and tool development

Extensive hands-on experience with cloud platforms (AWS, Azure, or GCP), including their compute, networking, and database services

Demonstrated experience with containerization technologies (Docker) and container orchestration platforms (Kubernetes)

Experience with Infrastructure as Code (IaC) tools (e.g., Terraform, OpenTofu, CloudFormation) for managing cloud resources

Proven experience participating in and improving incident management processes for critical systems

Knowledge of modern software delivery paradigms, including microservices architectures and CI/CD pipelines

Excellent problem-solving, analytical, and troubleshooting skills in complex distributed systems

Strong communication and collaboration skills, with the ability to work effectively across engineering teams

Experience with data engineering concepts, including building or operating reliable data pipelines, data streaming technologies, or managing large-scale data infrastructure

Benefits

Flexible Time Off

Paid Parental Leave

Health Benefits

Supplemental Benefits

401(k) Savings

Commuter Benefits

Wellness

Pet Insurance

Employee Assistance Program

Dependent Care FSA

Company

Altana

Altana is the only Product Network connecting buyers, suppliers, logistics providers & government agencies across the global supply chain.

Founded in 2018

New York, New York, USA

51-200 employees

https://altana.ai

H1B Sponsorship

Altana has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).

Trends of Total Sponsorships

2025 (8)

2024 (4)

2023 (3)

2022 (1)

Funding

Current Stage

Growth Stage

Total Funding

$322M

Key Investors

US Innovative Technology Fund · Activate Capital Partners · Google Ventures

2024-07-29 · Series C · $200M

2022-10-03 · Series B · $100M

2021-09-20 · Series A · $15M

Leadership Team

Evan Smith

CEO and Co-Founder

Peter Swartz

Chief Science Officer and Co-Founder

Recent News

EIN Presswire

AUVSI Enhances Green UAS Program With New Tiers Aligned With Federal Push for Trusted, NDAA-Compliant U.S. Drone Systems

2025-12-09

Techmeme

The US CBP signs a two-year deal with AI logistics company Altana, and will use its platform for real-time trade enforcement, forced labor detection, and more (Mackenzie Weinger/Axios)

2025-10-31

EIN Presswire

AUVSI and Altana Partner to Bolster Supply Chain Security for Green UAS and Blue UAS Programs

2025-10-29

Company data provided by Crunchbase