Altana · 22 hours ago
Staff Cloud Engineer
Altana is the network for trusted trade, empowering governments and businesses to build a more resilient and secure global economy. The Staff Cloud Engineer will work closely with developers and data scientists to ensure the availability and performance of critical production services across cloud-native environments and data pipelines.
Data Integration · Logistics · Software · Supply Chain Management
Responsibilities
Observability & Monitoring: Design, implement, and maintain comprehensive observability solutions across the platform stack, including metrics, logging, tracing, and alerting using modern tools (Prometheus, Grafana, Datadog, OpenTelemetry). Develop dashboards and runbooks that provide deep insights into system health and behavior
Internal Developer Platforms: Build and maintain internal developer platforms using infrastructure as code (Terraform) to enable self-service provisioning across multi-cloud environments (AWS, Azure)
Automation & CI/CD: Design and implement automation pipelines for infrastructure provisioning, application deployments, and operational tasks using GitLab CI/CD, GitHub Actions, or similar tools
Kubernetes & Container Platforms: Develop and maintain Kubernetes platforms including writing Helm charts, managing cluster operations, implementing pod security policies, and optimizing resource utilization
Reliability Engineering: Champion SRE principles including establishing and monitoring Service Level Objectives (SLOs) and error budgets for critical services. Drive initiatives to improve system reliability, availability, performance, and efficiency
Platform Abstractions: Create platform abstractions and tooling that enable development teams to deploy and operate services independently while maintaining security and compliance standards
Security & Compliance: Build and maintain secure container images and deployment pipelines with automated security scanning, vulnerability management, and compliance checks. Support deployments in highly regulated customer environments
Incident Management: Participate in the incident response lifecycle, including detection, triage, mitigation, and resolution. Lead blameless postmortems to identify root causes and implement preventative measures
Toil Reduction: Automate operational tasks to reduce toil and improve system reliability through scripting, tooling development, and process improvement
Collaboration & Mentorship: Collaborate with engineering teams to understand their needs and translate them into platform capabilities. Mentor team members on cloud best practices, platform patterns, and automation techniques
On-Call Rotation: Participate in a periodic on-call rotation, responding to critical alerts and ensuring rapid resolution of production incidents
Qualifications
Required
5+ years of experience building developer platforms, infrastructure automation, or cloud infrastructure in a production environment
Expertise in designing, implementing, and managing observability platforms for cloud-native environments (e.g., Prometheus, Grafana, Datadog, ELK stack, OpenTelemetry, Jaeger)
Strong understanding and practical application of SRE principles, including SLOs, error budgets, toil reduction, and blameless culture
Production experience building and operating environments in AWS and/or Azure
Strong Infrastructure as Code skills with Terraform, OpenTofu, or similar tools
Hands-on Kubernetes experience including cluster management, application deployments, and operational maintenance
Proficiency in at least one programming/scripting language (e.g., Python, Go) for automation and tool development
Proven experience participating in and improving incident management processes for critical systems
Knowledge of modern software delivery paradigms, including microservices architectures and CI/CD pipelines
Excellent problem-solving, analytical, and troubleshooting skills in complex distributed systems
Strong written and verbal communication skills, comfortable working with technical teams to understand requirements and design solutions
Track record of delivering platform capabilities that improved team productivity or system reliability
Care deeply about developer experience, automation, security, and operational excellence
Preferred
Experience at a startup or high-growth technology company
Experience with GitOps workflows (ArgoCD, Flux)
Familiarity with securing information systems and compliance frameworks (FedRAMP, IRAP, SOC 2)
Experience with service mesh technologies (Istio, Linkerd)
Experience with data engineering concepts, including building or operating reliable data pipelines, data streaming technologies, or managing large-scale data infrastructure
BS or MS degree in Computer Science, or equivalent experience
Benefits
Flexible Time Off
Paid Parental Leave
Health Benefits
Supplemental Benefits
401(k) Savings
Commuter Benefits
Wellness
Pet Insurance
Employee Assistance Program
Dependent Care FSA
Company
Altana
Altana is the only Product Network connecting buyers, suppliers, logistics providers & government agencies across the global supply chain.
H1B Sponsorship
Altana has a track record of offering H1B sponsorships, though this does not guarantee sponsorship for this specific role. The figures below are provided for reference. (Data from the US Department of Labor)
Trends of Total Sponsorships
2025 (8)
2024 (4)
2023 (3)
2022 (1)
Funding
Current Stage: Growth Stage
Total Funding: $322M
Key Investors: US Innovative Technology Fund, Activate Capital Partners, Google Ventures
2024-07-29 · Series C · $200M
2022-10-03 · Series B · $100M
2021-09-20 · Series A · $15M
Company data provided by Crunchbase