Senior Production Engineer (REMOTE) jobs in United States

Upbound · 2 months ago

Senior Production Engineer (REMOTE)

Upbound is the company behind Crossplane, leading the shift toward agentic infrastructure. They are hiring a Senior Production Engineer to enhance the reliability and availability of Upbound Cloud, collaborating with engineering and product teams to ensure a performant and scalable platform.

Cloud Computing, Information Services, Information Technology, Software

Responsibilities

Contribute to the production engineering strategy for Upbound Cloud, ensuring high availability, scalability, and efficiency of all customer-facing systems. This includes internalizing the product strategy and building the system resiliency needed to support product growth
Own reliability metrics — including uptime, latency, and error budgets — and champion service-level objectives (SLOs) across teams
Design and implement automation for provisioning, observability, and incident response to minimize human intervention and increase operational maturity
Collaborate with development teams to build reliability into the software lifecycle through proactive architectural reviews, chaos testing, and performance profiling
Operate and improve multi-tenant Kubernetes-based systems, leveraging Crossplane and other cloud-native tooling
Drive incident management — leading blameless postmortems, root cause analyses, and systemic remediation efforts
Mentor engineers in production engineering practices, fostering a culture of ownership, reliability, and continuous improvement
Contribute to the evolution of our cloud platform through design input, tool selection, and scalable systems thinking

Qualifications

Kubernetes, Go, Infrastructure-as-Code, Distributed systems, Monitoring, Incident response, Capacity planning, Change management, Observability, Communication, Collaboration skills, Mentoring

Required

5+ years of experience in software, infrastructure, or site reliability engineering roles
Strong background in distributed systems, service-oriented architectures, and cloud-native technologies
Proficiency in Kubernetes, Go, and Infrastructure-as-Code strategies
Expertise in observability and monitoring, preferably with Honeycomb and OpenTelemetry
Experience managing large-scale SaaS systems in production with multi-region and high-availability requirements
Strong understanding of incident response, capacity planning, and change management
Excellent communication skills and ability to collaborate across functions

Preferred

Experience with Crossplane, multi-cloud infrastructure, or control-plane architectures
Prior leadership experience driving reliability initiatives at scale

Company

Upbound

Upbound is an infrastructure management platform that runs, scales, and optimizes services across multiple cloud environments.

Funding

Current Stage: Growth Stage
Total Funding: $69M
Key Investors: Altimeter Capital, Google Ventures
2021-11-29 · Series B · $60M
2018-05-02 · Series A · $9M

Leadership Team

Bassam Tabbara
Founder and CEO
Sarah Strobhar
Chief Revenue Officer (CRO)
Company data provided by Crunchbase