Upbound · 2 months ago
Senior Software Engineer (REMOTE)
Upbound is redefining how modern infrastructure is built and is hiring a Senior Software Engineer to help build and operate Upbound Spaces. This role involves scaling Upbound to support multiple control planes and extending enterprise control plane management and operations.
Cloud Computing · Information Services · Information Technology · Software
Responsibilities
Actively build and operate Upbound Spaces in production, troubleshooting and resolving issues across multi-tenant SaaS environments, as well as contributing to Upbound's open-source projects, including Crossplane
Take ownership of building features in high demand by Upbound's customers and deliver new functionality that will delight and amaze our users
Investigate and debug complex issues in customer environments, including multi-control plane scenarios, resource reconciliation problems, and performance bottlenecks
Communicate through thoughtful and thorough design documents for new initiatives and detailed post-incident reviews that drive system improvements
Support the full project lifecycle for highly scalable and reliable services running in a cloud environment – discovery, analysis, architecture, design, review, documentation, building, migration, automation, deployment, production-readiness, and ongoing operational support
Write and maintain Go code that interfaces with the Kubernetes API, such as operators, controllers, and add-ons, with a focus on observability, debuggability, and operational excellence
Deploy, manage, and troubleshoot our Kubernetes services in production, using metrics, logs, and traces to identify and resolve issues quickly
Build and maintain operational tooling for debugging customer environments, analyzing control plane health, and automating incident response
Author documentation, user guides, runbooks, and blog posts to support and promote new features that you release
Support the software release cycle for Spaces self-hosted distributions, including diagnosing issues in customer-managed deployments
Participate in the on-call rotation to support Upbound Cloud, responding to incidents and driving them to resolution
Qualifications
Required
Experience operating production cloud services at scale: monitoring, alerting, incident response, post-mortems, and continuous improvement of service reliability
Strong debugging skills across distributed systems, including experience with observability tools (Prometheus, Grafana, OpenTelemetry, distributed tracing) and techniques for diagnosing issues in production environments
Experience building and operating controllers that interact with the Kubernetes API server, including troubleshooting reconciliation loops, managing API rate limits, and optimizing controller performance
Comfortable working directly with customers to understand, reproduce, and resolve complex technical issues in their environments
Take responsibility and ownership for solving problems even if they are outside your lane, especially during incidents affecting customer workloads
Demonstrate excellence in your work, constantly trying to improve your skills and the operational posture of the systems you build
Have empathy for customers and keep them in mind as you build solutions, understanding that reliability and debuggability are features
Realize the importance of clear communication and effective collaboration to work as a team, deliver great results, and support customers through technical challenges
Help create a safe environment where everyone can contribute, learn from failures, share on-call knowledge, and help each other grow as operators and engineers
Company
Upbound
Upbound is an infrastructure management platform that runs, scales, and optimizes services across multiple cloud environments.
Funding
Current Stage
Growth Stage
Total Funding
$69M
Key Investors
Altimeter Capital, Google Ventures
2021-11-29 · Series B · $60M
2018-05-02 · Series A · $9M
Company data provided by Crunchbase