hackajob · 6 hours ago
Principal DevOps Engineer - Hybrid
hackajob is collaborating with BMC Software to connect them with exceptional tech professionals for this role. We are looking for a Principal DevOps Engineer to help build and operate our next-generation Agentic-AI Data Management platform from 0-1, focusing on reliability and automation in production systems.
Artificial Intelligence (AI) · Generative AI · Human Resources · Recruiting · Software
Responsibilities
Design, build, and operate the core cloud and Kubernetes-based platform that underpins a 0-1 data automation and management product, taking infrastructure and operational capabilities from concept through production
Write production-grade automation in Python, Go, or similar languages to eliminate manual work across provisioning, deployment, scaling, monitoring, and incident response
Design and evolve Kubernetes-based platforms using Docker, Helm, and cloud-native services, balancing speed of delivery with long-term operability and cost control
Establish and enforce SRE best practices including SLIs/SLOs, alerting strategies, error budgets, incident management, and post-incident reviews to ensure enterprise-grade reliability
Build and maintain robust CI/CD pipelines (e.g., GitHub Actions, Jenkins) to support frequent, safe, and repeatable deployments across multiple environments
Manage cloud environments in accordance with company security guidelines, embedding security, compliance, and access controls directly into infrastructure and pipelines
Build and maintain internal tools, services, and automation that support deployment, observability, debugging, and operational excellence while reducing human error
Support deployments across AWS, including integrations with enterprise systems and geographically redundant, highly available services
Work closely with product engineering teams to design operable systems, influence architectural decisions, and ensure production realities inform development choices early
Act with strong ownership: identify operational gaps, propose pragmatic solutions, and move work forward without waiting for perfect requirements or ideal conditions
Qualifications
Required
Experience in building and operating cloud and Kubernetes-based platforms
Hands-on experience with automation in Python, Go, or similar languages
Knowledge of Docker, Helm, and cloud-native services
Experience with SRE best practices including SLIs/SLOs, alerting strategies, error budgets, incident management, and post-incident reviews
Experience in building and maintaining CI/CD pipelines (e.g., GitHub Actions, Jenkins)
Ability to manage cloud environments in accordance with security guidelines
Experience in building and maintaining internal tools, services, and automation for operational excellence
Experience with AWS deployments and integrations with enterprise systems
Strong collaboration skills with product engineering teams
Demonstrated ownership and ability to identify operational gaps and propose solutions
Company
hackajob
The AI-native tech hiring platform trusted by enterprises, scale-ups, and 1M+ tech professionals worldwide.
Funding
Current Stage: Growth Stage
Total Funding: $33M
Key Investors: Volition Capital, Downing Ventures, Techstars
2023-05-03 · Series B · $25M
2018-10-25 · Series A · $6.7M
2017-03-31 · Seed · $0.58M
Company data provided by Crunchbase