Principal DevSecOps Engineer - AI Infrastructure jobs in United States

IR · 3 weeks ago

Principal DevSecOps Engineer - AI Infrastructure

IR Labs is the innovation lab inside Integrated Research, focused on turning cutting-edge AI research into impactful products. The Principal DevSecOps Engineer will be responsible for building the core infrastructure platform, establishing secure patterns, and driving operational excellence across teams.

Information Technology · Infrastructure · Software

Responsibilities

Serve as the founding infrastructure engineer, building the core platform that scales the company and raises the reliability bar
Establish secure, repeatable IaC/GitOps patterns (Terraform/CloudFormation) and automated delivery (GitHub Actions, ArgoCD)
Partner with teams pre-GA on design reviews, capacity planning, and readiness
Define and drive SLIs/SLOs/SLAs and an error-budget culture for services and ops
Eliminate toil with end-to-end automation across provisioning, config, testing, and operations
Co-design platforms with ML, backend, and security to safely power AI/ML workloads
Architect multi-region resilience—backup, DR, and failover—balancing availability, consistency, and cost
Advance observability and incident excellence; make smart bets on emerging infra tools
Codify production engineering standards and coach teams toward operational excellence

Qualifications

IaC/GitOps · Kubernetes/EKS · AWS primitives · Observability tools · Go/Python/Rust · Security fundamentals · SRE practices · Clear communicator · Mentoring

Required

8+ years operating high-availability, fault-tolerant distributed systems with IaC and GitOps
Strong coding skills in Go, Python, or Rust, plus solid shell scripting; comfortable extending Kubernetes via CRDs
Deep Kubernetes/EKS expertise; mastery of containerization and service networking
Hands-on with AWS primitives (VPC, EC2, S3, IAM, RDS) and multi-region traffic/failover
Observability pro (Prometheus, Grafana, OpenTelemetry, Fluentd, Jaeger) with strong RCA/incident chops
Security fundamentals: IAM, secrets management, and compliance guardrails (SOC 2/HIPAA/GDPR)
Experience building secure, self-service platforms (SDKs/APIs/portals, e.g., Backstage/TypeScript)
Proven SRE practice—SLIs/SLOs, error budgets—and strong testing, reviews, and CI/CD habits
Clear communicator and mentor who thrives in fast-moving environments and collaborates across ML, data, and backend teams

Benefits

Medical, Dental, Vision Insurance
401k with Employer Contributions
Paid Time Off & Birthday Leave
Health Savings Account (HSA) Contributions with High Deductible Health Plan
Short-Term/Long-Term Disability Insurance

Company

IR

IR simplifies the complexity of managing modern communications, payments and infrastructure environments.

Funding

Current Stage
Growth Stage

Leadership Team

Michael Tomkins
Chief Technology Officer
Company data provided by crunchbase