TensorWave · 4 months ago
Principal Kubernetes Platform Engineer
TensorWave is focused on building secure and resilient AI infrastructure at scale. They are seeking a Principal Platform Engineer to lead the design, development, and deployment of a next-generation Kubernetes platform, ensuring production excellence and supporting millions of users.
AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Cloud Infrastructure, Generative AI, IaaS
Responsibilities
Architect and implement end-to-end Kubernetes infrastructure for large-scale, cloud-native applications
Design and build serverless platforms on top of Kubernetes using technologies such as Knative, OpenFaaS, or KEDA
Develop and maintain Kubernetes custom resource definitions (CRDs), controllers, operators, and admission controllers in Go or Python
Define multi-tenant, multi-region architecture supporting millions of users with high availability and low latency
Lead Kubernetes cluster lifecycle management: provisioning, upgrades, scaling, monitoring, and troubleshooting
Collaborate closely with engineering teams to containerize applications, write Helm charts or Kustomize overlays, and standardize deployment practices
Implement infrastructure as code using tools like Terraform, Pulumi, or Crossplane
Lead efforts around observability, policy enforcement, cost optimization, and RBAC/security hardening within the cluster
Evaluate and integrate Kubernetes ecosystem tools such as Istio/Linkerd, ArgoCD, Flux, Prometheus, Grafana, and OPA
Mentor and upskill DevOps engineers and SREs in Kubernetes best practices
Qualifications
Required
Bachelor of Science in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience
8+ years of experience in cloud infrastructure, DevOps, or platform engineering roles
8+ years of hands-on Kubernetes experience, including deep knowledge of the Kubernetes API, internals, networking, and storage
Proficiency in writing Kubernetes manifests, Helm charts, and custom Kubernetes controllers/operators
Proven experience designing cloud-native systems that scale globally across multi-region, multi-cloud, or hybrid setups
Experience with serverless technologies in production, such as Knative, OpenFaaS, or AWS Lambda
Strong knowledge of cloud platforms such as AWS, GCP, or Azure
Experience with GitOps tools such as ArgoCD or Flux
Deep understanding of security, compliance, and resilience in containerized workloads
Preferred
Contributions to Kubernetes open-source projects or CNCF-related tooling
Experience with service mesh design (Istio, Linkerd)
Familiarity with eBPF, Cilium, or network-level observability
Background in building PaaS or developer platforms on top of Kubernetes
Benefits
Competitive Salary
Stock Options
100% paid Medical, Dental, and Vision insurance
Flexible PTO
Paid Holidays
401(k)
Parental Leave
Flexible Spending Account
Short Term Disability Insurance
Life and Voluntary Supplemental Insurance
Mental Health Benefits through Spring Health
Company
TensorWave
TensorWave is a cloud built exclusively on AMD GPUs that supports training and inference at scale.
Funding
Current Stage: Growth Stage
Total Funding: $146.71M
Key Investors: Nexus Venture Partners, FundNV
2025-05-14 · Series A · $100M
2024-10-08 · Seed · $43M
2024-04-23 · Seed · $0.89M
Company data provided by Crunchbase