Senior Kubernetes Platform Engineer jobs in United States

TensorWave · 2 months ago

Senior Kubernetes Platform Engineer

TensorWave is on a mission to build seamless, secure, reliable, and resilient AI infrastructure at scale. They are seeking a Senior Kubernetes Platform Engineer to maintain the stability and reliability of their bare-metal Kubernetes infrastructure, focusing on troubleshooting, incident response, and day-to-day operations across multi-tenant workloads.

AI Infrastructure, Artificial Intelligence (AI), Cloud Computing, Cloud Infrastructure, Generative AI, IaaS

Responsibilities

Own and troubleshoot operational issues within Kubernetes environments
Maintain and monitor core services (e.g., Cilium, HAProxy, Prometheus)
Ensure uptime, performance, and reliability of multi-tenant clusters
Assist with Ingress/Egress connectivity and network debugging
Support internal and customer teams in secure, isolated VPC environments
Collaborate with senior engineers on automation and cluster lifecycle improvements

Qualifications

Kubernetes, DevOps, Linux infrastructure, Infrastructure-as-code, Monitoring tools, Networking, Troubleshooting, Collaboration, Adaptability

Required

Bachelor of Science in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience
5+ years of experience in DevOps, SRE, or Linux infrastructure roles
4+ years of hands-on experience with Kubernetes in production
Familiarity with networking, CNI plugins, and core Linux troubleshooting
Strong infrastructure-as-code mindset (Helm, Terraform, Ansible)
Solid experience with monitoring and logging tools (Prometheus, Grafana, Loki)
Understanding of secure infrastructure design principles and least-privilege access

Preferred

Experience with RKE2, Rancher, or similar platforms
Experience troubleshooting or supporting AI or GPU-based workloads
Familiarity with HAProxy, Cilium, or other Kubernetes ingress/networking tools

Benefits

Competitive Salary
Stock Options
100% paid Medical, Dental, and Vision insurance
Flexible PTO
Paid Holidays
401(k)
Parental Leave
Flexible Spending Account
Short Term Disability Insurance
Life and Voluntary Supplemental Insurance
Mental Health Benefits through Spring Health

Company

TensorWave

TensorWave is an AMD GPU-exclusive cloud that supports training and inference at scale.

Funding

Current Stage: Growth Stage
Total Funding: $146.71M
Key Investors: Nexus Venture Partners, FundNV
Funding Rounds:
2025-05-14 · Series A · $100M
2024-10-08 · Seed · $43M
2024-04-23 · Seed · $0.89M

Leadership Team

Darrick Horton
Co-Founder / CEO
Piotr Tomasik
Co-Founder, President & COO
Company data provided by Crunchbase