Engineering Manager, HPC Kubernetes Platform jobs in the United States

NorthMark Strategies · 3 months ago


NorthMark Strategies LLC is a company focused on enhancing high-performance computing and cloud infrastructure. It is seeking an experienced Engineering Manager for its HPC Kubernetes Platform to lead a team that designs and scales its bare-metal Kubernetes environment, ensuring performance, reliability, and automation for machine-learning and HPC workloads.

Advice · Financial Services · Venture Capital
No H1B

Responsibilities

Lead and mentor engineers designing and scaling NMC²’s bare-metal Kubernetes platform for HPC and ML workloads
Architect and optimize GPU/CPU scheduling, resource management, and performance across multi-tenant compute clusters
Drive automation and observability using Infrastructure-as-Code, CI/CD, and SRE best practices
Collaborate with Research, Storage, and Network teams to integrate distributed filesystems, high-speed interconnects (InfiniBand, RoCE), and custom runtimes
Partner with hardware and software vendors to improve tooling, influence product roadmaps, and streamline deployment
Oversee platform reliability, capacity forecasting, and performance KPIs across thousands of nodes

Qualifications

Kubernetes · HPC · Linux systems · Automation · Networking · Resource management · Observability tools · Stakeholder management · Multi-tenant clusters · Open-source contributions · Communication skills · Technical leadership

Required

7+ years in infrastructure, platform, or SRE engineering, including 2+ in technical leadership
Proven experience operating Kubernetes environments tailored for HPC or ML training workloads, including GPU scheduling, resource isolation, and workload optimization
Deep knowledge of Linux systems, networking, and performance engineering on bare-metal hardware
Experience managing large-scale, multi-tenant clusters and integrating distributed storage or high-speed networking
Strong automation experience (Terraform, Ansible, or similar) and familiarity with observability tools (Prometheus, Grafana, Loki)
Excellent communication and stakeholder management skills; ability to translate complex technical direction into clear, actionable plans
Bachelor's Degree or equivalent experience
Must be legally authorized to work in the United States without the need for employer sponsorship, now or at any time in the future

Preferred

Familiarity with HPC schedulers (Slurm, Flux) and container runtimes (containerd, CRI-O)
Contributions to open-source Kubernetes or ML infrastructure projects

Benefits

Company-Paid Lunch Stipend: Lunch is provided via Grubhub
Company-Paid Benefits: 100% Employer-Paid Medical in our High Deductible Health Plan, Dental and Vision benefits for employees and their families, 16 weeks of Paid Parental Leave, Employee Assistance Program, Life insurance, Short-Term Disability and Long-Term Disability
401(k): The company matches 100% of your contributions up to 6%
Optional Employee-Paid Benefits: Medical insurance in our PPO plan and a variety of other benefits such as Health Savings Accounts (with Company Contribution!), Flexible Spending Accounts, Supplemental Life Insurance, Wellhub and more.
Time Off: 25 days of Paid Time Off plus 12 company holidays

Company

NorthMark Strategies

NorthMark Strategies is a multi-strategy investment firm managing diverse portfolios and offering advisory services across sectors.

Funding

Current Stage: Growth Stage
Company data provided by Crunchbase