HPC-Kubernetes Solutions Architect jobs in United States

INSPYR Solutions · 3 months ago

HPC-Kubernetes Solutions Architect

INSPYR Solutions is a national expert in delivering flexible technology and talent solutions. They are seeking an HPC Kubernetes Solutions Architect who will guide customers in designing and integrating GPU-accelerated Kubernetes platforms for high-performance computing, AI/ML training, and scientific workloads.

Information Technology · Professional Services · Staffing Agency
No H1B · U.S. Citizen Only

Responsibilities

Act as the primary architectural point of contact for customers adopting GPU-accelerated Kubernetes platforms for HPC and AI/ML workloads
Partner with customers to capture workload requirements, performance objectives, scaling needs, and integration constraints, translating them into reference architectures and actionable solution designs
Architect and operate Kubernetes clusters optimized for GPU workloads, leveraging NVIDIA
Integrate and tune Multi-Instance GPU (MIG), GPU sharing, and scheduler extensions (e.g., Volcano, Slurm integration, kube-scheduler plugins) to maximize efficiency in multi-tenant environments
Develop or extend custom Kubernetes operators and controllers in Go/Python to automate HPC infrastructure services
Design and recommend secure multi-tenant Kubernetes environments, implementing RBAC, OPA/Gatekeeper policies, namespace isolation, and workload quotas
Lead proof-of-concept and benchmarking engagements, using profiling tools, workload characterization, and telemetry to validate solution performance and scalability
Define and document integration strategies across compute, storage, networking, and orchestration layers, including CNI plugins (NVIDIA CNI, Multus, Cilium), storage systems (Lustre, GPFS, Ceph, VAST), and container runtimes (containerd, NVIDIA Container Toolkit)
Drive observability and monitoring solutions with Prometheus, Grafana, DCGM Exporter, and OpenTelemetry, ensuring visibility into GPU health, cluster utilization, and workload performance
Support GitOps-driven CI/CD pipelines for Kubernetes infrastructure using ArgoCD, FluxCD, Helm, and Kustomize
Collaborate with HPC, ML, and DevOps teams to validate performance and scalability in hybrid or on-premise environments
Provide architectural leadership during onboarding and deployment, ensuring successful integration of Kubernetes clusters with HPC schedulers and enterprise IT systems
Build and maintain strategic relationships with ecosystem vendors (e.g., NVIDIA, Cisco, storage partners), incorporating emerging technologies into customer environments
Share future insights with customers on GPU roadmaps, interconnect advancements (e.g., InfiniBand, RoCE, NVLink), and container orchestration trends
Represent the organization in customer design sessions, technical workshops, and industry conferences, positioning yourself as a thought leader in Kubernetes for HPC

Qualifications

Kubernetes architecture · NVIDIA GPU stack · Kubernetes internals · High-performance networking · Go · Python · Workload profiling · Customer engagement · Collaborative mindset

Required

Extensive experience in Kubernetes architecture and operations for HPC or GPU-intensive environments
Strong technical expertise in the NVIDIA GPU stack (GPU Operator, device plugins, MIG, NVML, DCGM)
Kubernetes internals (CRDs, RBAC, scheduler extensions, custom operators/controllers)
Distributed and parallel storage integration with Kubernetes for HPC workloads
High-performance networking (InfiniBand, RDMA, RoCE) in containerized environments
Proven ability to design scalable, secure, and resilient Kubernetes-based architectures for HPC and AI/ML use cases
Proficiency in Go or Python for Kubernetes operator or controller development
Experience with workload profiling, benchmarking, and performance tuning
Strong customer engagement skills, capable of translating requirements into actionable architectures and presenting solutions effectively
Collaborative mindset with experience working across engineering, product, and operations teams

Preferred

Demonstrated success in end-to-end customer solution delivery, from requirements discovery to deployment and adoption
Familiarity with containerized HPC environments (e.g., Singularity/Apptainer)
Exposure to automation and GitOps practices for Kubernetes platform management (e.g., ArgoCD, FluxCD)
Contributions to open-source projects in the Kubernetes or NVIDIA ecosystem
Experience advising on future adoption strategies, helping customers prepare for emerging GPU, interconnect, and orchestration technologies
Bachelor's or Master's degree in Computer Science, Engineering, Physics, or related technical field
Relevant Kubernetes and container certifications such as CKA, CKAD, or CKS, alongside cloud certifications like AWS Solutions Architect or Azure Solutions Architect Expert

Company

INSPYR Solutions

INSPYR Solutions is an information technology staffing service provider.

Funding

Current Stage
Late Stage

Leadership Team

Gregg Straus
Executive Vice President & Chief Financial Officer
Michelle Wren
Chief Operating Officer
Company data provided by Crunchbase