Technical Product Owner – GPU/TPU | AI/ML Infrastructure | NVIDIA A100/H100 jobs in United States

TestingXperts · 17 hours ago

Technical Product Owner – GPU/TPU | AI/ML Infrastructure | NVIDIA A100/H100

TestingXperts is seeking a Technical Product Owner to lead GPU/TPU monitoring and optimization initiatives for a large global bank's core technology unit. The role involves driving performance improvements, cost reduction, and resource efficiency across AI/ML infrastructure through technical expertise and product ownership.

DevOps, Information Technology, Penetration Testing, Quality Assurance, Software, Usability Testing
H1B Sponsor Likely
Hiring Manager: Mahesh Kumar

Responsibilities

Own product roadmap for GPU/TPU monitoring solutions aligned with AI infrastructure strategy
Define monitoring strategies covering utilization, performance, power, memory, and thermal metrics
Analyze accelerator usage patterns and identify optimization opportunities to improve efficiency
Collaborate with ML engineers to tune workloads and reduce training/inference latency
Implement cost optimization strategies targeting 20-30% reduction through better resource allocation
Manage product backlog, write user stories, and lead agile development cycles
Partner with stakeholders across data science, infrastructure, and business units
Ensure solutions meet banking security, compliance, and governance requirements
Report on KPIs: utilization rates, cost savings, performance improvements, and SLA adherence

Qualifications

GPU/TPU infrastructure, NVIDIA GPUs, Monitoring tools, Product Owner experience, Kubernetes, Cloud platforms, CUDA, PyTorch, TensorFlow, MLOps platform experience, FinOps expertise, LLM optimization knowledge, Communication skills

Required

5+ years of experience with GPU/TPU infrastructure for AI/ML workloads
Deep knowledge of NVIDIA GPUs (A100, H100) and Google TPUs
Proficiency with monitoring tools: DCGM, nvidia-smi, Prometheus, Grafana
Strong understanding of CUDA, PyTorch, TensorFlow, and distributed training
3+ years as Product Owner/Manager, preferably in infrastructure/platform products
Experience with Kubernetes, containerization, and cloud platforms (GCP)
Background in regulated industries (banking/financial services preferred)
Excellent communication skills, with the ability to bridge technical and business stakeholders

Preferred

Advanced degree in Computer Science or related field
Cloud/ML certifications (GCP ML Engineer, NVIDIA DLI)
MLOps platform experience (MLflow, Kubeflow)
FinOps and cost management expertise
LLM training/inference optimization knowledge

Company

TestingXperts

Next Gen QA & Software Testing Company

H1B Sponsorship

TestingXperts has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (10)
2024 (1)
2020 (1)

Funding

Current Stage
Late Stage

Leadership Team

Manish Gupta, Founder & CEO
Archana Gupta, CFO
Company data provided by Crunchbase