NVIDIA · 5 days ago
Senior On-Device Model Inference Optimization Engineer
Artificial Intelligence (AI) · GPU
Growth Opportunities · H1B Sponsor Likely
Responsibilities
Develop and implement strategies to optimize AI model inference for on-device deployment.
Employ techniques like pruning, quantization, and knowledge distillation to minimize model size and computational demands.
Optimize performance-critical components using CUDA and C++.
Collaborate with cross-functional teams to align optimization efforts with hardware capabilities and deployment needs.
Benchmark inference performance, identify bottlenecks, and implement solutions.
Research and apply innovative methods for inference optimization.
Adapt models for diverse hardware platforms and operating systems with varying capabilities.
Create tools to validate the accuracy and latency of deployed models at scale with minimal friction.
Recommend and implement model architecture changes to improve the accuracy-latency balance.
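The responsibilities above center on techniques like quantization. As a minimal, framework-free sketch (not NVIDIA's toolchain), the core arithmetic behind post-training affine int8 quantization looks like this; production stacks such as PyTorch or TensorRT add calibration, per-channel scales, and fused kernels on top:

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned ints with an affine scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against all-equal inputs
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale + zero_point))) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from quantized values."""
    return [(qi - zero_point) * scale for qi in q]

# Hypothetical weight values for illustration.
weights = [-1.5, -0.2, 0.0, 0.7, 2.1]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
# Round-trip error per element is bounded by scale / 2.
```

Storing 8-bit integers plus one scale and zero point per tensor cuts memory 4x versus float32, which is the size/latency trade-off this role would tune.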
Qualifications
Required
MSc or PhD in Computer Science, Engineering, or a related field, or equivalent professional experience.
5+ years of proven experience specializing in model inference and optimization.
10+ years of work experience in a relevant area.
Expertise in modern machine learning frameworks, particularly PyTorch, ONNX, and TensorRT.
Proven experience in optimizing inference for transformer and convolutional architectures.
Strong programming proficiency in CUDA, Python, and C++.
In-depth knowledge of optimization techniques, including quantization, pruning, distillation, and hardware-aware neural architecture search.
Skilled in building and deploying scalable, cloud-based inference systems.
Passionate about developing efficient, production-ready solutions with a strong focus on code quality and performance.
Meticulous attention to detail, ensuring precision and reliability in safety-critical systems.
Strong collaboration and communication skills for working effectively across multidisciplinary teams.
A proactive, diligent mindset with a drive to tackle complex optimization challenges.
Preferred
Publications or industry experience in optimizing and deploying model inference at scale.
Hands-on expertise in hardware-aware optimizations and accelerators such as GPUs, TPUs, or custom ASICs.
Active contributions to open-source projects focused on inference optimization or machine learning frameworks.
Experience in designing and deploying inference pipelines for real-time or autonomous systems.
Benefits
Equity
Company
NVIDIA
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI.
H1B Sponsorship
NVIDIA has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role; the figures below are provided for reference. (Data powered by US Department of Labor)
Total Sponsorships by Year
2023 (735)
2022 (892)
2021 (696)
2020 (534)
Funding
Current Stage: Public Company
Total Funding: $4.09B
Key Investors: ARPA-E, ARK Investment Management, SoftBank Vision Fund
2023-05-09: Grant · $5M
2022-08-09: Post-IPO Equity · $65M
2021-02-18: Post-IPO Equity
Recent News
Morningstar, Inc. · 2025-01-01
The Tech Portal · 2024-12-31
StreetInsider.com · 2024-12-31
Company data provided by crunchbase