d-Matrix · 20 hours ago

Machine Learning Intern - Dynamic KV-Cache Modeling for Efficient LLM Inference

d-Matrix is focused on unleashing the potential of generative AI to transform technology. The company is seeking a motivated and innovative Machine Learning Intern to develop a dynamic Key-Value (KV) cache solution for Large Language Model (LLM) inference, aimed at improving memory utilization and execution efficiency on d-Matrix hardware.

AI Infrastructure · Artificial Intelligence (AI) · Cloud Infrastructure · Data Center · Semiconductor
H1B Sponsor Likely

Responsibilities

Research and analyze existing KV-Cache implementations used in LLM inference, particularly those built on lists of past key-value PyTorch tensors (a minimal sketch of this pattern follows this list)
Investigate “Paged Attention” mechanisms that leverage dedicated CUDA data structures to optimize memory for variable sequence lengths (see the block-table sketch below)
Design and implement a torch-native dynamic KV-Cache model that integrates seamlessly with PyTorch
Model KV-Cache behavior within the PyTorch compute graph to improve compatibility with torch.compile and facilitate compute-graph export (see the graph-friendly cache sketch below)
Conduct experiments to evaluate memory utilization and inference efficiency on D-Matrix hardware
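
The first bullet above references the common list-of-tensors cache. As a point of reference, here is a minimal sketch (illustrative, not d-Matrix code) of that legacy past_key_values pattern: one (key, value) tensor pair per layer, grown by concatenation at every decode step.

```python
import torch

def append_to_cache(past_key_values, new_keys, new_values):
    """Concatenate this step's K/V onto each layer's cached tensors.

    past_key_values: list of (k, v) pairs, each of shape [B, H, T, D]
    new_keys / new_values: per-layer tensors of shape [B, H, 1, D]
    """
    return [
        (torch.cat([k, nk], dim=2), torch.cat([v, nv], dim=2))
        for (k, v), nk, nv in zip(past_key_values, new_keys, new_values)
    ]

# Toy usage: a 2-layer cache holding 5 tokens, extended by one decode step.
B, H, T, D, L = 1, 4, 5, 8, 2
cache = [(torch.randn(B, H, T, D), torch.randn(B, H, T, D)) for _ in range(L)]
step_k = [torch.randn(B, H, 1, D) for _ in range(L)]
step_v = [torch.randn(B, H, 1, D) for _ in range(L)]
cache = append_to_cache(cache, step_k, step_v)
assert cache[0][0].shape[2] == T + 1
```

The repeated torch.cat reallocates and copies the cache at every step, which is exactly the memory-utilization cost the internship targets.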
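For the second bullet, the core idea behind Paged Attention is OS-style paging: each sequence's KV entries live in fixed-size physical blocks, so memory is claimed one block at a time rather than reserved up front for the maximum sequence length. A toy block table (hypothetical names, simplified from the vLLM-style design; the production data structures live in CUDA) might look like this:

```python
BLOCK_SIZE = 16  # tokens per physical KV block

class BlockTable:
    """Maps a sequence's logical token positions to physical cache blocks."""

    def __init__(self, num_physical_blocks=1024):
        self.free_blocks = list(range(num_physical_blocks))  # block-id pool
        self.blocks = []       # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # Grab a fresh physical block only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.blocks.append(self.free_blocks.pop())
        self.num_tokens += 1

    def physical_slot(self, pos):
        # Where does logical token `pos` live? (block id, offset within block)
        return self.blocks[pos // BLOCK_SIZE], pos % BLOCK_SIZE

table = BlockTable()
for _ in range(40):            # 40 tokens -> ceil(40 / 16) = 3 blocks
    table.append_token()
assert len(table.blocks) == 3
```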
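For the third and fourth bullets, a torch-native cache typically preallocates its storage as module buffers and updates them with in-graph ops such as index_copy_, so the cache update is visible to torch.compile and torch.export rather than happening as Python-side list surgery. A hedged sketch (names and shapes are illustrative assumptions, not d-Matrix's design):

```python
import torch
import torch.nn as nn

class GraphKVCache(nn.Module):
    """KV-Cache whose storage and updates live inside the compute graph."""

    def __init__(self, batch, heads, max_seq_len, head_dim):
        super().__init__()
        shape = (batch, heads, max_seq_len, head_dim)
        self.register_buffer("k_cache", torch.zeros(shape))
        self.register_buffer("v_cache", torch.zeros(shape))

    def update(self, positions, new_k, new_v):
        # index_copy_ writes this step's K/V at `positions` along the sequence
        # dim; as a regular tensor op it is traceable, unlike list appends.
        self.k_cache.index_copy_(2, positions, new_k)
        self.v_cache.index_copy_(2, positions, new_v)
        return self.k_cache, self.v_cache

cache = GraphKVCache(batch=1, heads=4, max_seq_len=128, head_dim=8)

@torch.compile  # the in-place buffer updates trace into the compiled graph
def decode_step(positions, k, v):
    return cache.update(positions, k, v)

k, v = torch.randn(1, 4, 1, 8), torch.randn(1, 4, 1, 8)
decode_step(torch.tensor([5]), k, v)  # write token 5's K/V in place
```

Making the cache dynamic on top of this static-buffer baseline, e.g. growing or paging the buffers without breaking traceability, is presumably where the project's research lies.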

Qualifications

PyTorch · CUDA programming · Python · Deep learning concepts · Model optimization · Analytical mindset · Memory management · Data structures · Hardware optimization · Creative problem solving

Required

Currently pursuing a degree in Computer Science, Electrical Engineering, Machine Learning, or a related field
Familiarity with PyTorch and deep learning concepts, particularly regarding model optimization and memory management
Understanding of hardware-accelerated computation; hands-on CUDA programming experience is a plus
Strong programming skills in Python, with experience in PyTorch
Analytical mindset with the ability to approach problems creatively

Preferred

Experience with deep learning model inference optimization
Knowledge of data structures used in machine learning for memory and compute efficiency
Experience with hardware-specific optimization, especially on custom accelerators such as d-Matrix hardware

Company

d-Matrix

d-Matrix builds a platform that enables data centers to handle large-scale generative AI inference with high throughput and low latency.

H1B Sponsorship

d-Matrix has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role; the information below is provided for reference. (Data powered by the US Department of Labor)
[Chart] Distribution of job fields receiving sponsorship (the highlighted field is similar to this job)
Trends of total sponsorships: 2025 (20) · 2024 (15) · 2023 (8) · 2022 (7)

Funding

Current Stage: Growth Stage
Total Funding: $429M
Key Investors: Temasek Holdings, TSVC

2025-11-12 · Series C · $275M
2023-09-06 · Series B · $110M
2022-04-20 · Series A · $44M

Leadership Team

Peter Buckingham
Senior Vice President, Software Engineering
Company data provided by Crunchbase