d-Matrix · 20 hours ago

Machine Learning Intern - Dynamic KV-Cache Modeling for Efficient LLM Inference

d-Matrix is focused on unleashing the potential of generative AI to transform technology. The company is seeking a motivated and innovative Machine Learning Intern to develop a dynamic Key-Value (KV) cache solution for Large Language Model (LLM) inference, aimed at improving memory utilization and execution efficiency on d-Matrix hardware.

AI Infrastructure · Artificial Intelligence (AI) · Cloud Infrastructure · Data Center · Semiconductor
H1B Sponsor Likely

Responsibilities

Research and analyze existing KV-Cache implementations used in LLM inference, particularly those built on lists of past key-value PyTorch tensors (a minimal sketch of this pattern follows this list)
Investigate “Paged Attention” mechanisms that leverage dedicated CUDA data structures to optimize memory for variable sequence lengths (see the block-table sketch below)
Design and implement a torch-native dynamic KV-Cache model that integrates seamlessly with PyTorch
Model KV-Cache behavior within the PyTorch compute graph to improve compatibility with torch.compile and facilitate compute-graph export (see the graph-friendly cache sketch below)
Conduct experiments to evaluate memory utilization and inference efficiency on D-Matrix hardware
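
The first bullet above references the common list-of-tensors cache. As a point of reference, here is a minimal sketch (illustrative, not d-Matrix code) of that legacy past_key_values pattern: one (key, value) tensor pair per layer, grown by concatenation at every decode step.

```python
import torch

def append_to_cache(past_key_values, new_keys, new_values):
    """Concatenate this step's K/V onto each layer's cached tensors.

    past_key_values: list of (k, v) pairs, each of shape [B, H, T, D]
    new_keys / new_values: per-layer tensors of shape [B, H, 1, D]
    """
    return [
        (torch.cat([k, nk], dim=2), torch.cat([v, nv], dim=2))
        for (k, v), nk, nv in zip(past_key_values, new_keys, new_values)
    ]

# Toy usage: a 2-layer cache holding 5 tokens, extended by one decode step.
B, H, T, D, L = 1, 4, 5, 8, 2
cache = [(torch.randn(B, H, T, D), torch.randn(B, H, T, D)) for _ in range(L)]
step_k = [torch.randn(B, H, 1, D) for _ in range(L)]
step_v = [torch.randn(B, H, 1, D) for _ in range(L)]
cache = append_to_cache(cache, step_k, step_v)
assert cache[0][0].shape[2] == T + 1
```

The repeated torch.cat reallocates and copies the cache at every step, which is exactly the memory-utilization cost the internship targets.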
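For the second bullet, the core idea behind Paged Attention is OS-style paging: each sequence's KV entries live in fixed-size physical blocks, so memory is claimed one block at a time rather than reserved up front for the maximum sequence length. A toy block table (hypothetical names, simplified from the vLLM-style design; the production data structures live in CUDA) might look like this:

```python
BLOCK_SIZE = 16  # tokens per physical KV block

class BlockTable:
    """Maps a sequence's logical token positions to physical cache blocks."""

    def __init__(self, num_physical_blocks=1024):
        self.free_blocks = list(range(num_physical_blocks))  # block-id pool
        self.blocks = []       # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # Grab a fresh physical block only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.blocks.append(self.free_blocks.pop())
        self.num_tokens += 1

    def physical_slot(self, pos):
        # Where does logical token `pos` live? (block id, offset within block)
        return self.blocks[pos // BLOCK_SIZE], pos % BLOCK_SIZE

table = BlockTable()
for _ in range(40):            # 40 tokens -> ceil(40 / 16) = 3 blocks
    table.append_token()
assert len(table.blocks) == 3
```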
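For the third and fourth bullets, a torch-native cache typically preallocates its storage as module buffers and updates them with in-graph ops such as index_copy_, so the cache update is visible to torch.compile and torch.export rather than happening as Python-side list surgery. A hedged sketch (names and shapes are illustrative assumptions, not d-Matrix's design):

```python
import torch
import torch.nn as nn

class GraphKVCache(nn.Module):
    """KV-Cache whose storage and updates live inside the compute graph."""

    def __init__(self, batch, heads, max_seq_len, head_dim):
        super().__init__()
        shape = (batch, heads, max_seq_len, head_dim)
        self.register_buffer("k_cache", torch.zeros(shape))
        self.register_buffer("v_cache", torch.zeros(shape))

    def update(self, positions, new_k, new_v):
        # index_copy_ writes this step's K/V at `positions` along the sequence
        # dim; as a regular tensor op it is traceable, unlike list appends.
        self.k_cache.index_copy_(2, positions, new_k)
        self.v_cache.index_copy_(2, positions, new_v)
        return self.k_cache, self.v_cache

cache = GraphKVCache(batch=1, heads=4, max_seq_len=128, head_dim=8)

@torch.compile  # the in-place buffer updates trace into the compiled graph
def decode_step(positions, k, v):
    return cache.update(positions, k, v)

k, v = torch.randn(1, 4, 1, 8), torch.randn(1, 4, 1, 8)
decode_step(torch.tensor([5]), k, v)  # write token 5's K/V in place
```

Making the cache dynamic on top of this static-buffer baseline, e.g. growing or paging the buffers without breaking traceability, is presumably where the project's research lies.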

Qualifications

PyTorch · CUDA programming · Python · Deep learning concepts · Model optimization · Analytical mindset · Memory management · Data structures · Hardware optimization · Creative problem solving

Required

Currently pursuing a degree in Computer Science, Electrical Engineering, Machine Learning, or a related field
Familiarity with PyTorch and deep learning concepts, particularly regarding model optimization and memory management
Understanding of hardware-accelerated computation; hands-on CUDA programming experience is a plus
Strong programming skills in Python, with experience in PyTorch
Analytical mindset with the ability to approach problems creatively

Preferred

Experience with deep learning model inference optimization
Knowledge of data structures used in machine learning for memory and compute efficiency
Experience with hardware-specific optimization, especially on custom accelerators such as d-Matrix hardware

Company

d-Matrix

d-Matrix builds a platform that enables data centers to handle large-scale generative AI inference with high throughput and low latency.

H1B Sponsorship

d-Matrix has a track record of offering H1B sponsorship. Note that this does not guarantee sponsorship for this specific role; the information below is provided for reference. (Data powered by the US Department of Labor)
[Chart] Distribution of job fields receiving sponsorship (the highlighted field is similar to this job)
Trends of total sponsorships: 2025 (20) · 2024 (15) · 2023 (8) · 2022 (7)

Funding

Current Stage: Growth Stage
Total Funding: $429M
Key Investors: Temasek Holdings, TSVC

2025-11-12 · Series C · $275M
2023-09-06 · Series B · $110M
2022-04-20 · Series A · $44M

Leadership Team

Peter Buckingham
Senior Vice President, Software Engineering
Company data provided by Crunchbase