
Contextual AI · 1 month ago

Member of Technical Staff (Research Engineer - LLM Systems & Performance)

Contextual AI is revolutionizing how AI agents work by focusing on the critical challenge of context. As a Member of Technical Staff (Research Engineer – LLM Systems & Performance), you will build and optimize LLM systems and collaborate with researchers to improve AI performance and infrastructure.

Artificial Intelligence (AI) · Foundational AI · Generative AI · Software
H1B Sponsor Likely

Responsibilities

Implement and improve components of our SFT and RL training pipelines (e.g., Verl, SkyRL), including data loading, training loops, logging, and evaluation
Contribute to LLM inference infrastructure (e.g., vLLM, SGLang), including batching, KV-cache management, scheduling, and serving optimizations
Profile and optimize end-to-end performance (throughput, latency, compute/memory/bandwidth), using tools like Nsight and profilers to identify and fix bottlenecks
Work with distributed training and inference setups using NCCL, NVLink, and data/tensor/pipeline/expert/context parallelism on multi-GPU clusters
Help experiment with and productionize quantization (e.g., INT8, FP8, FP4, mixed-precision) for both training and inference
Write and optimize GPU kernels using tools like CUDA or Triton, and leverage techniques such as FlashAttention and Tensor Cores where appropriate (see the sketch after this list)
Collaborate with researchers to take ideas from paper → prototype → scaled experiments → production
Write clean, well-tested, and well-documented code that can be shared across multiple teams (Research, Platform and Products)
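
As a rough illustration of the kernel work described above, the sketch below shows a minimal fused add + ReLU kernel in Triton; the operation, names, and block size are purely illustrative assumptions, not details of Contextual AI's stack.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # Fusing the add and the ReLU keeps the intermediate sum on-chip.
    tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    fused_add_relu_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out
```

Fusing the two operations avoids writing the intermediate sum back to HBM, which is the usual motivation for this style of kernel fusion in memory-bound workloads.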

Qualifications

Python · PyTorch · GPU computing · Distributed training · Performance engineering · Collaboration · Communication · Fast-paced environment

Required

Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field (or equivalent practical experience)
Strong programming skills in Python
Experience with at least one major ML framework: PyTorch or JAX
Solid understanding of GPU computing fundamentals (threads/warps/blocks, memory hierarchy, bandwidth vs compute, etc.)
Familiarity with distributed training or inference concepts (e.g., model parallelism, collective communication, disaggregated serving, KV caching)
Interest in performance engineering: profiling, kernel fusion, memory layout, and end-to-end system efficiency (a small profiling sketch follows this list)
Ability to work in a fast-paced environment, communicate clearly, and collaborate closely with other engineers and researchers
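
As a small illustration of the profiling work mentioned above, the sketch below uses torch.profiler on a toy workload (the model, sizes, and iteration count are hypothetical) to surface which GPU kernels dominate end-to-end time.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Hypothetical stand-in workload: a single half-precision linear layer on GPU.
model = torch.nn.Linear(4096, 4096).cuda().half()
x = torch.randn(8, 4096, device="cuda", dtype=torch.half)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        for _ in range(10):
            model(x)

# Sorting by total CUDA time surfaces the kernels that dominate latency,
# which is usually the starting point for deeper Nsight-level analysis.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```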

Benefits

Equity
Benefits

Company

Contextual AI

Contextual AI specializes in customizable generative AI applications with RAG 2.0 technology for the banking and media industries.

H1B Sponsorship

Contextual AI has a track record of offering H1B sponsorship. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor.)
Distribution of job fields receiving sponsorship (chart; highlights the field most similar to this role)
Total sponsorships by year: 2025: 16 · 2024: 5

Funding

Current Stage: Growth Stage
Total Funding: $100M
Key Investors: Greycroft, Bain Capital Ventures
2024-08-01 · Series A · $80M
2023-06-07 · Seed · $20M

Leadership Team

Douwe Kiela
CEO / Co-Founder
Company data provided by Crunchbase