Contextual AI · 1 month ago

Member of Technical Staff (Research Engineer - LLM Systems & Performance)

Mountain View, CA

Full-time

Onsite

Mid Level

$170K/yr - $200K/yr

Contextual AI is revolutionizing how AI agents work by focusing on the critical challenge of context. As a Member of Technical Staff (Research Engineer, LLM Systems & Performance), you will build and optimize LLM systems and collaborate with researchers to improve AI performance and infrastructure.

Artificial Intelligence (AI) · Foundational AI · Generative AI · Software

H1B Sponsor Likely

Responsibilities

Implement and improve components of our SFT and RL training pipelines (e.g., Verl, SkyRL), including data loading, training loops, logging, and evaluation

Contribute to LLM inference infrastructure (e.g., vLLM, SGLang), including batching, KV-cache management, scheduling, and serving optimizations

Profile and optimize end-to-end performance (throughput, latency, compute/memory/bandwidth), using tools like Nsight and profilers to identify and fix bottlenecks

Work with distributed training and inference setups using NCCL, NVLink, and data/tensor/pipeline/expert/context parallelism on multi-GPU clusters

Help experiment with and productionize quantization (e.g., INT8, FP8, FP4, mixed-precision) for both training and inference

Write and optimize GPU kernels using tools like CUDA or Triton, and leverage techniques such as FlashAttention and Tensor Cores where appropriate

Collaborate with researchers to take ideas from paper → prototype → scaled experiments → production

Write clean, well-tested, and well-documented code that can be shared across multiple teams (Research, Platform and Products)

Qualifications

Python · PyTorch · GPU computing · Distributed training · Performance engineering · Collaboration · Communication · Fast-paced environment

Required

Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field (or equivalent practical experience)

Strong programming skills in Python

Experience with at least one major ML framework: PyTorch or JAX

Solid understanding of GPU computing fundamentals (threads/warps/blocks, memory hierarchy, bandwidth vs compute, etc.)

Familiarity with distributed training or inference concepts (e.g., model parallelism, collective communication, disaggregated serving, KV caching)

Interest in performance engineering: profiling, kernel fusion, memory layout, and end-to-end system efficiency

Ability to work in a fast-paced environment, communicate clearly, and collaborate closely with other engineers and researchers

Benefits

Equity

Benefits

Company

Contextual AI

Contextual AI specializes in customizable generative AI applications with RAG 2.0 technology for the banking and media industries.

Founded in 2023

Mountain View, California, USA

51-200 employees

https://contextual.ai

H1B Sponsorship

Contextual AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data from the US Department of Labor)

[Chart: Distribution of Different Job Fields Receiving Sponsorship, with job fields similar to this job highlighted]

Trends of Total Sponsorships

2025 (16)

2024 (5)

Funding

Current Stage

Growth Stage

Total Funding

$100M

Key Investors

Greycroft · Bain Capital Ventures

2024-08-01 · Series A · $80M

2023-06-07 · Seed · $20M

Leadership Team

Douwe Kiela

CEO / Co-Founder

Recent News

PR Newswire

Contextual AI's State-of-the-Art Reranker Coming to Snowflake Cortex AI

2025-06-03

The New Stack

No, MCP Hasn’t Killed RAG — in Fact, They’re Complementary

2025-05-29

Silicon Canals

These are the richest young self-made Dutch tech millionaires in 2025, according to Quote

2025-05-29

Company data provided by Crunchbase