Contextual AI · 1 month ago
Member of Technical Staff (Research Engineer - LLM Systems & Performance)
Contextual AI is revolutionizing how AI agents work by focusing on the critical challenge of context. As a Member of Technical Staff (Research Engineer, LLM Systems & Performance), you will build and optimize LLM systems and collaborate with researchers to improve AI performance and infrastructure.
Artificial Intelligence (AI) · Foundational AI · Generative AI · Software
Responsibilities
Implement and improve components of our SFT and RL training pipelines (e.g., Verl, SkyRL), including data loading, training loops, logging, and evaluation
Contribute to LLM inference infrastructure (e.g., vLLM, SGLang), including batching, KV-cache management, scheduling, and serving optimizations
Profile and optimize end-to-end performance (throughput, latency, compute/memory/bandwidth), using tools like Nsight and profilers to identify and fix bottlenecks
Work with distributed training and inference setups using NCCL, NVLink, and data/tensor/pipeline/expert/context parallelism on multi-GPU clusters
Help experiment with and productionize quantization (e.g., INT8, FP8, FP4, mixed-precision) for both training and inference
Write and optimize GPU kernels using tools like CUDA or Triton, and leverage techniques such as FlashAttention and Tensor Cores where appropriate
Collaborate with researchers to take ideas from paper → prototype → scaled experiments → production
Write clean, well-tested, and well-documented code that can be shared across multiple teams (Research, Platform and Products)
Qualifications
Required
Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field (or equivalent practical experience)
Strong programming skills in Python
Experience with at least one major ML framework: PyTorch or JAX
Solid understanding of GPU computing fundamentals (threads/warps/blocks, memory hierarchy, bandwidth vs compute, etc.)
Familiarity with distributed training or inference concepts (e.g., model parallelism, collective communication, disaggregated serving, KV caching)
Interest in performance engineering: profiling, kernel fusion, memory layout, and end-to-end system efficiency
Ability to work in a fast-paced environment, communicate clearly, and collaborate closely with other engineers and researchers
Benefits
Equity
Company
Contextual AI
Contextual AI specializes in customizable generative AI applications with RAG 2.0 technology for the banking and media industries.
H1B Sponsorship
Contextual AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown)
Trends of Total Sponsorships
2025 (16)
2024 (5)
Funding
Current Stage: Growth Stage
Total Funding: $100M
Key Investors: Greycroft, Bain Capital Ventures
2024-08-01 · Series A · $80M
2023-06-07 · Seed · $20M
Company data provided by Crunchbase