Backend Engineer – Inference Optimization jobs in United States

Vercept · 4 months ago

Backend Engineer – Inference Optimization

Vercept is a high-energy, impact-driven team known for its academic excellence and transformative research in AI. They are seeking a Backend Engineer – Inference Optimization to design and optimize inference pipelines for large-scale models, collaborating with researchers and infrastructure engineers to enhance AI performance.
Artificial Intelligence (AI) · Computer

Responsibilities

Own the design and optimization of inference pipelines for large-scale models
Work closely with researchers and infrastructure engineers to identify bottlenecks
Implement advanced techniques like quantization and KV caching
Deploy high-performance serving systems in production

Qualifications

Model inference optimization, Backend systems programming, Quantization techniques, KV caching, Distributed serving, GPU acceleration, Large-scale systems, Debugging performance issues, Fast-moving environments, vLLM, Similar frameworks, GPU kernel optimization, Scaling inference, Model compilation

Required

Deep experience in optimizing model inference pipelines, model quantization and KV caching
Proficiency in backend systems and high-performance programming (Python, C++, or Rust)
Familiarity with distributed serving, GPU acceleration, and large-scale systems
Ability to debug complex performance issues across model, runtime, and hardware layers
Comfort working in fast-moving environments with ambitious technical goals

Preferred

Hands-on experience with vLLM or similar inference frameworks
Background in GPU kernel optimization (CUDA, Triton, ROCm)
Experience scaling inference across multi-node or heterogeneous clusters
Prior work in model compilation (e.g., TensorRT, TVM, ONNX Runtime)
Hands-on experience with model quantization

Benefits

Health benefits
A 401(k) plan
Meaningful equity

Company

Vercept

Vercept is a software development company.

Funding

Current Stage: Early Stage
Total Funding: $16M
2025-06-04 · Seed · $16M
Company data provided by Crunchbase