Vercept · 4 months ago
Backend Engineer – Inference Optimization
Vercept is a high-energy, impact-driven team known for its academic excellence and transformative AI research. The company is seeking a Backend Engineer – Inference Optimization to design and optimize inference pipelines for large-scale models, working closely with researchers and infrastructure engineers to improve AI performance.
Responsibilities
Own the design and optimization of inference pipelines for large-scale models
Work closely with researchers and infrastructure engineers to identify bottlenecks
Implement advanced techniques like quantization and KV caching
Deploy high-performance serving systems in production
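The responsibilities above mention KV caching. As context, here is a minimal sketch of the idea (illustrative only; the class and function names are assumptions, not Vercept's implementation): during autoregressive decoding, each new token's attention keys and values are appended to a cache so the prefix is never recomputed.

```python
import numpy as np

class KVCache:
    """Grows by one (key, value) row per decoded token."""

    def __init__(self, head_dim):
        self.keys = np.empty((0, head_dim))    # (seq_len, head_dim)
        self.values = np.empty((0, head_dim))

    def append(self, k, v):
        # k, v: (1, head_dim) for the newly decoded token only
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

def attend(query, cache):
    # query: (1, head_dim); attends over every cached position
    scores = query @ cache.keys.T / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ cache.values  # (1, head_dim)

rng = np.random.default_rng(0)
cache = KVCache(head_dim=4)
for _ in range(3):
    cache.append(rng.standard_normal((1, 4)), rng.standard_normal((1, 4)))
    out = attend(rng.standard_normal((1, 4)), cache)
```

The cost per decoding step stays proportional to the current sequence length rather than quadratic in it, which is why KV caching is a baseline optimization in serving systems.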
Qualifications
Required
Deep experience optimizing model inference pipelines, including model quantization and KV caching
Proficiency in backend systems and high-performance programming (Python, C++, or Rust)
Familiarity with distributed serving, GPU acceleration, and large-scale systems
Ability to debug complex performance issues across model, runtime, and hardware layers
Comfort working in fast-moving environments with ambitious technical goals
Preferred
Hands-on experience with vLLM or similar inference frameworks
Background in GPU kernel optimization (CUDA, Triton, ROCm)
Experience scaling inference across multi-node or heterogeneous clusters
Prior work in model compilation (e.g., TensorRT, TVM, ONNX Runtime)
Hands-on experience with model quantization
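Model quantization appears in both the required and preferred qualifications. A minimal sketch of one common variant, symmetric per-tensor int8 weight quantization (a generic technique, not any specific framework's API; function names here are hypothetical):

```python
import numpy as np

def quantize_int8(w):
    """Map float weights onto int8 with a single symmetric scale."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original weights
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Round-trip error is bounded by half the quantization step (scale / 2)
max_err = np.abs(w - w_hat).max()
```

Storing weights as int8 cuts memory bandwidth roughly 4x versus fp32, which is typically the point of weight-only quantization in inference serving.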
Benefits
Health benefits
A 401(k) plan
Meaningful equity
Company
Vercept
Vercept is a software development company.
Funding
Current Stage
Early Stage
Total Funding
$16M
2025-06-04 · Seed · $16M
Company data provided by Crunchbase