Software Engineering – Inference Engineer jobs in United States

Virtue AI · 3 weeks ago

Software Engineering – Inference Engineer

Virtue AI is an early-stage startup focused on advanced AI security platforms. The Inference Engineer will be responsible for optimizing and serving models in production, ensuring they are fast, stable, and cost-efficient.

Artificial Intelligence (AI) · Information Technology · Software
H1B Sponsor Likely

Responsibilities

Serve and optimize inference for LLMs, embedding models, and other ML models across multiple model families
Design and operate inference APIs with clear contracts, versioning, and backward compatibility
Build routing and load-balancing logic for inference traffic
Package inference services into production-ready Docker images
Implement logging and metrics for inference systems
Analyze server uptime and failure modes
Design GPU and model placement strategies
Work closely with backend, platform (Cloud, DevOps), and ML teams to align inference behavior with product requirements

Qualifications

Serving LLMs · Inference API design · Load balancing · Docker · Prometheus metrics · GPU behavior understanding · Autoscaling services · Kubernetes GPU scheduling · Structured logging · Debugging inference failures · Fast-paced environment

Required

Bachelor's degree or higher in CS, CE, or related field
Strong experience serving LLMs and embedding models in production
Hands-on experience designing inference APIs and load-balancing and routing logic
Experience with SGLang, vLLM, TensorRT, or similar inference frameworks
Strong understanding of GPU behavior: memory limits, batching, fragmentation, and utilization
Experience with Docker, Prometheus metrics, and structured logging
Ability to debug and fix real inference failures in production
Experience with autoscaling inference services
Familiarity with Kubernetes GPU scheduling
Experience supporting production systems with real SLAs
Comfortable operating in a fast-paced startup environment with high ownership

Preferred

Experience with GPU-level optimization: memory planning and reuse, kernel launch efficiency, and reducing fragmentation and allocator overhead
Experience with kernel- or runtime-level optimization: CUDA kernels, Triton kernels, or custom ops
Experience with model-level inference optimization: quantization (FP8 / INT8 / BF16), KV-cache optimization, and speculative decoding or batching strategies
Experience pushing inference efficiency boundaries (latency, throughput, or cost)

Benefits

Competitive salary + equity

Company

Virtue AI

Virtue AI operates as an AI security and compliance platform for applications.

H1B Sponsorship

Virtue AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data Powered by US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships: 2025 (1)

Funding

Current Stage: Early Stage
Total Funding: $30M
2025-04-15 · Series A · $30M
2025-04-15 · Seed

Leadership Team

Bo Li, CEO
Carlos Guestrin, Co-Founder & Chief Scientist
Company data provided by Crunchbase