Software Engineering – Inference Engineer
Virtue AI is an early-stage startup focused on advanced AI security platforms. As an Inference Engineer, you will be responsible for optimizing and serving machine learning models in production, ensuring they are fast, stable, and cost-efficient.
Artificial Intelligence (AI) · Information Technology · Software
Responsibilities
Serve and optimize inference for LLMs, embedding models, and other ML models across multiple model families
Design and operate inference APIs with clear contracts, versioning, and backward compatibility
Build routing and load-balancing logic for inference traffic (a minimal routing sketch follows this list)
Package inference services into production-ready Docker images
Implement logging and metrics for inference systems
Analyze server uptime and failure modes
Design GPU and model placement strategies
Work closely with backend, platform (Cloud, DevOps), and ML teams to align inference behavior with product requirements
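As a rough illustration of the routing and load-balancing responsibility above, the sketch below routes each request to the replica with the fewest in-flight requests. This is a minimal sketch under assumed conditions: the backend URLs and the /v1/completions route are placeholders, not part of Virtue AI's actual stack.

```python
import asyncio
from collections import defaultdict

import httpx

# Hypothetical backend replicas serving an OpenAI-compatible completions route.
BACKENDS = [
    "http://inference-0:8000",
    "http://inference-1:8000",
]


class LeastOutstandingRouter:
    """Route each request to the replica with the fewest in-flight requests."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.in_flight = defaultdict(int)
        self.lock = asyncio.Lock()

    async def pick(self) -> str:
        async with self.lock:
            backend = min(self.backends, key=lambda b: self.in_flight[b])
            self.in_flight[backend] += 1
            return backend

    async def release(self, backend: str) -> None:
        async with self.lock:
            self.in_flight[backend] -= 1

    async def forward(self, client: httpx.AsyncClient, payload: dict) -> dict:
        backend = await self.pick()
        try:
            resp = await client.post(f"{backend}/v1/completions", json=payload, timeout=60.0)
            resp.raise_for_status()
            return resp.json()
        finally:
            await self.release(backend)


async def main() -> None:
    router = LeastOutstandingRouter(BACKENDS)
    async with httpx.AsyncClient() as client:
        out = await router.forward(client, {"model": "demo", "prompt": "Hello", "max_tokens": 8})
        print(out)


if __name__ == "__main__":
    asyncio.run(main())
```

Least-outstanding-requests is only one possible policy; production routers typically also weigh queue depth, KV-cache locality, and the GPU/model placement decisions mentioned above.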
Qualifications
Required
Bachelor's degree or higher in CS, CE, or related field
Strong experience serving LLMs and embedding models in production
Hands-on experience designing inference APIs, load balancing, and routing logic
Experience with SGLang, vLLM, TensorRT, or similar inference frameworks
Strong understanding of GPU behavior: memory limits, batching, fragmentation, and utilization
Experience with Docker, Prometheus metrics, and structured logging (a minimal instrumentation sketch follows this list)
Ability to debug and fix real inference failures in production
Experience with autoscaling inference services
Familiarity with Kubernetes GPU scheduling
Experience supporting production systems with real SLAs
Comfortable operating in a fast-paced startup environment with high ownership
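As a hedged illustration of the serving-framework and observability requirements above, the sketch below wraps a vLLM offline generation call with Prometheus counters and histograms plus a JSON-structured log line. The metric names, port, and placeholder model are assumptions chosen for the example, not a prescribed setup.

```python
import json
import logging
import time

from prometheus_client import Counter, Histogram, start_http_server
from vllm import LLM, SamplingParams  # assumes vLLM is installed and a GPU is available

# Prometheus metrics for the inference path (names are illustrative).
REQUESTS = Counter("inference_requests_total", "Total inference requests")
LATENCY = Histogram("inference_latency_seconds", "End-to-end batch inference latency")

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("inference")


def generate(llm: LLM, prompts: list[str]) -> list[str]:
    params = SamplingParams(temperature=0.0, max_tokens=64)
    start = time.perf_counter()
    outputs = llm.generate(prompts, params)
    elapsed = time.perf_counter() - start

    REQUESTS.inc(len(prompts))
    LATENCY.observe(elapsed)
    # Structured (JSON) log line per batch.
    logger.info(json.dumps({"event": "generate", "batch_size": len(prompts), "latency_s": round(elapsed, 3)}))
    return [o.outputs[0].text for o in outputs]


if __name__ == "__main__":
    start_http_server(9100)               # expose /metrics for Prometheus scraping
    llm = LLM(model="facebook/opt-125m")  # small placeholder model; real services load larger LLMs
    print(generate(llm, ["Inference engineers optimize"]))
```

In a real service the /metrics endpoint exposed by start_http_server would be scraped by Prometheus, and the JSON log lines would be shipped to a log aggregator for debugging production inference failures.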
Preferred
Experience with GPU-level optimization: memory planning and reuse, kernel launch efficiency, reducing fragmentation and allocator overhead
Experience with kernel- or runtime-level optimization: CUDA kernels, Triton kernels, or custom ops
Experience with model-level inference optimization: quantization (FP8 / INT8 / BF16), KV-cache optimization, speculative decoding or batching strategies (a sizing sketch follows this list)
Experience pushing inference efficiency boundaries (latency, throughput, or cost)
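For the quantization and KV-cache items above, a back-of-the-envelope sizing sketch helps make the trade-offs concrete. The 7B-class decoder shape below (32 layers, 32 KV heads, head dimension 128) is an assumption for illustration, not a specific production model.

```python
# Back-of-the-envelope sizing for weights and KV cache under different precisions.

BYTES_PER_DTYPE = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1, "int8": 1}


def weight_bytes(num_params: float, dtype: str) -> float:
    return num_params * BYTES_PER_DTYPE[dtype]


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int, dtype: str) -> float:
    # Factor of 2 accounts for storing both K and V per layer.
    return 2 * num_layers * num_kv_heads * head_dim * BYTES_PER_DTYPE[dtype]


if __name__ == "__main__":
    params = 7e9                               # hypothetical 7B-parameter model
    layers, kv_heads, head_dim = 32, 32, 128   # assumed Llama-2-7B-like shape

    for dtype in ("bf16", "fp8"):
        w_gb = weight_bytes(params, dtype) / 1e9
        kv_kb = kv_cache_bytes_per_token(layers, kv_heads, head_dim, dtype) / 1024
        # e.g. bf16: ~14 GB of weights and ~512 KiB of KV cache per token in flight.
        print(f"{dtype}: weights ~{w_gb:.1f} GB, KV cache ~{kv_kb:.0f} KiB per token")
```

Under these assumptions, moving both weights and KV cache from BF16 to FP8 roughly halves each footprint, which is what frees headroom for larger batches or longer contexts on the same GPU.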
Benefits
Competitive base salary + equity, commensurate with skills and experience.
Impact at scale – Help define the category of AI security and partner with Fortune 500 enterprises on their most strategic AI initiatives.
Work on the frontier – Engage with bleeding-edge AI/ML and deploy AI security solutions for use cases that don't exist anywhere else yet.
Collaborative culture – Join a team of builders, problem-solvers, and innovators who are mission-driven and collaborative.
Opportunity for growth – Shape not only our customer engagements, but also the processes and culture of an early lean team with plans for scale.
Company
Virtue AI
Virtue AI provides an AI security and compliance platform for applications.
H1B Sponsorship
Virtue AI has a track record of offering H1B sponsorship. Please note that this does not guarantee sponsorship for this specific role. (Data powered by the US Department of Labor.)
Funding
Current Stage: Early Stage
Total Funding: $30M
2025-04-15 · Series A · $30M
2025-04-15 · Seed
Company data provided by crunchbase