Virtue AI · 3 weeks ago
Software Engineering – Inference Engineer
Virtue AI is an early-stage startup focused on advanced AI security platforms, seeking passionate builders to join its core team. As an Inference Engineer, you will be responsible for optimizing and serving machine learning models in production, ensuring efficiency and stability under a variety of workloads.
Artificial Intelligence (AI) · Information Technology · Software
Responsibilities
Serve and optimize inference for LLMs, embedding models, and other ML models across multiple model families
Design and operate inference APIs with clear contracts, versioning, and backward compatibility
Build routing and load-balancing logic for inference traffic: multi-model routing, fallback and degradation strategies (vLLM or SGLang)
Package inference services into production-ready Docker images
Implement logging and Prometheus-based metrics for inference systems: latency, throughput, token counts, GPU utilization
Analyze server uptime and failure modes (GPU OOMs, hangs, slowdowns, fragmentation) and define recovery and restart strategies
Design GPU and model placement strategies: model sharding, replication, and batching; tradeoffs between latency, cost, and availability
Work closely with backend, platform (Cloud, DevOps), and ML teams to align inference behavior with product requirements
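The routing and fallback responsibilities above can be sketched roughly as follows. This is a minimal illustration only; the backend names, the health flag, and the least-loaded-first policy are assumptions for the sketch, not details from this posting:

```python
from dataclasses import dataclass

@dataclass
class Backend:
    """One inference replica (e.g. a vLLM or SGLang server)."""
    name: str
    healthy: bool = True
    inflight: int = 0  # requests currently being served

class Router:
    """Least-loaded routing with graceful degradation across replicas."""

    def __init__(self, backends):
        self.backends = backends

    def pick(self):
        # Prefer the healthy backend with the fewest in-flight requests;
        # if nothing is healthy, degrade to any backend rather than fail.
        healthy = [b for b in self.backends if b.healthy]
        pool = healthy or self.backends
        return min(pool, key=lambda b: b.inflight)

backends = [
    Backend("vllm-a", inflight=3),
    Backend("vllm-b", inflight=1),
    Backend("vllm-c", healthy=False, inflight=0),
]
router = Router(backends)
print(router.pick().name)  # prints "vllm-b": least-loaded healthy replica
```

A production router would additionally weigh model compatibility, queue depth reported by the serving engine, and token-level cost, but the health-then-load ordering above is the core of most fallback schemes.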
Qualifications
Required
Bachelor's degree or higher in CS, CE, or related field
Strong experience serving LLMs and embedding models in production
Hands-on experience designing inference APIs and load-balancing/routing logic
Experience with SGLang, vLLM, TensorRT, or similar inference frameworks
Strong understanding of GPU behavior: memory limits, batching, fragmentation, utilization
Experience with Docker, Prometheus metrics, and structured logging
Ability to debug and fix real inference failures in production
Experience with autoscaling inference services
Familiarity with Kubernetes GPU scheduling
Experience supporting production systems with real SLAs
Comfortable operating in a fast-paced startup environment with high ownership
Preferred
Experience with GPU-level optimization: memory planning and reuse, kernel launch efficiency, reducing fragmentation and allocator overhead
Experience with kernel- or runtime-level optimization: CUDA kernels, Triton kernels, or custom ops
Experience with model-level inference optimization: quantization (FP8 / INT8 / BF16), KV-cache optimization, speculative decoding or batching strategies
Experience pushing inference efficiency boundaries (latency, throughput, or cost)
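As a worked illustration of the KV-cache sizing arithmetic behind the optimization topics above: each generated token caches one key and one value vector per layer, so cache size per token is 2 × layers × KV heads × head dim × bytes per element. The 7B-class model shape below is an assumption for the example, not taken from this posting:

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, dtype_bytes: int) -> int:
    # Factor of 2: both K and V are cached for every layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

# Assumed 7B-class config: 32 layers, 32 KV heads, head_dim 128, FP16 (2 bytes)
per_token = kv_cache_bytes_per_token(32, 32, 128, 2)
print(per_token // 1024)  # prints 512 (KiB per token of context)
```

At 512 KiB per token, a single 4K-token sequence holds roughly 2 GiB of KV cache, which is why techniques like grouped-query attention, FP8 KV caches, and paged allocation matter for serving cost.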
Benefits
Competitive base salary + equity, commensurate with skills and experience.
Impact at scale – Help define the category of AI security and partner with Fortune 500 enterprises on their most strategic AI initiatives.
Work on the frontier – Engage with bleeding-edge AI/ML and deploy AI security solutions for use cases that don't yet exist anywhere else.
Collaborative culture – Join a team of builders, problem-solvers, and innovators who are mission-driven and collaborative.
Opportunity for growth – Shape not only our customer engagements, but also the processes and culture of an early lean team with plans for scale.
Company
Virtue AI
Virtue AI operates as an AI security and compliance platform for applications.
H1B Sponsorship
Virtue AI has a track record of offering H1B sponsorship. Please note that this does not guarantee sponsorship for this specific role. Per US Department of Labor data, one sponsorship in a job field similar to this one was recorded in 2025.
Funding
Current Stage: Early Stage
Total Funding: $30M
2025-04-15 · Series A · $30M
2025-04-15 · Seed
Company data provided by Crunchbase.