Gray Swan AI · 4 months ago

Machine Learning Infrastructure Engineer

Gray Swan AI is focused on protecting organizations from emerging AI security threats by building security models and tools for safe AI deployment. The Machine Learning Infrastructure Engineer will build and scale infrastructure for distributed inference and training, turning specialized language models into reliable services ready for enterprise deployment.

Artificial Intelligence (AI) · Cyber Security · Developer Tools
H1B Sponsored

Responsibilities

Build and scale GPU inference with vLLM (and similar frameworks) for high-throughput, low-latency LLM serving (see the sketch after this list)
Optimize for performance and cost, implementing batching and caching strategies, quantization, and hardware-specific optimizations to maximize tokens per dollar
Create robust deployment pipelines with automated testing, progressive rollouts, and instant rollbacks
Establish observability with comprehensive metrics, distributed tracing, and intelligent alerting that catches issues before customers notice
Design for multi-environment deployment supporting both our cloud platform and secure on-premises installations with reproducible, hardened builds
Drive operational excellence through clear SLOs, thorough runbooks, and a culture of continuous improvement
Shape our ML infrastructure vision as we scale, mentoring teammates and establishing patterns that will serve us for years
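For a concrete flavor of the serving work above, here is a minimal sketch using vLLM's offline Python API; the model name, sampling settings, and memory fraction are illustrative placeholders rather than details from the role, and a CUDA-capable GPU with the vllm package installed is assumed.

    # Minimal vLLM serving sketch (illustrative; model and settings are placeholders).
    from vllm import LLM, SamplingParams

    # vLLM provides continuous batching and paged KV-cache management out of the box,
    # which are the main throughput-per-dollar levers the responsibilities mention.
    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical model choice
        gpu_memory_utilization=0.90,               # leave headroom for other processes
    )

    params = SamplingParams(temperature=0.0, max_tokens=128)
    prompts = [
        "Summarize the risks of prompt injection in one sentence.",
        "List three signs of a jailbreak attempt.",
    ]

    # generate() batches the prompts internally and returns one RequestOutput per prompt.
    for request_output in llm.generate(prompts, params):
        print(request_output.outputs[0].text)

In production the same engine is typically exposed through vLLM's OpenAI-compatible HTTP server rather than called in-process, which is where the deployment, rollout, and observability work in the other bullets comes in.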

Qualifications

GPU inference · Python · Containerization · Inference optimization · Distributed systems · Cloud-native architectures · Technical communication · Go · Rust · C++ · CUDA · Triton

Required

Several years building and operating production backend systems, with hands-on experience optimizing distributed inference and training
Strong proficiency in Python plus at least one systems language (Go, Rust, C++)
Deep expertise with containerization, orchestration, and cloud-native architectures
Practical understanding of GPU performance characteristics, memory management, and inference optimization (see the sizing sketch after this list)
Track record of building observable, secure systems with strong operational practices
Ability to work from first principles, whether modeling costs, designing for scale, or debugging performance
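As one example of the first-principles cost and capacity reasoning called for above, here is a back-of-the-envelope KV-cache sizing sketch; the layer, head, and precision numbers correspond to a Llama-3-8B-class model and are purely illustrative.

    # Back-of-the-envelope KV-cache sizing (illustrative; dimensions below match a
    # Llama-3-8B-class model and are not specific to this role).
    def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                                 head_dim: int, bytes_per_elem: int = 2) -> int:
        # Two tensors (K and V) are cached per layer for every token in the context.
        return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem

    per_token = kv_cache_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)
    print(f"{per_token / 1024:.0f} KiB per token")                      # 128 KiB
    print(f"{per_token * 8192 / 2**30:.2f} GiB per 8K-token sequence")  # 1.00 GiB

Numbers like these bound how many concurrent sequences fit on a GPU alongside the model weights, which in turn drives batching, caching, and quantization decisions.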

Preferred

Direct experience with LLM serving frameworks (e.g., vLLM, SGLang) and Transformer model optimization
Past experience implementing a full LLM stack, from high-level model description down to low-level optimizations
Experience with low-level GPU optimization for ML workloads, using both CUDA and higher-level libraries like Triton (see the kernel sketch after this list)
Contributions to open-source ML infrastructure projects or published ML systems research
Experience with rate limiting/quotas, per-tenant isolation, metering, attribution, and cost allocation
A knack for clear technical communication through writing, talks, or mentorship
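As a small illustration of the Triton-level work mentioned above, here is the kernel sketch referenced in that bullet: a minimal vector-add kernel. It is a generic example rather than anything specific to Gray Swan AI, and it assumes PyTorch, Triton, and a CUDA-capable GPU.

    # Minimal Triton kernel sketch (generic example; assumes torch, triton, and a GPU).
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard the ragged final block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n_elements = out.numel()
        grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
        add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
        return out

    # Usage: a = torch.rand(4096, device="cuda"); b = torch.rand(4096, device="cuda")
    #        assert torch.allclose(add(a, b), a + b)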

Benefits

Health, dental, and vision coverage
401(k) with 4% company match
28 days combined PTO
Learning & development budget
Top-tier equipment and home office support

Company

Gray Swan AI

Gray Swan AI is an AI safety and security company.

Funding

Current Stage: Early Stage