smallest.ai · 4 hours ago
Senior GPU Optimization Engineer | San Francisco
smallest.ai is seeking a Senior GPU Optimization Engineer who has a deep understanding of GPUs and can optimize model architectures for real-time performance. The role involves working on CUDA kernels, model graph optimizations, and tuning models across various GPU architectures to enhance the performance of real-time speech models.
Artificial Intelligence (AI) · Generative AI · Information Technology · SaaS · Software
Responsibilities
Optimize model architectures (ASR, TTS, SLMs) for maximum performance on specific GPU hardware
Profile models end-to-end to identify GPU bottlenecks — memory bandwidth, kernel launch overhead, fusion opportunities, quantization constraints
Design and implement custom kernels (CUDA/Triton/tinygrad) for performance-critical model sections
Perform operator fusion, graph optimization, and kernel-level scheduling improvements
Tune models to fit GPU memory limits while maintaining quality
Benchmark and calibrate inference across NVIDIA, AMD, and potentially emerging accelerators
Port models across GPU chipsets (NVIDIA → AMD / edge GPUs / new compute backends)
Work with TensorRT, ONNX Runtime, and custom runtimes for deployment
Partner with the research and infra teams to ensure the entire stack is optimized for real-time workloads
Qualifications
Required
Strong understanding of GPU architecture — SMs, warps, memory hierarchy, occupancy tuning
Hands-on experience with CUDA, kernel writing, and kernel-level debugging
Experience with kernel fusion and model graph optimizations
Familiarity with TensorRT, ONNX, Triton, tinygrad, or similar inference engines
Strong proficiency in PyTorch and Python
Deep understanding of model architectures (transformers, convs, RNNs, attention, diffusion blocks)
Experience profiling GPU workloads using Nsight, nvprof, or similar tools
Strong problem-solving abilities with a performance-first mindset
3-5 years of specialized experience in GPU optimization, gained in academia or industry
Master's or PhD in GPU programming or a related field
Preferred
Experience with quantization (INT8, FP8, hybrid formats)
Experience with audio/speech models (ASR, TTS, SSL, vocoders)
Contributions to open-source GPU stacks or inference runtimes
Published work related to systems-level model optimization
Company
smallest.ai
Smallest.ai is a software development company building voice AI foundation models for enterprise deployment, sales, and support.
Funding
Current Stage: Early Stage
Total Funding: $9M
Key Investors: Amazon Web Services, Sierra Ventures
2025-10-09 · Non Equity Assistance · $1M
2025-09-22 · Seed · $8M
2025-03-20 · Pre Seed
Company data provided by Crunchbase