Sciforium · 1 month ago
Lead Software Engineer, Model Serving Platform
Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. The Lead Software Engineer will architect and lead the development of the model serving platform, guiding engineering execution while building core components and mentoring other engineers.
Artificial Intelligence (AI)
Responsibilities
Lead the technical direction of the model serving platform, owning architecture decisions and guiding engineering execution
Build core serving components including execution runtimes, batching, scheduling, and distributed inference systems
Develop high-performance C++ and CUDA/HIP modules, including custom GPU kernels and memory-optimized runtimes
Collaborate with ML researchers to productionize new multimodal models and ensure low-latency, scalable inference
Build Python APIs and services that expose model capabilities to downstream applications
Mentor and support other engineers through code reviews, design discussions, and hands-on technical guidance
Drive performance profiling, benchmarking, and observability across the inference stack
Ensure high reliability and maintainability through testing, monitoring, and engineering best practices
Troubleshoot and resolve complex issues across GPU, runtime, and service layers
Qualifications
Required
Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience
5+ years of experience designing and building scalable, reliable backend systems or distributed infrastructure
Strong understanding of LLM inference mechanics (prefill vs decode, batching, KV cache)
Experience with Kubernetes/Ray and containerization
Strong proficiency in C++ and Python
Strong debugging, profiling, and performance optimization skills at the system level
Ability to collaborate closely with ML researchers and translate model or runtime requirements into production-grade systems
Effective communication skills and the ability to lead technical discussions, mentor engineers, and drive engineering quality
Comfortable working from the office and contributing to a fast-moving, high-ownership team culture
Preferred
Experience with ML systems engineering, distributed GPU scheduling, and open-source inference engines such as vLLM, SGLang, or TRT-LLM
Experience building large-scale ML/MLOps infrastructure
Proficiency in CUDA or ROCm and experience with GPU profiling tools
Experience at an AI/ML startup, research lab, or Big Tech infrastructure/ML team
Familiarity with multimodal model architectures, raw-byte models, or efficient inference techniques
Contributions to open-source ML or HPC infrastructure
Benefits
Medical, dental, and vision insurance
401(k) plan
Daily lunch, snacks, and beverages
Flexible time off
Competitive salary and equity
Company
Sciforium
Sciforium builds the next generation of AI models with unprecedented efficiency, privacy, and versatility.
Funding
Current Stage
Early Stage
Total Funding
$15.9M
2025-10-27 · Seed · $12M
2024-06-01 · Pre-Seed · $3.9M
Company data provided by Crunchbase