Sciforium

Lead Software Engineer, Model Serving Platform

San Francisco, CA

Full-time

Onsite

Senior Level, Lead/Staff

$230K/yr - $300K/yr

5+ years exp

Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. The Lead Software Engineer will architect and lead the development of the model serving platform, guiding engineering execution while building core components and mentoring other engineers.

Artificial Intelligence (AI)

Responsibilities

Lead the technical direction of the model serving platform, owning architecture decisions and guiding engineering execution

Build core serving components including execution runtimes, batching, scheduling, and distributed inference systems

Develop high-performance C++ and CUDA/HIP modules, including custom GPU kernels and memory-optimized runtimes

Collaborate with ML researchers to productionize new multimodal models and ensure low-latency, scalable inference

Build Python APIs and services that expose model capabilities to downstream applications

Mentor and support other engineers through code reviews, design discussions, and hands-on technical guidance

Drive performance profiling, benchmarking, and observability across the inference stack

Ensure high reliability and maintainability through testing, monitoring, and engineering best practices

Troubleshoot and resolve complex issues across GPU, runtime, and service layers

Qualifications

C++, Python, Distributed systems, Kubernetes/Ray, Performance optimization, CUDA, ML systems engineering, Debugging skills, Effective communication, Mentoring

Required

Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience

5+ years of experience designing and building scalable, reliable backend systems or distributed infrastructure

Strong understanding of LLM inference mechanics (prefill vs. decode, batching, KV cache)

Experience with Kubernetes or Ray, and with containerization

Strong proficiency in C++ and Python

Strong debugging, profiling, and performance optimization skills at the system level

Ability to collaborate closely with ML researchers and translate model or runtime requirements into production-grade systems

Effective communication skills and the ability to lead technical discussions, mentor engineers, and drive engineering quality

Comfortable working from the office and contributing to a fast-moving, high-ownership team culture

Preferred

Experience with ML systems engineering, distributed GPU scheduling, or open-source inference engines such as vLLM, SGLang, or TRT-LLM

Experience building large-scale ML/MLOps infrastructure

Proficiency in CUDA or ROCm and experience with GPU profiling tools

Experience at an AI/ML startup, research lab, or Big Tech infrastructure/ML team

Familiarity with multimodal model architectures, raw-byte models, or efficient inference techniques

Contributions to open-source ML or HPC infrastructure

Benefits

Medical, dental, and vision insurance

401k plan

Daily lunch, snacks, and beverages

Flexible time off

Competitive salary and equity

Company

Sciforium

Sciforium builds the next generation of AI models with unprecedented efficiency, privacy, and versatility.

Founded in 2024

San Francisco, California, USA

2-10 employees

https://sciforium.com

Funding

Current Stage

Early Stage

Total Funding

$15.9M

2025-10-27 · Seed · $12M

2024-06-01 · Pre-Seed · $3.9M

Company data provided by Crunchbase