GPU Kernel Engineer
Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. The company is seeking a highly skilled GPU Kernel Engineer to design and optimize custom GPU kernels for large-scale AI systems, working across the hardware-software stack to improve performance and scalability.
Responsibilities
Design, implement, and optimize custom GPU kernels using C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas
Profile and optimize end-to-end performance of ML operations, with a focus on large-scale LLM training and inference
Integrate low-level GPU kernels into frameworks such as PyTorch, JAX, and custom internal runtimes
Develop performance models, identify bottlenecks, and deliver kernel-level improvements that significantly accelerate AI workloads
Collaborate with ML researchers, distributed systems engineers, and model-serving teams to optimize compute performance across the stack
Work closely with hardware vendors (NVIDIA/AMD) and stay current on the latest GPU architecture capabilities and compiler/toolchain improvements
Contribute to tooling, documentation, benchmarking suites, and testing frameworks to ensure correctness and performance reproducibility
Qualifications
Required
5+ years of industry or research experience in GPU kernel development or high-performance computing
Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, or a related field
Strong programming skills in C++, Python, and familiarity with ML frameworks
Deep expertise in CUDA/ROCm, GPU memory models, and performance optimization strategies
Hands-on experience with Triton and/or JAX Pallas for custom kernel development
Strong understanding of PTX, GPU ASM, and low-level GPU execution
Extensive experience writing and optimizing custom GPU kernels in C++ and PTX
Proven ability to integrate low-level kernels into PyTorch, JAX, or similar frameworks
Experience working with large-scale LLM workloads (training or inference)
Preferred
Experience with AMD GPUs and ROCm optimization
Familiarity with JAX FFI and custom ML operator development
Experience with efficient model serving frameworks (e.g., vLLM, TensorRT)
Experience with TPUs, XLA, or similar accelerator programming environments
Contributions to open-source ML systems, compilers, or GPU kernels
Benefits
Medical, dental, and vision insurance
401k plan
Daily lunch, snacks, and beverages
Flexible time off
Competitive salary and equity
Company
Sciforium
Sciforium builds the next generation of AI models with unprecedented efficiency, privacy, and versatility.
Funding
Current Stage: Early Stage
Total Funding: $15.9M
Seed · $12M (2025-10-27)
Pre Seed · $3.9M (2024-06-01)
Company data provided by Crunchbase