GPU Kernel Engineer jobs in the United States

Sciforium · 1 month ago

GPU Kernel Engineer

Sciforium is an AI infrastructure company developing next-generation multimodal AI models and a proprietary, high-efficiency serving platform. They are seeking a highly skilled GPU Kernel Engineer to design and optimize custom GPU kernels for large-scale AI systems, working across the hardware-software stack to enhance performance and scalability.

Artificial Intelligence (AI)

Responsibilities

Design, implement, and optimize custom GPU kernels using C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas
Profile and optimize end-to-end performance of ML operations, with a focus on large-scale LLM training and inference
Integrate low-level GPU kernels into frameworks such as PyTorch, JAX, and custom internal runtimes
Develop performance models, identify bottlenecks, and deliver kernel-level improvements that significantly accelerate AI workloads
Collaborate with ML researchers, distributed systems engineers, and model-serving teams to optimize compute performance across the stack
Work closely with hardware vendors (NVIDIA/AMD) and stay current on the latest GPU architecture capabilities and compiler/toolchain improvements
Contribute to tooling, documentation, benchmarking suites, and testing frameworks to ensure correctness and performance reproducibility

Qualifications

GPU kernel development
CUDA/ROCm
C++ programming
Python programming
ML frameworks
Performance optimization
Collaboration skills
Problem-solving skills
Documentation skills

Required

5+ years of industry or research experience in GPU kernel development or high-performance computing
Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, or a related field
Strong programming skills in C++ and Python, and familiarity with ML frameworks
Deep expertise in CUDA/ROCm, GPU memory models, and performance optimization strategies
Hands-on experience with Triton and/or JAX Pallas for custom kernel development
Strong understanding of PTX, GPU assembly, and low-level GPU execution
Extensive experience writing and optimizing custom GPU kernels in C++ and PTX
Proven ability to integrate low-level kernels into PyTorch, JAX, or similar frameworks
Experience working with large-scale LLM workloads (training or inference)

Preferred

Experience with AMD GPUs and ROCm optimization
Familiarity with JAX FFI and custom ML operator development
Experience with efficient model serving frameworks (e.g., vLLM, TensorRT)
Experience with TPUs, XLA, or similar accelerator programming environments
Contributions to open-source ML systems, compilers, or GPU kernels

Benefits

Medical, dental, and vision insurance
401k plan
Daily lunch, snacks, and beverages
Flexible time off
Competitive salary and equity

Company

Sciforium

Sciforium builds the next generation of AI models with unprecedented efficiency, privacy, and versatility.

Funding

Current Stage: Early Stage
Total Funding: $15.9M
2025-10-27 · Seed · $12M
2024-06-01 · Pre-Seed · $3.9M
Company data provided by Crunchbase