Databricks · 3 months ago
Staff Software Engineer - GenAI Performance and Kernel
Databricks is a leading data and AI company, known for its innovative Data Intelligence Platform utilized by numerous organizations globally. The Staff Software Engineer for GenAI Performance and Kernel will be responsible for designing, implementing, and optimizing high-performance GPU kernels for GenAI inference, while collaborating with various teams to enhance inference performance at scale.
Analytics, Artificial Intelligence (AI), Data Storage, Information Technology, Machine Learning
Responsibilities
Lead the design, implementation, benchmarking, and maintenance of core compute kernels (e.g. attention, MLP, softmax, layernorm, memory management) optimized for various hardware backends (GPU, accelerators)
Drive the performance roadmap for kernel-level improvements: vectorization, tensorization, tiling, fusion, mixed precision, sparsity, quantization, memory reuse, scheduling, auto-tuning, etc.
Integrate kernel optimizations with higher-level ML systems
Build and maintain profiling, instrumentation, and verification tooling to detect correctness and performance regressions, numerical issues, and hardware utilization gaps
Lead performance investigations and root-cause analysis on inference bottlenecks, e.g. memory bandwidth, cache contention, kernel launch overhead, tensor fragmentation
Establish coding patterns, abstractions, and frameworks to modularize kernels for reuse, cross-backend portability, and maintainability
Influence system architecture decisions to make kernel improvements more effective (e.g. memory layout, dataflow scheduling, kernel fusion boundaries)
Mentor and guide other engineers working on lower-level performance, provide code reviews, and help set best practices
Collaborate with infrastructure, tooling, and ML teams to roll out kernel-level optimizations into production, and monitor their impact
Qualifications
Required
BS/MS/PhD in Computer Science or a related field
Deep hands-on experience writing and tuning compute kernels (CUDA, Triton, OpenCL, LLVM IR, assembly, or similar) for ML workloads
Strong knowledge of GPU/accelerator architecture: warp structure, memory hierarchy (global, shared, register, L1/L2 caches), tensor cores, scheduling, SM occupancy, etc.
Experience with advanced optimization techniques: tiling, blocking, software pipelining, vectorization, fusion, loop transformations, auto-tuning
Familiarity with ML-specific kernel libraries (cuBLAS, cuDNN, CUTLASS, oneDNN, etc.) or open kernels
Strong debugging and profiling skills (Nsight, nvprof, perf, VTune, custom instrumentation)
Experience reasoning about numerical stability, mixed precision, quantization, and error propagation
Experience integrating optimized kernels into real-world ML inference systems; exposure to distributed inference pipelines, memory management, and runtime systems
Experience building high-performance products leveraging GPU acceleration
Excellent communication and leadership skills — able to drive design discussions, mentor colleagues, and make trade-offs visible
A track record of shipping performance-critical, high-quality production software
Preferred
Publications in systems/ML performance venues (e.g. MLSys, ASPLOS, ISCA, PPoPP)
Experience with custom accelerators or FPGAs
Experience with sparsity or model compression techniques
Benefits
Eligibility for annual performance bonus
Equity
Comprehensive benefits and perks
Company
Databricks
Databricks is a data and AI platform that unifies data engineering, analytics, and machine learning on a lakehouse architecture.
H1B Sponsorship
Databricks has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (385)
2024 (319)
2023 (227)
2022 (222)
2021 (166)
2020 (64)
Funding
Current Stage: Late Stage
Total Funding: $25.81B
Key Investors: Counterpoint Global, Franklin Templeton, Andreessen Horowitz
2025-12-16: Series Unknown · $4B
2025-09-08: Series Unknown · $1B
2025-01-13: Debt Financing · $5.25B
Company data provided by Crunchbase