Staff Software Engineer - GenAI Performance and Kernel jobs in United States
cer-icon
Apply on Employer Site
company-logo

Databricks · 3 months ago

Staff Software Engineer - GenAI Performance and Kernel

Databricks is a leading data and AI company, known for its innovative Data Intelligence Platform utilized by numerous organizations globally. The Staff Software Engineer for GenAI Performance and Kernel will be responsible for designing, implementing, and optimizing high-performance GPU kernels for GenAI inference, while collaborating with various teams to enhance inference performance at scale.

AnalyticsArtificial Intelligence (AI)Data StorageInformation TechnologyMachine Learning
check
Growth Opportunities
check
H1B Sponsor Likelynote

Responsibilities

Lead the design, implementation, benchmarking, and maintenance of core compute kernels (e.g. attention, MLP, softmax, layernorm, memory management) optimized for various hardware backends (GPU, accelerators)
Drive the performance roadmap for kernel-level improvements: vectorization, tensorization, tiling, fusion, mixed precision, sparsity, quantization, memory reuse, scheduling, auto-tuning, etc
Integrate kernel optimizations with higher-level ML systems
Build and maintain profiling, instrumentation, and verification tooling to detect correctness, performance regressions, numerical issues, and hardware utilization gaps
Lead performance investigations and root-cause analysis on inference bottlenecks, e.g. memory bandwidth, cache contention, kernel launch overhead, tensor fragmentation
Establish coding patterns, abstractions, and frameworks to modularize kernels for reuse, cross-backend portability, and maintainability
Influence system architecture decisions to make kernel improvements more effective (e.g. memory layout, dataflow scheduling, kernel fusion boundaries)
Mentor and guide other engineers working on lower-level performance, provide code reviews, help set best practices
Collaborate with infrastructure, tooling, and ML teams to roll out kernel-level optimizations into production, and monitor their impact

Qualification

GPU kernel optimizationCUDAML workloadsPerformance profilingAdvanced optimization techniquesGPU architecture knowledgeML kernel librariesDebugging skillsCommunication skillsLeadership skills

Required

BS/MS/PhD in Computer Science, or a related field
Deep hands-on experience writing and tuning compute kernels (CUDA, Triton, OpenCL, LLVM IR, assembly or similar sort) for ML workloads
Strong knowledge of GPU/accelerator architecture: warp structure, memory hierarchy (global, shared, register, L1/L2 caches), tensor cores, scheduling, SM occupancy, etc
Experience with advanced optimization techniques: tiling, blocking, software pipelining, vectorization, fusion, loop transformations, auto-tuning
Familiarity with ML-specific kernel libraries (cuBLAS, cuDNN, CUTLASS, oneDNN, etc.) or open kernels
Strong debugging and profiling skills (Nsight, NVProf, perf, vtune, custom instrumentation)
Experience reasoning about numerical stability, mixed precision, quantization, and error propagation
Experience in integrating optimized kernels into real-world ML inference systems; exposure to distributed inference pipelines, memory management, and runtime systems
Experience building high-performance products leveraging GPU acceleration
Excellent communication and leadership skills — able to drive design discussions, mentor colleagues, and make trade-offs visible
A track record of shipping performance-critical, high-quality production software

Preferred

Bonus: published in systems/ML performance venues (e.g. MLSys, ASPLOS, ISCA, PPoPP)
Experience with custom accelerators or FPGA
Experience with sparsity or model compression techniques

Benefits

Eligibility for annual performance bonus
Equity
Comprehensive benefits and perks

Company

Databricks

company-logo
Databricks is a data and AI platform that unifies data engineering, analytics, and machine learning on a lakehouse architecture.

H1B Sponsorship

Databricks has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Below presents additional info for your reference. (Data Powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship
Represents job field similar to this job
Trends of Total Sponsorships
2025 (385)
2024 (319)
2023 (227)
2022 (222)
2021 (166)
2020 (64)

Funding

Current Stage
Late Stage
Total Funding
$25.81B
Key Investors
Counterpoint GlobalFranklin TempletonAndreessen Horowitz
2025-12-16Series Unknown· $4B
2025-09-08Series Unknown· $1B
2025-01-13Debt Financing· $5.25B

Leadership Team

leader-logo
Ali Ghodsi
CEO and Co-founder
linkedin
leader-logo
David Conte
Chief Financial Officer
linkedin
Company data provided by crunchbase