Databricks
Software Engineer - GenAI inference
Databricks is the data and AI company that empowers organizations worldwide to unify and democratize data, analytics, and AI. As a software engineer for GenAI inference, you will design, develop, and optimize the inference engine to support the Foundation Model API, ensuring high performance and scalability for large language model serving systems.
Analytics · Artificial Intelligence (AI) · Data Storage · Information Technology · Machine Learning
Responsibilities
Contribute to the design and implementation of the inference engine, and collaborate on a model-serving stack optimized for large-scale LLM inference
Collaborate with researchers to bring new model architectures or features (sparsity, activation compression, mixture-of-experts) into the engine
Optimize for latency, throughput, memory efficiency, and hardware utilization across GPUs and other accelerators
Build and maintain instrumentation, profiling, and tracing tooling to uncover bottlenecks and guide optimizations
Develop and enhance scalable routing, batching, scheduling, memory management, and dynamic loading mechanisms for inference workloads (a simplified batching sketch follows this list)
Support reliability, reproducibility, and fault tolerance in the inference pipelines, including A/B launches, rollback, and model versioning
Integrate with federated, distributed inference infrastructure: orchestrate across nodes, balance load, and handle communication overhead
Collaborate cross-functionally with platform engineering, cloud infrastructure, and security/compliance teams
Document and share learnings, contributing to internal best practices and open-source efforts when possible
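For illustration only, here is a minimal Python sketch of the continuous-batching idea referenced above: new requests are admitted into the running batch under a token budget, every active request advances by one decode step, and finished requests are retired. The names (Request, ContinuousBatchScheduler, max_batch_tokens) are assumptions made for this sketch, not Databricks' actual engine.

```python
# Hypothetical sketch of continuous batching for LLM inference serving.
# All classes, names, and budgets here are illustrative assumptions.
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    request_id: int
    prompt_tokens: int    # tokens in the prompt, counted at admission time
    max_new_tokens: int   # generation budget for this request
    generated: int = 0    # tokens produced so far


class ContinuousBatchScheduler:
    """Each step: admit waiting requests under a token budget, advance every
    active request by one decode step, and retire finished requests."""

    def __init__(self, max_batch_tokens: int = 4096) -> None:
        self.max_batch_tokens = max_batch_tokens
        self.waiting: deque[Request] = deque()
        self.running: list[Request] = []

    def submit(self, req: Request) -> None:
        self.waiting.append(req)

    def _batch_tokens(self) -> int:
        # Tokens currently resident in the batch (prompt + generated so far).
        return sum(r.prompt_tokens + r.generated for r in self.running)

    def step(self) -> list[int]:
        # 1) Admit waiting requests while the token budget allows.
        while self.waiting:
            nxt = self.waiting[0]
            if self._batch_tokens() + nxt.prompt_tokens > self.max_batch_tokens:
                break
            self.running.append(self.waiting.popleft())

        # 2) "Decode" one token per active request; a real engine would launch
        #    a single fused forward pass over the whole batch here.
        for r in self.running:
            r.generated += 1

        # 3) Retire requests that hit their generation budget, freeing slots.
        finished = [r.request_id for r in self.running if r.generated >= r.max_new_tokens]
        self.running = [r for r in self.running if r.generated < r.max_new_tokens]
        return finished


if __name__ == "__main__":
    sched = ContinuousBatchScheduler(max_batch_tokens=64)
    for i in range(4):
        sched.submit(Request(request_id=i, prompt_tokens=10, max_new_tokens=3))
    for step in range(5):
        done = sched.step()
        if done:
            print(f"step {step}: finished requests {done}")
```

A production engine would also account for KV-cache memory when admitting requests, which is where the memory-management and dynamic-loading responsibilities above come in.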
Qualifications
Required
BS/MS/PhD in Computer Science, or a related field
Strong software engineering background (3+ years or equivalent) in performance-critical systems
Solid understanding of ML inference internals: attention, MLPs, recurrent modules, quantization, sparse operations, etc.
Hands-on experience with CUDA, GPU programming, and key libraries (cuBLAS, cuDNN, NCCL, etc.)
Comfortable designing and operating distributed systems, including RPC frameworks, queuing, request batching, sharding, and memory partitioning
Demonstrated ability to uncover and solve performance bottlenecks across layers (kernel, memory, networking, scheduler)
Experience building instrumentation, tracing, and profiling tools for ML models (see the timing sketch after this list)
Ability to work closely with ML researchers and translate novel model ideas into production systems
Ownership mindset and eagerness to dive deep into complex system challenges
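As a small example of the kind of instrumentation mentioned above, the sketch below times a decode-shaped GEMM with CUDA events via PyTorch. The shapes, the choice of PyTorch, and the function name time_matmul_ms are assumptions made for illustration; a production engine would profile its own kernels and the full request pipeline.

```python
# Hypothetical sketch: timing a single matmul with CUDA events, the kind of
# micro-instrumentation used to locate kernel-level bottlenecks.
import torch


def time_matmul_ms(m: int, k: int, n: int, iters: int = 20) -> float:
    """Return the average GPU time (ms) of an (m,k) x (k,n) FP16 GEMM."""
    if not torch.cuda.is_available():
        raise RuntimeError("CUDA device required for this measurement")

    a = torch.randn(m, k, device="cuda", dtype=torch.float16)
    b = torch.randn(k, n, device="cuda", dtype=torch.float16)

    # Warm up so one-time costs (context init, autotuning) are excluded.
    for _ in range(3):
        torch.matmul(a, b)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    start.record()
    for _ in range(iters):
        torch.matmul(a, b)
    end.record()
    torch.cuda.synchronize()  # wait until the recorded events complete
    return start.elapsed_time(end) / iters


if __name__ == "__main__":
    # Roughly the GEMM shape of one decode step for a 4096-wide layer at
    # batch size 8 (an assumed example workload).
    print(f"avg GEMM time: {time_matmul_ms(8, 4096, 4096):.3f} ms")
```

Event-based timing isolates GPU execution time from Python-side overhead, which matters when hunting kernel-level bottlenecks rather than end-to-end request latency.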
Preferred
Published research or open-source contributions in ML systems, inference optimization, or model serving
Benefits
Annual performance bonus
Equity
Company
Databricks
Databricks is a data and AI platform that unifies data engineering, analytics, and machine learning on a lakehouse architecture.
H1B Sponsorship
Databricks has a track record of offering H1B sponsorship. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Chart: distribution of job fields receiving sponsorship, with fields similar to this job highlighted
Trends of Total Sponsorships
2025 (385)
2024 (319)
2023 (227)
2022 (222)
2021 (166)
2020 (64)
Funding
Current Stage: Late Stage
Total Funding: $25.81B
Key Investors: Counterpoint Global, Franklin Templeton, Andreessen Horowitz
2025-12-16: Series Unknown · $4B
2025-09-08: Series Unknown · $1B
2025-01-13: Debt Financing · $5.25B
Company data provided by Crunchbase