Staff Machine Learning Engineer, Oncology Foundation Model jobs in United States

Tempus AI · 1 month ago

Staff Machine Learning Engineer, Oncology Foundation Model

Tempus AI is focused on precision medicine and advancing the healthcare industry through AI. They are seeking an experienced Staff Machine Learning Engineer to design, build, and optimize data infrastructure for multimodal AI models that enhance patient care and accelerate medical research.

Artificial Intelligence (AI), Biotechnology, Health Care, Machine Learning, Medical, Precision Medicine
H1B Sponsor Likely

Responsibilities

Architect and build sophisticated data processing workflows responsible for ingesting, processing, and preparing multimodal training data that seamlessly integrate with large-scale distributed ML training frameworks and infrastructure (GPU clusters)
Develop strategies for efficient, compliant data ingestion from diverse sources, including internal databases, third-party APIs, public biomedical datasets, and Tempus's proprietary data ecosystem
Utilize, optimize, and contribute to frameworks specialized for large-scale ML data loading and streaming (e.g., MosaicML Streaming, Ray Data, HF Datasets)
Collaborate closely with infrastructure and platform teams to leverage and optimize cloud-native services (primarily GCP) for performance, cost-efficiency, and security
Engineer efficient connectors and data loaders for accessing and processing information from diverse knowledge sources, such as knowledge graphs, internal structured databases, biomedical literature repositories (e.g., PubMed), and curated ontologies
Optimize data storage for efficient large-scale training and knowledge access
Orchestrate, monitor, and troubleshoot complex data workflows using tools like Airflow and Kubeflow Pipelines
Establish robust monitoring, logging, and alerting systems for data pipeline health, data drift detection, and data quality assurance, providing feedback loops for continuous improvement
Analyze and resolve data I/O performance bottlenecks across storage systems, network bandwidth, and compute resources
Actively manage and seek optimizations for the costs associated with storing and processing massive datasets in the cloud

Qualifications

Large-scale data pipelines, Distributed data processing, Machine Learning frameworks, Cloud-native services, Python, Data ingestion strategies, Data workflow orchestration, Leadership, Collaboration, Communication skills

Required

Master's degree in Computer Science, Artificial Intelligence, Software Engineering, or a related field. A strong academic background with a focus on AI data engineering
Proven track record (8+ years of industry experience) in designing, building, and operating large-scale data pipelines and infrastructure in a production environment
Strong experience working with massive, heterogeneous datasets (TBs+) and modern distributed data processing tools and frameworks such as Apache Spark, Ray, or Dask
Strong, hands-on experience with tools and libraries specifically designed for large-scale ML data handling, such as Hugging Face Datasets, MosaicML Streaming, or similar frameworks (e.g., WebDataset, Petastorm). Experience with MLOps tools and platforms (e.g., MLflow, Kubeflow, SageMaker Pipelines)
Understanding of the data challenges specific to training large models (Foundation Models, LLMs, Multimodal Models)
Proficiency in programming languages like Python and experience with modern distributed data processing tools and frameworks
Proven ability to bring thought leadership to the product and engineering teams, influencing technical direction and data strategy
Experience mentoring junior engineers and collaborating effectively with cross-functional teams (Research Scientists, ML Engineers, Platform Engineers, Product Managers, Clinicians)
Excellent communication skills, capable of explaining complex technical concepts to diverse audiences
Strong bias-to-action and ability to thrive in a fast-paced, dynamic research and development environment
A pragmatic approach focused on delivering rapid, iterative, and measurable progress towards impactful goals

Preferred

Advanced degree (PhD) in Computer Science, Engineering, Bioinformatics, or a related field
Contributions to relevant open-source projects
Direct experience working with clinical or biological data (EHR, genomics, medical imaging)

Benefits

Incentive compensation
Restricted stock units
Medical and other benefits depending on the position

Company

Tempus AI

Tempus is making precision medicine a reality by applying AI in healthcare, deriving insights from our expansive library of clinical data and molecular data.

H1B Sponsorship

Tempus AI has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data Powered by US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship; the highlighted field represents job fields similar to this job]
[Chart: Trends of Total Sponsorships; 2021 (3)]

Funding

Current Stage: Public Company
Total Funding: $2.29B
Key Investors: Ares Management, Google, Baillie Gifford
2025-06-30: Post-IPO Debt, $650M
2025-02-19: Post-IPO Debt, $300M
2024-06-14: IPO

Leadership Team

Eric Lefkofsky, Founder and CEO
Shane Colley, CTO
Company data provided by Crunchbase