Sandia National Laboratories
Postdoctoral Appointee - Artificial Intelligence Data Science - Hybrid
Sandia National Laboratories is the nation’s premier science and engineering lab for national security and technology innovation. The Postdoctoral Appointee will join the AI team to design and operate an AI-ready data ecosystem, transforming various data types into governed datasets that support AI models and workflows.
National Defense · Government · Information Technology · National Security
Responsibilities
Build and operate an AI-Ready Lakehouse
Design and maintain a federated data lakehouse with full provenance/versioning, attribute-based access control, license/consent automation, and agent telemetry services
Implement automated, AI-mediated ingestion pipelines for heterogeneous sources (HPC simulation outputs, experimental instruments, robotics, sensor streams, satellite imagery, production logs)
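The attribute-based access control mentioned above can be sketched as a simple policy check. This is an illustrative toy, not Sandia's actual policy engine; the attribute names (`clearance`, `projects`) and the level ordering are assumptions for the example.

```python
def authorized(user_attrs: dict, resource_attrs: dict) -> bool:
    """Toy attribute-based access check: grant access only when the user's
    clearance meets the resource's sensitivity level AND the user shares at
    least one project tag with the resource.
    Attribute names and levels here are illustrative assumptions."""
    levels = ["public", "cui", "restricted"]  # ordered low -> high sensitivity
    ok_level = levels.index(user_attrs["clearance"]) >= levels.index(resource_attrs["level"])
    ok_project = bool(set(user_attrs["projects"]) & set(resource_attrs["projects"]))
    return ok_level and ok_project
```

In a real lakehouse these attributes would come from an identity provider and a data catalog, and the decision would be logged for audit.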
Enforce Data Security & Assurance
Develop a Data Health & Threat program: dataset fingerprinting, watermarking, poisoning/anomaly detection, red-team sampling, and reproducible training manifests
Configure secure enclaves and egress processes for CUI, Restricted Data, and other sensitive corpora with attestation and differential-privacy where required
Define and Implement Data Governance
Establish FAIR-compliant metadata standards, data catalogs, and controlled-vocabulary ontologies
Automate lineage tracking, quality checks, schema validation, and leak controls at record-level granularity
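Record-level schema validation of the kind described above can be sketched in a few lines. The schema and field names are invented for illustration; a production system would use a richer schema language (e.g. JSON Schema or Avro).

```python
# Illustrative schema: field name -> expected Python type
SCHEMA = {"sensor_id": str, "timestamp": float, "value": float}

def validate_record(record: dict) -> list:
    """Return a list of schema violations for one record (empty list = valid).
    Checks presence and type of every field declared in SCHEMA."""
    errors = []
    for field, ftype in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            errors.append(f"bad type for {field}: expected {ftype.__name__}")
    return errors
```

Running such a check per record at ingestion time is what makes "record-level granularity" possible: bad records can be quarantined individually instead of failing whole batches.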
Instrument AI Workflows with Standardized Telemetry
Deploy Agent Trace Schema (ATS) and Agent Run Record (ARR) frameworks to log tool calls, decision graphs, human hand-offs, and environment observations
Treat agent-generated artifacts (plans, memory, configurations) as first-class data objects
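Logging tool calls as structured, first-class records might look like the sketch below. The field names are illustrative assumptions, not the actual Agent Trace Schema specification.

```python
import json
import time
import uuid

def log_tool_call(trace: list, tool: str, args: dict, result: str) -> dict:
    """Append one tool-call event to an in-memory agent trace.
    Field names are illustrative, not the real ATS/ARR spec."""
    event = {
        "event_id": str(uuid.uuid4()),   # unique id for cross-referencing
        "timestamp": time.time(),        # wall-clock time of the call
        "kind": "tool_call",
        "tool": tool,
        "args": args,
        "result": result,
    }
    trace.append(event)
    return event
```

Because each event is a plain JSON-serializable record, traces can be ingested into the same lakehouse as any other dataset and queried alongside model and mission data.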
Collaborate Across Pillars
Work with Models and Interfaces teams to integrate data services into training, evaluation, and inference pipelines
Partner with Infrastructure engineers to optimize data movement, tiered storage, and high-bandwidth networking (ESnet) between HPC, cloud, and edge
Engage domain scientists and mission leads for agile deterrence, energy grid, and critical materials use cases to curate problem-specific datasets
Support Continuous Acquisition & Benchmarking
Design edge-to-exascale data acquisition systems with robotics and instrument integration
Develop data/AI benchmarks—datasets, tools, and metrics—for pipeline performance, model evaluation, and mission KPIs
Author an AI-mediated parser for a new experimental instrument, automatically extracting and cataloging metadata
Implement an attribute-based policy that blocks unapproved data combinations in a classified enclave
Prototype a streaming pipeline that feeds live sensor data from a nuclear facility into an HPC training queue
Develop a dashboard that alerts on data drift, pipeline failures, or anomalous records
Collaborate with MLOps engineers to version datasets alongside model artifacts in CI/CD
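The data-drift alerting mentioned in the example tasks above can be sketched with a simple mean-shift test: flag a batch whose mean deviates from the baseline by too many standard errors. This is a minimal illustration; production monitors would use richer statistics (e.g. KS tests or population stability index).

```python
import statistics

def drift_alert(baseline: list, current: list, threshold: float = 3.0) -> bool:
    """Flag drift when the current batch mean deviates from the baseline mean
    by more than `threshold` standard errors (a simple z-test on the mean)."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    se = sigma / (len(current) ** 0.5)  # standard error of the batch mean
    z = abs(statistics.mean(current) - mu) / se
    return z > threshold
```

A dashboard would run such a check per feature per batch and raise an alert whenever it returns True.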
Qualifications
Required
Possess, or are pursuing, a PhD in Computer Science, Data Science, Statistics, or a related science or engineering field; the PhD must be conferred within five years prior to employment
Experience or knowledge in these areas:
Building and maintaining production data pipelines (ETL/ELT) and data warehouses or data lakes
Programming languages such as Python, SQL, and experience with frameworks like Apache Spark or Dask
Data security and zero-trust principles, including secure enclaves, attribute-based access control, and data masking or differential privacy
Cloud platforms (AWS, Azure, or GCP) and container orchestration (Kubernetes)
Ability to acquire and maintain a DOE Q-level security clearance
Preferred
Significant data research experience
Background in AI-mediated data curation: automated annotation, feature extraction, and dataset certification
Experience implementing data governance and metadata management tools (e.g., Apache Atlas, DataHub, Collibra)
Experience developing and refining data architectures and data flows
Hands-on background in MLOps and CI/CD for data and ML workflows (e.g., Jenkins, GitLab CI, MLflow)
Knowledge of human-factors engineering and UX design principles for data platforms
Knowledge of agile principles and practices and experience working as part of agile teams
Ability to work effectively in a dynamic, interdisciplinary environment, guiding technical decisions and mentoring junior staff
Strong written and verbal communication skills, with the ability to present complex data concepts to diverse audiences
Ability to obtain and maintain a SCI clearance, which may require a polygraph test
Curating and managing scientific or engineering datasets
Designing and enforcing data policies for classified, export-controlled, or proprietary data
Data architecture for HPC and edge-computing environments
Advanced data munging and fusion techniques for heterogeneous and streaming data sources
Building data pipelines for feature stores, experiment tracking, and model drift monitoring
Collaborating on public-private partnerships or multi-lab federated data efforts
Benefits
Generous vacation
Strong medical and other benefits
Competitive 401k
Learning opportunities
Relocation assistance
Amenities aimed at creating a solid work/life balance
Company
Sandia National Laboratories
Sandia conducts research and development on the non-nuclear components of nuclear weapons.
Funding
Current Stage: Late Stage
Total Funding: $4.4M
Key Investors: US Department of Energy, ARPA-E
2023-09-21 · Grant · $0.5M
2023-07-27 · Grant
2023-01-10 · Grant · $3.7M
Company data provided by crunchbase