Postdoctoral Research Associate, Data Readiness jobs in United States

CHEManager International · 10 hours ago

Postdoctoral Research Associate, Data Readiness

CHEManager International is seeking a postdoctoral research associate to advance scientific AI by addressing challenges in data readiness. The role focuses on researching, designing, and deploying data pipelines and readiness frameworks that enhance AI-driven discovery across scientific domains.

Responsibilities

Conduct and publish original research focused on data readiness methodologies and frameworks for scalable AI applications across fluid dynamics, fusion, materials, life sciences, and other strategic domains
Investigate novel approaches for balancing efficient I/O, interoperability, and scientific validity in AI-ready datasets
Design, prototype, and optimize preprocessing pipelines using HPC resources, targeting scalable execution and automation
Collaborate with domain scientists to integrate pipelines into end-to-end AI workflows specific to scientific domains
Publish research outcomes in peer-reviewed journals and conference venues, setting benchmarks and proposing methodologies for cross-disciplinary readiness challenges
Aid in the development and adoption of open standards for scientific dataset processing, including contributing to open-source tools
Mentor interns, students, and peers in cross-domain data readiness approaches
Present findings at technical workshops, scientific meetings, and in outreach efforts to improve awareness around the importance of data readiness for scientific AI

Qualifications

Data preprocessing pipelines
AI-ready dataset design
HPC environments
Modern data frameworks
Scalable I/O solutions
Privacy regulations knowledge
Independent research ability
Publications in peer-reviewed venues
Collaborative mindset

Required

Ph.D. in Computer Science, Data Science, Computational Science, a scientific domain relevant to AI (e.g., physics, biology, chemistry, climate), or a closely related field, earned within the last 5 years or near completion
Demonstrated expertise in data preprocessing pipelines, AI-ready dataset design, or scientific workflows in HPC environments
Proven experience with modern data frameworks (e.g., PyTorch, TensorFlow), scalable I/O solutions (e.g., HDF5, ADIOS2), and distributed computing tools relevant to data preparation
Evidence of ability to conduct independent research and publish in peer-reviewed venues

Preferred

Hands-on experience prototyping and scaling data pipelines in HPC environments (Frontier-scale or similar)
Strong familiarity with domain-specific formats such as NetCDF, CSV/Parquet, FASTA/mmCIF, or graph-based encodings in materials and molecular AI
Familiarity with frameworks for automated and reproducible workflows
Knowledge of regulations governing data privacy and export control (e.g., HIPAA, ITAR), and of secure enclave architectures and federated learning approaches
Background in developing reproducible pipelines with validation, provenance tracking, and schema consistency checks
Publications in relevant conferences (e.g., NeurIPS, SC, AAAI, or domain-specific venues like Fusion Science or Computational Materials)
Collaborative mindset in team environments and across disciplines

Company

CHEManager International

Wiley’s leading media brand providing first-hand information on the global chemical, life science and process industries

Funding

Current Stage
Growth Stage
Company data provided by Crunchbase