Data Engineer – Data Architecture for Data Science & Machine Learning jobs in United States

Penn State University · 1 month ago

Penn State University is seeking a Senior Data Engineer with expertise in database design and data access strategies to support data science and machine learning initiatives. The role involves architecting and optimizing data systems to empower data scientists in their research and model training efforts.

Higher Education
No H1B · Security Clearance Required · U.S. Citizen Only

Responsibilities

Design and maintain scalable, high-performance database solutions to support data science workflows and ML experimentation
Partner with data scientists to understand data access patterns and develop storage strategies that accelerate analysis and model training
Serve as the internal subject matter expert on PostgreSQL—including schema design, indexing, partitioning, and query optimization
Evaluate and integrate alternative database technologies (e.g., MongoDB, Neo4j, Redis, Cassandra) where they provide clear advantages
Lead efforts to optimize data pipelines for both structured and unstructured data used in algorithm development
Ensure data integrity, security, and governance across storage systems
Implement monitoring, automation, and performance-tuning tools for all database environments
Advise on data lifecycle management—balancing accessibility for R&D with efficiency and compliance requirements

Qualifications

PostgreSQL · Data modeling · Non-relational databases · Cloud database services · SQL · Python · Communication skills · Collaboration skills

Required

5+ years of experience in data engineering, database architecture, or related technical roles
Expert-level proficiency in PostgreSQL (query tuning, schema design, indexing, partitioning, replication)
Strong understanding of data modeling, normalization vs. denormalization tradeoffs, and query optimization
Experience with non-relational databases (e.g., MongoDB, Cassandra, Neo4j, Redis, or DynamoDB)
Familiarity with machine learning workflows and how data is consumed for training, evaluation, and deployment
Experience with cloud database services (AWS RDS/Aurora, GCP Cloud SQL, Azure Database)
Proficiency in SQL and one or more scripting languages (Python preferred)
Excellent communication and collaboration skills—comfortable working closely with data scientists, ML engineers, and software developers

Preferred

Experience architecting hybrid data ecosystems spanning relational, NoSQL, and analytical databases
Knowledge of data lake, warehouse, and feature store architectures (e.g., Snowflake, Redshift, BigQuery, Feast)
Familiarity with ETL/ELT frameworks and data orchestration tools (e.g., Airflow, dbt)
Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field

Benefits

Comprehensive medical, dental, and vision coverage
Robust retirement plans
Substantial paid time off, including holidays, vacation, and sick time
Generous 75% tuition discount, available to employees as well as eligible spouses and children

Company

Penn State University

There’s a reason Penn State consistently ranks among the top one percent of the world’s universities.

Funding

Current Stage
Late Stage

Leadership Team

Hamza Jamjoom
Co-Founder - Arts & Architecture Student Council
Kara Pytko
Co-founder of Virtual Scientist Webinar Series
Company data provided by Crunchbase