
TSMC · 1 day ago

Senior Data Engineer / Data Curator

TSMC Arizona offers an opportunity to work at the most advanced semiconductor fab in the United States. As a Senior Data Engineer in the AI Data Curation track, you will ensure that the data powering AI models is high-quality and well-organized, playing a key role in designing and maintaining scalable data pipelines.

Consumer Electronics · Electronics · Manufacturing · Product Design · Semiconductor
H1B Sponsor Likely

Responsibilities

Design and implement data pipelines for processing, cleaning, and curating large datasets used in model training and fine-tuning
Automate data cleaning processes (e.g., removing noise, duplicates, irrelevant content) and ensure datasets are appropriately labeled and structured
Collaborate with model teams to ensure data aligns with model requirements and performance goals
Assess and mitigate bias in datasets, ensuring that models are trained on diverse and representative data
Manage data storage and retrieval strategies, ensuring scalability and data consistency across different environments
Conduct regular audits to ensure data integrity, privacy, and security compliance

Qualifications

Data engineering · Python · SQL · Cloud storage · Data pipelines · AI ethics · Communication · Teamwork

Required

Bachelor's degree in Computer Science, Data Science, or a related field
5+ years of experience in data engineering, data wrangling, or data curation, particularly in machine learning or AI-driven environments
Strong proficiency in Python (Pandas, NumPy) and SQL for data manipulation and querying
Familiarity with cloud-based data storage (AWS S3, Google Cloud Storage, etc.) and distributed systems for managing large datasets
Experience with data annotation tools and platforms for manual or semi-automated labeling
Experience with NLP data formats, such as JSONL, text, or embeddings, and an understanding of tokenization
Experience managing data pipelines with tools like Apache Kafka, Apache Airflow, or similar ETL tools
Strong knowledge of AI ethics, data privacy, and compliance standards (GDPR, CCPA, etc.)
Candidates must be willing and able to work on-site at our Phoenix, Arizona facility
Communication
Computer proficiency
Presentation skills
Listening
Teamwork

Preferred

Experience with vector databases and indexing for LLMs (e.g., FAISS, Pinecone)

Benefits

Medical, dental and vision plan offerings
Income-protection programs
401(k)-retirement savings plan
Competitive paid time-off programs
Paid holidays

Company

Established in 1987, TSMC is the world's first dedicated semiconductor foundry.

H1B Sponsorship

TSMC has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (49)
2024 (27)
2023 (30)
2022 (17)
2021 (31)
2020 (35)

Funding

Current Stage
Public Company
Total Funding
$14.2B
Key Investors
U.S. Department of Commerce, Berkshire Hathaway
2024-04-08 · Grant · $6.6B
2024-01-01 · Acquired
2022-09-30 · Post-IPO Equity · $4.1B

Leadership Team

Morris Chang
Founder, Chairman & CEO
Peter M. Cleveland
Senior Vice President
Company data provided by Crunchbase