
iManage · 12 hours ago

Senior NLP Data Engineer

iManage is dedicated to Making Knowledge Work™, providing an intelligent cloud-enabled platform for knowledge work. As a Senior NLP Data Engineer, you will be responsible for designing and optimizing large-scale text data pipelines that power AI and machine learning solutions, collaborating closely with various teams to enhance AI capabilities across the platform.

Apps, iOS, Software, Video
Growth Opportunities
H1B Sponsor Likely

Responsibilities

Designing, developing, and maintaining scalable pipelines in Microsoft Azure to ingest and transform large volumes of text data from multiple sources
Designing automated workflows for text normalization, deduplication, language identification, PII redaction and metadata enrichment
Building automated data validation processes to ensure accuracy and consistency
Supporting model fine-tuning, semantic search, and Gen AI evaluation tuning through dataset curation, prompt dataset preparation, labeling coordination, and text quality validation
Partnering with the Applied AI team to gather data requirements and build data interfaces for developing and maintaining machine learning systems
Maintaining data lineage and following data privacy, security and governance best practices
Implementing data versioning and lineage tracking for machine learning experiments

Qualifications

Python, NLP concepts, Data engineering, Microsoft Azure, DataOps knowledge, Git, Curiosity, Knowledge graph implementation, Legal domain experience, Large-scale text architecture, Problem solving, Creativity, Collaborative mindset

Required

A Bachelor's degree or higher in Computer Science, Data Engineering, Applied Mathematics, Computational Linguistics, or a related quantitative field
4+ years of data engineering experience, with at least 2 years working with unstructured data in a business setting
Strong proficiency in Python, PySpark, and data manipulation for large unstructured text datasets
Strong understanding of NLP concepts such as tokenization, embeddings, and semantic search, plus experience with standard text libraries such as spaCy, Hugging Face Datasets, and NLTK
Solid DataOps knowledge and experience orchestrating advanced NLP data pipelines using cloud-based data infrastructure
Proficiency with Git and collaborative development frameworks
A passion for enabling AI capabilities through scalable, reliable data architecture
Problem solving, creativity, curiosity, and a collaborative mindset

Preferred

Exposure to Microsoft Azure Services such as Fabric, ADLS, AI Foundry, Azure ML, MLflow
Experience with knowledge graph implementation for NLP applications
Experience working with data for the legal domain
Experience designing architectures for large-scale text corpora

Benefits

Comprehensive Health/Vision/Dental/Life Insurance
401(k) Retirement Savings Plan with a company match up to 4%
HealthJoy, a healthcare concierge service
Enhanced leave for expecting parents: 20 weeks at 100% pay for primary leave and 10 weeks at 100% pay for secondary leave
Flexible time off policy
Multiple company wellness days
Free access to the Healthy Minds app for mindfulness, meditation and more

Company

iManage provides work product management solutions.

H1B Sponsorship

iManage has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (16)
2024 (16)
2023 (18)
2022 (18)
2021 (20)
2020 (12)

Funding

Current Stage: Late Stage
Total Funding: unknown
Key Investors: Bain Capital Tech Opportunities
2023-04-11 Series Unknown
2003-08-18 Acquired
1998-01-01 Series Unknown

Leadership Team

Mohit Mutreja, Chief Technology Officer
Arvind Agarwal, Vice President of Engineering
Company data provided by Crunchbase