Lead Applied Scientist, NLP/GenAI jobs in the United States

Thomson Reuters · 18 hours ago

Lead Applied Scientist, NLP/GenAI

Thomson Reuters is a global provider of trusted content and technology solutions for professionals across various sectors. They are seeking a Lead Applied Scientist to innovate and deliver AI solutions for complex document understanding tasks in the legal domain, ensuring the development of systems that enhance the capabilities of their legal AI platform.

Advice, Analytics, Financial Services, Management Consulting, Professional Services, Risk Management, Software
H1B Sponsor Likely
Hiring Manager
Lukas Pasvianskas

Responsibilities

Lead the design, build, test, and deployment of end-to-end AI solutions for complex document understanding tasks in the legal domain
Direct the execution of large-scale projects including: advanced semantic chunking models for lengthy, non-uniformly structured legal documents with adjustable granularity; document enrichment systems with legal and customer-defined taxonomies; LLM-based knowledge graph construction pipelines that extract and link heterogeneous legal knowledge; and scalable synthetic data generation systems
Serve as the technical lead and primary point of reference, ensuring full accountability for all research deliverables
Partner with engineering to guarantee well-managed software delivery and reliability at scale across multiple product lines
Design comprehensive evaluation strategies for both component-level and end-to-end quality, leveraging expert annotation and synthetic data
Apply robust training methodologies that balance performance with latency requirements
Lead knowledge distillation initiatives to compress large models into production-ready SLMs
Maintain scientific and technical expertise through product deliverables, published research, and intellectual property contributions
Inform Labs' shared capabilities and research themes by bringing novel approaches to challenging business problems
Independently determine appropriate architectures for complex document understanding challenges, balancing accuracy, efficiency, and scalability
Make critical technical decisions on semantic chunking strategies, document classification approaches, LLM-based knowledge extraction methods, and multi-document reasoning architectures
Provide input to business stakeholders, mid-to-senior level leadership, and Labs leadership on long-term AI strategy
Develop in-depth knowledge of TR customers and data infrastructure across multiple products to shape technical roadmaps
Partner closely with Engineering and Product teams to translate complex legal document understanding challenges into scalable, production-ready solutions
Engage stakeholders across multiple product lines to deeply understand use case requirements, shaping objectives that align document understanding capabilities with diverse business needs including next-generation search and deep legal research
Mentor and coach team members with varied ML/NLP abilities, building technical capability across the organization

Qualification

Document understanding systems, Deep learning frameworks, Knowledge graph construction, NLP methods, Semantic chunking, Information extraction, Synthetic data generation, Model optimization, Programming skills, Stakeholder engagement, Technical leadership, Mentoring, Communication skills

Required

PhD in Computer Science, AI, NLP, or a related field, or a Master's degree with equivalent research/industry experience
7+ years of hands-on experience building and deploying document understanding systems, information extraction pipelines, or knowledge graph construction using deep learning, LLMs, and NLP methods
Proven ability to translate complex document understanding problems into innovative AI applications that balance accuracy and efficiency
Demonstrated ability to provide technical leadership, mentor team members, and influence without formal authority in an applied research setting
Strong programming skills (e.g., Python) and experience with modern deep learning frameworks (e.g., PyTorch, Hugging Face Transformers, DeepSpeed)
Publications at relevant venues such as ACL, EMNLP, ICLR, NeurIPS, SIGIR, or KDD
Deep understanding of document understanding fundamentals: document layout analysis, semantic chunking approaches beyond fixed-size or paragraph-based methods, document classification handling hierarchical taxonomies, imbalanced multi-label classification, and adapting to domain-specific schemas
Expertise in knowledge extraction and knowledge graph construction: entity recognition and linking, relation extraction, citation parsing, and building graph representations from unstructured text
Expertise in LLM-based information extraction, few-shot and multi-task learning, post-training, and knowledge distillation
Solid understanding of synthetic data generation techniques for NLP, including query-answer generation with verification and scalable data augmentation for training specialized models
Solid understanding of efficiency optimization including knowledge distillation, model compression, and designing SLM-based solutions that balance performance with computational constraints
Solid understanding of DL/ML approaches used for NLP tasks
Experience designing annotation workflows, creating high-quality labeled datasets with clear guidelines, and developing evaluation frameworks for document understanding tasks

Preferred

Prior work on legal document understanding, legal information extraction, knowledge representation including legal citations and legal domain concepts, or legal AI applications
Prior work handling complex document structures common in legal documents: non-uniform formatting, nested hierarchies, cross-references, and embedded elements
Experience building systems that perform analysis, question answering, or retrieval across large document collections
Experience with knowledge graph frameworks and methodologies for legal or enterprise applications
Understanding of RAG and agentic workflows for enterprise knowledge
Experience working with AzureML or AWS SageMaker

Benefits

Flexibility & Work-Life Balance
Career Development and Growth
Industry Competitive Benefits
Culture
Social Impact
Making a Real-World Impact
Market competitive health, dental, vision, disability, and life insurance programs
Competitive 401k plan with company match
Competitive vacation, sick and safe paid time off
Paid holidays (including two company mental health days off)
Parental leave
Sabbatical leave
Optional hospital, accident and sickness insurance paid 100% by the employee
Optional life and AD&D insurance paid 100% by the employee
Flexible Spending and Health Savings Accounts
Fitness reimbursement
Access to Employee Assistance Program
Group Legal Identity Theft Protection benefit paid 100% by the employee
Access to 529 Plan
Commuter benefits
Adoption & Surrogacy Assistance
Tuition Reimbursement
Access to Employee Stock Purchase Plan

Company

Thomson Reuters

Thomson Reuters delivers critical information from the financial, legal, accounting, intellectual property, science, and media markets.

H1B Sponsorship

Thomson Reuters has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).
Total sponsorships by year:
2025: 13
2024: 12
2023: 5

Funding

Current Stage: Public Company
Total Funding: unknown
IPO: 1995-11-20

Leadership Team

Steve Hasker
President and CEO
Michael Eastwood
Chief Financial Officer
Company data provided by Crunchbase