Senior Software Engineer, AI Evaluation Infra jobs in United States

nTop · 1 day ago

Senior Software Engineer, AI Evaluation Infra

nTop is pioneering the future of engineering design with its advanced software focused on Aerospace & Defense. They are seeking a Senior Software Engineer to develop evaluation frameworks and automated tools for AI systems, ensuring their accuracy and reliability in production environments.

3D Printing · CAD · Mechanical Engineering · Product Design · Software
H1B Sponsor Likely

Responsibilities

Design evaluation frameworks: Develop metrics and benchmarks to systematically measure AI model performance, including accuracy, robustness, safety, and reliability
Develop automated tools: Build automated evaluation pipelines that run tests at scale to assess AI performance under various conditions, including adversarial and edge-case scenarios, and/or integrate with third-party evaluation platforms and tools
Implement human feedback loops: Design human annotation protocols and quality control mechanisms to incorporate human judgment into the evaluation process, especially for subjective tasks
Analyze model behavior: Conduct in-depth analysis to understand AI model performance, identify weaknesses, and pinpoint failure modes
Build production systems: Extend the evaluation process to production environments, or integrate external tools to do so, by creating dashboards, alerts, and observability tools to monitor models after deployment
Golden Dataset Management: Collaborate with domain experts to curate and manage high-quality "Golden Question-Answer-Context" datasets essential for ground-truth RAG evaluation
Prompt and System Optimization: Translate evaluation results into clear, actionable recommendations for engineers to optimize LLM integration, prompt templates, and data chunking strategies
Collaborate across teams: Work closely with product managers and software engineers to ensure that evaluation methodologies align with business goals and to communicate technical findings to stakeholders

Qualifications

Machine Learning · MLOps · Python · Docker · NLP Libraries · CI/CD · Collaboration · Problem Solving · Communication

Required

2-3 years of professional experience in machine learning, MLOps, or software quality assurance, specifically focused on modern LLM applications
Experience building, testing, or evaluating production-grade RAG systems or other complex information retrieval/NLP systems
Proven experience with Docker for containerizing applications, setting up consistent evaluation environments, and managing dependencies
Expert proficiency in Python and experience with NLP/ML libraries and data processing tools
Practical experience integrating evaluation steps into automated testing and deployment pipelines for LLM-based applications

Preferred

Experience with AI/ML applications in CAD, simulation, engineering design, optimization, or manufacturing
Experience with classic information retrieval metrics, search engine optimization, or search relevance engineering
Experience deploying and scaling RAG components and evaluation pipelines using container orchestration tools like Kubernetes on cloud platforms (e.g., AWS, Azure, GCP)
Experience designing and validating LLM-based evaluation metrics for subjective quality assessment
Familiarity with ETL processes specifically for unstructured document ingestion and metadata enrichment

Benefits

Outstanding PTO and leave policy
ISO options
Healthcare: Medical, Dental, and Vision plans
401k with generous matching
Annual stipend for continued career learning/development
Commuter benefits for NY-based hires

Company

nTop

nTop is advanced engineering design software that’s bringing additive manufacturing to mainstream production.

H1B Sponsorship

nTop has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data Powered by US Department of Labor)
Trends of Total Sponsorships
2025 (4)
2024 (9)
2023 (2)
2022 (5)
2021 (6)

Funding

Current Stage: Growth Stage
Total Funding: unknown
2016-10-28: Series B

Leadership Team

Bradley Rothenberg, CEO
Blanca Aguado Sierra, Chief Operating Officer
Company data provided by Crunchbase