nTop · 1 day ago

Senior Software Engineer, AI Evaluation Infra

New York, NY

Full-time

Hybrid

Mid, Senior Level

$145K/yr - $190K/yr

2+ years exp

nTop is pioneering the future of engineering design with its advanced software focused on Aerospace & Defense. They are seeking a Senior Software Engineer to develop evaluation frameworks and automated tools for AI systems, ensuring their accuracy and reliability in production environments.

3D Printing, CAD, Mechanical Engineering, Product Design, Software

H1B Sponsor Likely

Responsibilities

Design evaluation frameworks: Develop metrics and benchmarks to systematically measure AI model performance, including accuracy, robustness, safety, and reliability

Develop automated tools: Build automated evaluation pipelines that run tests at scale to assess AI performance under various conditions, including adversarial and edge-case scenarios, and/or integrate with third-party evaluation platforms and tools

Implement human feedback loops: Design human annotation protocols and quality control mechanisms to incorporate human judgment into the evaluation process, especially for subjective tasks

Analyze model behavior: Conduct in-depth analysis to understand AI model performance, identify weaknesses, and pinpoint failure modes

Build production systems: Bring the evaluation process to production environments by extending or integrating external tools and creating dashboards, alerts, and observability tools to monitor models after deployment

Golden Dataset Management: Collaborate with domain experts to curate and manage high-quality "Golden Question-Answer-Context" datasets essential for ground-truth RAG evaluation

Prompt and System Optimization: Translate evaluation results into clear, actionable recommendations that engineers can use to optimize LLM integration, prompt templates, and data chunking strategies

Collaborate across teams: Work closely with product managers and software engineers to ensure that evaluation methodologies align with business goals and to communicate technical findings to stakeholders

Qualifications

Machine Learning, MLOps, Python, Docker, NLP Libraries, CI/CD, Collaboration, Problem Solving, Communication

Required

2-3 years of professional experience in machine learning, MLOps, or software quality assurance, specifically focused on modern LLM applications

Experience building, testing, or evaluating production-grade RAG systems or other complex information retrieval/NLP systems

Proven experience with Docker for containerizing applications, setting up consistent evaluation environments, and managing dependencies

Expert proficiency in Python and experience with NLP/ML libraries and data processing tools

Practical experience integrating evaluation steps into automated testing and deployment pipelines for LLM-based applications

Preferred

Experience with AI/ML applications in CAD, simulation, engineering design, optimization, or manufacturing

Experience with classic information retrieval metrics, search engine optimization, or search relevance engineering

Experience deploying and scaling RAG components and evaluation pipelines using container orchestration tools like Kubernetes on cloud platforms (e.g., AWS, Azure, GCP)

Experience designing and validating LLM-based evaluation metrics for subjective quality assessment

Familiarity with ETL processes specifically for unstructured document ingestion and metadata enrichment

Benefits

Outstanding PTO and leave policy

ISO options

Healthcare: Medical, Dental, and Vision plans

401k with generous matching

Annual stipend for continued career learning and development

Commuter benefits for NY-based hires

Company

nTop

nTop is advanced engineering design software that’s bringing additive manufacturing to mainstream production.

Founded in 2015

New York, New York, USA

51-200 employees

https://www.ntop.com

H1B Sponsorship

nTop has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for reference. (Data powered by the US Department of Labor)

[Chart: Distribution of Different Job Fields Receiving Sponsorship, with job fields similar to this job highlighted]

Trends of Total Sponsorships

2025 (4)

2024 (9)

2023 (2)

2022 (5)

2021 (6)

Funding

Current Stage

Growth Stage

Total Funding

unknown

2016-10-28: Series B

Leadership Team

Bradley Rothenberg

CEO

Blanca Aguado Sierra

Chief Operating Officer

Recent News

PC Gamer

Highly intricate water blocks like this one may become the norm as server CPU power consumption soars, and could even trickle down into gaming PCs

2025-08-29

Business Wire

Additive Manufacturing Software Markets Report 2025: Analysis, Data and Forecast - Revenues Expected to Hit $6.78B by 2033 - ResearchAndMarkets.com

2025-05-06

Research and Markets

Additive Manufacturing Software Market Analysis, Data and Forecast Report 2025: AI Integration Drives Rapid Evolution in 3D Printing Software Market Dynamics

2025-04-29

Company data provided by Crunchbase