nTop · 1 day ago
Senior Software Engineer, AI Evaluation Infra
nTop is pioneering the future of engineering design with its advanced software focused on Aerospace & Defense. They are seeking a Senior Software Engineer to develop evaluation frameworks and automated tools for AI systems, ensuring their accuracy and reliability in production environments.
3D Printing, CAD, Mechanical Engineering, Product Design, Software
Responsibilities
Design evaluation frameworks: Develop metrics and benchmarks to systematically measure AI model performance, including accuracy, robustness, safety, and reliability
Develop automated tools: Build automated evaluation pipelines that run tests at scale to assess AI performance under various conditions, including adversarial and edge-case scenarios, and/or integrate with third-party evaluation platforms and tools
Implement human feedback loops: Design human annotation protocols and quality control mechanisms to incorporate human judgment into the evaluation process, especially for subjective tasks
Analyze model behavior: Conduct in-depth analysis to understand AI model performance, identify weaknesses, and pinpoint failure modes
Build production systems: Extend evaluation tooling, or integrate external tools, to bring the evaluation process into production environments by creating dashboards, alerts, and observability tools to monitor models after deployment
Golden Dataset Management: Collaborate with domain experts to curate and manage high-quality "Golden Question-Answer-Context" datasets essential for ground-truth RAG evaluation
Prompt and System Optimization: Translate evaluation results into clear, actionable recommendations that engineers can use to optimize the LLM integration, prompt templates, and data chunking strategies
Collaborate across teams: Work closely with product managers and software engineers to ensure that evaluation methodologies align with business goals and to communicate technical findings to stakeholders
Qualifications
Required
2-3 years of professional experience in machine learning, MLOps, or software quality assurance, specifically focused on modern LLM applications
Experience building, testing, or evaluating production-grade RAG systems or other complex information retrieval/NLP systems
Proven experience with Docker for containerizing applications, setting up consistent evaluation environments, and managing dependencies
Expert proficiency in Python and experience with NLP/ML libraries and data processing tools
Practical experience integrating evaluation steps into automated testing and deployment pipelines for LLM-based applications
Preferred
Experience with AI/ML applications in CAD, simulation, engineering design, optimization, or manufacturing
Experience with classic information retrieval metrics, search engine optimization, or search relevance engineering
Experience deploying and scaling RAG components and evaluation pipelines using container orchestration tools like Kubernetes on cloud platforms (e.g., AWS, Azure, GCP)
Experience designing and validating LLM-based evaluation metrics for subjective quality assessment
Familiarity with ETL processes specifically for unstructured document ingestion and metadata enrichment
Benefits
Outstanding PTO and leave policy
ISO options
Healthcare: Medical, Dental, and Vision plans
401k with generous matching
Annual stipend for continued career learning/development
Commuter benefits for NY-based hires
Company
nTop
nTop is an advanced engineering design software company that’s bringing additive manufacturing to mainstream production.
H1B Sponsorship
nTop has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (4)
2024 (9)
2023 (2)
2022 (5)
2021 (6)
Funding
Current Stage: Growth Stage
Total Funding: unknown
2016-10-28: Series B
Company data provided by Crunchbase