Robots & Pencils · 1 day ago

AI Engineer (AI System Calibration & Optimization)

United States

Full-time

Remote

Senior Level

7+ years exp

Robots & Pencils is seeking an outcome-oriented AI Engineer to partner with a strategic client on a high-impact AI system calibration and optimization engagement. The role involves embedding directly with the client's teams to improve their AI model's accuracy and reliability through systematic prompt optimization and calibration workflows.

Apps, Information Technology, Software

H1B Sponsor Likely

Responsibilities

Embed with a strategic client as their technical partner for AI system calibration and prompt optimization

Build production-grade calibration systems using Python within the client's Azure environment

Implement the DSPy framework and GEPA optimizer to systematically improve prompt quality and retrieval performance

Design and develop Golden Dataset curation workflows using Azure Data Labeling, establishing gold/silver data tier schemas

Create evaluation frameworks to measure model accuracy, precision/recall, latency, and hallucination rates

Architect prompt optimization pipelines for retrieval, context synthesis, and answer generation tailored to client needs

Own the path to production - evaluation pipelines, Azure ML workflows, KPI dashboards, and optimization automation

Iterate rapidly based on client feedback and KPI results; translate business goals into technical calibration improvements

Own end-to-end delivery of calibration systems from initial baseline to production-ready optimization workflows

Establish measurable KPIs and demonstrate accuracy improvements, latency reduction, and hallucination mitigation

Provide strategic guidance on RAG architecture improvements and retrieval parameter optimization

Accelerate client time-to-value through hands-on development and comprehensive knowledge transfer

Deliver operational playbooks and documentation enabling the client team to maintain calibration systems independently

Lead complex, multi-stakeholder calibration initiatives on-site and remotely; drive clarity, remove blockers, and keep execution on track

Set coding standards and architectural patterns for calibration components; write clear docs, runbooks, and technical specifications

Mentor client engineers through code reviews, pairing sessions, and technical workshops on DSPy, GEPA, and evaluation best practices

Make sound tradeoffs under real-world constraints - Azure cost optimization, data quality, performance requirements, and security

Align delivery with Robots & Pencils' responsible AI practices and client governance requirements

Work closely with the client's AI SMEs and product engineering teams to understand product catalog structure and validation workflows

Collaborate with internal R&P product, engineering, and delivery teams on calibration methodology and best practices

Share insights from client engagement to improve R&P's prompt optimization frameworks and tooling

Contribute reusable patterns, evaluation frameworks, and documentation back to R&P's core platform

Collaborate across time zones with distributed teams

Qualifications

Python, Generative AI, Azure ML, DSPy framework, RAG architectures, Evaluation metrics, Data curation, IaC (Terraform), SDLC practices, Accountability, Upper-intermediate English, Communication skills, Adaptability, Collaboration

Required

Bachelor's degree in Computer Science, Engineering, or equivalent experience

7+ years of professional software development with significant ownership of architecture and delivery

3+ years of Python in ML/AI systems with a strong focus on data processing and evaluation pipelines

2+ years building with Generative AI including hands-on prompt engineering and optimization work

Experience with prompt optimization frameworks - DSPy strongly preferred, or similar systematic approaches to prompt improvement

Deep understanding of RAG architectures - retrieval quality, latency/cost tuning, hallucination mitigation, and evaluation methods

Hands-on experience designing evaluation metrics and building assessment frameworks for LLM systems

Knowledge of systematic experimentation methods - A/B testing, parameter tuning, performance benchmarking

Experience with data curation, labeling workflows, and dataset quality management for AI systems

Strong Azure cloud experience with focus on AI/ML services - Azure Machine Learning, Azure AI Search, Azure OpenAI Service

Experience with Azure Data Labeling, Azure Blob Storage, and Azure infrastructure fundamentals

Understanding of vector search platforms and retrieval optimization (Azure AI Search, Weaviate, Qdrant, Pinecone)

Strong IaC background (Terraform or ARM templates) plus containerization and distributed systems knowledge

Solid SDLC practices - testing strategies, CI/CD, code reviews, observability, and operational excellence

Upper-intermediate English for client communication

Experience leading complex technical projects with multiple stakeholders

Strong communication skills for technical and executive audiences

Ability to context-switch and adapt to client environments

Willingness to travel to client sites

Preferred

Direct hands-on experience with the DSPy framework and GEPA optimizer

Understanding of systematic optimization principles: evolutionary algorithms, Bayesian optimization, multi-objective optimization, and Pareto efficiency concepts

Familiarity with prompt optimization frameworks and methods - experience with any of: MIPROv2, TextGrad, EvoPrompt, AutoPrompt, or reinforcement learning approaches (GRPO, PPO)

Experience with LLM-as-judge patterns and automated evaluation pipelines

Knowledge of advanced RAG patterns - Adaptive RAG, Self-RAG, Corrective RAG - and retrieval evaluation methods (MRR, NDCG, precision@k)

Understanding of agentic AI patterns - ReAct, Chain-of-Thought, Tool Use - and their application in RAG systems

Experience building evaluation dashboards with Azure Monitor, Application Insights, or similar observability tools

Familiarity with MLOps practices - model versioning, experiment tracking, metric logging for evaluation systems

Experience with AWS or GCP AI/ML platforms (Bedrock, SageMaker, Vertex AI) and cross-cloud architecture patterns

Experience with product catalog systems, cross-reference matching, or e-commerce search optimization

Background in manufacturing, industrial equipment, or technical specification systems

Prior consulting or professional services experience with enterprise clients

Company

Robots & Pencils

Robots & Pencils develops digital strategies and products that deliver exponential impact to our clients.

Founded in 2009

Calgary, Alberta, CAN

51-200 employees

https://robotsandpencils.com/

H1B Sponsorship

Robots & Pencils has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference (data from the US Department of Labor).

Trends of Total Sponsorships

2025 (1)

2022 (1)

2021 (1)

Funding

Current Stage

Growth Stage

Total Funding

unknown

Key Investors

Slack Fund

2022-04-26 · Series Unknown

Leadership Team

Leonard Pagon

Chairman & CEO

Nathan Carmon

Chief Operating Officer

Recent News

PR Newswire

Robots & Pencils Plans Seattle-area Expansion with Studio for Generative & Agentic AI

2025-12-02

PR Newswire

Robots & Pencils Brings Its Applied AI Engineering Expertise to AWS re:Invent 2025

2025-11-07

PR Newswire

Robots & Pencils Launches Rewired: The New AI Architecture of Higher Education

2025-10-21

Company data provided by Crunchbase