Prove AI · 8 hours ago

Principal Engineer - Applied AI

United States

Full-time

Remote

Lead/Staff

10+ years exp

Prove AI is a company focused on machine learning systems and AI infrastructure. They are seeking a Principal Engineer to drive the technical strategy and architecture of their AI/ML systems and lead engineering teams in the deployment and improvement of these systems.

Blockchain, Enterprise Software, Information Technology

Hiring Manager

Richard Morrow

Responsibilities

Define and own the architecture for scalable AI/ML systems, including training, fine-tuning, inference, evaluation, and monitoring pipelines

Translate ambiguous business and product requirements into robust AI/ML system designs and staged delivery plans

Make strategic decisions on model selection, LLM integrations, evaluation frameworks, model gateways, guardrails, and safety mechanisms

Lead design reviews, architecture forums, and technical decision-making across teams

Build and deploy production-grade ML and LLM models, transformers, and generative AI features, from initial concept through production rollout

Establish standards for model readiness, evaluation gates, rollout/rollback, drift detection, observability, and ongoing performance management

Partner with engineering teams to integrate models into distributed systems with clear SLOs, telemetry, and error-budget mechanisms

Design and improve data pipelines, feature stores, and data quality/lineage workflows supporting model training and inference

Develop scalable MLOps/AIOps practices for automation of training, testing, deployment, and monitoring

Evaluate and implement AI/ML workflow orchestration platforms (e.g., MLflow, Kubeflow, Vertex AI) and CI/CD for AI/ML

Own evaluation pipelines—latency, accuracy, cost, hallucination metrics, prompt versioning, and model performance insights

Instrument tracing and model observability using best-practice frameworks and telemetry standards

Implement guardrails and safety systems to ensure consistent, controlled behaviour of LLM-powered features

Partner closely with product, engineering, and leadership to shape platform strategy and AI feature roadmap

Provide trade-off analyses that incorporate model performance, security, compliance, scalability, and long-term maintainability

Write clear technical documents, proposals, and mechanism-based recommendations to guide executive decision-making

Mentor senior/junior engineers in AI/ML best practices, distributed systems, experimentation, and model governance

Support hiring, leveling, performance feedback, and the growth of a high-calibre engineering team

Qualifications

Machine Learning, Large Language Models, AI Infrastructure, Python, MLOps, Cloud Platforms, Data Engineering, Model Governance, Technical Leadership, Soft Skills

Required

10+ years of engineering experience with significant recent hands-on AI/ML development

Bachelor's degree in CS or related field

Deep technical expertise in machine learning, LLMs, transformers, and modern AI frameworks (PyTorch, TensorFlow, JAX, Scikit-learn)

Proven experience deploying production AI/ML or LLM systems at scale (not prototypes)

Strong programming expertise in Python; additional experience in Java, C++, or JavaScript is a plus

Experience with data engineering workflows, feature stores, and scalable data pipelines

Expertise with cloud platforms (AWS/GCP/Azure), containerization, orchestration (Kubernetes), and distributed systems

Hands-on MLOps: model deployment, monitoring, CI/CD for AI/ML, experiment tracking, and evaluation frameworks

Demonstrated technical leadership managing teams of 10+ engineers and influencing cross-functional architectures

Strong ability to translate ambiguous business needs into clear technical requirements and production outcomes

Expertise with LLM productionization, including fine-tuning, retrieval-augmented generation (RAG), safety/guardrails, and evaluation

Experience with MLflow, Kubeflow, Vertex AI, SageMaker, or similar platforms

Background in model governance, drift detection, fairness/bias evaluation, and compliance

Domain specialization (NLP, computer vision, recommender systems, or agentic systems)

Preferred

Master's or PhD in Computer Science, Machine Learning, or related discipline

Cloud platform expertise (AWS, GCP, Azure) with experience deploying AI/ML workloads at scale

Strong product mindset with ability to translate business requirements into technical solutions

Contributions to AIOps/MLOps platforms (MLflow, Kubeflow, Vertex AI) and CI/CD for ML workflows

Domain expertise in specific AI application areas such as computer vision, NLP, or recommendation systems

Experience with model monitoring, drift detection, and model governance in production environments

Previous experience with AI observability and troubleshooting

Benefits

Fully remote, work-from-home environment

Employee Share Option Plan

Flexible working hours

Paid Time-Off

Periodic in-person offsites globally (travel permitting)

Continued education support

Advancement opportunity

Company

Prove AI

AI debugging & remediation

Founded in 2018

Zug, Zug, CHE

11-50 employees

https://www.casperlabs.io

Funding

Current Stage

Early Stage

Total Funding

$48.6M

Key Investors

Evangelion Capital, Draper Goren Holm, Acuitas Group Holdings

2022-01-01 · Series Unknown

2021-05-15 · Series Unknown

2020-10-01 · Series Unknown · $0.1M

Leadership Team

Mrinal Manohar

CEO

Greg Whalen

Chief Technology Officer

Recent News

IBM Newsroom

Casper Labs to Build a Blockchain-Powered Solution with IBM Consulting to Help Improve Transparency and ...

2024-06-04

Cryptonews

Casper Labs and IBM Reveal Prove AI Blockchain Solution for AI Governance

2024-06-04

Cointelegraph

Casper Labs, IBM launch Prove AI auditing solution on watsonx platform

2024-06-04

Company data provided by Crunchbase