Prove AI · 8 hours ago
Principal Engineer - Applied AI
Prove AI is a company focused on machine learning systems and AI infrastructure. They are seeking a Principal Engineer to drive the technical strategy and architecture of their AI/ML systems and lead engineering teams in the deployment and improvement of these systems.
Responsibilities
Define and own the architecture for scalable AI/ML systems, including training, fine-tuning, inference, evaluation, and monitoring pipelines
Translate ambiguous business and product requirements into robust AI/ML system designs and staged delivery plans
Make strategic decisions on model selection, LLM integrations, evaluation frameworks, model gateways, guardrails, and safety mechanisms
Lead design reviews, architecture forums, and technical decision-making across teams
Build and deploy production-grade AI/ML/LLM models, transformers, and generative AI features—from initial concept through production rollout
Establish standards for model readiness, evaluation gates, rollout/rollback, drift detection, observability, and ongoing performance management
Partner with engineering teams to integrate models into distributed systems with clear SLOs, telemetry, and error-budget mechanisms
Design and improve data pipelines, feature stores, and data quality/lineage workflows supporting model training and inference
Develop scalable MLOps/AIOps practices for automation of training, testing, deployment, and monitoring
Evaluate and implement AI/ML workflow orchestration platforms (e.g., MLflow, Kubeflow, Vertex AI) and CI/CD for AI/ML
Own evaluation pipelines—latency, accuracy, cost, hallucination metrics, prompt versioning, and model performance insights
Instrument tracing and model observability using best-practice frameworks and telemetry standards
Implement guardrails and safety systems to ensure consistent, controlled behaviour of LLM-powered features
Partner closely with product, engineering, and leadership to shape platform strategy and AI feature roadmap
Provide trade-off analyses that incorporate model performance, security, compliance, scalability, and long-term maintainability
Write clear technical documents, proposals, and mechanism-based recommendations to guide executive decision-making
Mentor senior/junior engineers in AI/ML best practices, distributed systems, experimentation, and model governance
Support hiring, leveling, performance feedback, and the growth of a high-calibre engineering team
Qualifications
Required
10+ years of engineering experience with significant recent hands-on AI/ML development
Bachelor's degree in CS or related field
Deep technical expertise in machine learning, LLMs, transformers, and modern AI frameworks (PyTorch, TensorFlow, JAX, Scikit-learn)
Proven experience deploying production AI/ML or LLM systems at scale (not prototypes)
Strong programming expertise in Python; additional experience in Java, C++, or JavaScript is a plus
Experience with data engineering workflows, feature stores, and scalable data pipelines
Expertise with cloud platforms (AWS/GCP/Azure), containerization, orchestration (Kubernetes), and distributed systems
Hands-on MLOps: model deployment, monitoring, CI/CD for AI/ML, experiment tracking, and evaluation frameworks
Demonstrated technical leadership managing teams of 10+ engineers and influencing cross-functional architectures
Strong ability to translate ambiguous business needs into clear technical requirements and production outcomes
Expertise with LLM productionization including fine-tuning, retrieval-augmented generation (RAG), safety/guardrails, and evaluation
Experience with MLflow, Kubeflow, Vertex AI, SageMaker, or similar platforms
Background in model governance, drift detection, fairness/bias evaluation, and compliance
Domain specialization (NLP, computer vision, recommender systems, or agentic systems)
Preferred
Master's or PhD in Computer Science, Machine Learning, or related discipline
Cloud platform expertise (AWS, GCP, Azure) with experience deploying AI/ML workloads at scale
Strong product mindset with ability to translate business requirements into technical solutions
Contributions to AIOps/MLOps platforms (MLflow, Kubeflow, Vertex AI) and CI/CD for ML workflows
Domain expertise in specific AI application areas such as computer vision, NLP, or recommendation systems
Experience with model monitoring, drift detection, and model governance in production environments
Previous experience with AI observability and troubleshooting
Benefits
Fully remote, work-from-home environment
Employee Share Option Plan
Flexible working hours
Paid Time-Off
Periodic in-person offsites globally (travel permitting)
Continued education support
Advancement opportunity
Company
Prove AI
AI debugging & remediation
Funding
Current Stage: Early Stage
Total Funding: $48.6M
Key Investors: Evangelion Capital, Draper Goren Holm, Acuitas Group Holdings
2022-01-01: Series Unknown
2021-05-15: Series Unknown
2020-10-01: Series Unknown · $0.1M
Company data provided by Crunchbase