Atlassian
Senior Principal Machine Learning Engineer
Atlassian is seeking a Senior Principal Machine Learning Engineer to join its GenAI Platform organization, focusing on the quality and reliability of Rovo Chat. In this role, you will be the technical driver behind making Rovo Chat exceptionally accurate, trustworthy, observable, and reliable at scale.
Collaboration · Enterprise Software · SaaS · Software
Responsibilities
Define and evolve a north‑star quality and reliability framework for Rovo Chat, spanning answer correctness, faithfulness, and grounding; safety and policy adherence; latency, robustness, and uptime; and incident, Disturbed, and DoS impact
Translate these into measurable metrics, SLAs/SLOs, and dashboards that are adopted across product and platform teams
Design and lead implementation of end‑to‑end evaluation pipelines for Rovo Chat, including offline evals (benchmarks, synthetic data, golden sets, human‑in‑the‑loop labeling); online evals (A/B tests, interleaving, guardrail metrics); and LLM‑as‑a‑judge and other automated evaluation techniques
Drive observability and debuggability improvements (e.g., tracing, attribution, feature logging, and model behavior introspection) so engineers can quickly root‑cause regressions and incidents
Partner with SRE/TechOps to connect evaluation and observability signals into incident management, improving the percentage of incidents successfully root‑caused and the efficiency of Disturbed ticket and DoS resolution
Define and own technical roadmaps for GenAI platform features that directly impact Rovo Chat quality and reliability (e.g., retrieval quality, RAG orchestration, guardrails, safety filters, fallback strategies, model selection/routing)
Make high‑impact architecture decisions across LLM and RAG architectures, knowledge ingestion and retrieval, evaluation and monitoring infrastructure, and Trust & Safety layers
Identify and prioritize cross‑pillar investments (e.g., shared eval frameworks, reusable prompt libraries, safety and policy enforcement) that raise the bar across Atlassian AI
Use data from incidents, Disturbed tickets, DoS escalations, and product telemetry to identify systemic quality and reliability gaps
Lead multi‑team initiatives to reduce production incidents and regressions, improve the “first‑try success” rate of answers, decrease hallucinations and unsafe outputs, and improve CSAT/NPS and key adoption/retention metrics for Rovo Chat
Work closely with PMs and designers to ensure quality and reliability are visible, explainable, and trustworthy to customers
Mentor senior/principal ML engineers and ML systems engineers across GenAI Platform and Rovo Chat
Act as a technical thought partner to engineering and product leadership on GenAI quality and reliability strategy
Contribute to AI best practices across Atlassian via design reviews, internal talks, and cross‑org forums
Qualifications
Required
10+ years of industry experience in machine learning / applied AI, including shipping production systems at scale
Deep hands‑on expertise with LLMs and/or large‑scale NLP systems, including at least one of the following: retrieval‑augmented generation (RAG); search, ranking, and relevance; conversational AI, assistants, or agents; or evaluation and quality frameworks for LLM applications
Strong coding skills in Python (and/or Java) with the ability to write performant, production‑quality code, plus solid experience with Java/Kotlin and large‑scale data processing (e.g., Spark) and familiarity with cloud environments (e.g., AWS, Databricks) and modern ML tooling
Demonstrated experience designing and operating ML systems end‑to‑end, including data pipelines and feature generation; training, evaluation, and deployment; and monitoring, incident response, and iterative improvement
A track record of technical leadership beyond a single team, such as driving cross‑team or platform initiatives, making high‑impact architecture decisions, and influencing roadmaps and org‑level priorities
Ability to communicate complex ML concepts clearly to engineers, PMs, designers, and leadership, and to tell a compelling story with data
A strong product sense and bias for pragmatism and iteration (80/20 mindset: knowing when “good and measurable now” beats “perfect later”)
Preferred
Master's degree or PhD in Computer Science, Machine Learning, Statistics, or a related technical field
Experience with LLM fine‑tuning, post‑training, and optimization (instruction tuning, preference optimization, safety tuning)
Experience with model evaluation and guardrails (LLM‑as‑a‑judge, red‑teaming, safety frameworks)
Experience with high‑reliability systems in SaaS (SLOs, error budgets, incident command, post‑incident analysis)
Prior work on AI assistants or conversational experiences in a B2B SaaS or productivity setting
Experience partnering with SRE / incident management / support to reduce MTTR, improve root‑cause coverage, and lower ticket volume through better tooling and automation
Experience building observability and debuggability tools for ML or GenAI systems (e.g., tracing, experiment management, evaluation platforms)
Benefits
Health and wellbeing resources
Paid volunteer days
Company
Atlassian
Atlassian is a software company that offers proprietary software products for teamwork, project management, and software development.
H1B Sponsorship
Atlassian has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. The information below is provided for reference. (Data powered by the US Department of Labor)
Total Sponsorships by Year
2025: 351
2024: 184
2023: 190
2022: 259
2021: 156
2020: 162
Funding
Current Stage: Public Company
Total Funding: $210M
Key Investors: T. Rowe Price, Accel
2015-12-10: IPO
2014-04-08: Secondary Market · $150M
2010-07-14: Series A · $60M
Company data provided by Crunchbase