Anomali · 1 month ago
Senior Engineer, AI Evaluation & Reliability (Agentic AI)
Anomali, based in Silicon Valley, delivers a leading AI-powered security operations platform. The company is seeking a Senior Engineer to lead the evaluation and quality assurance of its agentic AI features, ensuring reliability and efficiency in real-world security operations workflows.
Responsibilities
Define quality metrics: Translate SOC use cases into measurable KPIs (e.g., precision/recall, MTTR, false-positive rate, step success rate, latency/cost budgets); a minimal sketch of these metrics follows this list
Build continuous evaluations: Develop offline/online evaluation pipelines, regression suites, and A/B or canary tests; integrate them into CI/CD for release gating
Curate and manage datasets: Maintain gold-standard datasets and red-team scenarios; establish data governance and drift monitoring practices
Ensure safety, reliability, and explainability: Partner with Platform and Security Research to encode guardrails, policy enforcement, and runtime safety checks
Expand adversarial test coverage (prompt injection, data exfiltration, abuse scenarios)
Ensure explainability and auditability of agent decisions, maintaining traceability and compliance of AI-driven workflows
Production reliability & observability: Monitor and maintain reliability of agentic AI features post-release -- define and uphold SLIs/SLOs, establish alerting and rollback strategies, and conduct incident post-mortems
Design and implement infrastructure to scale evaluation and production pipelines for real-time SOC workflows across cloud environments
Drive agentic system engineering: Experiment with multi-agent systems, tool-using language models, retrieval-augmented workflows, and prompt orchestration
Manage model and prompt lifecycle -- track versions, rollout strategies, and fallbacks; measure impact through statistically sound experiments
Collaborate cross-functionally: Work with Product, UX and Engineering to prioritize high-leverage improvements, resolve regressions quickly, and advance overall system reliability
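As a rough illustration of the quality metrics named in the first bullet above, the sketch below computes precision, recall, false-positive rate, and MTTR for an agentic triage workflow. The AlertOutcome record and its field names are hypothetical and assumed for illustration only; they are not Anomali's actual schema.

```python
# Minimal sketch of SOC triage quality metrics: precision, recall,
# false-positive rate, and MTTR. All field names are hypothetical.
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class AlertOutcome:
    flagged: bool                         # the agent escalated the alert as a threat
    is_true_threat: bool                  # analyst-confirmed ground truth
    detected_at: datetime                 # when the alert fired
    resolved_at: datetime | None = None   # when the incident was closed


def triage_metrics(outcomes: list[AlertOutcome]) -> dict[str, float]:
    tp = sum(o.flagged and o.is_true_threat for o in outcomes)
    fp = sum(o.flagged and not o.is_true_threat for o in outcomes)
    fn = sum(not o.flagged and o.is_true_threat for o in outcomes)
    tn = sum(not o.flagged and not o.is_true_threat for o in outcomes)

    # Mean time to resolution for correctly escalated threats, in hours
    repair_hours = [
        (o.resolved_at - o.detected_at).total_seconds() / 3600
        for o in outcomes
        if o.flagged and o.is_true_threat and o.resolved_at is not None
    ]

    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "mttr_hours": mean(repair_hours) if repair_hours else 0.0,
    }
```

Thresholds on metrics like these, alongside latency and cost budgets, are the kind of release gates the CI/CD bullet above refers to.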
Qualifications
Required
5+ years building evaluation or testing infrastructure for ML/LLM systems or large-scale distributed systems
Proven ability to translate product requirements into measurable metrics and test plans
Strong Python skills (or similar language) and experience with modern data tooling
Hands-on experience running A/B tests, canaries, or experiment frameworks
Experience defining and maintaining operational reliability metrics (SLIs/SLOs) for AI-driven systems
Familiarity with large-scale distributed or streaming systems serving AI/agent workflows (millions of events or alerts/day)
Excellent communication skills -- able to clearly convey technical results and trade-offs to engineers, PMs, and analysts
This position is not eligible for employment visa sponsorship. The successful candidate must not now, or in the future, require visa sponsorship to work in the US
Preferred
Experience evaluating or deploying agentic or tool-using AI systems (multi-agent orchestration, retrieval-augmented reasoning, prompt lifecycle management)
Familiarity with LLM evaluation frameworks (e.g., model-graded evals, pairwise/rubric scoring, preference learning)
Exposure to AI safety testing, including prompt injection, data exfiltration, abuse taxonomies, and resilience validation
Understanding of explainability and compliance requirements for autonomous workflows, ensuring traceability and auditability of AI behavior
Background in security operations, incident response, or enterprise automation; comfortable interpreting logs, alerts, and playbooks
Startup experience delivering high-impact systems in fast-paced, evolving environments
Benefits
This position is eligible for benefits
May be eligible for equity
Company
Anomali
Anomali delivers the leading AI-Powered Security and IT Operations Platform.
Funding
Current Stage
Growth Stage
Recent News
2024-05-19