Lead Software Engineer II, AI Operations jobs in United States

Best Egg · 6 days ago

Lead Software Engineer II, AI Operations

Best Egg is a market-leading, tech-enabled financial platform helping people build financial confidence through a variety of installment lending solutions and financial health tools. The Lead Software Engineer II for AI Operations will design, ship, and operate production-grade LLM applications and automations, focusing on optimizing performance and reducing costs across the business.

Financial Services · FinTech · Lending
Comp. & Benefits
H1B Sponsor Likely

Responsibilities

Build and ship LLM apps & agents: Deliver internal copilots and customer/agent-facing automations with clear SLAs, rollbacks, and observability from day one
Own RAG pipelines: Design ingestion, chunking, embeddings, indexing, hybrid search/rerank, and retrieval evaluation; track retriever quality via offline golden sets and online metrics
AWS infrastructure & orchestration: Design and implement scalable AWS architectures, including AWS AI features such as Bedrock and knowledge bases, with IAM, secure secrets and policy enforcement, automated provisioning, and resource-usage governance as core platform capabilities
Observability & SRE for AI: Add tracing, prompt/agent version lineage, eval dashboards, and regression alerts; establish golden datasets and canary tests
Guardrails & governance: Enforce PII redaction, safety filters, role-based access, audit logs, and human‑in‑the‑loop review paths to control quality and risk
CI/CD for AI artifacts: Version and deploy prompts, tools, agents, and retrieval pipelines; support blue/green and shadow deploys with automatic rollback triggers
Cost & performance: Cut run‑rate spend through caching, truncation, batching, autoscaling, and model routing; establish clear unit economics per workflow
Developer enablement: Provide templates, SDKs, and high‑quality abstractions that let product teams ship safely without bespoke plumbing; improve developer experience
Platform integration: Build primarily in Python and Metaflow (Outerbounds); deploy on AWS (Bedrock + core services) and OpenAI; use Cursor in daily workflows; help evaluate and, when appropriate, run on Databricks
Production posture: Participate in on‑call, author runbooks, and remove single‑thread risk for AI services; drive reliability and resilience akin to MLOps

Qualifications

AI/LLM applications · Python · AWS · Metaflow · DevOps · RAG expertise · Observability tools · Collaboration · Cost optimization · Kubernetes · FastAPI · Snowflake familiarity · Mentorship

Required

5–10 years of professional software engineering (or equivalent) with 2+ years building AI/LLM applications; portfolio of shipped AI projects (links to code, demos, or case studies)
Demonstrated passion for relentless exploration of the latest AI models, frameworks, and tooling, ensuring constant adoption of state-of-the-art innovations in the workflow
Hands-on experience with some or all of OpenAI, Bedrock, and Hugging Face/Ollama/vLLM, plus MCP servers, function/tool calling, multi-turn orchestration, streaming, and prompt/version management
Practical experience designing and tuning retrieval systems (chunking, embeddings, hybrid search, reranking), integrating with vector databases, and measuring retrieval quality
Comfortable building APIs/services and simple UIs where needed; strong fundamentals in Python and modern packaging/testing
CI/CD, containers, cloud fundamentals (AWS), and runtime performance tuning; experience operating services in production
Metaflow (Outerbounds) preferred; Databricks familiarity is a plus; ability to integrate data/feature pipelines and schedule/operate flows
Expertise in tracing and logging with tools such as Datadog, Dynatrace, or Grafana, applied to AI monitoring where relevant, is essential
Comfortable optimizing latency/throughput/cost, and implementing guardrails for PII/safety/compliance
Partner effectively with data scientists, analysts, and engineers; promote best practices and high-leverage abstractions

Preferred

Fine-tuning or distillation experience
Kubernetes or FastAPI exposure
Familiarity with Snowflake or similar warehousing for retrieval sources

Benefits

Pre-tax and post-tax retirement savings plans with a competitive company matching program
Generous paid time-off plans including vacation, personal/sick time, paid short-term and long-term disability leaves, paid parental leave, and paid company holidays
Multiple health care plans to choose from, including dental and vision options
Flexible Spending Plans for Health Care, Dependent Care, and Health Reimbursement Accounts
Company-paid benefits such as life insurance, wellness platforms, employee assistance programs, and Health Advocate programs
Other great discounted benefits include identity theft protection, pet insurance, fitness center reimbursements, and many more!

Company

Best Egg

Best Egg is a consumer financial technology platform that aims to help people feel more confident about their everyday finances through a suite of products and resources.

H1B Sponsorship

Best Egg has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not shown; legend: job field similar to this job)
Trends of Total Sponsorships: 2025 (2)

Funding

Current Stage: Late Stage
Total Funding: $2.09B
Key Investors: Healthcare of Ontario Pension Plan (HOOPP), Invus
2022-03-10 · Series E · $225M
2018-01-29 · Debt Financing · $495M
2017-11-08 · Debt Financing · $312M

Leadership Team

Alex Rhodes
Chief Operating Officer
Company data provided by Crunchbase