
Company.ai · 19 hours ago

Artificial Intelligence Research Engineer

San Francisco Bay Area

Full-time

Hybrid

Mid, Senior Level

Company.ai is building a network of category-defining AI products in stealth mode. The role involves designing and shipping agentic capabilities for vertical products, with a focus on reliability, evaluation, and safety, while optimizing for fast shipping and iteration.

Computer Software

H-1B Sponsored

Responsibilities

Invent and iterate on agent methods: tool planning, long-horizon execution, retrieval and memory, preference learning, workflow decomposition, verification, and self-correction

Build fast experiment loops: dataset creation, training runs, ablations, analysis, and next hypotheses

Own evaluation: automatic benchmarks, adversarial tests, human-in-the-loop grading, reliability scoring, regression tracking

Ship to production: partner with product engineers, instrument behavior, run A/B tests, improve real user success rates

Harden the system: failure mode discovery, mitigation, monitoring, and safe defaults

Qualifications

ML foundation, modern LLMs, tuning, alignment, reliability research, tool-use expertise, retrieval, memory, evaluation ownership, data craftsmanship, production readiness, systems competence, product taste

Required

Strong ML foundation with a real track record: shipped systems, meaningful open-source work, or research output that moved something forward

Deep working knowledge of modern LLMs and transformers: tokenization, context management, KV cache behavior, decoding tradeoffs, scaling behavior

Hands-on experience with tuning and alignment methods such as SFT, DPO, RLHF, reward modeling, RLAIF, and preference data pipelines

Practical tool-use expertise: function calling, schema design, structured outputs, validation, tool routing, retries, and tool error recovery

Reliability chops: hallucination reduction, calibration, self-checking, verification, constraint-driven generation, and safe fallbacks

Retrieval and memory experience: embeddings, RAG, reranking, chunking strategies, long-context tradeoffs, and memory that stays useful over time

Evaluation ownership: automated benchmarks, adversarial tests, human-in-the-loop scoring, regression suites, and metrics that predict real user outcomes

Data craftsmanship: cleaning, deduplication, contamination checks, eval hygiene, synthetic data when appropriate, strong dataset discipline

Production readiness: instrumentation, A/B testing, monitoring, on-call quality, latency and throughput tradeoffs, and cost-aware iteration

Systems and infra competence: data pipelines, training jobs, experiment tracking, reproducibility, and debugging under real constraints

Strong product taste: you can simplify without losing power, and you care about the user experience as much as the model curve

Preferred

Experience with agent benchmarks, tool use, computer use, or multi-step workflow execution

Experience with reliability research: verification, calibration, interpretability, safety testing

Experience with retrieval, memory, and personalization systems at scale

Tell us which vertical you would choose and why you think it could become inevitable at scale.

Benefits

Competitive salary and meaningful equity

Top-tier tooling and compute for serious iteration

Relocation and immigration support when needed

Company

Company.ai

Company.ai is building a network of category-defining AI products in stealth mode.

2-10 employees

https://company.ai

Funding

Current Stage

Early Stage

Company data provided by Crunchbase