Harrison Clarke · 5 hours ago

Artificial Intelligence Engineer

San Francisco Bay Area

Full-time

Hybrid

Mid, Senior Level

$300K/yr - $450K/yr

Harrison Clarke is seeking an AI Infrastructure / Inference Engineer to build and scale systems that support real-time and batch AI workloads in production. The role involves deploying, optimizing, and operating inference services to ensure models are fast, reliable, and cost-efficient.

Consulting · DevOps · Human Resources · Information Technology · Staffing Agency

Growth Opportunities

Responsibilities

Design, build, and run production inference platforms (real-time, streaming, and batch)

Optimize end-to-end inference performance: latency, throughput, memory, and cost

Build and maintain model serving and deployment pipelines (CI/CD for models, canarying, rollbacks)

Improve system reliability with monitoring, alerting, autoscaling, and incident response

Partner with ML researchers/engineers to translate models into robust production services

Implement tooling for experimentation, A/B testing, and safe rollout of model changes

Qualifications

Python · Kubernetes · Distributed systems · Inference serving stack · Cloud platforms · GPU systems · API design · Go/C++ · MLOps · Observability expertise

Required

Strong software engineering skills in Python and/or Go/C++

Experience running production services on Kubernetes and cloud platforms (AWS/GCP/Azure)

Solid understanding of distributed systems, performance profiling, and reliability engineering

Hands-on experience with at least one inference/serving stack (e.g., Triton, TorchServe, Ray Serve, KServe, BentoML, or similar)

Familiarity with GPU systems and acceleration (e.g., CUDA basics, TensorRT, mixed precision, batching)

Ability to design clear APIs and service boundaries; comfort working with product and platform teams

Preferred

Experience with LLM inference (token streaming, KV cache, batching, quantization, speculative decoding)

Knowledge of ONNX, TensorRT-LLM, vLLM, DeepSpeed, Accelerate, or similar tooling

Observability expertise (e.g., Prometheus/Grafana, OpenTelemetry, distributed tracing)

Experience with capacity planning, GPU scheduling, and cost optimization

Background in MLOps / ML Platform engineering and model lifecycle tooling

Benefits

Competitive compensation

Flexible working options

Company

Harrison Clarke

Harrison Clarke is the leading staffing and recruiting firm in XOps and cybersecurity.

Founded in 2016

New York, New York, USA

11-50 employees

https://www.harrisonclarke.com/

Funding

Current Stage

Early Stage

Leadership Team

Firas Sozan

Founder & CEO

Company data provided by Crunchbase