Artificial Intelligence Engineer jobs in United States

Harrison Clarke · 5 hours ago

Artificial Intelligence Engineer

Harrison Clarke is seeking an AI Infrastructure / Inference Engineer to build and scale systems that support real-time and batch AI workloads in production. The role involves deploying, optimizing, and operating inference services to ensure models are fast, reliable, and cost-efficient.

Consulting · DevOps · Human Resources · Information Technology · Staffing Agency
Growth Opportunities

Responsibilities

Design, build, and run production inference platforms (real-time, streaming, and batch)
Optimize end-to-end inference performance: latency, throughput, memory, and cost
Build and maintain model serving and deployment pipelines (CI/CD for models, canarying, rollbacks)
Improve system reliability with monitoring, alerting, autoscaling, and incident response
Partner with ML researchers/engineers to translate models into robust production services
Implement tooling for experimentation, A/B testing, and safe rollout of model changes

Qualifications

Python · Kubernetes · Distributed systems · Inference serving stack · Cloud platforms · GPU systems · API design · Go/C++ · MLOps · Observability expertise

Required

Strong software engineering skills in Python and/or Go/C++
Experience running production services on Kubernetes and cloud platforms (AWS/GCP/Azure)
Solid understanding of distributed systems, performance profiling, and reliability engineering
Hands-on experience with at least one inference/serving stack (e.g., Triton, TorchServe, Ray Serve, KServe, BentoML, or similar)
Familiarity with GPU systems and acceleration (e.g., CUDA basics, TensorRT, mixed precision, batching)
Ability to design clear APIs and service boundaries; comfort working with product and platform teams

Preferred

Experience with LLM inference (token streaming, KV cache, batching, quantization, speculative decoding)
Knowledge of ONNX, TensorRT-LLM, vLLM, DeepSpeed, Accelerate, or similar tooling
Observability expertise (e.g., Prometheus/Grafana, OpenTelemetry, distributed tracing)
Experience with capacity planning, GPU scheduling, and cost optimization
Background in MLOps / ML Platform engineering and model lifecycle tooling

Benefits

Competitive compensation
Benefits
Flexible working options

Company

Harrison Clarke

Harrison Clarke is the leading staffing and recruiting firm in XOps and cybersecurity.

Funding

Current Stage
Early Stage

Leadership Team

Firas Sozan
Founder & CEO
Company data provided by Crunchbase