84.51° · 7 hours ago

Lead AI/ML Engineer (P4368)

Chicago, IL

Full-time

Hybrid

Senior Level, Lead/Staff

$121K/yr - $201K/yr

5+ years exp

84.51° is a retail data science, insights and media company. The Lead AI/ML Engineer will create, deploy, and maintain computationally efficient proprietary models and infrastructure, focusing on model serving and operations within the foundation models team.

Management Consulting

H1B Sponsor Likely

Hiring Manager

Sarah Small-Sadler

Responsibilities

Lead large-scale foundation model projects that can span months, focusing on model serving, inference optimization, and production deployment

Foster a collaborative and innovative team environment, encouraging professional growth and development among junior team members in foundation model technologies

Leverage known patterns, frameworks, and tools for automating and deploying foundation model serving solutions using Triton, vLLM, and other inference engines

Develop new tools, processes and operational capabilities to monitor and analyze foundation model performance, latency, throughput, and resource utilization

Work with researchers and ML engineers to optimize and scale foundation model serving using best practices in distributed systems, GPU orchestration, and MLOps

Abstract foundation model serving solutions as robust APIs, microservices, or components that can be reused across the business with high availability and low latency

Build, steward, and maintain production-grade foundation model serving infrastructure (robust, reliable, maintainable, observable, scalable, performant) to manage and serve LLMs, SLMs, and embedding models at scale

Research state-of-the-art foundation model serving technologies, inference optimization techniques, and distributed GPU architectures to identify new opportunities for implementation across the enterprise

Design and implement distributed GPU clusters for model training and inference workloads across GCP and Azure cloud environments

Understand business requirements and trade off latency, cost, throughput, and model accuracy to maximize value and translate research into production-ready serving solutions

Reduce time to deployment, automate foundation model CI/CD pipelines, implement continuous monitoring of model serving metrics, and establish feedback loops for model performance

Conduct code reviews, infrastructure reviews, and production readiness assessments for foundation model deployments

Apply appropriate documentation, version control, infrastructure-as-code practices, and other internal communication practices across channels

Make time-sensitive decisions and solve urgent production issues in foundation model serving environments without escalation

Qualifications

Foundation models, Model serving, Distributed systems, GPU cluster management, MLOps best practices, Cloud platforms, PyTorch, CI/CD Pipelines, Kubernetes & Docker, Python, Monitoring tools, API development, Terraform, Databricks, Communication skills

Required

Bachelor's degree or higher in Machine Learning, Computer Science, Computer Engineering, Applied Statistics, or related field

5+ years of experience developing cloud-based software solutions with understanding of design for scalability, performance, and reliability in distributed systems

2+ years of hands-on experience with foundation models (LLMs, SLMs, embedding models) in production environments; 2+ years of experience in model serving and inference optimization preferred

Deep knowledge of foundation model serving frameworks, particularly Triton Inference Server and vLLM

Working experience with PyTorch models and optimization for inference (quantization, pruning, ONNX, TensorRT)

Knowledge of distributed GPU computing, CUDA programming, and GPU memory optimization techniques

Hands-on experience with GCP and Azure cloud platforms, including GPU instances, managed services, and networking

Experience with Databricks for large-scale data processing and model training workflows

Knowledge of vector databases and embedding model serving

Strong experience with open-source LLM fine-tuning frameworks (LoRA, QLoRA, full fine-tuning)

Experience building large-scale model serving solutions that have been successfully delivered to production with enterprise SLAs

Excellent communication skills, particularly on technical topics related to distributed systems and model serving architectures

Kubernetes & Docker experience with a focus on GPU workloads and model serving deployments

CI/CD pipeline experience with a focus on ML model deployment; GitHub Actions experience preferred

Terraform experience for infrastructure as code, particularly for GPU clusters and cloud ML infrastructure

Strong skills in Python, with experience in async programming and high-performance computing

API development experience with a focus on high-throughput, low-latency model serving endpoints

Experience with monitoring and observability tools for distributed systems (Prometheus, Grafana, DataDog, etc.)

Knowledge of end-to-end machine learning pipelines and MLOps tools (model registry, experiment tracking, feature stores, model monitoring) in the context of foundation models

Preferred

Experience with distributed training frameworks such as DeepSpeed, FSDP, FairScale

Knowledge of model compression techniques and hardware acceleration

Experience with multi-cloud deployments and hybrid cloud architectures

Familiarity with emerging foundation model architectures and serving optimizations

Benefits

Health: Medical with competitive plan designs and support for self-care, wellness and mental health; dental with in-network and out-of-network benefits; vision with in-network and out-of-network benefits.

Wealth: 401(k) with Roth option and matching contribution. Health Savings Account with matching contribution (requires participation in qualifying medical plan). AD&D and supplemental insurance options to help ensure additional protection for you.

Happiness: Paid time off with flexibility to meet your life needs, including 5 weeks of vacation time, 7 health and wellness days, 3 floating holidays, as well as 6 company-paid holidays per year. Paid leave for maternity, paternity and family care instances.

Company

84.51°

Glassdoor: 3.7

84.51° helps companies create sustainable growth by putting the customer at the center of everything.

Founded in 2015

Cincinnati, Ohio, USA

1001-5000 employees

http://8451.com/

H1B Sponsorship

84.51° has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by US Department of Labor)

Trends of Total Sponsorships

2025 (18)

2024 (23)

2023 (29)

2022 (39)

2021 (26)

2020 (17)

Funding

Current Stage

Late Stage

Leadership Team

Mario DiMercurio

Director, Product - Platform Components

Company data provided by Crunchbase