Apply on Employer Site

ExpertsHub.ai · 4 hours ago

Senior Machine Learning Engineer

Charlotte, NC

Contract

Onsite

Senior Level

ExpertsHub.ai is seeking a Senior Machine Learning Engineer to manage and optimize LLM models and MLOps pipelines. The role focuses on hands-on troubleshooting and production support for AI systems, requiring extensive experience with containerized services and AI inference systems.

Computer Software

Responsibilities

Managing, Operations and Support MLOps/LLMOps pipelines

Troubleshooting LLM models

Model optimization

Production support engineer who focusses on LLM models/AI and uses TensorRT LLM and Triton Inference Server

Experience deploying, managing, operating, and troubleshooting containerized services at scale on Kubernetes for mission-critical applications (OpenShift)

Experience with deploying, configuring, and tuning LLMs using TensorRT-LLM and Triton Inference server

Managing MLOps/LLMOps pipelines, using TensorRT-LLM and Triton Inference server to deploy inference services in production

Setup and operation of AI inference service monitoring for performance and availability

Experience deploying and troubleshooting LLM models on a containerized platform, monitoring, load balancing, etc

Operation and support of MLOps/LLMOps pipelines, using TensorRT-LLM and Triton Inference server to deploy inference services in production

Experience deploying and troubleshooting LLM models on a containerized platform, monitoring, load balancing, etc

Experience with standard processes for operation of a mission critical system – incident management, change management, event management, etc

Managing scalable infrastructure for deploying and managing LLMs

Deploying models in production environments, including containerization, microservices, and API design

Triton Inference Server, including its architecture, configuration, and deployment

Model Optimization techniques using Triton with TRTLLM

Model optimization techniques, including pruning, quantization, and knowledge distillation

Qualification

MLOps/LLMOps pipelinesTensorRT-LLMTriton Inference ServerKubernetesModel optimizationTroubleshooting LLM modelsContainerizationMicroservicesAPI designIncident managementChange managementEvent managementPerformance monitoringLoad balancingTelemetryCustom dashboards

Required

Managing, Operations and Support MLOps/LLMOps pipelines

Troubleshooting LLM models

Model optimization

Production support engineer who focuses on LLM models/AI and uses TensorRT LLM and Triton Inference Server