RIT Solutions, Inc. · 2 months ago
AI Operations Platform Consultant
RIT Solutions, Inc. is seeking an AI Operations Platform Consultant to manage and optimize AI inference services. The role involves deploying and troubleshooting containerized services on Kubernetes and managing MLOps/LLMOps pipelines for production environments.
Staffing & Recruiting
Responsibilities
Experience deploying, managing, operating, and troubleshooting containerized services at scale on Kubernetes (OpenShift) for mission-critical applications (a minimal health-check sketch follows this list)
Experience deploying, configuring, and tuning LLMs using TensorRT-LLM and Triton Inference Server
Managing, operating, and supporting MLOps/LLMOps pipelines that use TensorRT-LLM and Triton Inference Server to deploy inference services in production (see the client sketch after this list)
Setting up and operating AI inference service monitoring for performance and availability (see the metrics sketch after this list)
Experience deploying and troubleshooting LLM models on a containerized platform, including monitoring and load balancing
Experience with standard processes for operating a mission-critical system: incident management, change management, event management, etc.
Managing scalable infrastructure for deploying and managing LLMs
Deploying models in production environments, including containerization, microservices, and API design
Working with Triton Inference Server, including its architecture, configuration, and deployment
Applying model optimization techniques using Triton with TensorRT-LLM, as well as general techniques such as pruning, quantization, and knowledge distillation (see the quantization sketch after this list)
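As a rough illustration of the Kubernetes operations work above, here is a minimal sketch using the official `kubernetes` Python client to check whether an inference deployment has all replicas ready. The deployment name and namespace are illustrative assumptions, not taken from this posting.

```python
# Minimal sketch: check rollout health of a hypothetical Triton deployment
# on Kubernetes, using the official `kubernetes` Python client.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
apps = client.AppsV1Api()

# "triton-inference" and "ml-serving" are illustrative names.
dep = apps.read_namespaced_deployment(name="triton-inference", namespace="ml-serving")
desired = dep.spec.replicas or 0
ready = dep.status.ready_replicas or 0
print(f"{ready}/{desired} replicas ready")
if ready < desired:
    # A real operator would inspect pod status and recent events, alert, or roll back here.
    print("Deployment is degraded; investigate pod status and recent events.")
```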
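For the TensorRT-LLM/Triton deployment bullets, a minimal client-side sketch using `tritonclient` over HTTP. The server URL, model name, and tensor names are assumptions; real input/output names and datatypes come from the model's config.pbtxt and must match it exactly.

```python
# Minimal sketch: query a Triton Inference Server over HTTP with tritonclient.
# "my_trtllm_model" and the tensor names "INPUT"/"OUTPUT" are illustrative.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
assert client.is_server_live() and client.is_model_ready("my_trtllm_model")

inp = httpclient.InferInput("INPUT", [1, 8], "INT32")
inp.set_data_from_numpy(np.arange(8, dtype=np.int32).reshape(1, 8))

result = client.infer(model_name="my_trtllm_model", inputs=[inp])
print(result.as_numpy("OUTPUT"))
```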
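For the monitoring bullet, a sketch that scrapes Triton's Prometheus metrics endpoint (exposed on port 8002 by default). The host is an assumption; the metric names shown are standard Triton counters. In practice these would be scraped by Prometheus and alerted on rather than polled by hand.

```python
# Minimal sketch: read Triton's Prometheus metrics endpoint and surface a few
# serving-health counters (successes, failures, queue time).
import requests

text = requests.get("http://localhost:8002/metrics", timeout=5).text
for line in text.splitlines():
    if line.startswith(("nv_inference_request_success",
                        "nv_inference_request_failure",
                        "nv_inference_queue_duration_us")):
        print(line)
```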
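And for the optimization bullet, a sketch of post-training dynamic quantization in PyTorch, one of the techniques named above (pruning and knowledge distillation are analogous in spirit). The toy model is an illustrative assumption; production LLM quantization would typically happen at TensorRT-LLM engine-build time instead.

```python
# Minimal sketch: post-training dynamic quantization with PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # quantize Linear weights to int8
)
x = torch.randn(1, 128)
print(quantized(x).shape)  # same interface, smaller and faster Linear layers
```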
Qualifications
Required
LLM and Kubernetes experience, as detailed under Responsibilities above
Company
RIT Solutions, Inc.
Jobdiva Job Portal: https://www1.jobdiva.com/candidates/myjobs/searchjobsdone.jsp?a=xbjdnwgjodtga1y1im2g881fkkeiwd0775lbvq8yqgps8vb2q36w2vj1ga6xxork&compid=-1
Services: recruitment (contingency search and campus selection).
H1B Sponsorship
RIT Solutions, Inc. has a track record of offering H1B sponsorships. Note that this does not guarantee sponsorship for this specific role; additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of different job fields receiving sponsorship (chart omitted; one field is marked as similar to this job)
Trends of total sponsorships: 2023 (2), 2025 (1)
Funding
Current Stage: Growth Stage
Company data provided by Crunchbase