DevOps Engineer with LLM, GPU jobs in United States

HCLTech · 3 months ago

DevOps Engineer with LLM, GPU

HCLTech is seeking a DevOps Engineer with expertise in Large Language Models (LLMs) and GPU technologies. The role involves developing and maintaining an inference platform for serving large language models, working on AI and cloud engineering projects throughout the product development lifecycle, and contributing to open source inference engines.

Information and Communications Technology (ICT) · IT Management · Outsourcing · Software · Telecommunications
Note: No H1B sponsorship

Responsibilities

Develop and maintain an inference platform for serving large language models optimized for the various GPU platforms they will be run on
Work on complex AI and cloud engineering projects through the entire product development lifecycle (PDLC) - ideation, product definition, experimentation, prototyping, development, testing, release, and operations
Build tooling and observability to monitor system health, and build auto-tuning capabilities
Build benchmarking frameworks to test model serving performance to guide system and infrastructure tuning efforts
Build native cross platform inference support across NVIDIA and AMD GPUs for a variety of model architectures
Contribute to open source inference engines to make them perform better on DigitalOcean cloud

Qualifications

Cloud environments · Large Language Models · GPU experience · Containerization (Kubernetes, Docker) · CI/CD pipelines · Inference engines · Distributed inference frameworks · Benchmarking tools · Performance metrics · Distributed inference optimization · Communication skills

Required

Deep experience building services in modern cloud environments on distributed systems, e.g., containerization (Kubernetes, Docker), infrastructure as code, CI/CD pipelines, APIs, authentication and authorization, data storage, deployment, logging, monitoring, and alerting
Experience working with Large Language Models (LLMs), particularly hosting them to run inference
Strong verbal and written communication skills. Your job will involve communicating with local and remote colleagues about technical subjects and writing detailed documentation

Preferred

Experience building or using benchmarking tools for evaluating LLM inference across various model, engine, and GPU combinations
Familiarity with various LLM performance metrics such as prefill throughput, decode throughput, time per output token (TPOT), and time to first token (TTFT)
Experience with one or more inference engines: e.g., vLLM, SGLang, and Modular Max
Familiarity with one or more distributed inference serving frameworks, e.g., llm-d, NVIDIA Dynamo, and Ray Serve
Experience with AMD and NVIDIA GPUs, using software such as CUDA, ROCm, AITER, NCCL, and RCCL
Knowledge of distributed inference optimization techniques such as tensor/data parallelism, KV cache optimizations, and smart routing
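For candidates unfamiliar with the metrics named above, a minimal sketch of how TTFT and TPOT are typically derived from per-token arrival timestamps in a streaming benchmark (the function names and example timings here are illustrative, not from any specific benchmarking tool):

```python
# Illustrative sketch: computing time-to-first-token (TTFT) and
# time-per-output-token (TPOT) from token arrival timestamps recorded
# during a single streaming inference request.

def ttft(request_start: float, token_times: list[float]) -> float:
    """Latency from request submission to the first generated token."""
    return token_times[0] - request_start

def tpot(token_times: list[float]) -> float:
    """Average inter-token latency over the decode phase."""
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure decode speed")
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)

# Hypothetical request: sent at t=0.0 s; first token arrives at 0.5 s,
# then one token every 50 ms.
times = [0.5 + 0.05 * i for i in range(10)]
print(ttft(0.0, times))  # 0.5
print(tpot(times))       # ~0.05
```

Aggregating these two numbers across many concurrent requests (typically as percentiles) is what guides the system and infrastructure tuning work described in the Responsibilities section.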

Company

HCLTech is a global IT company offering digital, engineering, and cloud solutions, partnering with businesses to drive their transformation.

Funding

Current Stage
Public Company
Total Funding
$220M
Key Investors
ChrysCapital
2008-07-10 · Post-IPO Equity · $220M
2000-01-06 · IPO

Leadership Team

Vijayakumar C.
Chief Executive Officer
Alan Flower
Executive Vice President - CTO & Global Head, AI & Cloud Native Labs
Company data provided by crunchbase