HCLTech · 3 months ago
DevOps Engineer with LLM, GPU
HCLTech is seeking a DevOps Engineer with expertise in Large Language Models (LLMs) and GPU technologies. The role involves developing and maintaining an inference platform for serving large language models, working on AI and cloud engineering projects throughout the product development lifecycle, and contributing to open source inference engines.
Information and Communications Technology (ICT) · IT Management · Outsourcing · Software · Telecommunications
Responsibilities
Develop and maintain an inference platform for serving large language models, optimized for the GPU platforms on which they run
Work on complex AI and cloud engineering projects through the entire product development lifecycle (PDLC) - ideation, product definition, experimentation, prototyping, development, testing, release, and operations
Build tooling and observability to monitor system health, along with auto-tuning capabilities
Build benchmarking frameworks to test model serving performance to guide system and infrastructure tuning efforts
Build native cross-platform inference support across NVIDIA and AMD GPUs for a variety of model architectures
Contribute to open source inference engines to make them perform better on DigitalOcean cloud
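The benchmarking responsibilities above revolve around per-request latency metrics for streamed completions. As a minimal sketch (function and field names here are illustrative, not from the posting), assuming per-token arrival timestamps have already been captured from a streaming endpoint:

```python
from dataclasses import dataclass


@dataclass
class RequestMetrics:
    """Latency metrics for one streamed LLM completion."""
    ttft_s: float            # time to first token
    tpot_s: float            # mean time per output token (after the first)
    decode_tok_per_s: float  # decode-phase throughput


def metrics_from_timestamps(start: float, token_times: list[float]) -> RequestMetrics:
    """Derive TTFT/TPOT/decode throughput from token arrival times.

    `start` is when the request was sent; `token_times` are the absolute
    arrival times of each streamed output token, in seconds.
    """
    ttft = token_times[0] - start
    n_decode = len(token_times) - 1
    decode_span = token_times[-1] - token_times[0]
    tpot = decode_span / n_decode if n_decode else 0.0
    tps = n_decode / decode_span if decode_span > 0 else 0.0
    return RequestMetrics(ttft, tpot, tps)
```

Aggregating these per-request numbers across concurrency levels is what lets a harness compare model, engine, and GPU combinations.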
Qualifications
Required
Deep experience building services in modern cloud environments on distributed systems (e.g., containerization with Kubernetes and Docker, infrastructure as code, CI/CD pipelines, APIs, authentication and authorization, data storage, deployment, logging, monitoring, and alerting)
Experience working with Large Language Models (LLMs), particularly hosting them to run inference
Strong verbal and written communication skills; the job involves communicating with local and remote colleagues about technical subjects and writing detailed documentation
Preferred
Experience building or using benchmarking tools for evaluating LLM inference across model, engine, and GPU combinations
Familiarity with various LLM performance metrics such as prefill throughput, decode throughput, TPOT, and TTFT
Experience with one or more inference engines, e.g., vLLM, SGLang, or Modular Max
Familiarity with one or more distributed inference serving frameworks, e.g., llm-d, NVIDIA Dynamo, or Ray Serve
Experience with AMD and NVIDIA GPUs, using software such as CUDA, ROCm, AITER, NCCL, and RCCL
Knowledge of distributed inference optimization techniques such as tensor/data parallelism, KV cache optimizations, and smart routing
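The optimization techniques in the last item rest on simple capacity arithmetic. As one hedged illustration (the function name and model shape are assumptions, not from the posting), tensor parallelism shrinks the per-GPU KV cache by sharding KV heads across ranks:

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             dtype_bytes: int = 2, tp: int = 1) -> int:
    """Per-GPU KV cache footprint of one token, in bytes.

    Each layer stores a key and a value vector per KV head (the factor
    of 2); tensor parallelism shards the KV heads across `tp` ranks.
    """
    assert n_kv_heads % tp == 0, "KV heads must divide evenly across TP ranks"
    return 2 * n_layers * (n_kv_heads // tp) * head_dim * dtype_bytes


# A Llama-3-8B-shaped model (32 layers, 8 KV heads, head_dim 128) in fp16:
print(kv_cache_bytes_per_token(32, 8, 128))        # 131072 bytes (~128 KiB)
print(kv_cache_bytes_per_token(32, 8, 128, tp=2))  # 65536 bytes per GPU
```

This kind of estimate is what guides batch-size limits and smart routing decisions when memory, not compute, bounds concurrency.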
Company
HCLTech
HCLTech is a global IT company offering digital, engineering, and cloud solutions partnering with businesses for transformation.
Funding
Current Stage: Public Company
Total Funding: $220M
Key Investors: ChrysCapital
2008-07-10: Post-IPO Equity · $220M
2000-01-06: IPO
Company data provided by crunchbase