DevOps Engineer with LLM, GPU jobs in United States

HCLTech · 3 months ago

DevOps Engineer with LLM, GPU

HCLTech is seeking a DevOps Engineer with expertise in Large Language Models (LLMs) and GPU technologies. The role involves developing and maintaining an inference platform for serving large language models, working on AI and cloud engineering projects throughout the product development lifecycle, and contributing to open source inference engines.

Information and Communications Technology (ICT) · IT Management · Outsourcing · Software · Telecommunications
Note: No H1B sponsorship

Responsibilities

Develop and maintain an inference platform for serving large language models optimized for the various GPU platforms they will be run on
Work on complex AI and cloud engineering projects through the entire product development lifecycle (PDLC) - ideation, product definition, experimentation, prototyping, development, testing, release, and operations
Build tooling and observability to monitor system health, and build auto-tuning capabilities
Build benchmarking frameworks to test model serving performance to guide system and infrastructure tuning efforts
Build native cross platform inference support across NVIDIA and AMD GPUs for a variety of model architectures
Contribute to open source inference engines to make them perform better on DigitalOcean cloud

Qualifications

Cloud environments · Large Language Models · GPU experience · Containerization (Kubernetes, Docker) · CI/CD pipelines · Inference engines · Distributed inference frameworks · Benchmarking tools · Performance metrics · Distributed inference optimization · Communication skills

Required

Deep experience building services in modern cloud environments on distributed systems, e.g., containerization (Kubernetes, Docker), infrastructure as code, CI/CD pipelines, APIs, authentication and authorization, data storage, deployment, logging, monitoring, and alerting
Experience working with Large Language Models (LLMs), particularly hosting them to run inference
Strong verbal and written communication skills. Your job will involve communicating with local and remote colleagues about technical subjects and writing detailed documentation

Preferred

Experience building or using benchmarking tools for evaluating LLM inference across various model, engine, and GPU combinations
Familiarity with various LLM performance metrics such as prefill throughput, decode throughput, time per output token (TPOT), and time to first token (TTFT)
Experience with one or more inference engines: e.g., vLLM, SGLang, and Modular Max
Familiarity with one or more distributed inference serving frameworks, e.g., llm-d, NVIDIA Dynamo, and Ray Serve
Experience with AMD and NVIDIA GPUs, using software such as CUDA, ROCm, AITER, NCCL, and RCCL
Knowledge of distributed inference optimization techniques such as tensor/data parallelism, KV cache optimizations, and smart routing
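For candidates unfamiliar with the metrics named above, a minimal sketch of how TTFT and TPOT are typically derived from per-token arrival timestamps in a streaming benchmark (the function names and example timings here are illustrative, not from any specific benchmarking tool):

```python
# Illustrative sketch: computing time-to-first-token (TTFT) and
# time-per-output-token (TPOT) from token arrival timestamps recorded
# during a single streaming inference request.

def ttft(request_start: float, token_times: list[float]) -> float:
    """Latency from request submission to the first generated token."""
    return token_times[0] - request_start

def tpot(token_times: list[float]) -> float:
    """Average inter-token latency over the decode phase."""
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure decode speed")
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)

# Hypothetical request: sent at t=0.0 s; first token arrives at 0.5 s,
# then one token every 50 ms.
times = [0.5 + 0.05 * i for i in range(10)]
print(ttft(0.0, times))  # 0.5
print(tpot(times))       # ~0.05
```

Aggregating these two numbers across many concurrent requests (typically as percentiles) is what guides the system and infrastructure tuning work described in the Responsibilities section.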

Company

HCLTech is a global IT company offering digital, engineering, and cloud solutions, partnering with businesses to drive their transformation.

Funding

Current Stage
Public Company
Total Funding
$220M
Key Investors
ChrysCapital
2008-07-10 · Post-IPO Equity · $220M
2000-01-06 · IPO

Leadership Team

Vijayakumar C.
Chief Executive Officer
Alan Flower
Executive Vice President - CTO & Global Head, AI & Cloud Native Labs
Company data provided by crunchbase