AI & HPC Infrastructure Engineer jobs in United States

Accenture · 6 days ago

AI & HPC Infrastructure Engineer

Accenture, a leading global professional services company, is seeking an AI & HPC Infrastructure Engineer. The role involves designing and implementing infrastructure solutions for AI and HPC, optimizing performance, and integrating these platforms with existing IT systems.

Business Information Systems, Construction, Consulting, Information Services, Information Technology, Infrastructure, Management Consulting, Outsourcing
No H1B

Responsibilities

Design and implement HPC and AI infrastructure solutions, aligning system architecture and deployment roadmaps to industry-specific performance and scalability needs
Deploy, configure, and manage XPU-based clusters (CPU/GPU/accelerators) using schedulers, VM/K8s orchestration platforms, Slurm, and containerized platforms in scalable designs to provide Metal as a Service (MaaS), GPUaaS, AIaaS, and other offerings
Optimize cluster performance, scalability, energy, and cost efficiency across on-premises, cloud, and hybrid environments
Integrate AI and HPC platforms with existing IT systems, data pipelines, and security frameworks
Monitor, troubleshoot, and tune infrastructure to ensure high availability, low-latency networking, and workload resiliency
Develop and maintain documentation including architecture diagrams, configuration baselines, and operational runbooks
Provide technical guidance and support to users, enabling efficient execution of HPC/AI workloads, large-scale models, and simulations

Qualifications

HPC infrastructure design, AI infrastructure solutions, XPU-based clusters management, Cloud platforms expertise, Accelerated computing architectures, Cluster management, Orchestration, MLOps frameworks implementation, Networking, Storage platforms, Technical guidance, Documentation maintenance, Soft skills

Required

Minimum of 4 years of hands-on experience designing, deploying, and managing HPC and AI infrastructure across on-premises, cloud, and hybrid environments in two or more segments (hyperscaler, neocloud, large enterprise, telco/mobile), supporting key industries such as Financial Services, Life Sciences, Manufacturing, and Retail
Minimum of 4 years of experience with accelerated computing architectures (GPUs, XPUs, DPUs), high-performance fabrics (InfiniBand, Ethernet), SONiC, networking, and modern storage/data platforms (e.g., NVMe-oF, Lustre, GPFS, BeeGFS, VAST, DDN, Weka) to build robust solutions
Minimum of 4 years of experience with cluster management and orchestration (e.g., Slurm, Run:ai, Kubernetes, Docker), real-time performance monitoring, and observability frameworks
Minimum of 4 years of experience with cloud and virtualization platforms (e.g., AWS, Azure, GCP, VMware, Nutanix) and expertise in automation and optimization using scripting (Python, AI tools) with foundational Infrastructure-as-Code tools such as Terraform and Ansible
Minimum of 4 years of experience implementing MLOps and DevSecOps frameworks to enable secure, automated, and reproducible workflows
Bachelor's degree or equivalent work experience (minimum 12 years). With an Associate's degree, a minimum of 6 years of work experience is required.

Preferred

Experience managing the deployment of 1,000+ GPU clusters for HPC and AI workloads with various infrastructure services enabled
Experience with GPU computing libraries and accelerators (e.g., NVIDIA CUDA, Dynamo, AMD ROCm)
Experience with AI and HPC networking (e.g., RoCE, InfiniBand, multi-planar/multi-rail designs, platform buffer architectures)
Knowledge of Machine Learning and AI frameworks (e.g., TensorFlow, PyTorch, JAX), Jupyter notebooks / Google Colab environments
Experience with HPC & AI workload management and optimization techniques
Familiarity with DevOps practices and tools (e.g., Ansible, Terraform) for infrastructure automation
Industry certifications in NVIDIA infrastructure, public cloud providers, Data Science, etc. are a plus

Company

Accenture

Accenture is a professional services company that provides solutions in strategy, consulting, digital, technology and operations.

Funding

Current Stage: Public Company
Total Funding: $6M
Key Investors: Youth Business International
2018-10-01 · Grant · $6M
2001-07-27 · IPO

Leadership Team

Jeff Laue
CEO and Senior Managing Director of Accenture Digital Inside Sales
Aditya Tandon
Intelligent Automation Canada Lead
Company data provided by Crunchbase