
Cango Inc. · 3 weeks ago

AI System Solution Architect

Cango Inc. is a company focused on innovative AI solutions and is seeking an AI System Solution Architect to design and optimize the technical architecture for AI inference on GPU clusters. The role involves leading performance engineering efforts, engaging with clients, and guiding the engineering team in implementing advanced AI solutions.

Automotive
Hiring Manager
Logan Long

Responsibilities

Design end-to-end technical architecture for LLM and Diffusion model inference on large-scale GPU clusters
Develop innovative solutions in KV Cache management, distributed scheduling, pipelining/batching strategies, memory allocation, and P2P/IB communication
Architect a multi-tenant serving framework that balances throughput, latency, and cost
Define product positioning and differentiation based on industry trends and company strategy
Develop technical evolution plans (e.g., token streaming like vLLM, syntax parsing like SGLang, Diffusion acceleration)
Align closely with internal GPU infrastructure and business teams to ensure timely product delivery
Lead performance engineering efforts including NCCL tuning, NUMA binding, CUDA kernel optimization
Drive cross-team collaboration (GPU kernel, compiler, distributed system, frontend APIs) to ensure system stability and scalability
Organize benchmarking and performance testing against industry leaders (vLLM, SGLang, TensorRT, etc.)
Guide engineering team on implementation strategies, experimental methodologies, and optimization pathways
Engage with open-source communities and contribute core components to enhance technical influence
Communicate directly with North America-based clients to understand their needs for AI inference, training, and deployment
Translate customer needs into internal implementation plans and coordinate across operations, engineering, and delivery teams

Qualifications

GPU optimization · Deep learning systems · System architecture · PyTorch · CUDA · NCCL · Triton · TensorRT · MPI/IB/RDMA · Cross-functional communication · Architectural thinking · Open-source contributions

Required

5+ years of experience in computer infrastructure, GPU cloud, or large-scale cloud computing in the U.S., with a deep understanding of the North American tech ecosystem
Master's or Ph.D. in Computer Science, Electrical Engineering, or related fields preferred
5+ years of hands-on experience in deep learning systems or GPU optimization, including leading the design of at least one large-scale AI inference or training system
Proficiency with PyTorch, CUDA, NCCL, Triton, TensorRT, MPI/IB/RDMA, etc.
Deep understanding of projects like vLLM, SGLang, DeepSpeed, FasterTransformer
Practical experience in LLM inference optimization (e.g., KV Cache, P2P vs CPU routing, batching strategies)
Ability to integrate system-level optimization with product usability (API and Serving layers)
Strong architectural thinking and cross-functional communication skills to translate complexity into clear product roadmaps

Preferred

Open-source contributions (e.g., to vLLM, DeepSpeed, Ray, Triton-Server, SGLang, etc.)
Experience launching GPU cloud or AI infrastructure products (e.g., RunPod, Lambda, Modal, SageMaker)
Familiarity with emerging LLM inference trends such as speculative decoding, continuous batching, and streaming inference

Benefits

Competitive compensation package with equity incentives.

Company

Cango Inc.

Cango Inc. (NYSE: CANG) primarily operates a leading Bitcoin mining business.

Funding

Current Stage
Public Company
Total Funding
$10.63M
Key Investors
Enduring Wealth Capital · SOSV · GSMA Ecosystem Accelerator
2025-12-29: Post-IPO Equity · $10.5M
2018-07-26: IPO
2017-10-26: Seed
Company data provided by Crunchbase.