XPENG · 3 weeks ago
GPGPU Software Architect / Principal Engineer
XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles. The role involves developing a comprehensive software stack compatible with CUDA and overseeing the architecture and design for GPU technologies.
Responsibilities
Develop and refine a comprehensive 3-year roadmap for a software stack compatible with CUDA, encompassing Runtime, Driver, Compiler, Profiler, Debugger, and AI acceleration libraries
Define binding specifications that link our upcoming GPU ISA to CUDA APIs, ensuring forward compatibility with CUDA 12.x features
Evaluate and integrate the latest technological advancements: CUDA Graph, Transformer Engine, virtual memory management, CUDA Dynamic Parallelism, CUTLASS 3.x, TMA, Blackwell FP4, among others
Create a modular, layered Runtime architecture: CUDA → HAL → Kernel → Hardware, applicable across emulators and actual silicon
Define the task launch protocol, including Queue, Stream, Event, and Graph, as well as the memory model
Design a dual-mode (JIT & offline) compiler supporting LTO, PGO, Auto-Tuning, and efficient PTX→ISA microcode caching
Develop GPU virtualization schemes (MIG) that work across processes and containers
Implement an end-to-end performance model: Python API → CUDA Runtime → Driver → ISA → Micro-architecture → Board-level interconnect
Build an observability platform: Nsys-compatible traces, real-time Metric-QPS dashboards, and an AI Advisor for identifying bottlenecks automatically
Manage internal AI benchmarks as the single source of truth. Benchmarks include MLPerf Inference, Stable Diffusion XL, and a 70B LLM
Co-design, with our hardware architecture team, an ISA that is compatible with CUDA Compute Capability 12.x
Collaborate with AI framework teams (PyTorch, TensorFlow, JAX, ONNX Runtime) to build fully reusable kernel libraries
Partner with Cloud and K8s teams to co-develop Device Plugins, GPU Operators, and RDMA Network Policies
Qualifications
Required
10+ years in systems software, with at least 5 years designing CUDA compute stacks
Led end-to-end development of at least one generation of a GPU Runtime or AI acceleration library
Comprehensive mastery of PTX/SASS, CUDA Driver API, and cuBLAS/cuDNN internals; experience with LLVM NVPTX backend
Deep understanding of GPU micro-architecture, including SM architecture, warp scheduler, shared-memory bank conflicts, and Tensor Core pipelines
Proficiency with PCIe/CXL/RDMA topologies, NUMA settings, and GPUDirect RDMA/Storage
Benefits
Bonus
Equity
Benefits
Company
XPENG
XPeng is a leading Chinese Smart EV company that designs, develops, manufactures, and markets Smart EVs that appeal to the large and growing base of technology-savvy middle-class consumers.
Funding
Current Stage: Public Company
Total Funding: $7.8B
Key Investors: China CITIC Bank, Volkswagen Group, Agricultural Bank of China
2025-08-18 · Post-IPO Debt · $1.39B
2023-07-26 · Post-IPO Equity · $700M
2022-04-27 · Post-IPO Debt · $1.14B
Company data provided by Crunchbase