XPENG · 3 weeks ago
Staff Software Engineer - AI Infrastructure
XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles. The role involves building and optimizing the DataLoader and Dataset Management System to support large-scale data processing and model training for AI applications.
Responsibilities
Design, develop, and maintain high-performance DataLoader SDKs and Dataset Management Systems for multi-source, heterogeneous data (images, videos, point clouds, sensor streams, etc.)
Optimize multi-threaded/multi-process data pipelines for minimal I/O latency and preprocessing overhead, supporting large-scale model training and inference workloads
Contribute to AI infrastructure projects beyond data loading, including:
Distributed training and inference optimization
Custom operator development (CUDA kernels, TensorRT, ROCm) and hardware-specific acceleration for GPU/TPU
Model optimization techniques such as pruning, quantization, distillation, sparsification, and mixed-precision training
Collaborate with algorithm and platform teams to translate business needs into scalable, production-grade solutions
Continuously identify and address performance bottlenecks across the AI training and inference stack
Qualifications
Required
Master's degree in Computer Science, Software Engineering, or equivalent experience
5+ years of experience in large-scale data processing or ML infrastructure
Proficient in Python with solid software engineering fundamentals, clean coding practices, and strong debugging skills
Hands-on experience with relational databases and NoSQL systems, including metadata and cache management; prior experience with large-scale VectorDB is highly desirable
Experience in at least one of the following areas:
Large-scale deep learning training or inference optimization focused on scalability and model acceleration (distributed training strategies, quantization, CUDA kernel development, and related optimizations)
Columnar storage formats (Parquet/ORC) and related ecosystems, including partitioning, compression, and vectorized I/O optimization
Linux file system and network I/O optimization for NFS, (high-performance) distributed file systems, and object storage
Large-scale data loading frameworks (PyTorch DataLoader, Hugging Face Datasets)
Strong communication skills and ability to work cross-functionally in fast-paced environments
Strong ability to learn quickly, adapt to new challenges, and proactively explore and adopt new technologies
Preferred
Familiarity with the autonomous driving industry and enthusiasm for its challenges
Experience with distributed computing frameworks such as Ray
Experience in building and scaling ML infrastructure in cloud-native environments
Benefits
Bonus
Equity
Benefits
Company
XPENG
XPeng is a leading Chinese Smart EV company that designs, develops, manufactures, and markets Smart EVs that appeal to the large and growing base of technology-savvy middle-class consumers.
Funding
Current Stage: Public Company
Total Funding: $7.8B
Key Investors: China CITIC Bank, Volkswagen Group, Agricultural Bank of China
2025-08-18 · Post-IPO Debt · $1.39B
2023-07-26 · Post-IPO Equity · $700M
2022-04-27 · Post-IPO Debt · $1.14B
Company data provided by Crunchbase