Staff Software Engineer - AI Infrastructure jobs in United States

XPENG · 3 weeks ago

Staff Software Engineer - AI Infrastructure

XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles. The role involves building and optimizing the DataLoader and Dataset Management System to support large-scale data processing and model training for AI applications.

Automotive · Autonomous Vehicles · Electric Vehicle · Manufacturing
Hiring Manager
Yuqing Wang, SHRM-CP

Responsibilities

Design, develop, and maintain high-performance DataLoader SDKs and Dataset Management Systems for multi-source, heterogeneous data (images, videos, point clouds, sensor streams, etc.)
Optimize multi-threaded/multi-process data pipelines for minimal I/O latency and preprocessing overhead, supporting large-scale model training and inference workloads
Contribute to AI infrastructure projects beyond data loading, including:
Distributed training and inference optimization
Custom operator development (CUDA kernels, TensorRT, ROCm) and hardware-specific acceleration for GPU/TPU
Model optimization techniques such as pruning, quantization, distillation, sparsification, and mixed-precision training
Collaborate with algorithm and platform teams to translate business needs into scalable, production-grade solutions
Continuously identify and address performance bottlenecks across the AI training and inference stack

Qualifications

Machine Learning Infrastructure · Python · Large-scale data processing · Distributed training optimization · CUDA kernel development · Relational databases · NoSQL systems · Communication skills · Adaptability · Cross-functional collaboration

Required

Master's degree in Computer Science, Software Engineering, or equivalent experience
5+ years of experience in large-scale data processing or ML infrastructure
Proficient in Python with solid software engineering fundamentals, clean coding practices, and strong debugging skills
Hands-on experience with relational databases and NoSQL systems, including metadata and cache management; prior experience with large-scale VectorDB is highly desirable
Experience in at least one of the following areas:
Large-scale deep learning training or inference optimization focused on scalability and model acceleration (distributed training strategies, quantization, CUDA kernel development, and related optimizations)
Columnar storage formats (Parquet/ORC) and related ecosystems, including partitioning, compression, and vectorized I/O optimization
Linux file system and network I/O optimization for NFS, (high-performance) distributed file systems, and object storage
Large-scale data loading frameworks (PyTorch DataLoader, Hugging Face Datasets)
Strong communication skills and ability to work cross-functionally in fast-paced environments
Strong ability to learn quickly, adapt to new challenges, and proactively explore and adopt new technologies

Preferred

Familiarity with the autonomous driving industry and enthusiasm for its challenges
Experience with distributed computing frameworks such as Apache Ray
Experience in building and scaling ML infrastructure in cloud-native environments

Benefits

Bonus
Equity
Benefits

Company

XPeng is a leading Chinese Smart EV company that designs, develops, manufactures, and markets Smart EVs that appeal to the large and growing base of technology-savvy middle-class consumers.

Funding

Current Stage
Public Company
Total Funding
$7.8B
Key Investors
China CITIC Bank · Volkswagen Group · Agricultural Bank of China
2025-08-18 · Post-IPO Debt · $1.39B
2023-07-26 · Post-IPO Equity · $700M
2022-04-27 · Post-IPO Debt · $1.14B

Leadership Team

Heng Xia
Co-Founder / President
Xiaopeng He
Chairman, Co-founder
Company data provided by Crunchbase