Apple · 2 weeks ago
Senior Software Engineer, Model Inference
Apple is seeking a Senior Software Engineer to join its Maps team, focusing on building advanced deep learning and large language models for high-performance inference services. This role involves designing and implementing large-scale systems, collaborating with research and product teams, and ensuring the reliability and efficiency of model deployment in production environments.
Apps, Artificial Intelligence (AI), Broadcasting, Digital Entertainment, Foundational AI, Media and Entertainment, Mobile Devices, Operating Systems, TV, Wearables
Responsibilities
Own the technical architecture of large-scale ML inference platforms, defining long-term design direction for serving deep learning and large language models across Apple Maps
Lead system-level optimization efforts across the inference stack, balancing latency, throughput, accuracy, and cost through advanced techniques such as quantization, kernel fusion, speculative decoding, and efficient runtime scheduling
Design and evolve control-plane services responsible for model lifecycle management, including deployment orchestration, versioning, traffic routing, rollout strategies, capacity planning, and failure handling in production environments
Drive adoption of platform abstractions and standards that enable partner teams to onboard, deploy, and operate models reliably and efficiently at scale
Partner closely with research, product, and infrastructure teams to translate model requirements into production-ready systems, providing technical guidance and feedback to influence upstream model design
Optimize inference execution across heterogeneous compute environments, including GPUs and specialized accelerators, collaborating with runtime, compiler, and kernel teams to maximize hardware utilization
Establish robust observability and performance diagnostics, defining metrics, dashboards, and profiling workflows to proactively identify bottlenecks and guide optimization decisions
Provide technical leadership and mentorship, reviewing designs, setting engineering best practices, and raising the quality bar across teams contributing to the inference ecosystem
Continuously evaluate emerging research and industry trends in LLM inference, distributed systems, and ML infrastructure, driving the transition of high-impact ideas into production systems
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience)
5+ years in software engineering focused on ML inference, GPU acceleration, and large-scale systems
Expertise in deploying and optimizing LLMs for high-performance, production-scale inference
Proficiency in Python, Java, or C++
Experience with deep learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers
Experience with model serving tools (e.g., NVIDIA Triton, TensorFlow Serving, vLLM)
Experience with optimization techniques like attention fusion, quantization, and speculative decoding
Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks
Skilled in cloud technologies like Kubernetes, Ingress, and HAProxy for scalable deployment
Preferred
Master's or PhD in Computer Science, Machine Learning, or a related field
Understanding of MLOps practices, continuous integration, and deployment pipelines for machine learning models
Familiarity with model distillation, low-rank approximations, and other model compression techniques for reducing memory footprint and improving inference speed
Strong understanding of distributed systems, multi-GPU/multi-node parallelism, and system-level optimization for large-scale inference
Benefits
Comprehensive medical and dental coverage
Retirement benefits
A range of discounted products and free services
Reimbursement for certain educational expenses — including tuition
Discretionary bonuses or commission payments
Relocation
Company
Apple
Apple is a technology company that designs, manufactures, and markets consumer electronics, personal computers, and software.
H1B Sponsorship
Apple has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by US Department of Labor)
Trends of Total Sponsorships
2025 (6998)
2024 (3766)
2023 (3939)
2022 (4822)
2021 (4060)
2020 (3656)
Funding
Current Stage: Public Company
Total Funding: $5.67B
Key Investors: Berkshire Hathaway, Microsoft, Sequoia Capital
2025-05-05 · Post-IPO Debt · $4.5B
2025-01-16 · Post-IPO Debt · $0.31M
2021-04-30 · Post-IPO Equity
Leadership Team
Tim Cook
CEO
Craig Federighi
SVP, Software Engineering
Company data provided by Crunchbase