Apple · 3 weeks ago
Site Reliability Engineer - AiDP Production Engineering
Apple is a leading technology company that focuses on innovation and efficiency. The Site Reliability Engineer within the AiDP Production Engineering team is responsible for configuring and ensuring the resilience of complex systems while managing critical data pipelines across various platforms. This role involves hands-on technical execution and strategic architectural design to enhance the overall performance and stability of Apple's data infrastructure.
Apps, Artificial Intelligence (AI), Broadcasting, Digital Entertainment, Foundational AI, Media and Entertainment, Mobile Devices, Operating Systems, TV, Wearables
Responsibilities
Understand application requirements (performance, security, scalability, etc.) and assess the right services and topology on AWS, bare metal, and Kubernetes
Build automation to enable self-healing systems
Build tools to monitor and alert on high-performance, low-latency applications
Troubleshoot application-specific, core network, system, and performance issues
Take part in challenging, fast-paced projects that support Apple’s business by delivering innovative solutions
Partner with engineering teams to prioritize and fix production defects
Receive knowledge transfer from engineering teams for changes being rolled out to production
Triage incidents based on impact, then devise and implement mitigation steps to unblock the business
Conduct root cause analysis (RCA), log defects, and partner with engineering teams on prioritization
Support Java-based applications and Spark/Flink jobs on bare metal, AWS, and Kubernetes
Share on-call rotation with other team members to support apps and services in scope
Qualifications
Required
4+ years of experience with cloud-native services, including ETL frameworks such as Apache Spark and Flink
4+ years of experience with messaging systems (Kafka) and cloud infrastructure and services (AWS, GCP, Kubernetes)
4+ years of experience in modern & distributed databases such as Snowflake, Cassandra, SingleStore, and SAP HANA
4+ years of programming experience in Python, Java
BS/MS in computer science or equivalent experience
Preferred
Solid understanding of system design, data structures, and incident management best practices
Should be able to understand complex architectures and be comfortable working with multiple teams
Experience with observability tools (e.g., Prometheus, Grafana, CloudWatch)
Ability to conduct performance analysis and troubleshoot large-scale distributed systems
Should be highly proactive with a keen focus on improving uptime/availability of our mission-critical services
Strong expertise in troubleshooting complex production issues
Excellent problem solving, critical thinking, and communication skills
Proven ability to resolve incidents, perform root cause analysis, and drive system reliability improvements
Experience using GenAI or automation tools for issue detection, alerting, or remediation
Experience in data visualization tools such as Tableau, Business Objects, ThoughtSpot
Company
Apple
Apple is a technology company that designs, manufactures, and markets consumer electronics, personal computers, and software.
H1B Sponsorship
Apple has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for your reference. (Data powered by the US Department of Labor)
[Chart: Distribution of Different Job Fields Receiving Sponsorship]
Trends of Total Sponsorships
2025 (6998)
2024 (3766)
2023 (3939)
2022 (4822)
2021 (4060)
2020 (3656)
Funding
Current Stage
Public Company
Total Funding
$5.67B
Key Investors
Berkshire Hathaway, Microsoft, Sequoia Capital
2025-05-05 · Post-IPO Debt · $4.5B
2025-01-16 · Post-IPO Debt · $0.31M
2021-04-30 · Post-IPO Equity
Leadership Team
Tim Cook
CEO
Craig Federighi
SVP, Software Engineering
Company data provided by Crunchbase