Ahold Delhaize USA · 10 hours ago
Senior Site Reliability Engineer
Ahold Delhaize USA, a division of global food retailer Ahold Delhaize, is part of a U.S. family of brands that includes leading omnichannel grocery brands. The Senior Site Reliability Engineer is responsible for ensuring the scalability, reliability, and performance of production systems through automation, observability, and infrastructure engineering.
Responsibilities
Design and implement infrastructure solutions that ensure system availability, scalability, and reliability across cloud-native environments such as AKS and Kubernetes
Develop automation for provisioning, deployment, configuration, monitoring, and incident remediation using tools such as Terraform, ArgoCD, and GitHub Actions
Collaborate with engineering teams to define and track service level objectives (SLOs) and service level indicators (SLIs)
Build and manage microservices-based platforms leveraging Spring Boot, Java, Tomcat, and Redis
Monitor production environments using Datadog and proactively address performance and reliability issues
Perform root cause analysis and lead post-incident reviews to drive continual improvement
Manage CI/CD pipelines and deployment automation using GitHub, Docker, and container orchestration technologies
Create and maintain infrastructure as code (IaC) using Terraform, with deployment pipelines integrated into GitOps workflows
Lead and support operational readiness reviews, game days, chaos engineering practices, and failure mode analysis
Build scalable observability and alerting frameworks with Datadog
Implement resilient, asynchronous architectures using Kafka for event-driven services
Reduce operational toil through self-healing automation and proactive system tuning
Troubleshoot Linux-based environments such as Ubuntu and optimize them for reliability
Provide on-call support and ensure 24/7/365 system reliability for mission-critical applications
Collaborate with the security team to enforce secure operational practices and cloud compliance
Mentor junior engineers and contribute to documentation, technical design, and knowledge-sharing across the organization
Qualifications
Required
Bachelor's Degree in Computer Science, Information Systems, or a related technical field; equivalent training, certifications, or experience will be considered
5+ years of experience in a Site Reliability Engineering, DevOps, or Java programming role
Experience managing production-grade systems and services on AKS/Kubernetes in distributed environments
Proficiency in programming and scripting languages such as Python, Java, Bash, or Go
Proven experience with Spring Boot, Tomcat, Redis, and microservices architecture
Hands-on experience in managing Linux environments, particularly Ubuntu
Proficiency with observability stacks and performance monitoring using Datadog, Prometheus, and ELK
Deep understanding of containerization and orchestration using Docker, Kubernetes, and ArgoCD
Experience managing event-driven systems using Kafka
Expertise in IaC and automation using Terraform and GitHub Actions
Familiarity with networking concepts, DNS, load balancing, and cloud infrastructure (AWS, Azure, or GCP)
Strong analytical, debugging, and problem-solving skills
Excellent verbal and written communication skills and the ability to collaborate effectively across teams
Benefits
Diversity, Equity, Inclusion and Belonging
Total wellness, which encompasses a blend of physical, financial and emotional wellness
Collaboration, curiosity, and continuous learning
Company
Ahold Delhaize USA
Ahold Delhaize USA provides retail media solutions for brands to advertise to grocery shoppers across ADUSA brands using data-driven tools.
Funding
Current Stage
Late Stage
Company data provided by crunchbase