Jobs via Dice · 20 hours ago
Senior Site Reliability Engineer
Dice is the leading career destination for tech experts at every stage of their careers. Our client, VDart, Inc., is seeking a Senior Site Reliability Engineer to own the reliability and performance of API Gateway services while implementing best practices in SRE. The role involves managing Kubernetes clusters, developing CI/CD pipelines, and collaborating with security teams on compliance controls.
Computer Software
Responsibilities
Own reliability, availability, scalability, and performance of API Gateway services running on Kubernetes
Design and implement SRE best practices including SLIs, SLOs, SLAs, error budgets, and incident management
Lead production readiness reviews, root cause analysis (RCA), and post-incident improvements
Drive capacity planning, performance tuning, and resilience testing
Manage and optimize Kubernetes clusters (EKS / AKS / GKE / On-prem)
Develop and maintain Helm charts, manifests, and deployment strategies
Implement rollout strategies such as blue-green, canary, and rolling deployments
Collaborate with development teams to ensure cloud-native design patterns
Build and maintain enterprise-grade observability (O11y) solutions:
Prometheus & Grafana for metrics and dashboards
Splunk for centralized logging and alerting
OpenTelemetry for distributed tracing
Define actionable alerts and dashboards for platform and application health
Improve MTTR through better visibility and automation
Design and maintain CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, etc.)
Automate infrastructure using Infrastructure as Code (Terraform, CloudFormation, etc.)
Develop automation scripts using Python, Bash, or Groovy
Implement DevSecOps practices including secrets management, image scanning, and RBAC
Work closely with security teams on vulnerability remediation and compliance controls
Actively contribute to POCs for AI Gateway / Intelligent API Gateway initiatives
Evaluate and prototype integrations with AI/ML-driven routing, observability, and security features
Stay current with emerging SRE, cloud, and AI gateway technologies
Qualifications
Required
7+ years of experience
Strong troubleshooting and problem-solving skills
Ability to work cross-functionally with developers, architects, and security teams
Proactive mindset with a passion for automation and reliability
Good documentation and communication skills
Key Skills: SRE, DevOps, Java, Kubernetes, Observability
Company
Jobs via Dice
Welcome to Jobs via Dice, the go-to destination for discovering the tech jobs you want.
Funding
Current Stage
Early Stage
Company data provided by Crunchbase