Idexcel · 8 hours ago
Site Reliability Engineer
Idexcel is seeking a Site Reliability Engineer to improve its operational efficiency through automation and to implement CI/CD pipelines. The role involves capacity planning, incident management, and the use of observability tools to ensure system reliability and performance.
Responsibilities
Implement CI/CD pipelines using tools such as GitHub Actions, AWS CodePipeline, and Jenkins
Automate infrastructure provisioning through Infrastructure-as-Code (IaC) using Terraform, CloudFormation, or AWS CDK
Develop automation scripts and self-service tools to enhance operational efficiency
Implement operational cost optimization initiatives
Configure and maintain auto-scaling policies and thresholds
Develop resiliency test plans and support performance testing
Work within the ITIL framework and use ITSM tools such as ServiceNow
Serve as a production on-call responder and troubleshoot incidents
Develop RCA documentation and knowledge articles
Apply SRE principles, including SLIs, SLOs, and error budgets
Work hands-on with observability tools such as Dynatrace, AppDynamics, ELK, and similar platforms
Implement distributed tracing with appropriate context propagation
Optimize monitoring queries, create dashboards, alerts, and anomaly detectors using Dynatrace and Kibana
Manage service accounts and access permissions
Create, deploy, and manage digital certificates
Respond to security incidents and execute remediation tasks effectively
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or related field
2 to 4 years of experience in DevOps, SRE, or infrastructure roles
Mid-level proficiency in Python or other scripting languages
Mid-level proficiency in configuration management tools, including Ansible
Practical experience with cloud platforms – AWS and Azure
Knowledge of containerization (Docker, Kubernetes/ECS)
Knowledge of Linux systems and networking
Knowledge of relational, cloud, and NoSQL databases
Excellent written and verbal communication skills
Demonstrated ability to work independently and manage priorities
Availability to work outside of standard business hours as required
Preferred
Experience with Dynatrace
Company
Idexcel
Idexcel is a Professional Services and Technology Solutions provider specializing in Cloud Services, Cloud Native Services, Data Platforms and Intelligence, Automation & AI.
H1B Sponsorship
Idexcel has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Distribution of Different Job Fields Receiving Sponsorship (chart not reproduced)
Trends of Total Sponsorships
2025 (99)
2024 (116)
2023 (126)
2022 (198)
2021 (201)
2020 (282)
Funding
Current Stage
Late Stage
Company data provided by Crunchbase