Idexcel · 20 hours ago
SRE Observability Lead
Idexcel is seeking an SRE Observability Lead to enhance operational efficiency through automation and observability practices. The role involves implementing CI/CD pipelines, managing incident response, and ensuring security and compliance in a hybrid work environment.
Responsibilities
Implement CI/CD pipelines using tools such as GitHub Actions, AWS CodePipeline, and Jenkins
Automate infrastructure provisioning through Infrastructure-as-Code (IaC) using Terraform, CloudFormation, or AWS CDK
Develop automation scripts and self-service tools to enhance operational efficiency
Standardize installation through automation
Integrate with CI/CD pipelines
Enforce tagging and metadata standards
Use environment-aware configuration
Implement distributed tracing with appropriate context propagation
Create and optimize alerts, dashboards, and anomaly detectors
Work within the ITIL framework and ITSM tools such as ServiceNow
Serve as a production on-call responder and troubleshoot incidents
Develop RCA documentation and knowledge articles
Apply SRE principles, including SLIs, SLOs, and error budgets
Implement operational cost optimization initiatives
Configure and maintain auto-scaling policies and thresholds
Develop resiliency test plans and support performance testing
Manage service accounts and access permissions
Create, deploy, and manage digital certificates
Respond to security incidents and execute remediation tasks effectively
Qualifications
Required
Demonstrated expertise in standardized installation through automation
Integrating with CI/CD pipelines
Enforcing tagging and metadata standards
Use of environment-aware configuration
Proficient in ITIL framework and ITSM tools such as ServiceNow
Production on-call responder with strong troubleshooting capabilities
Bachelor's degree in Computer Science, Engineering, or related field
2 to 4 years of experience in DevOps, SRE, or infrastructure roles
Mid-level proficiency in Python or other scripting languages
Mid-level proficiency in configuration management tools, including Ansible
Practical experience with cloud platforms – AWS and Azure
Knowledge of containerization (Docker, Kubernetes/ECS)
Knowledge of Linux systems and networking
Knowledge of relational, cloud, and NoSQL databases
Excellent written and verbal communication skills
Demonstrated ability to work independently and manage priorities
Availability to work outside of standard business hours as required
Company
Idexcel
Idexcel is a Professional Services and Technology Solutions provider specializing in Cloud Services, Cloud Native Services, Data Platforms and Intelligence, Automation & AI.
H1B Sponsorship
Idexcel has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is presented below for your reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (99)
2024 (116)
2023 (126)
2022 (198)
2021 (201)
2020 (282)
Funding
Current Stage
Late Stage
Company data provided by Crunchbase