Site Reliability Engineer (SRE) jobs in United States
This job has closed.

MaTi Group Inc · 5 hours ago

Site Reliability Engineer (SRE)

MaTi Group Inc. is a leading organization specializing in talent acquisition and project development services. They are seeking a Site Reliability Engineer (SRE) to maintain the reliability and availability of systems, troubleshoot technical issues, and develop software to improve system operations.

Computer Software
H1B Sponsored

Responsibilities

Design, build, and operate highly scalable, reliable, and available cloud platforms using AWS and Azure
Apply SRE principles including SLIs, SLOs, and error budgets to balance system reliability and feature velocity
Architect and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline, and automation best practices
Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, and AWS CDK for global infrastructure automation
Lead incident response and on-call operations, following ITIL frameworks and managing workflows in ServiceNow
Perform Root Cause Analysis (RCA) and maintain detailed post-incident documentation and knowledge bases
Drive performance testing, capacity planning, and resiliency testing to ensure the durability of mission-critical systems
Optimize cloud cost management, autoscaling thresholds, and resource utilization across environments
Implement advanced observability, monitoring, and distributed tracing using Dynatrace and Kibana
Build intelligent dashboards and enable proactive anomaly detection to reduce MTTR
Manage Linux-based systems and relational and NoSQL databases, applying a solid grasp of networking fundamentals
Support containerized workloads using Docker and orchestration via Kubernetes or Amazon ECS
Develop automation and tooling using Python or similar scripting languages
Enforce security and compliance best practices, including service account management, certificate management, and rapid remediation of vulnerabilities
Collaborate cross-functionally with development, operations, and security teams, demonstrating strong communication and ownership

Qualifications

Site Reliability Engineering, AWS, Azure, Terraform, CI/CD, Python, Docker, Kubernetes, Linux, Databases, Incident Management, Problem-solving, Attention to detail, Collaboration

Required

Proficiency in Site Reliability Engineering practices and experience in troubleshooting complex technical issues
Software development skills with a strong understanding of programming languages and frameworks
Hands-on experience with system administration, including deploying, configuring, and maintaining systems
Knowledge of infrastructure and cloud technologies to support scalable and reliable systems
Strong problem-solving skills and attention to detail
Ability to collaborate effectively within a hybrid work environment
Relevant certifications in cloud platforms or system administration are advantageous
AWS
Azure
Terraform
CloudFormation
GitHub Actions
CI/CD
SRE
SLIs
SLOs
Error Budgets
Dynatrace
Kibana
ServiceNow
ITIL
Python
Linux
Docker
Kubernetes
ECS
Networking
Databases
Incident Management

Preferred

Familiarity with DevOps tools and practices is a plus

Company

MaTi Group Inc

Welcome to Mati Inc., your premier partner in talent acquisition and project development services.

Funding

Current Stage
Growth Stage
Company data provided by Crunchbase