Success In Cloud, Inc. · 10 hours ago
Site Reliability Engineer
Success In Cloud, Inc. is a company focused on cloud infrastructure solutions. They are seeking a Site Reliability Engineer (SRE) to operate and enhance their cloud services' uptime, reliability, and performance through automation and collaboration with cross-functional teams.
Responsibilities
Design, build, and maintain highly available, scalable, and secure cloud infrastructure on platforms such as AWS, GCP, or Azure
Develop and implement automation for provisioning, monitoring, scaling, and incident response using Infrastructure-as-Code tools (e.g., Terraform, CloudFormation, Ansible)
Monitor system reliability, capacity, and performance; proactively detect and address issues before they impact users. Experience implementing SRE monitoring and developing application-reliability dashboards using Splunk, Dynatrace, Grafana, AppDynamics, Datadog, and BigPanda
Collaborate with software engineering and security teams to ensure new services and features are production-ready and meet reliability standards
Build and maintain tools for deployment, monitoring, and operations; automate manual processes to reduce toil. Experience with automation principles and tools (e.g., Ansible) and with toil identification
Document operational processes and system architectures to ensure knowledge sharing and repeatability
Qualifications
Required
Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
3+ years of experience in software development with proficiency in at least one programming language (e.g., Python, Go, Java, curl scripting)
Experience administering cloud platforms (AWS, GCP, Azure), including networking, security, containerization, storage, data management, and serverless technologies
Solid understanding of Unix/Linux systems, Windows Server, Oracle, MSSQL, MongoDB, networking fundamentals, virtualized and distributed systems, and file systems
Deep understanding of observability (monitoring, alerting, and logging) tools in cloud environments. Ability to set up and maintain monitoring dashboards, alerts, and logs. Experience with observability tools such as AppDynamics, Geneos, Dynatrace, ECS-based internal tooling, Grafana, Prometheus, Splunk, and ThousandEyes
Familiarity with Continuous Integration/Continuous Deployment (CI/CD) tools for automated testing, deployments, provisioning, and observability
Ability to manage and respond to incidents, perform root cause analysis, and conduct post-mortem reviews
Understanding of setting, monitoring, and maintaining Service-Level Objectives (SLOs) and Service-Level Agreements (SLAs) for system reliability
Preferred
5+ years of experience in SRE, DevOps, infrastructure, or cloud engineering roles, preferably supporting large-scale, distributed systems
Excellent problem-solving, troubleshooting, and communication skills
Experience leading technical projects or mentoring junior engineers
Certifications: Certified Engineer, DevOps, SRE, CSREF
Company
Success In Cloud, Inc. is a Salesforce Cloud Alliance Partner located in Frisco, Texas.
H1B Sponsorship
Success In Cloud, Inc. has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (13)
2024 (2)
2023 (3)
2022 (2)
2021 (4)
2020 (3)
Funding
Current Stage
Early Stage
Company data provided by Crunchbase