This job has closed.

MDAEdge · 1 month ago

Site Reliability Engineer

MDAEdge is a company focused on enhancing the reliability, scalability, and performance of enterprise platforms. The Site Reliability Engineer role involves hands-on engineering with automation and observability while collaborating across functions to ensure operational excellence and metrics-driven improvements.

Human Resources

Responsibilities

Design, implement, and maintain reliable, scalable, and secure systems in cloud and on-premises environments
Manage distributed systems on Azure, Linux RHEL7+, and Windows Server 2019+
Build automation workflows using Python, Go, and Bash scripting
Develop Infrastructure-as-Code with Terraform and Ansible
Define, monitor, and refine SLIs, SLOs, and SLAs for service quality
Reduce operational toil through automation and process enhancements
Integrate systems with observability platforms for visibility and proactive issue detection
Troubleshoot incidents, lead response efforts, and conduct post-mortem analyses
Collaborate with software, infrastructure, and business teams to deliver resilient services
Take full ownership of optimizing reliability, performance, and maintainability

Qualifications

Site Reliability Engineering, Cloud Platforms, Azure, Infrastructure as Code, Linux RHEL7+, Windows Server 2019+, Python, Terraform, Ansible, Networking Fundamentals, NFS, SAN, NAS, DNS, LDAP, Kerberos, Centrify, Go, Bash, Observability Platforms, SLIs, SLOs, SLAs, Toil Reduction, Incident Response, Post-Mortems, Automation, Metrics-Driven Engineering, System Reliability, Cross-Functional Collaboration

Required

Demonstrate proven experience as a Site Reliability Engineer, with a background in software engineering, infrastructure, or operations
Show hands-on expertise with Azure and enterprise operating systems such as Linux RHEL7+ and Windows Server 2019+
Possess strong knowledge of networking and storage including NFS, SAN, and NAS
Understand authentication and naming services such as DNS, LDAP, Kerberos, and Centrify
Exhibit proficiency in Python, Go, and Bash scripting, and in the IaC tools Terraform and Ansible
Design and monitor SLIs/SLOs/SLAs to drive reliability via metrics and automation
Integrate with observability platforms for logs, metrics, and tracing
Remain calm and structured during high-pressure incidents
Display strong communication and collaboration skills to influence cross-functional stakeholders
Maintain a proactive ownership mindset focused on continuous improvement

Company

MDAEdge

At MDAEdge, we help our clients reinvent innovation, optimize operations, and reshape perceptions—ensuring they remain at the forefront in today’s fast-evolving world.

Funding

Current Stage: Growth Stage
Company data provided by crunchbase