DMSi Software · 1 day ago

Site Reliability Engineer

DMSi Software is seeking a Site Reliability Engineer to enhance its application monitoring and alerting systems. The role involves collaborating with various teams to optimize system observability and provide actionable insights that improve the user experience.

Information Technology, Robotics, Software

Responsibilities

Evaluate existing monitoring systems and implement improvements to ensure comprehensive observability across all systems and environments
Develop and maintain dashboards and reports that provide real-time visibility into system health, capacity/utilization trends, and performance
Ensure that the overall system environment operates nominally by monitoring critical performance indicators
Provide insights into system status that help maintain a smooth and uninterrupted user experience
Review and refine alerting mechanisms to minimize false positives and ensure timely and accurate notifications for critical issues
Develop escalation processes and response playbooks to streamline incident management
Analyze monitoring data to identify trends, anomalies, and potential areas of improvement
Provide actionable insights to relevant teams and support data-driven decision-making, using machine learning to distinguish normal from abnormal system behavior
Work closely with software engineers, DevOps teams, and other stakeholders to ensure monitoring and alerting systems are aligned with business goals and technical requirements
Develop and maintain automation scripts and tools to streamline monitoring and alerting processes, reducing manual effort and improving efficiency
Document monitoring and alerting systems, processes, and best practices
Provide training and guidance to teams on how to use monitoring tools and interpret data
Continuously assess and improve monitoring and alerting strategies to adapt to changing technologies and business needs
Stay updated with industry trends and emerging tools in the observability space

Qualifications

Monitoring tools, Scripting languages, Cloud platforms, Infrastructure-as-code, CI/CD pipelines, Networking, Security, System administration

Required

Strong experience with monitoring and observability tools (e.g., Nagios, Prometheus, Grafana, ELK Stack, Datadog, New Relic)
Proficiency in scripting languages (e.g., Python, Bash, PowerShell) for automation
Familiarity with cloud platforms (AWS, Azure, GCP) and hybrid cloud environments
Understanding of infrastructure-as-code tools (e.g., Terraform, Ansible)
Knowledge of CI/CD pipelines and version control systems (e.g., Jenkins, Git)
Basic understanding of networking, security, and system administration
Bachelor's degree in Computer Science, Engineering, a related field, or equivalent experience
Minimum of 3 years of experience in a Site Reliability Engineering or similar role, with a focus on monitoring and alerting in a SaaS environment

Company

DMSi Software

DMSi is bringing new technology and new ideas to the building materials industry through our specialized business management software.

Funding

Current Stage: Growth Stage

Leadership Team

Brent Heavican, Vice President and CTO
Michael Limas, Executive Vice President and Chief Financial Officer
Company data provided by Crunchbase