Scalence L.L.C. · 3 months ago
DevOps Engineer - Lead
Scalence L.L.C. is seeking a Lead DevOps Engineer to implement and manage full-stack observability using Datadog. The role involves designing and deploying key service monitoring, collaborating with development, SRE, and DevOps teams, and automating monitoring configurations.
Information Technology & Services
Responsibilities
Implement and manage full-stack observability using Datadog, ensuring seamless monitoring across infrastructure, applications, and services
Instrument agents for on-premises, cloud, and hybrid environments to enable comprehensive monitoring
Design and deploy key service monitoring, including dashboards, monitor creation, SLA/SLO definitions, and anomaly detection with alert notifications
Configure and integrate Datadog with third-party services such as ServiceNow and other ITSM tools, and enable SSO
Build and maintain comprehensive observability platforms that provide deep insights into complex systems, incorporating logs, metrics, and traces
Instrument applications, infrastructure, and services to collect telemetry data using frameworks like OpenTelemetry
Develop dashboards, reports, and alerts using tools like Prometheus, Grafana, and Splunk to visualize system performance and detect issues
Work with development, SRE, and DevOps teams to integrate observability best practices and align monitoring with business and operational goals
Develop scripts and use Infrastructure as Code (IaC) tools like Ansible and Terraform to automate monitoring configurations and telemetry collection
Qualifications
Required
Proficiency in monitoring, logging, and tracing tools, including Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, Datadog, New Relic, and cloud-native solutions like AWS CloudWatch
Expertise in languages such as Python and Go for scripting and automation
Experience with cloud platforms (AWS, GCP, Azure) and container orchestration systems like Kubernetes
Familiarity with Terraform and Ansible for managing infrastructure and configurations
Experience with CI/CD pipelines and automation tools like Jenkins
A strong background in both system operations and software development
Experience optimizing cloud agent instrumentation; cloud certifications are a plus
Mandatory certifications: Datadog Fundamentals, APM and Distributed Tracing Fundamentals, and Datadog Demo Certification
Strong understanding of observability concepts (logs, metrics, and traces)
Expertise in security and vulnerability management for observability platforms
2 years of experience in cloud-based observability solutions, specializing in monitoring, logging, and tracing across AWS, Azure, and GCP environments
Company
Scalence L.L.C.
In today’s dynamic and competitive market, success hinges on mastering three key areas: Data Intelligence, Business Resilience, and Digital Experience.
Funding
Current Stage
Late Stage
Company data provided by Crunchbase