Senior Observability Platform Engineer jobs in United States
This job has closed.

ExecutivePlacements.com · 2 days ago

Senior Observability Platform Engineer

ExecutivePlacements.com offers an exciting opportunity for software engineers passionate about open source software, Linux, Kubernetes, and Observability. The Senior Observability Platform Engineer will be responsible for designing and implementing a comprehensive observability stack to ensure the reliability and performance of services across various cloud platforms and infrastructures.

Human Resources, Online Portals, Recruiting

Responsibilities

Design, develop, implement, and maintain our comprehensive observability stack, including tracing, telemetry, logging, health monitoring, visualization, and dashboards, playing a key role in ensuring the reliability, performance, and operational efficiency of our services
Design and implement a robust observability framework using composable open source solutions like Prometheus, Alertmanager, OpenTelemetry, Grafana, Alloy, Loki, Promtail, Tempo, Thanos, ELK stack, Zabbix, and similar tools
Develop and maintain health monitoring and alerting systems for our compute platforms, databases, and network infrastructure, as well as Kubernetes-based platforms, including GPU-supported environments
Create and manage visualization dashboards to monitor system performance, resource utilization, and operational health
Implement scalable, distributed logging and tracing solutions to diagnose, troubleshoot, and resolve system issues effectively
Collaborate with development and operations teams to integrate observability practices into the development lifecycle
Conduct performance analysis and optimization to ensure system reliability and efficiency
Stay updated with the latest trends and technologies in observability and performance monitoring
Collaborate with cross-functional teams (Cloud Engineering, Network, and DevOps/Solutions Engineering) to troubleshoot and resolve infrastructure issues

Qualifications

Observability tools, Kubernetes, Scripting languages, Infrastructure-as-code, Performance analysis, Problem-solving, Communication skills

Required

Designing, developing, implementing, and maintaining a comprehensive observability stack, including tracing, telemetry, logging, health monitoring, visualization, and dashboards
Design and implement a robust observability framework using composable open source solutions
Develop and maintain health monitoring and alerting systems for compute platforms, databases, network infrastructure, and Kubernetes-based platforms
Create and manage visualization dashboards to monitor system performance, resource utilization, and operational health
Implement scalable, distributed logging and tracing solutions to diagnose, troubleshoot, and resolve system issues effectively
Collaborate with development and operations teams to integrate observability practices into the development lifecycle
Conduct performance analysis and optimization to ensure system reliability and efficiency
Stay updated with the latest trends and technologies in observability and performance monitoring
Collaborate with cross-functional teams to troubleshoot and resolve infrastructure issues
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field

Preferred

Proven experience in observability, system and network monitoring, and system performance analysis, particularly in a cloud or data center environment
Expertise in implementing and managing observability tools and technologies, including composable open source solutions such as Prometheus, Alertmanager, OpenTelemetry, Grafana, Alloy, Loki, Promtail, Tempo, Thanos, ELK stack, and Zabbix, as well as similar commercial solutions
Hands-on experience with Kubernetes
Experience with infrastructure-as-code and configuration management tools such as Consul, GitHub, Salt Stack, and Terraform
Proficiency in scripting and automation using languages such as Go, Python, and shell
Excellent problem-solving skills and the ability to work independently or as part of a team
Strong communication skills and the ability to work in a fast-paced, dynamic environment

Company

ExecutivePlacements.com

Online recruitment

Funding

Current Stage
Early Stage
Company data provided by Crunchbase