ExecutivePlacements.com · 2 days ago
Senior Observability Platform Engineer
ExecutivePlacements.com offers an exciting opportunity for software engineers passionate about open source software, Linux, Kubernetes, and Observability. The Senior Observability Platform Engineer will be responsible for designing and implementing a comprehensive observability stack to ensure the reliability and performance of services across various cloud platforms and infrastructures.
Human Resources, Online Portals, Recruiting
Responsibilities
Responsible for designing, developing, implementing, and maintaining our comprehensive observability stack, including tracing, telemetry, logging, health monitoring, visualization, and dashboards. You will play a key role in ensuring the reliability, performance, and operational efficiency of our services
Design and implement a robust observability framework using composable open source solutions like Prometheus, Alertmanager, OpenTelemetry, Grafana, Alloy, Loki, Promtail, Tempo, Thanos, ELK stack, Zabbix, and similar
Develop and maintain health monitoring and alerting systems for our compute platforms, databases, and network infrastructure, as well as Kubernetes-based platforms, including GPU-supported environments
Create and manage visualization dashboards to monitor system performance, resource utilization, and operational health
Implement scalable, distributed logging and tracing solutions to diagnose, troubleshoot, and resolve system issues effectively
Collaborate with development and operations teams to integrate observability practices into the development lifecycle
Conduct performance analysis and optimization to ensure system reliability and efficiency
Stay updated with the latest trends and technologies in observability and performance monitoring
Collaborate with cross-functional teams (Cloud Engineering, Network, and DevOps/Solutions Engineering) to troubleshoot and resolve infrastructure issues
Qualifications
Required
Designing, developing, implementing, and maintaining a comprehensive observability stack, including tracing, telemetry, logging, health monitoring, visualization, and dashboards
Design and implement a robust observability framework using composable open source solutions
Develop and maintain health monitoring and alerting systems for compute platforms, databases, network infrastructure, and Kubernetes-based platforms
Create and manage visualization dashboards to monitor system performance, resource utilization, and operational health
Implement scalable, distributed logging and tracing solutions to diagnose, troubleshoot, and resolve system issues effectively
Collaborate with development and operations teams to integrate observability practices into the development lifecycle
Conduct performance analysis and optimization to ensure system reliability and efficiency
Stay updated with the latest trends and technologies in observability and performance monitoring
Collaborate with cross-functional teams to troubleshoot and resolve infrastructure issues
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field
Preferred
Proven experience in observability, system and network monitoring, and system performance analysis, particularly in a cloud or data center environment
Expertise in implementing and managing observability tools and technologies, including composable open source solutions such as Prometheus, Alertmanager, OpenTelemetry, Grafana, Alloy, Loki, Promtail, Tempo, Thanos, ELK stack, and Zabbix, as well as similar commercial solutions
Hands-on experience with Kubernetes
Experience with infrastructure-as-code and configuration management tools such as Consul, GitHub, Salt Stack, Terraform, etc
Proficiency in scripting and automation using languages such as Go, Python, Shell
Excellent problem-solving skills and the ability to work independently or as part of a team
Strong communication skills and the ability to work in a fast-paced, dynamic environment
Company
ExecutivePlacements.com
Online recruitment
Funding
Current Stage
Early Stage
Company data provided by Crunchbase