Lead Observability Engineer jobs in United States

Skyline Technology Solutions · 3 weeks ago

Lead Observability Engineer

Skyline Technology Solutions is seeking a Lead Observability Engineer to serve as the technical authority for monitoring and reliability insights across platforms and services. The role owns the architecture and operation of the observability ecosystem and ensures engineering teams have the visibility they need to build resilient systems.

Consulting · Cyber Security · Information Technology · Security · Video

Responsibilities

Architect, implement, and operate the full observability stack, including metrics, logging, tracing, dashboards, alerting, and telemetry pipelines
Maintain and optimize Grafana, Loki, Tempo, exporters, agents, and related services to ensure reliability, performance, and scalability
Ensure high-quality, consistent telemetry across all environments
Define organizational standards for instrumentation, dashboards, alerts, SLIs, and SLOs
Partner with engineering teams to guide adoption of reliability and observability best practices
Improve signal-to-noise ratio in alerts and evolve incident visibility and analysis frameworks
Collaborate with Platform, Application, Security, and Network Engineering teams to ensure observability is embedded into architecture and operational workflows
Provide expert guidance on system behavior, failure modes, performance patterns, and telemetry-driven insights

Qualifications

Observability engineering · Linux systems engineering · Kubernetes · Infrastructure automation · Log aggregation systems · Distributed systems · Compliance frameworks · Technical leadership · Collaboration skills · Problem-solving skills

Required

Bachelor's degree in Computer Science, Networking, Telecommunications, or related technical field
8+ years of experience in systems engineering, SRE, platform engineering, or infrastructure operations roles in large-scale, high-availability environments
Expertise in observability engineering (metrics, logs, traces, dashboards, alerting, SLOs/SLIs) and in Linux systems engineering, including OS tuning, benchmarking, and troubleshooting at scale
Experience with log aggregation and search systems (Splunk, Elasticsearch), message brokers (RabbitMQ, Kafka), and system monitoring tools (Zabbix, Grafana)
Proven hands-on experience operating Linux systems (RHEL, Ubuntu, CentOS) at scale, including performance tuning, benchmarking, hardening, and troubleshooting
Demonstrated experience with observability tooling such as Splunk, Elasticsearch, Graphite, Zabbix, log pipelines, and metrics systems
Proficiency with Kubernetes, Docker, CI/CD, and infrastructure automation frameworks such as Ansible, Chef, or Salt
Background in security operations or tooling such as MS Defender, Nessus, Carbon Black, CrowdStrike, IAM, or FIM solutions
Experience designing or supporting disaster recovery, high-availability, and SLA-driven systems for mission-critical services
Direct experience with distributed systems, Kafka-based architectures, or microservices environments
Strong familiarity with compliance frameworks (SOC2, PCI, HITRUST, FedRAMP, CONMON, C5, GDPR) and implementing technical controls in production environments
Demonstrated ability to collaborate across cross-functional engineering, security, and compliance teams and lead technical initiatives without direct authority
Experience supporting or designing multi-datacenter infrastructure or hybrid cloud environments
Prior leadership experience in SRE, platform engineering, or cloud operations teams within enterprise-scale organizations

Preferred

Professional certifications such as CISSP, CISM, PMP, ITIL, or AWS/Azure

Benefits

Medical Insurance
Vision Insurance
Dental Insurance
FSA Plan
Paid Time Off
401K Retirement Savings Plan
Training & Tuition Assistance
Disability & Life Insurance

Company

Skyline Technology Solutions

Skyline Technology Solutions is a technology consulting firm focusing on IT services, video sharing, and cybersecurity.

Funding

Current Stage
Growth Stage

Leadership Team

Mia Millette
Chief Executive Officer
Paul Lennon
Chief Technology Officer
Company data provided by crunchbase