Skyline Technology Solutions · 3 weeks ago
Lead Observability Engineer
Skyline Technology Solutions is seeking a Lead Observability Engineer who will serve as the technical authority for monitoring and reliability insights across platforms and services. The role involves owning the architecture and operation of the observability ecosystem while ensuring engineering teams have the visibility required for resilient systems.
Consulting, Cyber Security, Information Technology, Security, Video
Responsibilities
Architect, implement, and operate the full observability stack, including metrics, logging, tracing, dashboards, alerting, and telemetry pipelines
Maintain and optimize Grafana, Loki, Tempo, exporters, agents, and related services to ensure reliability, performance, and scalability
Ensure high-quality, consistent telemetry across all environments
Define organizational standards for instrumentation, dashboards, alerts, SLIs, and SLOs
Partner with engineering teams to guide adoption of reliability and observability best practices
Improve signal-to-noise ratio in alerts and evolve incident visibility and analysis frameworks
Collaborate with Platform, Application, Security, and Network Engineering teams to ensure observability is embedded into architecture and operational workflows
Provide expert guidance on system behavior, failure modes, performance patterns, and telemetry-driven insights
Qualifications
Required
Bachelor's degree in Computer Science, Networking, Telecommunications, or related technical field
8+ years of experience in systems engineering, SRE, platform engineering, or infrastructure operations roles in large-scale, high-availability environments
Expertise in observability engineering (metrics, logs, traces, dashboards, alerting, SLOs/SLIs) and in Linux systems engineering, OS tuning, benchmarking, and troubleshooting at scale
Experience with log aggregation and search systems (Splunk, Elasticsearch), message brokers (RabbitMQ, Kafka), and system monitoring tools (Zabbix, Grafana)
Proven hands-on experience operating Linux systems (RHEL, Ubuntu, CentOS) at scale, including performance tuning, benchmarking, hardening, and troubleshooting
Demonstrated experience with observability tooling such as Splunk, Elasticsearch, Graphite, Zabbix, log pipelines, and metrics systems
Proficiency with Kubernetes, Docker, CI/CD, and infrastructure automation frameworks such as Ansible, Chef, or Salt
Background in security operations or tooling such as MS Defender, Nessus, Carbon Black, CrowdStrike, IAM, or FIM solutions
Experience designing or supporting disaster recovery, high-availability, and SLA-driven systems for mission-critical services
Direct experience with distributed systems, Kafka-based architectures, or microservices environments
Strong familiarity with compliance frameworks (SOC 2, PCI, HITRUST, FedRAMP, CONMON, C5, GDPR) and with implementing technical controls in production environments
Demonstrated ability to collaborate across cross-functional engineering, security, and compliance teams and lead technical initiatives without direct authority
Experience supporting or designing multi-datacenter infrastructure or hybrid cloud environments
Prior leadership experience in SRE, platform engineering, or cloud operations teams within enterprise-scale organizations
Preferred
Professional certifications: CISSP, CISM, PMP, ITIL, AWS/Azure
Benefits
Medical Insurance
Vision Insurance
Dental Insurance
FSA Plan
Paid Time Off
401(k) Retirement Savings Plan
Training & Tuition Assistance
Disability & Life Insurance
Company
Skyline Technology Solutions
Skyline Technology Solutions is a technology consulting firm focusing on IT services, video sharing, and cybersecurity.
Funding
Current Stage
Growth Stage
Recent News
Maryland Daily Record
2024-05-21
The Business Journals
2024-04-09