Site Reliability Engineer (Must have an active TS/SCI with polygraph) jobs in United States

Aperio Global · 2 weeks ago

Site Reliability Engineer (Must have an active TS/SCI with polygraph)

Aperio Global is seeking a Site Reliability Engineer who will support critical missions requiring an active U.S. Government Security Clearance at the TS/SCI level with a required polygraph. The role involves designing and building monitoring and observability solutions to ensure the reliability and performance of complex, cloud-native systems, while collaborating with various teams to troubleshoot and improve system behavior.

Artificial Intelligence (AI), Cyber Security
No H1B, Security Clearance Required, U.S. Citizen Only

Responsibilities

Define and uphold standards for monitoring reliability, availability, performance, and maintainability of sponsor-owned systems
Design and architect operational solutions that support both applications and infrastructure
Drive service acceptance by introducing new operational processes, monitoring strategies, and automation to reduce risk and repeat issues
Partner closely with service and product owners to define key performance indicators (KPIs) and identify meaningful trends
Provide deep, hands-on troubleshooting support for production issues
Work with service owners to quickly identify root causes and restore services during performance or availability incidents
Build or leverage tools that correlate data across multiple systems to accelerate root-cause analysis
Coordinate with the sponsor during major incidents, large-scale deployments, and SecOps user support activities

Qualifications

TS/SCI clearance, Kubernetes, AWS cloud, Monitoring solutions, Python, Java scripting, Unix/Linux, CI/CD pipelines, Leadership during incidents, Collaboration, Detail-oriented

Required

Active/current TS/SCI with required polygraph
Bachelor's degree in Computer Science or a related field
5+ years of relevant engineering experience
Hands-on experience with Kubernetes, Docker, Helm, and CI/CD pipelines (e.g., Jenkins or Concourse)
Familiarity with distributed version control systems such as Git
Experience working in AWS cloud environments
Proven experience implementing monitoring and observability solutions across complex systems and data feeds
Proficiency in Python and Java scripting
Advanced knowledge of Unix/Linux, with strong command-line comfort
Willingness to work onsite full time and participate in on-call rotations
A collaborative mindset and a sense of ownership when things go wrong

Preferred

Experience with additional cloud providers beyond AWS
Familiarity with AWS CloudWatch or other native monitoring tools
Experience using Prometheus, Grafana, or similar tools to monitor ETL pipelines, APIs, servers, networks, C2S services, and AI/ML platforms
Strong understanding of networking fundamentals
Experience with incident and problem management processes
Root Cause Analysis (RCA) experience
Exposure to ETL workflows and data pipelines
Organized, detail-oriented, and comfortable documenting and communicating work
Willingness to step into leadership roles during incidents, guiding others and driving issues to resolution

Benefits

Health Care Plan (Medical, Dental & Vision)
Retirement Plan (401k, IRA)
Life Insurance (Basic, Voluntary & AD&D)
Paid Time Off (Vacation, Sick & Public Holidays)
Short Term & Long Term Disability
(and much more)

Company

Aperio Global

Aperio Global provides professional and innovative cyber security and artificial intelligence solutions.

Funding

Current Stage
Early Stage

Leadership Team

Earl Stafford, Jr.
Chief Executive Officer
Company data provided by Crunchbase