
BeaconFire Inc.

Site Reliability Engineer

Austin, TX · Contract · Hybrid · Senior Level · 7+ years of experience

BeaconFire Inc. is seeking a highly skilled Site Reliability Engineer to join its team, based primarily in Austin, TX. The role involves leading transformational initiatives within IT operations, with a focus on observability assessments, data collection, and automation to improve operational excellence and reliability.

Artificial Intelligence (AI) · Information Technology · IT Infrastructure · Machine Learning · Manufacturing

H1B Sponsor Likely

Hiring Manager: Amit Jha

Responsibilities

Participate in the design and architecture of reliable, scalable, high-performance systems and services, with a focus on operational excellence, availability, and performance

Primary skill set: expertise in observability as a service, dashboards as a service, monitoring as a service, and alerting as a service across all technology domains (application, infrastructure, database, security, middleware, network, etc.)

Collect telemetry data using Dynatrace APM, SolarWinds, Cisco switches, F5, databases, open-source tools (Prometheus and Grafana), log aggregation (Kibana or Splunk), and AIOps tools

Practical experience implementing Golden Signals (latency, traffic, errors, saturation) using related telemetry sources

Experience performing Observability current-state assessments and gap analysis

Experience instrumenting .NET and Java applications with the OpenTelemetry (OTel) framework

Configure application performance monitoring (APM), infrastructure monitoring, synthetic monitoring, real user monitoring (RUM), and log monitoring

Integrate Dynatrace with CI/CD pipelines, alerting tools, ITSM systems, and incident automation frameworks

Tune alert thresholds, baselines, and AI-driven anomaly detection to reduce noise and surface actionable insights

Deep understanding of login authentication mechanisms using Ping, ForgeRock, and SiteMinder, including session management and cookie management

Define best practices and principles for SRE, including monitoring, alerting, and automation

Collaborate with development teams on resiliency to ensure that services and applications are designed with operational reliability in mind

Implement monitoring systems to assess the performance of applications and infrastructure and to proactively identify areas for optimization

Develop close relationships with other operations teams to integrate SRE practices and drive operational improvements across the enterprise

Stay up to date on industry trends, new technologies, and best practices in SRE, and apply relevant advancements to the organization

Build strong working relationships at all levels of the organization, with a client-focused mindset

Qualifications

Observability as a service · Cloud technologies (AWS) · Automation experience · Docker/Kubernetes · Dynatrace APM · Telemetry data collection · Golden Signals implementation · Linux commands · GitLab CI/CD setup · Terraform · Collaboration skills · Client-focused mindset · Problem-solving skills

Required

Around 7-10 years of hands-on SRE experience with cloud technologies, development, SRE toolsets, and automation

Experience performing observability current-state assessments, gap analysis, and solutioning (automated and manual fixes) across all technology domains (application, infrastructure, database, security, middleware, network, etc.)

Strong hands-on automation experience with observability as code, dashboards as code, monitoring as code, and alerting as code (instrumentation, templates, automated deployment, visualization, and alerting)

Practical experience implementing Golden Signals (latency, traffic, errors, saturation) using related telemetry sources

Strong hands-on experience with cloud technology (AWS), including Control Tower, project setup, account creation, RDS, and SSO

Solid understanding of, and hands-on experience with, Docker and Kubernetes

Good experience with Linux commands, GitLab CI/CD setup, and Terraform (state management, etc.)

Experience setting up monitoring and alerting with Splunk, Prometheus, Grafana, Kibana, and ELK, with a preference for APM experience (Dynatrace)

Good understanding of observability frameworks that use programmatic SLI/SLO blueprints to standardize the collection of golden signals

Strong skills in APM, distributed tracing, synthetic & real user monitoring, log monitoring, and Davis AI configuration

Own the design, configuration, CI/CD deployment, and optimization of enterprise-wide observability tools

Experience with integration, automation, and cloud platforms (AWS, Azure, GCP)

Extensive experience with OpenTelemetry (OTel) instrumentation

Hands-on experience developing Dynatrace plug-and-play observability modules (OKit) for observability developers, covering Java and .NET applications

Define monitoring standards, best practices, and governance to ensure consistency and scalability

Experience deploying and tuning OneAgent, building end-to-end PurePath traces, and using Smartscape topology for proactive performance monitoring and root-cause analysis

Collaborate with application and infrastructure teams to troubleshoot performance issues and implement permanent fixes

Preferred

Any relevant professional certification, such as Certified Site Reliability Engineer (CSRE), Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer - Professional, or Google Cloud Professional Cloud DevOps Engineer

Company

BeaconFire Inc.

BeaconFire is a leading technology consulting and professional services organization.

Windsor, New Jersey, USA

501-1000 employees

https://beaconfireinc.com

H1B Sponsorship

BeaconFire Inc. has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data provided by the US Department of Labor.)


Trends of Total Sponsorships

2025: 121
2024: 63
2023: 34
2022: 40
2021: 15
2020: 1

Funding

Current Stage: Late Stage

Company data provided by Crunchbase