BeaconFire Inc. · 22 hours ago
Site Reliability Engineer
BeaconFire Inc. is seeking a highly skilled Site Reliability Engineer to join their team, primarily located in Austin, TX. The role involves leading transformational initiatives within IT operations, focusing on observability assessments, data collection, and automation to enhance operational excellence and reliability.
Responsibilities
Participate in the design and architecture of reliable, scalable, high-performance systems and services, with a focus on operational excellence, availability, and performance
Bring primary expertise in observability as a service, dashboards as a service, monitoring as a service, and alerting as a service across all technology domains (application, infrastructure, database, security, middleware, network, etc.)
Collect telemetry data using Dynatrace APM, SolarWinds, Cisco switches, F5, databases, open-source tools (Prometheus and Grafana), log aggregation (Kibana or Splunk), and AIOps tools
Practical experience implementing Golden Signals (latency, traffic, errors, saturation) using related telemetry sources
Experience performing Observability current-state assessments and gap analysis
Experience instrumenting .NET and Java applications with the OpenTelemetry (OTel) framework
Configure application performance monitoring (APM), infrastructure monitoring, synthetic monitoring, RUM, and log monitoring
Integrate Dynatrace with CI/CD pipelines, alerting tools, ITSM systems, and incident automation frameworks
Tune alert thresholds, baselines, and AI-driven anomaly detection to reduce noise and improve actionable insights
Deep understanding of login authentication mechanisms using Ping, ForgeRock, and SiteMinder (session management and cookie management)
Define best practices and principles for SRE, including monitoring, alerting, and automation
Collaborate with development teams on resiliency to ensure that services and applications are designed with operational reliability in mind
Implement monitoring systems to assess the performance of applications and infrastructure, and proactively identify areas for optimization
Develop close relationships with other operational teams to integrate SRE practices and drive operational improvements across the enterprise
Stay up to date on industry trends, new technologies, and best practices in SRE, and apply relevant advancements to the organization
Build strong working relationships across different levels, with a client-focused mindset
Qualifications
Required
Around 7-10 years of hands-on SRE experience with cloud technologies, development, SRE toolsets, and automation
Experience performing observability current-state assessments, gap analysis, and solutioning (automation and manual fixes) across all technology domains (application, infrastructure, database, security, middleware, network, etc.)
Strong hands-on automation experience in observability as code, dashboards as code, monitoring as code, and alerting as code (instrumentation, templates, automated deployment, visualization, and alerting)
Practical experience implementing Golden Signals (latency, traffic, errors, saturation) using related telemetry sources
Strong hands-on experience with cloud technology (AWS): Control Tower, project setup, account creation, RDS, SSO
Solid understanding of and hands-on experience with Docker and Kubernetes
Good experience with Linux commands, GitLab CI/CD setup, and Terraform (state management, etc.)
Monitoring and alerting setup experience with Splunk, Prometheus, Grafana, Kibana, and ELK, with a preference for APM (Dynatrace)
Good understanding of an observability framework that uses programmatic SLI/SLO blueprints to standardize the collection of golden signals
Strong skills in APM, distributed tracing, synthetic & real user monitoring, log monitoring, and Davis AI configuration
Own the design, configuration, CI/CD deployment, and optimization of enterprise-wide observability tools
Experience with integration, automation, and cloud platforms (AWS, Azure, GCP)
Extensive experience instrumenting applications with the OpenTelemetry (OTel) framework
Hands-on experience developing Dynatrace plug-and-play observability modules (OKit) for observability developers working on Java and .NET applications
Define monitoring standards, best practices, and governance to ensure consistency and scalability
Experience deploying and tuning OneAgent, building end-to-end PurePath tracing, and leveraging Smartscape topology for proactive performance monitoring and root-cause analysis
Collaborate with application and infrastructure teams to troubleshoot performance issues and implement permanent fixes
Preferred
Any relevant professional certification: Certified Site Reliability Engineer (CSRE), Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer Professional, Google Cloud Professional DevOps Engineer
Company
BeaconFire Inc.
BeaconFire is a leading technology consulting and professional services organization.
H1B Sponsorship
BeaconFire Inc. has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role. Additional information is provided below for reference. (Data powered by the US Department of Labor)
Trends of Total Sponsorships
2025 (121)
2024 (63)
2023 (34)
2022 (40)
2021 (15)
2020 (1)
Funding
Current Stage
Late Stage
Company data provided by Crunchbase