Staff Platform Infrastructure Engineer: Observability jobs in United States

Jack Henry · 3 weeks ago

Staff Platform Infrastructure Engineer: Observability

Jack Henry & Associates is a technology company focused on redefining how community banks and credit unions connect with their customers. The Staff Platform Infrastructure Engineer: Observability will be responsible for designing and implementing observability solutions to empower product teams, ensuring the reliability and performance of systems through effective monitoring and incident resolution.

Banking, Enterprise Software, Financial Services
No H1B

Responsibilities

Plan, design and build the observability blueprints used by Jack Henry's development and engineering teams. Craft the overarching strategy and create detailed designs for how observability will be implemented for various products & services across the organization. This includes defining data flows & telemetry pipelines into the appropriate platform (like Datadog, Honeycomb, Prometheus, etc.) and establishing best practices for instrumentation
Resolve critical observability incidents that impact the ability to monitor and understand system behavior. This engineer will lead the effort to pinpoint the root cause, implement solutions, and prevent recurrence. This requires deep expertise in observability tools and techniques
Design and implement automated pipelines for deploying, configuring, and managing observability tools and instrumentation. This includes automating tasks such as agent installation, configuration updates, and alert provisioning, leveraging IaC principles and tools
Ensure the observability systems themselves are healthy and reliable and provide accurate data. Champion the use of observability to improve the overall health and resilience of production systems, enabling faster detection and mitigation of potential problems
Actively engage with product teams to understand their upcoming projects, technology choices, and observability needs. Ensure that observability is embedded early in the development lifecycle and provide the insights necessary to optimize application performance and reliability
May perform other job duties as assigned

Qualifications

Observability platforms, Telemetry pipelines, Cloud experience, Kubernetes environments, Terraform, Distributed tracing, CI/CD integration, Analytical skills, SRE mindset, Problem-solving skills

Required

Minimum of 10 years of experience in Software Development, Observability Engineering, or Site Reliability Engineering
Minimum of 5 years of in-depth experience with Observability platforms like Datadog, Dynatrace, Honeycomb, New Relic, Splunk, or Prometheus
Minimum of 4 years of cloud experience with Azure, AWS, or GCP
Minimum of 4 years of experience with OTEL and telemetry pipelines
Minimum of 4 years of experience with Kubernetes environments
Understanding and experience with declarative infrastructure using Terraform
Must be able to work an on-call rotation that may include weekends as the business need dictates

Preferred

Proven ability to collect logs, metrics & traces and implement Observability solutions for applications and infrastructure
Solid understanding of distributed tracing and experience instrumenting applications to analyze performance bottlenecks
Hands-on experience configuring and deploying OTEL collectors for telemetry data collection, processing and export
Strong understanding of Kubernetes architecture and experience managing observability within K8s environments
Proficiency in using Kustomize for Kubernetes configurations and Terraform for infrastructure provisioning
Experience integrating observability practices and tools into CI/CD pipelines for automated deployments
Exceptional analytical and problem-solving skills to diagnose and resolve complex issues within observability systems, including data pipeline failures, instrumentation errors, and performance bottlenecks
Ability to demonstrate a strong Site Reliability Engineering (SRE) mindset with a focus on automation, proactive monitoring, and continuous improvement to ensure system reliability and availability
Experience defining and implementing Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure system performance and drive data-driven decisions

Benefits

Comprehensive benefits designed to support your physical, mental, and financial health

Company

Jack Henry

Jack Henry (Nasdaq: JKHY) is a well-rounded financial technology company that strengthens the connections between people and their financial institutions through technology and services that reduce the barriers to financial health.

Funding

Current Stage: Public Company
Total Funding: unknown
IPO: 1985-11-20

Leadership Team

Greg Adelson
Chief Executive Officer
Kevin Williams
CFO
Company data provided by Crunchbase