hackajob · 6 hours ago

Site Reliability Engineer III - Kafka Platform Engineering

Jersey City, NJ

Full-time

Onsite

Mid, Senior Level

3+ years exp

Hackajob is collaborating with J.P. Morgan to connect them with exceptional tech professionals for this role. As a Site Reliability Engineer III at JPMorgan Chase, you will solve complex business problems using code and cloud infrastructure while optimizing applications and their associated infrastructure.

Artificial Intelligence (AI), Generative AI, Human Resources, Recruiting, Software

Responsibilities

Guides and assists others in building designs at the appropriate level and in gaining consensus from peers

Demonstrates deep knowledge of Kafka technology, the Kafka Connect framework, and distributed systems technologies, with the ability to operate in and migrate across public and private clouds

Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines

Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications

Implements infrastructure, configuration, and network as code for the applications and platforms in your remit

Collaborates with technical experts, key stakeholders, and team members to resolve complex problems

Understands service level indicators and utilizes service level objectives to proactively resolve issues before they impact customers

Contributes to the development of technical documentation, including service APIs using Swagger, and ensures robust logging, auditability, security, and monitoring features

Supports the adoption of site reliability engineering best practices within your team

Engages in periodic on-call rotation shifts, providing client support and ensuring thorough monitoring of the platform

Qualifications

Kafka, Cloud platforms, Continuous integration, Programming languages, Observability tools, Site reliability principles, Networking technologies, Technical documentation, Team collaboration, Problem-solving

Required

Formal training or certification on computer science and reliability concepts and 3+ years applied experience

Proficient in site reliability culture and principles and familiarity with how to implement site reliability within an application or platform

Proficient in at least one programming language, such as Java/Spring Boot or Python

Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., Cloud, artificial intelligence, Android, etc.)

Experience in observability, such as white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk

Experience with public cloud platforms like AWS, GCP or Azure

Experience with Kafka ecosystem products: Kafka, Kafka Connect, Kafka Streams

Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform

Familiarity with containers and container orchestration, such as Docker, ECS, and Kubernetes

Familiarity with troubleshooting common networking technologies and issues

Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision

Ability to proactively recognize roadblocks and demonstrate interest in learning technology that facilitates innovation

Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team

Ability to initiate and implement ideas to solve business problems

Preferred

Familiarity with running Apache Flink

Understanding of authentication and authorization technologies (e.g., OAuth, Kerberos)

Experience with AWS cloud services and Kubernetes platform orchestration

Benefits

Comprehensive health care coverage

On-site health and wellness centers

A retirement savings plan

Backup childcare

Tuition reimbursement

Mental health support

Financial coaching

Company

hackajob

The AI-native tech hiring platform trusted by enterprises, scale-ups, and 1M+ tech professionals worldwide.

Founded in 2014

London, England, GBR

51-200 employees

http://www.hackajob.com

Funding

Current Stage

Growth Stage

Total Funding

$33M

Key Investors

Volition Capital, Downing Ventures, Techstars

2023-05-03 · Series B · $25M

2018-10-25 · Series A · $6.7M

2017-03-31 · Seed · $0.58M

Leadership Team

Mark Chaffey

CEO

Phil Kell

VP - Marketplace

Recent News

AIM Group

Tech-focused hiring site HackAJob launches dedicated AI voice agents

2025-10-23

AIM Group

Application tsunami, fake profiles drive verification race among job sites

2025-09-26

PR Newswire UK

hackajob launches AI agents to make recruitment more human again

2025-09-12

Company data provided by Crunchbase