Senior Kafka Platform Engineer jobs in United States

Selby Jennings · 1 week ago

Senior Kafka Platform Engineer

Selby Jennings is recruiting on behalf of a top asset management firm that is looking for a seasoned Kafka engineer to design, operate, and scale its event streaming platform. The role focuses on architecture, automation, and reliability engineering, and requires expertise in Kafka and Kubernetes.

Banking · Employment · Recruiting
Hiring Manager
Caitlin Wagner

Responsibilities

Architect, deploy, and operate production-grade Kafka clusters (self-managed and/or Confluent/MSK), including upgrades, capacity planning, multi-AZ/region DR, and performance tuning
Operate Kafka on Kubernetes using Operators, Helm, and GitOps; build IaC-driven automation with guardrails for zero-downtime provisioning
Implement and manage Kafka Connect, Schema Registry, and MirrorMaker 2/Cluster Linking; standardize connectors (e.g., Debezium) and enable self-service patterns
Drive reliability: define SLOs/error budgets, on-call rotations, incident response, postmortems, runbooks, and automated remediation
Implement observability: metrics, logs, traces, lag monitoring, and capacity dashboards (Prometheus/Grafana, Burrow, Cruise Control, OpenTelemetry)
Secure the platform: TLS/mTLS, SASL (OAuth/SCRAM), RBAC/ACLs, secrets management, network policies, audit, and compliance automation
Guide event-streaming best practices: topic design, partitioning, retention, idempotency, ordering, schema evolution, DLQs, EOS semantics
Collaborate with app, data, and SRE teams; provide enablement, documentation, and internal tooling for a great developer experience
Lead and mentor engineers; contribute to roadmap, standards, and platform strategy

Qualifications

Kafka · Kubernetes · Infrastructure as Code · Security practices · Python · Cloud experience · Observability · Automation mindset · Reliability engineering · Soft skills

Required

Deep hands-on experience operating Kafka at scale in Big Tech, trading, or asset management environments
Strong Kubernetes expertise running stateful systems
Automation-first mindset: Infrastructure as Code (Terraform), Helm, Operators, GitOps (Argo CD/Flux), and CI/CD (GitHub Actions/Jenkins)
Proficiency in Python, Go, or Java, plus Bash and solid Linux fundamentals (networking, filesystems, JVM tuning basics)
Observability and reliability engineering for Kafka: Prometheus/Grafana, logging, alerting, lag monitoring, capacity/throughput modeling, performance tuning
Security for data in motion: TLS/mTLS, SASL/OAuth, ACL/RBAC, secrets management (Vault), and audit/compliance practices
Experience with Kafka ecosystem components: Kafka Connect, Schema Registry, MirrorMaker 2/Cluster Linking; familiarity with Cruise Control
Cloud experience (AWS/Azure/GCP) with networking, IAM, and managed offerings (Confluent Cloud or AWS MSK)
Proven track record designing runbooks, leading incidents/postmortems, and driving platform roadmaps

Preferred

Data processing frameworks (Kafka Streams, Flink, Spark Structured Streaming) and EOS semantics
Experience with Strimzi or Confluent for Kubernetes in production
Knowledge of CDC patterns and tools (Debezium) and database connectors at scale
Multi-region architectures, cluster linking strategies, and disaster recovery drills

Company

Selby Jennings

Global recruitment firm specialising in Banking

Funding

Current Stage
Late Stage
Company data provided by Crunchbase