Selby Jennings · 1 week ago
Senior Kafka Platform Engineer
Selby Jennings is recruiting on behalf of a top asset management firm that is looking for a seasoned Kafka engineer to design, operate, and scale its event streaming platform. The role focuses on architecture, automation, and reliability engineering and requires expertise in Kafka and Kubernetes.
Responsibilities
Architect, deploy, and operate production-grade Kafka clusters (self-managed and/or Confluent/MSK), including upgrades, capacity planning, multi-AZ/multi-region disaster recovery (DR), and performance tuning
Operate Kafka on Kubernetes using Operators, Helm, and GitOps; build IaC-driven automation with guardrails for zero-downtime provisioning
Implement and manage Kafka Connect, Schema Registry, and MirrorMaker 2/Cluster Linking; standardize connectors (e.g., Debezium) and enable self-service patterns
Drive reliability: define SLOs/error budgets, on-call rotations, incident response, postmortems, runbooks, and automated remediation
Implement observability: metrics, logs, traces, lag monitoring, and capacity dashboards (Prometheus/Grafana, Burrow, Cruise Control, OpenTelemetry)
Secure the platform: TLS/mTLS, SASL (OAuth/SCRAM), RBAC/ACLs, secrets management, network policies, audit, and compliance automation
Guide event-streaming best practices: topic design, partitioning, retention, idempotency, ordering, schema evolution, dead-letter queues (DLQs), and exactly-once semantics (EOS)
Collaborate with app, data, and SRE teams; provide enablement, documentation, and internal tooling for a great developer experience
Lead and mentor engineers; contribute to roadmap, standards, and platform strategy
Qualifications
Required
Deep hands-on experience operating Kafka at scale in Big Tech, trading, or asset management environments
Strong Kubernetes expertise running stateful systems
Automation-first mindset: Infrastructure as Code (Terraform), Helm, Operators, GitOps (Argo CD/Flux), and CI/CD (GitHub Actions/Jenkins)
Proficiency in Python, Go, or Java, plus Bash and solid Linux fundamentals (networking, filesystems, JVM tuning basics)
Observability and reliability engineering for Kafka: Prometheus/Grafana, logging, alerting, lag monitoring, capacity/throughput modeling, performance tuning
Security for data in motion: TLS/mTLS, SASL/OAuth, ACL/RBAC, secrets management (Vault), and audit/compliance practices
Experience with Kafka ecosystem components: Kafka Connect, Schema Registry, MirrorMaker 2/Cluster Linking; familiarity with Cruise Control
Cloud experience (AWS/Azure/GCP) with networking, IAM, and managed offerings (Confluent Cloud or AWS MSK)
Proven track record designing runbooks, leading incidents/postmortems, and driving platform roadmaps
Preferred
Stream processing frameworks (Kafka Streams, Flink, Spark Structured Streaming) and exactly-once semantics (EOS)
Experience with Strimzi or Confluent for Kubernetes in production
Knowledge of change data capture (CDC) patterns and tools (e.g., Debezium) and of running database connectors at scale
Multi-region architectures, cluster linking strategies, and disaster recovery drills
Company
Selby Jennings
Global recruitment firm specialising in Banking
Funding
Current Stage: Late Stage
Company data provided by Crunchbase