Selby Jennings · 4 hours ago
Senior Kafka Platform Engineer
Selby Jennings is seeking a Senior Kafka Platform Engineer to design, automate, and scale a mission-critical event-streaming platform. The role involves owning the core Kafka environment and collaborating closely with engineering teams to ensure robust, performant, and secure streaming capabilities.
Responsibilities
Kafka Platform Ownership: Architect, deploy, and operate production‑grade Kafka clusters (self‑managed or cloud‑hosted), overseeing upgrades, scaling strategies, capacity modeling, and multi‑AZ/region resiliency
Kubernetes & Automation: Run Kafka on Kubernetes using Operators, Helm, and GitOps; build automation frameworks and guardrails using IaC to support repeatable, compliant, zero‑downtime deployments
Ecosystem Services: Manage and optimize Kafka Connect, Schema Registry, and replication technologies (MirrorMaker 2, Cluster Linking); define connector standards and enable self‑service provisioning
Reliability Engineering: Establish SLOs, own incident response, maintain runbooks, conduct postmortems, and develop automated remediation and resilience patterns
Observability: Build and maintain monitoring for metrics, logs, traces, consumer lag, partition health, and capacity insights using tools such as Prometheus, Grafana, Burrow, Cruise Control, or OpenTelemetry
Security & Compliance: Implement encryption, authentication, authorization, secrets management, network policies, and audit controls for secure data‑in‑motion
Streaming Best Practices: Guide application teams on topic strategy, partitioning, retention and compaction tuning, idempotency, ordering guarantees, schema evolution, DLQs, and exactly‑once semantics
Cross‑Functional Collaboration: Partner with application, data, platform, and SRE teams to provide tooling, documentation, enablement, and architectural guidance
Technical Leadership: Mentor engineers, help shape platform strategy, and contribute to long‑term standards and roadmap decisions
Qualifications
Required
Extensive hands‑on experience operating Kafka in production environments at scale, including brokers, controllers, replication, ISR dynamics, rebalancing, storage tiers, and failure recovery
Strong background operating stateful systems on Kubernetes using Operators, Helm, CRDs, and cloud‑native patterns
Proficiency with IaC tools (e.g., Terraform), GitOps workflows (Argo CD or Flux), and CI/CD tooling for full lifecycle automation
Strong scripting and development experience in Python, Go, or Java, plus solid Bash and Linux fundamentals (networking, filesystems, JVM tuning)
Expertise in Kafka performance troubleshooting, capacity planning, monitoring stacks, and alerting workflows
Hands‑on experience with TLS/mTLS, SASL/OAuth, ACL/RBAC, and secret‑management solutions such as Vault
Experience with Kafka Connect, Schema Registry, MirrorMaker 2/Cluster Linking; familiarity with Cruise Control
Knowledge of AWS, Azure, or GCP networking, IAM, and managed streaming services such as Confluent Cloud or AWS MSK
Demonstrated ability to write runbooks, lead incidents, and drive platform improvements
Preferred
Experience with stream‑processing frameworks (Kafka Streams, Flink, Spark Structured Streaming)
Background running Strimzi or Confluent for Kubernetes in production
Knowledge of CDC technologies and connector operations at scale (e.g., Debezium)
Experience designing multi‑region architectures, cluster‑linking strategies, and disaster‑recovery processes
Company
Selby Jennings
Global recruitment firm specialising in Banking
Funding
Current Stage
Late Stage
Company data provided by Crunchbase