Cypress HCM · 6 hours ago
Distributed Systems Optimization Consultant
Human Resources · Information Technology
Growth Opportunities
Responsibilities
Analyze the current Zookeeper setup and identify bottlenecks affecting performance.
Implement tuning measures for read/write latency, throughput, and leader election times.
Optimize JVM parameters and Zookeeper settings (e.g., tick time, heap size).
Architect solutions for fault tolerance and disaster recovery.
Design and implement multi-region and multi-data center deployments.
Establish robust configurations for quorum consistency and failover mechanisms.
Review monitoring tools (e.g., Prometheus, Grafana) to confirm they track Zookeeper health and resiliency.
Develop custom alerts for potential issues such as latency spikes, memory usage, and connection limits.
Work closely with engineering teams to ensure Zookeeper is optimized and resilient alongside other components like Kafka, RabbitMQ, Redis, and custom services.
Conduct capacity planning to ensure scalability for future workloads.
Qualifications
Required
10+ years of hands-on experience managing and optimizing Apache Zookeeper in production environments at large scale.
Proven track record of designing resilient distributed systems.
Deep understanding of distributed systems, including Zookeeper internals (leader election, session management, quorum design).
Expertise in RabbitMQ, Redis, and Kafka, including their integration into distributed architectures.
Proficiency in monitoring and troubleshooting tools such as Prometheus, Grafana, or similar.
Strong scripting skills (e.g., Bash, Python) for automation.
Excellent problem-solving and communication abilities.
Preferred
Relevant certifications in distributed systems, messaging technologies, or DevOps practices.