Zookeeper

Zookeeper consists of a set of servers (an ensemble) that must know about each other. The servers elect a leader, which orders every write and replicates it to the other servers; a write is committed once a majority of the servers has acknowledged it. Zookeeper provides a hierarchical view of the namespace, much like a filesystem. It is written in Java, and you will find it as a dependency of a lot of other distributed systems.

  • All data is kept in memory (and persisted to disk as snapshots and a transaction log). It is not intended to be used as a general-purpose datastore

  • It limits the size of the data stored in a znode to 1MB

  • A path in Zookeeper is called a znode. A znode cannot be deleted if it has children

  • All updates are totally ordered. Zookeeper stamps each update with a zxid (Zookeeper Transaction Id). Read responses are stamped too, with the last zxid processed by the server that answered

  • It is not intended to run across data-centers as the servers need to be constantly replicating and talking to each other
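
To make the rules above concrete, here is a toy in-memory model of a znode tree. It is only a sketch and not how Zookeeper is implemented: the class and method names (ToyZNodeTree, ZNodeError) are made up for illustration, the 1MB limit is approximated as 1024 * 1024 bytes, and a simple counter stands in for the zxid.

```python
class ZNodeError(Exception):
    """Raised when an operation breaks one of the modeled rules."""


class ToyZNodeTree:
    """Toy model of a znode hierarchy (illustration only, not Zookeeper code)."""

    MAX_DATA_BYTES = 1024 * 1024  # approximation of the 1MB per-znode limit

    def __init__(self):
        self._data = {"/": b""}  # path -> payload
        self._zxid = 0           # counter standing in for the transaction id

    @staticmethod
    def _parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def _children(self, path):
        prefix = path.rstrip("/") + "/"
        return [
            p for p in self._data
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        ]

    def create(self, path, data=b""):
        if path in self._data:
            raise ZNodeError("node already exists: " + path)
        if self._parent(path) not in self._data:
            raise ZNodeError("parent does not exist for: " + path)
        if len(data) > self.MAX_DATA_BYTES:
            raise ZNodeError("data exceeds the per-znode size limit")
        self._zxid += 1  # every successful update gets the next id
        self._data[path] = data
        return self._zxid

    def delete(self, path):
        if path == "/" or path not in self._data:
            raise ZNodeError("cannot delete: " + path)
        if self._children(path):
            raise ZNodeError("znode has children: " + path)
        self._zxid += 1
        del self._data[path]
        return self._zxid
```

Creating /app and then /app/config returns increasing ids, and deleting /app fails until /app/config has been deleted first.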

Defaults

Name              Value
Port              2181
Number of Nodes   3, 5 or 7, but no more
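
The odd node counts follow from majority quorums: the ensemble stays available only while more than half of its servers are up. A small sketch of that arithmetic (the function names are mine, not part of any Zookeeper API):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5, 6, 7):
    print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} down")
```

Going from 3 to 4 servers (or from 5 to 6) raises the quorum size without letting you lose any more servers, which is why even-sized ensembles buy you nothing.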

Operationalizing

Readings

Zookeeper has detailed documentation on operational concerns and configurations.

https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html

It's important to keep a close watch on how many snapshots Zookeeper has on disk too. This is highly correlated with how much memory Zookeeper will use from the operating system. Since Zookeeper 3.4.0, you can set autopurge.purgeInterval=<hours> to control how often old snapshots are cleaned up, and autopurge.snapRetainCount=<num> to control how many of the most recent ones are kept.
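
If you want to watch the snapshot count yourself, a minimal sketch could look like the following. It assumes the usual on-disk layout, where snapshots are files named snapshot.<zxid in hex> inside a version-2 directory under dataDir. The function names are hypothetical, and the script only reports; actual cleanup is best left to Zookeeper's own purge mechanism.

```python
from pathlib import Path


def _zxid_of(path):
    """Parse the hex zxid suffix of a snapshot file name, or None if malformed."""
    try:
        return int(path.name.split(".", 1)[1], 16)
    except ValueError:
        return None


def list_snapshots(data_dir):
    """Return snapshot files under data_dir/version-2, oldest first."""
    version_dir = Path(data_dir) / "version-2"  # assumed standard layout
    snaps = [
        p for p in version_dir.glob("snapshot.*")
        if p.is_file() and _zxid_of(p) is not None
    ]
    return sorted(snaps, key=_zxid_of)


def snapshots_over_limit(data_dir, retain=3):
    """Return the snapshots older than the newest `retain` ones."""
    snaps = list_snapshots(data_dir)
    return snaps[:max(len(snaps) - retain, 0)]
```

A steadily growing result from snapshots_over_limit is a hint that purging is not configured or not running.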

Connections

Zookeeper tends to be very network I/O heavy. You want to keep an eye on the number of active connections on all servers. You can cap how many concurrent connections a single client IP address may open to a server using the maxClientCnxns property in the zoo.cfg file.

Monitoring

4-Letter Words

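You can send four-letter words to Zookeeper's client port (for example over telnet or nc) for monitoring. [1]

    • mntr: print out monitoring information
    • srvr: print details for the server (version, latency, connections, mode, ...)
    • ruok: check that the server is running in a non-error state; it replies imok if so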
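
The same commands can be sent from a script. The sketch below opens a plain TCP connection, writes the word, and reads until the server closes the connection. The helper names are mine. It assumes mntr replies with one tab-separated key/value pair per line, and newer Zookeeper releases may require the commands to be whitelisted in the server configuration before they answer.

```python
import socket


def four_letter_word(host, port, word, timeout=5.0):
    """Send a four-letter word and return the server's full reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(word.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closes the connection when it is done
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Turn tab-separated 'key<TAB>value' lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip lines that are not key/value pairs
            stats[key.strip()] = value.strip()
    return stats
```

For example, parse_mntr(four_letter_word("zk-server", 2181, "mntr")) gives you a dictionary you can feed into whatever metrics system you use; zk-server is a placeholder host name.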

Tools

There are various ways to monitor Zookeeper. Diamond is a good solution.

Exhibitor

Since Zookeeper is a critical piece of your whole infrastructure, making sure it's up and running despite node failures is paramount. Netflix wrote a tool called Exhibitor that makes sure that Zookeeper nodes are installed, configured and dynamically discoverable. It also has a great web UI for exploring and changing data if you ever need to.

Libraries

Zookeeper has a lot of support for programming in various languages. Here are some of the popular ones that I have used.

Java

As surprising as it sounds, the default client from apache/Zookeeper isn't all that great unless you are willing to handle a lot of Zookeeper corner cases yourself. So the fine folks at Netflix contributed a library called Curator, which is used extensively by a lot of Java projects. It works together with Exhibitor to discover new Zookeeper servers and also implements common recipes such as leader election and queues on top of Zookeeper.

Go

https://github.com/samuel/go-zookeeper is a good library for Go. Additionally, YouTube has a good wrapper for Zookeeper that they wrote around go-zookeeper inside their vitess project. [2]

Python

I've found Kazoo to be pretty great for using Zookeeper in Python.
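
As a rough idea of what using Kazoo looks like, here is a minimal sketch. It assumes Kazoo is installed (pip install kazoo) and that a server is reachable at 127.0.0.1:2181. The helper names connect and write_and_read are mine; KazooClient, start, ensure_path, set and get are the Kazoo calls being exercised.

```python
def connect(hosts="127.0.0.1:2181"):
    """Open a Kazoo session; the address is an assumption for local testing."""
    from kazoo.client import KazooClient  # third-party, imported lazily

    zk = KazooClient(hosts=hosts)
    zk.start()  # blocks until the session is established or times out
    return zk


def write_and_read(zk, path, value):
    """Create `path` if needed, store `value` (bytes), and read it back."""
    zk.ensure_path(path)        # creates missing parent znodes as well
    zk.set(path, value)         # each set bumps the znode's data version
    data, stat = zk.get(path)   # get returns (bytes, stat)
    return data, stat.version
```

A typical session would call zk = connect(), then write_and_read(zk, "/app/config", b"on"), and finish with zk.stop() and zk.close().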

1. You want to supply them as "echo <word> | nc zk-server 2181"
