Why do we run servers in odd numbers?

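Consensus algorithms that tolerate network partitions need a majority (quorum) of nodes to agree before they can make progress, so clusters are usually sized with an odd number of nodes (e.g. 3, 5 or 7). With just two nodes, a majority means both nodes, so a single failure or a partition between them leaves no clear majority. With three nodes the system is resilient to one node failure; with five nodes it is resilient to two. Growing the cluster to an even size buys nothing extra: four nodes still tolerate only one failure, because a majority of four is three.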
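The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration of majority-quorum math, not code from any of the systems mentioned here; the function names `majority` and `tolerated_failures` are made up for this example.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can fail while a majority of n can still be formed."""
    return n - majority(n)


if __name__ == "__main__":
    # Compare cluster sizes side by side: each even size tolerates
    # no more failures than the odd size just below it.
    for n in range(1, 8):
        print(f"{n} nodes: majority={majority(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows that 3 and 4 nodes both tolerate one failure, and 5 and 6 both tolerate two, which is why the extra even-numbered node is usually not worth its cost.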

Applies to: ZooKeeper, Kafka, Consul, Cassandra

What books should I read to get started?

There are many good answers to this question, and it perhaps requires a chapter by itself. I recommend starting out with http://www.aosabook.org/en/distsys.html which, along with a lot of great references, is a great place to get a high-level view. There's also an awesome list on GitHub that is likely to have more up-to-date information: https://github.com/theanalyst/awesome-distributed-systems

I would like to give a special shout-out to "Distributed Systems for Fun and Profit" and "Notes on Distributed Systems for Young Bloods" as great references to start with.
