Why do we run servers in odd numbers?

Partition tolerant consensus algorithms use an odd number of nodes (e.g. 3, 5 or 7). With just two nodes, it is not possible to have a clear majority after a failure. For example, if the number of nodes is three, then the system is resilient to one node failure; with five nodes the system is resilient to two node failures.

Applies to: Zookeeper, Kafka, Consul, Cassandra

What books should I read to get started?

There are many good answers to this question and perhaps requires a chapter by itself. I recommend starting out here: http://www.aosabook.org/en/distsys.html -- along with a lot of great references, this book is a great place to get a high level view. There's an awesome list on github that is likely to have more upto date information. https://github.com/theanalyst/awesome-distributed-systems

I would like to give a special shout-out to "Distributed Systems for Fun and Profit" and "Notes on Distributed Systems for Young Bloods" as great references to start.

faqs

Why do we run servers in odd numbers?

What books should I read to get started?

results matching ""

No results matching ""