Why do we run servers in odd numbers?
Partition tolerant consensus algorithms use an odd number of nodes (e.g. 3, 5 or 7). With just two nodes, it is not possible to have a clear majority after a failure. For example, if the number of nodes is three, then the system is resilient to one node failure; with five nodes the system is resilient to two node failures.
Applies to: Zookeeper, Kafka, Consul, Cassandra
What books should I read to get started?
There are many good answers to this question and perhaps requires a chapter by itself. I recommend starting out here: http://www.aosabook.org/en/distsys.html -- along with a lot of great references, this book is a great place to get a high level view. There's an awesome list on github that is likely to have more upto date information. https://github.com/theanalyst/awesome-distributed-systems
I would like to give a special shout-out to "Distributed Systems for Fun and Profit" and "Notes on Distributed Systems for Young Bloods" as great references to start.