Glossary

This isn’t everything you will run into, and it is definitely not a replacement for a textbook, which will have better thought-out explanations than what is written here. It is intended to make communication between teams easier and to give engineers a starting point for diving deeper into the fundamentals.

A lot of these terms are overloaded in other areas of computer science, and sometimes even within the discipline of distributed systems.

Caveat: These definitions are taken from the internet and are not intended to pass a test of academic rigor. That doesn’t mean they’re incorrect, but they are simplified and not a substitute for reading the manual. There are some great resources that dive deeper into the details in an approachable way, for example: http://book.mixu.net/distsys/replication.html

Kafka

Kafka is a high-performance message queue -- services use it to share messages with each other at various steps of processing. Each Kafka server is referred to as a “broker.”

See: https://kafka.apache.org/intro for a more detailed introduction. More in the “How Kafka Works” section.

Zookeeper

Think of Zookeeper as a distributed directory structure. Services can use it to coordinate with each other. A group of Zookeeper servers is called an ensemble. Zookeeper exists because coordinating distributed systems is notoriously hard; by having all services rely on a common coordination service, the risk of everyone inventing their own coordination algorithms is alleviated.
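To make the “distributed directory” analogy concrete, here is a minimal sketch using the third-party kazoo Python client, assuming a ZooKeeper server on 127.0.0.1:2181; the paths and payload are made up for illustration.

```python
# Minimal sketch: register a service instance in ZooKeeper using kazoo.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# Ephemeral znodes disappear when the session ends, which is how services
# advertise "I am alive" to each other.
zk.ensure_path("/services/my-app")
zk.create("/services/my-app/worker-", b"10.0.0.5:8080",
          ephemeral=True, sequence=True)

# Any other service can list the "directory" to discover live workers.
print(zk.get_children("/services/my-app"))
zk.stop()
```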

See: https://zookeeper.apache.org/doc/trunk/zookeeperOver.html

Leader

A leader is a process that coordinates a task. In some systems it is called the coordinator, controller, master, etc. A group of followers stands ready to take over in case the leader fails.

Follower

Servers ready to take over in case of leader failure -- also referred to as slaves, replicas, agents, etc.

Leader Election

Leader election is the process of designating a single process as the organizer of some task distributed among several computers (nodes). Before the task is begun, all network nodes are either unaware which node will serve as the "leader" (or coordinator) of the task, or unable to communicate with the current coordinator. After a leader election algorithm has been run, however, each node throughout the network recognizes a particular, unique node as the task leader.

A valid leader election algorithm must meet the following conditions[2] (a toy sketch follows the list):

Termination: the algorithm should finish within a finite time once the leader is selected. In randomized approaches this condition is sometimes weakened (for example, requiring termination with probability 1).

Uniqueness: there is exactly one processor that considers itself as leader.

Agreement: all other processors know who the leader is.
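As promised above, here is a toy sketch, not a real algorithm: if every node has a perfect, identical view of the membership, each can deterministically pick the lowest ID, which trivially satisfies termination, uniqueness, and agreement. Real election algorithms exist precisely because that assumption does not hold in practice.

```python
# Toy illustration only: every node knows the full membership list and
# independently picks the smallest ID as leader. Because all nodes compute
# the same deterministic function over the same input, the result satisfies
# termination, uniqueness, and agreement -- but only under the unrealistic
# assumption of perfect, identical membership views and no partitions.
def elect_leader(node_ids):
    return min(node_ids)

members = [17, 4, 42, 9]
leader = elect_leader(members)
assert all(elect_leader(members) == leader for _ in members)  # agreement
print("leader is node", leader)
```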

Rebalancing

Changing the current assignment of leaders/followers (or work) across the services performing a certain task. For example, a Storm cluster rebalance changes which machine runs which topology, and a Kafka topic rebalance changes which broker owns which partition.

Topic

In Kafka (and other message queues) a topic is a category or feed name to which records are published and from which many consumers can read.
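As a rough sketch, this is what publishing to and reading from a topic looks like with the third-party kafka-python package, assuming a broker on localhost:9092; the topic name and message are made up for illustration.

```python
# Minimal sketch: write to and read from a Kafka topic with kafka-python.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("page-views", b'{"user": 123, "url": "/home"}')
producer.flush()

# Many independent consumers (in different consumer groups) can read the
# same topic without interfering with each other.
consumer = KafkaConsumer("page-views",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=1000)
for record in consumer:
    print(record.partition, record.offset, record.value)
```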

Fault Tolerance

Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of (or one or more faults within) some of its components. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively designed system in which even a small failure can cause total breakdown. Fault tolerance is particularly sought after in high-availability or life-critical systems. The ability of maintaining functionality when portions of a system break down is referred to as graceful degradation.[1]

Consensus

A fundamental problem in distributed computing and multi-agent systems is to achieve overall system reliability in the presence of a number of faulty processes. This often requires processes to agree on some data value that is needed during computation. Examples of applications of consensus include whether to commit a transaction to a database, agreeing on the identity of a leader, state machine replication, and atomic broadcasts. Real-world applications include clock synchronization, PageRank, opinion formation, smart power grids, state estimation, control of UAVs, load balancing, and others.

Replica

A server that copies data from a primary source (or has data copied to it). It is expected to keep up with the master; otherwise it cannot take over. In some database systems you can distinguish between an active replica and a passive one, and some systems even allow a special voting-only or arbiter replica whose job is to ensure that voting will always succeed.

Paxos

Paxos -- other than being a beautiful Greek island, it is a famous (in some circles) algorithm for getting potentially faulty servers to agree with each other. It was first described by Leslie Lamport in 1989 (and formally published in 1998), but was only put to practical use in Google’s Chubby system (which inspired Zookeeper) in 2006[3]. Zookeeper itself uses a much simpler algorithm called ZAB instead[4].

See: https://www.quora.com/Distributed-Systems-What-is-a-simple-explanation-of-the-Paxos-algorithm/answer/Vineet-Gupta

Raft

Paxos is considered notoriously hard to understand and even harder to implement and validate correctly. So, in 2014 Raft[5] was published as an easier-to-implement consensus algorithm. Systems like Consul and Etcd use Raft for consensus. Paxos also allows any server to act as a leader, so under a network partition you can end up with two leaders, each with a different notion of what the true value is (this is often referred to as split-brain). Raft restricts this by comparing log indices and ensures that only nodes with the most up-to-date log can become the leader.
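A small sketch of that restriction, modeled on the election rule described in the Raft paper (the function and variable names here are illustrative, not from any particular implementation): a follower grants its vote only if the candidate’s log is at least as up-to-date as its own.

```python
# Sketch of Raft's "up-to-date log" check used when granting votes.
def candidate_log_is_up_to_date(candidate_last_term, candidate_last_index,
                                my_last_term, my_last_index):
    if candidate_last_term != my_last_term:
        return candidate_last_term > my_last_term   # higher last term wins
    return candidate_last_index >= my_last_index    # else longer log wins

# A node whose log is behind cannot collect a majority of votes, so a stale
# server cannot become leader and overwrite committed entries.
print(candidate_log_is_up_to_date(3, 10, 3, 12))  # False: candidate is behind
```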

This is a good interactive representation of what goes on in the algorithm: http://thesecretlivesofdata.com/raft/

Quorum

The minimum number of votes required to perform a task. This is enforced to ensure consistent operations in a distributed system. Distributed database systems ensure a quorum on commits, and Kafka, in particular, acknowledges a write once it has been successfully replicated to some minimum number of replicas. If you have n servers, a majority quorum is floor(n/2)+1. For example, if there are 5 members in the peer set, we need 3 nodes to form a quorum. If a quorum of nodes is unavailable for any reason, the cluster becomes unavailable and no new logs can be committed.
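A quick sketch of the arithmetic:

```python
# Majority quorum size for n voting members: floor(n/2) + 1.
def quorum(n):
    return n // 2 + 1

for n in (1, 3, 5, 7):
    print(f"{n} members -> quorum of {quorum(n)}, "
          f"tolerates {n - quorum(n)} failures")
# e.g. 5 members -> quorum of 3, tolerates 2 failures
```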

Network Partition

A network partition refers to the failure of a network device that causes a network to be split. For example, in a network with multiple subnets where nodes A and B are located in one subnet and nodes C and D are in another, a partition occurs if the switch between the two subnets fails.

See: https://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions

Split Brain

Inspired by the medical term “split-brain,” it describes data or availability inconsistencies that arise when servers stop communicating and synchronizing their data with each other, yet keep operating independently.

CAP Theorem

Eric Brewer, then a principal scientist at Inktomi, conjectured at PODC 2000 that of Consistency, Availability, and Partition Tolerance you can pick only two. Two years later the conjecture was formally proved[6] by Seth Gilbert and Nancy Lynch.

The following is adapted from http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

The CAP Theorem comes to life as an application scales. At low transactional volumes, the small latencies that allow databases to get consistent have no noticeable effect on either overall performance or the user experience. Any load distribution you do undertake, therefore, is likely to be for systems management reasons.

But as activity increases, these pinch-points in throughput will begin to limit growth and create errors. It’s one thing having to wait for a web page to come back with a response and another experience altogether to enter your credit card details, be met with “HTTP 500 java.lang.schrodinger.purchasingerror,” and wonder whether you’ve just paid for something you won’t get, not paid at all, or whether the error is immaterial to this transaction. Who knows? You are unlikely to continue, more likely to shop elsewhere, and very likely to phone your bank.

FLP impossibility

This theorem states that “It’s impossible for a set of processors in an asynchronous system to agree on a binary value, even if only a single process is subject to an unannounced failure.” [7]

The underlying issue is that it is impossible to tell whether a machine has crashed (in which case you can safely launch recovery and coordinate without it) or whether you simply cannot reach it right now (in which case it is still running separately from you, potentially doing work that disagrees with yours).

See: https://yuolivia.wordpress.com/2014/11/24/consensus-byzantine-general-problem-paxos-flp-impossibility-authenticated-broadcast/

http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/

Eight Fallacies

In 1994, Peter Deutsch, a Sun Fellow at the time, drafted 7 assumptions that architects and designers of distributed systems are likely to make, which prove wrong in the long run, resulting in all sorts of troubles and pains for the solution and for the architects who made them. In 1997 James Gosling added another such fallacy[8]. The assumptions are now collectively known as “the 8 fallacies of distributed computing”:

1. The network is reliable.

2. Latency is zero.

3. Bandwidth is infinite.

4. The network is secure.

5. Topology doesn't change.

6. There is one administrator.

7. Transport cost is zero.

8. The network is homogeneous.

See:

http://www.rgoarchitects.com/Files/fallacies.pdf

ACID

In databases, ACID refers to the set of properties that a transaction (a sequence of operations treated as a single logical unit) should satisfy:

A = Atomicity. All of it succeeds or nothing does.

C = Consistency. The database goes from one valid state to another.

I = Isolation. The effects of an incomplete transaction are not visible to other transactions.

D = Durability. Once a transaction is committed, it remains so.
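To make atomicity concrete, here is a minimal sketch using Python’s standard-library sqlite3 module; the table, account names, and overdraft rule are made up for illustration. Either the whole transfer commits or none of it does.

```python
# Minimal sketch: an all-or-nothing transfer inside a single transaction.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 150 "
                     "WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 150 "
                     "WHERE name = 'bob'")
        # Enforce a "no negative balances" invariant (consistency).
        (bad,) = conn.execute(
            "SELECT COUNT(*) FROM accounts WHERE balance < 0").fetchone()
        if bad:
            raise ValueError("overdraft -- abort the whole transfer")
except ValueError:
    pass

print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# [('alice', 100), ('bob', 0)] -- the partial transfer was rolled back
```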

The caveat here is that, in reality, software written on top of databases that purport ACID compliance makes tradeoffs, and often, in the interest of abstraction, ends up doing concurrency control itself. This is often referred to as feral concurrency control[9].

Popular databases like MySQL, Postgres, Oracle, and MS-SQL are ACID compliant, whereas Cassandra and MongoDB are not. Eric Brewer referred to the opposite end of the spectrum as BASE (Basically Available, Soft state, Eventually consistent).

Eventual Consistency

There’s a good reason a lot of ACID-compliant databases run on single machines and replicate their transaction logs to another server for high availability. Certain theoretical and practical limitations make database transactions over networks impossible to do in real time. Rather than requiring consistency all the time, it is often enough for databases to be “eventually consistent.” Accounting systems like ATM networks do this all the time: the data is made consistent when the books are closed, and at any given time the central system only knows an approximate value.

Most systems fall somewhere on this continuum between ACID and BASE, and it’s up to the business needs and risks to decide what the correct tradeoff is.[10]

CGroup

Cgroups, or control groups, are abstractions in Linux that let you allocate resources on a single node, such as CPU time, system memory, network bandwidth, or combinations of these, to groups of processes. They are organized hierarchically and exposed through a virtual filesystem (mounted at /sys/fs/cgroup on most modern systems). You can monitor and configure cgroups, and even reconfigure them dynamically.[11] Since they are organized hierarchically, child cgroups inherit certain attributes from their parent, which means any child processes also inherit the cgroup’s limits.
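As a rough sketch of what this looks like on disk, the following assumes a cgroup v2 (unified hierarchy) system mounted at /sys/fs/cgroup and root privileges; the group name and limit are arbitrary examples.

```python
# Sketch: create a cgroup v2 group, cap its memory, and join it.
import os

cg = "/sys/fs/cgroup/demo"
os.makedirs(cg, exist_ok=True)

# Limit the group to 256 MiB of memory.
with open(os.path.join(cg, "memory.max"), "w") as f:
    f.write(str(256 * 1024 * 1024))

# Move the current process into the group; its children inherit the limit.
with open(os.path.join(cg, "cgroup.procs"), "w") as f:
    f.write(str(os.getpid()))
```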

Docker

Docker is a container runtime that builds on cgroups (and Linux namespaces). It provides a jail-like abstraction, creating containers for services so they can be easily shipped, started, and stopped.
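For illustration, here is a hedged sketch using the third-party Docker SDK for Python, assuming a Docker daemon is running locally; the image, command, and memory limit are arbitrary examples.

```python
# Sketch: run a throwaway container with a memory cap (enforced via cgroups).
import docker

client = docker.from_env()
output = client.containers.run("alpine", ["echo", "hello from a container"],
                               mem_limit="128m", remove=True)
print(output.decode())
```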

Consul

Consul is a distributed, highly available key-value and service-discovery system. Every node that provides services to Consul runs an agent; this agent is responsible for health checks and for forwarding any requests to the Consul servers automatically. It works across data centers out of the box.[12]
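As a rough sketch of the key-value side, this is what reads and writes look like against Consul’s HTTP API, assuming a local agent on the default port 8500 and the third-party requests library; the key name and value are made up.

```python
# Sketch: store and fetch a value via Consul's KV HTTP API.
import base64
import requests

# Store a value.
requests.put("http://127.0.0.1:8500/v1/kv/config/feature_flag", data="on")

# Read it back; the API returns the value base64-encoded.
resp = requests.get("http://127.0.0.1:8500/v1/kv/config/feature_flag")
value = base64.b64decode(resp.json()[0]["Value"])
print(value)  # b'on'
```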

Consul-template

Consul Template is a templating service that watches Consul servers for keys and regenerates a file from a template any time the value of a key changes. As an added bonus, Consul Template can execute arbitrary commands when a template update completes.[13]

Mesos

Mesos is an open-source cluster manager, developed at UC Berkeley, that provides efficient resource isolation and sharing across distributed applications, called “frameworks.” It uses Linux cgroups to provide isolation for CPU, memory, I/O, and filesystem (and can be extended to other resources such as network usage or GPUs).

Jepsen

Jepsen is a tool that Kyle Kingsbury (@aphyr) wrote to test partition tolerance for various Distributed Systems. His tests and talks are highly regarded in the community.

2. I. Gupta, R. van Renesse, and K. P. Birman, 2000, "A Probabilistically Correct Leader Election Protocol for Large Groups," Technical Report, Cornell University

3.

4.

5.

6.

7.

8.

9.

10.

11.

12.

13.
