Distributing consistency

  • resilient enough to withstand the loss of some computers
  • provide delivery guarantees between different computers
  • allow for global ordering for any two information
  • work across large geographical distances
  • not be bottlenecked locally by these constraints

A white piece of paper

I started out by looking at the technology available (in 2015). By that time, we already built a lot of things on event sourcing. I was fascinated by the idea of managing data using a single, sequenced array of events and how much simpler it was to stream those events to remote servers as opposed to communicating the end-results of what a process wrote into the database. The problem was that there is no obvious way to establish a single, global order between events in multiple such sequences of events. You either had a single one, in which case this would either only work locally and be a bottleneck both when communicating with remote servers and when scaling services, or be distributed, in which case we are left with what is called “eventual consistency”, which is a technical term for saying “your data will be correct sometime in the near future, but we don’t know exactly when”.

“A man with a watch knows what time it is. A man with two watches is never sure.” (Segal’s law)

The first idea was that if

  • I had timestamps, to which I could compare other such timestamps, and
  • If each write-process wrote its information using a single such timestamp from the future and
  • If each read-process knew what the current timestamps is, and hence symbolize reliable information

A distributed, global clock

An atomic operation is something which is processed by a computer in a single step. The only atomic operation in an event sourcing based system is the processing of a single event. Therefore, I started to use these to mark atomic events on each node. I established a circular progression of these events and established a value which grows monotonically within a single set of nodes.

Overlapping transactions

The last major problem I was left with were the overlapping write processes. In plain English, I needed to make sure that processes which are trying to modify the same information at the same time will somehow be detected and managed. In other words, how do I make sure that two users cannot both think they reserved the same airplane-seat or bought the last lawnmower on sale and that the system knows everywhere who won and who lost? I also needed to conserve the idea that a single node doesn’t need anything but its events to work the same way, so that meant no clocks, no randomness and no relying on any other external factors.

  • B’s data is deleted, the client API is notified that it either has to retry or notify the client and
  • A can finalise its information, so that subsequent read processes can find its data.

Where this leads us

Let’s revisit our original requirements.

Conclusion

We’re not saying that this system is perfect or that there are no special considerations to be made when using it, especially across large distances. What we are saying is that it’s much simpler to use and maintain when compared with other distributed systems, that it’s capable of providing the sort of guarantees which would have made our lives much easier and that this technology can open the doors to building global-scale databases.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Andras Gerlits

Andras Gerlits

Neither distance nor latency nor badly synced clocks stays these transactions from the swift completion of their appointed writes!