Who needs consistency guarantees anyway?
In the previous article, I described the challenges that led us to design and build our data integration platform. In this one, I invite you to consider how this new technology can help you even if you’re not a multi-national conglomerate (although I won’t hold it against you if you are).
State, distribution and databases
There is no software in the world which doesn’t manage some data. Crudely speaking, your data is all the values of all the variables used by your software. A graphical operating system which doesn’t know where you placed your windows, or a browser which doesn’t know which webpage you wanted to open, would not be very useful. By and large, all software needs some way to store and retrieve information in order to work. Databases are just that: stores of information, where the program can go to ask questions and modify existing data.
When you decide to develop software on a platform which can serve potentially inconsistent data, you’ve already made a decision to complicate your development to at least some degree. In the best case, the result is more scalable, more maintainable and makes it easier to introduce new services, but it’s hard to deny that it requires a lot more planning and expertise. In the worst case, it will open up your service to serious exploits and/or leave you with an unstable, unpredictable piece of software which will need expensive consultants and/or years of overtime work to stabilise.
This debate, usually called “monolith vs microservices”, has raged on for decades, albeit under different names. We had the same arguments when we did something called Service-Oriented Architecture in the noughties, and I’m sure people older than me could point to examples from 30 or 40 years ago.
I think it’s worth reminding ourselves that when we talk about databases, we shouldn’t necessarily mean “the SQL database instance running on the beefy hardware”, but every piece of information our software could need.
Communication and state
Mostly, we need multiple computers to communicate with each other because we need to:
- be able to grow our platforms beyond a single database
- safeguard our information from failure
- be more responsive in different regions
In other words, we need our state to be distributed for various reasons, yet we also need these separate bits of information to be available to us as one coherent set. That makes it hard to separate “the SQL database” from the information that arrived from another computer, which leaves us with an “eventually consistent” model, even if we would rather not go there.
Put differently, if you’re relying on more than one SQL instance (or a single consistent store) for your company, you’re already exposing yourself to potential inconsistencies, with all their pitfalls.
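To make that concrete, here is a minimal, self-contained sketch, in plain Java with no real databases involved, that simulates a primary store replicating asynchronously to a replica. The names are invented for illustration; the stale read it prints is exactly the eventual consistency described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Two stores: writes go to the primary and are copied to the
// replica asynchronously, much like replication between separate
// database instances.
public class EventualConsistencyDemo {
    static final Map<String, String> primary = new ConcurrentHashMap<>();
    static final Map<String, String> replica = new ConcurrentHashMap<>();
    static final ScheduledExecutorService replicator =
            Executors.newSingleThreadScheduledExecutor();

    static void write(String key, String value) {
        primary.put(key, value);
        // Replication lags 100 ms behind the write.
        replicator.schedule(() -> { replica.put(key, value); },
                100, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        primary.put("balance", "100");
        replica.put("balance", "100");

        write("balance", "42");

        // A reader hitting the replica straight after the write
        // still sees the old value: a stale read.
        System.out.println("replica sees: " + replica.get("balance")); // 100
        Thread.sleep(200); // wait out the replication lag
        System.out.println("replica sees: " + replica.get("balance")); // 42
        replicator.shutdown();
    }
}
```

Swap the two maps for two SQL instances, or for a database and a message consumer, and the same window of inconsistency appears; the only question is how your software behaves inside it.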
Blue-sky thinking
What would the ideal setup be?
I want to keep using my existing tools and not make all my colleagues learn yet another language.
We provide a CRUD API written in Java, which allows clients to access our cluster. This is easy to tailor to specific operations.
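By way of illustration only, the shape of such a client could be something like the sketch below; the interface, method names and Customer type are invented for this post rather than lifted from our documentation.

```java
// Hypothetical sketch: the interface and method names are
// illustrative assumptions, not a documented API.
public interface CrudClient extends AutoCloseable {
    <T> void insert(String table, String key, T row);
    <T> T find(String table, String key, Class<T> rowType);
    <T> void update(String table, String key, T newRow);
    void delete(String table, String key);

    // An invented row type, just for the example below.
    record Customer(String name, String email) {}

    // What application code could look like against it.
    static void example(CrudClient db) {
        db.insert("customers", "c-42", new Customer("Ada", "ada@example.com"));
        Customer ada = db.find("customers", "c-42", Customer.class);
        db.update("customers", "c-42", new Customer(ada.name(), "ada@lovelace.example"));
        db.delete("customers", "c-42");
    }
}
```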
I would like to be able to retrofit some consistency guarantees on top of my existing software, as seamlessly as possible.
We offer the ability for your services to exchange information with each other atomically. Our current isolation level is SNAPSHOT, which is stronger than the levels most SQL deployments actually run at (typically READ COMMITTED).
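Sticking with invented names rather than documented ones, here is a sketch of what an atomic exchange under SNAPSHOT isolation buys you: every read in a transaction sees one consistent point-in-time view, and the writes become visible together or not at all.

```java
import java.util.function.Consumer;

// Hypothetical transaction API; the type and method names are
// illustrative assumptions, not a documented interface.
interface Tx {
    <T> T read(String table, String key, Class<T> type);
    <T> void write(String table, String key, T value);
}

interface TransactionalClient {
    // Runs the body atomically: reads see a single consistent
    // snapshot, writes commit together when the body returns.
    void inTransaction(Consumer<Tx> body);
}

class LoyaltyTransfer {
    // Moves points between records that may live in different
    // services, or even different datacentres.
    static void transfer(TransactionalClient db) {
        db.inTransaction(tx -> {
            int alice = tx.read("points", "alice", Integer.class);
            int bob = tx.read("points", "bob", Integer.class);
            tx.write("points", "alice", alice - 10);
            tx.write("points", "bob", bob + 10);
            // A concurrent transaction sees either both new
            // balances or both old ones, never a mix.
        });
    }
}
```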
I want to be able to run it within my existing support and monitoring structure.
Our only dependency is Kafka, nothing else. We don’t need anything special from it either: it doesn’t even have to be configured for idempotence; it just has to be available.
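To show what “nothing special” means in practice, the sketch below is a completely stock Kafka producer setup; the broker addresses are placeholders, and idempotence is explicitly switched off to underline that it isn’t required.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// A stock producer: bootstrap servers and serializers only. No
// transactions or exactly-once configuration anywhere in sight.
public class PlainKafkaProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "kafka1.example:9092,kafka2.example:9092"); // placeholders
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                StringSerializer.class.getName());
        // Even this line is optional; newer clients default it to
        // true, and either setting works here.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "false");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}
```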
I would like to have a single interface, where multiple nodes from potentially distant datacentres can participate in a single, atomic transaction.
You’re in luck. Drop us an email at info@dianemodb.com