Failover, Replication and Recovery in DianemoDB
If you missed the previous article discussing how our solution comes together, it might be a good idea to give that a look before this one.
We claimed in the previous article that DianemoDB’s architecture solves failover management and replication, but we didn’t explain why. This is a contentious point: replicating data over greater distances without hurting performance while maintaining consistency are widely seen as inherently contradictory goals. The reason is that (simply put) you can either wait for further information to know what “the value” is at any point, or decide to stop waiting, in which case you can’t be sure what it would have been. Timeouts like these happen all the time when communicating over the internet, so this is not a trivial concern either. So how do we get around this?
We don’t care about what’s in a database
That’s right. We don’t. We assume that each database instance will hold the same information as long as it reads the same inputs from the same ordered list. We can replicate a computer by feeding it the same inputs the original consumed before it crashed. We can even use an older copy of a database: if the original crashes, we simply start a new instance, which knows what the last input it saw was, and continue from there.
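The idea above can be sketched in a few lines. This is not DianemoDB’s actual code, just a minimal illustration of the principle: if state is a pure function of an ordered input list, two replicas that read the same list end up identical.

```python
# A deterministic state machine fed from an ordered input log.
# Two replicas reading the same log must converge to the same state.

class Replica:
    def __init__(self):
        self.state = {}          # the "database"
        self.last_offset = -1    # offset of the last input applied

    def apply(self, offset, key, value):
        # Skip anything already seen; replaying old inputs is harmless.
        if offset <= self.last_offset:
            return
        self.state[key] = value
        self.last_offset = offset


log = [(0, "a", 1), (1, "b", 2), (2, "a", 3)]  # the ordered input list

primary, standby = Replica(), Replica()
for entry in log:
    primary.apply(*entry)
for entry in log:               # the standby reads the same log later
    standby.apply(*entry)

assert primary.state == standby.state == {"a": 3, "b": 2}
```

The only thing the replicas share is the log itself; no coordination between them is needed.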
In quorum-based consensus protocols such as Raft or Paxos, at least a majority of the computers must agree on a particular piece of information. To be redundant, such a system has to maintain at least 3 replicas, so that if one of them disappears for some reason, the other two can still carry on.
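The arithmetic behind that 3-replica minimum is simple enough to spell out (a generic illustration of majority quorums, not anything specific to DianemoDB):

```python
# Majority-quorum protocols need a write to reach floor(n/2) + 1 nodes,
# so the number of failures a cluster of n tolerates is n minus that.

def majority(n):
    return n // 2 + 1

def tolerable_failures(n):
    return n - majority(n)

assert tolerable_failures(1) == 0   # a single node tolerates nothing
assert tolerable_failures(2) == 0   # two nodes are no better
assert tolerable_failures(3) == 1   # three is the minimum useful cluster
assert tolerable_failures(5) == 2
```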
Our system is different. You can run multiple instances of the same computer if you wish, or you can just copy their files over from time to time and make sure the copies can read from the same ordered list of inputs. If the main server stops, a new one can be started, which (picking up at the offset after the last one processed in the file) catches up with the original very quickly. It doesn’t even matter if the previous instance resurfaces in the meantime, since the messages these instances publish are idempotent.
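Here is a hedged sketch of that failover path (the function names and snapshot layout are assumptions for illustration, not DianemoDB’s API): a standby restores from a stale copy, catches up from the offset after the last one it recorded, and because each output message carries the input offset it was derived from, a downstream consumer can drop duplicates even if the “dead” primary resurfaces and publishes the same messages again.

```python
# Failover by replay: restore a stale snapshot, skip already-processed
# offsets, and tag every output with the input offset that produced it.

def run_instance(snapshot, input_log, outbox):
    state = dict(snapshot["state"])
    last_offset = snapshot["last_offset"]
    for offset, key, value in input_log:
        if offset <= last_offset:           # already processed; skip on replay
            continue
        state[key] = value
        last_offset = offset
        outbox.append((offset, key, value))  # keyed by the input offset
    return state, last_offset

def deduplicate(outbox):
    # What an idempotent consumer effectively does: keep one message per offset.
    seen, result = set(), []
    for offset, key, value in outbox:
        if offset not in seen:
            seen.add(offset)
            result.append((offset, key, value))
    return result


log = [(0, "a", 1), (1, "b", 2), (2, "a", 3)]
outbox = []

# The primary processes the whole log, then "crashes".
primary_state, _ = run_instance({"state": {}, "last_offset": -1}, log, outbox)

# A standby starts from an older snapshot (taken after offset 0) and catches up.
stale_snapshot = {"state": {"a": 1}, "last_offset": 0}
standby_state, _ = run_instance(stale_snapshot, log, outbox)

assert standby_state == primary_state
# Offsets 1 and 2 were published twice; deduplication makes that harmless.
assert deduplicate(outbox) == [(0, "a", 1), (1, "b", 2), (2, "a", 3)]
```

Note that the outputs being keyed by input offset is what makes double-publishing safe: the consumer sees a duplicate, not a conflicting value.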
Our implementation does all this now.
The catch
You still need to make sure that the ordered list of inputs itself is resilient enough for your requirements. Kafka, to take one example, has a large body of literature on multi-datacentre setups. Or, if you lack the in-house expertise for that, you can always opt for a managed event-log service, such as Amazon’s Kinesis, Google’s Pub/Sub or Microsoft’s Azure Event Hubs.