
How do other distributed databases handle this?

replies(4): >>krilno+24 >>hendze+n5 >>jbelli+s6 >>voidma+Tz

>>neolef+(OP)
Google's Spanner [1] uses something it calls TrueTime:

"The key enabler of these properties is a new TrueTime API and its implementation. The API directly exposes clock uncertainty, and the guarantees on Spanner’s timestamps depend on the bounds that the implementation provides. If the uncertainty is large, Spanner slows down to wait out that uncertainty. Google’s cluster-management software provides an implementation of the TrueTime API. This implementation keeps uncertainty small (generally less than 10ms) by using multiple modern clock references (GPS and atomic clocks)."

[1] Spanner: Google's globally-distributed database https://www.usenix.org/system/files/conference/osdi12/osdi12...
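
To make "slows down to wait out that uncertainty" concrete, here is a minimal sketch of the commit-wait idea from the paper. `TrueTimeSketch`, its `epsilon` parameter, and the `commit` helper are hypothetical stand-ins for illustration, not Google's API; the real uncertainty bound comes from the GPS and atomic clock infrastructure, not a constant.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # true time is assumed to be no earlier than this
    latest: float    # ... and no later than this


class TrueTimeSketch:
    """Hypothetical stand-in for TrueTime with a fixed uncertainty bound."""

    def __init__(self, epsilon=0.005, clock=time.time):
        self.epsilon = epsilon  # assumed uncertainty in seconds
        self.clock = clock      # injectable so the sketch can be tested

    def now(self):
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, ts):
        # True only once ts has definitely passed on every clock.
        return self.now().earliest > ts


def commit(tt, sleep=time.sleep):
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    ts = tt.now().latest
    while not tt.after(ts):
        sleep(tt.epsilon / 4)
    return ts
```

With a fixed bound the wait comes to roughly twice `epsilon`, so a larger uncertainty directly means slower commits.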

replies(1): >>theatr+i6

>>neolef+(OP)
Unfortunately, not in a particularly clever way. CP systems such as MongoDB and HBase don't have this problem, since each datum has an authoritative master. As you can imagine, this can result in some operational... unpleasantness, due to the lack of liveness guarantees in the presence of a network partition.

Of the well-known open-source AP systems, Riak is probably the leader here, since it implements well-understood techniques from the literature such as CRDTs and vector clocks (vclocks).
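
For readers unfamiliar with vclocks, a minimal sketch of how vector clocks detect concurrent writes without relying on wall-clock time. The function names are illustrative and this is not Riak's implementation.

```python
def vc_increment(clock, node):
    """Return a copy of the clock with this node's counter advanced."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def vc_descends(a, b):
    """True if clock a has seen every event that clock b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def vc_compare(a, b):
    """Classify a relative to b: equal, after, before, or concurrent."""
    a_ge, b_ge = vc_descends(a, b), vc_descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "after"
    if b_ge:
        return "before"
    return "concurrent"  # neither saw the other: a genuine conflict


def vc_merge(a, b):
    """Per-node maximum; the result descends from both inputs."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}
```

Two writes that both start from the same clock but are accepted by different nodes compare as "concurrent", so the store can keep both as siblings instead of silently dropping one.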

EDIT: removed my statement about Cassandra since it was a bit misleading and jbellis answered above in greater detail.

>>krilno+24
With TrueTime you are trading some latency on concurrent operations for correctness.

Other structures such as CRDTs/lattices might be more appropriate for your use case.
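
As a minimal sketch of what a CRDT looks like, here is a grow-only counter (G-Counter), one of the simplest examples from the literature. The class and method names are illustrative. The merge is a lattice join (per-node maximum), so replicas converge regardless of delivery order and without any clock.

```python
class GCounter:
    """Grow-only counter: each node only ever increments its own slot."""

    def __init__(self, node_id, counts=None):
        self.node_id = node_id
        self.counts = dict(counts or {})

    def increment(self, n=1):
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Join two replicas: commutative, associative, and idempotent."""
        nodes = set(self.counts) | set(other.counts)
        joined = {
            n: max(self.counts.get(n, 0), other.counts.get(n, 0))
            for n in nodes
        }
        return GCounter(self.node_id, joined)
```

Merging in either order, or merging the same state twice, gives the same total, which is the property that lets such a structure stay available during a partition.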

replies(1): >>madhus+Gc

>>neolef+(OP)
Cassandra offers a mix of commutative operations (sets, maps, increments), an event-log model, and lightweight (Paxos-based) transactions. Unlike a key/value database like Riak, Cassandra can update individual fields of a row or document independently, which simplifies things enormously.

http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-v...

http://www.datastax.com/dev/blog/cql3_collections

http://www.datastax.com/dev/blog/lightweight-transactions-in...
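
A simplified model of why updating fields independently helps, assuming each field carries its own write timestamp and the newest write wins per field. This is a sketch for illustration, not Cassandra's code; `merge_rows` and the row layout are hypothetical.

```python
def merge_rows(a, b):
    """Merge two replicas of a row, keeping the newer cell for each field.

    Each row maps field name -> (timestamp, value). Ties on timestamp fall
    back to comparing values, so values here must be mutually comparable.
    """
    merged = dict(a)
    for field, cell in b.items():
        if field not in merged or cell > merged[field]:
            merged[field] = cell
    return merged


# Two clients update different fields of the same row on different replicas.
replica_1 = {"email": (1, "old@example.com"), "phone": (5, "555-0101")}
replica_2 = {"email": (7, "new@example.com"), "phone": (2, "555-0100")}

merged = merge_rows(replica_1, replica_2)
# Both updates survive, because each field is resolved on its own.
```

If the whole row were a single opaque value, one of the two updates would be lost on merge. Per-field resolution does not help when two clients write the same field, and it still depends on the timestamps being trustworthy.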

replies(1): >>ssever+Oa

>>jbelli+s6
Cassandra suffers from the same problem and can drop updates. The Paxos transactions were, and maybe still are, an absolute joke, as exposed by Aphyr.

>>theatr+i6
By correctness, do you mean consistency? You don't have to be consistent all the time; you can trade away consistency, but never correctness.

If we could have traded correctness, we could have optimized everything and gone home by now :)

>>neolef+(OP)
FoundationDB provides real ACID transactions and external consistency, and definitely does NOT rely on clock accuracy for soundness! (Google Spanner, which we are often compared to, does use a trusted clock, but Google went to extreme measures to make it accurate, including installing atomic clocks and GPS hardware.)

As for how, it's a long story. At bottom, we rely on Paxos for consistency across failures, but we only actually run Paxos when there are failures. (We use less costly synchronous techniques for replication in "happy times".)