"The key enabler of these properties is a new TrueTime API and its implementation. The API directly exposes clock uncertainty, and the guarantees on Spanner’s timestamps depend on the bounds that the implementation provides. If the uncertainty is large, Spanner slows down to wait out that uncertainty. Google’s cluster-management software provides an implementation of the TrueTime API. This implementation keeps uncertainty small (generally less than 10ms) by using multiple modern clock references (GPS and atomic clocks)."
[1] Spanner: Google's globally-distributed database https://www.usenix.org/system/files/conference/osdi12/osdi12...
Out of the well known open-source AP systems, Riak is probably the leader here since they implement well understood techniques from the literature such as CRDTs and vclocks.
EDIT: removed my statement about Cassandra since it was a bit misleading and jbellis answered above in greater detail.
Other structures such as CRDTs/lattices might be more appropriate for your use case.
http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-v...
http://www.datastax.com/dev/blog/cql3_collections
http://www.datastax.com/dev/blog/lightweight-transactions-in...
If we could have traded correctness, we could have optimized everything and gone home by now :)
As for how, it's a long story. At bottom we rely on Paxos for consistency across failures, but we only actually do Paxos when there are failures. (We use less costly synchronous techniques for replication in "happy times".)