zlacker

A minor quibble; iirc it was only connections that crossed datacenters that were encrypted. RPC connections within a cluster didn't need it, as the fiber taps were all done on the long distance fibers or at telco switching centers.

But otherwise you're totally right. I suspect the NSA got a nasty shock when the internal RPCs started becoming encrypted nearly overnight, just weeks after the "added and removed here" presentation. The fact that Google could roll out a change of that magnitude and at that speed, across the entire organization, would have been quite astonishing to them. And to think... all that work reverse engineering the internal protocols, burned in a matter of weeks.

replies(1): >>lern_t+nD

>>mike_h+(OP)
According to the reporting at the time, the NSA had shut down the email metadata collection program, which was the only leaked NSA program that parsed data on those taps, back in 2011; so the reverse engineering work was burned by an interagency review two years prior to Google's cross-datacenter encryption work.

replies(1): >>mike_h+231

>>lern_t+nD
They were tapping replication traffic on a database that included login IP addresses. I remember it well because it was a database my team had put there.

replies(1): >>lern_t+k72

>>mike_h+231
I missed that leak. Any chance you have a link for me to fill in my gap?

replies(1): >>mike_h+VY2

>>lern_t+k72
Slide 5 (Serendipity - New protocols) in this presentation:

https://github.com/iamcryptoki/snowden-archive/blob/master/d...

It's heavily redacted but the parts that are visible show they were targeting BigTable replication traffic (BTI_TabletServer RPCs) for "kansas-gaia" (Gaia is their account system), specifically the gaia_permission_whitelist table which was one of the tables used for the login risk analysis. You can see the string "last_logins" in the dump.

Note that the NSA didn't fully understand what they were looking at. They thought it was some sort of authentication or authorization RPC, but it wasn't.

In order to detect suspicious logins, e.g. from a new country or from an IP that's unlikely to be logging in to accounts, the datacenters processing logins needed to have a history of recent logins for every account. Before around 2011 they didn't have this: such data existed, but only in log-processing clusters. Doing real-time analytics required the data to be replicated with low latency between clusters. The NSA were delighted by this because real-time IP address info tied to account names is exactly what they wanted. They didn't have it previously because a login was processed within a cluster, and user-to-cluster traffic was protected by SSL. After authentication was done, inter-cluster traffic related to a user used opaque IDs and tokens. I know all about this because I initiated and ran the anti-hijacking project there in about 2010.
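To make the "history of recent logins" idea concrete, here's a toy sketch of a new-country check. This is purely illustrative and not how the real system worked: `Login`, `LoginHistory`, and the country field (assumed to come from some GeoIP lookup done elsewhere) are all made-up names.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Login:
    account: str
    ip: str
    country: str  # assumption: resolved from the IP by a GeoIP lookup elsewhere


class LoginHistory:
    """Hypothetical per-account store of the most recent logins."""

    def __init__(self, max_entries: int = 10):
        # Each account keeps a bounded queue; old logins fall off the front.
        self._recent = defaultdict(lambda: deque(maxlen=max_entries))

    def is_suspicious(self, login: Login) -> bool:
        # .get() avoids creating an empty entry for unknown accounts.
        history = self._recent.get(login.account)
        if not history:
            return False  # no history yet, so nothing to compare against
        # Flag the login if no recent login came from the same country.
        return all(past.country != login.country for past in history)

    def record(self, login: Login) -> None:
        self._recent[login.account].append(login)
```

The point of the sketch is the data dependency: the check only works if whichever cluster handles the login can see that account's recent history, which is why the data had to be replicated between clusters in the first place.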

The pie chart on slide 6 shows how valuable this traffic was to them. "Google Authorization, Security Question" and "gaia // permission_whitelist" (which are references to the same system) are their top target by far, followed by "no content" (presumably that means failed captures or something). The rest is some junk like indexing traffic that wouldn't have been useful to them.

Fortunately the BT replication traffic was easy to encrypt, as all the infrastructure was there already. It just needed a massive devops and capacity planning effort to get it turned on for everything.