https://transparencyreport.google.com/https/overview
https://transparencyreport.google.com/safer-email/overview - getting email transmitted with some form of encryption is probably a similar problem, but bigger and completely unseen
Before Snowden, encryption was mostly seen as a way to protect login forms. People knew it'd be nice to use it for everything, but there were difficult technical and capacity/budget problems in the way because SSL was slow.
After Snowden two things happened:
1. Encryption of everything became the company's top priority. Budget became unlimited, other projects were shelved, and whole teams were staffed to solve the latency problems - not only for Google's own public-facing web servers but for all internal traffic as well. They also began explicitly working out what it'd take to get the entire internet encrypted.
2. End-to-end encryption of messengers (a misnomer IMHO, but that's what they call it) went from an obscure feature for privacy and crypto nerds to a top-priority project for every consumer-facing app that took itself seriously.
The result was a massive increase in the amount of traffic that was encrypted. Maybe that would have eventually happened anyway, but it would have been far, far slower without Edward.
Google was driven not by some panicked rush to protect user privacy, but by the need to protect Google's collection and storage of user data.
Google has 10+ years of my email. It doesn't treat that like Fort Knox because it gives a shit about my privacy; it treats it like Fort Knox because it wants to use that for itself and provide services to others based off it.
You do know that Google was heavily seed-funded by the NSA, right?
What changed after Snowden was how Google encrypts traffic on its network, according to an article quoting you at the time.[5]
[1] https://gmail.googleblog.com/2010/01/default-https-access-fo...
[2] https://googleblog.blogspot.com/2011/10/making-search-more-s...
[3] https://www.zdnet.com/article/yahoo-finally-enables-https-en...
[4] https://techcrunch.com/2012/11/18/facebook-https/
[5] https://arstechnica.com/information-technology/2013/11/googl...
[1] https://en.m.wikipedia.org/wiki/Firesheep [2] https://www.imperialviolet.org/2010/06/25/overclocking-ssl.h...
People even had internal schwag shirts made featuring the iconic "SSL added and removed here" note [1]. It became part of the culture.
Over a decade later I still see most environments incur a lot of dev & ops overhead to get anywhere close to what Google got working completely transparently. The leak might have motivated the work, but the insight that it had to be automatic, foolproof, and universal is what made it so effective.
[1] https://blog.encrypt.me/2013/11/05/ssl-added-and-removed-her...
You're right that I might be mis-remembering the ordering of things, but I'm pretty sure that by the time Snowden came around the vast majority of traffic was still unencrypted. Bear in mind that a lot of Google's traffic was stuff you wouldn't necessarily think of, like YouTube thumbnails, map tiles and Omaha pings (for software updates). Web search and Gmail by that point made up a relatively small amount of it, albeit a valuable amount. Look at how the Chrome updater does update checks and you'll discover it uses a weird custom protocol which exists purely because, at the time it was designed, Google was in a massive LB CPU capacity crunch caused by turning on SSL for as many services as possible. Omaha controlled the client, so it had the flexibility to do cryptographic offload and was pushed to do so, to free up capacity for other services.
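For anyone curious what "cryptographic offload in the client" looks like in practice, the rough idea is that the update check runs over plain HTTP and the client authenticates the response itself, using a public key baked into the binary plus a fresh nonce to prevent replay. The sketch below is only that general pattern, not the actual Omaha/CUP wire format; the key type (Ed25519) and all names are my own assumptions.

```python
# Minimal sketch of application-layer response authentication over plain HTTP,
# i.e. the general idea behind doing cryptographic offload in the client rather
# than terminating TLS on a shared load balancer. Not the real Omaha/CUP protocol.
import os

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

# In a real updater the public key would be baked into the client binary;
# generating the key pair here just keeps the example self-contained.
server_key = ed25519.Ed25519PrivateKey.generate()
BAKED_IN_PUBLIC_KEY = server_key.public_key()


def client_make_request(body: bytes) -> tuple[bytes, bytes]:
    """Client side: attach a fresh nonce so a captured response can't be replayed."""
    nonce = os.urandom(16)
    return body, nonce


def server_sign_response(request: bytes, nonce: bytes, response: bytes) -> bytes:
    """Server side: sign (request, nonce, response) so the client can check
    integrity and freshness without TLS on this hop."""
    return server_key.sign(request + nonce + response)


def client_verify_response(request: bytes, nonce: bytes,
                           response: bytes, sig: bytes) -> bool:
    """Client side: reject the update response unless the signature checks out."""
    try:
        BAKED_IN_PUBLIC_KEY.verify(sig, request + nonce + response)
        return True
    except InvalidSignature:
        return False


if __name__ == "__main__":
    req, nonce = client_make_request(b"update-check?app=chrome&version=27.0")
    resp = b"no-update"
    sig = server_sign_response(req, nonce, resp)
    print("genuine response verified:", client_verify_response(req, nonce, resp, sig))
    print("tampered response verified:", client_verify_response(req, nonce, b"install-evil", sig))
```

The design choice worth noticing is that integrity and freshness are handled at the application layer, so the shared SSL-terminating frontends don't have to carry this traffic at all.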
> What changed after Snowden was how Google encrypts traffic on its network, according to an article quoting you at the time.[5]
That also changed and did so at enormous speed, but I'm pretty sure by June 2013 most external traffic still didn't have TLS applied. It looks like Facebook started going all-SSL just 8 months before Snowden.
But otherwise you're totally right. I suspect the NSA got a nasty shock when the internal RPCs started becoming encrypted nearly overnight, just weeks after the "added and removed here" presentation. The fact that Google could roll out a change of that magnitude and at that speed, across the entire organization, would have been quite astonishing to them. And to think... all that work reverse engineering the internal protocols, burned in a matter of weeks.
Edit: Here it is. Only 25% of YouTube's traffic was encrypted at the start of 2014. https://web.archive.org/web/20160802000052/https://youtube-e...
https://github.com/iamcryptoki/snowden-archive/blob/master/d...
It's heavily redacted, but the parts that are visible show they were targeting BigTable replication traffic (BTI_TabletServer RPCs) for "kansas-gaia" (Gaia is their account system), specifically the gaia_permission_whitelist table, which was one of the tables used for the login risk analysis. You can see the string "last_logins" in the dump.
Note that the NSA didn't fully understand what they were looking at. They thought it was some sort of authentication or authorization RPC, but it wasn't.
In order to detect suspicious logins, e.g. from a new country or from an IP that's unlikely to be logging in to accounts, the datacenters processing logins needed a history of recent logins for every account. Before around 2011 they didn't have this - such data existed, but only in log-processing clusters. Doing real-time analytics required the data to be replicated with low latency between clusters. The NSA were delighted by this, because real-time IP address info tied to account names is exactly what they wanted. They didn't have it previously because a login was processed within a cluster, and user-to-cluster traffic was protected by SSL; once authentication was done, inter-cluster traffic related to a user used opaque IDs and tokens. I know all about this because I initiated and ran the anti-hijacking project there in about 2010.
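To make concrete why low-latency replication of login history mattered, here's a toy sketch of the kind of check a login-processing cluster performs: flag a login when the account has recent history and neither the country nor the IP has been seen before. The data structures and names are invented for illustration; they're not the actual schema or risk model.

```python
# Toy sketch of a login risk check that depends on having a locally replicated
# history of recent logins per account. Names and fields are illustrative only.
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LoginEvent:
    account: str
    ip: str
    country: str
    timestamp: datetime


# Stand-in for the replicated "recent logins" table available inside this cluster.
recent_logins: dict[str, list[LoginEvent]] = {}


def record_login(event: LoginEvent, window: timedelta = timedelta(days=30)) -> None:
    """Append the login and drop entries that fall outside the history window."""
    history = recent_logins.setdefault(event.account, [])
    history.append(event)
    cutoff = event.timestamp - window
    recent_logins[event.account] = [e for e in history if e.timestamp >= cutoff]


def is_suspicious(account: str, ip: str, country: str) -> bool:
    """Flag a login if the account has history but this country and IP were never seen.
    Without a locally replicated history this check can't be made in real time."""
    history = recent_logins.get(account, [])
    if not history:
        return False  # no basis for comparison; other signals would have to apply
    seen_countries = {e.country for e in history}
    seen_ips = {e.ip for e in history}
    return country not in seen_countries and ip not in seen_ips


if __name__ == "__main__":
    now = datetime.now()
    record_login(LoginEvent("alice", "198.51.100.7", "GB", now - timedelta(days=2)))
    print(is_suspicious("alice", "198.51.100.7", "GB"))  # False: matches history
    print(is_suspicious("alice", "203.0.113.9", "RU"))   # True: new country and IP
```

In the real system that history had to be replicated between clusters with low latency, and that replication traffic is exactly what the slide shows the NSA going after.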
The pie chart on slide 6 shows how valuable this traffic was to them. "Google Authorization, Security Question" and "gaia // permission_whitelist" (which are references to the same system) are their top target by far, followed by "no content" (presumably that means failed captures or something). The rest is some junk like indexing traffic that wouldn't have been useful to them.
Fortunately the BT replication traffic was easy to encrypt, as all the infrastructure was there already. It just needed a massive devops and capacity planning effort to get it turned on for everything.