https://transparencyreport.google.com/https/overview
https://transparencyreport.google.com/safer-email/overview - getting email transmitted with some form of encryption is probably a similar problem, but bigger and completely unseen
Before Snowden, encryption was mostly seen as a way to protect login forms. People knew it'd be nice to use it for everything, but there were difficult technical and capacity/budget problems in the way because SSL was slow.
After Snowden two things happened:
1. Encryption of everything became the company's top priority. Budget became unlimited, other projects were shelved, and whole teams were staffed to solve the latency problems - not only for Google's own public-facing web servers but for all internal traffic as well. They also began explicitly working out what it'd take to get the entire internet encrypted.
2. End-to-end encryption of messengers (a misnomer IMHO, but that's what they call it) went from an obscure feature for privacy and crypto nerds to a top-priority project for every consumer-facing app that took itself seriously.
The result was a massive increase in the amount of traffic that was encrypted. Maybe that would have eventually happened anyway, but it would have been far, far slower without Edward.
Google was driven not by some panicked rush to protect user privacy, but by the need to protect Google's collection and storage of user data.
Google has 10+ years of my email. It doesn't treat that like Fort Knox because it gives a shit about my privacy; it treats it like Fort Knox because it wants to use that for itself and provide services to others based off it.
You do know that Google was heavily seed-funded by the NSA, right?
What changed after Snowden was how Google encrypts traffic on its network, according to an article quoting you at the time.[5]
[1] https://gmail.googleblog.com/2010/01/default-https-access-fo...
[2] https://googleblog.blogspot.com/2011/10/making-search-more-s...
[3] https://www.zdnet.com/article/yahoo-finally-enables-https-en...
[4] https://techcrunch.com/2012/11/18/facebook-https/
[5] https://arstechnica.com/information-technology/2013/11/googl...
[1] https://en.m.wikipedia.org/wiki/Firesheep [2] https://www.imperialviolet.org/2010/06/25/overclocking-ssl.h...
People even had internal schwag shirts made featuring the iconic "SSL added and removed here" note [1]. It became part of the culture.
Over a decade later I still see most environments incur a lot of dev & ops overhead to get anywhere close to what Google got working completely transparently. The leak might have motivated the work, but the insight that it had to be automatic, foolproof, and universal is what made it so effective.
[1] https://blog.encrypt.me/2013/11/05/ssl-added-and-removed-her...
You're right that I might be mis-remembering the ordering of things, but I'm pretty sure that by the time Snowden came around the vast majority of traffic was still unencrypted. Bear in mind that a lot of Google's traffic was stuff you wouldn't necessarily think of, like YouTube thumbnails, map tiles and Omaha pings (for software updates). Web search and Gmail by that point made up a relatively small amount of it, albeit a valuable amount. Look at how the Chrome updater does update checks and you'll discover it uses a weird custom protocol which exists purely because, at the time it was designed, Google was in a massive LB CPU capacity crunch caused by turning on SSL for as many services as possible. Omaha controlled the client, so it had the flexibility to do cryptographic offload and was pushed to do so, to free up capacity for other services.
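For anyone curious what "cryptographic offload in the client" looks like in practice, the rough idea is that the update check runs over plain HTTP and the client authenticates the response itself, using a public key baked into the binary plus a fresh nonce to prevent replay. The sketch below is only that general pattern, not the actual Omaha/CUP wire format; the key type (Ed25519) and all names are my own assumptions.

```python
# Minimal sketch of application-layer response authentication over plain HTTP,
# i.e. the general idea behind doing cryptographic offload in the client rather
# than terminating TLS on a shared load balancer. Not the real Omaha/CUP protocol.
import os

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

# In a real updater the public key would be baked into the client binary;
# generating the key pair here just keeps the example self-contained.
server_key = ed25519.Ed25519PrivateKey.generate()
BAKED_IN_PUBLIC_KEY = server_key.public_key()


def client_make_request(body: bytes) -> tuple[bytes, bytes]:
    """Client side: attach a fresh nonce so a captured response can't be replayed."""
    nonce = os.urandom(16)
    return body, nonce


def server_sign_response(request: bytes, nonce: bytes, response: bytes) -> bytes:
    """Server side: sign (request, nonce, response) so the client can check
    integrity and freshness without TLS on this hop."""
    return server_key.sign(request + nonce + response)


def client_verify_response(request: bytes, nonce: bytes,
                           response: bytes, sig: bytes) -> bool:
    """Client side: reject the update response unless the signature checks out."""
    try:
        BAKED_IN_PUBLIC_KEY.verify(sig, request + nonce + response)
        return True
    except InvalidSignature:
        return False


if __name__ == "__main__":
    req, nonce = client_make_request(b"update-check?app=chrome&version=27.0")
    resp = b"no-update"
    sig = server_sign_response(req, nonce, resp)
    print("genuine response verified:", client_verify_response(req, nonce, resp, sig))
    print("tampered response verified:", client_verify_response(req, nonce, b"install-evil", sig))
```

The design choice worth noticing is that integrity and freshness are handled at the application layer, so the shared SSL-terminating frontends don't have to carry this traffic at all.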
> What changed after Snowden was how Google encrypts traffic on its network, according to an article quoting you at the time.[5]
That also changed and did so at enormous speed, but I'm pretty sure by June 2013 most external traffic still didn't have TLS applied. It looks like Facebook started going all-SSL just 8 months before Snowden.
But otherwise you're totally right. I suspect the NSA got a nasty shock when the internal RPCs started becoming encrypted nearly overnight, just weeks after the "added and removed here" presentation. The fact that Google could roll out a change of that magnitude and at that speed, across the entire organization, would have been quite astonishing to them. And to think... all that work reverse engineering the internal protocols, burned in a matter of weeks.
Edit: Here it is. Only 25% of YouTube's traffic was encrypted at the start of 2014. https://web.archive.org/web/20160802000052/https://youtube-e...
https://github.com/iamcryptoki/snowden-archive/blob/master/d...
It's heavily redacted, but the parts that are visible show they were targeting BigTable replication traffic (BTI_TabletServer RPCs) for "kansas-gaia" (Gaia is their account system), specifically the gaia_permission_whitelist table, which was one of the tables used for the login risk analysis. You can see the string "last_logins" in the dump.
Note that the NSA didn't fully understand what they were looking at. They thought it was some sort of authentication or authorization RPC, but it wasn't.
In order to detect suspicious logins, e.g. from a new country or from an IP that's unlikely to be logging in to accounts, the datacenters processing logins needed a history of recent logins for every account. Before around 2011 they didn't have this - such data existed, but only in log-processing clusters. Doing real-time analytics required the data to be replicated with low latency between clusters. The NSA were delighted by this, because real-time IP address info tied to account names is exactly what they wanted. They didn't have it previously because a login was processed within a cluster, and user-to-cluster traffic was protected by SSL; once authentication was done, inter-cluster traffic related to a user used opaque IDs and tokens. I know all about this because I initiated and ran the anti-hijacking project there in about 2010.
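To make concrete why low-latency replication of login history mattered, here's a toy sketch of the kind of check a login-processing cluster performs: flag a login when the account has recent history and neither the country nor the IP has been seen before. The data structures and names are invented for illustration; they're not the actual schema or risk model.

```python
# Toy sketch of a login risk check that depends on having a locally replicated
# history of recent logins per account. Names and fields are illustrative only.
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LoginEvent:
    account: str
    ip: str
    country: str
    timestamp: datetime


# Stand-in for the replicated "recent logins" table available inside this cluster.
recent_logins: dict[str, list[LoginEvent]] = {}


def record_login(event: LoginEvent, window: timedelta = timedelta(days=30)) -> None:
    """Append the login and drop entries that fall outside the history window."""
    history = recent_logins.setdefault(event.account, [])
    history.append(event)
    cutoff = event.timestamp - window
    recent_logins[event.account] = [e for e in history if e.timestamp >= cutoff]


def is_suspicious(account: str, ip: str, country: str) -> bool:
    """Flag a login if the account has history but this country and IP were never seen.
    Without a locally replicated history this check can't be made in real time."""
    history = recent_logins.get(account, [])
    if not history:
        return False  # no basis for comparison; other signals would have to apply
    seen_countries = {e.country for e in history}
    seen_ips = {e.ip for e in history}
    return country not in seen_countries and ip not in seen_ips


if __name__ == "__main__":
    now = datetime.now()
    record_login(LoginEvent("alice", "198.51.100.7", "GB", now - timedelta(days=2)))
    print(is_suspicious("alice", "198.51.100.7", "GB"))  # False: matches history
    print(is_suspicious("alice", "203.0.113.9", "RU"))   # True: new country and IP
```

In the real system that history had to be replicated between clusters with low latency, and that replication traffic is exactly what the slide shows the NSA going after.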
The pie chart on slide 6 shows how valuable this traffic was to them. "Google Authorization, Security Question" and "gaia // permission_whitelist" (which are references to the same system) are their top target by far, followed by "no content" (presumably that means failed captures or something). The rest is some junk like indexing traffic that wouldn't have been useful to them.
Fortunately the BT replication traffic was easy to encrypt, as all the infrastructure was there already. It just needed a massive devops and capacity planning effort to get it turned on for everything.