I feel like this is too wide. Everyone collects data. I don't mean all tech companies collect data, I mean, for example, your friends have copies of the emails you've sent them. They have photos with you in them of places you've been with timestamps and GPS coordinates. Your coworkers have access to your calendar. Your mechanic has the service history on your car. Your librarian knows which books you have checked out.
These aren't problematic situations because they each only have a little piece of your data, and you trust each of those people with that little piece, and if you don't then you don't have to give it to them.
The problem is when you don't have that choice. Which is what happens when you're dealing with a government or a monopoly (or some other concentrated market where you can't trust any of the players). You can't reasonably choose to not have your location collected by your mobile carrier, or the traffic cameras in front of your home. If all your friends use Facebook, then Facebook Facebook Facebook.
But we don't really want to regulate Facebook. I mean holy cow, what is that even supposed to look like?
I think we can separate the problem into two pieces. The first is collection by, let's call it, unavoidable monopolies. Telecommunications carriers and other utility companies. This is where we know exactly what to do, because these entities should not be collecting any information about people at all. There is no reason Verizon needs to know anything about you other than whether you've paid your bill. So regulation here can be useful, e.g. make it unlawful for carriers to triangulate a cellphone's location without a warrant, or collect anything whatsoever about the contents of IP packets. But we also have a strong technical solution here. Encrypt all the things. Fully deprecate HTTP in favor of HTTPS. We need to build, for example, DNS query privacy. Things like that.
The other part of the problem is what you might call avoidable monopolies. There is no fundamental reason why Facebook has to be as centralized as it is. You have a phone which has all your photos on it and is connected to the internet 24/7. Why is there a copy of your photos on Facebook's servers? If one of your friends wants to see one of your photos, why are they not getting it directly from you? Then you don't have to trust Facebook with a copy of it. So the solution for this half of the problem is, disintermediate the avoidable monopolies.
> Why is there a copy of your photos on Facebook's servers? If one of your friends wants to see one of your photos, why are they not getting it directly from you? Then you don't have to trust Facebook with a copy of it. So the solution for this half of the problem is, disintermediate the avoidable monopolies.
It's because decentralization like that is stupidly, stupidly inefficient. Not to mention that the assumption that your phone is actually on-line 24/7 is unrealistic, and that's before we notice we're not on IPv6 yet, or that people also use cameras, or that they change their phones, go out of service range or simply want to free up space on SD card for something else.
So the fundamental reasons are a) efficiency, and b) availability. That's not to say things couldn't be improved wrt. privacy. I don't know that much about crypto yet (that's about to change, for work-related reasons), but I vaguely recall that there are encryption schemes that would let only you and your friends access the data stored on third-party servers, and that would make the data unreadable to said third party.
Disagree. If you're Netflix wanting to distribute Jessica Jones then you want something like a CDN (although in that context BitTorrent is also "something like a CDN").
But think about wanting to share photos with your friends. There are only thirty people who actually want to see the photos. Twenty five of them live in the same city as you, which makes direct connections to you about as efficient as a local CDN node, and the other five live in four different cities, so in all but one case there is nothing to be gained from caching in any of those places because there will only ever be one copy requested. In that one last case the CDN would conserve just one long-distance copy, and that's assuming we can't make P2P software smart enough to have the second person in Timbuktu get the photos from the first person there.
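To put numbers on that, here's a rough back-of-the-envelope sketch in Python. The city names other than Timbuktu are made up, and it assumes a per-city cache serves every viewer in that city after the first request, which is about the best case for a CDN.

```python
from collections import Counter

def long_distance_copies(viewer_cities, home_city):
    """Return (without_cache, with_cache) counts of long-distance transfers.

    Assumption: a per-city cache needs exactly one long-distance copy per
    remote city, no matter how many viewers live there.
    """
    remote = Counter(city for city in viewer_cities if city != home_city)
    without_cache = sum(remote.values())  # every remote viewer pulls from the origin
    with_cache = len(remote)              # one pull per remote city, then served locally
    return without_cache, with_cache

# Hypothetical city names; only the counts (25 local, 5 remote in 4 cities)
# come from the example above.
viewers = ["Home"] * 25 + ["Timbuktu", "Timbuktu", "CityB", "CityC", "CityD"]

direct, cached = long_distance_copies(viewers, "Home")
print("long-distance copies without caching:", direct)
print("long-distance copies with per-city caching:", cached)
print("copies saved by caching:", direct - cached)
```

With thirty viewers, the cache saves a single long-distance transfer, which is the whole point: for small audiences there's almost nothing for a CDN to deduplicate.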
> we're not on IPv6 yet
This one is probably the main reason why this hasn't actually happened yet, but it's not like we don't know what to do -- how about we get on IPv6 already?
> or that people also use cameras
You seem to be implying there is some reason why a photo taken with a camera couldn't still be distributed using a mobile device (or plug server or PC or whatever you like).
> or that they change their phones
And then they can copy the stuff from one to the other.
> Not to mention that the assumption that your phone is actually on-line 24/7 is unrealistic
Availability is a different tack. OK, your phone doesn't have twelve nines of uptime, but it probably is actually online upwards of 90% of the time. And we know how to build reliable systems out of mostly-reliable pieces.
We're assuming that there is a piece of software on your device which already knows who your friends are. So now it just needs a check box that says "cache things for my friends if they cache things for me" and now your friends can get your photos from your other friends (or from their own device) even when your device is occasionally incommunicado.
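For a sense of how fast that adds up, here's a minimal sketch. It assumes each device is online 90% of the time and that outages are independent, which is optimistic (friends in the same city can all lose power or signal at once), so treat the numbers as an upper bound.

```python
def availability(p_online, copies):
    """Probability that at least one of `copies` devices is reachable.

    Assumption: each device is online with probability `p_online`,
    independently of the others.
    """
    return 1 - (1 - p_online) ** copies

# Your own device plus friends who cache for you.
for n in (1, 2, 3, 5):
    print(f"{n} copies: {availability(0.9, n):.5f}")
```

Under those assumptions every extra cached copy adds roughly another nine, so a handful of friends' devices already gets you well past what a single phone can offer.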
> or simply want to free up space on SD card for something else.
I think there's a law of physics that says your photos, to exist, have to exist somewhere. I suppose "I would rather give my private data to Facebook than buy an SD card big enough to hold it" is the sort of thing you have to decide for yourself.