zlacker

> unnecessarily track users without their consent

Regardless of your intentions, you are collecting enough data to track users.

> I am transparent about what I collect ([URL])

That page doesn't mention that you are also collecting (and make no claim about storing) the globally-visible IP address (and any other data in the IP and TCP headers). This can be uniquely identifying; even when it isn't unique you usually only need a few bits of additional entropy to reconstruct[1] a unique tracking ID.

In my case, you're collecting and storing more than enough additional entropy to make a decent fingerprint because [window.innerWidth, window.innerHeight] == [847, 836]. Even if I resized the window, you could follow those changes simply by watching analytics events from the same IP that are temporally nearby (you are collecting and storing timestamps).
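To put rough numbers on "a few bits", here is a minimal sketch. The population and share figures are purely hypothetical and are not measurements from the site being discussed. Singling out one visitor among N needs about log2(N) bits, and an attribute value shared by a fraction p of visitors contributes about -log2(p) bits, assuming the attributes are independent.

```javascript
// Bits needed to single out one visitor among `population` visitors.
function bitsToIdentify(population) {
  return Math.log2(population);
}

// Bits of identifying information carried by an attribute value that
// a fraction `share` (0 < share <= 1) of visitors have in common.
function surprisalBits(share) {
  return -Math.log2(share);
}

// Hypothetical example: 256 visitors share one public IP address,
// and 1 in 512 visitors has this exact window size.
const needed = bitsToIdentify(256);
const fromWindow = surprisalBits(1 / 512);

// If the attribute carries at least as many bits as are needed,
// the combination is likely to be unique within that group.
const likelyUnique = fromWindow >= needed;
```

Under these assumed numbers the window size alone carries more bits than are needed to separate the visitors behind that one address.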

[1] An older comment where I discussed how this could be done (and why GA's supposed "anonymization" feature (aip=1) is a blatant lie): https://news.ycombinator.com/item?id=17170468

replies(6): >>harian+P3 >>Lyndsy+m4 >>mygo+U6 >>Havoc+4t >>aabbcc+xt >>reaper+fu

>>pdkl95+(OP)
Good comment! I only store the window.innerWidth metric. I updated the what we collect page (https://simpleanalytics.io/what-we-collect) to reflect the IP handling. We don't store them. And fingerprinting would definitely be tracking. Not on my watch!

replies(3): >>donut+um >>samirm+zp >>pdkl95+qF

>>pdkl95+(OP)
I think there's value in at least distributing the data that's collected. I may not like that the analytics provider has my data, but it seems like a lesser evil if that provider isn't also the world's largest ad company and they aren't using it to build profiles behind the scenes to track my every move across a significant part of the Internet.

Given the choice between a lot of data about me given to a small provider and somewhat less data about me given to Google, I'd generally choose the former.

replies(2): >>sharce+Na >>cwkoss+zD

>>pdkl95+(OP)
> That page doesn't mention that you are also collecting (and make no claim about storing) the globally-visible IP address

I’m not the OP, but where is there evidence that they’re storing the IP? Sure it’s in the headers that they process but that doesn’t mean they’re storing it.

>>Lyndsy+m4
That's not a good way to make a decision. Big or small doesn't matter. What matters is who provides better security. When two parties, one big and one small, are collecting data, the party that can act on security vulnerabilities quickly and has great security engineers and dedicated teams like Project Zero is the much better choice. People nowadays assume that a small, indie developer is a good guy. I am just pointing out that this is a very bad bias to have. Technicalities matter; security robustness matters. Google might be collecting data, but their security is really good. Good effort by this dev, though.

replies(3): >>Lyndsy+sj >>dotanc+tG >>pvorb+lI

>>sharce+Na
I totally agree on the security aspect, but I think we're talking about different threat models.

Security matters if your concern is the data leaking to a potential malicious actor. The concern that I'm speaking to is the intended use of the data. Google is definitely going to use it for ad targeting and building a "shadow profile", but a small developer probably won't. This one says they won't, but even if they do they're likely to be much less effective than Google would be.

replies(2): >>dzader+er >>sharce+OI1

>>harian+P3
You wrote "Point." - did you mean "Period."?

>>harian+P3
There is absolutely no reason to collect and store window dimensions, other than for fingerprinting and tracking. Sure it might be an interesting piece of trivia for the dev, but it's not necessary for the dev to "make sure the website works great on all of those dimensions", since that much is already obvious and presumed when making websites these days.

replies(3): >>soared+9q >>markbn+jq >>Drdrdr+zq

>>samirm+zp
Besides... optimizing a site for specific window dimensions? If I see conversion rate is lower on a certain band of dimensions, something likely doesn't display properly. It'd be impossible to test every dimension.

>>samirm+zp
Could there not be value in knowing how many pixels your users have available to view your things? You could presumably get that information from device characteristics but then could also presumably use that for fingerprinting.

replies(1): >>samirm+ss

>>samirm+zp
Actually there is, this is one of the most interesting metrics. It doesn't have to be precise though, rounding to nearest 50px would be more than enough. I would argue that height and aspect ratio should be collected too. (I didn't downvote you FWIW)

replies(2): >>Bjartr+7J >>stordo+NC1

>>Lyndsy+sj
I'm curious what your concern with Google building this 'shadow profile' is if you're not worried about this data being leaked to a malicious actor - Is Google simply having this data a bad thing, and if so, why?

replies(2): >>wolco+aB >>tannha+kn1

>>markbn+jq
You as the developer have access to and are aware of all possible display dimensions and aspect ratios. It's not that hard to prioritize the sizes you want to support and then work based off that. There are plenty of tools out there that let you simulate different screen sizes for testing too. I don't see this information providing any extra value.

replies(2): >>CJeffe+mx >>kelnag+XA

>>pdkl95+(OP)
>Regardless of your intentions, you are collecting enough data to track users.

I'd imagine it's difficult to do in-depth analytics without tracking users...

>>pdkl95+(OP)
If you're concerned about your user data being in the hands of a third party, maybe it's better to do the analytics yourself.

Something like this: https://stackoverflow.com/questions/34031251/javascript-libr...

>>pdkl95+(OP)
> That page doesn't mention that you are also collecting (and make no claim about storing) the globally-visible IP address (and any other data in the IP and TCP headers). This can be uniquely identifying; even when it isn't unique you usually only need a few bits of additional entropy to reconstruct[1] a unique tracking ID.

This is true. The legal department for the healthcare web sites I maintain doesn't let me store or track IP addresses, even for analytics.

I'm only allowed to tally most popular pages, display language chosen, and date/time. There might be one or two other things, but it's all super basic.

>>samirm+ss
Surely the best sizes to prioritize would be the ones your users are using the most?

replies(1): >>samirm+hA

>>CJeffe+mx
Also known as the most common sizes used by everyone else for most other sites...

replies(2): >>chipot+vC >>Reedx+PC

>>samirm+ss
But you are assuming the users browse the website in full-screen or maximised mode. Whilst true for most mobile devices, this is certainly not a given on desktops.

replies(1): >>samirm+TL

>>dzader+er
Is that really a question? Google creates global profiles of everyone for tracking and advertising.

Having a random developer create a shadow profile isn't the same.

The scale is vastly different and can be used to track you from site to site.

replies(1): >>dzader+3K

>>samirm+hA
I wonder how those other sites figured out what window dimensions their users were most commonly using!

replies(1): >>samirm+8L

>>samirm+hA
Metrics like that can vary significantly depending on your audience.

>>Lyndsy+m4
I think how the data is used is also a big factor.

There is 'justice' in the blog creator using analytics data to improve the experience of blog visitors: a user's data will, theoretically and in aggregate, create a better experience for that user in the future. The class of 'users who browse this page' gets a benefit from the cost of providing data.

Selling browsing information to advertisers is sort of 'anti-justice'. It uses blog visitor data to track those visitors and more effectively manipulate them, elsewhere on the internet, into paying people money. The blog visitor's external online experience is made worse by browsing that blog.

>>harian+P3
> We don't collect and store IPs.

First, "IPs" might be confusing; "IP addresses" would be more accurate.

More importantly, you have to collect IP addresses (or any other value in the packet headers[1][2]), even if you don't store them, if you want to receive any packets from the rest of the internet. Storage of those values is a separate issue entirely, and it's good to hear that you are intending to NOT store IP addresses (and updating the documentation)!

Also, I strongly recommend using Drdrdrq's suggestion to lower the precision of the collected window dimensions, which should be done on the client i.e. "Math.floor(window.innerWidth/50)*50". This kind of bit-reduction makes fingerprinting a lot harder.
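A minimal sketch of that client-side rounding, built around the expression quoted above. The function names `coarsen` and `coarseDimensions` are hypothetical and not part of any analytics script; the dimensions are passed in as arguments so the code also runs outside a browser.

```javascript
// Round a dimension down to a multiple of `step` pixels (default 50),
// so that only the coarse bucket ever leaves the browser.
function coarsen(value, step = 50) {
  return Math.floor(value / step) * step;
}

// Build a payload from already-coarsened values.
function coarseDimensions(innerWidth, innerHeight, step = 50) {
  return {
    width: coarsen(innerWidth, step),
    height: coarsen(innerHeight, step),
  };
}

// In a browser this would be called as
// coarseDimensions(window.innerWidth, window.innerHeight).
const example = coarseDimensions(847, 836); // { width: 800, height: 800 }
```

Swapping `Math.floor` for `Math.round` gives "nearest 50px" instead of "round down"; either way the exact value is discarded before anything is sent.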

[1] https://en.wikipedia.org/wiki/IPv4#Header

[2] https://en.wikipedia.org/wiki/Transmission_Control_Protocol#...

replies(1): >>Bjartr+WI

>>sharce+Na
> When two parties, one big and one small, are collecting data, the party that can act on security vulnerabilities quickly and has great security engineers and dedicated teams

This cannot be stressed enough. At my day job I write reasonably secure software on a team for big clients, then at home I write reasonably secure software independently for small clients.

Come a new security issue, the big clients at the day job get first priority. Not because they are big and not because they are paying more, but rather because as a team we can reallocate resources and work on issues in parallel. At home, there is only one Dotan to work on each independent client in series.

>>sharce+Na
Better than Google "having great security" would be if Google was not collecting that much information in the first place.

>>pdkl95+qF
I would argue that in the conversational context "collect" is more a synonym for "store" than for "receive" or "see". More so in the context of a tracking system. In my opinion anyway.

>>Drdrdr+zq
Heck, "rounding" to desktop/mobile could tell you enough to know what to focus your dev time on.
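As a sketch of that coarsest possible bucket: the 768px breakpoint below is an assumption picked for illustration, not a value anyone in this thread proposed, and `deviceClass` is a hypothetical name.

```javascript
// Collapse a viewport width into one of two classes.
// `breakpoint` is an assumed cut-off in CSS pixels.
function deviceClass(innerWidth, breakpoint = 768) {
  return innerWidth < breakpoint ? "mobile" : "desktop";
}

// In a browser: deviceClass(window.innerWidth)
const cls = deviceClass(847);
```

A two-valued field like this carries at most one bit, so it adds very little to any fingerprint.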

>>wolco+aB
I know Google creates global profiles for tracking - and my question (which is the same as my original question) is why do you care? If that data is only used internally by google to serve you better ads why are you concerned with them having your data?

replies(2): >>Lyndsy+tU >>tagawa+V01

>>chipot+vC
that doesn't matter now does it?

>>kelnag+XA
No, I'm not assuming that, because regardless of how the user browses your site, you're still going to prioritize the sizes important to you.

It wouldn't make sense to prioritize optimizing site design for the few people who are using a non-standard size.

http://gs.statcounter.com/screen-resolution-stats

>>dzader+3K
In short, because they’re doing so without my permission, and I resent that.

>>dzader+3K
Even if a user trusts Google, because the data is digital and therefore permanent, there's no guarantee it will remain internal forever, whether that's because of a hack, a rogue employee, police/government pressure, or a change of ownership.

replies(1): >>seando+jb1

>>tagawa+V01
It seems to me that, with the exception of a rogue employee, all of those examples are at a greater risk of occurring with a small, independent provider. Google almost certainly has more security resources, more legal resources and political clout, and isn’t likely to be acquired any time soon.

I can’t say I love having Google track me, but I don’t feel any better about someone else doing it either.

replies(1): >>marich+Ap1

>>dzader+er
It's not just bad from a privacy PoV. By giving away signals to GA, you're actually underselling your users' data. Google can correlate your analytics with others' to place highly targeted ads for your visitors on other sites, stealing the attention your high-quality content generates, such that sites with big pockets for Google ad bidding and placement but otherwise only low-effort content (and Google itself, of course) make all the money.

>>seando+jb1
If the marketplace was full of independent trackers (which I'm not suggesting is a good idea, because third party trackers are bad in the first place), then as they get compromised, only a small subset of data is lost... The chance of losing everything or enough data to pair to your real identity is a lot lower. It's like IDs in physical activity. If you visit your bank they track you by a different id to the library, your medical record, etc, each might be lost individually and be upsetting, but do they reveal data about all the others? No.

Why is Google's security better than anyone else's? Monopolies often have more resources, but lack motive, because they are a monopoly. Without transparency we have no idea how secure Google's systems are, but we do know Google has been hacked before.

replies(1): >>sharce+7J1

>>Drdrdr+zq
Could you round to buckets as well - take the 10 (say) most common screen sizes, and round users to the nearest? That way users with odd screen sizes aren't identified.
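A minimal sketch of that idea, snapping a reported size to the nearest entry in a fixed list. The sizes in `COMMON_SIZES` are hypothetical placeholders; a real list would come from aggregate statistics. Squared Euclidean distance is just one reasonable choice of "nearest".

```javascript
// Hypothetical list of common sizes as [width, height] pairs.
const COMMON_SIZES = [
  [360, 640],
  [768, 1024],
  [1366, 768],
  [1440, 900],
  [1920, 1080],
];

// Return the entry in `sizes` closest to (width, height) by squared
// Euclidean distance. Ties go to the earlier entry in the list.
function nearestCommonSize(width, height, sizes = COMMON_SIZES) {
  let best = sizes[0];
  let bestDist = Infinity;
  for (const [w, h] of sizes) {
    const dist = (w - width) ** 2 + (h - height) ** 2;
    if (dist < bestDist) {
      best = [w, h];
      bestDist = dist;
    }
  }
  return best;
}

// An unusual 847x836 window is reported as one of the listed sizes.
const bucket = nearestCommonSize(847, 836);
```

Because every visitor is reported as one of a handful of listed sizes, an unusual window no longer stands out on its own.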

>>Lyndsy+sj
Probably. Wow, you used the word "probably". I guess you aren't aware of the many cases where, when a Chrome extension gets popular, the indie developer is contacted by some company, and many have sold their extension or let the company collect data. This data also gets sold to third parties; many such cases with small-to-medium websites have occurred. Remember Unroll.me.

Also, Google knows how to make profiles, and it knows the importance of that data and of keeping it safe. It is also somewhat answerable to consumer groups, users, shareholders, and regulatory bodies. An indie dev doesn't know how to make a good profile and is more likely to sell the data to make revenue. I'm not ridiculing indie devs, just ridiculing your assumption that a solo dev is an angel.

https://www.labnol.org/internet/sold-chrome-extension/28377/

https://m.slashdot.org/story/328731

>>marich+Ap1
Humans make systems. Teams like Project Zero (of Google) have contributed a ton to security. They prioritize security a lot.