zlacker

[parent] [thread] 94 comments
1. Adriaa+(OP)[view] [source] 2018-09-19 14:13:28
Creator here. As a developer, I install analytics for clients, but I never feel comfortable installing Google Analytics because Google creates profiles for their visitors, and uses their information for apps (like AdWords). As we all know, big corporations unnecessarily track users without their consent. I want to change that.

So I built Simple Analytics. To ensure that it's fast, secure, and stable, I built it entirely using languages that I'm very familiar with. The backend is plain Node.js without any framework, the database is PostgreSQL, and the frontend is written in plain JavaScript.

I learned a lot while coding, like sending requests as JSON requires an extra (pre-flight) request, so in my script I use the "text/plain" content type, which does not require an extra request. The script is publicly available (https://github.com/simpleanalytics/cdn.simpleanalytics.io/bl...). It works out of the box with modern frontend frameworks by overwriting the "history.pushState"-function.
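A minimal sketch of the pushState idea mentioned above, not the actual Simple Analytics script: it wraps `pushState` on a history-like object so that a callback fires on every client-side navigation. The names `hookPushState`, `onNavigate` and `fakeHistory` are made up for illustration; in a browser you would pass `window.history`.

```javascript
// Wrap pushState on a history-like object so that a callback runs
// after every client-side navigation. Names are illustrative only.
function hookPushState(historyObj, onNavigate) {
  const original = historyObj.pushState;
  historyObj.pushState = function (state, title, url) {
    // Call the original first so the URL has already changed.
    const result = original.apply(this, arguments);
    onNavigate(url);
    return result;
  };
  // Return a function that restores the original behaviour.
  return function unhook() {
    historyObj.pushState = original;
  };
}

// Stand-in for window.history so the sketch also runs outside a browser.
const fakeHistory = {
  entries: [],
  pushState(state, title, url) {
    this.entries.push(url);
  },
};

const seen = [];
const unhook = hookPushState(fakeHistory, (url) => seen.push(url));
fakeHistory.pushState({}, "", "/pricing"); // reported to the callback
unhook();
fakeHistory.pushState({}, "", "/about"); // no longer reported
```

Calling the original method before the callback means the callback sees the new URL; returning an `unhook` function keeps the patch reversible.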

I am transparent about what I collect (https://simpleanalytics.io/what-we-collect) so please let me know if you have any questions. My analytics tool is just the start for what I want to achieve in the non-tracking movement.

We can be more valuable without exploiting user data.

replies(19): >>cutety+n1 >>blondi+02 >>lucide+d2 >>consto+u2 >>hashr8+43 >>nhooyr+q3 >>wnevet+Q5 >>e12e+q8 >>pdkl95+Qf >>dylz+Gi >>ucario+9r >>ckluis+Ez >>wumms+VD >>harian+7F >>yuhong+FF >>paulja+MQ >>kposeh+k41 >>etatob+KI1 >>iDemon+ZK1
2. cutety+n1[view] [source] 2018-09-19 14:25:26
>>Adriaa+(OP)
Just a heads up, HN comments only use a (I think) small subset of Markdown for formatting, but your link will work as is without having to wrap it in [] and adding the ().

https://github.com/simpleanalytics/cdn.simpleanalytics.io/bl...

https://simpleanalytics.io/what-we-collect

Anyway, cool project! I've always felt the same about using GA given I actually like to pretend I have some sort of privacy these days, and always have an adblocker on, so I hated setting it up for people. Definitely will be keeping an eye on this the next time someone asks me to set up GA.

replies(1): >>harian+f2
3. blondi+02[view] [source] 2018-09-19 14:30:00
>>Adriaa+(OP)
nice job! i like the direction you are taking with this project. it's still young, so we don't know X) you might get bitten too lol

anyways, wish you the best luck with your endeavor. btw you might want to fix links above.

replies(1): >>harian+Ie
4. lucide+d2[view] [source] 2018-09-19 14:31:58
>>Adriaa+(OP)
This looks great, have bookmarked it for future projects.

I would, however, be a little more skeptical of tools claiming to be privacy-first than I would be of GA (who I presume are not privacy-first). On that note, some quick questions:

- Any plans to open source? I've used Piwik/Matomo in the past, and while I'm not a massive fan of the code-quality of that project, it's at least auditable (and editable).

- You say you're transparent about what you collect—IPs aren't mentioned on that page[0]. Are IPs stored in full or how are they handled? I assume you log IPs?

- How do you discern unique page-views? You seem to be dogfooding and I see no cookies or localStorage keys set.

[0] https://simpleanalytics.io/what-we-collect

replies(1): >>harian+X2
◧◩
5. harian+f2[view] [source] [discussion] 2018-09-19 14:32:00
>>cutety+n1
Thanks, I changed it. Glad you can relate!
6. consto+u2[view] [source] 2018-09-19 14:34:03
>>Adriaa+(OP)
This is pretty much exactly what I have been looking for. I recently ditched Google Analytics and all other possible third party resources (except for YouTube, for which I implemented a click to play system) on my blog (consto.uk).

I just have a quick question. What subset of the javascript implementation does the tracking pixel provide? If all that is missing is screen size, I might just choose that to avoid running third party code. For performance, I combine, minify, and embed all scripts and styles into each page which lets me achieve perfect scores in the Chrome Auditor.

replies(1): >>harian+E3
◧◩
7. harian+X2[view] [source] [discussion] 2018-09-19 14:37:31
>>lucide+d2
- No plans to go open source with the backend, but I do show the code that is run in the browser. The visualisation of the data is not super important I think.

- I don't save IPs, not even in the logs.

- I don't have unique pageviews at the moment. I will in the future. If the referrer is the same as the current page, I will measure that as a non-unique. What do you think?
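The referrer rule described above could look roughly like this. It is only a sketch of one reading of "the referrer is the same as the current page": it compares origin and path and ignores query strings and fragments, which is an assumption and not something stated in the thread. `isNonUniqueView` is a hypothetical name.

```javascript
// Treat a pageview as non-unique when the referrer points at the same
// page. Query strings and fragments are ignored here, which is an
// assumption of this sketch rather than something stated in the thread.
function isNonUniqueView(referrer, currentUrl) {
  if (!referrer) return false; // direct visit: nothing to compare against
  let ref, cur;
  try {
    ref = new URL(referrer);
    cur = new URL(currentUrl);
  } catch (err) {
    return false; // unparsable URL: count the view as unique
  }
  return ref.origin === cur.origin && ref.pathname === cur.pathname;
}
```

In a browser the two arguments would be `document.referrer` and `location.href`.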
replies(6): >>always+v3 >>lucide+a5 >>nickps+Pj >>andypa+tk >>wongar+8u >>enriqu+OE1
8. hashr8+43[view] [source] 2018-09-19 14:38:10
>>Adriaa+(OP)
Hey I like the idea but have a question.

How are you storing all the information that analytics users want to know i.e. (What devices, what languages, what geolocations, what queries, what page navigations and clicks, etc.)

After reading what you collect I'm assuming you are doing a lot of JS sniffing of browser properties to gather this information along with IP address analysis, is that correct? Or what are your plans for these features if you don't have them now?

Overall though I'd say great design + sales pitch. I think if the product delivers on enough features you will have something here. Great job!

replies(1): >>sincer+S6
9. nhooyr+q3[view] [source] 2018-09-19 14:42:54
>>Adriaa+(OP)
Why would sending requests as JSON require an extra request?
replies(1): >>harian+N3
◧◩◪
10. always+v3[view] [source] [discussion] 2018-09-19 14:43:17
>>harian+X2
Doesn't seem like a very useful measure of uniqueness.

What if you had one-day retention of IP addresses for per-day unique views? Seems like too important of a metric to eliminate completely, and one-day retention seems like a decent trade-off at the expense of being able to do unique analysis over longer time periods.

replies(1): >>iampim+74
◧◩
11. harian+E3[view] [source] [discussion] 2018-09-19 14:44:27
>>consto+u2
In practice, you can copy the code to your server. You could subscribe to repo updates on https://github.com/simpleanalytics/cdn.simpleanalytics.io and update your code if the changes make sense to you.
◧◩
12. harian+N3[view] [source] [discussion] 2018-09-19 14:46:10
>>nhooyr+q3
Explained here: https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS (search for preflight)
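As a rough illustration of the rule on that page, here is a simplified check for whether a cross-origin request would trigger a preflight. It looks only at the method and the Content-Type header; real browsers check more (other request headers, upload listeners, streams), so treat `needsPreflight` as a hypothetical helper and not as a full implementation of the spec.

```javascript
// Simplified model of the "simple request" rules for cross-origin
// requests. Only the method and the Content-Type header are considered.
const SAFE_METHODS = ["GET", "HEAD", "POST"];
const SAFE_CONTENT_TYPES = [
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain",
];

function needsPreflight(method, contentType) {
  if (!SAFE_METHODS.includes(method.toUpperCase())) return true;
  if (!contentType) return false;
  // Drop parameters such as "; charset=UTF-8" before comparing.
  const essence = contentType.split(";")[0].trim().toLowerCase();
  return !SAFE_CONTENT_TYPES.includes(essence);
}

console.log(needsPreflight("POST", "application/json")); // true
console.log(needsPreflight("POST", "text/plain")); // false
```

This is why a POST with a JSON body sent as "text/plain" goes out as a single request, at the cost of the server having to parse the body itself.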
◧◩◪◨
13. iampim+74[view] [source] [discussion] 2018-09-19 14:49:20
>>always+v3
Don’t retain the IP address, retain a hash of the IP address.
replies(3): >>detaro+j4 >>always+Y4 >>thesim+Y6
◧◩◪◨⬒
14. detaro+j4[view] [source] [discussion] 2018-09-19 14:52:19
>>iampim+74
A plain hash doesn't make a difference.

One can use hashes with regularly changing salts that are destroyed after a while to make older hashes unusable though for some purposes.
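A minimal sketch of the rotating-salt idea, assuming a Node.js backend and one random salt per UTC day. The in-memory `salts` map and the function names are hypothetical; a real service would need to decide where salts live and how they are destroyed.

```javascript
const crypto = require("crypto");

// Keyed hash of an IP address. The salt acts as the key; once a salt
// is destroyed, hashes made with it can no longer be recomputed.
function hashIp(ip, salt) {
  return crypto.createHmac("sha256", salt).update(ip).digest("hex");
}

// Hypothetical in-memory salt store keyed by UTC date (YYYY-MM-DD).
const salts = new Map();

function saltFor(date) {
  const day = date.toISOString().slice(0, 10);
  if (!salts.has(day)) salts.set(day, crypto.randomBytes(32));
  return salts.get(day);
}

// Destroy every salt that belongs to a day before the given date.
function expireSaltsBefore(date) {
  const today = date.toISOString().slice(0, 10);
  for (const day of salts.keys()) {
    if (day < today) salts.delete(day);
  }
}
```

Within one day the same IP maps to the same hash, so per-day unique counts work; across days the hashes are unrelated, and after `expireSaltsBefore` the older ones cannot be reproduced from an IP.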

◧◩◪◨⬒
15. always+Y4[view] [source] [discussion] 2018-09-19 14:57:28
>>iampim+74
When you can trivially crawl the input space like ipv4 addresses, you'd have to expire a fresh per-day salt as well.

But to my eyes, expiring salts isn't much different than deleting ip addresses after one day. Just more machinery. People have to trust that you're doing either, so why bother beyond being able to use the word "hashing" in marketing language?
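To make the "trivially crawl the input space" point concrete, here is a small demonstration in Node.js. It recovers an address from an unsalted SHA-256 hash by trying candidates; to keep it fast it only walks one /16 (65,536 addresses), and the example address and prefix are arbitrary documentation values. The same loop over all of IPv4 is about 4.3 billion hashes.

```javascript
const crypto = require("crypto");

function sha256Hex(text) {
  return crypto.createHash("sha256").update(text).digest("hex");
}

// Recover an IPv4 address from its unsalted hash by trying every
// address under a two-octet prefix such as "198.51".
function crackWithinSlash16(targetHash, prefix) {
  for (let c = 0; c < 256; c++) {
    for (let d = 0; d < 256; d++) {
      const candidate = `${prefix}.${c}.${d}`;
      if (sha256Hex(candidate) === targetHash) return candidate;
    }
  }
  return null;
}

const stored = sha256Hex("198.51.100.23");
console.log(crackWithinSlash16(stored, "198.51")); // 198.51.100.23
```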

replies(1): >>e12e+c6
◧◩◪
16. lucide+a5[view] [source] [discussion] 2018-09-19 14:58:44
>>harian+X2
> What do you think?

Apart from the unfortunate non-open-source answer, this sounds great!

I get others' concerns about wanting unique pageviews, but that metric is always a bit of a sketchy either-or for extremely privacy-conscious people. It's both an incredibly valuable metric, and also one that's difficult to square with complete privacy (basically it's always going to be pseudonymous at best).

replies(1): >>dvko+Hm
17. wnevet+Q5[view] [source] 2018-09-19 15:02:52
>>Adriaa+(OP)
> like sending requests as JSON requires an extra (pre-flight) request, so in my script I use the "text/plain" content type, which does not require an extra request.

what are the security implications of this?

replies(1): >>cevn+Gc
◧◩◪◨⬒⬓
18. e12e+c6[view] [source] [discussion] 2018-09-19 15:06:13
>>always+Y4
You'd at least want per record salts. But even then it's trivial to check if a given ip is in the dataset. Better, but not great. (ie: you have access to the dataset, you want to check if a given ip/time match the log - read the salt, check the hash).
replies(1): >>wongar+zs
◧◩
19. sincer+S6[view] [source] [discussion] 2018-09-19 15:13:33
>>hashr8+43
It doesn't look like they collect or store languages, geolocations, or devices beyond screen sizes.
replies(1): >>Drdrdr+E8
◧◩◪◨⬒
20. thesim+Y6[view] [source] [discussion] 2018-09-19 15:13:59
>>iampim+74
Not private enough as the space of IP addresses is too small. Removing the last octet of IPv4 addresses before storing them should provide better privacy.
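For reference, dropping the last octet is a one-liner in spirit; a slightly more defensive sketch is below. `truncateIPv4` is a hypothetical helper, and whether this gives meaningful privacy is disputed elsewhere in the thread, since each stored value still narrows the visitor down to at most 256 addresses.

```javascript
// Zero the last octet of a dotted-quad IPv4 address, e.g. before
// storing it. Returns null for anything that is not a valid IPv4 string.
function truncateIPv4(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) return null;
  }
  parts[3] = "0";
  return parts.join(".");
}

console.log(truncateIPv4("203.0.113.57")); // 203.0.113.0
```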
replies(1): >>pdkl95+rg
21. e12e+q8[view] [source] 2018-09-19 15:22:24
>>Adriaa+(OP)
> The script is publicly available

Nice. You might want to add an explicit copyright/license though. Make it less (or more) dangerous for other devs to read it...

I think it could actually be quite useful to "standardize" on a simple (open/libre) front end for analytics (with an implied back-end standard).

◧◩◪
22. Drdrdr+E8[view] [source] [discussion] 2018-09-19 15:23:52
>>sincer+S6
And even if they did, if it is stored as aggregates, I (as a visitor) wouldn't mind.
◧◩
23. cevn+Gc[view] [source] [discussion] 2018-09-19 15:49:50
>>wnevet+Q5
He's talking about skipping CORS by using a "plain" request. Avoiding CORS is not a huge security vulnerability afaik.
◧◩
24. harian+Ie[view] [source] [discussion] 2018-09-19 16:05:54
>>blondi+02
Thank you! Fixed the links.
25. pdkl95+Qf[view] [source] 2018-09-19 16:13:14
>>Adriaa+(OP)
> unnecessarily track users without their consent

Regardless of your intentions, you are collecting enough data to track users.

> I am transparent about what I collect ([URL])

That page doesn't mention that you are also collecting (and make no claim about storing) the globally-visible IP address (and any other data in the IP and TCP headers). This can be uniquely identifying; even when it isn't unique you usually only need a few bits of additional entropy to reconstruct[1] a unique tracking ID.

In my case, you're collecting and storing more than enough additional entropy to make a decent fingerprint because [window.innerWidth, window.innerHeight] == [847, 836]. Even if I resized the window, you could follow those changes simply by watching analytics events from the same IP that are temporally nearby (you are collecting and storing timestamps).

[1] An older comment where I discussed how this could be done (and why GA's supposed "anonymization" feature (aip=1) is a blatant lie): https://news.ycombinator.com/item?id=17170468

replies(6): >>harian+Fj >>Lyndsy+ck >>mygo+Km >>Havoc+UI >>aabbcc+nJ >>reaper+5K
◧◩◪◨⬒⬓
26. pdkl95+rg[view] [source] [discussion] 2018-09-19 16:18:48
>>thesim+Y6
> Removing the last octet of IPv4 addresses before storing them should provide better privacy.

That doesn't provide any practical amount of privacy. For a longer discussion of why this is at best a placebo, see: https://news.ycombinator.com/item?id=17170468

replies(1): >>dividu+9m
27. dylz+Gi[view] [source] 2018-09-19 16:34:18
>>Adriaa+(OP)
How can I run this myself?

It absolutely isn't privacy-first if it requires running on someone else's machine and giving your users' data to them - another issue would be that while your server is in the EU, the hosting company is subject to US law, and all the stuff that comes with it (https://en.wikipedia.org/wiki/CLOUD_Act f.e.)

replies(1): >>shermo+N83
◧◩
28. harian+Fj[view] [source] [discussion] 2018-09-19 16:41:36
>>pdkl95+Qf
Good comment! I only store the window.innerWidth metric. I updated the what we collect page (https://simpleanalytics.io/what-we-collect) to reflect the IP handling. We don't store them. And fingerprinting is something that would be definitely tracking, not on my watch!
replies(3): >>donut+kC >>samirm+pF >>pdkl95+gV
◧◩◪
29. nickps+Pj[view] [source] [discussion] 2018-09-19 16:42:23
>>harian+X2
Have you considered using a shared-source license where they can inspect and build from source that they have to pay for? And where people can obtain the source freely for academic research and/or security reviews?

Shared-source proprietary goes as far back as Burroughs B5000 mainframe whose customers got the source and could send in fixes/updates. Microsoft has a Shared Source program. Quite a few suppliers in embedded do it. There's also a company that sells UI software which gives the source to customers buying higher-priced version.

I will warn that people might still rip off and use your code. Given it's JavaScript, I think they can do that anyway with reverse engineering. It also sounds like they could build it themselves anyway. Like most software bootstrappers or startups, you're already in a race with other players that might copy you with clean slate implementations. So, I don't know if the risk is that big a deal or not. I figured I should mention it for fairness.

◧◩
30. Lyndsy+ck[view] [source] [discussion] 2018-09-19 16:44:51
>>pdkl95+Qf
I think there's value in at least distributing the data that's collected. I may not like that the analytics provider has my data, but it seems like a lesser evil if that provider isn't also the world's largest ad company and they aren't using it to build profiles behind the scenes to track my every move across a significant part of the Internet.

Given the choice between a lot of data about me given to a small provider and somewhat less data about me given to Google, I'd generally choose the former.

replies(2): >>sharce+Dq >>cwkoss+pT
◧◩◪
31. andypa+tk[view] [source] [discussion] 2018-09-19 16:46:56
>>harian+X2
Could a bloom filter be useful here?
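For readers unfamiliar with the idea, a minimal Bloom filter sketch in Node.js follows. It answers "definitely not seen" or "probably seen" without keeping the items themselves. The sizes are arbitrary and the class is illustrative only; note that anyone holding the filter can still test whether a particular value was added, so it does not by itself solve the membership-check concern raised elsewhere in the thread.

```javascript
const crypto = require("crypto");

// Minimal Bloom filter: each item sets `hashes` bit positions in a
// `bits`-wide bit array. Positions are taken from one SHA-256 digest
// (4 bytes each), so `hashes` must be 8 or less in this sketch.
class BloomFilter {
  constructor(bits = 8192, hashes = 4) {
    this.bits = bits;
    this.hashes = hashes;
    this.data = new Uint8Array(Math.ceil(bits / 8));
  }

  positions(item) {
    const digest = crypto.createHash("sha256").update(item).digest();
    const out = [];
    for (let i = 0; i < this.hashes; i++) {
      out.push(digest.readUInt32BE(i * 4) % this.bits);
    }
    return out;
  }

  add(item) {
    for (const p of this.positions(item)) {
      this.data[p >> 3] |= 1 << (p & 7);
    }
  }

  // false means "never added"; true means "probably added".
  mightContain(item) {
    return this.positions(item).every(
      (p) => (this.data[p >> 3] & (1 << (p & 7))) !== 0
    );
  }
}
```

A pageview would count as unique when `mightContain(key)` is false just before `add(key)`; false positives make the unique count slightly too low, never too high.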
◧◩◪◨⬒⬓⬔
32. dividu+9m[view] [source] [discussion] 2018-09-19 16:59:24
>>pdkl95+rg
I solved this in my SaaS by internally logging all the requests and then using the Measurement Protocol (https://developers.google.com/analytics/devguides/collection...) to send them from the server-side. While doing that I also set the last digit to 0 and unify user agents and other data that's not important for me.
◧◩◪◨
33. dvko+Hm[view] [source] [discussion] 2018-09-19 17:02:47
>>lucide+a5
If you need an open-source solution that truly cares about privacy yet can still keep track of unique pageviews, there's always Fathom Analytics (https://github.com/usefathom/fathom).
replies(2): >>lucide+eD >>rapnie+Fz1
◧◩
34. mygo+Km[view] [source] [discussion] 2018-09-19 17:03:06
>>pdkl95+Qf
> That page doesn't mention that you are also collecting (and make no claim about storing) the globally-visible IP address

I’m not the OP, but where is there evidence that they’re storing the IP? Sure it’s in the headers that they process but that doesn’t mean they’re storing it.

◧◩◪
35. sharce+Dq[view] [source] [discussion] 2018-09-19 17:29:31
>>Lyndsy+ck
That's not a good way to make a decision. Big or small doesn't matter. What matters is who is providing better security. When two parties, one big and one small, are collecting data, then the party which can act on security vulnerabilities quickly and has great security engineers and dedicated teams like Project Zero is the much better choice. People nowadays assume that a small, indie developer is a good guy. I am just pointing out that this is a very bad bias to have. Technicalities matter, security robustness matters. Google might be collecting data, but their security is really good. Good effort by this dev though.
replies(3): >>Lyndsy+iz >>dotanc+jW >>pvorb+bY
36. ucario+9r[view] [source] 2018-09-19 17:32:42
>>Adriaa+(OP)
First off: hats off for making a product that takes the rights of the end user seriously!

However, I am a bit confused as to who would want this product. The sort of questions this product answers seem quite limited:

1. What URLs are getting lots of hits?

2. What referrers are generating lots of hits?

3. What screen sizes are those hits coming from?

What decisions can be drawn from those questions? This seems useful only to perhaps some blog, where they're wondering what sort of content is successful, where to advertise more, and whether to bother making a mobile website.

Without the ability to track user sessions -- even purely in localStorage -- you can't correlate pageview events. For instance, how would I answer a question like:

- How many high-interest users do I have? By "high interest", I mean someone who visited at least three pages on my website.

- Is a mobile website really worthwhile? How much of an effect does being on mobile have on whether someone will be "high-interest"?

I should think some anonymized user ID system -- even if it rotates anonymous IDs -- should be able to answer these questions without compromising privacy.
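One way the rotating anonymous ID could be sketched, assuming a random ID kept in localStorage and replaced once per UTC day. The storage key, the ID format and the function name are all made up for illustration, and whether such an ID is acceptable from a privacy or legal standpoint is a separate question.

```javascript
// Return an anonymous visitor ID that is replaced once per UTC day.
// `storage` is anything with getItem/setItem (localStorage in a browser).
function rotatingVisitorId(storage, now = new Date()) {
  const today = now.toISOString().slice(0, 10);
  const saved = JSON.parse(storage.getItem("visitor") || "null");
  if (saved && saved.day === today) return saved.id;
  // Math.random is fine for a throwaway counter ID, not for secrets.
  const id = Math.random().toString(36).slice(2, 12);
  storage.setItem("visitor", JSON.stringify({ day: today, id }));
  return id;
}

// Tiny in-memory stand-in for localStorage so the sketch runs in Node.
const memory = new Map();
const fakeStorage = {
  getItem: (key) => (memory.has(key) ? memory.get(key) : null),
  setItem: (key, value) => memory.set(key, String(value)),
};
```

Pageviews that carry the same ID on the same day can be grouped into a visit, which is enough to count "visited at least three pages" without linking one day to the next.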

Also, I'll leave it to others to point out it's unlikely this product is exempt from GDPR.

replies(3): >>boerna+aJ >>zaarn+GE1 >>tomask+bt2
◧◩◪◨⬒⬓⬔
37. wongar+zs[view] [source] [discussion] 2018-09-19 17:41:26
>>e12e+c6
But per record hashes break the original use case: checking if a given hash is already in the database.
◧◩◪
38. wongar+8u[view] [source] [discussion] 2018-09-19 17:51:42
>>harian+X2
If you don't go open source, will you at least offer paid self-hosting (similar to what e.g. Atlassian offers)?

The idea of privacy is much easier to sell if the data never leaves your own server, instead of using some analytics provider that might be run by the CIA or the Russian mafia for all we can prove.

replies(1): >>rapnie+iz1
◧◩◪◨
39. Lyndsy+iz[view] [source] [discussion] 2018-09-19 18:26:40
>>sharce+Dq
I totally agree on the security aspect, but I think we're talking about different threat models.

Security matters if your concern is the data leaking to a potential malicious actor. The concern that I'm speaking to is the intended use of the data. Google is definitely going to use it for ad targeting and building a "shadow profile", but a small developer probably won't. This one says they won't, but even if they do they're likely to be much less effective than Google would be.

replies(2): >>dzader+4H >>sharce+EY1
40. ckluis+Ez[view] [source] 2018-09-19 18:29:05
>>Adriaa+(OP)
If you could track pageload time and show a distribution of that per page time - I would buy this in a second.
◧◩◪
41. donut+kC[view] [source] [discussion] 2018-09-19 18:47:29
>>harian+Fj
You wrote "Point." - did you mean "Period."?
◧◩◪◨⬒
42. lucide+eD[view] [source] [discussion] 2018-09-19 18:53:24
>>dvko+Hm
Wow. I've never come across this. Thank you, I'll look into it.
43. wumms+VD[view] [source] 2018-09-19 18:58:14
>>Adriaa+(OP)
"NoScript detected a potential Cross-Site Scripting attack from https://simpleanalytics.io to https://js.stripe.com."
replies(1): >>javver+bR
44. harian+7F[view] [source] 2018-09-19 19:06:26
>>Adriaa+(OP)
Because of the feedback I changed the pricing to $9!
◧◩◪
45. samirm+pF[view] [source] [discussion] 2018-09-19 19:08:21
>>harian+Fj
There is absolutely no reason to collect and store window dimensions, other than for fingerprinting and tracking. Sure it might be an interesting piece of trivia for the dev, but it's not necessary for the dev to "make sure the website works great on all of those dimensions", since that much is already obvious and presumed when making websites these days.
replies(3): >>soared+ZF >>markbn+9G >>Drdrdr+pG
46. yuhong+FF[view] [source] 2018-09-19 19:10:40
>>Adriaa+(OP)
I mentioned Google Analytics in my essay/overview.
◧◩◪◨
47. soared+ZF[view] [source] [discussion] 2018-09-19 19:13:57
>>samirm+pF
Besides... optimizing a site for specific window dimensions? If I see conversion rate is lower on a certain band of dimensions, something likely doesn't display properly. It'd be impossible to test every dimension.
◧◩◪◨
48. markbn+9G[view] [source] [discussion] 2018-09-19 19:15:45
>>samirm+pF
Could there not be value in knowing how many pixels your users have available to view your things? You could presumably get that information from device characteristics but then could also presumably use that for fingerprinting.
replies(1): >>samirm+iI
◧◩◪◨
49. Drdrdr+pG[view] [source] [discussion] 2018-09-19 19:17:28
>>samirm+pF
Actually there is, this is one of the most interesting metrics. It doesn't have to be precise though, rounding to nearest 50px would be more than enough. I would argue that height and aspect ratio should be collected too. (I didn't downvote you FWIW)
replies(2): >>Bjartr+XY >>stordo+DS1
◧◩◪◨⬒
50. dzader+4H[view] [source] [discussion] 2018-09-19 19:22:40
>>Lyndsy+iz
I'm curious what your concern with Google building this 'shadow profile' is if you're not worried about this data being leaked to a malicious actor - Is Google simply having this data a bad thing, and if so, why?
replies(2): >>wolco+0R >>tannha+aD1
◧◩◪◨⬒
51. samirm+iI[view] [source] [discussion] 2018-09-19 19:32:11
>>markbn+9G
You as the developer have access to and are aware of all possible display dimensions and aspect ratios. It's not that hard to prioritize the sizes you want to support and then work based off that. There are plenty of tools out there that let you simulate different screen sizes for testing too. I don't see this information providing any extra value.
replies(2): >>CJeffe+cN >>kelnag+NQ
◧◩
52. Havoc+UI[view] [source] [discussion] 2018-09-19 19:37:55
>>pdkl95+Qf
>Regardless of your intentions, you are collecting enough data to track users.

I'd imagine it's difficult to do in-depth analytics without tracking users...

◧◩
53. boerna+aJ[view] [source] [discussion] 2018-09-19 19:39:14
>>ucario+9r
Since the creator points out that he doesn't store any IP addresses, he doesn't store any data that allows identifying an individual. For the GDPR to be applicable you need to store data that allows you to identify an individual. Thus when you use this, you don't have to think about GDPR.
replies(1): >>ucario+T41
◧◩
54. aabbcc+nJ[view] [source] [discussion] 2018-09-19 19:40:29
>>pdkl95+Qf
If you're concerned about your user data being in the hands of a third party, maybe it's better to do the analytics yourself.

Something like this https://stackoverflow.com/questions/34031251/javascript-libr...

◧◩
55. reaper+5K[view] [source] [discussion] 2018-09-19 19:45:29
>>pdkl95+Qf
> That page doesn't mention that you are also collecting (and make no claim about storing) the globally-visible IP address (and any other data in the IP and TCP headers). This can be uniquely identifying; even when it isn't unique you usually only need a few bits of additional entropy to reconstruct[1] a unique tracking ID.

This is true. The legal department for the healthcare web sites I maintain doesn't let me store or track IP addresses, even for analytics.

I'm only allowed to tally most popular pages, display language chosen, and date/time. There might be one or two other things, but it's all super basic.

◧◩◪◨⬒⬓
56. CJeffe+cN[view] [source] [discussion] 2018-09-19 20:13:46
>>samirm+iI
Surely the best sizes to prioritize would be the ones your users are using the most?
replies(1): >>samirm+7Q
◧◩◪◨⬒⬓⬔
57. samirm+7Q[view] [source] [discussion] 2018-09-19 20:41:04
>>CJeffe+cN
Also known as the most common sizes used by everyone else for most other sites...
replies(2): >>chipot+lS >>Reedx+FS
58. paulja+MQ[view] [source] 2018-09-19 20:46:30
>>Adriaa+(OP)
Fathom thought about data and privacy policy too:

https://usefathom.com/data/

◧◩◪◨⬒⬓
59. kelnag+NQ[view] [source] [discussion] 2018-09-19 20:46:30
>>samirm+iI
But you are assuming the users browse the website in full screen mode/maximised. Whilst true for most mobile devices, this is certainly not given on desktops.
replies(1): >>samirm+J11
◧◩◪◨⬒⬓
60. wolco+0R[view] [source] [discussion] 2018-09-19 20:47:22
>>dzader+4H
Is that really a question? Google creates global profiles of everyone for tracking and advertising.

Having a random developer create a shadow profile isn't the same.

The scale is vastly different and can be used to track you from site to site.

replies(1): >>dzader+TZ
◧◩
61. javver+bR[view] [source] [discussion] 2018-09-19 20:48:16
>>wumms+VD
It appears to be the analytics of the stripe form, ironically.
◧◩◪◨⬒⬓⬔⧯
62. chipot+lS[view] [source] [discussion] 2018-09-19 20:58:41
>>samirm+7Q
I wonder how those other sites figured out what window dimensions their users were most commonly using!
replies(1): >>samirm+Y01
◧◩◪◨⬒⬓⬔⧯
63. Reedx+FS[view] [source] [discussion] 2018-09-19 21:01:36
>>samirm+7Q
Metrics like that can vary significantly depending on your audience.
◧◩◪
64. cwkoss+pT[view] [source] [discussion] 2018-09-19 21:07:33
>>Lyndsy+ck
I think how the data is used is also a big factor.

There is 'justice' in the blog creator using analytics data to improve the experience of blog visitors: a user's data will, theoretically and in aggregate, create a better experience for that user in the future. The class of 'users who browse this page' gets a benefit from the cost of providing data.

Selling browsing information to advertisers is sort of 'anti-justice'. Using blog visitor data to track and more effectively manipulate those visitors elsewhere on the internet into paying people money. The blog visitor's external online experience is made worse by browsing that blog.

◧◩◪
65. pdkl95+gV[view] [source] [discussion] 2018-09-19 21:22:39
>>harian+Fj
> We don't collect and store IPs.

First, "IPs" might be confusing; "IP addresses" would be more accurate.

More importantly, you have to collect IP addresses (or any other value in the packet headers[1][2]) - even if you don't store them - if you want to receive any packets from the rest of the internet. Storage of those values is a separate issue entirely, and it's good to hear that you are intending to NOT store IP addresses (and updating the documentation)!

Also, I strongly recommend using Drdrdrq's suggestion to lower the precision of the collected window dimensions, which should be done on the client i.e. "Math.floor(window.innerWidth/50)*50". This kind of bit-reduction makes fingerprinting a lot harder.
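Wrapped in a small helper, the expression above looks like this. `bucket` is a hypothetical name; the 50px step comes from the suggestion in the thread, and note that `Math.floor` rounds down rather than to the nearest multiple.

```javascript
// Round a dimension down to a multiple of `step` pixels. Done on the
// client, so the exact value never leaves the browser.
function bucket(px, step = 50) {
  return Math.floor(px / step) * step;
}

console.log(bucket(847), bucket(836)); // 800 800
```

With the [847, 836] window mentioned earlier in the thread, both values collapse to 800, so that visitor becomes indistinguishable from anyone else whose window falls in the same 50px band.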

[1] https://en.wikipedia.org/wiki/IPv4#Header

[2] https://en.wikipedia.org/wiki/Transmission_Control_Protocol#...

replies(1): >>Bjartr+MY
◧◩◪◨
66. dotanc+jW[view] [source] [discussion] 2018-09-19 21:36:06
>>sharce+Dq
> When two parties, one big and one small, are collecting data, then the party which can act on security vulnerabilities quickly and has great security engineers and dedicated teams

This cannot be stressed enough. At my day job I write reasonably secure software on a team for big clients, then at home I write reasonably secure software independently for small clients.

Come new security issue, the big clients at day job get first priority. Not because they are big and not because they are paying more, but rather because as a team we can reallocate resources and work on issues in parallel. At home, there is only one Dotan to work on each independent client in series.

◧◩◪◨
67. pvorb+bY[view] [source] [discussion] 2018-09-19 21:59:47
>>sharce+Dq
Better than Google "having great security" would be if Google was not collecting that much information in the first place.
◧◩◪◨
68. Bjartr+MY[view] [source] [discussion] 2018-09-19 22:08:19
>>pdkl95+gV
I would argue that in the conversational context "collect" is more a synonym for "store" than for "receive" or "see". Moreso in the context of a tracking system. In my opinion anyway.
◧◩◪◨⬒
69. Bjartr+XY[view] [source] [discussion] 2018-09-19 22:10:06
>>Drdrdr+pG
Heck, "rounding" to desktop/mobile could tell you enough to know what to focus your dev time on.
◧◩◪◨⬒⬓⬔
70. dzader+TZ[view] [source] [discussion] 2018-09-19 22:18:57
>>wolco+0R
I know Google creates global profiles for tracking - and my question (which is the same as my original question) is why do you care? If that data is only used internally by google to serve you better ads why are you concerned with them having your data?
replies(2): >>Lyndsy+ja1 >>tagawa+Lg1
◧◩◪◨⬒⬓⬔⧯▣
71. samirm+Y01[view] [source] [discussion] 2018-09-19 22:32:48
>>chipot+lS
that doesn't matter now does it?
◧◩◪◨⬒⬓⬔
72. samirm+J11[view] [source] [discussion] 2018-09-19 22:42:35
>>kelnag+NQ
No, I'm not assuming that, because regardless of how the user browses your site, you're still going to prioritize the sizes important to you.

It wouldn't make sense to prioritize optimizing site design for the few people who are using a non-standard size.

http://gs.statcounter.com/screen-resolution-stats

73. kposeh+k41[view] [source] 2018-09-19 23:11:17
>>Adriaa+(OP)
Thanks for making something simple and elegant.

Main question: How are you handling Safari Intelligent Tracking Protection 2.0?

◧◩◪
74. ucario+T41[view] [source] [discussion] 2018-09-19 23:17:54
>>boerna+aJ
I'm not so sure. By putting this service's code on your website, you transmit personal data (IP addresses) to this third party. That appears to make the GDPR applicable here? Transmission is considered "data processing" under the GDPR.

Really, the central point that should be clear is that this is a question for lawyers. The GDPR is incredibly far-reaching.

replies(2): >>Aeolun+891 >>zaarn+NE1
◧◩◪◨
75. Aeolun+891[view] [source] [discussion] 2018-09-20 00:14:44
>>ucario+T41
I mean, sure GDPR applies, but few of its provisions apply to storing no PII at all.

If it means your website has to show a message ‘We transmit your info, but save nothing.’ It becomes a bit weird.

◧◩◪◨⬒⬓⬔⧯
76. Lyndsy+ja1[view] [source] [discussion] 2018-09-20 00:30:19
>>dzader+TZ
In short, because they’re doing so without my permission, and I resent that.
◧◩◪◨⬒⬓⬔⧯
77. tagawa+Lg1[view] [source] [discussion] 2018-09-20 01:52:48
>>dzader+TZ
Even if a user trusts Google, because the data is digital and therefore permanent, there's no guarantee it will remain internal forever, whether that's because of a hack, a rogue employee, police/government pressure, or a change of ownership.
replies(1): >>seando+9r1
◧◩◪◨⬒⬓⬔⧯▣
78. seando+9r1[view] [source] [discussion] 2018-09-20 04:34:49
>>tagawa+Lg1
It seems to me that, with the exception of a rogue employee, all of those examples are at a greater risk of occurring with a small, independent provider. Google almost certainly has more security resources, more legal resources and political clout, and isn’t likely to be acquired any time soon.

I can’t say I love having Google track me, but I don’t feel any better about someone else doing it either.

replies(1): >>marich+qF1
◧◩◪◨
79. rapnie+iz1[view] [source] [discussion] 2018-09-20 07:02:16
>>wongar+8u
I think self-hosting is possible with matomo.org
◧◩◪◨⬒
80. rapnie+Fz1[view] [source] [discussion] 2018-09-20 07:07:48
>>dvko+Hm
And also Matomo (https://github.com/matomo-org/matomo)
◧◩◪◨⬒⬓
81. tannha+aD1[view] [source] [discussion] 2018-09-20 08:05:59
>>dzader+4H
It's not just bad from a privacy PoV. By giving away signals to GA, you're actually underselling your users' data. Google can correlate your analytics with others' to place highly targeted ads for your visitors on other sites, stealing the attention your high-quality content generates, such that sites with big pockets for Google ad bidding and placement but otherwise only low-effort content (and Google itself, of course) make all the money.
◧◩
82. zaarn+GE1[view] [source] [discussion] 2018-09-20 08:28:10
>>ucario+9r
Hi,

I might be able to help because I wrote an analytics tool a while back that tracks these three properties and some other stuff

1. Knowing which URLs are being visited allows me to see if a particular campaign or blog site is popular

2. The referrer tells me where a user came from, this is helpful to know if I'm being linked to reddit and should allocate more CPU cores from my host to the VMs responsible for a particular service

3. The screen size allows me to know what aspect ratios and sizes I should optimize for. My general rule is that any screen shape that can fit a 640x480 VGA screen without clipping should allow my website to be fully readable and usable.

4. I also track a trimmed down user agent; "Firefox", "Chrome", "IE", "Edge", "Safari" and other. All will include "(recent)" or "(old)" to indicate version and other will include the full user agent. This allows me to track what browsers people use and if people use outdated browsers ("(old)" usually means 1 year out of date, I try to adjust it regularly to keep the interval shorter)

5. Page Load Speed and Connection. This is a number in 10ms steps and a string that's either "Mobile" or "Wired", which uses a quick and dirty heuristic to evaluate based on if a connection is determined to be throttled, slow and a few other factors. Mobile means people use my website with devices that can't or shouldn't be drawing much bandwidth, Wired means I could go nuts. This allows me to adjust the size of my webpage to fit my userbase.

6. GeoIP: This is either "NAm", "SAm", "Eur", "Asi", "Chin", "OcA", "NAf", "SAf", "Ant" or "Other". I don't need to know more than the continent my users live on, it's good enough data. I track Chinese visitors separately since it interests me.
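To give a rough idea of the coarsening in points 4 to 6, here is a minimal Python sketch. It is not the actual tool: the function names, the substring rules for browsers, and the continent-code table are assumptions made up for illustration. The "(recent)"/"(old)" suffix and the North/South Africa split are left out because the comment does not say how they are decided.

```python
# Hypothetical sketch of coarse, non-identifying analytics buckets.
# None of these names come from the tool described in the comment.

def bucket_load_time(ms: float) -> int:
    """Round a page load time down to a 10 ms step."""
    return int(ms // 10) * 10

# Order matters: Chrome and Edge user agents also contain "Safari/",
# and Edge user agents also contain "Chrome/".
_BROWSER_TOKENS = [
    ("Edge", ("Edg/", "Edge/")),
    ("Firefox", ("Firefox/",)),
    ("Chrome", ("Chrome/",)),
    ("IE", ("MSIE ", "Trident/")),
    ("Safari", ("Safari/",)),
]

def trim_user_agent(ua: str) -> str:
    """Reduce a user agent string to a browser family name."""
    for name, tokens in _BROWSER_TOKENS:
        if any(token in ua for token in tokens):
            return name
    return ua  # "other" keeps the full user agent, as described above

# Assumed input: ISO country code plus a two-letter continent code
# from whatever GeoIP lookup is in use. Africa falls through to
# "Other" here because the NAf/SAf split is not specified.
_CONTINENT_LABELS = {
    "NA": "NAm", "SA": "SAm", "EU": "Eur",
    "AS": "Asi", "OC": "OcA", "AN": "Ant",
}

def region_label(country_code: str, continent_code: str) -> str:
    """Map a lookup result to a continent-level label, China separate."""
    if country_code == "CN":
        return "Chin"
    return _CONTINENT_LABELS.get(continent_code, "Other")
```

Only the bucketed values would be stored; the raw load time, user agent and IP address are discarded after this step.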

Overall the tool is fairly accurate and high performance + low bandwidth (a full analytics run takes 4KB of bandwidth including the script and POST request to the server). It doesn't collect any personal data and doesn't allow accurate tracking of any individual.

If I want to track high interest users, I collate some attributes together (i.e., screen size, user agent, continent), which gets me a rough enough picture of the high interest stuff I care about. You don't need to track specific user sessions; that stuff is covered under the GDPR and not necessary.

Before anyone asks if they could have this tool; nope. It's proprietary and mine. The code I've written for it isn't hard, very minimal and fast. I wrote all this over a weekend and I use influx + grafana for the output. You can do that too.

Both mine and the product of the HN post are likely not in the scope of the GDPR since no data is collected that can specifically identify a user.

◧◩◪◨
83. zaarn+NE1[view] [source] [discussion] 2018-09-20 08:29:14
>>ucario+T41
The IP necessary for the connection itself is covered under necessary data, you can process it for the purpose of a request without needing consent at all. Same applies to shopping cart cookies or anything else that is essential to running a website and isn't being used for secondary purposes like data mining.
replies(1): >>mikekc+bG1
◧◩◪
84. enriqu+OE1[view] [source] [discussion] 2018-09-20 08:29:33
>>harian+X2
> No plans to go open source with the backend

You say that you do not store IP addresses, but why should anybody believe it?

Modern security is based on proof, not on trust.

replies(1): >>Adriaa+5U1
◧◩◪◨⬒⬓⬔⧯▣▦
85. marich+qF1[view] [source] [discussion] 2018-09-20 08:38:07
>>seando+9r1
If the marketplace was full of independent trackers (which I'm not suggesting is a good idea, because third party trackers are bad in the first place), then as they get compromised, only a small subset of data is lost... The chance of losing everything or enough data to pair to your real identity is a lot lower. It's like IDs in physical activity. If you visit your bank they track you by a different id to the library, your medical record, etc, each might be lost individually and be upsetting, but do they reveal data about all the others? No.

Why is Google's security better than anyone else's? Monopolies often have more resources, but lack motive, because they are a monopoly. Without transparency we have no idea how secure Google's systems are, but we do know Google has been hacked before.

replies(1): >>sharce+XY1
◧◩◪◨⬒
86. mikekc+bG1[view] [source] [discussion] 2018-09-20 08:49:26
>>zaarn+NE1
The key is to determine under which lawful basis you are processing that data. "Necessary data" is not a get out of jail free card. Because the analytics are not necessary to perform the contract (in any way that I can imagine), you can't claim contract lawful basis. Probably you are stuck with legitimate interest.

So I think you would have to notify the user that you are sending their IP address to the processor under legitimate interest and have a way for them to "object" to that use (i.e. turn off analytics). For legitimate interest, the objection can be after the fact, so having a configuration screen that stores a cookie that allows them to turn off analytics when they are on the site would probably do it.
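A minimal Python sketch of the cookie-based objection described above, using only the standard library. The cookie name `analytics_opt_out`, the value `"1"` and the one-year lifetime are hypothetical choices, and this only illustrates the mechanism, not what the GDPR actually requires.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name; any name the site controls would do.
OPT_OUT_COOKIE = "analytics_opt_out"

def has_opted_out(cookie_header: str) -> bool:
    """Return True if the request's Cookie header carries the opt-out flag."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(OPT_OUT_COOKIE)
    return morsel is not None and morsel.value == "1"

def opt_out_set_cookie_value() -> str:
    """Build the Set-Cookie value sent when the user objects on the settings page."""
    cookie = SimpleCookie()
    cookie[OPT_OUT_COOKIE] = "1"
    cookie[OPT_OUT_COOKIE]["path"] = "/"
    cookie[OPT_OUT_COOKIE]["max-age"] = 365 * 24 * 3600  # assumed: one year
    return cookie.output(header="").strip()
```

The server would call `has_opted_out` on each request and skip the analytics step entirely when it returns True.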

replies(1): >>zaarn+WG1
◧◩◪◨⬒⬓
87. zaarn+WG1[view] [source] [discussion] 2018-09-20 09:03:19
>>mikekc+bG1
Since in my case the processor is me, no data is being sent elsewhere. I don't show a notification, since all data collected is either heavily anonymized (on the client if possible, otherwise on the server side) or simply not identifying. Since the data I collect is used to optimize the website experience, I think it's a good enough legitimate interest with no privacy impact.
88. etatob+KI1[view] [source] 2018-09-20 09:29:17
>>Adriaa+(OP)
Good Sir, props to you for including a noscript/image tag in the default code. Google Analytics didn't do it for the longest time, and in fact may still not do it.

Whether on purpose or by accident (or simply by mental bias), they seriously misrepresent the number of people for whom JavaScript is blocked, not loading, disabled by default for unknown websites (me) or not available for any other reason.

Website owners and creators should at least have that information as a reliable metric to base their development choices on.

89. iDemon+ZK1[view] [source] 2018-09-20 10:04:38
>>Adriaa+(OP)
Nice product!

Could I ask what tech you're using for the graph data? I'm working on a similar SaaS (not analytics) which requires graphs. I'm a DevOps engineer for an ISP, and I do a lot of work with things like Graphite/Carbon, Prometheus and so on - but I can't seem to settle on what to use for personal projects. Do you use a TSDB at all? Or are you just storing it in SQL for now?

◧◩◪◨⬒
90. stordo+DS1[view] [source] [discussion] 2018-09-20 11:48:26
>>Drdrdr+pG
Could you round to buckets as well - take the 10 (say) most common screen sizes, and round users to the nearest? That way users with odd screen sizes aren't identified.
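A minimal Python sketch of that idea: snap each reported size to the nearest entry in a fixed list. The `COMMON_SIZES` values are illustrative placeholders, not measured data, and squared Euclidean distance is just one reasonable choice for "nearest".

```python
# Placeholder list; in practice this would be the ten most common
# sizes observed in the site's own aggregate data.
COMMON_SIZES = [
    (1920, 1080), (1366, 768), (1440, 900), (1536, 864), (1600, 900),
    (1280, 720), (1280, 800), (768, 1024), (375, 667), (360, 640),
]

def snap_screen_size(width, height, sizes=COMMON_SIZES):
    """Return the common size closest to (width, height)."""
    return min(
        sizes,
        key=lambda s: (s[0] - width) ** 2 + (s[1] - height) ** 2,
    )
```

With this, an unusual size such as 1283x799 would be stored as (1280, 800), so it no longer stands out in the data.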
◧◩◪◨
91. Adriaa+5U1[view] [source] [discussion] 2018-09-20 12:04:08
>>enriqu+OE1
> You say that you do not store IP addresses, but why should anybody believe it?

I can show the code, and I will probably do this in my next blog post, but that does not guarantee anything.

> Modern security is based on proof, not on trust.

Is it? So if there is a hosted version of an open source tool, you are sure they use the same code on the hosted version as in the open source tool? It's still based on trust.

◧◩◪◨⬒
92. sharce+EY1[view] [source] [discussion] 2018-09-20 12:44:01
>>Lyndsy+iz
Probably. Wow, you used the word "probably". I guess you aren't aware of the many cases where, once a Chrome extension gets popular, its indie developer is contacted by some company, and many have sold their extension or let the company collect data. This data also gets sold to third parties; many such cases have occurred with small and medium websites. Remember Unroll.me.

Also, Google knows how to build profiles, and it knows the importance of that data and of keeping it safe. It is also somewhat answerable to consumer groups, users, shareholders, and regulatory bodies. An indie dev doesn't know how to build a good profile and is more likely to sell the data to make revenue. I'm not ridiculing indie devs, just your assumption that a solo dev is an angel.

https://www.labnol.org/internet/sold-chrome-extension/28377/

https://m.slashdot.org/story/328731

◧◩◪◨⬒⬓⬔⧯▣▦▧
93. sharce+XY1[view] [source] [discussion] 2018-09-20 12:47:10
>>marich+qF1
Humans make systems. Teams like Project Zero (of Google) have contributed a ton to security. They prioritize security a lot.
◧◩
94. tomask+bt2[view] [source] [discussion] 2018-09-20 16:43:54
>>ucario+9r
Here's a GDPR-compliant system that answers complex questions. Hint: if your content is worthy, a portion of readers will agree to reasonable analytics, and you can extrapolate from this.

https://www.baekdal.com/thoughts/inside-story-what-i-did-to-...

◧◩
95. shermo+N83[view] [source] [discussion] 2018-09-21 00:19:16
>>dylz+Gi
Try Snowplow if you want to run it yourself.