So I built Simple Analytics. To ensure that it's fast, secure, and stable, I built it entirely using languages that I'm very familiar with. The backend is plain Node.js without any framework, the database is PostgreSQL, and the frontend is written in plain JavaScript.
I learned a lot while coding it. For example, sending requests as JSON requires an extra (pre-flight) request, so my script uses the "text/plain" content type, which does not. The script is publicly available (https://github.com/simpleanalytics/cdn.simpleanalytics.io/bl...). It works out of the box with modern frontend frameworks by overwriting the "history.pushState" function.
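A minimal sketch of that trick (this is not the actual Simple Analytics script; the endpoint and field names are made up): posting the payload with a "text/plain" content type keeps the request "simple" under the CORS rules, so the browser skips the extra OPTIONS pre-flight it would issue for an "application/json" body.

```javascript
// Serialize the payload ourselves; the body is still JSON text, but the
// declared content type is text/plain, one of the three "simple" CORS
// content types (alongside application/x-www-form-urlencoded and
// multipart/form-data), so no pre-flight request is needed.
function buildPayload(path, referrer, width) {
  return JSON.stringify({ path: path, referrer: referrer, width: width });
}

function sendPageview(endpoint, payload) {
  // A POST with a "simple" content type goes out as a single request.
  return fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "text/plain" },
    body: payload
  });
}
```

On the page you would call something like `sendPageview('https://example.com/collect', buildPayload(location.pathname, document.referrer, window.innerWidth))`.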
I am transparent about what I collect (https://simpleanalytics.io/what-we-collect) so please let me know if you have any questions. My analytics tool is just the start for what I want to achieve in the non-tracking movement.
We can be more valuable without exploiting user data.
Anyway, cool project! I've always felt the same about using GA, given that I like to pretend I have some sort of privacy these days and always have an adblocker on, so I hated setting it up for people. I'll definitely be keeping an eye on this for the next time someone asks me to set up GA.
Have you considered a free tier for up to 1k page views a month for example?
How can this track conversions for A/B testing? This is one of the most common usages of analytics in my experience. Is there a way to have user based conversion tracking whilst still being GDPR compliant?
Anyway, I wish you the best of luck with your endeavor. By the way, you might want to fix the links above.
I would, however, be a little more skeptical of tools claiming to be privacy-first than I would be of GA (which I presume doesn't claim to be privacy-first). On that note, some quick questions:
- Any plans to open source? I've used Piwik/Matomo in the past, and while I'm not a massive fan of the code-quality of that project, it's at least auditable (and editable).
- You say you're transparent about what you collect—IPs aren't mentioned on that page[0]. Are IPs stored in full or how are they handled? I assume you log IPs?
- How do you discern unique page-views? You seem to be dogfooding and I see no cookies or localStorage keys set.
I just have a quick question. What subset of the JavaScript implementation does the tracking pixel provide? If all that's missing is screen size, I might just choose the pixel to avoid running third-party code. For performance, I combine, minify, and embed all scripts and styles into each page, which lets me achieve perfect scores in the Chrome Auditor.
How are you storing all the information that analytics users want to know, i.e. devices, languages, geolocations, queries, page navigations, clicks, etc.?
After reading what you collect, I'm assuming you do a lot of JS sniffing of browser properties to gather this information, along with IP address analysis. Is that correct? And what are your plans for these features if you don't have them now?
Overall though I'd say great design + sales pitch. I think if the product delivers on enough features you will have something here. Great job!
What if you had one-day retention of IP addresses for per-day unique views? Seems like too important of a metric to eliminate completely, and one-day retention seems like a decent trade-off at the expense of being able to do unique analysis over longer time periods.
One can use hashes with regularly changing salts, destroying each salt after a while so that older hashes become unusable, though that only works for some purposes.
But to my eyes, expiring salts isn't much different than deleting ip addresses after one day. Just more machinery. People have to trust that you're doing either, so why bother beyond being able to use the word "hashing" in marketing language?
It disappoints in every way; you can't even check yesterday's stats.
Apart from the unfortunate non-open-source answer, this sounds great!
I get others' concerns about wanting unique pageviews, but that metric is always a bit of a sketchy either-or for extremely privacy-conscious people. It's both an incredibly valuable metric, and also one that's difficult to square with complete privacy (basically it's always going to be pseudonymous at best).
What are the security implications of this?
Speaking of the video, it's ridiculously professionally done, by the way; excellent acting to begin with and perfect line delivery (confident, well-timed, no hesitancy/awkwardness) as far as I'm concerned.
-
Apart from this, my only other advice is - reject buy offers, reject partner offers, sleep on VC offers for as long as you can (if, ideally, you don't outright reject these as well), and take this as far as possible on your own. I say this considering two standpoints.
a) Considering the developer: this is incredibly well done and you clearly have the competency to drive this forward without assistance. The website and video presentations are both great; the product defaults easily tick "sane enough"; and the only thing stopping me throwing money at the screen is that I have no projects that need this right now - but others definitely will, and I look forward to seeing this go viral.
b) Considering the product: "oooo internet privacy" is a well-trodden path with a thousand and one different options which are all terrible in their own way. You have the opportunity to differentiate by offering something that gains a reputation for actually not compromising, even months and years down the track by working to eliminate some of the sociopolitical cascade that can contribute to dilution of quality. Customers have sadly had good reason to associate buyouts with rapid decline in quality, so that sort of thing just looks bad at face value too.
To clarify what I mean by taking this as far as you can on your own: it's obvious others have already provided assistance - filming and acting in the video, and for all I know beta testing and maybe other development support - and I'm not pointing at that and suggesting it will bite you. I mean that, if you ever bring help on, find a good lawyer who will ensure the project remains _yours_ and make sure there are no implicit "50/50" partnership agreements or the like.
I can't find the references right now but I've read of a couple of projects/products that have exploded sideways (very sadly) because of jealousies and impedance mismatches creating imbalances that provoke partners brought onto projects to assume control and pivot things out of a creator's control, without the creator having any legal recourse.
Nice. You might want to add an explicit copyright/license though. Make it less (or more) dangerous for other devs to read it...
I think it could actually be quite useful to "standardize" on a simple (open/libre) front end for analytics (with an implied back-end standard).
It's a business service, not a consumer service. Apples to oranges. People will pay for it.
An IP address is considered personal data. So when the browser visits a page with the JS, the user's IP address is transferred to your server. That means the website I am visiting is sharing my IP address with a third party (you).
It looks like anyone can see the stats for any domain using the service without any authentication. I added the tracking code to my domain and was able to hit https://simpleanalytics.io/[mydomain.co.uk] without signing up or logging in. I was also able to see the stats for your personal site.
Is that intentional? If it is, it seems like an odd choice for a privacy-first service. If not, it seems like quite a worrying oversight in a paid-for product.
I'm curious why you chose to host the data yourself instead of giving customers the data immediately at the point of collection. That's the path we chose for Blockmetry, since it's genuinely required for a non-tracking web analytics service and makes it impossible to profile users. Any service that hosts the data itself is still open to distrust on the "no tracking, no profiling" claim.
Thanks, Pierre
PS - YC Startup School founders: ping me via the forums and get an extended-period free trial.
Something that would interest me, is a little explanation of https://github.com/simpleanalytics/cdn.simpleanalytics.io/bl....
You already have very brief comments at strategic points. If you would explain these one by one, I would learn a lot about optimizing for number of requests, skipping stuff to load, etc. Maybe a technical blog post at a later time when the dust settles?
Just want to point out for all the front-end devs out there: 12% of traffic to this site atm is coming from screen-width < 375px.
Usually, early on you don't want many low-paying customers, which would mean too much support while you iterate quickly. A small number of high-paying customers is much better.
It's also easier to lower the price later than to raise it. It's hard early on to find the right price point.
If he doesn't sell as much as he wants, he will adapt.
Has somebody figured out how to resolve this situation with log files?
Besides, you don't want to be desperately trying to convert free users later, or worse, having to grandfather everyone from your initial launch in at free-forever. Gotta pay the bills somehow.
A Summary page should show traffic volume, who exactly is driving it and where it arrives. That's the bare minimum needed to make shown information actually _useful_ and _actionable_. Things like "Top Domain Referrers" and "Top Pages" are aggregate vanity metrics, their effective utility is zero. If you have a spike in traffic, you want to know the reason and with your current design you can't.
What's wrong with "50/50" partnership agreements, unless the keyword here is "implicit"?
Edit: It seems to have been filtered now, but people were using spoofed referer headers to leave offensive messages for HN users.
Best of luck with it!
Regardless of your intentions, you are collecting enough data to track users.
> I am transparent about what I collect ([URL])
That page doesn't mention that you are also collecting (and making no claim about storing) the globally-visible IP address (and any other data in the IP and TCP headers). This can be uniquely identifying; even when it isn't unique, you usually need only a few bits of additional entropy to reconstruct[1] a unique tracking ID.
In my case, you're collecting and storing more than enough additional entropy to make a decent fingerprint because [window.innerWidth, window.innerHeight] == [847, 836]. Even if I resized the window, you could follow those changes simply by watching analytics events from the same IP that are temporally nearby (you are collecting and storing timestamps).
[1] An older comment where I discussed how this could be done (and why GA's supposed "anonymization" feature (aip=1) is a blatant lie): https://news.ycombinator.com/item?id=17170468
You are part of The Problem.
This is a solo dev's venture with a relatively pure and straightforward goal. If you can't afford it, don't use it and pick one of the others.
Do NOT compare this with a B2C offering that has nothing to do with analytics.
Do NOT compare this with a B2B offering that's free and feeds your user's data into the parent corporation's advertising revenue stream.
Do NOT compare this with a B2B offering that is open-source, with a team of a dozen core contributors that has had a decade of development under its belt.
For this you can logrotate daily and run goaccess before rotation. I believe you can keep the server logs for a week for debugging while respecting GDPR.
For today's "realtime" data you can use goaccess on today's log on demand and use a cache.
You can write your custom stat viewers or use goaccess to view time range data from multiple json files.
That doesn't provide any practical amount of privacy. For a longer discussion of why this is at best a placebo, see: https://news.ycombinator.com/item?id=17170468
I still get objecting to Google products on principle, but their privacy policy for GA seems pretty reasonable to me: https://support.google.com/analytics/answer/6004245
Do you know what isn't creepy and privacy invading? Analysing the attributes of the visitors to FranksKebabShop.com, as part of the tooling that runs FranksKebabShop.com.
This could be analysing web server/cache logs. It could be a more active piece of software that operates via JS and reports back to a service running on the same domain.
I know, I know "everything is SaaS now, nobody installs software". Nobody can install it if you don't make it installable. Be part of the solution not part of the problem.
It absolutely isn't privacy-first if it requires running on someone else's machine and giving your users' data to them - another issue would be that while your server is in the EU, the hosting company is subject to US law, and all the stuff that comes with it (https://en.wikipedia.org/wiki/CLOUD_Act f.e.)
I personally wouldn't use one that isn't OSS, but plenty of people don't care about that, but do care about privacy, including the privacy of their site visitors.
Anyone remember what the price was? I want to say it was something like $60-$100, but my memory could be conflating it with something else.
Shared-source proprietary goes as far back as Burroughs B5000 mainframe whose customers got the source and could send in fixes/updates. Microsoft has a Shared Source program. Quite a few suppliers in embedded do it. There's also a company that sells UI software which gives the source to customers buying higher-priced version.
I will warn that people might still rip off and use your code. Given it's JavaScript, I think they can do that anyway with reverse engineering. It also sounds like they could build it themselves anyway. Like most software bootstrappers or startups, you're already in a race with other players that might copy you with clean slate implementations. So, I don't know if the risk is that big a deal or not. I figured I should mention it for fairness.
Given the choice between a lot of data about me given to a small provider and somewhat less data about me given to Google, I'd generally choose the former.
Simple Analytics is a real service. Blockmetry is a 90's-looking page with a "contact sales" button and not even a demo (nevermind a real product).
Am I missing something?
I do have some questions/comments and I apologize if they seem a bit rapid-fire.
* When I look at the "Top Pages", there are links. When I click the link, it brings me to that page on your site not a chart of hits for that page. Is that how it's meant to work?
* If I sign up for your service, do my stats become public? https://simpleanalytics.io/apple.com just says "This domain does not have any data yet" (presumably because Apple doesn't have your script installed). But that kinda indicates that any domain with your script installed would show up there. It might just be an error in the messaging, but probably something to fix.
* What's your backend like? I'm mostly curious because analytics at scale isn't an easy problem. Do you write to a log-structured system with high availability (like Kafka) and then process asynchronously? How do you handle making the chart of visitors? Do you roll up the stats periodically?
* Speaking of scale, if I started sending thousands or tens of thousands of requests per second at you, would that be bad? Is this more targeted at small sites?
* What do you do about bots? Bot traffic can be a large source of traffic that throws off numbers.
* How long before numbers are available? It's September 19th, but the last stats on the live demo are September 18th. Is it lagged by a day?
* Do you not want to track user-agents for privacy reasons as well? Seems like a UA doesn't really identify anyone, but it can be useful for determining if you want to support a browser.
* You're not counting anyone that has the "Do Not Track" header. To me, DNT is more about tracking than counting (which is different). Even if you counted my hit, it wouldn't be tracking me if you didn't record information like IP address and there were no cookies.
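That counting-vs-tracking distinction could even be expressed in the script itself. A hypothetical policy helper (the names and field split are made up, not the product's behavior): honor DNT for anything identifying, but still count the hit anonymously.

```javascript
// Decide what to do with a hit given the visitor's DNT setting.
// navigator.doNotTrack is "1" (or "yes" in some older browsers) when
// the user has enabled Do Not Track.
function hitPolicy(doNotTrack) {
  var dntOn = doNotTrack === '1' || doNotTrack === 'yes';
  return {
    count: true,                    // an anonymous tally is not tracking
    fingerprintableFields: !dntOn   // drop dimensions, UA, etc. under DNT
  };
}
```

On the page this would be called as `hitPolicy(navigator.doNotTrack)` before assembling the payload.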
Kudos for launching something. I think my biggest suggestions would be fixing the live-demo page so it doesn't look like it's leaking other site's data and providing some guidance about limits. It's easy to think that you don't want to put limits on people, but any architecture is made with a certain scale in mind. There's no shame in that. Sometimes what you want is a "let us know if you need more than X" message. At the very least, it lets you prepare. People sometimes use products in ways you wouldn't imagine and ways you didn't intend which the system doesn't handle gracefully.
Good luck with your product!
I’m not the OP, but where is there evidence that they’re storing the IP? Sure it’s in the headers that they process but that doesn’t mean they’re storing it.
I mean, can I just see stats of a site that uses the service?
e.g.
But, how useful are these stats going to be without being able to see user journeys through a path of pages / actions? Yes, it's good to know which pages are getting how many views. But, in order to improve the UX, we often need to know how many users are able to go from Page A to Page C and whether they went through Page B first. Or e.g. if 90% of sessions that start on Page A (so we know what their purpose was), end on Page B but the main (perhaps beneficial) action for the user was on Page C. You can't just look at the pageviews for each, because you don't know where the session started.
I fear that this would reduce people to "inferring" (guessing) too much about the data that they see, and making decisions they feel are backed with data when there's not enough data to conclude. Then again, I'm sure that happens when the data is there too :-)
Plus, I have zero confidence that someone using a naive postgres implementation can scale an analytics backend with customers paying only $12/mo unless all those customers get barely any traffic. Perhaps if he was using Timescale on top of postgres, but even then, $12/mo seems awfully low.
But as it is, the price point signals that he doesn't think it's a particularly valuable service.
However, I am a bit confused as to who would want this product. The sort of questions this product answers seem quite limited:
1. What URLs are getting lots of hits?
2. What referrers are generating lots of hits?
3. What screen sizes are those hits coming from?
What decisions can be drawn from those questions? This seems useful only to perhaps some blog, where they're wondering what sort of content is successful, where to advertise more, and whether to bother making a mobile website.
Without the ability to track user sessions -- even purely in localStorage -- you can't correlate pageview events. For instance, how would I answer a question like:
- How many high-interest users do I have? By "high interest", I mean someone who visited at least three pages on my website.
- Is a mobile website really worthwhile? How much of an effect does being on mobile have on whether someone will be "high-interest"?
I should think some anonymized user ID system -- even if it rotates anonymous IDs -- should be able to answer these questions without compromising privacy.
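One way such a scheme could stay entirely client-side (a hypothetical sketch, not anything the product does): keep a counter in localStorage and only ever report a coarse bucket, so no identifier leaves the browser at all.

```javascript
// Map a raw visit count to a coarse bucket; only the bucket is reported.
function bucketForCount(count) {
  if (count >= 4) return '4+';
  if (count >= 2) return '2-3';
  return '1';
}

// `storage` is any object with getItem/setItem (window.localStorage in
// the browser; injected here so the logic is testable).
function recordPageview(storage) {
  var count = parseInt(storage.getItem('pageviews') || '0', 10) + 1;
  storage.setItem('pageviews', String(count));
  return bucketForCount(count);
}
```

This answers "how many visitors saw at least three pages?" in aggregate while keeping every individual pseudonymous at worst and anonymous at best.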
Also, I'll leave it to others to point out it's unlikely this product is exempt from GDPR.
It’s intended as a mostly drop-in replacement for the GA analytics.js API and to be used as an AWS Lambda.
You can check it out here: https://github.com/NYPL/google-analytics-proxy
The idea of privacy is much easier to sell if the data never leaves your own server, instead of using some analytics provider that might be run by the CIA or the Russian mafia for all we can prove.
The example (https://trackingco.de/public/9ykvs7rk) does not work for me. Also, the first time I visited the site I saw Lightning Bitcoin and then left. You lost me as soon as I read that because I'm not interested in that. I was just trying to find a simple (but useful) analytics service that's easy to use.
source: https://github.com/allinurl/goaccess/blob/master/config/goac...
Security matters if your concern is the data leaking to a potential malicious actor. The concern that I'm speaking to is the intended use of the data. Google is definitely going to use it for ad targeting and building a "shadow profile", but a small developer probably won't. This one says they won't, but even if they do they're likely to be much less effective than Google would be.
I would remove this; any company would hesitate to buy a service from a single guy.
With that said, a minimalist approach to web analytics is attractive to me, especially if I can see its "boundaries", the set of reports, etc.
The argument on privacy (or lack of it) has no impact on my perception about this service's value proposition.
The example should work, however. Well, I guess your feedback was very useful. Thanks!
I'd imagine it's difficult to do in-depth analytics without tracking users...
Something like this: https://stackoverflow.com/questions/34031251/javascript-libr...
Another bit of feedback is to draft on as many related stories as you can here in HN (like you are doing now).
This is true. The legal department for the healthcare web sites I maintain doesn't let me store or track IP addresses, even for analytics.
I'm only allowed to tally most popular pages, display language chosen, and date/time. There might be one or two other things, but it's all super basic.
As an author of SPAs and PWAs, though, I'd really like the ability to push a page hit programmatically.
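For reference, the pushState-wrapping trick mentioned elsewhere in the thread is small enough to sketch, and a manual pageview API could hang off the same hook. The callback and wiring here are made up for illustration:

```javascript
// Wrap history.pushState so every client-side navigation in an SPA
// triggers a pageview callback, without the app changing its code.
function wrapPushState(history, onNavigate) {
  var original = history.pushState;
  history.pushState = function (state, title, url) {
    var result = original.apply(history, arguments);
    onNavigate(url); // fire a pageview for this client-side navigation
    return result;
  };
}
```

In the browser this would be `wrapPushState(window.history, sendPageview)`; exposing `onNavigate` directly would give SPA/PWA authors the programmatic page-hit call asked for above.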
The goal is great, the design is sleek, but at the current price point (which is already lowered to $9) and feature set, it's just not worth it to me. For that price the tool has to provide more actionable data or other value.
There's probably a market out there, but most of that market is probably not the type of person you will find on hacker news.
Having a random developer create a shadow profile isn't the same.
Google's scale is vastly different, and its profiles can be used to track you from site to site.
There is 'justice' in a blog creator using analytics data to improve the experience of blog visitors: a user's data will, theoretically and in aggregate, create a better experience for that user in the future. The class of 'users who browse this page' gets a benefit in exchange for the cost of providing data.
Selling browsing information to advertisers is sort of 'anti-justice': using blog visitor data to track those visitors elsewhere on the internet and manipulate them more effectively into paying people money. The blog visitor's external online experience is made worse by browsing that blog.
It's incredible how many developers undervalue their own time, effort, and liability. I believe you're mistaken if you think this'll only take two hours of your time. Even if that's true, I believe you're mistaken that your two hours of time is cheaper than $9/mo. I'm also certain that being responsible for analytics sets you up for liabilities and maintenance that distracts from your main value proposition.
There's definitely a market for this, and that market is absolutely here, but apparently a lot of developers don't know how to pick and choose their battles.
This is one of the top ways that credit card breaches are happening lately - e-commerce sites include tons of 3rd party tracking / analytics / remarketing / etc code on their checkout pages, one of them gets hacked and the modified JS posts the credit card form to some compromised server.
Sure, the blockmetry site has some issues. The menu is unusable on my mobile (android) and there are no screenshots or explanation of how it actually works (server/client side, self/cloud hosted?). There are some style choices that I don't agree with, like the binary background pattern.
But I like to assume good faith unless I have some solid evidence otherwise. Do you know for a fact that there is no product? If so, please share :)
It is, however, poor form to plug your competing product in a Show HN. It's a fine line between mentioning and plugging, but I think offering a discount falls on to the wrong side of the line.
All these things can be pointed out politely.
Then, clients that help keep the lights on start asking for this and that.
And suddenly you end up providing a service with user-level insights, cross-device tracking, and advanced behavioral segments powered by ML, because why not.
GA was simple, before.
If he had offered it for free initially, haters would say the author must think it is worthless.
This place is the last bastion of civility on the internet. Let's keep it constructive. Pretty please?
First, "IPs" might be confusing; "IP addresses" would be more accurate.
More importantly, you have to collect IP addresses (or any other values in the packet headers[1][2]), even if you don't store them, if you want to receive any packets from the rest of the internet. Storage of those values is a separate issue entirely, and it's good to hear that you intend NOT to store IP addresses (and are updating the documentation)!
Also, I strongly recommend following Drdrdrq's suggestion to lower the precision of the collected window dimensions, which should be done on the client, i.e. "Math.floor(window.innerWidth/50)*50". This kind of bit reduction makes fingerprinting a lot harder.
[1] https://en.wikipedia.org/wiki/IPv4#Header
[2] https://en.wikipedia.org/wiki/Transmission_Control_Protocol#...
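As a tiny helper, that bucketing might look like the following; rounding the 847x836 window mentioned above down to 800x800 merges it with every other window in the same 50px bucket, destroying most of the fingerprinting entropy while keeping the data useful for design decisions.

```javascript
// Round a dimension down to the nearest multiple of `step`.
// Done on the client, so the precise value never leaves the browser.
function bucket(value, step) {
  return Math.floor(value / step) * step;
}

// What would be sent instead of the raw dimensions:
//   bucket(window.innerWidth, 50), bucket(window.innerHeight, 50)
```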
No you didn't. You argued what it shouldn't be compared to, so I compared it to something else.
> I'm also certain that being responsible for analytics sets you up for liabilities
Outsourcing analytics opens me to the same or worse legal liabilities.
> Even if that's true, I believe you're mistaken that your two hours of time is cheaper than $9/mo.
I don't live in Silicon Valley, so $9/mo pays for two hours within a few months. Sure, there's hosting and an uncertain maintenance burden, but on the other hand buying a service has its own uncertain overheads.
What happens if the service is down, what if it fails to scale, what if it gets hacked? What if it just disappears because there weren't enough customers? Most of these are much easier to answer and take up less time with a self-built service.
We will consider it. Thanks for making this. Hopefully more companies will follow suit.
> When a customer of Analytics requests IP address anonymization, Analytics anonymizes the address as soon as technically feasible at the earliest possible stage of the collection network. The IP anonymization feature in Analytics sets the last octet of IPv4 user IP addresses and the last 80 bits of IPv6 addresses to zeros in memory shortly after being sent to the Analytics Collection Network. The full IP address is never written to disk in this case.
This cannot be stressed enough. At my day job I write reasonably secure software on a team for big clients, then at home I write reasonably secure software independently for small clients.
Come new security issue, the big clients at day job get first priority. Not because they are big and not because they are paying more, but rather because as a team we can reallocate resources and work on issues in parallel. At home, there is only one Dotan to work on each independent client in series.
This product (https://truestats.com) collects the I.P. address and user agent for the purpose of detecting fraud (not selling data or profiling users). It is used for frequency checking and other patterns that would indicate fraud. We are still going through the legal analysis of how to deal with this, even though we have no idea who the visitors are.
I think considering the I.P. address as PII is a little much if you are not using it in a way that would violate privacy or selling the data.
It's also Open Source so you can see for yourself what is going on, or even self-host.
It wouldn't make sense to prioritize optimizing site design for the few people who are using a non-standard size.
Code is not an asset, it's a liability. And I mean that from a pure responsibility standpoint, not just from a legal responsibility standpoint.
There are lots of config options. Here's what I like to use:
// Google Analytics Code.
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};
// https://developers.google.com/analytics/devguides/collection/analyticsjs/field-reference
ga('create', 'UA-XXX-XX', 'auto', {
// The default cookie expiration is 2 years. We don't want our cookies
// around that long. We only want just long enough to see analytics on
// repeat visits. Instead, limit to 31 days. Field is in seconds:
// 31 * 24 * 60 * 60 = 2678400
'cookieExpires': 2678400,
// We don't need a cookie to track campaign information, so remove that.
'storeGac': false,
// Anonymize the ip address of the user.
'anonymizeIp': true,
// Always send all data over SSL. Unnecessary, since the site only loads on
// SSL, but defense in depth.
'forceSSL': true});
// Now, record 1 pageview event.
ga('send', 'pageview');
Main question: How are you handling Safari Intelligent Tracking Protection 2.0?
Really, the central point that should be clear is that this is a question for lawyers. The GDPR is incredibly far-reaching.
Also, it’s easier to raise prices than to lower them. If you lower prices, you need to do it for your current users too or they’ll complain. If you raise prices, you can grandfather people in AND it can be used to incentivize people to buy before the price goes up.
I'll also add that a lot of the comments here are very unfair; I hope you take them with a grain of salt.
If it means your website has to show a message like "We transmit your info, but save nothing," it becomes a bit weird.
I can’t say I love having Google track me, but I don’t feel any better about someone else doing it either.
By 2014 when I left, we had a few petabytes of analytics data for a very small but high traffic set of customers. Could we query all of that at once within a reasonable online SLA? No. We partitioned and sharded the data easily and only queried the partitions we needed.
If I were to do this now and didn't need near real-time (and what is real-time, anyway?), I'd use SQLite. Otherwise I'd use trickle-n-flip on Postgres or MySQL. There are literally 10+ year-old books[1] on this with respect to RDBMSs.
And yes, even with 2000 clients reaching billions of requests per day, only the top few stressed the system. The rest is long tail.
1. https://www.amazon.com/Data-Warehousing-Handbook-Rob-Mattiso...
Users like me must be complete ghosts unless one looks in their real server logs!
Executing third party JS on your website is an access to the page content, so unless the customer never had any user data or sensitive data on the page, they'll have to categorise simpleanalytics as a data processor.
Referers are often on their own private data, for example https://www.linkedin.com/in/markalanrichards/edit identifies not just you looked at this user, but that you are this user as it is the profile editing page, unique to this account.
The difference between whether simpleanalytics get or store data might remove a GDPR issue for them, but it certainly is for customers. Having access to the IP addresses is sufficient for privacy to be invaded at any point or by accident (wrong logging parameter added by the next new dev), malice (how can we illegally use this and lie to customers) or compromise (hackers take control of the analytics system) and therefore puts users at risk of full tracking at any point. As mentioned earlier GDPR is also about access, it is definitely about storage but the part in between of being given data (not just access to take it and not putting it on disk) is definitely included too.
In summary, Simple Analytics needs to stop lying and redo their privacy impact assessments. Meanwhile, don't use third-party analytics (I have no idea how you maintain security control over third-party JS), and if you're silly enough to do so, then it definitely is a GDPR consideration that needs to be assessed, added to audits, added to privacy policies, etc.
I might be able to help, because I wrote an analytics tool a while back that tracks these three properties and some other stuff:
1. Knowing which URLs are being visited allows me to see if a particular campaign or blog site is popular
2. The referrer tells me where a user came from. This is helpful to know if I'm being linked from reddit and should allocate more CPU cores from my host to the VMs responsible for a particular service
3. The screen size allows me to know what aspect ratios and sizes I should optimize for. My general rule is that any screen shape that can fit a 640x480 VGA screen without clipping should allow my website to be fully readable and usable.
4. I also track a trimmed-down user agent: "Firefox", "Chrome", "IE", "Edge", "Safari", and "other". All include "(recent)" or "(old)" to indicate the version, and "other" includes the full user agent. This lets me see which browsers people use and whether they are outdated ("(old)" usually means about a year out of date; I try to adjust it regularly to keep the interval shorter)
5. Page load speed and connection. This is a number in 10ms steps plus a string that's either "Mobile" or "Wired", based on a quick-and-dirty heuristic that looks at whether the connection appears throttled or slow, among a few other factors. "Mobile" means people use my website on devices that can't or shouldn't be drawing much bandwidth; "Wired" means I can go nuts. This lets me adjust the size of my webpage to fit my user base.
6. GeoIP: This is either "NAm", "SAm", "Eur", "Asi", "Chin", "OcA", "NAf", "SAf", "Ant" or "Other". I don't need to know more than the continent my users live on, it's good enough data. I track Chinese visitors separately since it interests me.
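To make point 4 above concrete, here's a rough sketch of the trimmed user agent idea. The browser families come from the list above, but the regexes and "recent" version cutoffs are illustrative assumptions, not the exact values I use:

```javascript
// Trim a full user agent down to "Family (recent|old)".
// Edge must be checked before Chrome, since Edge UAs contain "Chrome/".
// The `recent` cutoffs are made-up thresholds for illustration.
const FAMILIES = [
  { name: "Edge", re: /Edg(?:e)?\/(\d+)/, recent: 100 },
  { name: "Chrome", re: /Chrome\/(\d+)/, recent: 100 },
  { name: "Firefox", re: /Firefox\/(\d+)/, recent: 100 },
  { name: "Safari", re: /Version\/(\d+).*Safari/, recent: 15 },
  { name: "IE", re: /MSIE (\d+)|Trident.*rv:(\d+)/, recent: 11 },
];

function trimUserAgent(ua) {
  for (const f of FAMILIES) {
    const m = ua.match(f.re);
    if (m) {
      const version = parseInt(m[1] || m[2], 10);
      return `${f.name} (${version >= f.recent ? "recent" : "old"})`;
    }
  }
  // Unknown browsers keep the full user agent, as described in point 4.
  return `other: ${ua}`;
}
```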
Overall the tool is fairly accurate and high performance + low bandwidth (a full analytics run takes 4KB of bandwidth including the script and POST request to the server). It doesn't collect any personal data and doesn't allow accurate tracking of any individual.
If I want to track high-interest users, I collate some attributes together (i.e. screen size, user agent, continent), which gives me a rough enough picture of the high-interest stuff I care about. You don't need to track specific user sessions; that stuff is covered under the GDPR and isn't necessary.
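The collating itself is nothing fancy; something like this, where the field names are illustrative rather than my actual schema:

```javascript
// Build a coarse cohort key from non-identifying attributes.
// Many visitors share a key, so it cannot single out an individual.
function cohortKey(visit) {
  return [visit.screen, visit.browser, visit.continent].join("|");
}

// Count visits per cohort rather than per user session.
function countCohorts(visits) {
  const counts = new Map();
  for (const v of visits) {
    const k = cohortKey(v);
    counts.set(k, (counts.get(k) || 0) + 1);
  }
  return counts;
}
```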
Before anyone asks if they can have this tool: nope. It's proprietary and mine. The code I've written for it isn't hard; it's very minimal and fast. I wrote all of this over a weekend, and I use Influx + Grafana for the output. You can do that too.
Both mine and the product in this HN post are likely not in scope of the GDPR, since no data is collected that can specifically identify a user.
You say that you do not store IP addresses, but why should anybody believe it?
Modern security is based on proof, not on trust.
Why is Google's security better than anyone else's? Monopolies often have more resources, but lack motive, because they are a monopoly. Without transparency we have no idea how secure Google's systems are, but we do know Google has been hacked before.
So I think you would have to notify the user that you are sending their IP address to the processor under legitimate interest and have a way for them to "object" to that use (i.e. turn off analytics). For legitimate interest, the objection can be after the fact, so having a configuration screen that stores a cookie that allows them to turn off analytics when they are on the site would probably do it.
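A minimal sketch of that objection mechanism, assuming a cookie named `analytics_optout` that the settings screen sets (the cookie name and shape are my invention):

```javascript
// Check the visitor's cookies before sending any analytics beacon.
// Takes the raw cookie string (e.g. document.cookie in a browser)
// so the logic stays testable outside a browser.
function analyticsAllowed(cookieHeader) {
  return !cookieHeader
    .split(";")
    .map((c) => c.trim())
    .includes("analytics_optout=1");
}

// In the browser, the beacon would then be guarded like:
//   if (analyticsAllowed(document.cookie)) sendPageview();
```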
But Slack basically is just IRC with pictures right? What extra value does slack give me other than a decent web client?
Whether on purpose, by accident, or simply through mental bias, they seriously misrepresent the number of people for whom JavaScript is blocked, not loading, disabled by default for unknown websites (me), or unavailable for any other reason.
Website owners and creators should at least have that information as a reliable metric to base their development choices on.
Could I ask what tech you're using for the graph data? I'm working on a similar SaaS (not analytics) which requires graphs. I'm a DevOps engineer for an ISP, and I do a lot of work with things like Graphite/Carbon, Prometheus and so on - but I can't seem to settle on what to use for personal projects. Do you use a TSDB at all? Or are you just storing it in SQL for now?
I've had blogs do quite well on occasion and when that happens, GA seems to see > 75% of what the server logs do. And that's with a tech audience.
I can show the code, and I will probably do this in my next blog post, but that does not guarantee anything.
> Modern security is based on proof, not on trust.
Is it? So if there is a hosted version of an open-source tool, are you sure they use the same code on the hosted version as in the open-source tool? It's still based on trust.
Also, Google knows how to build profiles, and it knows the importance of that data and of keeping it safe. It is also somewhat answerable to consumer groups, users, shareholders, and regulatory bodies. An indie dev doesn't know how to build good profiles and is more likely to sell the data to make revenue. I'm not ridiculing indie devs, just ridiculing your assumption that a solo dev is an angel.
https://www.labnol.org/internet/sold-chrome-extension/28377/
<img src="/images/1px.png?"+screen.width+"x"+screen.height+"x"+screen.colorDepth+" />
https://www.baekdal.com/thoughts/inside-story-what-i-did-to-...
I'm well aware that all we have is "certification" and "audit" programs to verify their claims. I am also aware that these are less than perfect, that they have been found to be misleading/"lying" before, and that they appeared to prefer large fines rather than fixing the issue. It is 100% likely that their public statements don't match reality perfectly.
I posted the quote because there seemed to be a lack of understanding that this feature even exists in GA. The author of the Show HN post didn't even have a statement on how IP address logging (and various other PI in the GDPR sense) was handled when it was originally posted.
BTW, I think it's great that someone is starting fresh with privacy in mind, but even with them we will still have no way of trusting what they do with the packets sent their way...
<script> document.write("<img src=\"/images/1px.png?" + screen.width + "x" + screen.height + "x" + screen.colorDepth + "\" />"); </script>
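If you want the URL-building part testable outside a browser, you can pull it into a plain function that produces the same string as the snippet above:

```javascript
// Build the tracking-pixel URL from screen properties.
// In a browser you'd call pixelUrl(screen.width, screen.height,
// screen.colorDepth) and assign the result to an Image's src.
function pixelUrl(width, height, colorDepth) {
  return "/images/1px.png?" + width + "x" + height + "x" + colorDepth;
}
```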
I've been planning to do a write-up for a while now; I should finally get it done (my blog link is in my profile)
I want to build a brand around trust. If people self-host and say they use my software, but are doing different things behind the scenes, it would hurt my brand.
Simple Analytics does host the data themselves, so people know the numbers are not tampered with and the data is handled the same for every customer. If people use our scripts directly, visitors of those websites can be sure that we respect their privacy according to our values.