Stripe.js collects this data only for fraud prevention -- it helps us detect bots who try to defraud businesses that use Stripe. (CAPTCHAs use similar techniques but result in more UI friction.) Stripe.js is part of the ML stack that helps us stop literally millions of fraudulent payments per day and techniques like this help us block fraud more effectively than almost anything else on the market. Businesses that use Stripe would lose a lot more money if it didn't exist. We see this directly: some businesses don't use Stripe.js and they are often suddenly and unpleasantly surprised when attacked by sophisticated fraud rings.
If you don't want to use Stripe.js, you definitely don't have to (or you can include it only on a minimal checkout page) -- it just depends how much PCI burden and fraud risk you'd like to take on.
We will immediately clarify the ToS language that makes this ambiguous. We'll also put up a clearer page about Stripe.js's fraud prevention.
(Updated to add: further down in this thread, fillskills writes[1]: "As someone who saw this first hand, Stripe’s fraud detection really works. Fraudulent transactions went down from ~2% to under 0.5% on hundreds of thousands of transactions per month. And it very likely saved our business at a very critical phase." This is what we're aiming for (and up against) with Stripe Radar and Stripe.js, and why we work on these technologies.)
I also took out "your". That's a standard moderation trick since second-person pronouns in titles tend also to be clickbait: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
It lets you use stripe.js (thus getting the PCI compliance benefits) without Stripe being able to spy on your visitors.
We more or less do this today, but if you need to set up a new workflow to take payments (one-time or recurring), there's a lot of work already done for you in the Stripe.js ecosystem.
So in our case, to take one-time payments it would've been more work to stand up the checkout page itself and everything behind the scenes. It is much easier to just create a checkout session (basically just hitting the DB to pull the outstanding payment record and creating a Stripe customer if one doesn't already exist) and redirect to Stripe's checkout.
The PCI part isn't overstated either: that checkout session lives on Stripe's domain, not ours, and that's where the payment method is collected and stored within Stripe, so you don't have to worry about it.
It's pretty nifty, give it a look - https://stripe.com/docs/payments/checkout/one-time
But, yes, part of the intent here is to enable us to achieve better ROC[1] in our models and to block more fraud while also incurring fewer false positives. From our testing, it's very clear that these bot-detection techniques do substantially improve accuracy compared with other, coarser heuristics.
[1] https://en.wikipedia.org/wiki/Receiver_operating_characteris...
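To make the ROC point concrete, here is a minimal sketch of what "better ROC" means: at the same threshold, a better model catches more fraud (higher true-positive rate) while flagging fewer legitimate payments (lower false-positive rate). The scores and labels below are made up for illustration and have nothing to do with Stripe's actual models or data.

```python
from typing import List, Tuple


def roc_point(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Return (true_positive_rate, false_positive_rate) when flagging scores >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


def auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    fraudulent payment scores higher than a random legitimate one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = fraudulent payment, 0 = legitimate payment.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical scores from a coarse heuristic vs. a model with extra signals.
coarse_scores = [0.9, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.1]
richer_scores = [0.9, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

for name, scores in (("coarse", coarse_scores), ("richer", richer_scores)):
    tpr, fpr = roc_point(scores, labels, threshold=0.6)
    print(name, "TPR:", tpr, "FPR:", fpr, "AUC:", auc(scores, labels))
```

In this toy data the richer scores separate the two classes completely, so both the true-positive rate and the false-positive rate improve at the same threshold. Real models only shift the curve, they rarely separate the classes this cleanly.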
We particularly edit titles that users have started complaining about: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que.... Experience has shown that to be the way to minimize off-topic title complaints (https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...).
The meaning of the title in this case hasn't changed. Websites don't make noises when they record things.
Edit: out of curiosity, I looked for some other cases where we took out the word 'silently'. Here are some:
https://news.ycombinator.com/item?id=22678471 (changed from "~30% of Android apps silently inspect other apps installed on your smartphone")
https://news.ycombinator.com/item?id=20453115 (changed from "Apple is silently updating Macs *again* to remove Zoom's insecure software")
https://news.ycombinator.com/item?id=16715835 (changed from "Giraffes Silently Slip onto the Endangered Species List")
People have made HN title trackers over the years. My favorite is https://hackernewstitles.netlify.app/ (via https://news.ycombinator.com/item?id=21617016). It's not perfect because it can't distinguish what submitters did from what moderators did, doesn't know what the software changed, etc. But it gives the basic picture.
> What makes you believe that exactly?
I've read all the StackOverflow and GitHub issue posts I can find related to this issue.[0,1,2,3,4] The overall sentiment from developers is that they're surprised and don't want Stripe to send this information. That said, there's obviously a selection bias because the ones who consider it expected behavior don't post.
> If you include stripe.js on your About page, all bets are off for that page. You can believe all you want here, but you have explicitly included some 3rd js code, so feigning surprise that it gets executed is shallow.
Sure, I'm ultimately responsible for what runs on my site. I believe Stripe is also responsible for clearly disclosing the behavior of their library, and I feel like open critique is an appropriate way to encourage that.
[0] https://github.com/stripe/react-stripe-elements/issues/257
[1] https://github.com/stripe/react-stripe-elements/issues/99
[2] https://stackoverflow.com/questions/45718026/stripe-js-makin...
[3] https://stackoverflow.com/questions/56481458/why-does-stripe...
[4] https://stackoverflow.com/questions/55904278/reduce-network-...
We're not transparent about enterprise pricing since our costs on any given user are so country/business model/implementation-dependent. It's less that our sales team isn't willing to share the details and more that the models themselves are very complicated and change frequently. (Visa and Mastercard are both making big changes to their pricing this year, for example, and that will change almost all of them.)
Instead of loading it on startup, I always load the library as the last step before the checkout flow is initiated. Here is a working example of how to do this for anyone who's curious.
Perhaps we all have a natural unconscious bias against being "edited" ("you're not in control of me [or the OP]!!"). But seeing the edits over time in the open really makes one appreciate the moderation work. Maybe it's worth making this more official somehow (e.g., adding a footnote in the submission page or to the FAQ) - because like you say, it must surely minimize off-topic discussions as well.
Anyway, thanks for your work!
It doesn't matter whether "Stripe.js collects this data only for fraud prevention" or if it works in practice. Under CalOPPA [1], Stripe still has to disclose the collection of the data, and (among other things) allow customers to opt out of collection of this data, and allow customers to inspect the data collected. Stripe's privacy policy refers to opt-out and inspection rights about certain data, but AFAICT not this.
[This is not legal advice]
[1] http://leginfo.legislature.ca.gov/faces/codes_displayText.xh...
More broadly, I assure you that Stripe.js and our fraud prevention technologies are very carefully designed with full compliance with the relevant California (and other) statutes in mind. I’d be happy to connect you with our legal team if you’d like to discuss this in more detail. (I'm patrick@stripe.com.)
https://news.ycombinator.com/item?id=20429573
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...
What about a face? Fingerprints? Voice? Aren't those identifiable information even though it didn't make your (common sensical) short list? Mouse movements are on the same order of specificity.
Edit: Also not giving legal advice.
Edit2: Please see https://news.ycombinator.com/item?id=22939145
https://www.researchgate.net/publication/221325920_User_re-a...
https://medium.com/stanford-magazine/your-computer-may-know-...
It's not unexpected when they tell you to include it on every page:
As was in their docs ages ago and still now: https://stripe.com/docs/js
>Including Stripe.js >Include the Stripe.js script on each page of your site—it should always be loaded directly from https://js.stripe.com, rather than included in a bundle or hosted yourself.
>To best leverage Stripe’s advanced fraud functionality, include this script on every page, not just the checkout page. This allows Stripe to detect anomalous behavior that may be indicative of fraud as customers browse your website.
... they are asking you to enable them to track your user's interaction with your entire website.
It's a library everyone can technically analyze, yes, but by 1) using ever-changing obfuscation that requires a lot of work to RE, and 2) constantly changing the client-side logic itself, it makes the work of the adversaries a lot harder and more tedious, and means either fewer of them will consistently succeed, or more of them will be forced to become more centralized around solutions/services that've successfully solved it, which means Stripe can focus-fire their efforts a bit more.
Of course there's also a lot going on on the backend that'll never be seen, but the adversary is trying to mimic a legitimate user as much as they can, so if the JavaScript is totally unobfuscated and stays the same for a while, it's a lot easier for them to consistently trace exactly what data is being sent and compare it against what their system or altered browser is sending.
It's cat-and-mouse across many dimensions. In such adversarial games, obscurity actually can and often does add some security. "Security by obscurity is no security at all" isn't exactly a fallacy, but it is a fallacy to apply it universally and with a very liberal definition of "security". It's generally meant for things that are more formal or provable, like an encryption or hashing algorithm or other cryptography. It's still totally reasonable to use obscurity as a minor practical measure. I'd agree with this part of https://en.wikipedia.org/wiki/Security_through_obscurity: "Knowledge of how the system is built differs from concealment and camouflage. The efficacy of obscurity in operations security depends on whether the obscurity lives on top of other good security practices, or if it is being used alone. When used as an independent layer, obscurity is considered a valid security tool."
For example, configuring your web server to not display its version on headers or pages is "security by obscurity", and certainly will not save you if you're running a vulnerable version, but may buy you some time if a 0-day comes out for your version and people search Shodan for the vulnerable version numbers - your site won't appear in the list. These kinds of obscurity measures of course never guarantee security and should be the very last line of defense in front of true security measures, but they can still potentially help you a little.
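As a rough illustration of the version-hiding point, here is a small sketch that checks whether a `Server` header value exposes a version number. The helper name and the regex are my own assumptions, not any standard tool; on the server side the corresponding settings are things like nginx's `server_tokens off;` or Apache's `ServerTokens Prod`.

```python
import re

# Assumption for this sketch: a "version" is any dotted run of digits, e.g. 1.18.0 or 2.4.
_VERSION_RE = re.compile(r"\d+(?:\.\d+)+")


def leaks_version(server_header: str) -> bool:
    """Return True if the Server header value appears to contain a version number."""
    return _VERSION_RE.search(server_header) is not None


# Example header values of the kind a scanner might index by version.
for header in ("nginx/1.18.0", "Apache/2.4.41 (Ubuntu)", "nginx", "Apache"):
    print(header, "->", "leaks version" if leaks_version(header) else "version hidden")
```

Hiding the version does nothing for a vulnerable server that is probed directly. It only keeps the site out of lists built by searching for specific version strings.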
In the "malware vs. anti-virus" and "game cheat vs. game cheat detection software" fights that play out every day, both sides of each heavily obfuscate their code and the actions they perform. No, this never ensures it won't be fully reverse engineered. And the developers all know that. Given enough time and dedication, it'll eventually happen. But it requires more time and effort, and each time it's altered, it requires a re-investment of that time and effort.
Obfuscation and obscurity are arguably the defining feature and "value proposition" of each of those four types of software. A lot of that remains totally hidden on the backend (e.g. a botnet C2 web server only responding with malware binaries if they analyze the connection and believe it really is a regular infected computer and not a security researcher or sandbox), but a lot is also present in the client.
Most of your examples are quite low-level, but it's much harder to keep things hidden within the constraints of the browser sandbox when you have to interface with standard APIs which can be easily instrumented.
You misunderstand what personally identifiable information is. Each individual letter of my name is also not identifiable, the letters of the alphabet are not PII, but when stored in the same database row, the separate letters do form PII no matter that you stored them separately or even hashed or encrypted them. My phone number is also not something that just anyone could trace to my name, but since my carrier stores my personal data together with the number (not to mention the CIOT database where law enforcement can look it up at will), there exists a way to link the number to my person, making it PII. Everything about me is PII, unless you make it no longer about me.
Mouse movements may not be PII if you don't link them to a session ID, but then they would be useless in fraud detection because you don't know whose transaction you should be blocking or allowing since it's no longer traceable to a person.
Another example[1] mentioned on a website that the Dutch DPA links to (from [2]) is location data. Coordinates that point to somewhere in a forest aren't personal data, but if you store them with a user ID...
[1] (English) https://www.privacy-regulation.eu/en/4.htm
[2] (Dutch) https://autoriteitpersoonsgegevens.nl/nl/over-privacy/persoo...
Relevant text for those who want to know what GDPR says about this: "The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her." https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CEL... (So one has to prove that it 'significantly' affects you, but I guess e-commerce is commonplace enough that being banned from a common platform could be argued to significantly impact you. But IANAJudge so this is up for interpretation by a real judge.)
[1] https://en.wikipedia.org/wiki/History_of_advertising#16th%E2...
"The simplest way for you to be PCI compliant is to never see (or have access to) card data at all. Stripe makes this easy for you as we can do the heavy lifting to protect your customers’ card information." [1]
Interesting question whether Stripe incurs statutory privacy duties to the web vendor and the buyer separately. I would imagine so, because given the "triangular" nature of this kind of Stripe transaction, Stripe ends up collecting data from two parties.
[This is not legal advice]
Good chance I'm missing something, or there's some kind of protections in place around this.
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/if...
https://developers.google.com/authorized-buyers/rtb/download...
1. reCAPTCHA doesn't send information until you explicitly call their library. Stripe's library immediately begins reporting data as soon as the script is loaded.
2. reCAPTCHA is explicit in its documentation that it's collecting behavior about your users. Its sole purpose is to track user behavior, so implementers understand that it does this. Stripe's main purpose is to accept payment information, and it is currently not transparent about how it collects user behavior to achieve that. I don't believe that most implementers understand the nature of Stripe's data collection.
We have two possible options here:
1. Client sends mouse-data + card info to a server, server checks the mouse data, turns it into a fraudPercent, and only stores that percent. That seems to be what they're doing now.
2. Client sends mouse data, gets back a unique nonce, and then sends that nonce to the server with card info. The server could have either stored or discarded the mouse info. It's perfectly possible the nonce was stored with the mouse info.
Those two things seem totally identical. The nonce by necessity must be unique (or else one person could wiggle their mouse, and then use that one nonce to try 1000 cards at once), and you can't know that they don't store the full mouse movement info with the nonce.
You gain nothing by adding that extra step other than some illusion of security.
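The nonce argument can be sketched in a few lines. This is a toy model of option 2 only, under my own assumptions (the class name, the placeholder scoring rule, and the `keep_traces` flag are all hypothetical); it is not how Stripe or anyone else actually implements this. The point is that a server that discards the mouse data and one that keeps it behave identically from the client's side.

```python
import secrets
from typing import Dict, List, Tuple

MouseTrace = List[Tuple[int, int]]


class NonceServer:
    """Toy server for option 2: mouse data in, single-use nonce out."""

    def __init__(self, keep_traces: bool) -> None:
        self.keep_traces = keep_traces  # invisible to the client
        self._risk_by_nonce: Dict[str, float] = {}
        self._trace_by_nonce: Dict[str, MouseTrace] = {}

    def submit_mouse(self, trace: MouseTrace) -> str:
        nonce = secrets.token_hex(16)  # must be unique, or it could be replayed
        self._risk_by_nonce[nonce] = self._score(trace)
        if self.keep_traces:
            # Nothing stops the server from storing the raw trace with the nonce.
            self._trace_by_nonce[nonce] = trace
        return nonce

    def submit_payment(self, nonce: str, card: str) -> bool:
        # pop() makes the nonce single-use: one wiggle cannot cover 1000 cards.
        risk = self._risk_by_nonce.pop(nonce, None)
        if risk is None:
            return False
        return risk < 0.5

    @staticmethod
    def _score(trace: MouseTrace) -> float:
        # Placeholder heuristic: a pointer that never moves looks bot-like.
        return 0.9 if len(set(trace)) <= 1 else 0.1


trace = [(0, 0), (3, 4), (10, 2)]
for keep in (False, True):
    server = NonceServer(keep_traces=keep)
    nonce = server.submit_mouse(trace)
    first = server.submit_payment(nonce, "card-A")
    replay = server.submit_payment(nonce, "card-B")
    print("keep_traces:", keep, "first:", first, "replay:", replay)
```

Both servers accept the first payment and reject the replay, so the client has no way to tell whether the trace was discarded or stored next to the nonce and, through it, the card.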
Note, Cloudflare + Tor has a similar problem that they tried to solve with blind signatures (see https://blog.cloudflare.com/the-trouble-with-tor/), but that hasn't gone anywhere and requires a browser plugin anyway. It's not a viable solution yet.
I haven't analyzed it and can't say this with any certainty, but my guess is that you're probably right: they're focusing primarily on backend analysis and ML comparing activity across a massive array of customers. This is different from smaller security firms who have a lot less data due to fewer customers, and a kind of sampling bias of customers who are particularly worried about or inundated by fraud.
They may be less interested in suspicious activity or fingerprinting at the device level and more interested in it at the payment and personal information level (which is suggested by articles like https://stripe.com/radar/guide).
Pure, uninformed speculation, but it's possible that if they get deeper into anti-fraud in the future (perhaps if fraudsters get smarter about this higher layer of evasion), they might supplement the data science / finance / payment oriented stuff with more lower-level device and browser analysis, in which case I wouldn't be surprised if they eventually separate out some of the anti-fraud/security parts into an obfuscated portion. (Or, more likely, have Stripe.js load that portion dynamically. Maybe they're already doing this, even? Dunno.)
> To best leverage Stripe’s advanced fraud functionality, include this script on every page, not just the checkout page. This allows Stripe to detect anomalous behavior that may be indicative of fraud as customers browse your website.
There are also indications on the product page for Stripe Radar and other places where it is obvious they are doing device fingerprinting.
I can accept Stripe's explanation given the nature of their product and the effectiveness of Stripe Radar. That said, I think they need to make some changes. First of all, they should lay it out clearly that the tracking is high-resolution and includes mouse movement. Second, the tracking should be disabled by default and more closely tied to the usage of Radar. Most businesses don't need Radar until they reach a certain scale. Stripe could encourage the use of Radar when the account transaction volume reaches a certain size and use that opportunity to explain the benefits of enabling the tracking system. It should be optional, even then, though.
I put together a proof-of-concept using a 'same-domain frame', no secondary domains or apps. The idea is separation over security, so you can unload without any side hustle. Tho without a second domain you're relying on Stripe being as trustworthy as they are, and not looking to actively undermine your sandboxing attempts [which I think is ok - we embedded their library in the first place].
The idea of marking every single edit, or publishing a complete moderation log, feels like asking for trouble. I fear that it would lead to more objections of the litigious, bureaucratic, meta type. Even though it's a tiny minority of users who make such objections, they have a lot of energy for it and there are many more of them than us. That kind of thing could quickly burn us out, like an unintended DoS attack. On the other hand, maybe it would just work fine; it's hard to know.
Also, I'm skeptical that it would create more confidence in the site, because the users who want to feel that way basically already do, and the ones who don't probably wouldn't be persuaded by more data. There's always going to be something that's not included, or the suspicion that there is.
Because of this, the way we address concerns is to answer people's individual questions, here and by email. We're happy to do that, and there basically isn't anything we aren't willing to explain. That's by design. We try never to do anything that isn't defensible to the community. Even when there are genuine secrets that can't be spelled out, like how the anti-abuse software works, we can say what they are at a high level and why a secret is needed. Those cases are rare.