Stripe records user movements on its customers' websites

>>mtlync+(OP)
Stripe cofounder here. The question raised ("Is Stripe collecting this data for advertising?") can be readily answered in the negative. This data has never been, would never be, and will never be sold/rented/etc. to advertisers.

Stripe.js collects this data only for fraud prevention -- it helps us detect bots who try to defraud businesses that use Stripe. (CAPTCHAs use similar techniques but result in more UI friction.) Stripe.js is part of the ML stack that helps us stop literally millions of fraudulent payments per day and techniques like this help us block fraud more effectively than almost anything else on the market. Businesses that use Stripe would lose a lot more money if it didn't exist. We see this directly: some businesses don't use Stripe.js and they are often suddenly and unpleasantly surprised when attacked by sophisticated fraud rings.

If you don't want to use Stripe.js, you definitely don't have to (or you can include it only on a minimal checkout page) -- it just depends how much PCI burden and fraud risk you'd like to take on.

We will immediately clarify the ToS language that makes this ambiguous. We'll also put up a clearer page about Stripe.js's fraud prevention.

(Updated to add: further down in this thread, fillskills writes[1]: "As someone who saw this first hand, Stripe’s fraud detection really works. Fraudulent transactions went down from ~2% to under 0.5% on hundreds of thousands of transactions per month. And it very likely saved our business at a very critical phase." This is what we're aiming for (and up against) with Stripe Radar and Stripe.js, and why we work on these technologies.)

[1] https://news.ycombinator.com/item?id=22938141

>>voz_+c9
Ok, we've taken "silently" out of the title above. I think it's redundant anyhow.

I also took out "your". That's a standard moderation trick since second-person pronouns in titles tend also to be clickbait: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

>>mtlync+(OP)
This is a good reason to use a technique like cperciva's payment iframe: https://www.paymentiframe.com/

It lets you use stripe.js (thus getting the PCI compliance benefits) without Stripe being able to spy on your visitors.

>>wolco+id
Sure, you could.

We more or less do this today, but if you need to set up a new workflow to take payments (one-time or recurring), there's a lot of work already done for you in the Stripe.js ecosystem.

So in our case, to take one-time payments it would've been more work to stand up the checkout page itself and everything behind the scenes. It is much easier to just create a checkout session (basically just hitting the DB to pull the outstanding payment record and creating a Stripe customer if one doesn't already exist) and redirect to Stripe's checkout.

The PCI part isn't overstated either: that checkout session lives on Stripe's domain, not ours, and that's where the payment method is collected and stored within Stripe, so you don't have to worry about it.

It's pretty nifty, give it a look - https://stripe.com/docs/payments/checkout/one-time

>>swyx+Ea
We've changed the title - see https://news.ycombinator.com/item?id=22937739

>>asclep+si
I'm sorry to hear that! Feel free to email me (patrick@stripe.com) and I'll connect you with the team if you'd like us to do a deeper dive.

But, yes, part of the intent here is to enable us to achieve better ROC[1] in our models and to block more fraud while also incurring fewer false positives. From our testing, it's very clear that these bot-detection techniques do substantially improve accuracy when compared to other, coarser heuristics.

[1] https://en.wikipedia.org/wiki/Receiver_operating_characteris...
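For anyone unfamiliar with ROC: each point on the curve is a (false positive rate, true positive rate) pair at one blocking threshold, and a better model blocks more fraud at the same false positive rate. Below is a minimal sketch of computing one such point. The `score` and `isFraud` fields and the `rocPoint` name are made up for illustration and say nothing about how Stripe's models actually work.

```javascript
// Compute one ROC point for a list of scored payments at a given threshold.
// Each item is { score: number in [0, 1], isFraud: boolean } (hypothetical shape).
// Assumes the list contains at least one fraudulent and one legitimate payment,
// otherwise one of the divisions below is 0/0.
function rocPoint(scored, threshold) {
  let tp = 0; // fraud that was blocked
  let fp = 0; // legitimate payments that were blocked (false positives)
  let fn = 0; // fraud that got through
  let tn = 0; // legitimate payments that got through
  for (const { score, isFraud } of scored) {
    const blocked = score >= threshold;
    if (blocked && isFraud) tp++;
    else if (blocked && !isFraud) fp++;
    else if (!blocked && isFraud) fn++;
    else tn++;
  }
  return {
    tpr: tp / (tp + fn), // share of fraud blocked
    fpr: fp / (fp + tn), // share of good customers blocked
  };
}
```

Sweeping `threshold` from 1 down to 0 and plotting the resulting points traces out the curve. An extra signal is useful if it pushes the curve up and to the left, meaning more fraud blocked for the same share of good customers turned away.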

>>George+7k
We edit titles all the time, including sensational titles about Google or anything else. This is routine. You probably wouldn't remember such edits because you probably wouldn't notice them in the first place.

We particularly edit titles that users have started complaining about: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que.... Experience has shown that to be the way to minimize off-topic title complaints (https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...).

The meaning of the title in this case hasn't changed. Websites don't make noises when they record things.

Edit: out of curiosity, I looked for some other cases where we took out the word 'silently'. Here are some:

https://news.ycombinator.com/item?id=22678471 (changed from "~30% of Android apps silently inspect other apps installed on your smartphone")

https://news.ycombinator.com/item?id=20453115 (changed from "Apple is silently updating Macs * again* to remove Zoom's insecure software")

https://news.ycombinator.com/item?id=16715835 (changed from "Giraffes Silently Slip onto the Endangered Species List")

People have made HN title trackers over the years. My favorite is https://hackernewstitles.netlify.app/ (via https://news.ycombinator.com/item?id=21617016). It's not perfect because it can't distinguish what submitters did from what moderators did, doesn't know what the software changed, etc. But it gives the basic picture.

>>huhten+Hk
>> I believe many developers integrate with Stripe expecting that their JS library executes and shares data only on the pages where Stripe UI elements appear on the page.

> What makes you believe that exactly?

I've read all the StackOverflow and Github issue posts I can find related to this issue.[0,1,2,3,4] The overall sentiment from developers is that they're surprised and don't want Stripe to send this information. That said, there's obviously a selection bias because the ones who consider it expected behavior don't post.

> If you include stripe.js on your About page, all bets are off for that page. You can believe all you want here, but you have explicitly included some 3rd js code, so feigning surprise that it gets executed is shallow.

Sure, I'm ultimately responsible for what runs on my site. I believe Stripe is also responsible for clearly disclosing the behavior of their library, and I feel like open critique is an appropriate way to encourage that.

[0] https://github.com/stripe/react-stripe-elements/issues/257

[1] https://github.com/stripe/react-stripe-elements/issues/99

[2] https://stackoverflow.com/questions/45718026/stripe-js-makin...

[3] https://stackoverflow.com/questions/56481458/why-does-stripe...

[4] https://stackoverflow.com/questions/55904278/reduce-network-...

>>michae+Em
I apologize that anything about the pricing change felt sneaky. (We tried to do the opposite: we emailed every single impacted customer!) I posted a few thoughts about the refund change here: https://news.ycombinator.com/item?id=22893388.

We're not transparent about enterprise pricing since our costs on any given user are so country/business model/implementation-dependent. It's less that our sales team isn't willing to share the details and more that the models themselves are very complicated and change frequently. (Visa and Mastercard are both making big changes to their pricing this year, for example, and that will change almost all of them.)

>>seanwi+fp
Sure -- one very common form of attack is "card testing". Here's a quick summary: https://www.forbes.com/sites/tomgroenfeldt/2017/05/02/card-t....

>>Wesoly+7p
Almost like a...services agreement (https://stripe.com/legal#section_d)?

>>mtlync+(OP)
I use stripe.js in a number of my projects but I've never trusted it. Looks like my fears were justified.

Instead of loading it on startup, I always load the library as the last step before the checkout flow is initiated. Here is a working example of how to do this for anyone who's curious.

https://jsfiddle.net/167ajcbw/
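For readers who can't open the fiddle, here is a rough sketch of the same idea: inject the script tag only when checkout starts. This is not the code from the fiddle. The `loadStripeJs` name and the `doc` parameter are assumptions made for illustration (`doc` would normally be `document`).

```javascript
// Sketch: load Stripe.js lazily, right before the checkout flow begins.
// `doc` is passed in (normally `document`) so the function does not depend
// on a global and can be exercised with a stub.
const STRIPE_JS_URL = "https://js.stripe.com/v3/";

let stripeJsPromise = null; // cached so the script is injected at most once

function loadStripeJs(doc) {
  if (stripeJsPromise) return stripeJsPromise;
  stripeJsPromise = new Promise((resolve, reject) => {
    const script = doc.createElement("script");
    script.src = STRIPE_JS_URL;
    script.async = true;
    script.onload = () => resolve();
    script.onerror = () => {
      stripeJsPromise = null; // clear the cache so a later call can retry
      reject(new Error("Failed to load Stripe.js"));
    };
    doc.head.appendChild(script);
  });
  return stripeJsPromise;
}

// Usage in a page (the button id is hypothetical):
//   document.getElementById("checkout").addEventListener("click", () => {
//     loadStripeJs(document).then(() => { /* start the checkout flow */ });
//   });
```

The trade-off is the one discussed elsewhere in the thread: Stripe sees less browsing behavior, so its fraud signals for your site are presumably weaker.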

>>dang+uk
I find https://hackernewstitles.netlify.app/ fascinating.

Perhaps we all have a natural unconscious bias against being "edited" ("you're not in control of me [or the OP]!!"). But seeing the edits over time in the open really makes one appreciate the moderation work. Maybe it's worth making this more official somehow (e.g., adding a footnote in the submission page or to the FAQ) - because like you say, it must surely minimize off-topic discussions as well.

Anyway, thanks for your work!

>>pc+P7
Stripe customer here. The question raised is, more broadly, "Is Stripe collecting this data in a legal and ethical way?" This too can be readily answered in the negative.

It doesn't matter whether "Stripe.js collects this data only for fraud prevention" or if it works in practice. Under CalOPPA [1], Stripe still has to disclose the collection of the data, and (among other things) allow customers to opt out of collection of this data, and allow customers to inspect the data collected. Stripe's privacy policy [2] refers to opt-out and inspection rights about certain data, but AFAICT not this.

[This is not legal advice]

[1] http://leginfo.legislature.ca.gov/faces/codes_displayText.xh...

[2] https://stripe.com/privacy#your-rights-and-choices

>>threep+ut
https://stripe.com/privacy describes what we do in some detail (including disclosing that we use this kind of browsing data).

More broadly, I assure you that Stripe.js and our fraud prevention technologies are very carefully designed with full compliance with the relevant California (and other) statutes in mind. I’d be happy to connect you with our legal team if you’d like to discuss this in more detail. (I'm patrick@stripe.com.)

>>tmsh+Ws
Maybe we should publish a complete log after all. Especially with the title edits, we've been doing them for so long now that they really have become routine. It's pretty much a craft at this point—a very tiny and trivial craft, with many tiny rules and heuristics. I used to mildly resent having to do it, because titles feel so, again, trivial. But eventually it dawned on me why they are such an emotional thing. There's more about this here if anyone cares:

https://news.ycombinator.com/item?id=20429573

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...

>>mook+4v
This is a fair call-out. We have actually worked pretty hard to ensure that our Privacy[1] and Cookies[2] policies are clear and easy-to-read, rather than filled with endless boilerplate jargon. But we still did make a mistake by not having a uniquely clear document covering Stripe.js fraud prevention in particular.

[1] https://stripe.com/privacy

[2] https://stripe.com/cookies-policy/legal

>>Kalium+Qx
> first or last name, physical or email address, SSN, telephone number, or any contact method I am familiar with (maybe you know a way?)

What about a face? Fingerprints? Voice? Aren't those identifiable information even though they didn't make your (commonsensical) short list? Mouse movements are on the same order of specificity.

Edit: Also not giving legal advice.

Edit2: Please see https://news.ycombinator.com/item?id=22939145

>>swsieb+qA
Not a lawyer, but not that surprised that the laws you refer to are growing technical loopholes. Here are a couple things that mouse movements can identify in case no one knows what I'm talking about:

https://www.researchgate.net/publication/221325920_User_re-a...

https://medium.com/stanford-magazine/your-computer-may-know-...

>>mtlync+Wg
>I believe many developers integrate with Stripe expecting that their JS library executes and shares data only on the pages where Stripe UI elements appear on the page. The fact that JS library runs on every page and sends data back to Stripe, even before the app calls the API, is unexpected.

It's not unexpected when they tell you to include it on every page:

As was in their docs ages ago and still now: https://stripe.com/docs/js

>Including Stripe.js >Include the Stripe.js script on each page of your site—it should always be loaded directly from https://js.stripe.com, rather than included in a bundle or hosted yourself.

>To best leverage Stripe’s advanced fraud functionality, include this script on every page, not just the checkout page. This allows Stripe to detect anomalous behavior that may be indicative of fraud as customers browse your website.

... they are asking you to enable them to track your user's interaction with your entire website.

>>adamby+CF
With these kinds of adversarial things, I think it's a mix of frontend and backend.

It's a library everyone can technically analyze, yes, but by 1) using ever-changing obfuscation that requires a lot of work to RE, and 2) constantly changing the client-side logic itself, it makes the work of the adversaries a lot harder and more tedious, and means either fewer of them will consistently succeed, or more of them will be forced to become more centralized around solutions/services that've successfully solved it, which means Stripe can focus-fire their efforts a bit more.

Of course there's also a lot going on on the backend that'll never be seen, but the adversary is trying to mimic a legitimate user as much as they can, so if the JavaScript is totally unobfuscated and stays the same for a while, it's a lot easier for them to consistently trace exactly what data is being sent and compare it against what their system or altered browser is sending.

It's cat-and-mouse across many dimensions. In such adversarial games, obscurity actually can and often does add some security. "Security by obscurity is no security at all" isn't exactly a fallacy, but it is a fallacy to apply it universally and with a very liberal definition of "security". It's generally meant for things that are more formal or provable, like an encryption or hashing algorithm or other cryptography. It's still totally reasonable to use obscurity as a minor practical measure. I'd agree with this part of https://en.wikipedia.org/wiki/Security_through_obscurity: "Knowledge of how the system is built differs from concealment and camouflage. The efficacy of obscurity in operations security depends on whether the obscurity lives on top of other good security practices, or if it is being used alone. When used as an independent layer, obscurity is considered a valid security tool."

For example, configuring your web server to not display its version on headers or pages is "security by obscurity", and certainly will not save you if you're running a vulnerable version, but may buy you some time if a 0-day comes out for your version and people search Shodan for the vulnerable version numbers - your site won't appear in the list. These kinds of obscurity measures of course never guarantee security and should be the very last line of defense in front of true security measures, but they can still potentially help you a little.

In the "malware vs. anti-virus" and "game cheat vs. game cheat detection software" fights that play out every day, both sides of each heavily obfuscate their code and the actions they perform. No, this never ensures it won't be fully reverse engineered. And the developers all know that. Given enough time and dedication, it'll eventually happen. But it requires more time and effort, and each time it's altered, it requires a re-investment of that time and effort.

Obfuscation and obscurity are arguably the defining feature and "value proposition" of each of those four types of software. A lot of that remains totally hidden on the backend (e.g. a botnet C2 web server only responding with malware binaries if they analyze the connection and believe it really is a regular infected computer and not a security researcher or sandbox), but a lot is also present in the client.

>>jimmas+6J
Reddit can sell you virtual coins: https://www.reddit.com/premium

>>meowfa+ZI
Thanks for a thoughtful reply (upvoted), but have you looked at the library in question? The code is minified, but there is not much obfuscation going on: https://js.stripe.com/v3/

Most of your examples are quite low-level, but it's much harder to keep things hidden within the constraints of the browser sandbox when you have to interface with standard APIs which can be easily instrumented.

>>Kalium+Qx
> Mouse movements are not a first or last name, physical or email address, [or one of a dozen other obvious examples]

You misunderstand what personally identifiable information is. Each individual letter of my name is also not identifiable, the letters of the alphabet are not PII, but when stored in the same database row, the separate letters do form PII no matter that you stored them separately or even hashed or encrypted them. My phone number is also not something that just anyone could trace to my name, but since my carrier stores my personal data together with the number (not to mention the CIOT database where law enforcement can look it up at will), there exists a way to link the number to my person, making it PII. Everything about me is PII, unless you make it no longer about me.

Mouse movements may not be PII if you don't link them to a session ID, but then they would be useless in fraud detection because you don't know whose transaction you should be blocking or allowing, since the data is no longer traceable to a person.

Another example[1] mentioned on a website that the Dutch DPA links to (from [2]) is location data. Coordinates that point to somewhere in a forest aren't personal data, but if you store them with a user ID...

[1] (English) https://www.privacy-regulation.eu/en/4.htm

[2] (Dutch) https://autoriteitpersoonsgegevens.nl/nl/over-privacy/persoo...

>>mtlync+(OP)
So it's being used for anti-fraud, let's say that's fair. But what if you're accidentally classified as bot or fraudulent? The GDPR allows automated individual decision making in general, but there must (legally) be an option to ask a human for a second opinion. How does that work with Stripe?

Relevant text for those who want to know what GDPR says about this: "The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her." https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CEL... (So one has to prove that it 'significantly' affects you, but I guess e-commerce is commonplace enough that being banned from a common platform could be argued to significantly impact you. But IANAJudge so this is up for interpretation by a real judge.)

>>mkolod+uH
Media businesses have been funded by advertising for hundreds of years (since the start of regular newspapers in the 1600s at least)[1]. Many internet businesses are more like media businesses than shops.

[1] https://en.wikipedia.org/wiki/History_of_advertising#16th%E2...

>>stanfo+oW
It’s specifically different in this case: a big part of Stripe's value to a web vendor is that Stripe can collect credit-card info directly from the buyer (thereby exempting the vendor from PCI compliance and other issues related to storing and processing CCs).

"The simplest way for you to be PCI compliant is to never see (or have access to) card data at all. Stripe makes this easy for you as we can do the heavy lifting to protect your customers’ card information." [1]

Interesting question whether Stripe incurs statutory privacy duties to the web vendor and the buyer separately. I would imagine so, because given the "triangular" nature of this kind of Stripe transaction, Stripe ends up collecting data from two parties.

[This is not legal advice]

[1] https://stripe.com/docs/security

>>mtlync+kf
Could you simply use an iframe with a sandbox attribute? Idea being you dynamically create an iframe, fill it with content (styles, postmessage scripts, what have you), then dynamically set a semi-restrictive sandbox before loading Stripe's library. When you're done (i.e. have a payment token in the parent) just remove the iframe. This way everything Stripe related is sandboxed and the script is unloaded as soon as you're finished with it.

Good chance I'm missing something, or there's some kind of protections in place around this.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/if...
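A rough sketch of what that could look like follows. The `createSandboxedFrame` name and the `doc` parameter (normally `document`) are made up for illustration, and whether Stripe.js actually works under these particular sandbox tokens is untested here. As the comment says, there may be protections that prevent it.

```javascript
// Sketch: create a sandboxed iframe, fill it with content, and expose a
// destroy() that removes it (and everything loaded inside it) when done.
function createSandboxedFrame(doc, innerHtml) {
  const frame = doc.createElement("iframe");
  // "allow-scripts" lets scripts inside the frame run. Leaving out
  // "allow-same-origin" gives the frame an opaque origin, so its scripts
  // cannot read the parent's DOM, cookies, or storage.
  frame.setAttribute("sandbox", "allow-scripts allow-forms");
  // The sandbox attribute is set before the content, so it applies from the
  // first load.
  frame.srcdoc = innerHtml;
  doc.body.appendChild(frame);
  return {
    frame,
    destroy() {
      frame.remove(); // unloads the frame's document and its scripts
    },
  };
}

// Usage in a page (the markup is hypothetical; the frame would pass the
// payment token back to the parent via postMessage):
//   const checkout = createSandboxedFrame(document, "<!doctype html>...");
//   window.addEventListener("message", (event) => {
//     /* validate event.source, read the token, then: */
//     checkout.destroy();
//   });
```

With an opaque origin the parent has to validate messages by `event.source` rather than by origin string, which is one of the places this approach can go wrong.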

>>Squish+r11
What user data doesn't google sell?

https://developers.google.com/authorized-buyers/rtb/download...

>>abunne+N91
I haven't used reCAPTCHA, but based on my understanding from Google's documentation[0], there are a few differences:

1. reCAPTCHA doesn't send information until you explicitly call their library. Stripe's library immediately begins reporting data as soon as the script is loaded.

2. reCAPTCHA is explicit in its documentation that it's collecting behavior about your users. Its sole purpose is to track user behavior, so implementers understand that it does this. Stripe's main purpose is to accept payment information, and it is currently not transparent about how it collects user behavior to achieve that. I don't believe that most implementers understand the nature of Stripe's data collection.

[0] https://developers.google.com/recaptcha/docs/v3

>>mister+L71
How is the nonce not an association?

We have two possible options here:

1. Client sends mouse-data + card info to a server, server checks the mouse data, turns it into a fraudPercent, and only stores that percent. That seems to be what they're doing now.

2. Client sends mouse data, gets back a unique nonce, and then sends that nonce to the server with card info. The server could have either stored or discarded the mouse info. It's perfectly possible the nonce was stored with the mouse info.

Those two things seem totally identical. The nonce by necessity must be unique (or else one person could wiggle their mouse, and then use that one nonce to try 1000 cards at once), and you can't know that they don't store the full mouse movement info with the nonce.

You gain nothing by adding that extra step other than some illusion of security.

Note, cloudflare + tor has a similar problem that they tried to solve with blind signatures (see https://blog.cloudflare.com/the-trouble-with-tor/), but that hasn't gone anywhere and requires a browser plugin anyway. It's not a viable solution yet.
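To make the argument concrete, here is a toy sketch of option 2 from the server's side. Everything in it (the class, the method names, the data shapes) is hypothetical. The point is only that a single-use nonce is itself a join key, so the client cannot tell whether the raw mouse data was kept.

```javascript
// Hypothetical server for option 2: mouse data in, nonce out, then the nonce
// comes back alongside the card details.
class NonceFraudServer {
  constructor(makeNonce) {
    this.makeNonce = makeNonce; // must return a unique, unguessable value
    this.byNonce = new Map(); // nonce -> whatever the server chose to keep
  }

  submitMouseData(mouseEvents) {
    const nonce = this.makeNonce();
    // Nothing observable by the client stops the server from storing the
    // raw events here instead of discarding them.
    this.byNonce.set(nonce, { mouseEvents, used: false });
    return nonce;
  }

  submitPayment(nonce, card) {
    const record = this.byNonce.get(nonce);
    // Nonces are single-use, otherwise one wiggle could cover 1000 cards.
    if (!record || record.used) return { accepted: false };
    record.used = true;
    // The card and the mouse data are now linked, exactly as in option 1.
    return { accepted: true, linked: { card, mouseEvents: record.mouseEvents } };
  }
}
```

A server that really does discard the events would behave identically from the outside, which is why the extra round trip adds no guarantee the client can verify.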

>>adamby+WQ
Yeah, theirs is far less obfuscated than most fraud/bot detection libraries I've seen. I believe almost all of the JS code I've seen from companies that primarily do fraud detection and web security is pretty heavily obfuscated. Here, it looks like Stripe.js is doing much more than just the fraud stuff - this is their client library for everything, including payment handling.

I haven't analyzed it and can't say this with any certainty, but my guess is that you're probably right: they're focusing primarily on backend analysis and ML comparing activity across a massive array of customers. This is different from smaller security firms who have a lot less data due to fewer customers, and a kind of sampling bias of customers who are particularly worried about or inundated by fraud.

They may be less interested in suspicious activity or fingerprinting at the device level and more interested in it at the payment and personal information level (which is suggested by articles like https://stripe.com/radar/guide).

Pure, uninformed speculation, but it's possible that if they get deeper into anti-fraud in the future (perhaps if fraudsters get smarter about this higher layer of evasion), they might supplement the data science / finance / payment oriented stuff with more lower-level device and browser analysis, in which case I wouldn't be surprised if they eventually separate out some of the anti-fraud/security parts into an obfuscated portion. (Or, more likely, have Stripe.js load that portion dynamically. Maybe they're already doing this, even? Dunno.)

>>falcol+8c
I mean, the docs literally spell this out, so I'm not sure how much you or the author of the article wants their hand held:

> To best leverage Stripe’s advanced fraud functionality, include this script on every page, not just the checkout page. This allows Stripe to detect anomalous behavior that may be indicative of fraud as customers browse your website.

https://stripe.com/docs/js

>>mtlync+(OP)
As a Stripe customer, I have to say it is pretty easy to deduce this from their documentation. They recommend loading Stripe.js on every page of your website rather than just the payment form. The given reason is to detect fraud. [1]

> To best leverage Stripe’s advanced fraud functionality, include this script on every page, not just the checkout page. This allows Stripe to detect anomalous behavior that may be indicative of fraud as customers browse your website.

There are also indications on the product page for Stripe Radar and other places where it is obvious they are doing device fingerprinting.

I can accept Stripe's explanation given the nature of their product and the effectiveness of Stripe Radar. That said, I think they need to make some changes. First of all, they should lay it out clearly that the tracking is high-resolution and includes mouse movement. Second, the tracking should be disabled by default and more closely tied to the usage of Radar. Most businesses don't need Radar until they reach a certain scale. Stripe could encourage the use of Radar when the account transaction volume reaches a certain size and use that opportunity to explain the benefits of enabling the tracking system. It should be optional, even then, though.

1: https://stripe.com/docs/js

>>mtlync+B21
My problem is primarily that I'm working with SPAs where a refresh really lowers the game.

I put together a proof-of-concept using a 'same-domain frame', no secondary domains or apps. The idea is separation over security, so you can unload without any side hustle. Tho without a second domain you're relying on Stripe being as trustworthy as they are, and not looking to actively undermine your sandboxing attempts [which I think is ok - we embedded their library in the first place].

https://codepen.io/theprojectsomething/full/ExVNEoZ

>>mtlync+(OP)
The author uses a Python one-liner to pretty-print a JSON structure. There is a marvelous MIT-licensed tool, "jq", for just that and a ton more: https://github.com/stedolan/jq

>>domado+mN
We nearly always post a comment when we change a URL: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que.... The most significant title edits get comments too: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que.... If we published a title log, URL changes could certainly be included.

The idea of marking every single edit, or publishing a complete moderation log, feels like asking for trouble. I fear that it would lead to more objections of the litigious, bureaucratic, meta type. Even though it's a tiny minority of users who make such objections, they have a lot of energy for it and there are many more of them than us. That kind of thing could quickly burn us out, like an unintended DoS attack. On the other hand, maybe it would just work fine; it's hard to know.

Also, I'm skeptical that it would create more confidence in the site, because the users who want to feel that way basically already do, and the ones who don't probably wouldn't be persuaded by more data. There's always going to be something that's not included, or the suspicion that there is.

Because of this, the way we address concerns is to answer people's individual questions, here and by email. We're happy to do that, and there basically isn't anything we aren't willing to explain. That's by design. We try never to do anything that isn't defensible to the community. Even when there are genuine secrets that can't be spelled out, like how the anti-abuse software works, we can say what they are at a high level and why a secret is needed. Those cases are rare.

>>mtlync+(OP)
For those that are working with SPAs or similar and are not overly affected by fraud, I've put together a simple example showing how to sandbox Stripe.js code and unload it when you're done. No secondary domains, no reverse engineering of the Stripe.js library. It also maintains a reasonable level of trust in Stripe, who deserve it.

https://codepen.io/theprojectsomething/full/ExVNEoZ
