zlacker

I wish services didn't store IPs at all.

If abuse is an issue, why not hash the IP with a nonce?

replies(8): >>codetr+A >>kadoba+O >>gizmo6+q1 >>uses+22 >>aneutr+B2 >>nullc+S2 >>Genera+Z2 >>8note+O6

>>xvecto+(OP)
IPv4 space is small so they will subpoena the nonce and find what the original IP was

>>xvecto+(OP)
For IPv4, is there a difference between storing IPs and storing their hash with a nonce? You can calculate the hash of every IP address in reasonable time, so it's reversible.

The only benefit I can think of is that you can forget the nonce, and then the data is securely useless (if the nonce was secure), but that doesn't seem that useful really.
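A minimal sketch of why this is reversible. It assumes the "nonce" is a single secret salt kept alongside the logs and that the hash is plain SHA-256 (the thread names no algorithm). `salted_ip_hash` and `reverse_ip_hash` are hypothetical helper names. The demo only scans one /24 so it finishes instantly; scanning all 2^32 addresses is the same loop, just longer.

```python
import hashlib
import ipaddress
import secrets
from typing import Iterable, Optional


def salted_ip_hash(ip: str, salt: bytes) -> str:
    """Hash an IPv4 address with a secret salt (plain SHA-256, illustrative only)."""
    return hashlib.sha256(salt + ipaddress.IPv4Address(ip).packed).hexdigest()


def reverse_ip_hash(digest: str, salt: bytes,
                    candidates: Iterable[ipaddress.IPv4Address]) -> Optional[str]:
    """Recover the IP by hashing every candidate address with the known salt."""
    for addr in candidates:
        if hashlib.sha256(salt + addr.packed).hexdigest() == digest:
            return str(addr)
    return None


if __name__ == "__main__":
    salt = secrets.token_bytes(16)
    logged = salted_ip_hash("203.0.113.77", salt)
    # Only one /24 here; IPv4Network("0.0.0.0/0") would enumerate all 2^32 addresses.
    search_space = ipaddress.IPv4Network("203.0.113.0/24")
    print(reverse_ip_hash(logged, salt, search_space))  # 203.0.113.77
```

Whoever holds the salt (or can subpoena it) gets the IPs back; forgetting the salt is the only thing that makes the stored values useless.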

replies(1): >>xvecto+V5

>>xvecto+(OP)
There are only 2^32 possible IP addresses. You can brute-force that on a personal laptop.

replies(2): >>vgaldi+d2 >>xvecto+I9

>>xvecto+(OP)
Hm, I'm confused. Usually the whole point of storing an IP is in case the visitor uses the platform to do something illegal, like making a death threat. Without the original IP, law enforcement can't subpoena the ISP, etc. But also, as someone else said, if you use a nonce (and I think you mean a salt), then it can be cracked nearly instantaneously anyway due to the small size of the IPv4 space (~4 billion addresses).

replies(2): >>ed2551+15 >>njharm+ds

>>gizmo6+q1
There are even fewer "usable" ones when you exclude private ranges, etc.
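A rough sketch of how much smaller the space gets, subtracting a handful of well-known non-global blocks with Python's `ipaddress` module. The block list is an illustrative assumption, not the full IANA special-purpose registry, and `usable_ipv4_count` is a hypothetical helper.

```python
import ipaddress

# A few well-known non-global IPv4 blocks: "this network", RFC 1918 private,
# carrier-grade NAT, loopback, link-local, multicast, and reserved.
# Not an exhaustive list of special-purpose ranges.
NON_GLOBAL_BLOCKS = [
    "0.0.0.0/8",
    "10.0.0.0/8",
    "100.64.0.0/10",
    "127.0.0.0/8",
    "169.254.0.0/16",
    "172.16.0.0/12",
    "192.168.0.0/16",
    "224.0.0.0/4",
    "240.0.0.0/4",
]


def usable_ipv4_count(excluded: list[str]) -> int:
    """Total IPv4 space minus the excluded blocks (collapsed to avoid double counting)."""
    nets = ipaddress.collapse_addresses(ipaddress.IPv4Network(n) for n in excluded)
    return 2 ** 32 - sum(net.num_addresses for net in nets)


print(f"{usable_ipv4_count(NON_GLOBAL_BLOCKS):,}")  # 3,702,390,784
```

That is still about 86% of the space, so excluding reserved ranges alone only trims the brute-force work a little.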

replies(1): >>olliej+25

>>xvecto+(OP)
Sometimes there's a forensic purpose. For example, you want to know which servers exfiltrated your data and to which IP.

Or for audit purposes (e.g. you might need to prove to some regulator that no outside access was made, which is stupid but ...)

>>xvecto+(OP)
There are only 2^32 IPv4 addresses; if you know the nonce, you just try them all... no privacy provided.

If you don't know the nonce, you can't match against other users-- so not useful for abuse.

But I'm skeptical re: abuse uses. For commenters, sure-- you may need to store IPs to combat abuse. But for readers? At most you would need sampled data or in-memory counters (e.g. to catch high volume bots).
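A minimal sketch of the "in-memory counters" idea, assuming a simple fixed-window limit per IP (the limit and window values are arbitrary, and `InMemoryRateCounter` is a hypothetical name). Nothing is persisted, and every IP is forgotten when the window rolls over.

```python
import time
from collections import Counter
from typing import Callable


class InMemoryRateCounter:
    """Fixed-window per-IP request counter kept only in memory; nothing goes to disk."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self._clock = clock
        self._window_start = clock()
        self._counts: Counter = Counter()

    def allow(self, ip: str) -> bool:
        now = self._clock()
        if now - self._window_start >= self.window:
            self._counts.clear()  # drop every IP seen in the previous window
            self._window_start = now
        self._counts[ip] += 1
        return self._counts[ip] <= self.limit


limiter = InMemoryRateCounter(limit=3, window=60)
print([limiter.allow("203.0.113.5") for _ in range(5)])  # [True, True, True, False, False]
```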

Unfortunately, there really isn't any penalty for failing to minimize private data collection.

replies(4): >>xvecto+J5 >>xwolfi+H7 >>hyperb+zg >>spacem+pa1

>>xvecto+(OP)
Anti-spam/anti-abuse operations often look a lot like tracking. There's no point in knowing a single request's IP, or a single user's; to spot the patterns, you have to be able to join it with others.

>>uses+22
Wouldn’t be relevant for a news website.

replies(1): >>LatteL+S5

>>vgaldi+d2
And even fewer if you further restrict to US service provider ranges.

>>nullc+S2
If you use a difficult hash function that takes ~1 second to calculate, then it would take over 120 years to iterate through the IPv4 address space. At the very least, this could cut down on dragnet surveillance.
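A quick check of the "over 120 years" figure, assuming exactly 1 second per hash on a single core and a 365.25-day year:

$$
2^{32}\ \text{hashes} \times 1\ \tfrac{\text{s}}{\text{hash}} = 4{,}294{,}967{,}296\ \text{s} \approx \frac{4.295 \times 10^{9}\ \text{s}}{3.156 \times 10^{7}\ \text{s/yr}} \approx 136\ \text{years}
$$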

replies(4): >>542458+26 >>nullc+O7 >>gizmo6+S7 >>b9a2ca+Vm

>>ed2551+15
I wouldn't be so sure, lots of papers have comment sections and they're pretty rammed with death threats on any social issue sadly...

replies(1): >>ed2551+om

>>kadoba+O
I think if we use a difficult enough hash function it should be okay? With 4 billion IPv4 addresses it would take 120+ years to iterate through all of them at ~1 second per hash. You could probably rotate the nonce periodically, making it effectively worthless to precompute any table. But this gets complicated fast.

replies(3): >>rpadov+L8 >>someth+oa >>kadoba+Tk

>>xvecto+J5
Yes, but then I’m burning a second of compute time every time I want to log something.

Also, by removing unlikely candidates (IPs owned by irrelevant entities or that are not US-based), you could make the search range much, much smaller, and with the FBI's budget you could probably compute it all in a few days even with a 1-second hash time.

>>xvecto+(OP)
There's some structure to the IP that can be somewhat useful for tracking identity.

It's imperfect, but you'd expect definitely-good folks to look a certain way.

>>nullc+S2
The best model would be to publicly display commenters' IPs, never store readers' IPs, and store error logs (like people brute-forcing a password).

You'd have a triple virtuous effect: people would stop being such insufferable asses once they understood that their name is basically on the comment, readers would be completely safe (because why not?), and abusers would still be logged.

It's probably even what most websites do; it's news to me that they keep the IP of every visitor. I'd have pruned them.

replies(2): >>techbi+59 >>taneq+Ta

>>xvecto+J5
But then a single user clicking on links quickly would bring your webserver to its knees. So much for using those addresses to combat abuse... :)

Plus the FBI could probably narrow their search to a few hundred thousand addresses (relevant ISPs, no unroutable/multicast/etc), then only use the list to confirm.

Finally, if it takes 120 years on one core, it'll take 1.4 months on 1000 cores. I'm willing to bet the FBI has access to more computing power than I do. ~100 CPU-years isn't a particularly daunting amount of computing work, even for fairly low-stakes research.

That search would also decode all addresses in the logs, not just one targeted one...
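A sketch of the scaling arithmetic in this comment, assuming 1 second per hash and perfect parallel scaling (`brute_force_days` is a hypothetical helper, and 300,000 stands in for "a few hundred thousand addresses"). Using the exact 2^32 count rather than 120 years gives about 50 days on 1,000 cores, the same ballpark as the 1.4 months above.

```python
def brute_force_days(seconds_per_hash: float, workers: int, candidates: int = 2 ** 32) -> float:
    """Wall-clock days to hash every candidate, assuming perfect parallel scaling."""
    return candidates * seconds_per_hash / workers / 86_400


print(round(brute_force_days(1.0, 1) / 365.25))        # 136 (years, one core, full space)
print(round(brute_force_days(1.0, 1_000), 1))          # 49.7 (days, 1,000 cores, full space)
print(round(brute_force_days(1.0, 1, 300_000), 1))     # 3.5 (days, one core, narrowed list)
```

The last line is the "narrow it down first" case: a few hundred thousand candidates fit in a few days on a single core.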

>>xvecto+J5
This requires adding ~1 second of latency to every request that needs the IP hashed. Even assuming relatively aggressive caching, this is still completely unacceptable from a user-experience perspective.

Assuming you do that, you are looking at about 1,193,046 hours to hash the entire address space. More specifically, you are looking at 1,193,046 CPU-hours.

You can rent a 96-vCPU c5.24xlarge instance from AWS at a rate of $4.08/hour, or $0.0425 per CPU-hour. Assuming this offers the same per-CPU hash rate as the general-purpose web server, you are looking at a cost of $50,704 to construct a rainbow table. That is nowhere near a prohibitive sum of money.
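The same cost estimate as a few lines of arithmetic, using only the figures quoted in this comment (1 second per hash per vCPU, $4.08/hour for 96 vCPUs). It ignores the cheaper options mentioned below, such as bare metal or ASICs.

```python
SECONDS_PER_HASH = 1.0          # assumption from the thread: ~1 s per hash per vCPU
INSTANCE_PRICE_PER_HOUR = 4.08  # c5.24xlarge rate quoted in the comment
VCPUS_PER_INSTANCE = 96

cpu_hours = 2 ** 32 * SECONDS_PER_HASH / 3600
price_per_cpu_hour = INSTANCE_PRICE_PER_HOUR / VCPUS_PER_INSTANCE
total_cost = cpu_hours * price_per_cpu_hour

print(f"{cpu_hours:,.0f} CPU-hours at ${price_per_cpu_hour:.4f}/CPU-hour = ${total_cost:,.0f}")
# 1,193,046 CPU-hours at $0.0425/CPU-hour = $50,704
```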

You can probably reduce the cost by shopping around for compute or using bare metal. You could see significant cost reductions by using hashing-optimized ASICs.

Combine this with the fact that no website is going to spend 1000 ms just computing the hash for every request (even if you allow for caching), and that they can probably narrow down the address space they are interested in considerably if they want to save money.

2^32 is just too small of an asymmetry between legitimate use and an attack to be a viable defense.

replies(1): >>xvecto+m9

>>xvecto+V5
Why 120 years? It is easily parallelized, and with any cloud provider you can launch hundreds of thousands of computing units in seconds. I'd say that, as a private citizen, I could create a rainbow table of the IPv4 space in half a day, more or less?

>>xwolfi+H7
Any examples? I like the transparency and self-filtering. What is/isn't this approach suitable for? Anonymous is a very common pen-name.

>>gizmo6+S7
From a user-experience perspective, you can perform the computation asynchronously. There are also hash algorithms designed to resist ASICs.

But yeah, everything else you said makes sense.
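A minimal sketch of doing the expensive hash off the request path, assuming a single background thread fed by a queue. `hashlib.scrypt` stands in for "a difficult hash function"; its parameters, the fixed `SALT`, and the helper names are illustrative assumptions. The timestamp is captured when the request happens, not when the hash finishes.

```python
import hashlib
import queue
import threading
import time

SALT = b"example-deployment-salt"  # hypothetical fixed salt
log_queue: queue.Queue = queue.Queue()
hashed_log: list[tuple[float, str]] = []


def slow_hash(ip: str) -> str:
    # scrypt as the deliberately slow hash; parameters are illustrative only.
    return hashlib.scrypt(ip.encode(), salt=SALT, n=2**14, r=8, p=1).hex()


def record_request(ip: str) -> None:
    """Request path: capture the timestamp now, defer the hashing."""
    log_queue.put((time.time(), ip))


def hashing_worker() -> None:
    """Background thread: performs the expensive hashing."""
    while True:
        item = log_queue.get()
        if item is None:  # sentinel: stop the worker
            break
        ts, ip = item
        hashed_log.append((ts, slow_hash(ip)))


worker = threading.Thread(target=hashing_worker, daemon=True)
worker.start()
for addr in ["198.51.100.7", "203.0.113.9"]:
    record_request(addr)
log_queue.put(None)
worker.join()
print(len(hashed_log))  # 2
```

This hides the latency from users but does not reduce total CPU spent, so the capacity concern raised above still applies.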

replies(1): >>gizmo6+ne

>>gizmo6+q1
If you use a hard hash function, you cannot brute-force that on a laptop, not even a tenth of it. You can, however, spin up compute instances to brute-force it in a few days if you have $50k lying around.

replies(2): >>zeroim+jm >>xxs+5Q

>>xvecto+V5
Except you are still storing the nonce/salt (not sure which you are proposing)...which means you can reverse it, so the data is subpoenable. It doesn't really buy anyone anything, in this scenario. It could help if the logs were stolen, but that isn't what is being discussed here.

>>xwolfi+H7
And then my modem reconnects and I get a new IP that used to belong to some insufferable asshole, and suddenly I’m blocked / blackholed / shadowbanned everywhere and some vigilante is flood pinging me.

replies(2): >>nexuis+9b >>Scound+SM

>>taneq+Ta
Bingo. IRC tried the strategy of banning users by IP and half the time you'd end up k-lining entire countries because their ISPs were too cheap to buy more endpoints.

>>xvecto+m9
And now you have ~1000 ms of latency between when some events happen and when you can log them. Even assuming all such events get logged, you will be left with a jumbled mess of out-of-order events.

replies(1): >>tremon+6X

>>nullc+S2
But of course, the real reason is that those ips are worth analytics $$$.

replies(1): >>cout+861

>>xvecto+V5
You could try to do a more difficult hash or something (bcrypt maybe?) but I don't know if it's a very good idea. I think you'd spike your latency, open yourself to DoS attacks or only minorly inconvenience anyone reversing the hashes, or some combination of those.

replies(1): >>xxs+mQ

>>xvecto+I9
What are the odds that a website will run a computationally hard hash function on every single HTTP request just so it can log something less sensitive than an IP address?

replies(1): >>josefx+9y

>>LatteL+S5
Oh jeez, with comments enabled anything goes.

>>xvecto+J5
Something that takes 1 second on a CPU can easily run 100x faster on a GPU, and then be distributed over thousands of GPUs. For reference, argon2 was supposed to be an ASIC-resistant, GPU-resistant, memory-hard hashing algorithm, but a K20X from 2013 is 5x faster than a CPU [1], and GPUs have only gotten faster relative to CPUs since then.

[1]: https://github.com/WebDollar/argon2-gpu
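A rough version of that arithmetic, assuming the same 1-second CPU hash, the 100x GPU speedup above, and 1,000 GPUs (the comment only says "1000's"):

$$
\frac{2^{32} \times 1\ \text{s}}{100 \times 1000} \approx 4.29 \times 10^{4}\ \text{s} \approx 11.9\ \text{hours}
$$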

>>uses+22
IP is not a person identifier. They should not be allowed as evidence in criminal cases.

replies(1): >>gizmo6+Ft

>>njharm+ds
There is more to investigating than gathering admissible evidence.

For example, if you are talking with Alice and she says that she heard from Bob that Charlie was in the office at a weird hour on the day in question, investigators have gained nothing that they can admit as evidence [0]. However, there is nothing anyone (the defense or a third party) can do to challenge this portion of the investigation other than keep it out of evidence, and the investigators are free to follow up with either Bob or Charlie to get something that would be admissible.

>>zeroim+jm
The website could cache the hash for an hour or two.
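A minimal sketch of such a cache, assuming a per-process dict with a time-to-live. `TTLHashCache` and `slow_ip_hash` are hypothetical names, and scrypt with a fixed salt is an illustrative stand-in for the hard hash. Caching only works if the salt is reused, and each server would have its own cache unless it is shared or replicated.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


class TTLHashCache:
    """Memoize an expensive IP hash for `ttl` seconds (expired entries are recomputed, not evicted)."""

    def __init__(self, hash_fn: Callable[[str], str], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._hash_fn = hash_fn
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.misses = 0

    def get(self, ip: str) -> str:
        now = self._clock()
        cached = self._entries.get(ip)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        self.misses += 1
        digest = self._hash_fn(ip)
        self._entries[ip] = (now, digest)
        return digest


def slow_ip_hash(ip: str) -> str:
    # Fixed salt so results are cacheable; parameters are illustrative only.
    return hashlib.scrypt(ip.encode(), salt=b"example-salt", n=2**14, r=8, p=1).hex()


cache = TTLHashCache(slow_ip_hash, ttl=2 * 3600)
cache.get("198.51.100.7")
cache.get("198.51.100.7")
print(cache.misses)  # 1
```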

>>taneq+Ta
Maybe in the 56k days, but my DOCSIS ISP rarely re-assigns IPs.

>>xvecto+I9
>force that on a laptop

They said: "personal computer", which could easily have 2x GPUs and 16+ cores. Heck, laptops nowadays can have a pretty good discrete GPU.

There is absolutely no way to use a password-grade hash with a true nonce on every request. The nonce would have to be the same for multiple uses, hence NOT a "nonce". Sharing that non-nonce would require some form of replicated map (or IP-sticky processing with a local map). It's a rather convoluted solution for absolutely no benefit, as it's still not hard to brute-force.

Storing it this way (slow hash + salt) yields no benefit for debugging either, so I wonder why you would do so. Password hashes are useful for proving a match against an unknown plaintext (while making it expensive to brute-force), so what exactly would be the purpose of hashing the non-nonce plus the IP?
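A small sketch of the trade-off being described. HMAC-SHA256 with a fixed key is swapped in here for "salted hash", and a fresh random value per event plays the role of a true nonce; both helper names are hypothetical. With a fixed key the same IP always maps to the same token (useful for matching abuse, and brute-forceable by whoever has the key); with a real nonce, two entries for the same IP never match.

```python
import hashlib
import hmac
import secrets


def token_with_fixed_key(ip: str, key: bytes) -> str:
    """Same key every time: the same IP always yields the same token."""
    return hmac.new(key, ip.encode(), hashlib.sha256).hexdigest()


def token_with_fresh_nonce(ip: str) -> tuple[bytes, str]:
    """New random nonce per event: tokens for the same IP do not match."""
    nonce = secrets.token_bytes(16)
    return nonce, hashlib.sha256(nonce + ip.encode()).hexdigest()


key = secrets.token_bytes(32)
ip = "198.51.100.7"
print(token_with_fixed_key(ip, key) == token_with_fixed_key(ip, key))  # True
print(token_with_fresh_nonce(ip)[1] == token_with_fresh_nonce(ip)[1])  # False
```

If the per-event nonce is stored next to its hash, each entry can still be reversed by trying all 2^32 addresses, so the nonce buys unlinkability, not secrecy.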

>>kadoba+Tk
>(bcrypt maybe?) but I don't know if it's a very good idea

b/scrypt and all other password-grade hashes are slow on purpose, but they are slow on every use. Imagine the processing takes 0.1 s per request (which is on the low side of hardness): you have just killed all your servers without any deliberate DoS. If you abandon the nonce and reuse the same salt (so the computation can be amortized), you'd need a replicated IP->hash cache, and even then it still doesn't accomplish much...

>>gizmo6+ne
Why does your logging system rely on the order of entry insertion and not on the entry timestamp?
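A trivial sketch of ordering by timestamp instead of by insertion order. The entries are made-up examples of log lines written out of order by an asynchronous hasher.

```python
from datetime import datetime, timezone

# Hypothetical log entries that reached the log out of order.
entries = [
    {"ts": datetime(2023, 5, 1, 12, 0, 2, tzinfo=timezone.utc), "event": "login"},
    {"ts": datetime(2023, 5, 1, 12, 0, 0, tzinfo=timezone.utc), "event": "page_view"},
    {"ts": datetime(2023, 5, 1, 12, 0, 1, tzinfo=timezone.utc), "event": "comment"},
]

# Rebuild event order from the time each event happened, not when it was written.
ordered = sorted(entries, key=lambda e: e["ts"])
print([e["event"] for e in ordered])  # ['page_view', 'comment', 'login']
```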

>>hyperb+zg
It's also useful forensic data if your site is ever hacked.

>>nullc+S2
An example of using IPs to combat abuse is Wordfence, a WordPress plugin that blocks traffic from known abusive IPs. A quick glance at the "live traffic" for one of my websites shows several IPs that attempted to access the site within the last hour and were blocked.

A site I was repairing after a hack fortunately had server logs which included IP data. That IP allowed me to identify the specific exploit used.

So, there are definitely uses for IP data in security terms.
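Wordfence's internals aren't described here, so this is only a generic sketch of checking a request IP against a blocklist of networks with Python's `ipaddress` module. The blocklist entries are made-up documentation addresses, not a real threat feed, and `is_blocked` is a hypothetical helper.

```python
import ipaddress
from typing import Iterable, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical blocklist (documentation ranges), not real abusive IPs.
BLOCKLIST = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.23/32")]


def is_blocked(ip: str, blocklist: Iterable[Network] = BLOCKLIST) -> bool:
    """True if the address falls inside any blocklisted network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocklist)  # version mismatch simply yields False


print(is_blocked("192.0.2.45"))     # True
print(is_blocked("198.51.100.24"))  # False
```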