zlacker

[parent] [thread] 156 comments
1. samwil+(OP)[view] [source] 2023-05-10 12:56:05
Using robots.txt as a model for anything doesn't work. All a robots.txt is is a polite request to please follow the rules in it, there is no "legal" agreement to follow those rules, only a moral imperative.

Robots.txt has failed as a system, if it hadn't we wouldn't have captchas or Cloudflare.

In the age of AI we need to better understand where copyright applies to it, and potentially need reform of copyright to align legislation with what the public wants. We need test cases.

The thing I somewhat struggle with is that after 20-30 years of calls for shorter copyright terms, lesser restrictions on content you access publicly, and what you can do with it, we are now in the situation where the arguments are quickly leaning the other way. "We" now want stricter copyright law when it comes to AI, but at the same time shorter copyright duration...

In many ways an ai.txt would be worse than doing nothing as it's a meaningless veneer that would be ignored, but pointed to as the answer.

replies(30): >>shaneb+81 >>George+J1 >>Karuna+n2 >>brooks+Z2 >>bombol+e3 >>prepen+v3 >>mschus+U7 >>majews+79 >>FinnKu+Ja >>IanCal+ze >>none_t+Fe >>safety+fg >>Retric+Lj >>hoofhe+rr >>mort96+vr >>shadow+Fv >>FpUser+Qw >>hidden+KD >>drbawb+xE >>waffle+LG >>zitter+qH >>bachme+tP >>Macha+WV >>omoika+EW >>balaji+GW >>User23+791 >>quenix+od1 >>yafbum+Mj1 >>wwwest+Ht1 >>dragon+Be2
2. shaneb+81[view] [source] 2023-05-10 13:01:25
>>samwil+(OP)
"Robots.txt has failed as a system, if it hadn't we wouldn't have captchas or Cloudflare."

I like the idea of "ai.txt" but those who eat resources rarely listen to ToS. Frankly, I serve 503s to all identifiable bots, unless they are on my explicit allow list.
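
A minimal sketch of that kind of policy, assuming bots are identified purely by substrings in the User-Agent header (real setups often also look at IP ranges or reverse DNS). The marker list and the allow-listed name are hypothetical placeholders, not anything from the comment above.

```python
# Hypothetical values: substrings that flag a client as a bot,
# and the bots that are explicitly allowed through.
BOT_MARKERS = ("bot", "crawler", "spider")
ALLOWED_BOTS = ("examplebot",)  # placeholder name, not a real crawler


def status_for(user_agent: str) -> int:
    """Return the HTTP status to serve for a given User-Agent header."""
    ua = user_agent.lower()
    is_bot = any(marker in ua for marker in BOT_MARKERS)
    if not is_bot:
        return 200  # not identifiable as a bot: serve normally
    if any(name in ua for name in ALLOWED_BOTS):
        return 200  # identifiable bot on the explicit allow list
    return 503  # every other identifiable bot gets Service Unavailable
```

A scraper that sends a browser-like User-Agent passes straight through, which is the limitation the replies point out.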

replies(2): >>always+x2 >>spc476+I41
3. George+J1[view] [source] 2023-05-10 13:04:25
>>samwil+(OP)
robots.txt works for the major search engines who voluntarily abide by it, so it isn't a failed system. Just because it doesn't work on everybody doesn't mean it's useless.
4. Karuna+n2[view] [source] 2023-05-10 13:07:37
>>samwil+(OP)
On the contrary, it works perfectly well for normal, non-bad actors running services used by most of the public. That includes search engines and stuff like archive.org. A robots.txt set to deny all will result in your site not showing up on any search engine that matters.

It doesn't work for bad actors, but then again, nothing really does.

◧◩
5. always+x2[view] [source] [discussion] 2023-05-10 13:08:25
>>shaneb+81
Why not serve fake garbage indistinguishable from real content by a computer, like LLM output? Sending errors just incentivizes bot owners to fix the identifiable parts
replies(4): >>shaneb+z6 >>twelve+g7 >>ape4+n9 >>dspill+fc
6. brooks+Z2[view] [source] 2023-05-10 13:10:27
>>samwil+(OP)
> Robots.txt has failed as a system, if it hadn't we wouldn't have captchas or Cloudflare.

Failing to solve every problem does not mean a solution is a failure.

From sunscreen to seatbelts, the world is full of great solutions that occasionally fail due to statistics and large numbers.

replies(4): >>samwil+d4 >>bileka+lb >>usrusr+Vc >>vlunkr+1q
7. bombol+e3[view] [source] 2023-05-10 13:11:33
>>samwil+(OP)
> "We" now want stricter copyright law when it comes to AI, but at the same time shorter copyright duration...

AI is being used to do copyright laundering, at the same time "we", the people who can't afford to run our own AI, are still subject to absurd rules that AI owners get to ignore, apparently.

replies(1): >>rhn_mk+J7
8. prepen+v3[view] [source] 2023-05-10 13:12:51
>>samwil+(OP)
> "We" now want stricter copyright law when it comes to AI, but at the same time shorter copyright duration...

While I’m sure others than you share this opinion, I don’t think it’s as uniform as the more common “shorten/rationalize copyright terms and fair use” crowd “we.”

I consider myself a knowledge worker and a pretty staunch proponent of FLOSS and am perfectly fine with training AI on everything publicly available. While I create stuff, I don’t make a living off selling particular copies of things I make, so my self preservation bias isn’t kicking in as much as someone who does want to sell items of their work.

But I also made some pretty explicit choices in the 90s based on where I thought IP would go so I was never in a position where I had to sell copies to survive. My decision was more pragmatic first and philosophical second.

I think someone entering the workforce now probably wants to align their livelihood with AI training on everything and not go against that. Even if US/Euro law limits training, there’s no way all other countries are going to, so it’s going to happen. And I don’t think it’s worth locking down the world to try to stop AIs from training on text, images, etc.

replies(1): >>JohnFe+65
◧◩
9. samwil+d4[view] [source] [discussion] 2023-05-10 13:16:31
>>brooks+Z2
Ok, fair point, I may be being a little hyperbolic. But my point is that it's not a system that we should copy for preventing the use of content in training AI. It would become a useless distraction.

If you "violate" a robots.txt the server administrator can choose to block your bot (if they can fingerprint it) or IP (if it's static).

With an ai.txt there is no potential downside to violating it - unless we get new legislation enforcing its legal standing. The nature of ML models is that it's opaque what content exactly it's trained on, there is no obvious retaliation or retribution.

replies(4): >>Wowfun+w6 >>Burnin+c7 >>capabl+l7 >>jefftk+K8
◧◩
10. JohnFe+65[view] [source] [discussion] 2023-05-10 13:21:54
>>prepen+v3
Fair enough. But there should be some mechanism that lets people who don't want their works to contribute to AI training prevent that without having to resort to removing their works from the web.
replies(1): >>prepen+ca1
◧◩◪
11. Wowfun+w6[view] [source] [discussion] 2023-05-10 13:28:10
>>samwil+d4
> But my point is that it's not a system that we should copy for preventing the use of content in training AI.

I don't think that's what OP is envisioning based on their post!

◧◩◪
12. shaneb+z6[view] [source] [discussion] 2023-05-10 13:28:25
>>always+x2
"Why not serve fake garbage indistinguishable from real content by a computer, like LLM output?"

Serving more than the minimum wastes resources. Worse yet, a better solution would cost my time.

"Sending errors just incentivizes bot owners to fix the identifiable parts"

Sure, someone could make or configure their scraper perfectly. "Perfect" is now the table stakes though.

Edit:

My solution strives to impose a disproportionate expense on anyone trying to circumvent it. I want 10x on my time.

◧◩◪
13. Burnin+c7[view] [source] [discussion] 2023-05-10 13:30:59
>>samwil+d4
OP is trying to give helpful info to the AI, not set boundaries for it.
◧◩◪
14. twelve+g7[view] [source] [discussion] 2023-05-10 13:31:14
>>always+x2
it'd be cool to be able to fingerprint that garbage, too. Like, sprinkle some hashes here and there (or something like that) so that you can later uniquely look up your own "content" being stolen by chatbots and which ones.
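
One way this could look, as a rough sketch: derive a keyed hash per requester and embed it in whatever text you serve them, so a marker found later can be traced back. The key, the requester IDs, and both function names are assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Optional, Sequence

SECRET_KEY = b"replace-with-a-private-key"  # hypothetical; never publish it


def fingerprint(requester_id: str) -> str:
    """Derive a short, requester-specific marker to embed in served text."""
    digest = hmac.new(SECRET_KEY, requester_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def identify(marker: str, known_requesters: Sequence[str]) -> Optional[str]:
    """Return the known requester whose marker matches, or None."""
    for requester_id in known_requesters:
        if hmac.compare_digest(fingerprint(requester_id), marker):
            return requester_id
    return None
```

This only helps if the marker survives verbatim in someone's output, which is far from guaranteed with a trained model.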
replies(1): >>shaneb+n8
◧◩◪
15. capabl+l7[view] [source] [discussion] 2023-05-10 13:31:40
>>samwil+d4
> But my point is that it's not a system that we should copy for preventing the use of content in training AI

The purpose OP is suggesting in the submission is the opposite, help AI crawlers to understand what the page/website is about without actually having to infer the purpose from the content itself.

replies(1): >>Xelyne+sb
◧◩
16. rhn_mk+J7[view] [source] [discussion] 2023-05-10 13:33:35
>>bombol+e3
The barrier to running an AI model is getting lower every day, so the threshold for ignoring copyright is getting lower with it.
replies(1): >>gavinh+jb
17. mschus+U7[view] [source] 2023-05-10 13:34:36
>>samwil+(OP)
> Robots.txt has failed as a system, if it hadn't we wouldn't have captchas or Cloudflare.

That depends what you expect from it. For the purpose of limiting crawlers, at least the major search engines respect it.

◧◩◪◨
18. shaneb+n8[view] [source] [discussion] 2023-05-10 13:36:55
>>twelve+g7
You can. I can't think of the appropriate term though. Hopefully someone else chimes in here with a link.
◧◩◪
19. jefftk+K8[view] [source] [discussion] 2023-05-10 13:38:33
>>samwil+d4
> It's not a system that we should copy for preventing the use of content in training AI

I don't see the OP saying anything about "ai.txt" being for that? They're advocating it as a way that AIs could use fewer tokens to understand what a site is about.

(Which I also don't think is a good idea, since we already have lots of ways of including structured metadata in pages, but the main problem is not that crawlers would ignore it.)

replies(1): >>kmoser+CF
20. majews+79[view] [source] 2023-05-10 13:39:50
>>samwil+(OP)
> All a robots.txt is is a polite request to please follow the rules in it

At least in my country (Germany), respecting robots.txt is a legal requirement for data mining. See German Copyright Code, section 44b: https://www.gesetze-im-internet.de/urhg/__44b.html

(IANAL)

◧◩◪
21. ape4+n9[view] [source] [discussion] 2023-05-10 13:41:22
>>always+x2
I like this idea. Of course it would have to be only to robots that visit a page disallowed by the robots.txt
22. FinnKu+Ja[view] [source] 2023-05-10 13:47:32
>>samwil+(OP)
if you do data mining in the EU you are legally required to respect robots.txt afaik
replies(1): >>LawTal+cJ
◧◩◪
23. gavinh+jb[view] [source] [discussion] 2023-05-10 13:50:13
>>rhn_mk+J7
You are mistaken if you think companies will allow common people to ignore copyright on their IP.

The only IP that will be allowed to be stolen is that of other common people.

replies(2): >>alphan+Eg >>rhn_mk+nn
◧◩
24. bileka+lb[view] [source] [discussion] 2023-05-10 13:50:30
>>brooks+Z2
> Failing to solve every problem does not mean a solution is a failure.

There is something to be said, though, for OP's point that it's actually better to do nothing than to add an ai.txt, because it can give a false sense of security, which is obviously not what you want.

replies(1): >>lelant+sS
◧◩◪◨
25. Xelyne+sb[view] [source] [discussion] 2023-05-10 13:51:25
>>capabl+l7
Isn't that the entire point of the semantic web?
replies(1): >>kmoser+4G
◧◩◪
26. dspill+fc[view] [source] [discussion] 2023-05-10 13:55:12
>>always+x2
> Sending errors just incentivizes bot owners to fix the identifiable parts

Nah. It'll just make them fake their identity so it is harder to tell the traffic is from a bot.

◧◩
27. usrusr+Vc[view] [source] [discussion] 2023-05-10 13:57:30
>>brooks+Z2
That's still not an argument to introduce ai.txt, because everything a hypothetical ai.txt could ever do is already done just as well (or not) by the robots.txt we have. If a training data crawler ignores robots.txt it won't bother checking for an ai.txt either.

And if you feel like rolling out the "welcome friend!" doormat to a particular training data crawler, you are free to dedicate as detailed a robots.txt block as you like to its user agent header of choice. No new conventions needed, everything is already in place.
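
As a rough illustration of a per-crawler block, here is how a crawler that chooses to honor robots.txt would read one, using Python's standard `urllib.robotparser`. The crawler name `FriendlyTrainingBot` and the paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is welcomed everywhere,
# everyone else is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: FriendlyTrainingBot
Allow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """What a crawler that chooses to honor robots.txt would conclude."""
    return parser.can_fetch(user_agent, url)
```

Nothing here is enforcement: the check runs on the crawler's side, and a crawler that never calls it sees no difference.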

replies(3): >>michae+It >>irobet+1y >>joshua+sB1
28. IanCal+ze[view] [source] 2023-05-10 14:04:48
>>samwil+(OP)
The poster wants the opposite - a way of explicitly helping AI systems/etc to use their site. If people ignore it, they're just giving up a bit of help.
replies(1): >>circui+Qq
29. none_t+Fe[view] [source] 2023-05-10 14:05:06
>>samwil+(OP)
Robots.txt is meant as an aid to crawlers. "This stuff is not useful to index," rather than a blocking mechanism
30. safety+fg[view] [source] 2023-05-10 14:11:46
>>samwil+(OP)
> "We" now want stricter copyright law when it comes to AI, but at the same time shorter copyright duration...

This gross generalization of other people's views on important issues is really offensive.

My view is that the Copyright Act of 1976 had it about right when they established the duration of copyright. My view is that members of Congress were handsomely rewarded by a specific corporation to carve out special exceptions to this law because they wanted larger profits. "We" didn't call the Copyright Term Extension Act of 1998 the "Mickey Mouse Act" for nothing. It's also no coincidence that Disney is now the largest media company in the world.

Reducing copyright term extension has everything to do with restoring competition and creativity to our economy, and reversing corruption that borders on white collar crime. It has nothing to do with AI. Don't recruit me into some bullshit argument that rewrites history and entrenches Disney's ill-gotten monopoly.

replies(10): >>casey2+kj >>ramble+Jn >>soperj+Gp >>samwil+tq >>lances+rs >>xhkkff+5u >>dclowd+Ev >>pachic+gH >>majorm+qb1 >>SergeA+Ni2
◧◩◪◨
31. alphan+Eg[view] [source] [discussion] 2023-05-10 14:13:34
>>gavinh+jb
You can’t steal an idea.
replies(1): >>gavinh+nH
◧◩
32. casey2+kj[view] [source] [discussion] 2023-05-10 14:24:35
>>safety+fg
Disney has a monopoly on media because they still have the copyright to their IP from the 20s? LOL!

Companies that can leverage this new wave of AI will have, in reality, 1000x the advantage that you believe Disney has.

replies(2): >>safety+fm >>Spivak+Qo
33. Retric+Lj[view] [source] 2023-05-10 14:26:30
>>samwil+(OP)
That’s factually incorrect. Germany for example does explicitly give machine readable permissions like robots.txt legal weight.

In general without a fair use exemption or permission from robots.txt saving a copy of a website’s content to your own servers is copyright infringement.

Purely factual information like Amazon’s prices isn’t protected by copyright, but if you want to save artwork or source files to train AI, that’s a copyright issue even before you get into the possibility of your AI being considered a derivative work.

◧◩◪
34. safety+fm[view] [source] [discussion] 2023-05-10 14:37:15
>>casey2+kj
Yes.

There's this little thing called brand value. Disney has one of the most valuable brands in the world. Forbes estimated it at being worth about $60 billion as I recall.

That brand was built heavily over many decades on IP that dates back to the 1920s, such as the most recognizable Disney character, Mickey Mouse. They manipulated the law to enhance the value of that IP and thereby gained an edge over their competitors. That's a big part of why they now enjoy such a dominant position.

None of this is especially controversial (you will get a very different spin from Disney of course).

If you want to comment about how business works you should read history and learn how business works first. AI luminary that you are, if you choose to remain ignorant then I guess this whole cycle will happen again with AI.

◧◩◪◨
35. rhn_mk+nn[view] [source] [discussion] 2023-05-10 14:42:01
>>gavinh+jb
I agree with you when you talk about places where companies can bully people just by threatening to sue them, and where the defender must have lots of money even if they are clearly in the right.

But AI does not change anything there. The problem of being sued into oblivion despite being right exists there even without it.

In places where defending does not cost money, this works out in favor of the individuals.

replies(1): >>gavinh+8I
◧◩
36. ramble+Jn[view] [source] [discussion] 2023-05-10 14:43:17
>>safety+fg
I'm not sure arguing for a company to rest on its laurels and keep feeding off an IP from 100 years ago is an argument for creativity and innovation.
replies(1): >>ejb999+eA
◧◩◪
37. Spivak+Qo[view] [source] [discussion] 2023-05-10 14:47:39
>>casey2+kj
The argument goes that copyright has allowed massive corporations to buy up and exert near total control over all of our shared stories. And when you own the cultural touchstones of whole generations that gives you power that no one else can ever wield.

There is a massive amount of amazing stories based on ancient myths because it's one of the few large corpora that isn't copyrighted. Once you see it in media you can't unsee it. The only space where that kind of creativity can thrive anymore is fan-fiction which lives in weird limbo where it's illegal but the copyright owners don't care. And when you want to bring any of it to the mainstream you have to hide it, all of Ali Hazelwood's books are reworked fanfics because she can't use the actual characters that inspired her -- her most famous book "The Love Hypothesis" is a Reylo fic.

Go check out https://archiveofourown.org/media and see how many works are owned by a few large corporations.

◧◩
38. soperj+Gp[view] [source] [discussion] 2023-05-10 14:51:55
>>safety+fg
I think they nailed it with the original 1790 act. 14 years + 14 more is plenty.
replies(3): >>csalle+uB >>safety+YC >>aeturn+yX
◧◩
39. vlunkr+1q[view] [source] [discussion] 2023-05-10 14:53:10
>>brooks+Z2
I know it's getting pedantic, but sunscreen and seatbelts are a poor analogy. They do offer protection if you use them. robots.txt only offers protection if other people/robots choose to care about them.
◧◩
40. samwil+tq[view] [source] [discussion] 2023-05-10 14:54:59
>>safety+fg
My phrasing was absolutely not meant to be read as myself speaking for all, apologies, I certainly don't want to offend.

It has felt on HN and elsewhere that the prevailing attitude to copyright has been these two, somewhat contradictory, things. That's what I was trying to highlight with my phrasing of "we", which was also not meant to include myself but be a nod to the way a vocal group try to steer and dominate the conversation.

Both debates are important to have, I don't know the answers.

replies(2): >>safety+AJ >>accoun+tr3
◧◩
41. circui+Qq[view] [source] [discussion] 2023-05-10 14:56:15
>>IanCal+ze
Yes, I feel like this person only read the title and not the text of the post and made an assumption
replies(1): >>benatk+GB
42. hoofhe+rr[view] [source] 2023-05-10 14:58:39
>>samwil+(OP)
I don’t really agree with your sentiment.

Robots.txt files have served the simple purpose of directing bots like Google to the different parts of your website since the beginning of internet time.

They still serve the same purpose, they tell bots where to go, and most importantly, they tell bots how to find your site map.
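
For the sitemap part, a small sketch of how a crawler might pick the `Sitemap:` line out of a robots.txt, using `site_maps()` from Python's standard `urllib.robotparser` (available since Python 3.8). The file contents and URLs are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one rule block and a sitemap pointer.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
```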

Robots.txt is not there to prevent malicious crawlers from accessing pages as you have suggested.

The robots.txt file acts simply like a garden gate. The good and honest people will honor the gate, while the more malicious might ignore it and hop the fence or something.

43. mort96+vr[view] [source] 2023-05-10 14:59:20
>>samwil+(OP)
Do you think there's a space between "you will never ever get to do anything at all with popular media until at least a hundred years after you're dead" and "anyone and any company can do anything they want with everything I produce as long as it goes through an LLM"? Is it really so hard to think people may be against both of those extremes?

There's a phrase I like which describes what you're doing. It's "vaguely gesturing at imagined hypocrisy".

◧◩
44. lances+rs[view] [source] [discussion] 2023-05-10 15:03:21
>>safety+fg
> Don't recruit me into some bullshit argument that rewrites history and entrenches Disney's ill-gotten monopoly.

You don't think it's them being allowed to buy Marvel, Pixar, Lucasfilm? Is creativity ruined because I can't make a Mickey Mouse cartoon or t-shirt? Does the world need Luke Skywalker coming from any individual studio?

People are free to make the Little Mermaid, Beauty and the Beast, Hunchback of Notre Dame, Aladdin, etc. and there's nothing out there that stops them.

I've got no love for giant corporations but I see it a lot less about copyright than massive corporation gobbling up more corporations. There's no shortage of creativity out there if you look for it.

replies(7): >>ramses+ry >>placat+Jz >>safety+1B >>always+ZH >>Taywee+iJ >>butter+FK >>8note+6e2
◧◩◪
45. michae+It[view] [source] [discussion] 2023-05-10 15:08:53
>>usrusr+Vc
This seems to be assuming a very different purpose for ai.txt than the OP proposed. It sounds like they are intending ai.txt to give useful contextual information to crawlers collecting AI training data. Robots.txt does not have any of this information (although I suppose you could include it in comments).
◧◩
46. xhkkff+5u[view] [source] [discussion] 2023-05-10 15:10:10
>>safety+fg
Why does Disney have an "ill-gotten" monopoly? The people who worked for the company created something. Why shouldn't they get to control how it's used. Do you feel like you should have control over what you create? Why not others?
replies(2): >>sowbug+UF >>Taywee+BL
◧◩
47. dclowd+Ev[view] [source] [discussion] 2023-05-10 15:16:25
>>safety+fg
> Reducing copyright term extension has everything to do with restoring competition and creativity to our economy

Can you explain your line of thinking here? How does the ability to use another company’s intellectual property restore creativity? It just seems like a path to allow bootlegging.

replies(3): >>safety+UH >>Taywee+uK >>CWuest+0Q
48. shadow+Fv[view] [source] 2023-05-10 15:16:34
>>samwil+(OP)
I disagree it has failed as a system. While it does not substitute for authentication / authorization, reputable crawlers respect it, and there'd be a lot more traffic load on sites if they didn't have a way to tell reputable crawlers "please stop."

Similarly, extending robots.txt to direct AI would have a similar effect: not sufficient, but useful (if for no other reason than to make it easy to distinguish reputable AI projects from ones that feel like they own the Internet to do with as they please).

49. FpUser+Qw[view] [source] 2023-05-10 15:20:53
>>samwil+(OP)
>"All a robots.txt is is a polite request to please follow the rules in it, there is no "legal" agreement to follow those rules, only a moral imperative"

Up until the point when some person or entity with deep pockets puts a clear license / terms of use on their site that prohibits ignoring robots.txt, and is willing to sue those who ignore it.

◧◩◪
50. irobet+1y[view] [source] [discussion] 2023-05-10 15:24:57
>>usrusr+Vc
worse, ai.txt could become an adversarial vector for attempts to trick the AI into filing your information under some semantic concept
◧◩◪
51. ramses+ry[view] [source] [discussion] 2023-05-10 15:26:45
>>lances+rs
I was "doing the analysis" w/ Toy Story not long ago. They basically invented Woody/Buzz out of whole cloth, "guilty of being an incredibly lovable toy by association" (with other incredibly lovable toys). As I'm watching Toy Story with my kid, and seeing classic toys (eg: Mousetrap in the background), all the "friends" are legit copyright classics from other companies, but Buzz and Woody are "Disney/Pixar Exclusives" and nobody else can include them. A clever mechanism that seems to have paid off over two modern generations to guarantee they can "craft" a new copyrighted character at any moment (Buzz 2.0, Space Cowboy 9000, whatever...).
replies(1): >>dcow+bS1
◧◩◪
52. placat+Jz[view] [source] [discussion] 2023-05-10 15:32:26
>>lances+rs
Both things can be, and I think are, true. I see it as reduced competition in both cases: corporate consolidation making companies huge, and long copyright timelines.

The long timelines stifle new creative works by keeping other, especially smaller, outfits having to make sure they don't accidentally run afoul of another copyright from decades ago. This needs capital to either be proactive in searching or to defend a suit that's brought.

Here's a recent article about the battle between the copyright holders of Let's Get It On and Ed Sheeran for Thinking Out Loud. Those two songs are separated by around 40 years. https://www.theguardian.com/music/2023/may/07/ed-sheeran-cop...

◧◩◪
53. ejb999+eA[view] [source] [discussion] 2023-05-10 15:34:21
>>ramble+Jn
but is it much different than descendants living off the interest and dividends of some large sum of money that their great-great grandparents accumulated a few hundred years ago?

To me it is pretty much the same thing - not a fan of nepo-kids living off of trust funds they didn't earn - but if you are going to fix one problem, you should try to fix all of the almost identical ones at the same time and not get upset that Disney is still making money off of something they created 100 years ago, and not be upset about Kennedys, Rockefellers, and the like still living off the money their great-greats generated a hundred years ago.

replies(1): >>readbe+6P
◧◩◪
54. safety+1B[view] [source] [discussion] 2023-05-10 15:37:58
>>lances+rs
The three acquisitions you mentioned all took place many years after the Copyright Term Extension Act of 1998. Without the financial benefits conferred by that law (the timing and content of which benefited them more than it did their competitors), they might not have made all of those acquisitions.

A lot of people in this thread seem to be undervaluing those old school Disney characters, yes now Disney is huge and has a much larger portfolio of IP, but in 1998 they were a far bigger percentage of Disney's portfolio than they are now.

You're not wrong that consolidation is a problem. My point is that Congress changed the law in a way that helped Disney and at least partially enabled that consolidation. (In fact, it's fairly rare to come across a monopoly or any heavily entrenched corporation that isn't enabled in some way by government collusion.)

If you shoot someone, take all his money, then build a business with it, you're still a murderer. (Just now you're a rich murderer.)

replies(3): >>guhcam+BQ >>TylerE+f62 >>fuzzfa+WE2
◧◩◪
55. csalle+uB[view] [source] [discussion] 2023-05-10 15:40:25
>>soperj+Gp
Same. The very nature of information is that it yearns to be free. Information cannot be "owned." The point of copyright should be to grant temporary monopolies to encourage creation, not to confer ownership.

Thomas Jefferson put it beautifully:

If nature has made any one thing less susceptible than all others of exclusive property, it is the action of the thinking power called an idea, which an individual may exclusively possess as long as he keeps it to himself; but the moment it is divulged, it forces itself into the possession of every one, and the receiver cannot dispossess himself of it. Its peculiar character, too, is that no one possesses the less, because every other possesses the whole of it. He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me. That ideas should freely spread from one to another over the globe, for the moral and mutual instruction of man, and improvement of his condition, seems to have been peculiarly and benevolently designed by nature, when she made them, like fire, expansible over all space, without lessening their density in any point, and like the air in which we breathe, move, and have our physical being, incapable of confinement or exclusive appropriation. Inventions then cannot, in nature, be a subject of property.

replies(4): >>fsckbo+OH >>vinayp+cY >>JumpCr+Gf1 >>btilly+Ch1
◧◩◪
56. benatk+GB[view] [source] [discussion] 2023-05-10 15:41:01
>>circui+Qq
Not an assumption. Just a response to the title of the article.
replies(1): >>circui+bH
◧◩◪
57. safety+YC[view] [source] [discussion] 2023-05-10 15:46:45
>>soperj+Gp
I would settle for 14 + 14 too :)
replies(1): >>jasonm+MI4
58. hidden+KD[view] [source] 2023-05-10 15:50:14
>>samwil+(OP)
The point of robots.txt is to inform well behaved scrapers about how to behave. It is not designed nor intended to prevent bad actors.

Which is good design: don't pretend to solve problems you can't.

59. drbawb+xE[view] [source] 2023-05-10 15:53:41
>>samwil+(OP)
It's not actually contradictory at all when you consider the root of the issue is about power asymmetry between individual creators and the corporations. Copyright terms were lobbied for, and primarily benefit the large corporations. They're symbolic of corporate overreach, that's why they're unpopular.

Meanwhile, now that the laws are inconvenient for them, tech companies are straight up ignoring labeling their training data to respect IP law. Labeling the data would be expensive, thereby eroding profits. The loss of usable data would also harm the efficacy of their models, and the time spent classifying the data will hamper their iteration time.

The ideas are only dissonant if you are looking at the trees (copyright term, DMCA, right to repair, etc.) and not the forest: which is a class struggle between a few thousand billionaires versus the rest of humanity.

◧◩◪◨
60. kmoser+CF[view] [source] [discussion] 2023-05-10 15:57:03
>>jefftk+K8
Not only do we already have lots of ways of including structured metadata, but if you want to include directives about what should/shouldn't be scraped and by whom, we already have robots.txt.

In other words, there's no need to create an ai.txt when the robots.txt standard can just be extended.

◧◩◪
61. sowbug+UF[view] [source] [discussion] 2023-05-10 15:58:22
>>xhkkff+5u
Circular reasoning. If you assume your ideas are your own, and nobody else can benefit from them without your permission, then the point of your rhetorical questions follows. The reality is that IP laws are a grafting of property-like attributes onto something that absolutely isn't property.

Do I feel I should have control over what I create? I make hammers for a living. I sell them for $10. I don't expect any control over what people do with "my" hammers once I sell them. I don't even expect to stop my neighbor from buying one, teaching herself to build hammers, and then manufacturing and selling identical ones for $9. Do you?

(To anticipate the rest of this tired conversation, the temporary monopoly tradeoff ("securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries") is facially reasonable. But it's important to recognize that the "shouldn't" and "feel" in your questions are based on a very recent recharacterization of these temporary monopolies as "intellectual property," which is probably the most financially successful propaganda term ever devised. Start with "temporary monopoly" instead, and then the better rhetorical question for you to be asking is "when should Disney's temporary monopoly end?")

◧◩◪◨⬒
62. kmoser+4G[view] [source] [discussion] 2023-05-10 15:59:01
>>Xelyne+sb
If only there was an HTML tag that let you provide a concise description of the page content. Perhaps something like <meta name="description" content="This is an example of a meta description. This will often show up in search results.">
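
For what it's worth, reading that tag takes very little code. A minimal sketch using Python's standard `html.parser`; the class and function names are made up for the example, and real pages may need more forgiving handling.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Collects the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except the first matching meta tag.
        if tag != "meta" or self.description is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")


def meta_description(html: str):
    """Return the page's meta description, or None if it has none."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description
```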
63. waffle+LG[view] [source] 2023-05-10 16:01:34
>>samwil+(OP)
I have the belief that models should be allowed to ingest everything, just as a human is allowed. We are not yet at the stage where AI is autonomous, they currently are designed to require human agency for input, human agency for evaluation of output, and finally human agency for the dissemination of select output. This last important stage is well understood in the field of photography, but currently ignored in AI stewardship dialogues. Ultimately, it is the responsibility of the human agent who selects AI information products to determine its legality and appropriateness, just as if they had snapped a photograph and are wrestling with the decision whether or not it should be distributed in a particular medium. It takes a fairly selfish consciousness to become obsessed with the desire to prevent AI models access to information and disregard the collective benefits of rich information availability to training.
replies(1): >>musTY8+5e2
◧◩◪◨
64. circui+bH[view] [source] [discussion] 2023-05-10 16:03:03
>>benatk+GB
They said "Using robots.txt as a model for anything doesn't work." but it does work for the case described in the text
replies(1): >>benatk+Mq1
◧◩
65. pachic+gH[view] [source] [discussion] 2023-05-10 16:03:25
>>safety+fg
Unrelated but "offensive" is not necessarily bad.

We should accept that people can get offended by anything and, because of this, just demote the concept.

◧◩◪◨⬒
66. gavinh+nH[view] [source] [discussion] 2023-05-10 16:03:42
>>alphan+Eg
Yes, you can. You can do that by making the monetary value of the idea zero when it used to be non-zero.
replies(1): >>alphan+312
67. zitter+qH[view] [source] 2023-05-10 16:03:51
>>samwil+(OP)
Considering the state of ai right now you probably are better off deleting your robots.txt file
◧◩◪◨
68. fsckbo+OH[view] [source] [discussion] 2023-05-10 16:05:11
>>csalle+uB
but copyright is not for information or ideas, information and ideas cannot be copyrighted; it's for creative expression
replies(2): >>renlo+oR >>accoun+Mm3
◧◩◪
69. safety+UH[view] [source] [discussion] 2023-05-10 16:05:43
>>dclowd+Ev
Glad you asked! So copyright is a limited, temporary monopoly on a work. You create a work, the law grants you the exclusive rights to that work, for a time. Because of this monopoly the vast majority of the benefit from that work accrues to you, including financially. (All pretty fair in my opinion, you did the work, you deserve the reward!)

If let's say Star Wars falls out of copyright tomorrow, economically that has two effects. One, Disney loses a ton of future revenue. Two, countless other people create derivatives of Star Wars, and they make money from those. Competition is increased.

So the expiration of a copyright results in a sharing of the wealth. The wealth generating potential along with the creative potential is passed along to all members of society. Our culture becomes richer and deeper. A great example of this is all the works that build on the mythos created by HP Lovecraft, one of the last great ones created before Congress started indefinitely extending copyright. Lovecraft wrote great literature and some of the authors that built on his world are fantastic as well, I'm sure they've come up with countless ideas he never considered. But as long as Congress keeps on extending copyright, nothing we create today will ever become like that.

There is of course an important question about what is fair and how long a copyright should last. Most people these days agree that it should last for at least the author's lifetime, maybe long enough to benefit their kids and grandkids as well. But the status quo is basically permanent copyright which prevents substantial creative and economic benefits to society.

replies(1): >>mathqu+vl1
◧◩◪
70. always+ZH[view] [source] [discussion] 2023-05-10 16:06:17
>>lances+rs
Concentration is absolutely a problem, but the second point undermines the first. The world is more interesting because anyone can adapt old stories like The Little Mermaid however they want. How could it not be even richer if the same applied to newer creations like Bugs Bunny?
◧◩◪◨⬒
71. gavinh+8I[view] [source] [discussion] 2023-05-10 16:06:51
>>rhn_mk+nn
"AI" changes things by making it even harder for individuals to defend against.

Right now, we have FOSS organizations that will help you in lawsuits against companies that don't follow licenses. With "AI" in the picture, companies can launder your code with "plausible" deniability. [1]

[1]: https://matthewbutterick.com/chron/will-ai-obliterate-the-ru...

replies(1): >>rhn_mk+lb3
◧◩
72. LawTal+cJ[view] [source] [discussion] 2023-05-10 16:11:49
>>FinnKu+Ja
This is how it is a failed system. It doesn't really do what most people who have one think it does and yet it gets rolled into law and everyone now has to deal with it constantly while still not fixing the original issue.

It's like the EU doesn't understand that bad law has a negative value.

◧◩◪
73. Taywee+iJ[view] [source] [discussion] 2023-05-10 16:11:59
>>lances+rs
The copyright system is what has enabled so few companies (and one giant corporation in particular) to become the owners, controllers, and beneficiaries of the vast majority of American fiction and culture. From visual media companies, record companies, and publishers, you can probably distill ownership of more than 90% of the culture that the average American lives in to fewer than 20 companies.

Copyright has been the most powerful tool in any media company's toolbox when it comes to consolidating power and IP and rolling into a larger and larger ball of what we call culture.

replies(1): >>roboca+9O1
◧◩◪
74. safety+AJ[view] [source] [discussion] 2023-05-10 16:13:19
>>samwil+tq
Thank you! I think the average HN'er is frankly pretty ignorant about how copyright law works, the history around it, and the arguments for and against various reforms. In fairness it's an esoteric topic and most software developers depend in some way on copyrighted work for their income so that's not a huge surprise I guess. But it probably explains the contradiction you observed!

The #1 issue with copyright today in my opinion is that if we keep on extending it forever, it will forever entrench the wealth and power of a small number of companies that hold the largest portfolios of IP. I think this is also a huge issue for AI, maybe the biggest issue, because at the end of the day an AI is really just another copyrighted work. It is not the anthropomorphized thing that countless people are acting like it is, it's a work. Change copyright and you change the nature of future AI works.

replies(1): >>pixl97+X51
◧◩◪
75. Taywee+uK[view] [source] [discussion] 2023-05-10 16:17:29
>>dclowd+Ev
Is it more creative to use your own company's IP? Most of these copyrighted and trademarked stories and characters are being made by people who didn't come up with them anyway, so what's the difference in creativity whether you happen to work at the company that owns the IP or not?

With long copyright terms, it encourages copyright holders to milk a single work for the length of the copyright (90+ years) and therefore discourages the creation of something new. It also encourages people to obtain copyrights to leverage them for profit, rather than making anything at all. A child of an artist can spend their entire life supported by their parent's copyright, and never has to make anything unique for as long as they live.

How is any of this good for creativity?

◧◩◪
76. butter+FK[view] [source] [discussion] 2023-05-10 16:18:29
>>lances+rs
"No one can do to the Disney Corporation what Walt Disney did to the Brothers Grimm." L. Lessig
replies(1): >>kemote+vP
◧◩◪
77. Taywee+BL[view] [source] [discussion] 2023-05-10 16:22:21
>>xhkkff+5u
The people who worked for the company created something.

The people who work for the company collect rent on things they didn't make.

◧◩◪◨
78. readbe+6P[view] [source] [discussion] 2023-05-10 16:38:40
>>ejb999+eA
It would be similar if "intellectual property" was property in the same sense in which a table or a vast amount of money is property. However, it is not.

Normal property ownership is something we use to manage scarcity that already exists—that there is only one of something, and we have to decide where it will go and who will be able to decide how it is used. Intellectual property, by contrast, creates artificial scarcity by means of a government-enforced monopoly (in the case of copyright, the monopoly is on the right to produce a copy of a work).

It is unfortunate (and perhaps not accidental) that we settled on the term "intellectual property" as opposed to something more descriptive like "intellectual monopoly." "Intellectual property" encourages equivocating such monopolies with normal property, a mistake that tends to muddle debates on the subject.

79. bachme+tP[view] [source] 2023-05-10 16:40:30
>>samwil+(OP)
> All a robots.txt is is a polite request to please follow the rules in it, there is no "legal" agreement to follow those rules, only a moral imperative.

I don't know that this is true for the US. As far back as I can remember, there have been questions about whether a robots.txt file means you don't have permission to engage in those activities. The CFAA is one law that has repeatedly come up. See for example https://www.natlawreview.com/article/doj-revises-policy-cfaa...

It might be the case that there is nothing there legally, but I don't think I'd describe the actions of search engines as being driven by a moral imperative.

◧◩◪◨
80. kemote+vP[view] [source] [discussion] 2023-05-10 16:40:36
>>butter+FK
I’m no fan of the life of the author plus 70 years copyright regime we have today but the Brothers Grimm died roughly 70 years before Disney started making cartoons.

So that is still something possible to do in roughly 20 years.

◧◩◪
81. CWuest+0Q[view] [source] [discussion] 2023-05-10 16:42:42
>>dclowd+Ev
Yesterday's conversation here about the Ed Sheeran lawsuit should explain much of this: https://news.ycombinator.com/item?id=35868421

Here's one key bit from the OP: - - - - -

But the lawsuits have been where he’s really highlighted the absurdity of modern copyright law. After winning one of the lawsuits a year ago, he put out a heartfelt statement on how ridiculous the whole thing was. A key part:

There’s only so many notes and very few chords used in pop music. Coincidence is bound to happen if 60,000 songs are being released every day on Spotify—that’s 22 million songs a year—and there’s only 12 notes that are available.

In the aftermath of this, Sheeran has said that he’s now filming all of his recent songwriting sessions, just in case he needs to provide evidence that he and his songwriting partners came up with a song on their own, which is depressing in its own right.

◧◩◪◨
82. guhcam+BQ[view] [source] [discussion] 2023-05-10 16:45:26
>>safety+1B
> A lot of people in this thread seem to be undervaluing those old school Disney characters

Right? There were even competitors back then. People all but forgot the Looney Tunes.

◧◩◪◨⬒
83. renlo+oR[view] [source] [discussion] 2023-05-10 16:49:00
>>fsckbo+OH
and why should "creative expression" be owned?
replies(3): >>Jarwai+AZ >>majorm+Vb1 >>jrajav+3e1
◧◩◪
84. lelant+sS[view] [source] [discussion] 2023-05-10 16:53:27
>>bileka+lb
The point of an ai.txt is that it signals intention of the copyright holder.

Anytime a business is caught using that content, they can't claim that they used publicly available information, because the ai.txt specifically signalled to everyone in a clear and unambiguous manner that the copyright granted by viewing the page is withheld from AI training.

85. Macha+WV[view] [source] 2023-05-10 17:08:07
>>samwil+(OP)
> The thing I somewhat struggle with is that after 20-30 years of calls for shorter copyright terms, lesser restrictions on content you access publicly, and what you can do with it, we are now in the situation where the arguments are quickly leaning the other way. "We" now want stricter copyright law when it comes to AI, but at the same time shorter copyright duration...

Are these mutually exclusive? If you couldn't make Avengers movie Thanos memes but all the 90s X-Men and Spiderman content was a free for all, I think a lot of people would take that trade off.

86. omoika+EW[view] [source] 2023-05-10 17:10:29
>>samwil+(OP)
There is no legal agreement to follow robots.txt, but it appears to have come up a few times (from the first search result for "court cases involving robots.txt"):

https://www.robotstxt.org/faq/legal.html

If an "ai.txt" were to exist, I hope it's a signal for opt-in rather than opt-out. An explicit opt-out signal like "robots.txt" makes sense because people who build public websites generally want their websites to be discovered, but it seems unlikely that training unknown AI is a use case that content creators had in mind, considering that most existing content predates current AI systems.

87. balaji+GW[view] [source] 2023-05-10 17:10:44
>>samwil+(OP)
With search engines and other crawlers, there weren't easy ways to monetize "copyright theft" at scale. Google, which had the biggest share of eyeballs, was much more equitable in sharing revenue with content producers (who wanted to monetize). And Google was probably more just in taking action against copyright theft.

Individual high value IP was always much less accessible (not available as a webpage on the internet). Gen AI/LLMs with the internet scale data is too powerful and maybe easier to monetize.

◧◩◪
88. aeturn+yX[view] [source] [discussion] 2023-05-10 17:14:54
>>soperj+Gp
My biggest critique of copyright is that it unnecessarily collapses financial reward & creative control. It also pegs both as starting at creation - which is not a particularly meaningful point for either problem.

IMO I would rather a structure that:

- Guarantees creators (and their descendants) some number of years of financial benefit / veto (30 seems fine!) - i.e. pay me what I want or you can't use this creative work.

- Separately grant creators the ability to veto "official" projects that use their creative output in their lifetimes.

IMO, it seems like there's a productive "middle ground" between total control and anything goes. After the 30 year benefit expired, you couldn't sue for damages - just costs & to stop use.

replies(2): >>quirko+RZ >>pauldd+Mm1
◧◩◪◨
89. vinayp+cY[view] [source] [discussion] 2023-05-10 17:17:36
>>csalle+uB
> The very nature of information is that it yearns to be free.

Information wants you to stop anthropomorphizing it.

replies(2): >>sdiupI+w52 >>bohlen+cz3
◧◩◪◨⬒⬓
90. Jarwai+AZ[view] [source] [discussion] 2023-05-10 17:23:20
>>renlo+oR
It should be owned as long as people must rely on ownership to survive in our society.
◧◩◪◨
91. quirko+RZ[view] [source] [discussion] 2023-05-10 17:24:43
>>aeturn+yX
I've heard of a structure in France that's translated as "moral rights" of a work. I met a guy who was the moral rights holder for a deceased author and had the right to veto large and small elements of the representation of the characters, but received no royalties from the works.
◧◩
92. spc476+I41[view] [source] [discussion] 2023-05-10 17:47:19
>>shaneb+81
It might be a better idea to serve up a 418 ("I'm a teapot") with a one-line text file saying "I'm not an HTTP server". That solved a problem I had with bots making HTTP requests to my gopher server [1]. Serving up a 503 informs the bot that there's a server issue and it may try again later. A 418 informs the bot that it made an erroneous request and such an odd error code might get someone to look into it and stop.

[1] https://boston.conman.org/2019/09/30.2
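A rough sketch of that idea, assuming a standalone Python `http.server` handler rather than whatever the gopher server linked above actually runs. `TeapotHandler` and `make_server` are hypothetical names, and the handler answers every GET the same way:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# The one-line plain-text body described in the comment above.
BODY = b"I'm not an HTTP server\n"


class TeapotHandler(BaseHTTPRequestHandler):
    """Answers every GET with 418 instead of a retry-inviting 503."""

    def do_GET(self):
        self.send_response(418, "I'm a teapot")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, fmt, *args):
        # Keep the sketch quiet; a real deployment would probably log the bot.
        pass


def make_server(port=0):
    """Build a server on localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), TeapotHandler)
```

Running `make_server(8080).serve_forever()` would answer any GET with the 418 and the one-line body. Whether a given bot actually backs off on a 4xx is up to the bot.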

replies(1): >>shaneb+8x7
◧◩◪◨
93. pixl97+X51[view] [source] [discussion] 2023-05-10 17:52:40
>>safety+AJ
There is contradiction because, in fact, the HN audience is more than one person and those people have different and conflicting views.
94. User23+791[view] [source] 2023-05-10 18:06:39
>>samwil+(OP)
> there is no "legal" agreement to follow those rules

Yes there certainly is[1]. The robots.txt clearly specifies authorized use and violating it exceeds that authorization. Now granted good luck getting the FBI to doorkick their friends at Google and other politically connected tech companies, but as the law is written crawlers need to honor the site owner's robots.txt.

[1] https://en.wikipedia.org/wiki/Computer_Fraud_and_Abuse_Act

◧◩◪
95. prepen+ca1[view] [source] [discussion] 2023-05-10 18:10:11
>>JohnFe+65
I think people who don’t want their content contributing to AI shouldn’t have it on the public web.

There are many ways to restrict access. Use one of them. But if you respond to an anonymous http request with content then it shouldn’t matter if it’s a robot looking at it or a human (or a man or a woman or whatever).

I think this both for simplicity and that I foresee a future where human consciousness is simulated and basically an AI. I don’t want to have rules that biological humans can view and digital humans can’t.

replies(1): >>JohnFe+sZ3
◧◩
96. majorm+qb1[view] [source] [discussion] 2023-05-10 18:15:39
>>safety+fg
> This gross generalization of other people's views on important issues is really offensive.

The "we" that has been calling for shorter terms is no more a gross generalization than the "we" that is calling for more protection against AI use of stuff.

The world outside of HN-and-similar has been much less anti-copyright than the world in here. More "neutral" seems to be dominant - we're not extending it anymore; we're not shrinking it either. And currently generally more panicked about AI taking away their jobs and rendering their skills and creativity useless.

The original post was a very fair summary of how there are now two ground-level movements competing that there weren't two years ago.

◧◩◪◨⬒⬓
97. majorm+Vb1[view] [source] [discussion] 2023-05-10 18:17:53
>>renlo+oR
Why should land be owned? None of us created the planet...

But we have selected an economic system that depends on ownership to drive exchange in a market, so... that's why.

replies(4): >>renlo+tc1 >>tick_t+GA1 >>asdkjl+gb2 >>marssa+ZF2
◧◩◪◨⬒⬓⬔
98. renlo+tc1[view] [source] [discussion] 2023-05-10 18:20:38
>>majorm+Vb1
I'd argue that land is owned because it's a finite resource, and that without property ownership people would be in conflict with one another. "Creative expression" is not finite, in fact every human possesses it, it's also intangible, it's ideas, thoughts, ... , which I personally do not believe should be owned.
replies(3): >>wwwest+7p1 >>8note+Pd2 >>jasonm+RH4
99. quenix+od1[view] [source] 2023-05-10 18:24:47
>>samwil+(OP)
I think you fundamentally misunderstood the OP's point. They're not trying to use their ai.txt as any sort of deterrent, legal or otherwise.

They are trying to use it as a form of extended metadata for training AIs. Essentially, "ah I see you're training using my website! Here's some extra info about it: [...]"

◧◩◪◨⬒⬓
100. jrajav+3e1[view] [source] [discussion] 2023-05-10 18:27:02
>>renlo+oR
Because creative expression can be exchanged for goods and services? Why should metal, wood, or special paper notes be owned? It's to represent work done and value to other people.
replies(1): >>beefie+Xt1
◧◩◪◨
101. JumpCr+Gf1[view] [source] [discussion] 2023-05-10 18:33:08
>>csalle+uB
> very nature of information is that it yearns to be free. Information cannot be "owned."

The nature of information is to dissolve into entropy.

replies(3): >>dotanc+wO1 >>sdiupI+Y52 >>accoun+om3
◧◩◪◨
102. btilly+Ch1[view] [source] [discussion] 2023-05-10 18:41:41
>>csalle+uB
Not merely the point of copyright, but also the basis for copyright in the USA.

Specifically all forms of intellectual property in the USA trace back to Article I Section 8, Clause 8 of the Constitution. Which gives Congress the power, "To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries".

replies(1): >>musica+vC2
103. yafbum+Mj1[view] [source] 2023-05-10 18:51:19
>>samwil+(OP)
robots.txt is a successful coordination mechanism between website operators and crawlers. It is not in any way a security mechanism meant to address adversarial situations, as you seem to make it out to be.
◧◩◪◨
104. mathqu+vl1[view] [source] [discussion] 2023-05-10 18:57:59
>>safety+UH
> If let's say Star Wars falls out of copyright tomorrow, economically that has two effects. One, Disney loses a ton of future revenue. Two, countless other people create derivatives of Star Wars, and they make money from those. Competition is increased.

Three, the derivatives are made and Disney starts marketing "Disney's Star Wars" which continue to be the high-demand (and high-value) versions. The situation is unchanged.

For example, you can currently buy The Little Mermaid in non-Disney form[1], but Disney's version is what most people want.

[1] - https://www.amazon.com/s?k=little+mermaid+Hans+Christian+And...

◧◩◪◨
105. pauldd+Mm1[view] [source] [discussion] 2023-05-10 19:03:28
>>aeturn+yX
> After the 30 year benefit expired, you couldn't sue for damages - just costs & to stop use.

That's the same thing.

No one can use my stuff..........(unless you pay me royalties).

replies(1): >>aeturn+QT1
◧◩◪◨⬒⬓⬔⧯
106. wwwest+7p1[view] [source] [discussion] 2023-05-10 19:11:22
>>renlo+tc1
> "Creative expression" is not finite

It absolutely is.

Doing it at all requires time & attentive focus, which is a finite resource for anybody mortal, and moreover a resource that's scarce and has to be spent in multiple places.

Doing it well requires significant investment in practice and training, often years of it, maybe even decades in order to develop certain levels of expressive fluency.

As with any issue of scarcity, economics comes in. If you want this activity supported, one good way of doing it is enabling the investment of time. Copyright does this by giving people an economic/legal claim on how copies of their work are distributed.

Paying for copies has the usual market merits -- the economic reward and signals of value are proportional to copies acquired. There are other ways of course, common ones brought up here are patronage and merchandising, but they lose the market merits, and both are basically another way of saying "nobody should have to pay for the value in your work directly," and merchandising is even worse in that it's basically saying "yeah, you'll just need another job to support yourself while you're doing this thing", which is time taken away from investment in the creative endeavor, so you'll get less of the actual endeavor.

replies(2): >>onlypo+z02 >>musica+bE2
◧◩◪◨⬒
107. benatk+Mq1[view] [source] [discussion] 2023-05-10 19:18:36
>>circui+bH
What works? This is an idea, not anything that's been thoroughly tried.
108. wwwest+Ht1[view] [source] 2023-05-10 19:31:37
>>samwil+(OP)
> The thing I somewhat struggle with is that after 20-30 years of calls for shorter copyright terms, lesser restrictions on content you access publicly, and what you can do with it, we are now in the situation where the arguments are quickly leaning the other way.

There've always been solid human arguments for sustaining copyright legally. The balance is the tricky part.

On one hand we had a period where terms got too long, and some of the really aggressive legal enforcement from 20 years ago before stakeholders actually figured out how to get into digital markets was entitled and useless. The pendulum also swung the other way with things like buffet streaming services essentially offering an economic bargain for creators with a sliver of compensatory difference from piracy but with none of piracy's actual benefits (people who simply pirate know they're not participating in a relationship of economic support with creators and might be persuaded to, someone who uses Spotify is under the illusion there's something fully legit on that front).

But the fundamental copyright bargain -- creators can recoup investments of time and effort in proportion to how popular engagement with their work is -- has always made sense.

> "We" now want stricter copyright law when it comes to AI, but at the same time shorter copyright duration...

Both these things can be true:

(1) Using a work as training data for AI is a very novel use, it's entirely plausible there should be novel considerations and rights to go with it.

(2) The incentive & benefits of copyrights have diminishing returns the longer the horizons are, while the cost in terms of social inaccessibility only increases. Where that's balanced out precisely is a debatable question, but something longer than a human lifespan is probably on the wrong side.

◧◩◪◨⬒⬓⬔
109. beefie+Xt1[view] [source] [discussion] 2023-05-10 19:32:35
>>jrajav+3e1
Nope. Metal & wood etc should be owned because it very much looks like that is very useful in creating lots of welfare for people.

The trouble with IP is that there are lots of influential people that very much would like IP to be useful in creating welfare. Unfortunately the evidence for that is surprisingly scarce. For discussion, see e.g. Boldrin & Levine

◧◩◪◨⬒⬓⬔
110. tick_t+GA1[view] [source] [discussion] 2023-05-10 20:07:37
>>majorm+Vb1
Where do you get to actually own own land like you do copyright? Maybe we can add property taxes to copyright to force people to give it up just like land.
replies(1): >>majorm+XR1
◧◩◪
111. joshua+sB1[view] [source] [discussion] 2023-05-10 20:11:28
>>usrusr+Vc
I do think that robots.txt is pretty useful. If I want my content indexed, I can help the engine find my content. If indexing my content is counterproductive, then I can ask that it be skipped. So it helps align my interests with the search engine's; I can expose my content or I can help the engine avoid wasting resources indexing something that I don't want it to see.

It would also be useful to distinguish training crawlers from indexing crawlers. Maybe I'm publishing personal content. It's useful for me to have it indexed for search, but I don't want an AI to be able to simulate me or my style.
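
To make that concrete: robots.txt can already express a per-crawler split, provided the training crawler announces itself under its own user agent and chooses to honor the file. A small sketch using Python's standard `urllib.robotparser`; the agent names `ExampleSearchBot` and `ExampleTrainingBot` and the URL are hypothetical placeholders, not real crawlers:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) training crawler,
# allow everyone else, including search indexers.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_fetch(parser, user_agent, url):
    """Ask whether a well-behaved crawler with this user agent may fetch url."""
    return parser.can_fetch(user_agent, url)
```

With these rules, `may_fetch(build_parser(ROBOTS_TXT), "ExampleTrainingBot", "https://example.com/post")` is refused while the same call for `ExampleSearchBot` is allowed. As the rest of the thread points out, this only constrains crawlers that voluntarily check the file.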

◧◩◪◨
112. roboca+9O1[view] [source] [discussion] 2023-05-10 21:12:06
>>Taywee+iJ
That IP is sold overseas, so the USA has pushed very very hard to have copyright extended on other countries, presumably because it is a huge financial benefit to the USA (and indirectly to its citizens). Copyright extension is a non-negotiable item in a number of international agreements.
replies(1): >>accoun+Wt3
◧◩◪◨⬒
113. dotanc+wO1[view] [source] [discussion] 2023-05-10 21:14:11
>>JumpCr+Gf1
You bought a Western Digital drive too, eh?
replies(2): >>debo_+Ax2 >>bohlen+rz3
◧◩◪◨⬒⬓⬔⧯
114. majorm+XR1[view] [source] [discussion] 2023-05-10 21:33:34
>>tick_t+GA1
Taxing IP is an interesting idea; we do tax income from it, but I think an increasing scale of "you have to pay more to keep this for longer" would be pretty reasonable.
◧◩◪◨
115. dcow+bS1[view] [source] [discussion] 2023-05-10 21:35:09
>>ramses+ry
Wow I never noticed this. Thanks for sharing!
◧◩◪◨⬒
116. aeturn+QT1[view] [source] [discussion] 2023-05-10 21:45:13
>>pauldd+Mm1
Look - it's absolutely not the same thing. The point is to allow non-profit-seeking uses first. To push off the free-for-all of commercialization until after an interstitial period.

You can certainly pay the rights holder to use their property! Still! You could do it even without copyright I suppose. However, I think a space where it costs time and money for the rights holder to try to stop use and they won't get paid for it is super useful.

Consider this in the case of software as well - you get ~30 years of benefit from your work, but you can refuse to allow companies to incorporate it into their products as long as you live. Whichever companies you want! You can also not do that.

◧◩◪◨⬒⬓⬔⧯▣
117. onlypo+z02[view] [source] [discussion] 2023-05-10 22:29:44
>>wwwest+7p1
I think you're confusing creation and expression.

Expression has no value in today's digital world.

Creation has value but using expression to exchange for that value is difficult, requiring limits on expression in order for the system to work.

◧◩◪◨⬒⬓
118. alphan+312[view] [source] [discussion] 2023-05-10 22:33:51
>>gavinh+nH
Stealing requires physical force, whether that be on the property or person. By your logic, independently discovering the idea is also stealing.
replies(1): >>gavinh+U32
◧◩◪◨⬒⬓⬔
119. gavinh+U32[view] [source] [discussion] 2023-05-10 22:53:20
>>alphan+312
Stealing does not require physical force.
replies(1): >>alphan+U72
◧◩◪◨⬒
120. sdiupI+w52[view] [source] [discussion] 2023-05-10 23:02:43
>>vinayp+cY
I've said before that it gets tiring playing word games to avoid the suggestion that certain natural pressures have personal agency. Information wants to be free like a rock wants to roll downhill.
◧◩◪◨⬒
121. sdiupI+Y52[view] [source] [discussion] 2023-05-10 23:04:58
>>JumpCr+Gf1
The nature of the universe is to tend toward heat death. Meanwhile, here and now, the nature of information is to either reproduce (to be free) _or else_ dissolve into entropy.
◧◩◪◨
122. TylerE+f62[view] [source] [discussion] 2023-05-10 23:06:13
>>safety+1B
They also tend to omit that at the time of the buyout Marvel was at a sales (and in many ways creative) nadir. It was hardly a media juggernaut. They actually went bankrupt in the late 90s, and it was only Carl Icahn buying much of their outstanding debt for pennies on the dollar (and then firing basically all of the then-current board) that kept them from going under totally.
◧◩◪◨⬒⬓⬔⧯
123. alphan+U72[view] [source] [discussion] 2023-05-10 23:14:10
>>gavinh+U32
So again, I guess you’re stealing from me by coming up with the same idea that I had.
replies(1): >>gavinh+0t2
◧◩◪◨⬒⬓⬔
124. asdkjl+gb2[view] [source] [discussion] 2023-05-10 23:29:58
>>majorm+Vb1
> Why should land be owned?

it shouldn't. Or, well, it should, but it should be the one and only thing taxed: https://en.wikipedia.org/wiki/Georgism

replies(1): >>Tagber+AH2
◧◩◪◨⬒⬓⬔⧯
125. 8note+Pd2[view] [source] [discussion] 2023-05-10 23:47:32
>>renlo+tc1
People are still in conflict with each other for property ownership, so it's not solved

The ownership, with heavy taxes on that ownership, pushes towards making sure people benefit from the land.

replies(1): >>justin+lp3
◧◩
126. musTY8+5e2[view] [source] [discussion] 2023-05-10 23:48:39
>>waffle+LG
I like your view. From my limited knowledge, I speculate that if AI was developed further, and given as much publicly available data across a variety of scholarly topics as reasonably possible, it could potentially use statistics and such to help us find correlations across many different fields that a human would never think of. Whether it would revolutionize analytics or not I don't know as I am not qualified to say, but it's fun to dream of positive change these tools could bring.
◧◩◪
127. 8note+6e2[view] [source] [discussion] 2023-05-10 23:48:46
>>lances+rs
> People are free to make the Little Mermaid, Beauty and the Beast, Hunchback of Notre Dame, Aladdin, etc. and there's nothing out there that stops them.

IP law reasonably does. See: https://trademarks.justia.com/852/28/the-little-mermaid-8522...

128. dragon+Be2[view] [source] 2023-05-10 23:51:27
>>samwil+(OP)
> "We" now want stricter copyright law when it comes to AI, but at the same time shorter copyright duration...

I don’t know who “we” are, but I absolutely don't want “stricter copyright law when it comes to AI”. More clarity? Sure. Narrowing fair use? No fucking way.

◧◩
129. SergeA+Ni2[view] [source] [discussion] 2023-05-11 00:17:08
>>safety+fg
But in more general view Disney being an international media giant is good for US, isn't it?
◧◩◪◨⬒⬓⬔⧯▣
130. gavinh+0t2[view] [source] [discussion] 2023-05-11 01:30:54
>>alphan+U72
You're being disingenuous.

I would be stealing if I prevented you from making money from it.

replies(1): >>alphan+nB2
◧◩◪◨⬒⬓
131. debo_+Ax2[view] [source] [discussion] 2023-05-11 02:03:53
>>dotanc+wO1
Underrated comment!
◧◩◪◨⬒⬓⬔⧯▣▦
132. alphan+nB2[view] [source] [discussion] 2023-05-11 02:39:10
>>gavinh+0t2
By doing what exactly? Selling spades because you also figured out how to put a stone on a stick? If you believe such things should be illegal, do we agree that you don’t follow the force is only justified in response to force principle?
replies(1): >>gavinh+4K2
◧◩◪◨⬒
133. musica+vC2[view] [source] [discussion] 2023-05-11 02:48:23
>>btilly+Ch1
> limited times

Technically life + 70 years - or 1 million years for that matter - is "limited" - but I imagine 14+14 is probably closer to what they had in mind.

◧◩◪◨⬒⬓⬔⧯▣
134. musica+bE2[view] [source] [discussion] 2023-05-11 03:02:47
>>wwwest+7p1
I think the concept that PP may be trying to get across is scarcity:

"goods are scarce because there are not enough resources to produce all the goods that people want to consume".(quoted at [1])

Physical books are intrinsically scarce because they require physical resources to make and distribute copies. Libraries are often limited by physical shelf space.

Ebooks are not intrinsically scarce because there are enough resources to enable anyone on the internet to download any one of millions of ebooks at close to zero marginal cost, with minimal physical space requirements per book. Archive.org and Z-Library are examples of this.

Consider also free goods:

"Examples of free goods are ideas and works that are reproducible at zero cost, or almost zero cost. For example, if someone invents a new device, many people could copy this invention, with no danger of this "resource" running out."[2]

[1] https://en.wikipedia.org/wiki/Scarcity

[2] https://en.wikipedia.org/wiki/Free_good

replies(1): >>wwwest+rK2
◧◩◪◨
135. fuzzfa+WE2[view] [source] [discussion] 2023-05-11 03:08:12
>>safety+1B
What's needed regardless of copyright or patent terms, is a similar attitude to legal predation as there is to physical stalking and threatening.

i.e. enforce egregious IP violations while criminalizing trolls.

◧◩◪◨⬒⬓⬔
136. marssa+ZF2[view] [source] [discussion] 2023-05-11 03:19:14
>>majorm+Vb1
> But we have selected an economic system that depends on ownership to drive exchange in a market, so... that's why.

For extremely loose values of "we", perhaps - I didn't select it, and I would vote "no" if the idea were proposed...

◧◩◪◨⬒⬓⬔⧯
137. Tagber+AH2[view] [source] [discussion] 2023-05-11 03:32:50
>>asdkjl+gb2
I live in Washington state where the state taxes are mainly sales tax and property tax. Both end up being regressive. Sales tax is regressive because it is not proportional to income. You might think that property tax would hit higher-income people more, but what happens is that the property tax makes homes more expensive for low-income people and is also passed on to renters in their monthly rent.
◧◩◪◨⬒⬓⬔⧯▣▦▧
138. gavinh+4K2[view] [source] [discussion] 2023-05-11 03:52:25
>>alphan+nB2
https://en.wikipedia.org/wiki/Loss_leader

Then once smaller competitors are out of business, raise prices.

Of course, force can go into it, such as when a big company sues a smaller company with a frivolous lawsuit that the smaller company can't afford to fight. Then the smaller company goes out of business, and the big company can use their ideas free.

◧◩◪◨⬒⬓⬔⧯▣▦
139. wwwest+rK2[view] [source] [discussion] 2023-05-11 03:55:25
>>musica+bE2
> the concept that PP may be trying to get across is scarcity:

It's pretty mysterious that you think you need to introduce this to the conversation at this point given how prominently scarcity dynamics figure into the comment you're replying to.

> Physical books are intrinsically scarce

Once their production was industrialized with printing press tech, copies of books weren't scarce, they were actually revolutionarily cheap.

The copyright bargain isn't borne out of ignorance of how changes in that direction affect the overall dynamic, it's borne out of deep understanding of what remains scarce and risky and difficult to compensate for when the marginal cost of producing copies drops drastically, and what kind of claims might help.

replies(1): >>musica+vV2
◧◩◪◨⬒⬓⬔⧯▣▦▧
140. musica+vV2[view] [source] [discussion] 2023-05-11 05:19:54
>>wwwest+rK2
Actually I was replying to both of you (sadly there's no obvious structural way to do that on HN), but perhaps I should have made it clearer that the "finite" concept PP was trying to get across actually seems to be scarcity - land is scarce, paper books less so - and intangible goods such as ebooks are not scarce at all (DRM attempts notwithstanding.)

Authorship may be scarce - costly and resource intensive (LLMs notwithstanding) as you describe, while copying and distribution of intangible goods like ideas or digital media is essentially free and unlimited, as I suspect PP was trying to say.

As you correctly note, the constitutional copyright bargain permits a limited time monopoly in return for (hopefully) advancing "the progress of science and the useful arts."

◧◩◪◨⬒⬓
141. rhn_mk+lb3[view] [source] [discussion] 2023-05-11 07:21:26
>>gavinh+8I
On the other hand, you can take some (closed?) code a company wrote, feed it into AI, and launder it for your purpose. While this is not a symmetric exchange, it does reduce the power of copyright for everyone.
replies(1): >>gavinh+K18
◧◩◪◨⬒
142. accoun+om3[view] [source] [discussion] 2023-05-11 08:55:30
>>JumpCr+Gf1
Information and entropy are more or less the same thing. Ask Shannon.
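
For anyone who wants to see that in concrete terms, here is a minimal sketch (my own illustration, assuming the Shannon sense of "entropy" is what's meant; the function name `shannon_entropy` is made up for this example). It measures the average information per symbol of a message, in bits, from the message's own symbol frequencies:

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Average information per symbol, in bits, using the empirical
    symbol frequencies of the message itself."""
    counts = Counter(message)
    total = len(message)
    # H = -sum(p * log2(p)) over every distinct symbol in the message
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A message made of one repeated symbol carries no information per symbol, while a message that uses four symbols equally often carries two bits per symbol.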
◧◩◪◨⬒
143. accoun+Mm3[view] [source] [discussion] 2023-05-11 08:58:09
>>fsckbo+OH
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
replies(1): >>jasonm+tH4
◧◩◪◨⬒⬓⬔⧯▣
144. justin+lp3[view] [source] [discussion] 2023-05-11 09:19:20
>>8note+Pd2
> people benefit from the land

"which people in particular are benefitting the most" seems to be the perennial question.

◧◩◪
145. accoun+tr3[view] [source] [discussion] 2023-05-11 09:41:24
>>samwil+tq
Trying to find consistency in the prevalent (or more commonly predominantly expressed) attitudes and opinions of groups is a common fallacy. You can have a group with a large number of members holding opinion A and another large number of members holding opinion ¬A without any member being in both groups.

Of course in reality things are usually more complex and we are talking about two different opinions A and B that aren't even inherently incompatible, but some motivations for A would lead to ¬B and vice versa.

But in this particular case I think the flaw is in your assumption that the majority wants stricter copyright law for AI rather than wanting the same copyright law that humans are beholden to to also apply to AI, whether that law is the current may-as-well-be-perpetual monopoly or 0 copyright or anything in between.

◧◩◪◨⬒
146. accoun+Wt3[view] [source] [discussion] 2023-05-11 10:01:24
>>roboca+9O1
This is really a big problem with copyright - most people don't even get to vote for or against it because whatever "democratic" laws there are are only formalizing trade agreements that would be so costly to violate that doing so is not even up for discussion.
◧◩◪◨⬒
147. bohlen+cz3[view] [source] [discussion] 2023-05-11 10:48:32
>>vinayp+cY
Oh wow, I like this one!
◧◩◪◨⬒⬓
148. bohlen+rz3[view] [source] [discussion] 2023-05-11 10:50:33
>>dotanc+wO1
Why doesn’t this platform offer a “like” button for answers?
replies(2): >>jasonm+gH4 >>dotanc+V75
◧◩◪◨
149. JohnFe+sZ3[view] [source] [discussion] 2023-05-11 13:46:07
>>prepen+ca1
> I think people who don’t want their content contributing to AI shouldn’t have it on the public web.

Practically speaking, that's the only effective solution. I just think that it's a shame that's necessary. It would be better for everyone if there wasn't a disincentive to making works publicly available.

> I don’t want to have rules that biological humans can view and digital humans can’t.

This is a point we disagree on.

And "digital humans"? I would argue that such a thing can't exist, if you mean "human" in any way other than rough analogy.

◧◩◪◨⬒⬓⬔
150. jasonm+gH4[view] [source] [discussion] 2023-05-11 16:41:10
>>bohlen+rz3
You can click the up arrowhead to up-vote.
◧◩◪◨⬒⬓
151. jasonm+tH4[view] [source] [discussion] 2023-05-11 16:42:14
>>accoun+Mm3
I wonder how many people think this is a string... and don't know about these magic numbers.
◧◩◪◨⬒⬓⬔⧯
152. jasonm+RH4[view] [source] [discussion] 2023-05-11 16:44:04
>>renlo+tc1
This idea is not without detractors.

"Property is theft" is not a new idea, makes a lot of sense. Unless you have a lot of it, and then those [censored] can [censored] right off.

◧◩◪◨
153. jasonm+MI4[view] [source] [discussion] 2023-05-11 16:47:56
>>safety+YC
You probably wouldn't if you were the owner of the Marvel franchise or other such cash cows.

Copyright that doesn't expire would make "a whole lot of cents".

(I agree with you, but the ownership is the corrupting factor.)

◧◩◪◨⬒⬓⬔
154. dotanc+V75[view] [source] [discussion] 2023-05-11 18:36:50
>>bohlen+rz3
Welcome, newcomer. As flattered as I am by your want to "like", my comment was not informative and even borderline trolling (by changing the subject). On HN, such comments are better to downvote. Please don't start upvoting comments that stray from the issue under discussion, funny as they might be.
◧◩◪
155. shaneb+8x7[view] [source] [discussion] 2023-05-12 13:01:36
>>spc476+I41
This is very interesting. I've bookmarked the link. Thanks for sharing. I believe minimal is best and this might fit nicely within my larger system. Do you approach other problems with a similar mindset?
◧◩◪◨⬒⬓⬔
156. gavinh+K18[view] [source] [discussion] 2023-05-12 15:22:09
>>rhn_mk+lb3
Sure, if you can get your hands on it and if the company doesn't sue you for doing so.
replies(1): >>rhn_mk+f39
◧◩◪◨⬒⬓⬔⧯
157. rhn_mk+f39[view] [source] [discussion] 2023-05-12 19:53:55
>>gavinh+K18
Check out what I wrote two posts above. If them suing you is a problem then you have trouble in your legal system regardless of copyright.