zlacker

The New York Times is suing OpenAI and Microsoft for copyright infringement
1. kbos87+Na 2023-12-27 15:03:43
>>ssgodd+(OP)
Solidly rooting for NYT on this - it’s felt like many creative organizations have been asleep at the wheel while their lunch gets eaten for a second time (the first being at the birth of modern search engines.)

I don’t necessarily fault OpenAI’s decision to initially train their models without entering into licensing agreements - they probably wouldn’t exist and the generative AI revolution may never have happened if they put the horse before the cart. I do think they should quickly course-correct at this point and accept the fact that they clearly owe something to the creators of the content they are consuming. If they don’t, they are setting themselves up for a bigger loss down the road and leaving the door open for a more established competitor (Google) to do it the right way.

2. belter+kl 2023-12-27 16:03:40
>>kbos87+Na
For all the leaks on secret projects, novel training algorithms no longer being published so as to preserve market share, custom hardware, Q* learning, and internal politics at the companies at the forefront of state-of-the-art LLMs, the thunderous silence is the lack of leaks on the exact datasets used to train the main commercial LLMs.

It is clear OpenAI and Google did not use only Common Crawl. With so many press conferences, why has no investigative journalist yet asked OpenAI or Google to confirm or deny whether they use or have used LibGen?

Did OpenAI really buy an ebook of every publication from Cambridge Press, Oxford Press, Manning, Apress, and so on? Did any investor's due diligence include researching the legality of the content used for training?

3. cogman+js 2023-12-27 16:41:41
>>belter+kl
We all remember when Aaron Swartz got hit with federal wire fraud charges, and an alleged intent to distribute, for downloading JSTOR articles, right?

It's really disgusting, IMO, that corporations that go far beyond that sort of behavior face NO federal investigations. Yet when a private citizen does it, it's threats of life in prison.

This isn't new, but it speaks to a major hole in our legal system and its administration. The Feds are more than willing to steamroll an individual but will think twice about investigating a large corporation engaged in the same behavior.

4. SideQu+yw 2023-12-27 17:04:34
>>cogman+js
Circumventing computer security to copy items en masse to distribute wholesale without transformation is a far cry from reading data on public facing web pages.

5. cogman+Cx 2023-12-27 17:10:42
>>SideQu+yw
He didn't circumvent computer security. He had a right to use the MIT network and pull the JSTOR information. He certainly did it in a shady way (a computer in a closet), but it's every bit as arguable that he did it that way because he didn't want someone stealing or unplugging his laptop while it was downloading the data.

He also did not distribute the information wholesale. What he planned on doing with the information was never proven.

OpenAI IS distributing information they got wholesale from the internet without a license to that information. Heck, they are selling the information they distribute.

6. anigbr+nT 2023-12-27 19:09:34
>>cogman+Cx
OpenAI IS distributing information they got wholesale from the internet

Facts are not subject to copyright. It's very obvious ChatGPT is more than a search engine regurgitating copies of pages it indexed.

7. tremon+801 2023-12-27 19:48:20
>>anigbr+nT
Facts are not subject to copyright

That's false; but even assuming it's true, misinformation is creative content and therefore 99% of the Internet is subject to copyright.

8. anigbr+EY1 2023-12-28 03:07:35
>>tremon+801
No it is not. You can make a better argument than just BSing.

https://libraries.emory.edu/research/copyright/copyright-dat...