zlacker

1. baxtr+(OP)[view] [source] 2025-01-03 06:43:20
Part of me viscerally agrees because large corporations have monetized UGC.

Another part of me though thinks differently. We are a species that builds knowledge from generation to generation. From one person to another. Over years, over centuries.

Philosophically this part tends to think that your thoughts and ideas belong to humanity and thus need to be shared with all of us.

replies(4): >>yowayb+o1 >>Salgat+73 >>friend+e6 >>yencab+ie1
2. yowayb+o1[view] [source] 2025-01-03 07:00:43
>>baxtr+(OP)
Great take. Also agree with parent. I feel like some form of provenance would take us to the next level.
3. Salgat+73[view] [source] 2025-01-03 07:20:58
>>baxtr+(OP)
There are two decades' worth of countless conversations on Reddit alone that would otherwise be buried into nothingness, but instead ML has revived all that activity as useful data. ML is definitely a great way to bring back utility for a lot of old and unused data.
replies(2): >>tempes+gd >>Terr_+lp
4. friend+e6[view] [source] 2025-01-03 07:53:37
>>baxtr+(OP)
If you recall high school history, rapid, exponential "progress" happened once the knowledge was 1) written down (printing press) 2) archived for the future (libraries) 3) systematized (textbook/encyclopaedia) 4) proactively shared (public education), all on a massive scale.

The fact that some knowledge exists and is even accessible does not really matter if it takes a scholar highly trained in a very narrow field to find that piece of information. You need a well-established knowledge creation and distribution funnel in operation for humanity as a whole to reap the benefits of knowledge.

There is undoubtedly a lot of useful knowledge on internet platforms, however, most of that knowledge remains unsystematized and largely undiscoverable, meaning that contribution to the totality of human knowledge by these platforms is infinitesimal, which is further drowned by cat and porn videos.

replies(1): >>TeMPOr+5g
◧◩
5. tempes+gd[view] [source] [discussion] 2025-01-03 09:15:36
>>Salgat+73
This seems like a reasonable take to me. I wish those downvoting you would explain where they disagree.
◧◩
6. TeMPOr+5g[view] [source] [discussion] 2025-01-03 09:44:01
>>friend+e6
Now we have 5) aggregated and internalized as a whole by computational constructs such as LLMs, which are - 4) - proactively shared (open weights, but also freemium service and dirt-cheap API access to commercial SOTA models), still on a massive scale.

> There is undoubtedly a lot of useful knowledge on internet platforms, however, most of that knowledge remains unsystematized and largely undiscoverable, meaning that contribution to the totality of human knowledge by these platforms is infinitesimal, which is further drowned by cat and porn videos.

Precisely that. Which is why I often argue that for 99%+ of the content in the training data, its marginal contribution to the training process - itself infinitesimal in isolation - is still by far the most value that content will ever bring to the world.

◧◩
7. Terr_+lp[view] [source] [discussion] 2025-01-03 11:38:28
>>Salgat+73
> revived that activity as useful data

Revived as compressed text associations, it is potentially useful data, but also potentially totally wrong in non-obvious ways. (Or, to riff on Futurama, "The worst kind of incorrect.")

replies(1): >>Salgat+BX
◧◩◪
8. Salgat+BX[view] [source] [discussion] 2025-01-03 16:16:30
>>Terr_+lp
It is used to help train the LLMs on how to "talk" like normal people, even if the topic they're discussing isn't that useful or valuable.
9. yencab+ie1[view] [source] 2025-01-03 17:58:20
>>baxtr+(OP)
The compromise that was supposed to be in place was strong, short-term copyright protection, to help the author (a person) financially during their lifetime. That compromise was destroyed by rich people using corporations as owners and extending copyright duration.

https://en.wikipedia.org/wiki/Copyright_Term_Extension_Act
