zlacker

1. pwilli+(OP)[view] [source] 2023-09-01 19:56:20
Wow -- I hadn't thought of this, but it makes total sense. We'll need giant, definitely-human-curated databases of information for AIs to consume as more and more content is generated by the AIs themselves.
replies(1): >>dredmo+n5
2. dredmo+n5[view] [source] 2023-09-01 20:27:59
>>pwilli+(OP)
There's a long history of information classification, going back to Aristotle and earlier ("Categories"). See especially Melvil Dewey, the US Library of Congress Classification, and the work of Paul Otlet. All are based on exogenous classification: catalogues of subjects and/or works that are maintained independently of the works being classified.

Natural-language content-based classification, as practiced by Google and Web text-based search, effectively relies on documents' self-descriptions (that is, their content itself) to classify and search works, though a ranking scheme (e.g., PageRank) is typically layered on top of that. What distinguished early Google from prior full-text search was that the latter had no ranking criteria, which led to keyword stuffing. An alternative approach was Yahoo, originally "Yet Another Hierarchical Officious Oracle", a curated, ontological classification of websites. That approach was already proving infeasible at Web scale by 1997/98, though it might yet prove useful as training data for machine classification.
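The "ranking layered on top of content" idea can be sketched with the power-iteration form of PageRank. This is a toy illustration, not Google's actual implementation; the graph, function name, and damping value are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iters=50):
    """Toy PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of page -> rank score summing to ~1.0.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform distribution
    for _ in range(iters):
        # every page gets a baseline share from the "random jump" term
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                # a page divides its current rank among its outlinks
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # dangling page: spread its rank evenly over all pages
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# hypothetical three-page web: a -> b,c ; b -> c ; c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(graph)
```

Here page "c", which is linked from both "a" and "b", ends up ranked above "b" regardless of either page's text, which is exactly what made keyword stuffing far less effective against early Google than against pure full-text engines.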
