zlacker

1. turtle+(OP) 2020-05-14 09:09:34
There are some subtleties (e.g. hyphens, derived forms, bigrams), but the biggest problem is that most English dictionaries don't have entries for every scientific term or piece of internet slang. I ended up tokenizing Wikipedia for a blacklist and still missed a lot :(
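
For illustration, a minimal sketch of the tokenizing step, assuming plain-text input (for example, text already extracted from a Wikipedia dump) rather than raw wiki markup. The names `tokenize` and `build_wordlist` and the regular expression are hypothetical, not the code turtle used. The pattern keeps internal hyphens and apostrophes so that forms like "well-known" stay one token, and it only handles ASCII letters.

```python
import re
from collections import Counter

# Assumed token pattern: runs of ASCII letters, optionally joined by single
# internal hyphens or apostrophes (e.g. "well-known", "don't").
TOKEN_RE = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def tokenize(text):
    """Lowercase the text and return its word tokens in order."""
    return TOKEN_RE.findall(text.lower())


def build_wordlist(lines, min_count=1):
    """Count tokens across an iterable of text lines and return the set
    of words seen at least min_count times."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return {word for word, n in counts.items() if n >= min_count}
```

Raising `min_count` is one way to keep one-off typos out of the list, at the cost of also dropping rare but legitimate words.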
replies(1): >>mathew+mp
2. mathew+mp 2020-05-14 12:50:55
>>turtle+(OP)
> I ended up tokenizing Wikipedia for a blacklist and still missed a lot :(

That sounds like an impressive project in itself :)

replies(1): >>pbhjpb+9C
3. pbhjpb+9C 2020-05-14 14:02:27
>>mathew+mp
Words not on Wikipedia, found on other sources, listed by frequency (perhaps with a date-weighting of the source document to reduce rating of older sources), would be an interesting way to find holes in Wikipedia's coverage.
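
A rough sketch of that idea, under stated assumptions: each source document arrives as a `(year, tokens)` pair, `wikipedia_words` is a set of words already known to appear on Wikipedia, and the date-weighting is an exponential decay with a configurable half-life. The function name, parameters, and decay formula are all hypothetical choices; the comment above does not specify them.

```python
from collections import defaultdict


def rank_missing_words(documents, wikipedia_words, current_year, half_life_years=5.0):
    """Rank words absent from wikipedia_words by date-weighted frequency.

    documents: iterable of (year, tokens) pairs.
    Each occurrence contributes 0.5 ** (age / half_life_years), so words
    from older sources count for less.
    """
    scores = defaultdict(float)
    for year, tokens in documents:
        age = max(0, current_year - year)
        weight = 0.5 ** (age / half_life_years)
        for token in tokens:
            if token not in wikipedia_words:
                scores[token] += weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With a five-year half-life, a word seen three times in a ten-year-old source scores 0.75, below a word seen twice in a current source (2.0).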
replies(1): >>patric+qE
4. patric+qE 2020-05-14 14:13:27
>>pbhjpb+9C
Someone should make a Wikipedia page of that list. Oh, wait.
replies(1): >>pbhjpb+vb5
5. pbhjpb+vb5 2020-05-15 20:45:37
>>patric+qE
I like how you had information, made a sarcastic comment about it, but didn't share the actual information ... just in case your comment might prove helpful ...
replies(1): >>roryok+tx5
6. roryok+tx5 2020-05-15 23:05:09
>>pbhjpb+vb5
Are you saying the URL of that Wikipedia page is “actual information” that patrickthebold failed to share?

I think that page doesn’t exist. patrickthebold wasn’t sarcastically mocking people who were too lazy to look up that page. He was just making the point that as soon as a hypothetical list like that was uploaded to Wikipedia, it should be deleted, since those words would then be words found on Wikipedia.
