zlacker

1. tombh+(OP)[view] [source] 2020-05-14 09:02:28
Oh, I'm surprised the "blacklist" isn't just the standard English dictionary. I'm sure I'm just being naive though. Why not just blacklist any word that already exists in English?
replies(2): >>Spare_+J >>turtle+Q
2. Spare_+J[view] [source] 2020-05-14 09:08:43
>>tombh+(OP)
That may be because there isn't a single definitive list of all words in the English language.

https://www.lexico.com/explore/how-many-words-are-there-in-t...

3. turtle+Q[view] [source] 2020-05-14 09:09:34
>>tombh+(OP)
There are some subtleties (e.g. hyphens, derived forms, bigrams, etc.) but the biggest problem is that most English dictionaries don't have entries for every scientific word / piece of internet slang. I ended up tokenizing Wikipedia for a blacklist and still missed a lot :(
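
A minimal sketch of what tokenizing a corpus into a blacklist might look like, not the commenter's actual code. The names `build_blacklist` and `is_available` and the tiny in-memory `corpus` are hypothetical; a real run would stream text extracted from a Wikipedia dump instead. It handles only one of the subtleties mentioned (hyphens) and ignores derived forms and bigrams.

```python
import re

# ASCII letters only, optionally joined by single hyphens.
# Assumption: accented letters and digits act as separators, which a real run would need to revisit.
TOKEN_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")

def build_blacklist(texts):
    """Collect every lowercase token in the corpus, plus the parts of hyphenated tokens."""
    seen = set()
    for text in texts:
        for token in TOKEN_RE.findall(text.lower()):
            seen.add(token)
            if "-" in token:
                # "well-known" also blocks "well" and "known"
                seen.update(token.split("-"))
    return seen

def is_available(candidate, blacklist):
    """A candidate name is available only if it never appeared in the corpus."""
    return candidate.lower() not in blacklist

# Stand-in corpus; in practice this would be article text from a dump.
corpus = [
    "The well-known bigram model.",
    "Kubernetes is a container orchestrator.",
]
blacklist = build_blacklist(corpus)
```

Even with a full dump, a set built this way only knows words that someone has already written into an article, which fits the "still missed a lot" experience.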
replies(1): >>mathew+cq
4. mathew+cq[view] [source] [discussion] 2020-05-14 12:50:55
>>turtle+Q
> I ended up tokenizing Wikipedia for a blacklist and still missed a lot :(

That sounds like an impressive project in itself :)

replies(1): >>pbhjpb+ZC
5. pbhjpb+ZC[view] [source] [discussion] 2020-05-14 14:02:27
>>mathew+cq
Words not on Wikipedia, found on other sources, listed by frequency (perhaps with a date-weighting of the source document to reduce rating of older sources), would be an interesting way to find holes in Wikipedia's coverage.
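
One way this ranking could be sketched, assuming the Wikipedia vocabulary is already available as a set and each outside document carries a publication year. The function name `coverage_gaps`, the exponential half-life weighting, and the sample data are all illustrative assumptions, not something specified in the comment.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z]+")

def coverage_gaps(wikipedia_vocab, documents, current_year, half_life_years=5.0):
    """Rank words absent from wikipedia_vocab by date-weighted frequency.

    documents is an iterable of (year, text) pairs. Each occurrence counts
    0.5 ** (age / half_life_years), so older sources contribute less.
    """
    scores = defaultdict(float)
    for year, text in documents:
        age = max(0, current_year - year)
        weight = 0.5 ** (age / half_life_years)
        for token in TOKEN_RE.findall(text.lower()):
            if token not in wikipedia_vocab:
                scores[token] += weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Made-up sample data for illustration only.
wikipedia_vocab = {"the", "new", "is", "a", "word"}
documents = [
    (2020, "yeet is a new word"),
    (2010, "the groovy groovy word"),
]
gaps = coverage_gaps(wikipedia_vocab, documents, current_year=2020)
```

With this sample, "groovy" appears twice but only in a ten-year-old source, so it ranks below the single recent occurrence of "yeet".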
replies(1): >>patric+gF
6. patric+gF[view] [source] [discussion] 2020-05-14 14:13:27
>>pbhjpb+ZC
Someone should make a Wikipedia page of that list. Oh, wait.
replies(1): >>pbhjpb+lc5
7. pbhjpb+lc5[view] [source] [discussion] 2020-05-15 20:45:37
>>patric+gF
I like how you had information, made a sarcastic comment about it, but didn't share the actual information ... just in case your comment might prove helpful ...
replies(1): >>roryok+jy5
8. roryok+jy5[view] [source] [discussion] 2020-05-15 23:05:09
>>pbhjpb+lc5
Are you saying the URL of that Wikipedia page is “actual information” that patrickthebold failed to share?

I think that page doesn’t exist. patrickthebold wasn’t sarcastically mocking people who were too lazy to look up that page. He was just making the point that as soon as a hypothetical list like that were uploaded to Wikipedia, it would have to be deleted, since those words would then be words found on Wikipedia.
