zlacker

[parent] [thread] 30 comments
1. turtle+(OP)[view] [source] 2020-05-14 07:22:45
Got a few comments about that one and a few others! I just updated the existing word blacklist and re-deployed.
replies(11): >>mad182+W >>Tepix+62 >>r0b05+M5 >>opdahl+ha >>tombh+Sb >>sparky+Yb >>hansbo+Bd >>maaark+Qe >>Interm+ug >>Nextgr+Wh >>Wowfun+iy1
2. mad182+W[view] [source] 2020-05-14 07:30:26
>>turtle+(OP)
It gave me "Intermodulate" a minute ago. https://en.wikipedia.org/wiki/Intermodulation
replies(1): >>turtle+O1
◧◩
3. turtle+O1[view] [source] [discussion] 2020-05-14 07:37:33
>>mad182+W
That one's a little more fuzzy; "intermodulate" doesn't occur very much in discourse (e.g. not in the wiki article at all), even though it would naturally be related.
replies(1): >>tiglio+I7
4. Tepix+62[view] [source] 2020-05-14 07:39:18
>>turtle+(OP)
I got "cyberpolice" which seems to be a real word.

I also got "deflategate" which I love - it must be an upcoming scandal! :-)

replies(2): >>zimpen+Q4 >>itroni+R4
◧◩
5. zimpen+Q4[view] [source] [discussion] 2020-05-14 08:04:40
>>Tepix+62
> it must be an upcoming scandal!

It was, in 2016.

https://en.wikipedia.org/wiki/Deflategate

◧◩
6. itroni+R4[view] [source] [discussion] 2020-05-14 08:04:44
>>Tepix+62
Deflategate was a National Football League (NFL) controversy involving the allegation that New England Patriots quarterback Tom Brady ordered the deliberate deflation of footballs used in the Patriots' victory against the Indianapolis Colts in the 2014 American Football Conference (AFC) Championship Game.

https://en.wikipedia.org/wiki/Deflategate

7. r0b05+M5[view] [source] 2020-05-14 08:12:26
>>turtle+(OP)
If you're using a blacklist, is it really machine learning? Or are you using it to re-train?
replies(2): >>codegl+3b >>Breza+6Qa
◧◩◪
8. tiglio+I7[view] [source] [discussion] 2020-05-14 08:27:08
>>turtle+O1
https://www.thisworddoesnotexist.com/w/bordellum/eyJ3IjogImJ...

Bordellum. It's a word, I think?

https://www.thisworddoesnotexist.com/w/disaproval/eyJ3IjogIm...

Disaproval. So close to a real word it looks like a misspelling.

Anyway, this is really interesting.

replies(1): >>turtle+99
◧◩◪◨
9. turtle+99[view] [source] [discussion] 2020-05-14 08:42:03
>>tiglio+I7
For the latter, press the "Write Your Own" button and it'll do exactly that

I don't fix a random seed, so in general you can get different definitions (only sometimes, though, because I cache data).

replies(1): >>tiglio+t9
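A minimal sketch of how an unseeded sampler and a cache interact, as described in the comment above. The names `CANDIDATE_DEFINITIONS`, `_cache` and `define` are hypothetical and are not taken from the project; `rng.choice` merely stands in for sampling from the model.

```python
import random
from typing import Dict, List

# Hypothetical stand-in for whatever the model could sample for one word.
CANDIDATE_DEFINITIONS: List[str] = [
    "the state of not approving of something",
    "a feeling of mild disagreement",
    "an act of refusing to endorse",
]

_cache: Dict[str, str] = {}


def define(word: str, rng: random.Random) -> str:
    """Sample a definition once per word, then serve it from the cache."""
    if word not in _cache:
        _cache[word] = rng.choice(CANDIDATE_DEFINITIONS)
    return _cache[word]


rng = random.Random()  # no fixed seed, so separate runs can differ
first = define("disaproval", rng)
second = define("disaproval", rng)  # cache hit: same text as `first`
```

Within one cache lifetime the same word keeps its definition; once the cache is cleared or bypassed, a fresh sample may differ.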
◧◩◪◨⬒
10. tiglio+t9[view] [source] [discussion] 2020-05-14 08:44:56
>>turtle+99
Thanks!
11. opdahl+ha[view] [source] 2020-05-14 08:51:08
>>turtle+(OP)
I also got «invention» up as a result.
◧◩
12. codegl+3b[view] [source] [discussion] 2020-05-14 08:55:45
>>r0b05+M5
The blacklist is probably there to avoid cases where it randomly generates a real word, like the two cases above, so the blacklist filter is probably applied after the ML stuff.
replies(1): >>turtle+kc
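A rough sketch of a blacklist applied after generation, as the comment above guesses. This is an illustration under assumptions, not the project's code: `first_unlisted_word`, `sampled` and `known_words` are hypothetical names, and the list of sampled strings stands in for real model output.

```python
from typing import Iterable, Optional, Set


def first_unlisted_word(
    candidates: Iterable[str],
    blacklist: Set[str],
) -> Optional[str]:
    """Return the first candidate that is not a known word, or None."""
    for word in candidates:
        # Normalize so that "Invention" and "invention" hit the same entry.
        if word.strip().lower() not in blacklist:
            return word
    return None


# Hypothetical stand-in for a batch of model samples.
sampled = ["invention", "Cyberpolice", "glorptastic"]
known_words = {"invention", "cyberpolice", "gofundme"}

print(first_unlisted_word(sampled, known_words))  # glorptastic
```

The model itself is untouched here; the filter only rejects outputs, which is why a real word can still slip through whenever it is missing from the list.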
13. tombh+Sb[view] [source] 2020-05-14 09:02:28
>>turtle+(OP)
Oh, I'm surprised the "blacklist" isn't just the standard English dictionary. I'm sure I'm just being naive though. Why not just blacklist any word that already exists in English?
replies(2): >>Spare_+Bc >>turtle+Ic
14. sparky+Yb[view] [source] 2020-05-14 09:03:13
>>turtle+(OP)
I got "pataphysical".
◧◩◪
15. turtle+kc[view] [source] [discussion] 2020-05-14 09:05:56
>>codegl+3b
Yup, the line for the blacklist lookup is here: https://github.com/turtlesoupy/this-word-does-not-exist/blob...
◧◩
16. Spare_+Bc[view] [source] [discussion] 2020-05-14 09:08:43
>>tombh+Sb
That may be because there isn't a single definitive list of all words in the English language

https://www.lexico.com/explore/how-many-words-are-there-in-t...

◧◩
17. turtle+Ic[view] [source] [discussion] 2020-05-14 09:09:34
>>tombh+Sb
There are some subtleties (e.g. hyphens, derived forms, bigrams, etc.) but the biggest problem is that most English dictionaries don't have entries for every scientific word / piece of internet slang. I ended up tokenizing Wikipedia for a blacklist and still missed a lot :(
replies(1): >>mathew+4C
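A small sketch of building a blacklist by tokenizing text, assuming a plain-text corpus is already at hand. The regex, the hyphen handling and the name `build_blacklist` are illustrative assumptions, and the two-line `corpus` is toy data rather than a Wikipedia dump.

```python
import re
from typing import Iterable, Set

# Runs of letters, optionally joined by internal hyphens or apostrophes.
TOKEN_RE = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def build_blacklist(lines: Iterable[str]) -> Set[str]:
    """Collect lowercased tokens, plus joined and split hyphen variants."""
    words: Set[str] = set()
    for line in lines:
        for token in TOKEN_RE.findall(line.lower()):
            words.add(token)
            if "-" in token:
                words.add(token.replace("-", ""))  # "cyber-police" -> "cyberpolice"
                words.update(token.split("-"))     # and each part on its own
    return words


corpus = ["The cyber-police investigated.", "Intermodulation distortion"]
blacklist = build_blacklist(corpus)

print("cyberpolice" in blacklist)    # True
print("intermodulate" in blacklist)  # False
```

The second lookup shows the derived-form problem mentioned above: the corpus contains "intermodulation" but never "intermodulate", so exact matching misses it.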
18. hansbo+Bd[view] [source] 2020-05-14 09:17:13
>>turtle+(OP)
I got Undercrowded

  undercrowded
  un·der·crowded
  (of a place) not full of people or vehicles
  "the area was undercrowded with traffic"
19. maaark+Qe[view] [source] 2020-05-14 09:28:37
>>turtle+(OP)
"fibroblastosis" -- appears to feature in some medical journals

"bryosphere" -- something to do with moss

"biosprint" -- a brand of yeast.

Many of these 'non-words' have already been taken....

replies(1): >>esyir+o33
20. Interm+ug[view] [source] 2020-05-14 09:43:18
>>turtle+(OP)
Unspooled and Hardstyle popped up for me. Perhaps you should do a Google search for generated words before displaying them, to prevent existing words from being shown.
replies(1): >>tasoga+4h
◧◩
21. tasoga+4h[view] [source] [discussion] 2020-05-14 09:49:41
>>Interm+ug
Or maybe the ML algorithm behind it is just using an existing dictionary and performs no operation.
replies(1): >>turtle+pi
22. Nextgr+Wh[view] [source] 2020-05-14 09:59:24
>>turtle+(OP)
I just saw "gofundme".
◧◩◪
23. turtle+pi[view] [source] [discussion] 2020-05-14 10:03:02
>>tasoga+4h
You flatter me by thinking so; it is a bot! The source code is open here: https://github.com/turtlesoupy/this-word-does-not-exist
◧◩◪
24. mathew+4C[view] [source] [discussion] 2020-05-14 12:50:55
>>turtle+Ic
> I ended up tokenizing Wikipedia for a blacklist and still missed a lot :(

That sounds like an impressive project in itself :)

replies(1): >>pbhjpb+RO
◧◩◪◨
25. pbhjpb+RO[view] [source] [discussion] 2020-05-14 14:02:27
>>mathew+4C
Words not on Wikipedia, found on other sources, listed by frequency (perhaps with a date-weighting of the source document to reduce rating of older sources), would be an interesting way to find holes in Wikipedia's coverage.
replies(1): >>patric+8R
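One way the ranking suggested above could be sketched, assuming each source document comes with a publication year. The exponential half-life weighting, the function name `missing_word_scores` and all sample data are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def missing_word_scores(
    documents: Iterable[Tuple[int, str]],
    known: Set[str],
    current_year: int,
    half_life_years: float = 10.0,
) -> List[Tuple[str, float]]:
    """Rank words absent from `known` by date-weighted frequency."""
    scores: Dict[str, float] = {}
    for year, text in documents:
        # A document's weight halves for every `half_life_years` of age.
        weight = 0.5 ** ((current_year - year) / half_life_years)
        for word in text.lower().split():
            if word not in known:
                scores[word] = scores.get(word, 0.0) + weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: (publication year, text). `wikipedia_vocab` is a hypothetical set.
docs = [(2020, "the zibble quonk quonk"), (2000, "the glorptastic")]
wikipedia_vocab = {"the"}

ranked = missing_word_scores(docs, wikipedia_vocab, current_year=2020)
print([word for word, _ in ranked])  # ['quonk', 'zibble', 'glorptastic']
```

With a 10-year half-life, the word from the 2000 document scores 0.25 per occurrence, so recent usage dominates the top of the list.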
◧◩◪◨⬒
26. patric+8R[view] [source] [discussion] 2020-05-14 14:13:27
>>pbhjpb+RO
Someone should make a Wikipedia page of that list. Oh, wait.
replies(1): >>pbhjpb+do5
27. Wowfun+iy1[view] [source] 2020-05-14 17:23:52
>>turtle+(OP)
Multiple people managed to find it? How likely is it to generate the same non-word more than once? Is there a limited set?
◧◩
28. esyir+o33[view] [source] [discussion] 2020-05-15 02:28:54
>>maaark+Qe
Oh, that second one is cute. I read up on bryophytes (moss and friends) for an exceedingly brief stint back in my undergrad days. Pronunciation similarity to bio made for many "bryo" puns.
◧◩◪◨⬒⬓
29. pbhjpb+do5[view] [source] [discussion] 2020-05-15 20:45:37
>>patric+8R
I like how you had information, made a sarcastic comment about it, but didn't share the actual information ... just in case your comment might prove helpful ...
replies(1): >>roryok+bK5
◧◩◪◨⬒⬓⬔
30. roryok+bK5[view] [source] [discussion] 2020-05-15 23:05:09
>>pbhjpb+do5
Are you saying the URL of that Wikipedia page is “actual information” that patrickthebold failed to share?

I think that page doesn’t exist. patrickthebold wasn’t sarcastically mocking people who were too lazy to look up that page. He was just making the point that as soon as a hypothetical list like that was uploaded to Wikipedia, it should be deleted, since those words would then be words found on Wikipedia.

◧◩
31. Breza+6Qa[view] [source] [discussion] 2020-05-18 03:51:28
>>r0b05+M5
Data scientist here. It's common to define boundaries for a machine learning algorithm by hand. Think of telling a chess AI that it can't move pieces off the board.
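A minimal sketch of that kind of hand-written boundary, assuming the model emits a score per candidate move and the set of legal moves comes from ordinary rule code. `best_legal_move`, the move strings and the scores are all made up for illustration.

```python
from typing import Dict, Set


def best_legal_move(scores: Dict[str, float], legal_moves: Set[str]) -> str:
    """Pick the highest-scoring move among those the rules allow."""
    legal_scores = {m: s for m, s in scores.items() if m in legal_moves}
    if not legal_scores:
        raise ValueError("no legal move was scored")
    return max(legal_scores, key=legal_scores.get)


# "a1a9" would leave the board, so it is excluded however well it scores.
model_scores = {"e2e4": 0.4, "a1a9": 0.9, "g1f3": 0.3}
legal = {"e2e4", "g1f3"}

print(best_legal_move(model_scores, legal))  # e2e4
```

The learned part still does the ranking; the hand-written part only narrows what it is allowed to pick from, much like a word blacklist does.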