I'm using a blacklist to reject "real" words but it's surprisingly hard to build for rare words. I'm up to ~600K items after parsing Wikipedia tokens and it still doesn't capture everything.
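For anyone curious what that kind of filter looks like, here is a minimal sketch, assuming the blacklist is just a set of lowercased tokens pulled from a text dump. The names `build_blacklist` and `is_novel` and the tiny sample corpus are made up for illustration; this is not the site's actual code.

```python
import re
from typing import Iterable, Set

# Assumption: a "token" is a run of ASCII letters; real Wikipedia parsing is messier.
TOKEN_RE = re.compile(r"[a-z]+")


def build_blacklist(texts: Iterable[str]) -> Set[str]:
    """Collect every lowercased token seen in the corpus."""
    blacklist: Set[str] = set()
    for text in texts:
        blacklist.update(TOKEN_RE.findall(text.lower()))
    return blacklist


def is_novel(candidate: str, blacklist: Set[str]) -> bool:
    """A generated word is kept only if it never appeared in the corpus."""
    return candidate.lower() not in blacklist


# Hypothetical stand-in for a much larger corpus.
corpus = ["Metamucil is a brand name.", "A car seat, or carseat, is a real thing."]
blacklist = build_blacklist(corpus)
```

The catch is the one described above: brand names and rare compounds only get rejected if they happen to show up in whatever corpus fed the set.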
This word was used for the Patriots scandal a few years ago.
(https://www.thisworddoesnotexist.com/w/microtissue/eyJ3IjogI...)
Granted, I don't know if it's technically a dictionary word, but I've definitely seen it in technical contexts.
https://www.thisworddoesnotexist.com/w/metamucil/eyJ3IjogIm1...
edit: also got "redbull" https://www.thisworddoesnotexist.com/w/redbull/eyJ3IjogInJlZ...
[1] https://www.thisworddoesnotexist.com/w/carseat/eyJ3IjogImNhc...
handjob
hand·job
a job offered to a woman as a job or offer offered by a man
"you have won a handjob as a waitress"
https://www.thisworddoesnotexist.com/w/handjob/eyJ3IjogImhhb...
Uhh.........
squarespace
squares·pace
the ability to play in a board game within a system by controlling the left and right edges of the board or a grid surrounded by squares
https://www.thisworddoesnotexist.com/w/squarespace/eyJ3IjogI...
It almost gets it right...
It's a subreddit filled entirely with bots; each user is trained on a specific subreddit's comments matching its username (so politicsGPT2Bot is trained on comments from the politics subreddit).
Go click through a few comment sections and see how mind-bendingly real some comment chains seem. They reply quoting other comments, they generate links entirely on their own (they almost always go to a 404 page, but they look real and are in a format that makes me think they're real every time I hover over one), they have full conversations back and forth, they make jokes, they argue "opinions" (often across multiple comments back and forth, keeping the context of which "side" each comment is on), and they vary from single-word comments to multi-paragraph comments.
Take a look at this thread [2] specifically. The headline is made up, the link it goes to is made up, but the comments look insanely real at first glance. Some of them even seem to be quoting the contents of the article (which, again, doesn't exist) in their comments!
If you threw something like 50% "real humans" in the mix, I genuinely don't think I'd be able to pick out the bots on my own.
[1] https://www.reddit.com/r/SubSimulatorGPT2/
[2] https://www.reddit.com/r/SubSimulatorGPT2/comments/fzwso5/nr...
eicoscience
eico·science
the branch of physics that deals with the behavior, physics, and the properties of living organisms
"I started thinking about the world thinking of me and I never really went deep into the eicoscience"
https://www.thisworddoesnotexist.com/w/eicoscience/eyJ3IjogI...
https://www.thisworddoesnotexist.com/w/polyfill/eyJ3IjogInBv...
https://www.thisworddoesnotexist.com/w/metaspace/eyJ3IjogIm1...
This is a Java concept: https://stuefe.de/posts/metaspace/what-is-metaspace/
https://old.reddit.com/r/SubSimulatorGPT2/comments/gj2z4f/ia...
punchjoy:
https://www.thisworddoesnotexist.com/w/punchjoy/eyJ3IjogInB1...
jockrock:
https://www.thisworddoesnotexist.com/w/jockrock/eyJ3IjogImpv...
cubot
a member of an American Indian people living in
shallow water off central Alaska and western Australia
https://www.thisworddoesnotexist.com/w/cubot/eyJ3IjogImN1Ym9...
patentless
patent·less
not having or requiring a license of a particular kind and without permission
"patentless wireless communications"
a word that does not exist; it was invented, defined and used by a machine learning algorithm.
Well, not exactly [0]!
https://en.wiktionary.org/wiki/bait_box
Still pretty cool. I'd totally believe some of these definitions if I didn't see the site.
Also this one which I've seen used quite a bit. overfamiliarize https://www.thisworddoesnotexist.com/w/overfamiliarize/eyJ3I...
https://en.wikisource.org/wiki/User:Hal_Bregg_II/Neologisms_...
This 1961 book predicted what Lem calls "opton", an electronic book reader with only one page between the covers (a Kindle Opton special for Lem's 100th anniversary next year would be nice!).
I wonder if someone has created sci-fi short stories with that data set yet.
• This Person Does Not Exist https://thispersondoesnotexist.com/
• These Lyrics Do Not Exist https://theselyricsdonotexist.com/
• This Cat Does Not Exist https://thiscatdoesnotexist.com/
• This Rental Does Not Exist https://thisrentaldoesnotexist.com/
• This Waifu Does Not Exist https://www.thiswaifudoesnotexist.net/
• This Resume Does Not Exist https://thisresumedoesnotexist.com/
• This Artwork Does Not Exist https://thisartworkdoesnotexist.com/
reductory
re·duc·tory
relating to or denoting products of decay
"winding and burning reductory tissue"
https://www.thisworddoesnotexist.com/w/reductory/eyJ3IjogInJ...
Edit: fixed mono spacing.
noun.
swem
the process of forming a strip of mucus before injecting intravenously
"semen started to emerge immediately after swemting"
https://www.thisworddoesnotexist.com/w/swem/eyJ3IjogInN3ZW0i...
Or, if feeling contentious, other wordspaces https://gender.wikia.org/wiki/Category:Gender_Identities
I'm the developer behind http://prosecraft.io so I always enjoy learning about new linguistic toys :)
Thank you!
The early results were that it works, but noisier datasets are tough. The urban dictionary corpus also has a ton of racist definitions
Couldn't help but think about Trump :)
[1] https://www.thisworddoesnotexist.com/w/suckline/eyJ3IjogInN1...
"citation needed" on that completeness claim, or rather "this list is incomplete, you can help by expanding it"
1. https://www.thisworddoesnotexist.com/w/spongen/eyJ3IjogInNwb...
>a narrative, typically one that takes place on a screen or on social media
https://www.thisworddoesnotexist.com/w/chlamydrama/eyJ3IjogI...
I'm impressed, a witty wordplay.
> accommodability
> ac·com·mod·abil·ity
> The quality of being likely to be useful, effective, or useful
---
[1] https://www.thisworddoesnotexist.com/w/accommodability/eyJ3I...
https://i.imgur.com/saUivfe.png
shudder
nonhumanoid
1. not having life
"a nonhumanoid woman"
https://www.thisworddoesnotexist.com/w/nonhumanoid/eyJ3IjogI...
From memory, it picks each letter with the same probability that the letter has of following the previous two letters in actual English words. The more previous letters you include when calculating the probabilities, the more the output resembles actual words. My list is on the wild side. Not novel, but it was fun to do. Good for writing Jabberwocky-type poetry.
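The approach described above is a letter-level Markov chain. A rough sketch, assuming a context of two letters and a plain word list as training data; the function names, the `^`/`$` padding markers, and the three-word sample list are illustrative choices of mine, not the original program:

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional

START, END = "^", "$"  # padding markers; assumed not to occur in the words


def train(words: List[str], order: int = 2) -> Dict[str, List[str]]:
    """Map each `order`-letter context to every letter that followed it."""
    model = defaultdict(list)
    for word in words:
        padded = START * order + word.lower() + END
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return dict(model)


def generate(model: Dict[str, List[str]], order: int = 2,
             rng: Optional[random.Random] = None, max_len: int = 20) -> str:
    """Sample letters one at a time until the end marker or max_len."""
    rng = rng or random.Random()
    context = START * order
    letters: List[str] = []
    while len(letters) < max_len:
        # Duplicates in the list make frequent followers proportionally likelier.
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        letters.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(letters)


# Tiny hypothetical training list; a real run would use a full dictionary.
model = train(["banana", "bandana", "cabana"])
word = generate(model, rng=random.Random(0))
```

Raising `order` makes the output look more like real words, at the cost of reproducing training words verbatim more often.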
"You're arguing against yourself!"
Wow, the links are just base64'ing the whole text because of course there's no way to trivially reproduce it...
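To illustrate: the visible `eyJ3IjogI` prefix in those links is what you get when you base64-encode JSON text starting with `{"w": "`. Below is a small round-trip sketch, assuming a JSON object with a `"w"` key encoded as URL-safe base64. Everything past that prefix is cut off in the links, so the payload here is a made-up example and the helper names are hypothetical.

```python
import base64
import json


def encode_payload(payload: dict) -> str:
    """Serialize a dict to JSON, then to URL-safe base64 text."""
    raw = json.dumps(payload).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_payload(token: str) -> dict:
    """Reverse encode_payload; tolerate tokens with '=' padding stripped."""
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical payload; the real links carry more than just the word.
token = encode_payload({"w": "example"})
```

So the link itself is the storage: whoever has the URL has the whole entry, and nothing has to be regenerated server-side.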
https://www.thisworddoesnotexist.com/w/glosscoat/eyJ3IjogImd...
https://www.thisworddoesnotexist.com/w/epidemiograph/eyJ3Ijo...
Someone used Epidemiography
https://www.sciencedirect.com/science/article/pii/0166218X89...
https://www.thisworddoesnotexist.com/w/tuberculant/eyJ3IjogI...
That's brilliant, and I'm going to use it.
https://www.thisworddoesnotexist.com/w/infernus/eyJ3IjogImlu...
https://www.thisworddoesnotexist.com/w/transclusion/eyJ3Ijog...
https://www.thisworddoesnotexist.com/w/airpods/eyJ3IjogImFpc...
headbutter
One who strikes other people with one's head.
"You are becoming known as a headbutter, so unless you want the league to suspend you, I suggest that you stop playing dirty!"
Headbutter - Idioms by The Free Dictionary https://idioms.thefreedictionary.com/headbutter
nonadjective
non·ad·jec·tive
not based on a principle or theory
"the philosophical and political alternatives to the totalitarianism of nonadjective politics"
https://www.thisworddoesnotexist.com/w/backpressure/eyJ3Ijog...
refactoring
the systematic and systematic reworking of a piece of text to reduce unnecessary redundancies
Interesting that the made-up definition is pretty much the real one.
https://www.thisworddoesnotexist.com/w/refactoring/eyJ3IjogI...
https://www.reddit.com/r/SubSimulatorGPT2/comments/caaq82/we...
https://www.google.com/search?q=sniglets
The one I remember most is cheetle, the orange powder left on your fingers after eating Cheetos
-- https://old.reddit.com/r/SubSimulatorGPT2/comments/gj7ony/cm...
I guess sometimes it errors out :D
> a very small, sweet, pouched porcupine; a cricket
> "soak in a splash of chocolate red liqueur to ensure that the fluffy bits eventually turn into tiger-necked poodle poodle"
https://www.thisworddoesnotexist.com/w/tiger-necked+poodle/e...
† https://www.thisworddoesnotexist.com/w/dolky/eyJ3IjogImRvbGt...
Bordellum. It's a word, I think?
https://www.thisworddoesnotexist.com/w/disaproval/eyJ3IjogIm...
Disaproval. So close to a real word it looks like a misspelling.
Anyway, this is really interesting.
https://www.lexico.com/explore/how-many-words-are-there-in-t...
It’s not too far off Mycogen: As Asimov explains in Prelude to Foundation, their name is formed from the Greek stems myco- (meaning 'yeast' or other types of fungi) and -gen (meaning 'maker' or 'producer').
https://en.m.wikipedia.org/wiki/Galactic_Empire_(Isaac_Asimo...
What this means, building off of the evidence I gave, is that this model is not learning the morphemes (the smallest units of meaning). This exact characteristic is part of why these words sound weird. It is the same problem as the one brought up by tasogare.
So "transgate" also sounds weird because it combines opposing ideas: "through" + "block". But we need to look at morphemes to see why. At least (IIRC) it made this word a verb.
I am not totally happy with the results but have not had a chance to train my own