That's what makes it such a good giveaway. I'm happy to be told that I'm wrong, and that you do actually use the proper double long dash in your writing, but I'm guessing that you actually use the human slang for an emdash, which is visually different and easily sets your writing apart as not AI writing!
Also, phone keyboards make it easy. Just hold down the - and you can select various types.
"the formal emdash"?
> AIs are very consistent about using the proper emdash—a double long dash with no spaces around it
Setting an em-dash closed is separate from whether you are using an em-dash (and an em-dash is exactly what it says, a dash that is the width of the em-width of the font; "double long" is fine, I guess, if you consider the en-dash "single long", but not if, as you seem to be, you take the standard width as that of the ASCII hyphen-minus, which is usually considerably narrower than en width in a proportional font.)
But, yes, most people who intentionally use em-dashes are doing so because they care about detail enough that they are also going to set them closed, at least in the uses where that is standard. (There are uses where it is conventional to set them half-closed, but that's not important here.)
> whereas humans almost always tend to use a slang version - a single dash with spaces around it.
That's not an em-dash (and it's not even an approximation of one; using a hyphen-minus set open—possibly doubled—is an approximation of the typographic convention of using an en-dash set open – different style guides prefer that for certain uses for which other guides prefer an em-dash set closed.) But I disagree with your claim that "most humans" who describe themselves as using em-dashes instead are actually just approximating the use of en-dashes set open with the easier-to-type hyphen-minus.
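For anyone who wants to check which of these characters they are actually typing, here is a minimal Python sketch. The helper name `describe_dashes` and the three-character set are my own choices for illustration, not anything standard; the code points and names come from Unicode via the standard `unicodedata` module.

```python
import unicodedata

# The three characters discussed above, written as escapes so they are
# unambiguous: hyphen-minus, en dash, em dash. Illustrative, not exhaustive.
DASHES = {"\u002d", "\u2013", "\u2014"}


def describe_dashes(text):
    """Return (code point, Unicode name) for each dash-like character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if ch in DASHES
    ]


if __name__ == "__main__":
    sample = "open - hyphen, open \u2013 en dash, closed\u2014em dash"
    for code_point, name in describe_dashes(sample):
        print(code_point, name)
```

Pasting a comment into `describe_dashes` shows at a glance whether it contains U+2014 or only the hyphen-minus.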
In certain places it does seem to do the substitution - Notes for example - but in comment boxes on here and (old) Reddit at least it doesn't.
Still less obvious than the emails I see sent out which contain emojis, so maybe I'm overthinking things...
They’re simple enough key combinations (on a Mac) that I wouldn’t be surprised if I guessed them. I certainly find it confusing to imagine someone who has to write professionally or academically not working out how to type them for those purposes at least.
on Macintosh: option+shift+-
on Linux: compose - - -
We're the training data.
On Linux, I use Compose-hyphen-hyphen-hyphen.
I don't use it as often as I used to; but when I was younger, I was enough of a nerd to use it in my writing all the time. And yes, always careful to use it correctly, and not confuse it with an en-dash. Also used to write out proper balanced curly quotes on macOS, before it was done automatically in many places.
Being able to insert self-interjections and such with the correct character would undoubtedly be more widespread if it were more accessible to insert for most.
>That's not an em-dash (blahblahblah...
What, exactly, did you think "slang" in the phrase "slang version" meant?
Examples within the last week include >>44996702 , >>44989129 , >>44991769 , >>44989444 . I typed all of those.
I never use space-hyphen-space instead of an em dash. I do sometimes use TeX's " --- ".
There’s a subculture effect: this has been trivial on Apple devices for a long time—I’m pretty sure I learned the Shift-Option-hyphen shortcut in the 90s, long before iOS introduced the long-press shortcut—and that’s also been a world disproportionately popular with the kind of people who care about this kind of detail. If you spend time in communities with designers, writers, etc. your sense of what’s common is wildly off the average.
No longer. Just like you can no longer bold key phrases, you can no longer use emdashes if your writing being ID'd as "AI" is important (or not).
The LLM is first trained as an extremely large Markov model predicting text scraped from the entire Internet. Ideally, such a model, if well trained, would use em dashes approximately as frequently as they appear in real texts.
But that model is not the LLM you actually interact with. The LLM you interact with is trained by something called Reinforcement Learning from Human Feedback, which involves people reading, rating and editing its responses, biasing the outputs and giving the model a "persona".
That persona is the actual LLM you interact with. Since em dash usage was rated highly by the people providing the feedback, the persona learned to use it much more frequently.
I've found that people who say this sort of thing rarely change their beliefs, even after being given evidence that they are wrong. The fact is, as numerous people have pointed out, Word and other editors/word processors change '--' to an em-dash. And the "slang version" of an em-dash is "I went to work--but forgot to put on pants", not "I went to work - but forgot to put on pants".
BTW, "humans almost always tend to use" is very poor writing--pick one or the other between "almost always" and "tend to". It wouldn't be a bad thing if LLMs helped increase human literacy, so I don't know why people are so gung ho on identifying AI output based on utterly non-substantive markers like em-dashes. Having an LLM do homework is a bad thing, but that's not what we're talking about. And someone foolishly using the presence of em-dashes to detect LLM output will utterly fail against someone using an editor macro to replace em-dashes with the gawdawful ' - '.
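To make that last point concrete, here is a rough Python sketch of both directions. The function names are hypothetical, and the first one only mimics the simplest form of the '--' substitution; real word-processor autocorrect has more rules than this.

```python
import re

EM_DASH = "\u2014"  # written as an escape so the character is unambiguous


def autocorrect_double_hyphen(text):
    """Mimic the simplest autocorrect rule: '--' becomes an em dash."""
    # Match exactly two hyphens that are not part of a longer run,
    # so '---' and longer sequences are left alone.
    return re.sub(r"(?<!-)--(?!-)", EM_DASH, text)


def strip_em_dashes(text):
    """Replace each em dash, plus any whitespace around it, with ' - '."""
    return re.sub(r"\s*" + EM_DASH + r"\s*", " - ", text)


if __name__ == "__main__":
    typed = "I went to work--but forgot to put on pants"
    corrected = autocorrect_double_hyphen(typed)
    print(corrected)
    print(strip_em_dashes(corrected))
```

Two short regular expressions are enough to defeat a detector that looks only at the dash character, which is the whole problem with using it as a marker.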
I'm gonna use it more thanks to this tip. Thanks!
I don't care if people or robots think I'm a robot.
I'd be suspicious of people doing their writing in Word and copying it over into random comment fields, too.
> And the "slang version" of an em-dash is "I went to work--but forgot to put on pants", not "I went to work - but forgot to put on pants".
The fun thing about slang is that different groups have different slangs! I use the latter pretty regularly, but have never done the former.
> BTW, "humans almost always tend to use" is very poor writing--pick one or the other between "almost always" and "tend to".
Nah.
> It wouldn't be a bad thing if LLMs helped increase human literacy,
Where "literacy" is defined as strictly following arbitrary rules without any concern for whether it actually helps people read it?
And, on the assumption that those rules actually are meaningful, wouldn't you rather have people learn them for themselves?
Sigh.
I’m not the person you asked, but I do.
> the proper emdash—a double long dash with no spaces around it
The spaces around it depend on the style guide; it is not universal that they should not exist.
> That's because most keyboards don't have an emdash key
Nor do they have keys for proper quotes and apostrophes or interrobangs, yet it doesn’t stop people from using them. The keys don’t need to exist.
> That's what makes it such a good giveaway.
It’s not. It might be one signal but it is far from sufficient.
> I'm happy to be told that I'm wrong, and that you do actually use the proper double long dash in your writing
I do use the proper em-dash in my writing—and many other characters too—and my HN history is ample proof. I explained at length in another comment how I insert the characters, plus how simple it is if you use any Apple OS.
Both make sense, to a degree. On the one hand you can argue that the em-dash—being longer—should require an extra key, but on the other hand it has more uses so it should not have the extra key to be more accessible.
I reject everything else about that poorly reasoned "suspicious" response as well.
I never use hyphens where em dashes would be correct.
I do have issues determining when a two-word phrase should or shouldn't be hyphenated. It surely doesn't help that I grew up in a bilingual English/German household, so that my first instinct is often to reject either option, and fully concatenate the two words instead.
(Whether that last comma is appropriate opens a whole other set of punctuation issues ... and yes, I do tend to deliberately misuse ellipses for effect.)
I would argue that LLMs overuse the emdash more because they overuse specific rhetorical devices, e.g. antithesis, than because they are being too correct about punctuation.
Also you can ctrl-z immediately after an autocorrect to undo it.
I do them without surrounding spaces, because that's... how you're supposed to use them, and it's also less typing.
They also used to be a really good Shibboleth to tell if someone was using a Mac—the key combo on there is easy, and also easy to remember, so Mac users were far more likely than the median to employ em-dashes. It wasn't a sure tell, but it was pretty reliable.
It took centuries for the written word to acquire spaces between words, and then the US decided to jam them back together again.
Curious why folk are using two hyphens "--" instead of en-dash.
Been using shift+option+hyphen to make and use em-dashes (sans spaces) since at least 2005, when I got my first publishing job and also started blogging (so writing a ton more). I also use option+hyphen (en-dash) for date and number ranges. In my experience, ChatGPT consistently adds spaces around both.
So... that's just to say that people who are exposed to the sorts of can't-unsee-it-now typesetting OCD that LaTeX and various popular extension packages within that ecosystem expose can learn to write "--" as en-dash.
It's sort of like being unable to return to the blissful state of not being hyperaware that Arial and Helvetica are different.