Show HN: Hacker News em dash user leaderboard pre-ChatGPT

submitted by tkgall+(OP) on 2025-08-30 03:40:23 | 377 points 266 comments

The use of the em dash (—) now raises suspicions that a text might have been AI-generated. Inspired by a suggestion from dang [1], I created a leaderboard of HN users according to how many of their posts before November 30, 2022—that is, before the release of ChatGPT—contained em dashes. Dang himself comes in number 2—by a very slim margin.

Credit to Claude Code for showing me how to search the HN database through Google BigQuery and for writing the HTML for the leaderboard.

[1] https://news.ycombinator.com/item?id=45053933


◧◩
10. dang+B2[view] [source] [discussion] 2025-08-30 04:16:37
>>userbi+p2
I'm only #2 but all mine are guaranteed hand-made, done this way: >>45071823
11. dang+D2[view] [source] 2025-08-30 04:17:05
>>tkgall+(OP)
There's also >>27787448
◧◩
23. tkgall+k4[view] [source] [discussion] 2025-08-30 04:53:30
>>IAmGra+83
As mentioned in the thread that included dang’s suggestion [1], examples of one’s use of em dashes timestamped before ChatGPT could be used as a defense if one is accused, on the basis of em dashes, of having written with AI.

Whether this is interesting or not, well…

[1] >>45046883

58. astahl+1d[view] [source] 2025-08-30 07:04:31
>>tkgall+(OP)
I started using em dashes during my academic career, after my advisor pointed out the subtle differences to me. Since then I have liked the em dash and used it a lot. In LaTeX it is easy to produce; just keep the spacing rules in mind. The Punctuation Guide is a nice reference on it: https://www.thepunctuationguide.com/
◧◩◪◨
61. notpus+jd[view] [source] [discussion] 2025-08-30 07:08:53
>>machin+f6
You can install a custom layout on Windows, like the one I made: https://typo.ale.sh/
69. chrism+xe[view] [source] 2025-08-30 07:24:02
>>tkgall+(OP)
As #10 on this list, here’s how I do it on my laptop.

I remap a key to the right of Space to Compose, and add various custom sequences. Before long, I was completely comfortably and casually typing dashes and curly quotes and more, and in fact it takes conscious effort for me to limit myself to ASCII when typing prose. (Writing code, writing *, /, -, ' and " is easy. But writing prose, I genuinely will write ×, ÷ if it feels the right one in that place, −, ‘/’ and “/”.)

On one previous laptop keyboard I mapped Menu, on my current one RAlt is more suitable.

When on Windows, I use WinCompose. On Linux, I used to just use it bare, which had advantages and disadvantages—apps implement a Compose key inconsistently, some messing things up related to includes and some handling overlapping sequences differently. More recently I wanted to be able to type Telugu and installed fcitx5 which is no longer mostly broken under Wayland like it was last time I tried, so now fcitx5 is handling the Compose sequences across the entire system, and working more consistently. Also I can use Ctrl+Alt+Shift+U and get a popup where I can search Unicode by code or description. Now if only that pesky popup would handle Shift+Space and Ctrl+Backspace itself rather than letting them fall through to the parent…

In my ~/.config/sway/config:

  input * {
      xkb_options "caps:backspace,compose:ralt"
  }
(caps:backspace isn’t entirely relevant here, but it’s on the same line and I choose to mention it. When people are remapping Caps Lock, I’ve never understood why so many seem to choose to make it Escape. Just extend the left hand and slap the corner of the keyboard with the ring finger, it’s not a huge movement and is easy to reach and return. Backspace, however, tends to be needed at least as often (and yes, I say that despite using Vim), and is much harder to hit. In my mind, a far better candidate for shifting to that prime real estate.)

For my ~/.XCompose, I start with the defaults and one good set of additions, https://raw.githubusercontent.com/kragen/xcompose/master/dot...:

  include "/usr/share/X11/locale/en_US.UTF-8/Compose"
  include "/home/chris/.XCompose-kragen"
Then I add all kinds of additions. Lots of fine typography stuff like zero-width space and non-joiner, narrow no-break space, thin space… a few more hyphen/dash mappings… and lots of other things like nice emoji sequences, music notation stuff, Greek letters matching Vim digraphs, superscript ordinals (ˢᵗ, ⁿᵈ, ʳᵈ, ᵗʰ), the keyboard shortcut symbols macOS uses (⌘⌃⌥⇧⌫ and another dozen less common ones), control pictures like ␆, and a handful of other things.

When all’s said and done:

• Compose - - - gets me — EM DASH (stock)

• Compose - - . gets me – EN DASH (stock)

• Compose - - = gets me − MINUS SIGN (custom)

• Compose - - w gets me ⸺ TWO EM DASH (custom; w for wide)

• Compose - - W gets me ⸻ THREE EM DASH (custom; W for Wider)

The last two I use occasionally, the other three I use very frequently. I went through a phase of using HYPHEN and SOFT HYPHEN, now I seldom use them.

I also like to write &c. (italic where supported) for et cetera.

For quotation marks, I also use custom mappings:

  <Multi_key> <semicolon> <semicolon>   : "‘"   U2018 # LEFT SINGLE QUOTATION MARK
  <Multi_key> <apostrophe> <apostrophe> : "’"   U2019 # RIGHT SINGLE QUOTATION MARK
  <Multi_key> <colon> <colon>           : "“"   U201c # LEFT DOUBLE QUOTATION MARK
  <Multi_key> <quotedbl> <quotedbl>     : "”"   U201d # RIGHT DOUBLE QUOTATION MARK
Think about how you physically type them, and I reckon these mappings make a lot of sense, very easy to type. Much better than the stock bindings (<' >' <" >") or kragen ones (`Space 'Space `` ''; or 6' 9' 6" 9").

—⁂—

(Oh yeah, that one’s <Multi_key> <h> <r> : "—⁂—".)

Now, I have one question I’d like answered. Overlapping sequences. If you have -> → and <- ← you’re fine, but when you add <-> ↔, I can’t find any way of using the <- sequence any more. Before fcitx5, some apps would ignore one or the other (in ways difficult to explain which I think involved the fact that some definitions came from includes), and some would let you terminate the sequence early and match the shorter one (e.g. Compose < - Enter). Is there some proper solution I’ve missed?
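
The overlap described here is a prefix conflict: `<-` is a complete sequence and also the start of `<->`. A minimal Python sketch of how such conflicts could be detected in a set of sequences follows. The `sequences` mapping and the function name `prefix_conflicts` are illustrative assumptions; nothing here is read from a real .XCompose file or taken from any Compose implementation.

```python
def prefix_conflicts(sequences):
    """Return (shorter, longer) pairs where one key sequence is a strict prefix of another.

    `sequences` maps a key sequence (one character per key press) to its output.
    """
    keys = sorted(sequences)  # a prefix always sorts before the strings it starts
    conflicts = []
    for i, shorter in enumerate(keys):
        for longer in keys[i + 1:]:
            if longer.startswith(shorter):
                conflicts.append((shorter, longer))
    return conflicts


# Hypothetical table mirroring the arrows in the question.
arrows = {"->": "\u2192", "<-": "\u2190", "<->": "\u2194"}
print(prefix_conflicts(arrows))
```

With the three arrow sequences from the question, the only pair reported is `<-` against `<->`. How a given Compose implementation resolves such a pair is not something this sketch answers.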

I have plans for an article on my keyboard arrangements, including sharing a full .XCompose, but I’m going to finish my next major revision to my website first. Because then I’ll be able to draw things instead of just writing.

—⁂—

On mobile, I think I use FUTO keyboard at present, which lets me access most of these things, but not elegantly. I want to make my own keyboard layout that lets me access the good stuff more easily, but I haven’t got to it yet.

Also: anyone want to join me in advocating for completion dictionaries and libraries to replace their ' apostrophes with ’, or at least to support both approaches equally? I’m fed up with not having this stuff, Vim is the only place where it was straightforward to get it about right, and mobile is just a mess.

70. tkgall+Je[view] [source] 2025-08-30 07:26:42
>>tkgall+(OP)
Due to the interest in this project, I created a second, more comprehensive version of the leaderboard:

https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...

This second version was vibe-coded with Codex CLI. I also tried Gemini CLI, but it didn’t work very well. The SQL scripts I ran at BigQuery were by Claude.

I am not a programmer or web designer, so I will leave these pages as they are, warts and all. It was a fun project, though. I never would have attempted something like this pre-vibe-coding.

◧◩
79. Symbio+hg[view] [source] [discussion] 2025-08-30 07:47:21
>>ThatMe+ea
Enable the Compose key and you'll get even more easy symbols, and they're reasonably guessable.

  Compose ` e produces è
          " a produces ä
          v s produces š
          v S produces Š
          a e produces æ
          C = produces €
          l - produces £
          - > produces → 
        ( 1 ) produces ①
          ^ 1 produces ¹
          _ 1 produces ₁
          1 8 produces ⅛
        - - - produces —
        - - . produces –
          . . produces …
          . - produces ·
          | - produces †
          | = produces ‡
          " < produces “
          x x produces ×
          m u produces µ
          > = produces ≥
See /usr/share/X11/locale/en_US.UTF-8/Compose for the list and https://en.wikipedia.org/wiki/Compose_key

I have also configured Shift+Compose to send the code 'dead_greek' using ~/.Xmodmap:

  keycode 135 = Multi_key dead_greek Multi_key Multi_key
Then I can type α, β, γ, Δ, Ε, Ζ easily, although I hardly ever need this nowadays.
◧◩◪◨⬒
83. notpus+0h[view] [source] [discussion] 2025-08-30 07:55:10
>>Moru+bg
Good news! Compose key is available in Linux natively, and for Windows there’s WinCompose by Sam Hocevar: https://wincompose.info/
◧◩◪◨
86. JimDab+Ch[view] [source] [discussion] 2025-08-30 08:01:48
>>iamacy+Ef
iOS 11, released in September 2017, added the Smart Punctuation feature, which included turning a double hyphen into an em dash:

https://daringfireball.net/2018/02/ios_messages_smart_punctu...

96. Symbio+Bj[view] [source] 2025-08-30 08:26:41
>>tkgall+(OP)
Using the HN public dataset in Google BigQuery [0], which I think fits easily in the amount of free queries allowed:

  SELECT 
    EXTRACT(YEAR FROM timestamp) AS year, 
    SUM(CASE WHEN text LIKE '%—%' THEN 1 ELSE 0 END) AS withDash, 
    COUNT(*) AS total, 
    SUM(CASE WHEN text LIKE '%—%' THEN 1 ELSE 0 END) / COUNT(*) AS fraction
  FROM `bigquery-public-data.hacker_news.full` 
    WHERE type = 'comment' 
  GROUP BY year 
  ORDER BY year;

  year with—   total  frac
  2006     0      12 0.000
  2007    13   70858 0.000
  2008   461  247922 0.001
  2009  1497  491034 0.003
  2010  3835  842438 0.005
  2011  4719 1044913 0.005
  2012  5648 1246782 0.005
  2013  7881 1665185 0.005
  2014  8400 1510814 0.006
  2015  9967 1642912 0.006
  2016 12081 2093612 0.006
  2017 14530 2361709 0.006
  2018 19246 2384086 0.008
  2019 23662 2755063 0.009
  2020 27316 3243173 0.008
  2021 32863 3765921 0.009
  2022 34657 4062159 0.009
  2023 36611 4221940 0.009
  2024 32543 3339861 0.010
  2025 30608 2231919 0.014
So there's definitely been an increase.
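
As a quick check on the size of that increase, the fractions can be recomputed from the counts in the table. This is a small Python sketch using only the 2022 and 2025 rows copied from above; the variable names are illustrative.

```python
# (comments containing an em dash, total comments), copied from the table above
counts = {
    2022: (34657, 4062159),
    2025: (30608, 2231919),
}

# Fraction of comments containing an em dash, per year
fractions = {year: with_dash / total for year, (with_dash, total) in counts.items()}

# How many times larger the 2025 fraction is than the 2022 fraction
ratio = fractions[2025] / fractions[2022]

for year, frac in fractions.items():
    print(f"{year}: {frac:.4f}")
print(f"2025 vs 2022: {ratio:.2f}x")
```

On these two rows the 2025 fraction comes out at roughly 1.6 times the 2022 fraction.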

Querying for the users who use "—" most as a proportion of all their comments:

  SELECT
    `by`,
    SUM(CASE WHEN text LIKE '%—%' THEN 1 ELSE 0 END) / COUNT(*) AS fraction,
    COUNT(*) AS total,
    MIN(timestamp) AS minTime,
    MAX(timestamp) AS maxTime
  FROM `bigquery-public-data.hacker_news.full` 
  WHERE 
    type = 'comment' AND 
    timestamp < '2022-11-30' 
  GROUP BY `by`
  HAVING COUNT(*) > 100
  ORDER BY fraction DESC
  LIMIT 250;
zmgsabst uses them the most [1]; westoncb [2], an older account, uses them fourth-most.

[0] https://console.cloud.google.com/marketplace/product/y-combi...

[1] https://news.ycombinator.com/threads?id=zmgsabst

[2] https://news.ycombinator.com/threads?id=westoncb

◧◩◪
106. perihe+Xl[view] [source] [discussion] 2025-08-30 08:54:08
>>Moru+nc
I remember participating in a small thread on how to type an em-dash, on different OS's. It was in March 2023, so before the em-dash meme had started—it was an innocent question then.

https://news.ycombinator.com/item?id=35118338#35118598

117. dns_sn+fp[view] [source] 2025-08-30 09:37:05
>>tkgall+(OP)
Slightly tweaked: a leaderboard of em-dash-containing comments after the ChatGPT release, limited to users who used em dashes in fewer than 1% of their comments before the release and who posted at least 200 comments both before and after it. Data is recent (August 28th).

Of course, this doesn't mean they're using ChatGPT; they could've switched devices or started using em dashes because they felt like it.

  #   user           before_chatgpt after_chatgpt  
  1   fao_           9/1777 (1 %)   36/225 (16 %)
  2   tlogan         1/962 (0 %)    59/399 (15 %)
  3   whynotminot    1/250 (0 %)    36/356 (10 %)
  4   unclebucknasty 13/2566 (1 %)  38/378 (10 %)
  5   iLemming       0/793 (0 %)    61/628 (10 %)
  6   nostrebored    10/1045 (1 %)  32/331 (10 %)
  7   freeone3000    0/2128 (0 %)   74/791 (9 %) 
  8   pdabbadabba    6/932 (1 %)    20/225 (9 %) 
  9   thebooktocome  4/632 (1 %)    18/208 (9 %) 
  10  tnecniv        0/671 (0 %)    34/446 (8 %) 
  11  dkersten       39/5092 (1 %)  24/318 (8 %) 
  12  stared         8/1565 (1 %)   29/392 (7 %) 
  13  ETH_start      3/385 (1 %)    75/1029 (7 %)
  14  tcbawo         2/792 (0 %)    15/218 (7 %) 
  15  jbm            2/406 (0 %)    22/350 (6 %) 
Query [2]:

  WITH by_user AS (
    SELECT
      `by` AS user,
      COUNTIF(text LIKE '%—%') AS match_count,
      COUNT(*) AS total_count,
      (timestamp >= '2022-11-30') AS after_chatgpt
    FROM `bigquery-public-data.hacker_news.full` 
    WHERE type = 'comment'
    GROUP BY user, after_chatgpt
  ),
  combined AS (
    SELECT
      user,
      MAX(IF(NOT after_chatgpt, match_count, 0)) AS match_before_chatgpt,
      MAX(IF(NOT after_chatgpt, total_count, 0)) AS total_before_chatgpt,
      MAX(IF(after_chatgpt, match_count, 0)) AS match_after_chatgpt,
      MAX(IF(after_chatgpt, total_count, 0)) AS total_after_chatgpt,
    FROM by_user
    GROUP BY user
    HAVING total_before_chatgpt >= 200 AND total_after_chatgpt >= 200
  ),
  with_fractions AS (
    SELECT
      *,
      SAFE_DIVIDE(match_before_chatgpt, total_before_chatgpt)  AS fraction_before_chatgpt,
      SAFE_DIVIDE(match_after_chatgpt, total_after_chatgpt) AS fraction_after_chatgpt
    FROM combined
  )
  SELECT
    user,
    FORMAT('%d/%d (%.0f %%)', match_before_chatgpt, total_before_chatgpt, ROUND(fraction_before_chatgpt*100)) AS before_chatgpt,
    FORMAT('%d/%d (%.0f %%)', match_after_chatgpt, total_after_chatgpt, ROUND(fraction_after_chatgpt*100)) AS after_chatgpt
  FROM with_fractions
  WHERE fraction_before_chatgpt < 0.01
  ORDER BY fraction_after_chatgpt DESC
  LIMIT 15
[1] >>45072937

[2] https://console.cloud.google.com/marketplace/product/y-combi...
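
For readers without BigQuery access, the same before/after logic can be sketched in plain Python over in-memory rows. This is an illustration under stated assumptions, not a port of the query: the `(user, timestamp, text)` row shape, the function name `before_after`, and the sample rows are all hypothetical, and the thresholds are parameters so the toy data can use a smaller minimum than 200.

```python
from collections import defaultdict
from datetime import datetime

CUTOFF = datetime(2022, 11, 30)  # same split point as the query
EM_DASH = "\u2014"


def before_after(rows, min_comments=200, max_fraction_before=0.01):
    """rows: iterable of (user, timestamp, text).

    Returns (user, fraction_before, fraction_after) tuples, sorted by
    fraction_after descending, for users passing both filters.
    """
    # stats[user][period] = [match_count, total_count]; period 0 = before, 1 = after
    stats = defaultdict(lambda: [[0, 0], [0, 0]])
    for user, ts, text in rows:
        bucket = stats[user][1 if ts >= CUTOFF else 0]
        bucket[0] += EM_DASH in text
        bucket[1] += 1

    floor = max(1, min_comments)  # at least 1, so the divisions below are safe
    result = []
    for user, ((m_before, n_before), (m_after, n_after)) in stats.items():
        if n_before < floor or n_after < floor:
            continue
        frac_before = m_before / n_before
        frac_after = m_after / n_after
        if frac_before < max_fraction_before:
            result.append((user, frac_before, frac_after))
    return sorted(result, key=lambda r: r[2], reverse=True)


# Hypothetical sample rows, for illustration only.
rows = [
    ("alice", datetime(2020, 1, 1), "plain"),
    ("alice", datetime(2020, 1, 2), "plain too"),
    ("alice", datetime(2024, 1, 1), "now \u2014 dashes"),
    ("alice", datetime(2024, 1, 2), "none"),
    ("bob", datetime(2020, 1, 1), "always \u2014 dashes"),
    ("bob", datetime(2024, 1, 1), "still \u2014 here"),
]
print(before_after(rows, min_comments=1))
```

In the sample, only the hypothetical user `alice` survives: `bob` already used em dashes before the cutoff, so the "fewer than 1% before" filter removes him.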

◧◩
120. dns_sn+8q[view] [source] [discussion] 2025-08-30 09:49:46
>>PUSH_A+5b
HN is burying my comments (thanks!) but here it is: >>45073287
◧◩◪◨
123. JdeBP+fs[view] [source] [discussion] 2025-08-30 10:22:12
>>sebast+Tp
You'll need to delve into history back quite a number of years. (-:

* >>18439869

158. sjs382+IQ[view] [source] 2025-08-30 14:30:08
>>tkgall+(OP)
You can count your own with this snippet. Just replace my username with your own. My count before this comment was 46.

  curl -s "https://hn.algolia.com/api/v1/search?tags=comment,author_sjs382&hitsPerPage=10000" \
    | jq -r '.hits[].comment_text' \
    | grep -o "—" \
    | wc -l
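
Note that `grep -o` piped into `wc -l` counts every em dash occurrence, so a comment with three dashes adds three to the total. A minimal Python sketch of the difference between occurrences and comments containing at least one em dash follows. The function name `count_em_dashes` and the sample strings are illustrative, and `html.unescape` is applied on the assumption that the text may arrive HTML-escaped.

```python
import html

EM_DASH = "\u2014"


def count_em_dashes(texts):
    """Return (total occurrences, number of texts containing at least one em dash)."""
    occurrences = 0
    containing = 0
    for raw in texts:
        # Tolerate missing text and decode entities such as &mdash; or &#x2014;
        text = html.unescape(raw or "")
        n = text.count(EM_DASH)
        occurrences += n
        containing += n > 0
    return occurrences, containing


# Hypothetical comment texts, for illustration only.
sample = ["a \u2014 b \u2014 c", "no dash here", "x &mdash; y", None]
print(count_em_dashes(sample))
```

The second number is the one comparable to the leaderboard figures, which count comments rather than individual dashes.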
◧◩
159. svat+YQ[view] [source] [discussion] 2025-08-30 14:31:29
>>latexr+b4
Try it here (you may have to create a Google Cloud project, but you don't have to enable billing or start the free trial):

https://console.cloud.google.com/bigquery?p=bigquery-public-...

Click on the `+` (white over blue background) in the tab bar at the top that says "SQL query" on popup, and type the following (I use the GoogleSQL pipe syntax (https://cloud.google.com/bigquery/docs/reference/standard-sq... / >>41347188 ) below, but you can also use standard SQL if you prefer):

    FROM `bigquery-public-data.hacker_news.full` 
    |> WHERE type = 'comment' AND timestamp < '2022-11-30'
    |> AGGREGATE COUNT(*) AS total, COUNTIF(text LIKE '%—%') AS with_em GROUP BY `by`
    |> EXTEND with_em / total AS fraction_with_em
    |> ORDER BY fraction_with_em DESC
    |> WHERE total > 100 AND fraction_with_em > 0.1
(I'm in place 47 of the 516 results, with 0.29 of my comments (258 of 875) having an em dash in them.)

Edit: As you also asked about timestamps:

    FROM `bigquery-public-data.hacker_news.full`
    |> WHERE type = 'comment' AND timestamp < '2022-11-30'
    |> EXTEND text LIKE '%—%' AS has_em
    |> AGGREGATE
        COUNT(*) AS total,
        COUNTIF(has_em) AS with_em,
        MIN(timestamp) AS first_comment_timestamp,
        MIN(IF(has_em, timestamp, NULL)) AS first_em_timestamp,
        TIMESTAMP_SECONDS(CAST(AVG(time) AS INT64)) AS avg_comment_timestamp,
        TIMESTAMP_SECONDS(CAST(AVG(IF(has_em, time, NULL)) AS INT64)) AS avg_em_timestamp,
      GROUP BY `by`
    |> EXTEND with_em / total AS fraction_with_em
    |> ORDER BY fraction_with_em DESC
    |> WHERE total > 100 AND fraction_with_em > 0.1
For most people the average timestamp is just the midpoint between when they started posting (with em dashes) and the cutoff date of 2022-11-30; the top-place user zmgsabst stands out for having started only in late January 2022.
◧◩◪◨
161. weston+pR[view] [source] [discussion] 2025-08-30 14:35:59
>>Symbio+el
I actually tweeted like a month ago that I was the reason LLMs use em dashes so much lol: https://x.com/Westoncb/status/1961802304698671407
◧◩
168. Rendel+JW[view] [source] [discussion] 2025-08-30 15:14:57
>>chatma+sR
Me too, I had 11 hits for the `en`, 7 for "--", and only 4 for the `em` using the curl script:

>>45074990

Also see the relevant XKCD:

https://www.xkcd.com/3126/

◧◩
181. Adlopa+v51[view] [source] [discussion] 2025-08-30 16:30:12
>>bhicke+wv
The style in the UK – for professional writing, at least – has generally been ‘word en-dash word’. My understanding was that ‘wordem-dashword’ was a US style thing and I don’t think I’ve ever seen it used in a UK publication. (I suspect few non ‘writers’ know the difference between an en-dash and a hyphen and some publications also seem to be relaxed about it.)

So it was no surprise to me that ChatGPT used em dashes (I assume a US bias to its training data) and I immediately told it to stop using them (along with Title Case titles). (Source: professional writer for 30 years.)

https://www.theguardian.com/guardian-style-guide-d

188. dang+ew1[view] [source] 2025-08-30 20:02:59
>>tkgall+(OP)
v1 (the submitted URL) was https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo....

We've replaced it now with v2, for more complex analytical em dash explorations :) - see >>45075379 and >>45072635 .

◧◩
191. dang+Gx1[view] [source] [discussion] 2025-08-30 20:15:37
>>firest+JA
Generated comments and bots have never been allowed on HN (other than https://news.ycombinator.com/user?id=whoishiring of course), since long before ChatGPT. I've written about this a few times, e.g.:

>>33950747 (Dec 2022)

>>24189762 (Aug 2020)

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

Whether to add it to the formal guidelines (https://news.ycombinator.com/newsguidelines.html) is a different question, of course. I'm reluctant to do that, partly because it arguably follows from what's there, partly because this is still a pretty fuzzy area that is rapidly evolving, and partly because the community is currently handling this issue pretty well. This may change of course.

One important thing to know: plenty of things not allowed in HN don't show up explicitly in the site guidelines. They are in no way a comprehensive list!

◧◩◪
203. nullc+YE1[view] [source] [discussion] 2025-08-30 21:19:35
>>manana+NX
I started using them in 2008 or so (I think) when I created a custom keymap to add Greek characters and nbsp. I stopped using them after macOS changed to make them automatically, because their use then started to be an obvious sign of being an Apple user (see also: https://www.jstor.org/stable/2096459).

Someone recently created some long list of my reddit comments using them as a farcical claim of having used ChatGPT to author many dozens of 2010 comments.

◧◩◪◨⬒⬓⬔⧯
210. andrew+rK1[view] [source] [discussion] 2025-08-30 22:12:33
>>d1sxey+BM
I configured my Markdown renderer to replace ` -- ` with " — ". Hopefully those narrow spaces make it through HN's rendering — it's much easier when your tooling can do the job for you.

https://github.com/andrewaylett/aylett.co.uk/blob/d338d35a3d...
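
The general idea of such a substitution can be sketched with a regular expression. This is a hedged Python illustration, not the renderer's actual code: the function name `smarten_dashes` is made up, and the choice of U+2009 THIN SPACE for the "narrow spaces" is an assumption.

```python
import re

THIN_SPACE = "\u2009"  # assumption: the narrow spaces are U+2009 THIN SPACE
EM_DASH = "\u2014"

# A double hyphen with exactly one space on each side, between non-space characters
_DASH_PATTERN = re.compile(r"(?<=\S) -- (?=\S)")


def smarten_dashes(text):
    """Replace ' -- ' between words with a thin-spaced em dash."""
    return _DASH_PATTERN.sub(f"{THIN_SPACE}{EM_DASH}{THIN_SPACE}", text)


print(repr(smarten_dashes("much easier -- really")))
```

The lookarounds leave command-line flags such as `ls --all` and unspaced `a--b` untouched. A real renderer would also need to skip code spans, which this sketch does not attempt.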

◧◩◪◨
213. BlueTe+aN1[view] [source] [discussion] 2025-08-30 22:42:04
>>Chris_+Vb
See also :

https://norme-azerty.fr/en/

(Also provides access to the Greek alphabet.)

◧◩
215. JdeBP+cO1[view] [source] [discussion] 2025-08-30 22:50:52
>>Andrew+Tz1
In this particular case, the options for mobile 'phone keyboards are greater rather than fewer. The em dash is a first class citizen on the "writer" layouts in ThumbKey, for example.

* https://github.com/dessalines/thumb-key

◧◩◪◨⬒
254. card_z+tp2[view] [source] [discussion] 2025-08-31 07:00:33
>>JKCalh+oZ1
Not at all, no. Here's a few historical examples:

1903 edition of The Wizard of Oz — https://archive.org/details/newwizardofoz00baum/page/2/mode/...

A page from Life magazine, 1894 — https://archive.org/details/sim_life_1894-08-23_24_608/page/...

The Illustrated London News, 1843 — https://archive.org/details/illustrated-london-news-v002-184...

The em dash pretty much just joins the two glyphs together. It's supposed to look that way.
