zlacker

[return to "Ask HN: Should HN ban ChatGPT/generated responses?"]
1. pjmorr+J4[view] [source] 2022-12-11 18:33:45
>>djtrip+(OP)
Dr. Casey Fiesler recently tweeted:

> The @huggingface GPT detector works very well on ChatGPT-created text. I ran 5 student essays and 5 ChatGPT essays for the same prompt through it, and it was correct every time with >99.9% confidence.

How about adding a %human/%GPT statistic to posts and comments?

2. carboc+8n1[view] [source] 2022-12-12 04:33:34
>>pjmorr+J4
I would argue that this test isn't particularly informative. Given 5 attempts and 5 successes, even though the point estimate of accuracy is 1, the 95% CI ranges from 0.48 to 1:

    > binom.test(5,5,0.5)

     Exact binomial test

    data:  5 and 5
    number of successes = 5, number of trials = 5, p-value = 0.0625
    alternative hypothesis: true probability of success is not equal to 0.5
    95 percent confidence interval:
     0.4781762 1.0000000

In other words, we don't have enough data in that small sample to reject the possibility that the model is 50% accurate, much less 99.9% accurate.
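For the all-successes case, the Clopper-Pearson interval that `binom.test` computes has a closed-form lower bound, (α/2)^(1/n). Here's a quick cross-check in Python (my own sketch, not from the R session above; the function name is mine):

```python
import math

def clopper_pearson_lower_all_successes(n, alpha=0.05):
    """Lower bound of the exact two-sided (1 - alpha) CI when all n
    trials succeed. For k = n the Clopper-Pearson lower bound solves
    p**n = alpha/2, giving p = (alpha/2)**(1/n); the upper bound is 1."""
    return (alpha / 2) ** (1.0 / n)

# ~0.478 for n = 5, matching R's reported interval of 0.4781762 to 1
print(clopper_pearson_lower_all_successes(5))
```

This also shows why a perfect score on 5 trials is so weak: the lower bound grows only slowly in n, since it is a fifth root of 0.025 here.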
3. virapt+0p1[view] [source] 2022-12-12 04:54:14
>>carboc+8n1
I think the tweet was claiming something else: each classification comes with a score for how confident the model is in its answer, and in those cases the scores were all above 99.9%.

See the app: https://huggingface.co/openai-detector/ - it reports the result as a % chance that the text is genuine or chatbot-generated.

4. carboc+1q2[view] [source] 2022-12-12 14:27:54
>>virapt+0p1
With 5 samples, we have no way to assess whether the app’s 99.9% self-assessment is remotely well calibrated. (As noted above, 5/5 is also consistent with a model that is right 50% of the time.)
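To put a number on that (my own back-of-envelope, not from the thread): using the same closed-form Clopper-Pearson bound as above, you can ask how many consecutive correct classifications it would take before the 95% lower bound on accuracy even reaches 0.999:

```python
import math

# Back-of-envelope: smallest n such that an all-correct run of n trials
# gives a Clopper-Pearson 95% lower bound of at least 0.999.
# For k = n the lower bound is (0.025)**(1/n), so we need
#   (0.025)**(1/n) >= 0.999  =>  n >= ln(0.025) / ln(0.999)
n_needed = math.ceil(math.log(0.025) / math.log(0.999))
print(n_needed)  # 3688 error-free classifications
```

So even thousands of flawless trials would be needed before the data, on its own, supports a 99.9% accuracy claim, let alone tells you whether the model's self-reported confidence is calibrated.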