zlacker

[return to "Tracking the Fake GitHub Star Black Market"]
1. perihe+ca[view] [source] 2023-03-18 09:48:20
>>kaeruc+(OP)
Goodhart's law: if you rely on a social signal to tell you what's good, you'll break that signal.

Very soon, the domain of bullshit will extend to actual text. We'll be able to buy HN comments by the thousand -- expertly wordsmithed, lucid AI comments -- and have them say "this GitHub repo is the best" or "this startup is the real deal". Won't that be fun?

◧◩
2. Alex39+Dp[view] [source] 2023-03-18 12:39:54
>>perihe+ca
> We'll be able to buy HN comments by the thousand -- expertly wordsmithed, lucid AI comments

You're forgetting the millions of additional comments that will be written by humans to trick the AI into promoting their content.

Even worse: currently, if you ask ChatGPT to write you some code, it will make up an API endpoint that doesn't exist, then make up a URL where you can supposedly register for an API key. People are already registering these domains and parking fake sites on them to scam people. ChatGPT is creating a huge market for fake companies that match the fake information it generates.

The biggest risk may not be people using AI-generated comments to promote their own repos, but rather registering new repos to match the fake ones that the AI is already promoting.

◧◩◪
3. permo-+Vq[view] [source] 2023-03-18 12:53:03
>>Alex39+Dp
I feel like you’re overstating this as a long-term issue. Sure, it’s a problem now, but realistically, how long before code hallucinations are patched out?
◧◩◪◨
4. trippi+SI[view] [source] 2023-03-18 15:22:46
>>permo-+Vq
An aside: what do people mean when they say “hallucinations” generally? Is it something more refined than just “wrong”?

As far as I can tell, most people just use it as shorthand for “wow, that was weird”, but there’s no difference as far as the model is concerned, is there?

◧◩◪◨⬒
5. mlhpdx+AV[view] [source] 2023-03-18 16:40:50
>>trippi+SI
Most people don’t understand the technology and maths at play in these systems. That’s normal, as is reaching for familiar words that make that gap feel less daunting. If you have a genuine interest in understanding how and why errant generated content emerges, it will take some study. There isn’t (in my opinion) a quick, helpful answer.
◧◩◪◨⬒⬓
6. trippi+252[view] [source] 2023-03-19 01:24:40
>>mlhpdx+AV
I genuinely want to understand whether there’s a meaningful difference between non-hallucinatory and hallucinatory content generation, other than “real-world correctness”.
◧◩◪◨⬒⬓⬔
7. mlhpdx+BR8[view] [source] 2023-03-21 00:51:23
>>trippi+252
I’m far from an expert, but as I understand it the reference point isn’t so much the “real world” as the training data: a hallucination is the model generating a strongly weighted association that isn’t in the data, and perhaps shouldn’t exist at all. I’d prefer a word like “superstition”; it seems more relatable.
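
A toy sketch of what I mean (all made-up data, plain numpy, nothing like a real LLM): compress an invented association table to a lower rank, and the model fills in a strongly weighted association that was never in the data.

    import numpy as np

    # Invented "training data": rows are animals, columns are traits.
    # 1 = association observed in the data, 0 = never observed.
    # The ("penguin", "flies") cell is deliberately 0.
    seen = np.array([
        [1.0, 1.0, 1.0],  # sparrow: feathers, lays_eggs, flies
        [1.0, 1.0, 1.0],  # eagle:   feathers, lays_eggs, flies
        [1.0, 1.0, 0.0],  # penguin: feathers, lays_eggs, (no flies)
    ])

    # Compress to rank 1, forcing the model to generalize.
    U, s, Vt = np.linalg.svd(seen)
    model = s[0] * np.outer(U[:, 0], Vt[0])

    print(model.round(2))
    # The (penguin, flies) cell comes back ~0.58, clearly positive,
    # even though that association was never observed: the compressed
    # model pattern-matches penguins to the other birds. A tiny
    # "superstition" relative to its training data.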