Cursor IDE support hallucinates lockout policy, causes user cancellations

>>scared+(OP)
There is a certain amount of irony that people try really hard to say that hallucinations are not a big problem anymore and then a company that would benefit from that narrative gets directly hurt by it.

Which of course they are going to try to brush it all away. Better than admitting that this problem very much still exists and isn’t going away anytime soon.

>>nerdjo+A84
https://www.anthropic.com/research/tracing-thoughts-language...

The section about hallucinations is deeply relevant.

Namely, Claude sometimes provides a plausible but incorrect chain-of-thought reasoning when its “true” computational path isn’t available. The model genuinely believes it’s giving a correct reasoning chain, but the interpretability microscope reveals it is constructing symbolic arguments backward from a conclusion.

https://en.wikipedia.org/wiki/On_Bullshit

This empirically confirms the “theory of bullshit” as a category distinct from lying. It suggests that “truth” emerges secondarily to symbolic coherence and plausibility.

This means knowledge itself is fundamentally symbolic-social, not merely correspondence to external fact.

Knowledge emerges from symbolic coherence, linguistic agreement, and social plausibility rather than purely from logical coherence or factual correctness.

>>lyngui+Y75
I haven’t used Cursor yet. Some colleagues have and seemed happy. I’ve had GitHub Copilot on for what feels like a couple years, a few days ago VS Code was extended to provide an agentic workflow, MCP, bring-your-own-key, it interprets instructions in a codebase. But the UX and the outputs are bad in over 3/4 of cases. It’s a nuisance to me. It injects bad code even though it has the full context. Is Cursor genuinely any better?

To me it feels like people that benefit from or at least enjoy that sort of assistance and I solve vastly different problems and code very differently.

I’ve done exhausting code reviews on juniors’ and middles’ PRs but what I’ve been feeling lately is that I’m reviewing changes introduced by a very naive poster. It doesn’t even type-check. Regardless of whether it’s Claude 3.7, o1, o3-mini, or a few models from Hugging Face.

I don’t understand how people find that useful. Yesterday I literally wasted half an hour for a test suite setup a colleague of mine introduced to the codebase that wasn’t good, and I tried delegating that fix to several of the Copilot models. All of them missed the point, some even introduced security vulnerabilities in the process invalidating JWT validation, I tried “vide coding” it till it works, until I gave up in frustration and just used an ordinary search engine, which led me to the docs, in which I immediately found the right knob. I reverted all that crap and did the simple and correct thing. So my conclusion was simple: vibe coding and LLMs made the codebase unnecessarily more complicated and wasted my time. How on earth do people code whole apps with that?

zlacker