Lately it's gotten entirely flaky: chats will just stop working, simply ignoring new prompts or otherwise going unresponsive. I wondered if maybe I'm pissing them off somehow, like the author of this article did.
Even worse, Claude seemingly has no real support channel. You get their AI bot, and that's about it. Eventually it will offer to put you through to a human, then tell you not to wait: they'll contact you via email. After several attempts, that email has never come.
I'm assuming at this point that any real support is all smoke and mirrors, meaning I'm paying for a service that has become almost unusable, with absolutely NO means of support to fix it. I guess for all the cool tech, customer support is something they have not figured out.
I love Claude; it's an amazing tool. But when it implodes on itself to the point that you actually require some out-of-the-box support, there is NONE to be had. Grok seems the only real alternative, and over my dead body would I use anything from "him".
They're growing too fast and the company is bursting at the seams. If there's ever a correction in the AI industry, I think that will all quickly come back to bite them. It's like Claude Code is vibe-operating the entire company.
I had this start happening around August/September and by December or so I chose to cancel my subscription.
I haven't noticed this at work so I'm not sure if they're prioritizing certain seats or how that works.
Max plan, and on average I use it ten times a day? Yeah, I'm canceling. Guess they don't need me.
What have you found it useful for? I'm curious about how people without software backgrounds work with it to build software.
And kudos for refusing to use anything from the guy who's OK with his platform proliferating generated CSAM.
Lately I've been using it to build out my MCP server, which I now call endpoint-mcp-server (coming soon to a GitHub near you). I've modularized it with plugins, added lots more features, and built a more versatile Qt6 GUI with advanced workspace panels and widgets.
At least I was until Claude started crapping the bed lately.
So yeah, Codex kinda sucks for me. Maybe I'll try Mistral.
Isn’t the future of support a series of automations and LLMs? I mean, have you considered that the AI bot is their tech support, and that it’s about to be everyone else’s approach too?
Claude iOS app, Claude on the web (including Claude Code on the web) and Claude Code are some of the buggiest tools I have ever had to use on a daily basis. I’m including monstrosities like Altium and Solidworks and Vivado in the mix - software that actually does real shit constrained by the laws of physics rather than slinging basic JSON and strings around over HTTP.
It's an utter embarrassment to the field of software engineering that they can't even hit a single nine of reliability in their consumer-facing products, and if it weren't for the advantage Opus has over other models, they'd be dead in the water.
This lets me apply my IT and business experience toward making bespoke code for my own uses so far, such as firewall config parsers specialized for wacky vendor CLIs, and filling gaps in automation where there's no good vendor solution for a given task. I started building my MCP server to let agents interact with the outside world - invoking automation for firewalls, switches, routers, servers, ideally even home automation - and I've been successful so far, still without having to know any code.
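For a sense of what that looks like under the hood, here's a stripped-down sketch of a single tool using the official mcp Python SDK's FastMCP; the playbook-runner tool is a made-up example, not my actual server:

    # Minimal MCP server sketch (official `mcp` Python SDK / FastMCP).
    # The tool below is illustrative only - a wrapper that lets an agent
    # kick off an Ansible playbook and read the result.
    import subprocess
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("endpoint-mcp-server")

    @mcp.tool()
    def run_playbook(playbook: str) -> str:
        """Run an Ansible playbook and return its output for the agent."""
        result = subprocess.run(["ansible-playbook", playbook],
                                capture_output=True, text=True)
        return result.stdout + result.stderr

    if __name__ == "__main__":
        mcp.run()  # serve the tool over stdio so the agent can call it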
I'm sure a real dev will find it to be a giant pile of crap in the end, but I've been doing things like applying security frameworks and code style guidelines (using ruff) to keep it from going too wonky, and actually working it up to a state where I can call it a 1.0. I plan to run a full audit cycle against it - security audits, performance testing, whatever else I can - to avoid it being entirely craptastic. If nothing else, it works for me, so others can take it or leave it once I put it out there.
Even NOT being a developer, I understand the need for best practices, and after years of watching a lot of really terrible developers adjacent to me make a living, I think I can offer a thing or two toward avoiding that.
I enjoy programming, but it isn't my main interest and I can't justify the time required to get competent, so I let Claude and ChatGPT pick up my slack.
https://github.com/anthropics/claude-code/issues
Codex has fewer, but they also had quite a few outages in December. And I don't think Codex is as popular as Claude Code, though that could change.
(On the flip side, Codex seems to be SO efficient with tokens that it can be hard to understand its answers sometimes; it rarely includes files without you doing it manually, and it often takes quite a few attempts to get the right answer because it's so strict about what it does each iteration. But I never run out of quota!)
The advice I got when scouring the internets was primarily to close everything except the file you're editing and maybe one reference file (before asking Claude anything). For added effect, add something like 'Only use the currently open file. Do not read or reference any other files' to the prompt.
I don't have any hard facts to back this up, but I'm sure going to try it myself tomorrow (when my weekly cap is lifted ...).
I've run out of quota on my Pro plan so many times in the past 2-3 weeks. This seems to be a recent occurrence. And I'm not even that active. Just one project, execute in Plan > Develop > Test mode, just one terminal. That's it. I keep getting a quota reset every few hours.
What's happening @Anthropic ?? Anybody here who can answer??
In simple terms, when you have a conversation with an AI and you type a new line and hit enter, the client sends the entire conversation to the LLM. It has always worked this way, and it's how "reasoning tokens" were first realized: you allow a client to "edit" the context, and the client deletes the hallucination, says "Wait..." at the end of the context, and hits enter.
The LLM is tricked into thinking it's confused/wrong/unsure, and "reasons" more about that particular thing.
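In code, the whole loop is roughly this (endpoint and payload shape are made up, mimicking the common chat-completions style, and it assumes an API that will continue a prefilled assistant turn, as some do):

    # Sketch only: the client re-sends the ENTIRE conversation every turn.
    import requests

    messages = [{"role": "user", "content": "How many r's are in strawberry?"}]

    def send(messages):
        r = requests.post("https://api.example.com/v1/chat/completions",
                          json={"model": "some-model", "messages": messages})
        return r.json()["choices"][0]["message"]["content"]

    reply = send(messages)

    # The "reasoning" trick: drop the suspect tail of the answer, append
    # "Wait...", and re-send; the model reads its own doubt and re-reasons.
    messages.append({"role": "assistant",
                     "content": reply.rsplit(".", 1)[0] + ". Wait..."})
    reply = send(messages)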
Quota's basically a count of tokens, so if a new CC session starts with that relatively full, that could explain what's going on. Also, what language is this project in? If it's something noisy that uses up many tokens fast, even if you're using agents to preserve the context window in the main CC, those tokens still count against your quota so you'd still be hitting it awkwardly fast.
The only way Anthropic has two or three nines is in read-only mode, but that'd be like measuring AWS by console uptime while ignoring the actual control plane.
They get to see (unless you've opted out) your context, ideas, source code, etc., and in return you give them $220 and they give you back "out of tokens".
This fixed subscription plan with vaguely specified quotas looks like a way to extract extra money from users who pay $200 and don't use that much value, while preventing other users from going over $200. I understand it might work at scale, but it just feels a bit unfair all around?
It's also a way to improve performance on the things their customers care about. I'm not paying Anthropic more than I do for car insurance every month because I want to pinch ~~pennies~~ tokens; I do it because I can finally offload a ton of tedious work onto Opus 4.5 without hand-holding it and reviewing every line.
The subscription is already such a great value over paying by the token, they've got plenty of space to find the right balance.
I've been using CC until I run out of credits and then switch to Cursor (my employer pays for both). I prefer Claude but I never hit any limits in Cursor.
The main one is that it's ~3 times slower. That's the real dealbreaker, not quality. I can guarantee that if tomorrow we woke up and gpt-5.2-codex became the same speed as 4.5-opus without a change in quality, a huge number of people - not HNers, but everyone price-sensitive - would switch to Codex because it's so much cheaper per usage.
The second one is that it's a little worse at using tools, though 5.2-codex is pretty good at it.
The third is that its knowledge cutoff is far enough behind both Opus 4.5's and Gemini 3's that it's noticeable and annoying when you're working with more recent libraries. This is irrelevant if you aren't.
For Gemini 3 Pro, it's the same first two reasons as Codex, though the tool-calling gap is much bigger.
Mistral is of course so far removed in quality that it's apples to oranges.
It's the most commented issue on their GitHub and it's basically ignored by Anthropic. Title mentions Max, but commenters report it for other plans too.
lol
I work for hours and it never says anything. No clue why you’re hitting this.
$230 pro max.
I'm not really sure LLMs have made it worse. They also haven't made it better, but it was already so awful that it just feels like a different flavor of awful.
I'm 99.9999% sure Gemini has a dynamic scaling system that routes you to smaller models when it's overloaded, and that seems to be when it will still occasionally do things like tell you it edited some files without actually presenting the changes, or go off on other strange tangents.
We're about to get Claude Code for work and I'm sad about it. There are more efficient ways to do the job.
This happens to me more often than not, both in Claude Desktop and on the web. It seems the longer the conversation goes, the more likely it is to happen. Frustrating.
The best you can get today with consumer hardware is something like devstral2-small (24B), qwen-coder-30b (underwhelming), or glm-4.7-flash (promising but buggy atm). And you'll still need a beefy workstation, ~$5-10k.
If you want open SotA, you have to get hardware worth $80-100k to run the big boys (dsv3.2, glm4.7, minimax2.1, devstral2-123b, etc.). That's OK for small office setups, but out of range for most local deployments (especially considering the workstations need lots of power if you go 8x GPUs, even with something like 8x 6000 Pro @ 300W).
OpenCode is incentivized to make a good product that uses your token budget efficiently since it allows you to seamlessly switch between different models.
Anthropic as a model provider on the other hand, is incentivized to exhaust your token budget to keep you hooked. You'll be forced to wait when your usage limits are reached, or pay up for a higher plan if you can't wait to get your fix.
CC, specifically Opus 4.5, is an incredible tool, but Anthropic is handling its distribution the way a drug dealer would.
I've done RL training on small local models, and there's a strong correlation between length of response and accuracy. The more they churn tokens, the better the end result gets.
I actually think the hyperscalers would prefer to serve shorter answers. A token generated at 1k context length is cheaper to serve than one at 10k context, and way, way cheaper than one at 100k context.
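Back-of-the-envelope: every new token has to read the whole KV cache, which grows linearly with context, so with made-up but plausible model dimensions:

    # Rough KV-cache bytes read per generated token (hypothetical dense model).
    layers, kv_heads, head_dim, dtype_bytes = 80, 8, 128, 2  # fp16, GQA
    for ctx in (1_000, 10_000, 100_000):
        kv = 2 * layers * kv_heads * head_dim * dtype_bytes * ctx  # K and V
        print(f"{ctx:>7} ctx: {kv / 1e9:.2f} GB read per new token")

So per token served, 100k of context is roughly 100x the memory traffic of 1k.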
As someone with 2x RTX Pro 6000 and a 512GB M3 Ultra, I have yet to find these machines usable for "agentic" tasks. Sure, they make great chatbots, but agentic work involves sending huge context to the system. That alone rules out the Mac Studio: it lacks tensor cores and is painfully slow at processing even relatively large CLAUDE.md files, let alone a big project.
The RTX setup is much faster but can only hold models ≤192GB, which severely limits its capabilities: you're stuck with low-quant GLM 4.7, GLM 4.7 Flash/Air, GPT-OSS 120B, etc.
Waiting for Anthropic to somehow blame this on users again. "We investigated, turns out the reason was users used it too much".
They need more training data, and with people moving on to OpenCode/Codex, they wanna extract as much data from their current users as possible.
If you look at tool calls like MCP and whatnot, you can see it gets ridiculous. Even though it's small, calling the pal MCP from the prompt, for example, is still burning tokens AFAIK. This is "nobody's" fault in this case, really, but you can see how the incentives line up, and we all need to think about how to make this entire space more usable.
At least it did not turn against them physically... "get comfortable while I warm up the neurotoxin emitters"
I'd need to see real numbers. I can trigger a thinking model to generate hundreds of tokens and return a three-word response (however many tokens that is), or switch to a non-thinking model of the same family that just gives the same result. I don't necessarily doubt your experience; I just haven't had that experience tuning SD, for example, which is also transformer-based.
I'm sure there's some math reason why longer context = more accuracy, but is that intrinsic to transformer-based LLMs? That is, per your thought that the hyperscalers want shorter responses: do you think they're expending more effort to get shorter responses of equivalent accuracy, or are they trying to find some other architecture (or whatever) to overcome the "limitations" of the current one?
(And once you've done that, also consider whether a given task can be handled by a dumber model - I've had good luck switching some of my sub-agents to Haiku.)
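If anyone hasn't tried that: a sub-agent's model is set in its definition file under .claude/agents/ - something like the following, where the file contents are just an example and the frontmatter fields are per the Claude Code sub-agent docs:

    ---
    name: log-reader
    description: Summarizes long log files so the main context stays small.
    model: haiku
    ---
    Read the file you're given and report only errors and anomalies, tersely.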
Controlling the coding tool absolutely is a major asset, and it will be an even greater one as the improvements in each model iteration make it matter less which specific model you're using.
The Reasonable Man might think that an AI company, OF ALL COMPANIES, would be able to use AI to triage bug tickets and reproduce them, but no! They expect humans to keep wasting their own time reproducing issues, pinging tickets, and correcting Claude when it makes mistakes.
Random example: https://github.com/anthropics/claude-code/issues/12358
First reply from Anthropic: "Found 3 possible duplicate issues: This issue will be automatically closed as a duplicate in 3 days."
The user replies: two of the tickets are irrelevant, and the third didn't help.
Second reply: "This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes."
Every ticket I ever filed was auto-closed for inactivity. Complete waste of time. I won't bother filing bugs again.
It really leads me to wonder if it’s just my questions that are easy, or maybe the tone of the support requests that go unanswered is just completely different.
This made me chuckle.
I think they are just focusing on where the dough is.
That is, you and most Claude users aren't paying the actual cost. You're like an Uber customer a decade ago.
Growth isn't a problem unless you don't actually cover the cost of every user you sign up. Uber, but for poorly profitable business models.
Upcoming Anthropic press release: "By using Claude to direct users to existing bug reports, we have reduced tickets requiring direct action by xx% and even reduced the rate of incoming tickets."
That's limited to accessing the models through code/desktop/mobile.
And while I'm also using their subscriptions because of the cost savings vs direct access, having the subscription be considerably cheaper than the usage billing rings all sorts of alarm bells that it won't last.
I can't be alone. Literally the worst customer experience I've ever had with the most expensive personal dot-com subscription I've ever paid for.
Never again. When Google sets the customer service bar there are MAJOR issues.
The API request method might have no cap, but they do cap Claude Code even on Max licenses, so it's easier to throttle if needed to control costs. Seems straightforward to me, at any rate. Kinda like reserved-instance vs. spot pricing models?
Which was nothing new itself of course. Conflicts of interest didn't begin with computers, or probably even writing.
You can stop most of this with

    export DISABLE_NON_ESSENTIAL_MODEL_CALLS=1

And might as well disable telemetry, etc.:

    export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
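If you'd rather not rely on shell exports, Claude Code's settings file also takes an "env" map (if I'm reading the settings docs right), so putting the equivalent in ~/.claude/settings.json should work:

    {
      "env": {
        "DISABLE_NON_ESSENTIAL_MODEL_CALLS": "1",
        "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
      }
    }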
I also noticed that every time you start CC, it sends off >10k tokens preparing the different agents. So try not to close and re-open it too often.
It’s effectively a multi-tenant interface.
I also previously used an individual account, but on a corp e-mail.
You could generate a new multi-use credit card in your vibe-bank app (like Revolut), buy a burner (e)SIM for SMS (5 EUR in NL), then rewrite all requests at your MITM proxy to substitute the device ID with one not derived from your machine.
But the same device ID and same phone could be a perfectly legitimate use case: you registered on a corp e-mail, then changed workplaces while keeping the same machine.
Or you lost access to your e-mail (what a pity).
But to get good use out of it, someone would have to compose proper queries to ClickHouse (or whatever they use for logs) and build some logic to run as a service or webhook to detect duplicates, with a pipeline to act on it.
And a good percentage of flags wouldn’t have been ToC violations.
That's a bad vibe; can you imagine how much trial-and-error prompting it would take?..
They can't even vibe their way through the Claude Code bugs alone, on time!
My experience on Claude Max (still on it till end-of-month) has been frequent incomplete assignments and troubling decision making. I'll give you an example of each from yesterday.
1. Asked Claude to implement the features in a v2_features.md doc. It completed 8 of 10, but 3 incorrectly. I gave GPT-5.1-Codex-Max (high) the same tasks and it completed 10 of 10, but took perhaps 5-10x as long. Annoyingly, with LLM variability, I can't know for sure whether Claude would get it right if I tried again. The only thing I do know is that GPT-5.2 and 5.1 do a lot more "double-checking", both before executing and after.
2. I asked Claude to update a string displayed in the UI of my app to show something else instead. The string is driven by a JSON config. Claude searched the code, somehow assumed the string was being loaded from a db, didn't find the JSON, and opted to write code that overwrites whatever comes out of the "db" (incorrect) with what I asked for. This is... not desired behavior, and it's the source of a category of hidden bugs Claude has created in the past (other models do this too, but less often). Max took its time, found the source JSON file, and made the update in the correct place.
I can only "sit back and let an agent code" if I trust that it'll do the work right. I don't need it fast, I need it done right. It's already saving me hours where I can do other things in parallel. So, I don't get this argument.
That said, I have a Claude Max and OpenAI Pro subscription and use them both. I instead typically have Claude Opus work on UI and areas where I can visually confirm logic quickly (usually) and Codex in back-end code.
I often wonder how much the complexity of codebases affects how people discuss these models.
That has reliably worked for me with Gemini, Codex, and Opus. If you can get them to check off features as they complete them, it works even better (i.e., success criteria and an empty checkbox for them to mark off).
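For example, a features doc might look like this (contents made up):

    ## v2 features
    - [ ] Dark mode (success: toggle persists across restarts)
    - [ ] CSV export (success: output matches the schema in docs/export.md)
    - [ ] Rate limiting (success: 429 after 100 req/min per key)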
> Since its founding in 2009, Uber has incurred a cumulative net loss of approximately $10.9 billion.
Now, Uber has become profitable, and will probably become a bit more profitable over time.
But except for speculators and probably a handful of early shareholders, Uber will have lost money for everyone else for the first 20 years since its founding.
For comparison, Lyft, Didi, Grab, and Bolt are in the same boat; most of them are barely turning profitable after 10+ years. Turns out taxis are a hard business, even when you crank the scale up to 11. They might become profitable over the long term, and 15-20 years from now we'll all get even worse and more abusive service, probably more expensive than regular taxis would have been.
I mean, we got some better mobile apps from taxi services, so there's that.
Oh, also a massive erosion of labor rights around the world.
The whole thing I needed was to let AI reach out and touch things - be my hands, essentially. This is why I built my tmux/worker system, and why I built an xdg-portal integration to let it screenshot and, soon, interact with my desktop as a PoC.
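For the curious, the screenshot half is just a D-Bus call to the portal; here's a rough sketch with dbus-python, with interface and method names taken from the org.freedesktop.portal.Screenshot spec (assumes a running portal backend, not my actual code):

    import dbus
    from dbus.mainloop.glib import DBusGMainLoop
    from gi.repository import GLib

    DBusGMainLoop(set_as_default=True)
    bus = dbus.SessionBus()
    portal = bus.get_object("org.freedesktop.portal.Desktop",
                            "/org/freedesktop/portal/desktop")
    screenshot = dbus.Interface(portal, "org.freedesktop.portal.Screenshot")
    loop = GLib.MainLoop()

    def on_response(code, results):
        print(results.get("uri"))  # on success, a file:// URI to the image
        loop.quit()

    # Screenshot() returns a Request object path; the actual result arrives
    # via that Request's Response signal. (A robust client predicts the
    # handle token and subscribes first; this is the lazy version.)
    request = screenshot.Screenshot("", dbus.Dictionary({}, signature="sv"))
    bus.add_signal_receiver(on_response, signal_name="Response",
                            dbus_interface="org.freedesktop.portal.Request",
                            path=request)
    loop.run()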
I could just let it start logging into devices and modifying configs, but it's pretty dumb at times about stuff like modifying Fortigate configurations - what it thinks it should do vs. what the CLI actually lets it do - so I have to proof much of it. That's why I'm building it to run Ansible/Terraform jobs instead, using the frameworks vendors provide for direct configuration, to allow for atomic config changes as far as vendor implementations allow.
I don't see the current investments turning a profit. Maybe the datacenters will, but most of AI is going to be washed out when, somewhere, someone wants to take out their investment and the new Bernie Madoff can't find another sucker.
Yes, I like my crack at 20 bucks a month, but the tech is going to need to improve quickly to keep that up.