They’re growing too fast and the company is bursting at the seams. If there’s ever a correction in the AI industry, I think that will all come back to bite them quickly. It’s like Claude Code is vibe-operating the entire company.
Claude iOS app, Claude on the web (including Claude Code on the web) and Claude Code are some of the buggiest tools I have ever had to use on a daily basis. I’m including monstrosities like Altium and Solidworks and Vivado in the mix - software that actually does real shit constrained by the laws of physics rather than slinging basic JSON and strings around over HTTP.
It’s an utter embarrassment to the field of software engineering that they can’t even manage a single nine of reliability in their consumer-facing products, and if it weren’t for the advantage Opus has over other models, they’d be dead in the water.
https://github.com/anthropics/claude-code/issues
Codex has fewer, but they also had quite a few outages in December. And I don't think Codex is as popular as Claude Code, but that could change.
(On the flip side, Codex is SO efficient with tokens that its answers can sometimes be hard to understand, it rarely pulls in files without you adding them manually, and it often takes quite a few attempts to get the right answer because it's so strict about what it does each iteration. But I never run out of quota!)
The advice I got when scouring the internets was primarily to close everything except the file you’re editing and maybe one reference file (before asking Claude anything). For added effect, add something like 'Only use the currently open file. Do not read or reference any other files' to the prompt.
I don't have any hard facts to back this up, but I'm sure going to try it myself tomorrow (when my weekly cap is lifted ...).
I've run out of quota on my Pro plan so many times in the past 2-3 weeks. This seems to be a recent development. And I'm not even that active: just one project, running a Plan > Develop > Test loop, just one terminal. That's it. I keep hitting the limit and waiting a few hours for the quota to reset.
What's happening @Anthropic ?? Anybody here who can answer??
Quota's basically a count of tokens, so if a new CC session already starts with the context relatively full, that could explain what's going on. Also, what language is this project in? If it's something noisy that burns through tokens fast, then even if you're using agents to preserve the context window in the main CC session, those tokens still count against your quota, so you'd still hit it awkwardly fast.
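For a rough sense of scale, here's the kind of back-of-envelope math I mean (every number below is my own guess for illustration, not anything Anthropic publishes):

    # Back-of-envelope quota burn. All figures are guesses, purely illustrative.
    tokens_per_request = 20_000   # context + CLAUDE.md + tool results resent each turn (guess)
    requests_per_hour = 30        # a fairly calm plan/develop/test loop (guess)
    weekly_budget = 10_000_000    # hypothetical weekly token allowance
    hourly_burn = tokens_per_request * requests_per_hour   # 600k tokens/hour
    hours_to_cap = weekly_budget / hourly_burn              # ~17 hours
    print(f"~{hourly_burn:,} tokens/hour -> cap after ~{hours_to_cap:.0f} hours of active use")

The point is just that the quota is consumed by everything that goes over the wire, not by how much "work" you feel like you're doing.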
The only way Anthropic has two or three nines is in read-only mode, but that'd be like measuring AWS by console uptime while ignoring the actual control plane.
They get to see (unless you've opted out) your context, ideas, source code, etc., and in return you give them $220 and they give you back "out of tokens".
This fixed subscription plan with barely specified quotas looks like they want to extract extra money from the users who pay $200 and don't use that much value, while at the same time preventing other users from going over $200. I understand that it might work at scale, but it just feels a bit unfair to everyone?
It's also a way to improve performance on the things their customers care about. I'm not paying Anthropic more than I do for car insurance every month because I want to pinch ~~pennies~~ tokens, I do it because I can finally offload a ton of tedious work on Opus 4.5 without hand holding it and reviewing every line.
The subscription is already such a great value over paying by the token, they've got plenty of space to find the right balance.
I've been using CC until I run out of credits and then switch to Cursor (my employer pays for both). I prefer Claude but I never hit any limits in Cursor.
It's the most commented issue on their GitHub and it's basically ignored by Anthropic. Title mentions Max, but commenters report it for other plans too.
lol
I work for hours and it never says anything. No clue why you’re hitting this.
$230 pro max.
We're about to get Claude Code for work and I'm sad about it. There are more efficient ways to do the job.
The best you can get today with consumer hardware is something like devstral2-small (24B), qwen-coder30b (underwhelming), or glm-4.7-flash (promising but buggy atm). And you'll still need a beefy workstation in the ~$5-10k range.
If you want open-SotA you have to get hardware worth $80-100k to run the big boys (dsv3.2, glm4.7, minimax2.1, devstral2-123b, etc). That's OK for small office setups, but out of range for most local deployments (especially considering that the workstations need lots of power if you go 8x GPUs, even with something like 8x 6000 Pro @ 300W).
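Rough weight-memory math, in case it's useful (the param count is a ballpark guess for that class of MoE model, and this ignores KV cache and activations entirely):

    # Very rough weight-memory estimate; param count is a ballpark guess.
    params_b = 355              # ~GLM-4.x-class MoE, in billions (guess)
    bytes_per_param_q4 = 0.55   # ~4.4 bits/param incl. quant scales/overhead (rough)
    weights_gb = params_b * bytes_per_param_q4   # ~195 GB of weights alone
    vram_8x_6000pro_gb = 8 * 96                  # 768 GB across eight 96 GB cards
    print(f"~{weights_gb:.0f} GB of weights vs {vram_8x_6000pro_gb} GB of aggregate VRAM")

So the 8x builds do have headroom; the pain is the price tag and the ~2.4 kW of GPU power budget even at 300 W per card.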
OpenCode is incentivized to make a good product that uses your token budget efficiently since it allows you to seamlessly switch between different models.
Anthropic as a model provider on the other hand, is incentivized to exhaust your token budget to keep you hooked. You'll be forced to wait when your usage limits are reached, or pay up for a higher plan if you can't wait to get your fix.
CC, specifically Opus 4.5, is an incredible tool, but Anthropic is handling its distribution the way a drug dealer would.
I've done RL training on small local models, and there's a strong correlation between length of response and accuracy. The more they churn tokens, the better the end result gets.
I actually think that the hyper-scalers would prefer to serve shorter answers. A token generated at 1k ctx length is cheaper to serve than one at 10k context, and way way cheaper than one at 100k context.
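The rough intuition as I understand it (very simplified, ignoring batching and attention optimizations; the per-token KV figure is made up just to show the shape of the curve):

    # Per generated token, attention has to read the whole KV cache, so the
    # memory traffic per token grows roughly linearly with context length.
    kv_bytes_per_token = 100_000   # made-up per-token KV footprint for a big model
    for ctx in (1_000, 10_000, 100_000):
        gb_read = ctx * kv_bytes_per_token / 1e9
        print(f"ctx {ctx:>7,}: ~{gb_read:.1f} GB of KV cache read per generated token")

Prefill cost on top of that grows roughly quadratically with prompt length, which is why the long-context token is the expensive kind to serve.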
As someone with 2x RTX Pro 6000 and a 512GB M3 Ultra, I have yet to find these machines usable for "agentic" tasks. Sure, they can be great chat bots, but agentic work involves huge context sent to the system. That already rules out the Mac Studio because it lacks tensor cores and it's painfully slow to process even relatively large CLAUDE.md files, let alone a big project.
The RTX setup is much faster but can only hold models ≤192GB, which severely limits its capabilities: you're stuck with low-quant GLM 4.7, GLM 4.7 Flash/Air, GPT OSS 120b, etc.
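To put a number on that ceiling (ballpark param count again, and this leaves nothing for KV cache):

    # Bits/param you could afford if all 192 GB went to weights alone.
    vram_gb = 192
    params_b = 355                               # ~GLM-4.x-class MoE, ballpark guess
    max_bits_per_param = vram_gb * 8 / params_b  # ~4.3 bits
    print(f"~{max_bits_per_param:.1f} bits/param max -> roughly Q4 or below, before any KV cache")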
Waiting for Anthropic to somehow blame this on users again. "We investigated, turns out the reason was users used it too much".
They need more training data, and with people moving on to OpenCode/Codex, they wanna extract as much data from their current users as possible.
If you look at tool calls like MCP and whatnot, you can see it gets ridiculous. Even though it's small, for example, calling the pal MCP from the prompt still burns tokens afaik. This is "nobody's" fault in this case really, but you can see what the incentives are, and we all need to think about how to make this entire space more usable.
At least it did not turn against them physically... "get comfortable while I warm up the neurotoxin emitters"
I'd need to see real numbers. I can trigger a thinking model to generate hundreds of tokens and return a 3-word response (however many tokens that is), or switch to a non-thinking model of the same family that just gives the same result. I don't necessarily doubt your experience, I just haven't had that experience tuning SD, for example, which is also xformer-based.
I'm sure there's some math reason why longer context = more accuracy, but is that intrinsic to transformer-based LLMs? That is, per your thought that the 'scalers want shorter responses, do you think they are expending more effort to get shorter, equivalent-accuracy responses, or are they trying to find some other architecture to overcome the "limitations" of the current one?
(And once you've done that, also consider whether a given task can be achieved with a dumber model - I've had good luck switching some of my sub-agents to Haiku).
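For anyone who hasn't tried the Haiku switch: if I'm remembering the sub-agent frontmatter right (double-check the docs, and the agent name and description below are made up), it's just one extra line in the agent file under .claude/agents/:

    ---
    name: doc-summarizer
    description: Summarizes long files before the main agent reads them
    model: haiku
    ---

Cheap, fast, and the quota difference adds up on the noisy, mechanical sub-tasks.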
Controlling the coding tool absolutely is a major asset, and will be an even greater asset as the improvements in each model iteration make it matter less which specific model you're using.
The Reasonable Man might think that an AI company OF ALL COMPANIES would be able to use AI to triage bug tickets and reproduce them, but no! They expect humans to keep wasting their own time reproducing, pinging tickets and correcting Claude when it makes mistakes.
Random example: https://github.com/anthropics/claude-code/issues/12358
First reply from Anthropic: "Found 3 possible duplicate issues: This issue will be automatically closed as a duplicate in 3 days."
User replies: two of the tickets are irrelevant, one didn't help.
Second reply: "This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes."
Every ticket I ever filed was auto-closed for inactivity. Complete waste of time. I won't bother filing bugs again.
I think they are just focusing on where the dough is.
Growth isn't a problem unless you don't actually cover the cost of every user you sign up. Uber, but for poorly profitable business models.
Upcoming Anthropic Press Release: By using Claude to direct users to existing bug reports, we have reduced tickets requiring direct action by xx% and even reduced the rate of incoming tickets.
That's limited to accessing the models through code/desktop/mobile.
And while I'm also using their subscriptions because of the cost savings vs direct access, having the subscription be considerably cheaper than the usage billing rings all sorts of alarm bells that it won't last.
The API request method might have no cap, but they do cap Claude Code even on Max licenses, so it's easier to throttle if they need to control costs. Seems straightforward to me at any rate. Kinda like reserved-instance vs. spot pricing models?
Which was nothing new itself of course. Conflicts of interest didn't begin with computers, or probably even writing.
You can stop most of this with
export DISABLE_NON_ESSENTIAL_MODEL_CALLS=1
And might as well disable telemetry, etc:
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
I also noticed every time you start CC, it sends off > 10k tokens preparing the different agents. So try not to close / re-open it too often.
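If you're curious how much of that startup cost is your own config, a crude chars/4 estimate over CLAUDE.md and the .claude directory gets you in the ballpark (chars/4 is a rough heuristic, and this misses whatever CC injects on its own - system prompt, tool schemas, etc.):

    # Crude estimate of how many tokens your local config/agents/memory add.
    # chars/4 is a rough heuristic; this ignores whatever CC itself injects.
    from pathlib import Path
    files = [p for p in Path(".claude").rglob("*.md") if p.is_file()] + [Path("CLAUDE.md")]
    chars = sum(len(p.read_text(errors="ignore")) for p in files if p.is_file())
    print(f"~{chars // 4:,} tokens of local config")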
> Since its founding in 2009, Uber has incurred a cumulative net loss of approximately $10.9 billion.
Now, Uber has become profitable, and will probably become a bit more profitable over time.
But except for speculators and probably a handful of early shareholders, Uber will have lost money for everyone else for the 20 years since its founding.
For comparison, Lyft, Didi, Grab, and Bolt are in the same boat; most of them are barely turning profitable after 10+ years. Turns out taxis are a hard business, even when you turn the scale up to 11. They might become profitable over the long term, and 15-20 years from now we'll all get even worse, more abusive service, probably more expensive than regular taxis would have been.
I mean, we got some better mobile apps from taxi services, so there's that.
Oh, also a massive erosion of labor rights around the world.
I don't see the current investments turning a profit. Maybe the datacenters will, but most of AI is going to be washed out when, somewhere, someone wants to take out their investment and the new Bernie Madoff can't find another sucker.
Yes, I like my crack at 20 bucks a month, but the tech is going to need to improve quickly to keep that up.