zlacker

I was banned as well, out of the blue and without warning. I believe it was because I was doing something like what OP was doing and/or using the allowed limits to their fullest extent.

It completely blew me away and I felt suddenly so betrayed. I was paying $200/mo to fully utilize a service they offered and then without warning I apparently did something wrong and had no recourse. No one to ask, no one to talk to.

My advice is to be extremely wary of Anthropic. They paint themselves as the underdog/good guys, but they are just as faceless as the rest of them.

Oh, and have a backup workflow. Find / test / use other LLMs and providers. Don't become dependent on a single provider.

replies(3): >>777777+v9 >>Camper+Pd >>kachap+Ky

>>TyrunD+(OP)
Can you elaborate on "using the allowed limits to their fullest extent?"

replies(1): >>xattt+gr

>>TyrunD+(OP)
> Oh, and have a backup workflow. Find / test / use other LLMs and providers. Don't become dependent on a single provider.

I have pro subscriptions to all three major providers, and have been planning to drop one eventually. Anthropic may end up making the decision for me, it sounds like, even though (or perhaps because) I've been using Claude CLI more than the others lately.

What I'd really like to do is put a machine in the back room that can do 100 tps or more with the latest, greatest Deepseek or Kimi model at full native quantization. That's the only way to avoid being held hostage by the big 3 labs and their captive government, which I'm guessing will react to the next big Chinese model release by prohibiting its deployment by any US hosting providers.

Unfortunately it will cost about $200K to do this locally. The smart money says (but doesn't act like) the "AI bubble" will pop soon. If the bubble pops, that hardware will be worth 20 cents on the dollar if I'm lucky, making such an investment seem reckless. And if the bubble doesn't pop, then it will probably cost $400K next year.

First-world problems, I guess...
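
A rough sketch of the arithmetic behind that dilemma, using only the figures quoted above ($200K now, 20 cents on the dollar after a pop, $400K next year). The function names and the two-scenario framing are illustrative assumptions, not a forecast:

```python
# Figures quoted in the comment; everything else here is a labeled assumption.
PRICE_NOW = 200_000        # approximate cost of a local rig today (USD)
RESALE_PERCENT = 20        # "20 cents on the dollar" if the bubble pops
PRICE_NEXT_YEAR = 400_000  # guessed price next year if it does not pop


def loss_if_bubble_pops(price_now: int, resale_percent: int) -> int:
    """Value lost by buying now and then seeing resale prices collapse."""
    return price_now * (100 - resale_percent) // 100


def extra_cost_of_waiting(price_now: int, price_next_year: int) -> int:
    """Extra money paid by waiting a year if prices keep rising."""
    return price_next_year - price_now


if __name__ == "__main__":
    pop_loss = loss_if_bubble_pops(PRICE_NOW, RESALE_PERCENT)
    wait_cost = extra_cost_of_waiting(PRICE_NOW, PRICE_NEXT_YEAR)
    print(f"Buy now, bubble pops:    lose about ${pop_loss:,}")
    print(f"Wait, bubble holds:      pay about ${wait_cost:,} more")
```

Under these numbers either wrong guess costs a comparable six-figure sum, which is why neither option looks safe.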

replies(2): >>pstuar+Bk >>LTL_FT+3I

>>Camper+Pd
I'm hoping that advances in MoE and other improvements in LLMs will allow self-hosting to cover a good chunk of developer needs, extending out to providers only when more horsepower is needed.

In effect like traditional on-prem services that have cloud services to handle peak loads...

The tech is still relatively new and there are bound to be changes that can enable this -- just like how we went from the 8088 to the 386 (six years later). That was a groundbreaking change, and while Moore's law may be dead, I expect the cost to drop significantly over time.

One can dream at least.

>>777777+v9
Likely using timers/alarms to keep track of when jobs can resume.

replies(1): >>oblio+Np1

>>TyrunD+(OP)
This is just wrong. I have several 3x x20 accounts running full tilt, hitting limits every week. I did get a few accounts banned, but that's because my proxy was leaking nginx headers.

replies(1): >>oblio+Rp1

>>Camper+Pd
I mean, you could put together a cluster of DGX Sparks (8 of them) and hit 100 tps with high concurrency:

https://forums.developer.nvidia.com/t/6x-spark-setup/354399/...

Or a single user at about 10 tps.

This is probably around $30K if you go with the 1 TB models.

replies(2): >>Camper+tO >>bayind+5P

>>LTL_FT+3I
10 tps, maybe, given the Spark's hobbled memory bandwidth. That's too slow, though. That thread is all about training, which is more compute-intensive.

A couple of DGX Stations are more likely to work well for what I have in mind. But at this point, I'd be pleasantly surprised if those ever ship. If they do, they will be more like $200K each than $100K.

replies(1): >>LTL_FT+po1

>>LTL_FT+3I
I'd love more people to try to enable local LLMs at the speeds they wish to use and face the music of the fans, heat and power bills.

When people talk about the cost and requirements of AI, other people can't grasp what they are talking about.

>>Camper+tO
I linked results where the user ran Kimi K2 across his 8-node cluster. Inference results are listed for 1, 10, and 100 concurrent requests.

Edit to add:

Yeah, those stations with the GB300 look more along the lines of what I would want as well but I agree, they’re probably way beyond my reach.

>>xattt+gr
That sounds benign but I'm guessing all of that was in a 24/7 loop and probably running in parallel a bunch of times.

It's like the "unlimited Gmail storage" that has been stuck at 15 GB since 2012, despite the cost of storage probably going down 20x since then.

Companies launch products with deceptive marketing and features they can't possibly support and then when they get called on their bluff, they have to fall back to realistic terms and conditions.

>>kachap+Ky
> but that's because my proxy was leaking nginx headers.

What do you mean by this?

replies(1): >>kachap+Hg2

>>oblio+Rp1
Utilizing their anti-competitive pricing to my advantage; the proxy bypasses their protections.