The real gut-punch here is the reminder of how far behind most engineers are in this race. With web 1.0 and web 2.0 you could at least rent a cheap VPS for $10/month and try some stuff out. There is almost no universe where a couple of guys in their garage get access to 1000+ H100s, with a capital cost in the multiple millions. Even renting at that scale costs $4k/hour, which adds up quickly.

I hope we find a path to at least fine-tuning medium-sized models at prices that aren't outrageous. Even the tiny corp's tinybox [1] is $15k, and I don't know how much actual work one could get done on it.

If the majority of startups are just "wrappers around OpenAI (et al.)" the reason is pretty obvious.

1. https://tinygrad.org/

>>zoogen+(OP)
I'd argue that you really don't need 1000+ H100s to test things out and make a viable product.

When I was at Rad AI we managed just fine. We took a big chunk of our seed round and used it to purchase our own cluster, which we set up at Colovore in Santa Clara. We had dozens, not hundreds, of GPUs, and it set us back about half a million dollars.

The one thing I can't stress enough: do not rent these machines. For the cost of renting a machine from AWS for eight months, you can own one of these machines and cover all of the datacenter costs, which basically makes it "free" from the eight-month mark to the three-year mark. Once we decoupled our training from cloud prices, we were able to do a lot more training and research. Maintenance of the machines is surprisingly easy, and they hold their value too, since there's such high demand for them.
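
The "free after eight months" claim is break-even arithmetic. Here is a minimal Python sketch of it; the prices are hypothetical placeholders chosen only so the numbers come out round, not Rad AI's or AWS's actual figures, and the model ignores power, staff time, and resale value.

```python
def rental_months_equivalent(purchase_price, colo_per_month, rent_per_month,
                             lifetime_months=36):
    """Months of renting that cost as much as owning for the whole lifetime.

    Owning = purchase price + colocation fees over the hardware's lifetime.
    All inputs are hypothetical example figures, not quoted prices.
    """
    if rent_per_month <= 0:
        raise ValueError("rent_per_month must be positive")
    total_cost_of_ownership = purchase_price + colo_per_month * lifetime_months
    return total_cost_of_ownership / rent_per_month


# Hypothetical 8-GPU node: $200k to buy, $2k/month colocation, $34k/month to rent.
months = rental_months_equivalent(200_000, 2_000, 34_000)
free_months = 36 - months  # months of use beyond what renting would have bought
print(f"owning for 3 years costs as much as {months:.1f} months of renting")
print(f"roughly {free_months:.0f} months are effectively 'free'")
```

With these placeholder prices, three years of ownership costs the same as 8 months of renting, leaving about 28 months of effectively free use.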

I'd also argue that you don't need H100s to get started. Most of our initial work was on much cheaper GPUs, with the A100s we purchased reserved for rapidly training production models. What you need, and what is far harder to get, is researchers who actually understand the models so they can improve the models themselves (rather than just compensating with more data and training). That was what really made the difference for Rad AI.

>>tedivm+J2
Your (ex-)company literally has AI in its name, so yeah, it makes sense to buy compute, not rent.

That said, a lot of other businesses don't want to take on the capex, but they do need to train some models... and those models can't run on just half a million dollars' worth of hardware. In that case, someone else is going to have to do it for you.

It works both ways and there are no absolutes here.

>>latchk+T4
I've found most larger companies are more concerned about opex than capex. Large companies aren't going to have much of an issue there.

My response was aimed more at the folks the OP mentioned:

> There is almost no universe where a couple of guys in their garage are getting access to 1000+ H100s with a capital cost in the multiple millions.

I'm pointing out that this isn't true. I was the founding engineer at Rad AI; we had four people when we started. We managed to build LLMs that are in production today. If you've had a CT, MRI, or X-ray in the last year, there's a real chance your results were reviewed by Rad AI's models.

My point is simply that people are really overestimating the amount of hardware actually needed, as well as the costs to use that hardware. There absolutely is a space for people to jump in and build out LLM companies right now, and they don't need to build a datacenter or raise nine figures of funds to do it.

>>zoogen+(OP)
I wouldn't spend a single dollar on George.

The guy could wake up tomorrow and decide he doesn't feel like developing this stuff anymore, and you'd be stuck with a dead project. In fact, he already did that once, when he found a bug in the driver.

People rip on Google for killing projects all the time, and now you want to bet your business on a guy who livestreams in front of a pirate flag? Come on.

Never mind that in my own personal dealings with him he's been a total dick, and I'm far from the only person who says that.

>>tedivm+J2
I did choose the 1000+ H100 case as the outlier. But even what you are describing, $500k for dozens of A100s or whatever entry level looks like these days, is a far cry from the $10/month of previous generations. This suggests we will live in a world where VCs have even more power than they did before.

Even if I validate my idea on an RTX 4090, the path to scaling any idea gets expensive fast: $15k to move up to something like a tinybox (probably capable of running a 65B model, but is it realistic to train or fine-tune a 65B model on it?), then maybe $100k in cloud costs, then maybe $500k for a research-sized cluster, then $10m+ for enterprise grade. I don't see that kind of ramp happening outside well-financed VC startups.

>>tedivm+J2
Keep in mind that if you're a bigger customer, AWS discounts are huge (often >50% off sticker). If the payback were 16 months instead of 8, it would be a much tougher sell (especially with GPUs improving rapidly over time).
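
The discount point can be made concrete: a 50% discount halves the effective rent, which doubles the payback period. A minimal Python sketch; the $25k/month list price and $200k purchase price are hypothetical, and colocation costs are left out to keep the comparison simple.

```python
def payback_months(purchase_price, list_rent_per_month, discount=0.0):
    """Months of discounted renting whose total equals the purchase price."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    effective_rent = list_rent_per_month * (1.0 - discount)
    return purchase_price / effective_rent


# Hypothetical figures: buying costs 8x the monthly list price.
print(payback_months(200_000, 25_000))                # 8.0 (list price)
print(payback_months(200_000, 25_000, discount=0.5))  # 16.0 (50% off sticker)
```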

>>somsak+K7
AWS does not offer great discounts on the GPUs at this point, as they don't have nearly enough of them to meet demand. I'm no longer at a startup, and have worked at a couple of larger companies.

That said I'm mostly responding to the "two guys in a garage" comment with this. Larger companies are going to have different needs altogether.

>>tedivm+J2
>What you need, and is far harder to get, is researchers who actually understand the models so they can improve the models themselves

Serious question: where does an aspiring AI/ML dev get that expertise? From looking at OMSCS, I'm not convinced even a doctorate from Georgia Tech would give me the background I need...

>>zoogen+g6
Not every company is OpenAI. What OpenAI is trying to do is solve a generic problem, and that requires their models to be huge. There's a ton of space for specialized models, though, and specialized models still outperform the more general ones. Startups can focus on smaller problems than "solve everything, but with kind of low quality". Solving one specific problem well can bring in a lot of customers.

To put it another way, the $10m+ for enterprise grade just seems wrong to me. It's more like $10m+ for mediocre responses to a lot of things. Rad AI didn't spend $10m on their models, but they absolutely are professional grade and are in use today.

I also think it's important to consider capital costs that are a one time thing, versus long term costs. Once you purchase that $10m cluster you have that forever, not just for a single model, and because of the GPU scarcity right now that cluster isn't losing value nearly as rapidly as most hardware does. If you purchase a $500k cluster, use it for three years, and then sell it for $400k you're really not doing all that bad.
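
The resale argument is easy to quantify. A minimal sketch using the comment's own numbers ($500k purchase, three years of use, $400k resale); it deliberately ignores colocation, power, and staff costs, so treat the result as a lower bound on the real monthly cost.

```python
def effective_monthly_cost(purchase_price, resale_price, months_used):
    """Net hardware cost per month after reselling the cluster."""
    return (purchase_price - resale_price) / months_used


def depreciation_fraction(purchase_price, resale_price):
    """Share of the purchase price lost by the time the cluster is sold."""
    return (purchase_price - resale_price) / purchase_price


monthly = effective_monthly_cost(500_000, 400_000, 36)
lost = depreciation_fraction(500_000, 400_000)
print(f"${monthly:,.0f} per month, {lost:.0%} of the purchase price lost")
```

This prints roughly $2,778 per month and a 20% loss on the purchase price.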

>>wing-_+Ib
Everyone I've met with these skills has either a master's degree or a PhD. I do know several people who got their PhD earlier in their careers who are really into AI now, but they had the foundational math skills to keep current as new papers were published.

I can't tell you if one program is better than another, as it's a bit out of my area of expertise.

>>tedivm+J5
> I've found most larger companies are more concerned about opex than capex.

Another absolute. I try to not be so focused on single points of input like that.

From what I can tell, sitting on the other side of the wall (as a GPU provider), there is a metric ton of demand from all sides.

>>tedivm+qc
The foundational math skills are linear algebra, calculus, and statistics. They are bog-standard math that anyone with a university education in the sciences should be comfortable with. The only possibly more obscure math is the higher-level statistics tricks, like graphical models, but those can be picked up from a textbook.

>>tedivm+Qb
That is a decent point. It reminds me of a startup that posted on HN a couple of months ago that did background removal from images using AI models. They claimed this was now a mature market, where bulk pricing was bringing the cost down to a small margin over the price of compute. I suspect those kinds of models are small compared to the general-intelligence LLMs we are seeing and might reasonably be trainable on $250k clusters. There is likely a universe of low-hanging fruit in those kinds of problems for those who are capable. That is definitely not a market I would want to compete in, though, since once a particular problem is sufficiently solved it becomes a race to the bottom on cost.

But my (totally amateur, outsider-informed) intuition is that the innovative work will still happen at the edge of model size for the next few years. We literally just got the breakthroughs in LLM capabilities around the 30B-parameter mark, and these capabilities seemed to accelerate with larger models. There appears to be a gulf in capability between 7B and 70B parameter LLMs that makes me not want to bother with LLMs at all unless I can get the higher level of performance of the massive models. But even if I did want to play around at 30B or so, I'd have to pay $15k-$100k.

I think we are just in a weird spot right now where the useful model sizes for a large class of potential applications are at a price point that many engineers will find prohibitively expensive to experiment with on their own.
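
A back-of-the-envelope memory estimate shows why the 30B-70B range gets expensive. A minimal Python sketch, assuming weights-only memory for inference (no KV cache or activations) and the common rule of thumb of about 16 bytes per parameter for full fine-tuning with Adam in mixed precision (weights, gradients, fp32 master weights, and two optimizer moments, still excluding activations). The 80 GB per GPU default matches the larger A100/H100 variants; real deployments need headroom beyond these lower bounds.

```python
import math

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

# Rule-of-thumb bytes per parameter for full fine-tuning with Adam in
# mixed precision: fp16 weights (2) + fp16 grads (2) + fp32 master
# weights and two Adam moments (12). Activations are not included.
ADAM_MIXED_PRECISION_BYTES = 16


def inference_weights_gb(params_billion, precision="fp16"):
    """Memory for the weights alone, in decimal GB (1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def full_finetune_gb(params_billion):
    """Lower-bound memory for full fine-tuning, in decimal GB."""
    return params_billion * ADAM_MIXED_PRECISION_BYTES


def gpus_needed(memory_gb, gpu_memory_gb=80):
    """Minimum number of GPUs whose combined memory holds memory_gb."""
    return math.ceil(memory_gb / gpu_memory_gb)


for size in (7, 30, 70):
    infer = inference_weights_gb(size)
    train = full_finetune_gb(size)
    print(f"{size}B: ~{infer} GB to serve in fp16 ({gpus_needed(infer)} GPU(s)), "
          f"~{train} GB to fully fine-tune ({gpus_needed(train)} GPU(s))")
```

Under these assumptions a 30B model needs about 60 GB just to serve in fp16, but roughly 480 GB (six 80 GB GPUs) to fully fine-tune, which is where the cost jump comes from. Parameter-efficient methods and quantization lower these numbers considerably.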

>>zoogen+(OP)
There was a period in the '90s when you had to raise money and assemble a team just to make web products. Frameworks didn't exist, we didn't have the patterns we do now, and everything was built for the first time and was therefore 100% custom. The era of $10 VPSes came much later.

>>latchk+W5
What are you talking about? George has been working on comma.ai for years. It ships actual products and has revenue.

We need more people who "think different" and push back against the status quo instead of carrying out ad hominem attacks on public forums.

>>zoogen+(OP)
You're comparing apples to oranges.

Should I complain that to drill for oil I need hundreds of millions of dollars just to start?

Your VPS example was doing barely any computation. You're conflating web 1.0 and web 2.0 with neural networks and they are nothing alike in terms of FLOPS.

>>zoogen+mk
For the first example, I think that was just due to the specific problem being solved. I can tell you there are a ton of areas that aren't "solved" yet, and that aren't trivial to solve either. One thing we haven't discussed in this conversation is the data itself, and cleaning up that data. Rad AI probably spent more money on staff cleaning up data than on model training. This isn't trivial: for medical-grade work you need physician data scientists to help out, and that field has only really existed since 2018 (the first time the title appeared in any job listing). The reason background removal is "mature" is that it's not that hard a problem and there's a good amount of data out there.

I also think that you're way off on the second point. I'm not saying that to be rude, since it does seem to be a popular opinion. It's just that if you read papers, most people publishing aren't using giant clusters. There's a whole field of people finding ways to shrink models down. Once we understand the models, we can also optimize them. You see this happen in all sorts of fields beyond "general intelligence": tasks that used to take entire clusters now run on your cell phone. Optimization is important not just because it lets more people work on things, but also because it brings down the costs these big companies are paying.

Let's think about this from another direction. ML models are based on how the brain is thought to work. The human brain is capable of quite a bit, but it uses very little power: about 10 watts. It is clearly better optimized than ML models are, which means there's still a huge efficiency gap for us to close.

>>zoogen+(OP)
> I hope we find a path to at least fine-tuning medium sized models for prices that aren't outrageous

It's not that bad; there are lots of things you can do on a hobbyist budget. For example, a consumer GPU with 12 or 24 GB of VRAM costs $1,000-2,000 and lets you run many models and fine-tune them. The next step up, for fine-tuning larger models, is to rent a 4-8 GPU instance on vast.ai or something similar for a few hours, which will set you back maybe $200, still within a hobbyist budget. Many academic fine-tuning efforts, like Stanford Alpaca, cost a few hundred dollars. It's only when you want to pretrain a large language model from scratch that you need thousands of GPUs and millions in funding.
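
The rental estimate is simple multiplication. A minimal sketch; the $4 per GPU-hour rate and the 6-hour run are hypothetical placeholders, since marketplace prices such as vast.ai's vary with GPU model and demand.

```python
def rental_cost(gpus, hours, price_per_gpu_hour):
    """Total cost of renting `gpus` GPUs for `hours` hours."""
    return gpus * hours * price_per_gpu_hour


def hours_within_budget(budget, gpus, price_per_gpu_hour):
    """How many hours of a multi-GPU instance a fixed budget buys."""
    return budget / (gpus * price_per_gpu_hour)


# Hypothetical: an 8-GPU instance at $4 per GPU-hour for a 6-hour fine-tuning run.
print(rental_cost(8, 6, 4.0))            # 192.0
print(hours_within_budget(200, 8, 4.0))  # 6.25
```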

>>zoogen+(OP)
1) This is just what happens when an industry matures. If you want to start a new company to drill oil wells, you're going to spend a lot of money. Same if you're starting a new railroad, a new car company, a new movie studio...

2) Speaking of VPSes and web 1.0 in the same breath is a little anachronistic. Servers had much lower capacity in 1999, and cost much more. Sun was a billion-dollar company during the bubble because it was selling tens of thousands of Unix servers to startups to handle the traffic load. Google got a lot of press because they were the oddballs who ran on commodity x86 hardware.

>>monolo+Gp
It is certainly possible to "think different" and not be a wannabe Steve Jobs.

>>tedivm+5s
> It's just that if you read papers most people publishing aren't using giant clusters.

There is a massive difference between what is necessary to prove a scientific thesis and what is necessary to run a profitable business. And what do you mean by "giant clusters" in this context? What is the average size of the clusters used in groundbreaking papers, and what is their cost? Is that cost a reasonable amount for a bootstrapped startup to experiment with, or are we getting into territory where only VC-backed ventures can even experiment?

> There's a whole field of people who are finding ways to shrink models down

Of course the cost of running models is going to come down. The literal article we are responding to is a major part of that equation. You seem to be using arguments about how the future will be to argue against how the present is.

Presently, hardware costs are insanely high and not coming down soon (as per the article). Presently, useful models for a large set of potential applications require significant cluster sizes. That makes it presently difficult for many engineers to jump in and play around.

My opinion is that the cost has to come down to the point where hobbyist engineers can play with high-quality LLMs at the model sizes that are most useful. That doesn't imply that models for other use cases can't be developed today. It doesn't imply that the price of the hardware and the size of the models won't fall. It just implies that if you dream of a business built around a capable LLM, your realistic present-day costs are in the tens of thousands of dollars at a minimum.

>>luckyt+Uv
The question is what happens once you want to transition from your RTX 4090 to a business. It might be cute to generate 10 tokens per second, or whatever you can get with whatever model you have, to delight your family and friends. But once you want to scale that out into a genuine product, you're up against the ramp. Even a modest inference rig is going to cost a chunk of change, in the hundreds of thousands of dollars. You have no real way to validate your business model without making a big investment.
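
The jump from a single hobby GPU to a product is mostly a throughput problem. A minimal capacity sketch; every number in it (1,000 concurrent users, 10 tokens/s each, 500 tokens/s of batched throughput per rig, $25k per rig) is hypothetical and chosen only to show how the costs multiply, not measured from any real deployment.

```python
import math


def rigs_needed(concurrent_users, tokens_per_sec_per_user, rig_tokens_per_sec):
    """Inference rigs required to meet aggregate token demand (no headroom)."""
    demand = concurrent_users * tokens_per_sec_per_user
    return math.ceil(demand / rig_tokens_per_sec)


def fleet_cost(rigs, price_per_rig):
    """Up-front hardware cost of the inference fleet."""
    return rigs * price_per_rig


rigs = rigs_needed(1_000, 10, 500)
print(rigs, fleet_cost(rigs, 25_000))  # 20 500000
```

Real capacity planning would also account for peak versus average load, latency targets, and redundancy, all of which push the number up.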

Of course, the businesses that find a way to make this work will be the ones that succeed. It isn't an impossible problem, just a seemingly difficult one for now. That is why I mentioned VC funding as appearing to have more leverage over this market than over previous ones. If you can find someone to foot the $250k+ cost (e.g., AI Grant [1], which offers $250k in cash and $350k in cloud compute), then you might have a chance.

1. https://aigrant.org/

>>monolo+Gp
They're talking about the meltdown he had on stream [1] (in front of the aforementioned pirate flag), which ended with him saying he'd stop using AMD hardware [2]. He reversed that two weeks later, after talking with AMD [3].

Maybe he'll succeed, but this definitely doesn't scream stability to me. I'd be wary of investing money into his ventures (but then I'm not a VC, so what do I know).

[1] https://www.youtube.com/watch?v=Mr0rWJhv9jU

[2] https://github.com/RadeonOpenCompute/ROCm/issues/2198#issuec...

[3] https://twitter.com/realGeorgeHotz/status/166980346408248934...

>>Improb+DQ
If his grievances actually got through to the AMD CEO, I'd say he's already had a bigger impact than most.

>>rohit8+b01
It is good press at a time when AMD is very visibly looking like they lost out on the first round of AI. His project is open source; AMD likes that and can benefit from the free developer feedback. I'd say this is less about George and more about Lisa being the insanely smart and talented business person that she is.

>>zoogen+(OP)
I want a tinybox so bad.

>>zoogen+FE
You can use a lower-performance model, or an LLM-as-a-service, etc.

If you want to compete on the actual model, then yes, this is not the time for garage shops.

If your business plan is that good, it will work without H100 "cards" too; and if it's even better and you know it'll print money with H100 cards, then great, just wait.

>>latchk+u21
Lisa is smart but this only got to her because George got publicly upset.

>>rohit8+qW2
He emailed her after he had his meltdown. It wasn't like she saw the meltdown and wrote to him. He is nowhere on her radar.

By the way, I also got a bug in the AMD drivers fixed [0]. That fix enabled me to fully automate the performance tuning of the 150,000 AMD GPUs I was managing. Nobody else had done this before; it was impossible without this bug fix. We were doing it by hand before! The only bummer was that I had to upgrade the kernel on 12k+ systems... that took a while.

I went through the proper channels and they fixed it in a week, no need for a public meltdown or email to Lisa crying for help.

[0] https://patchwork.freedesktop.org/patch/470297/?series=99134...

>>KRAKRI+Lf
I think the skill is knowing how to digest scientific papers, choosing which papers are worth the time to read, etc.

These skills are developed while doing a PhD.

>>j16sdi+pRi
Bill Gates knew how to do that and he dropped out not even halfway into his undergrad.