AFAICT it consists of a bunch of anecdotes by thought-leader types followed by a corny-ass song.
HN, you can do better. I believe in you. Try harder.
Ironically, Jensen Huang did something like this many years ago. In an interview for his alma mater, he tells the story of how he bet the existence of Nvidia on a new circuit-simulation computer from a random startup, which allowed Nvidia to complete the design of their chip.
Successful startups are successful because they do exactly that. Successfully.
>"HN, you can do better"
- indeed.
On the other hand, if you can't get H100s, there's nothing to lose!
Lumi: https://www.lumi-supercomputer.eu/lumis-full-system-architec...
If you know what your application will be and have the $300 million, custom chips may be a much wiser choice. That's something you'd only get by making things in-house/at startups.
PS: The song is also very good.
Perhaps they think using GPUs for computation is a passing fad? They hate money? Their product is actually terrible and they don't want to get found out (that one might be true for Intel)?
[1] https://www.reddit.com/r/Amd/comments/140uct5/geohot_giving_...
Of course they could do with more GPUs. If you gave them 1,000x their current number, they'd think up ways of utilising all of them, and have the same demand for more. This is how it should be.
Like... how you feel when you use them? (-:
Also:
> For visualization workloads LUMI has 64 Nvidia A40 GPUs.
I can't imagine the GPU would cost more than $100 at scale, unless they have extremely poor yields.
H100s have 80GB of 5120-bit HBM, with SXM NVLink linking 8 at a time in a single server.
HUGE difference in bandwidth when doing anything where inference of the model needs to be spread over multiple GPUs, which all LLMs are. And even more of a difference when training is in play.
They haven’t had gfx card driver issues in years now and people still say “oh I don’t want AMD cos their drivers don’t work”.
Yes, much needed.
Here's a list of possible "monopoly breakers" I'm going to write about in another post - some of these are things people are using today, some are available but don't have much user adoption, some are technically available but very hard to purchase or rent/use, and some aren't yet available:
* Software: OpenAI's Triton (you might've noticed it mentioned in some of "TheBloke" model releases and as an option in the oobabooga text-generation-webui), Modular's Mojo (on top of MLIR), OctoML (from the creators of TVM), geohot's tiny corp, CUDA porting efforts, PyTorch as a way of reducing reliance on CUDA
* Hardware: TPUs, Amazon Inferentia, Cloud companies working on chips (Microsoft Project Athena, AWS Trainium, TPU v5), chip startups (Cerebras, Tenstorrent), AMD's MI300A and MI300X, Tesla Dojo and D1, Meta's MTIA, Habana Gaudi, LLM ASICs, [+ Moore Threads]
A/H100s with InfiniBand are still the most common request from startups doing LLM training, though.
The current angle I'm thinking about for the post would be to actually use them all. Take Llama 2, and see which software and hardware approaches we can get inference working on (would leave training to a follow-up post), write about how much of a hassle it is (to get access/to purchase/to rent, and to get running), and what the inference speed is like. That might be too ambitious though, I could see it taking a while. If any freelancers want to help me research and write this, email is in my profile. No points for companies that talk a big game but don't have a product that can actually be purchased/used, I think - they'd be relegated to a "things to watch for in future" section.
How would that even play out then? Is everyone in the world simply stuck waiting for Nvidia's capacity to meet demand?
There is obviously a huge incentive now to be competitive here, but is it realistic that anyone else might meaningfully meet demand before Nvidia can?
I wonder how much a TPU company would be worth if Google spun it off and it started selling them?
These TPUs obviously aren't the ones deployed in Google's datacenters. That being said, I'm not sure how practical it would be to deploy TPUs elsewhere.
Also, Amazon's Inferentia gets a fair bit of usage in industrial settings. It's just that these Nvidia GPUs offer an amazing breeding ground for research and cutting-edge work.
Would be good to have more on enterprise companies like Pepsi, BMW, Bentley, and Lowe's, as well as other HPC uses like oil and gas, manufacturing, automotive, and weather forecasting.
The H100s are actually very good for inference.
> Elon Musk says that “GPUs are at this point considerably harder to get than drugs.”
Does Elon have a hard time getting drugs?
You can buy all the GPUs you can possibly find. If you want to deploy 10MW+, it just doesn't exist.
These things need redundant power/cooling, real data centers, and can't just be put into chicken farms. Anything less than 10MW isn't enough compute now either for large scale training and you can't spread it across data centers because all the data needs to be in one place.
So yea... good luck.
As for driver: https://www.tomshardware.com/news/adrenalin-23-7-2-marks-ret...
Texas has a lot of wind. At this scale, it is mostly grid power anyway. Grid is a mixture of everything. Oh and solar has this pesky issue of not working in the evening, so then you have another problem... storage. ;-)
I should add... you want backup generators for your UPS systems? Those have a 4.5-year backlog.
And yes, obviously renewables won't cover 24/7, but if I have a choice between no data center and a data center that runs 60% of the time... give me the 60%.
When you have 100x the margin on a product you only need to sell 1% as many to almost double profit. 10% would be more than 10x profit.
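Back-of-the-envelope, with completely made-up numbers, just to make that concrete:

    # All figures invented purely to illustrate the margin math.
    base_margin = 100                     # profit per unit on the existing product
    base_units  = 1_000_000
    base_profit = base_margin * base_units

    hi_margin = 100 * base_margin         # the 100x-margin product
    profit_at_1pct  = base_profit + hi_margin * base_units * 0.01
    profit_at_10pct = base_profit + hi_margin * base_units * 0.10

    print(profit_at_1pct / base_profit)   # 2.0  -> almost double
    print(profit_at_10pct / base_profit)  # 11.0 -> more than 10x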
But it is surprisingly hard to find investors who are willing to wait, even though we know that this stuff is going to last for decades.
username@gmail if anyone would like to have real conversations about this.
HN and casual media are heavily attention biased towards big money and big datacenters, yes.
I hope we find a path to at least fine-tuning medium sized models for prices that aren't outrageous. Even the tiny corp's tinybox [1] is $15k and I don't know how much actual work one could get done on it.
If the majority of startups are just "wrappers around OpenAI (et al.)" the reason is pretty obvious.
Let's click 6 wind turbines down off the coast, shove our H100s underneath them for water cooling, and ah... split the water into hydrogen/oxygen tanks for hydrogen power when it ain't blowy no more? Or something? Someone help me out here.
When I was at Rad AI we managed just fine. We took a big chunk of our seed round and used it to purchase our own cluster, which we set up at Colovore in Santa Clara. We had dozens, not hundreds, of GPUs and it set us back about half a million.
The one thing I can't stress enough: do not rent these machines. For the cost of renting a machine from AWS for 8 months you can own one of these machines and cover all of the datacenter costs; this basically makes it "free" from the eight-month to three-year mark. Once we decoupled our training from cloud prices we were able to do a lot more training and research. Maintenance of the machines is surprisingly easy, and they keep their value too since there's such a high demand for them.
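A minimal break-even sketch of that rent-vs-buy argument (the prices below are placeholders for an 8-GPU box, not our actual numbers):

    # Hypothetical prices -- not Rad AI's real figures.
    rent_per_month = 25_000    # on-demand cloud rental for a comparable machine
    purchase_price = 160_000   # buying the machine outright
    colo_per_month = 4_000     # power, space, and remote hands at the colo

    for month in range(1, 37):
        rented = rent_per_month * month
        owned  = purchase_price + colo_per_month * month
        if rented >= owned:
            print(f"owning breaks even around month {month}")
            break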
I'd also argue that you don't need the H100s to get started. Most of our initial work was on much cheaper GPUs, with the A100s we purchased being reserved for training production models rapidly. What you need, and is far harder to get, is researchers who actually understand the models so they can improve the models themselves (rather than just compensating with more data and training). That was what really made the difference for Rad AI.
I highly recommend Colovore in Santa Clara. They got purchased by DR not too long ago, but are run independently as far as I can tell. Their team is great, and they have the highest power density per rack out of anyone. I had absolutely no problem setting up a DGX cluster there.
That said, a lot of other businesses don't want to take on the capex, but they do need to train some models... and those models can't run on just a half a million worth of hardware. In that case, someone else is going to have to do it for you.
It works both ways and there are no absolutes here.
(to their credit AMD is also getting serious lately, they put out a listing for like 30 ROCm developers a few weeks after geohot's meltdown, and they were in the process of doing a Windows release (previously linux-only) of ROCm with support for consumer gaming GPUs at the time as well. The message seems to have finally been received, it's a perennial topic here and elsewhere and with the obvious shower of money happening, maybe management was finally receptive to the idea that they needed to step it up.)
My response was more for these folks the OP mentioned-
> There is almost no universe where a couple of guys in their garage are getting access to 1000+ H100s with a capital cost in the multiple millions.
I'm pointing out that this isn't true. I was the founding engineer at Rad AI- we had four people when we started. We managed to build LLMs that are in production today. If you've had a CT, MRI, or X-ray in the last year there's a real chance your results were reviewed by the Rad AI models.
My point is simply that people are really overestimating the amount of hardware actually needed, as well as the costs to use that hardware. There absolutely is a space for people to jump in and build out LLM companies right now, and they don't need to build a datacenter or raise nine figures of funds to do it.
The guy could wake up tomorrow and decide he didn't feel like developing this stuff any more and you're going to be stuck with a dead project. In fact, he already did that once when he found a bug in the driver.
People rip on Google for killing projects all the time, and now you want to bet your business on a guy who livestreams in front of a pirate flag? Come on.
Never mind that even in my own personal dealings with him, he's been a total dick and I'm far from the only person who says that.
Even if I validate my idea on an RTX 4090, the path to scaling any idea gets expensive fast. $15k to move up to something like a tinybox (probably capable of running a 65B model, but is it realistic to train or fine-tune a 65B model?). Then maybe $100k in cloud costs. Then maybe $500k for a research-sized cluster. Then $10m+ for enterprise grade. I don't see that kind of ramp happening outside well-financed VC startups.
That said I'm mostly responding to the "two guys in a garage" comment with this. Larger companies are going to have different needs altogether.
Serious question: where does an aspiring AI/ML dev get that expertise? From looking at OMSCS I'm not convinced even a doctorate from Georgia Tech would get me the background I need...
To put it another way, the $10m+ for enterprise grade just seems wrong to me. It's more like $10m+ for mediocre responses to a lot of things. Rad AI didn't spend $10m on their models, but they absolutely are professional grade and are in use today.
I also think it's important to consider capital costs that are a one time thing, versus long term costs. Once you purchase that $10m cluster you have that forever, not just for a single model, and because of the GPU scarcity right now that cluster isn't losing value nearly as rapidly as most hardware does. If you purchase a $500k cluster, use it for three years, and then sell it for $400k you're really not doing all that bad.
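Rough effective-cost math for that resale scenario (the hosting cost is an assumption):

    # $500k cluster, used for 3 years, resold for $400k (numbers from above).
    purchase, resale, months = 500_000, 400_000, 36
    colo_per_month = 4_000     # assumed hosting/power cost
    effective = (purchase - resale) / months + colo_per_month
    print(round(effective))    # ~6778 -> under $7k/month of effective cost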
I can't tell you if one program is better than another, as it's a bit out of my area of expertise.
Another absolute. I try to not be so focused on single points of input like that.
From what I can tell, sitting on the other side of the wall (GPU provider), there is a metric ton of demand from all sides.
But my (totally amateur and outsider-informed) intuition is that the innovative work will still happen at the edge of model size for the next few years. We literally just got the breakthroughs in LLM capabilities around the 30B parameter mark. These capabilities seemed to accelerate with larger models. There appears to be a gulf in the capabilities from 7B to 70B parameter LLMs that makes me not want to bother with LLMs at all unless I can get the higher-level performance of the massive models. But even if I did want to play around at 30B or whatever, I have to pay $15k-$100k.
I think we are just in a weird spot right now where the useful model sizes for a large class of potential applications are at a price point that many engineers will find prohibitively expensive to experiment with on their own.
Right now in the US there's about as much proposed renewable production planned and awaiting permitting as there is currently installed. It's the grid connections that are the long pole in expanding renewable use right now. And since the voltage a solar panel outputs is pretty close to the voltage a GPU consumes, you've got some more savings there.
There are still a lot of challenges with that, but in general I think people should be looking for ways to colocate intermittent production of various things with solar farms right now, from AI models to ammonia.
We need more people who "think different" and push back against the status quo instead of carrying out ad hominem attacks on public forums.
Should I complain that to drill oil I need hundreds of millions of dollars to even start?
Your VPS example was doing barely any computation. You're conflating web 1.0 and web 2.0 with neural networks and they are nothing alike in terms of FLOPS.
I also think that you're way off on the second point. I'm not saying that to be rude, because it does seem to be a popular opinion. It's just that if you read papers most people publishing aren't using giant clusters. There's a whole field of people who are finding ways to shrink models down. Once we understand the models we can also optimize them. You see this happen in all sorts of fields beyond "general intelligence"- tasks that used to take entire clusters to run can work on your cell phone now. Optimization is important not just because it opens up more people to work on things, but also because it drops down the costs that these big companies are paying.
Let's think about this in another direction. ML models are loosely based on how the brain is thought to work. The human brain is capable of quite a bit, but it uses very little power: about 10 watts. It is clearly better optimized than ML models are. That means there's a huge gap we still have to close on efficiency.
- low budget: a taxpayer-funded supercomputer for taxpayer-funded PhD students
- high risk tolerance: tolerating an AI cluster arriving 5 years late (Intel and Aurora), the lack of an AI SW stack, etc.
- high FP64 FLOPS constraint: nobody doing AI cares about FP64
Private companies whose survival depends on very expensive engineers (10x an EU PhD student's salary) quickly generating value from AI in a very competitive market are a completely different kind of "AI customer".
It's not that bad; there are lots of things you can do with a hobbyist budget. For example, a consumer GPU with 12 or 24 GB VRAM costs $1000-2000 and can let you run many models and do fine-tuning on them. The next step up, for fine-tuning larger models, is to rent an instance on vast.ai or something similar for a few hours with a 4-8 GPU instance, which will set you back maybe $200—still within the range of a hobbyist budget. Many academic fine-tuning efforts, like Stanford Alpaca, cost a few hundred dollars to fine-tune. It's only when you want to pretrain a large language model from scratch that you need thousands of GPUs and millions in funding.
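Rough hobbyist math for that middle tier (rates are ballpark, not quotes):

    # Ballpark cost of a one-off fine-tuning run on a rented multi-GPU instance.
    gpus          = 8
    price_per_gpu = 0.50      # assumed $/GPU-hour on a marketplace like vast.ai
    hours         = 48
    print(gpus * price_per_gpu * hours)   # 192.0 -> in line with "maybe $200"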
2) Speaking of VPSes and web 1.0 in the same breath is a little anachronistic. Servers had much lower capacity in 1999 and cost much more. Sun was a billion-dollar company during the bubble because it was selling tens of thousands of Unix servers to startups to handle the traffic load. Google got a lot of press because they were the oddballs who ran on commodity x86 hardware.
There is a massive difference between what is necessary to prove a scientific thesis and what is necessary to run a profitable business. And what do you mean by "giant clusters" in this context? What is the average size of the clusters used in groundbreaking papers, and what is their cost? Is that cost a reasonable amount for a bootstrapped startup to experiment with, or are we getting into the territory where only VC-backed ventures can even experiment?
> There's a whole field of people who are finding ways to shrink models down
Of course the cost of running models is going to come down. The literal article we are responding to is a major part of that equation. You seem to be making arguments about how the future will be as support for an argument against how the present is.
Presently, hardware costs are insanely high and not coming down soon (as per the article). Presently, useful models for a large set of potential applications require significant cluster sizes. That makes it presently difficult for many engineers to jump in and play around.
My opinion is that the cost has to come down to the point that hobbyist engineers can play with the high-quality LLMs at the model sizes that are most useful. That doesn't imply that there are no model sizes for other use-cases that can't be developed today. It doesn't imply that the price of the hardware and size of the models will not fall. It just implies that dreaming of a business based around a capable LLM means your realistic present-day costs are in the tens of thousands at a minimum.
Of course, it is the businesses that find a way to make this work that will succeed. It isn't an impossible problem, it is just a seemingly difficult one for now. That is why I mentioned VC funding as appearing to have more leverage over this market than previous ones. If you can find someone to foot the $250k+ cost (e.g. AI Grant [1], where they offer $250k in cash and $350k in cloud compute) then you might have a chance.
We took 30 MW outside the US, but also some inside the US.
Maybe he'll succeed, but this definitely doesn't scream stability to me. I'd be wary of investing money into his ventures (but then I'm not a VC, so what do I know).
[1] https://www.youtube.com/watch?v=Mr0rWJhv9jU
[2] https://github.com/RadeonOpenCompute/ROCm/issues/2198#issuec...
[3] https://twitter.com/realGeorgeHotz/status/166980346408248934...
I've enabled nearly all GFX9 and GFX10 GPUs as I have packaged the libraries for Debian. I haven't tested every library with every GPU, but my experience has been that they pretty much all work. I suspect that will also be true of GFX11 once we move rocm-hipamd to LLVM 16.
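For what it's worth, a quick way to sanity-check that a given card is actually picked up is via a ROCm build of PyTorch (assumed to be installed separately; it's not part of the Debian packaging described above):

    # Assumes a ROCm build of PyTorch, where the torch.cuda API maps to HIP.
    import torch

    print(torch.version.hip)           # HIP/ROCm version string; None on CUDA builds
    print(torch.cuda.is_available())   # True if the ROCm stack sees the GPU
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))   # e.g. the gfx9/gfx10 card's name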
If you want to compete on the actual model, then yes, this is not the time for garage shops.
If your business plan is so good, then it will work without H100 "cards" too; or if it's even better and you know it'll print money with H100 cards, then great, just wait.
usually community projects have different target constraints, right? (for example Mastodon doesn't even want to be Twitter, it wants its own thing, it wants to be a different answer to the social network/media question, even if there are obvious and fundamental similarities.) how does this play out in the open source AI communities?
As far as other storage methods, they're really cool but water and trains require a lot of space, and flywheels typically aren't well suited for storing energy for long amounts of time. That being said, pumped water is still about 10x more common than batteries right now and flywheels are useful if you want to normalize a peaky supply of electricity.
I'd like to believe we'll see more innovative stuff like you're suggesting, but I think for the time being the regulatory environment is too complicated and the capex is probably too high for anyone outside of the MAMA companies to try something like that right now.
[0] - https://www.energy.gov/policy/articles/deployment-grid-scale...
One bad driver update is not indicative of anything. Nvidia has had bad driver updates too, but you're not shitting all over them. And running Nvidia's own drivers on Linux is still a pain point.
(And don’t try claim I’m an AMD fanboy when I don’t even have any AMD stuff at the moment. It’s all Intel/Nvidia)
By the way, I also got a bug in the AMD drivers fixed too [0]. That bug fix enabled me to fully automate the performance tuning of 150,000 AMD gpus that I was managing. This is something nobody else had done before, it was impossible to do without this bug fix. We were doing this by hand before! The only bummer was that I had to upgrade the kernel on 12k+ systems... that took a while.
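For context, that kind of automation is basically scripting the amdgpu sysfs interface; a read-only sketch (standard amdgpu paths, with the actual tuning writes omitted):

    # Walk the amdgpu sysfs overdrive interface and dump each GPU's clock/voltage table.
    import glob, os

    for dev in sorted(glob.glob("/sys/class/drm/card*/device")):
        table = os.path.join(dev, "pp_od_clk_voltage")
        if os.path.exists(table):      # only present when overdrive is enabled
            with open(table) as f:
                print(dev)
                print(f.read())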
I went through the proper channels and they fixed it in a week, no need for a public meltdown or email to Lisa crying for help.
[0] https://patchwork.freedesktop.org/patch/470297/?series=99134...
These skills are developed during their PhD studies.