zlacker

[return to "Stargate Project: SoftBank, OpenAI, Oracle, MGX to build data centers"]
1. TheAce+9g[view] [source] 2025-01-22 00:03:02
>>tedsan+(OP)
I'm confused and a bit disturbed; honestly having a very difficult time internalizing and processing this information. This announcement is making me wonder if I'm poorly calibrated on the current progress of AI development and the potential path forward. Is the key idea here that current AI development has figured out enough to brute force a path towards AGI? Or I guess the alternative is that they expect to figure it out in the next 4 years...

I don't know how to make sense of this level of investment. I feel that I lack the proper conceptual framework to make sense of the purchasing power of half a trillion USD in this context.

◧◩
2. dauhak+Ur[view] [source] 2025-01-22 01:30:26
>>TheAce+9g
> Is the key idea here that current AI development has figured out enough to brute force a path towards AGI?

My sense anecdotally from within the space is yes people are feeling like we most likely have a "straight shot" to AGI now. Progress has been insane over the last few years but there's been this lurking worry around signs that the pre-training scaling paradigm has diminishing returns.

What recent outputs like o1, o3, DeepSeek-R1 are showing is that that's fine, we now have a new paradigm around test-time compute. For various reasons people think this is going to be more scalable and not run into the kind of data issues you'd get with a pre-training paradigm.

You can definitely debate on whether that's true or not but this is the first time I've been really seeing people think we've cracked "it", and the rest is scaling, better training etc.

◧◩◪
3. Nitpic+xY[view] [source] 2025-01-22 06:18:04
>>dauhak+Ur
I agree with your take, and actually go a bit further. I think the idea of "diminishing returns" is a bit of a red herring, and it's instead a combination of saturated benchmarks (and testing in general) and expectations of "one llm to rule them all". This might not be the case.

We've seen with oAI and Anthropic, and rumoured with Google, that holding your "best" model and using it to generate datasets for smaller but almost as capable models is one way to go forward. I would say that this shows the "big models" are more capable than it would seem and that they also open up new avenues.

We know that Meta used L2 to filter and improve its training sets for L3. We are also seeing how "long form" content + filtering + RL leads to amazing things (what people call "reasoning" models). Semantics might be a bit ambitious, but this really opens up the path towards -> documentation + virtual environments + many rollouts + filtering by SotA models => new dataset for next gen models.

That, plus optimisations (early exit from meta, titans from google, distillation from everyone, etc) really makes me question the "we've hit a wall" rhetoric. I think there are enough tools on the table today to either jump the wall, or move around it.

[go to top]