
martin (OP) | 2025-07-25 00:48:28
But "agentic AI" (and LLMs in general) are far less about compute than everyone talks about IMO. I know what you mean FWIW but she does have a point I think.

1) Attention scales quadratically with context length, and the KV cache eats memory linearly on top of that (rough sketch below).

2) "Agentic" AI requires a shittonne of context IME. Like a horrifying amount. Tool definitions alone can add up to thousands upon thousands of tokens, plus schemas and a lot of 'back and forth' context use between tool calls. If you just import a moderately complicated OpenAPI/Swagger schema and use it "as is", you will probably run into the hundreds of thousands of tokens within a few tool calls.

3) Finally, compute actually isn't the bottleneck, it's memory bandwidth.
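
To put rough numbers on 1): a quick Python sketch. Every dimension here is an assumption (very roughly a 70B-class model), purely for illustration:

    # Back-of-envelope: KV-cache memory grows linearly with context length,
    # while naive attention compute grows quadratically. Config values are
    # assumed (roughly 70B-class), for illustration only.
    n_layers, n_kv_heads, n_q_heads, head_dim = 80, 8, 64, 128
    bytes_per_val = 2  # fp16/bf16

    def kv_cache_bytes(ctx: int) -> int:
        # K and V, per layer, per KV head, per position
        return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_val

    def attn_score_flops(ctx: int) -> int:
        # QK^T and attention-weighted V: two matmuls, each quadratic in ctx
        return 2 * 2 * n_layers * n_q_heads * head_dim * ctx ** 2

    for ctx in (8_000, 32_000, 128_000):
        print(f"{ctx:>7} tok: KV cache ~{kv_cache_bytes(ctx) / 2**30:.1f} GiB, "
              f"attn ~{attn_score_flops(ctx) / 1e12:.0f} TFLOPs/forward")

Going 8k -> 128k is 16x the context and 16x the KV cache, but 256x the attention FLOPs, which is why agentic workloads hurt.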

There is a massive opportunity for someone to snipe Nvidia on inference at least. Inference is becoming pretty 'standardized', at least with the current state of play. If someone can come along with a cheaper GPU with a lot of VRAM and a lot of memory bandwidth, Nvidia's software moat around inference is far thinner than CUDA's moat as a whole. I think AMD is very close to reaching that FWIW.
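
On the bandwidth point: single-stream decode has to stream every weight through memory for each generated token, so tokens/sec is roughly capped at bandwidth / model bytes. A crude roofline sketch; the bandwidth figures are ballpark published specs and the model size is assumed:

    # Crude decode roofline: each generated token reads all weights once, so
    # per-stream throughput <= memory bandwidth / weight bytes (ignoring
    # KV-cache reads and batching, which change constants, not the shape).
    model_bytes = 70e9 * 2  # assumed: 70B params in fp16

    gpus = {  # ballpark HBM bandwidth specs
        "H100-class (~3.3 TB/s)": 3.3e12,
        "MI300X-class (~5.3 TB/s)": 5.3e12,
    }

    for name, bw in gpus.items():
        print(f"{name}: ceiling ~{bw / model_bytes:.0f} tokens/s per stream")

The roofline doesn't care whose FLOPs they are, which is exactly why a cheaper part with more HBM and more bandwidth is a credible attack on inference specifically.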

I suspect training and R&D will remain more in Nvidia's sphere, but if Intel got its act together there is definitely room for competition here.
