zlacker

[parent] [thread] 4 comments
1. energy+(OP)[view] [source] 2025-12-06 01:08:37
I think we are too quick to discount the possibility that this flaw is somewhat intentional, in the sense that the optimization has a tight budget to work with (the equivalent of ~3000 tokens), so why would it waste capacity on this when it could improve capabilities like reading small text in obscured images? Sort of like how humans have all these rules of thumb that backfire in various ways, but that's the energy-efficient way to do things.
replies(1): >>runarb+x2
2. runarb+x2[view] [source] 2025-12-06 01:30:08
>>energy+(OP)
Even so, that doesn't take away from my point. Traditional specialized models can already do these things, for much cheaper and without expensive optimization. What traditional models cannot do is the toy aspect of LLMs, and that is the only use case I see for this technology going forward.

Let's say you are right and these things will be optimized, and in, say, 5 years, most models from the big players will be able to do things like read small text in an obscured image, draw a picture of a glass of wine filled to the brim, draw a path through a maze, count the legs of a five-footed dog, etc. And suppose that by then they have burned through their last venture capital subsidies (so customers pay the actual cost). Why would people use LLMs for these tasks when a traditional specialized model can do them for much cheaper?

replies(2): >>energy+v6 >>a1j9o9+Y11
3. energy+v6[view] [source] [discussion] 2025-12-06 02:13:15
>>runarb+x2
> Why would people use LLMs for these when a traditional specialized model can do it for much cheaper?

This is not too different from where I see things going. I don't think a monolithic LLM that does everything perfectly is where we'll go. An LLM in a finite-compute universe is never going to be better at weather forecasting than GraphCast. The LLM will have a finite compute budget, and it should prioritize general reasoning, and be capable of calling tools like GraphCast to extend its intelligence into the necessary verticals for solving a problem.

I don't know exactly what that balance will look like, however. The line between specialist application knowledge and general intelligence is pretty blurred, and what the API boundaries (if any) should be is unclear to me. There's a phenomenon where capabilities in one vertical do help with general reasoning to an extent, so it's not a completely zero-sum tradeoff between specialist expertise and generalist ability, which makes it difficult to know what to expect.
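To make the "generalist routes to specialist" idea concrete, here's a minimal sketch of that dispatch pattern. All tool names, signatures, and outputs below are invented for illustration; in a real setup the LLM would emit the tool name and argument, and the tools would wrap actual specialized models like GraphCast or an OCR pipeline:

```python
from typing import Callable

def graphcast_forecast(region: str) -> str:
    # Stand-in for a call to a specialized weather model.
    return f"48h forecast for {region}: rain likely"

def ocr_small_text(image_path: str) -> str:
    # Stand-in for a traditional OCR pipeline.
    return f"text extracted from {image_path}"

# The generalist's compute budget goes to choosing a key here;
# the heavy vertical work stays inside the specialist tools.
TOOLS: dict[str, Callable[[str], str]] = {
    "weather_forecast": graphcast_forecast,
    "read_small_text": ocr_small_text,
}

def dispatch(tool_name: str, argument: str) -> str:
    # Route the LLM's chosen tool call to the matching specialized model.
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](argument)
```

The point of the sketch is the API boundary: the generalist only needs to know which tool fits the problem, not how the specialist works internally.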

4. a1j9o9+Y11[view] [source] [discussion] 2025-12-06 14:08:23
>>runarb+x2
Having one tool that you can use to do all of these things makes a big difference. If I'm a financial analyst at a company, I don't need to know how to implement and use 5 different specialized ML models; I can just ask one tool (which can still use tools on the backend to complete the task efficiently).
replies(1): >>runarb+Rj1
5. runarb+Rj1[view] [source] [discussion] 2025-12-06 16:34:58
>>a1j9o9+Y11
I'm sorry if this comes across as condescending, but if you are a financial analyst, isn't doing statistics part of your job? And doesn't your expertise involve knowing which kinds of statistical analysis are available to tackle a given problem? It just seems weird to me that you would opt not to use your expertise and instead use a generalized model that is both more expensive and has poorer results than traditional models.