https://www.reddit.com/r/LocalLLaMA/comments/17vcr9d/llm_com...
Convergence (evolutionary computing) https://en.wikipedia.org/wiki/Convergence_(evolutionary_comp...
Convergence (disambiguation) > Science, technology, and mathematics https://en.wikipedia.org/wiki/Convergence#Science,_technolog...
In the “Merge Process” section they at least give the layer ranges.
The papers referenced here get into this: https://cacm.acm.org/blogs/blog-cacm/276268-can-llms-really-...
It makes complete sense and has been a part of my own usage for well over a year now, but it's been cool seeing it demonstrated in research across multiple models.
It's *why* this particular pile of linear algebra on the weights works for this particular task, and *how* it will respond to any given task, that's a bit mysterious.
Like imagine someone gave you the Taylor series expansion of the inverse Kepler equation[1]. So you just have a bunch of crazy fractions of powers of x that you add up. And then they say: OK, now this function will very accurately describe the orbit of the planets.
You'd be able to do the steps - you're just adding up fractions. You'd be able to verify the answer you got corresponded to the orbit of a given celestial body.
But if you didn't have all the pieces in the middle (calculus mainly) there's no way you'd be able to explain why this particular set of fractions corresponds to the movement of the planets and some other set doesn't.
[1] https://en.wikipedia.org/wiki/Kepler%27s_equation scroll down a bit
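To make the "just adding up fractions" point concrete, here's a small sketch. Newton's method solves Kepler's equation M = E − e·sin E numerically, while the other function just sums the first few Lagrange-inversion series terms (the ones on the Wikipedia page). The function names are mine; for small eccentricity the two agree closely, and nothing about the series terms themselves tells you *why*.

```python
import math

def kepler_E_newton(M, e, tol=1e-12):
    """Solve Kepler's equation M = E - e*sin(E) for E by Newton's method."""
    E = M  # reasonable starting guess for small eccentricity
    for _ in range(50):
        dE = (E - e * math.sin(E) - M) / (1.0 - e * math.cos(E))
        E -= dE
        if abs(dE) < tol:
            break
    return E

def kepler_E_series(M, e):
    """First few Lagrange-inversion terms of the inverse Kepler equation.

    Just a sum of fractions of trig terms -- mechanically easy to evaluate,
    opaque as to why it tracks a planet's position (error is O(e**4)).
    """
    return (M
            + e * math.sin(M)
            + (e**2 / 2.0) * math.sin(2.0 * M)
            + (e**3 / 8.0) * (3.0 * math.sin(3.0 * M) - math.sin(M)))
```

Evaluating both at, say, M = 1.0 and e = 0.1 shows the blind sum lands within about 1e-4 of the true solution, which is the analogy: you can verify the answer without the calculus that explains it.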
https://www.google.com/search?q=%22python+app+calculate+road...
If you leave off the quotes (which were present in the comment I responded to) then of course you will get millions of irrelevant hits. Somewhere in that chaff there is some Python code that alleges to have something to do with road trips, though it's not always clear what. If I give the same prompt to ChatGPT I get a nicely formatted box with a program that uses the Google Maps Distance Matrix API to calculate distance and duration, without a bunch of junk to wade through. (I haven't tried it so it could be a complete hallucination.)
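For what it's worth, the kind of program it produced would look roughly like this sketch. The endpoint and the rows/elements response shape are what the Distance Matrix API documents; the helper names and the sample JSON are mine for illustration, an API key is required, and no real request is made here.

```python
import urllib.parse

# Documented Distance Matrix endpoint (requires a valid API key to actually call).
BASE = "https://maps.googleapis.com/maps/api/distancematrix/json"

def build_request(origin, destination, api_key):
    """Build the request URL for one origin/destination pair."""
    params = {"origins": origin, "destinations": destination, "key": api_key}
    return BASE + "?" + urllib.parse.urlencode(params)

def extract_leg(response):
    """Pull the human-readable distance and duration out of a response dict."""
    element = response["rows"][0]["elements"][0]
    return element["distance"]["text"], element["duration"]["text"]
```

You'd fetch `build_request(...)` with any HTTP client, parse the JSON, and hand the dict to `extract_leg` -- which is about the size of program a search result buries under the chaff.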
Actually, recent research does suggest this part is encoded in LLMs at an abstract level, in a linear representation...