> What's the plan?

Call me old school, but I find the "divide and conquer" workflow as helpful when working with LLMs as without them. What counts as a "large scale task" varies by model and implementation, though. Some models/implementations (seemingly Copilot) struggle with even the smallest change, while others breeze through them. Lots of trial and error is needed to find that line for each model/implementation :/

replies(3): >>mjburg+z4 >>noneth+H4 >>safety+0a

>>diggan+(OP)
The relevant scale is the number of hard constraints on the solution code, not the size of the task as measured by "hours it would take the median programmer to write".

So, e.g., one line of code which needs to satisfy dozens of hard constraints on the system (using a specific class, a specific method, a specific device, specific memory management, etc.) will very rarely be output correctly by an LLM.

Likewise "blank-page, vibe coding" can be very fast if "make me X" has only functional/soft constraints on the code itself.

"Gigawatt LLMs" have brute-forced their way to a statistical system capable of usefully, if not universally, adhering to one or two hard constraints. I'd imagine handling the dozen or so common in any existing application is well beyond a terawatt range of training and inference cost.

replies(1): >>cyanyd+N8

>>diggan+(OP)
It's hard for me to think of a small, clearly defined coding problem an LLM can't solve.

replies(2): >>jodrel+n6 >>mrguyo+NY

>>noneth+H4
"Find a counterexample to the Collatz conjecture".
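
To be fair, the search loop itself is small and clearly defined; it's the result that isn't. A minimal sketch, assuming a bounded brute-force check (the function names and the step cap are made up for illustration):

```python
def reaches_one(n, max_steps=10_000):
    """Return True if the Collatz iteration from n hits 1 within max_steps."""
    steps = 0
    while n != 1:
        if steps >= max_steps:
            # Gave up: this does NOT prove n is a counterexample.
            return False
        # Halve even numbers, map odd numbers to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return True


def find_counterexample(limit, max_steps=10_000):
    """Return the first n in 1..limit that fails to reach 1 in time, else None."""
    for n in range(1, limit + 1):
        if not reaches_one(n, max_steps):
            return n
    return None


if __name__ == "__main__":
    print(find_counterexample(1000))
```

Anything this returns is only a candidate that outlasted the step cap, so "solving" the problem as stated is still out of reach no matter who writes the loop.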

>>mjburg+z4
Keep in mind that the model of using LLMs assumes the underlying dataset converges to production-ready code. That's never been proven, because we know they scraped source code without attribution.

>>diggan+(OP)
I mean I guess this isn't very ambitious, but it's a meaningful time saver if I basically just write code in natural language, and then Copilot generates the real code based on that. I don't have to look up syntax details, or what some function somewhere was named, etc. It will perform very accurately this way. It probably makes me 20% more efficient. It doubles my efficiency in a language I'm unfamiliar with.

I can't fire half my dev org tomorrow with that approach, I can't really fire anyone, so I guess it would be a big letdown for a lot of execs. Meanwhile though we just keep incrementally shipping more stuff faster at higher quality so I'm happy...

This works because it treats the LLM like what it actually is: an exceptionally good if slightly random text transformer.

>>noneth+H4
There are several in the linked post, primarily:

"Your code does not compile" and "Your tests fail"

If you have to tell an intern that more than once on a single task, there are going to be conversations.