zlacker

That's exactly what it did (author here).

replies(1): >>majorm+I

>>martin+(OP)
I'm having trouble reconciling "30 sheet mind numbingly complicated Excel financial model" and "Two or three prompts got it there, using plan mode to figure out the structure of the Excel sheet, then prompting to implement it. It even added unit tests to the Python model itself, which I was impressed with!"

"1 or 2 plan mode prompts" to fully describe a 30-sheet complicated doc suggests a massively higher level of granularity than Opus initial plans on existing codebases give me or a less-than-expected level of Excel craziness.

And the tooling harnesses have been telling the models to add testing to things they make for months now, so why's that impressive or suprising?

replies(1): >>martin+j1

>>majorm+I
No it didn't make a giant plan of every detail. It made a plan of the core concepts and then when it was in implementation mode it kept checking the excel file to get more info. It took around ~30 mins in implementation mode to build it.

I was impressed because the prompt didn't ask it to do that. It doesn't normally add tests for me without asking, YMMV.

replies(1): >>majorm+J1

>>martin+j1
Ah, I see.

Did it build a test suite for the Excel side? A fuzzer or such?

It's the cross-concern interactions that still get me.

80% of what I think about these days when writing software is how to test more exhaustively without build times being absolute shit (and not necessarily actually being exhaustive anyway).