I'm sure Claude Code will happily one-shot that conversion. It's also virtually guaranteed to have messed up vital parts of the original logic in the process.
"1 or 2 plan mode prompts" to fully describe a complicated 30-sheet doc suggests either a massively higher level of granularity than Opus's initial plans on existing codebases give me, or a less-than-expected level of Excel craziness.
And the tooling harnesses have been telling the models to add testing to things they make for months now, so why's that impressive or surprising?
I was impressed because the prompt didn't ask it to do that. It doesn't normally add tests for me without asking, YMMV.
Anyway, please try it if you find it unbelievable. FWIW, I didn't expect it to work like it did. Opus 4.5 is pretty amazing at long-running tasks like this.
Did it build a test suite for the Excel side? A fuzzer or such?
It's the cross-concern interactions that still get me.
80% of what I think about these days when writing software is how to test more exhaustively without build times being absolute shit (and not necessarily actually being exhaustive anyway).
The largest independent derivatives broker in Australia collapsed after it was discovered the board was using astrology and magicians to gamble with all the clients' money.
https://www.abc.net.au/news/2016-09-16/stockbroker-used-psyc...
Maybe you did one or the other, but “nearly one-shotted” doesn’t tend to mean that.
Claude Code more than occasionally likes to make weird assumptions, and it’s well known that it hallucinates quite a bit more near the context length, and that compaction only partially helps this issue.
I have no idea why it had so much trouble with this generally easy task. Bizarre.
When shit hits the fan and execs need answers yesterday, will they jump to using the LLM to probabilistically make modifications to the system, or will they admit it was a mistake and pull Excel back up to deterministically make modifications the way they know how?
Early in my career, I went knee-deep into Excel macros and worked on C# automation that would create an Excel sheet, run Excel macros on it, and then save it without the macros.
In the entire process, I saw dozens of date-time mistakes in VBA code, but no tests that would catch them...
Sure, maybe that’s just building something that’s bug-for-bug compatible, but it’s something Claude can work with.
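For what it's worth, a bug-for-bug check like that can be sketched as a "golden master" test: record what the spreadsheet actually outputs for a set of inputs, then assert the port reproduces those values. Everything named below is hypothetical (`RECORDED_CASES`, `ported_formula`, the compound-interest stand-in); in practice the cases would be exported from the real workbook, bugs included.

```python
import math

# Hypothetical (inputs, output) pairs recorded from the legacy workbook.
# The real ones would come from the spreadsheet itself, quirks and all.
RECORDED_CASES = [
    ({"principal": 1000.0, "rate": 0.05, "years": 1}, 1050.0),
    ({"principal": 1000.0, "rate": 0.05, "years": 2}, 1102.5),
    ({"principal": 0.0, "rate": 0.05, "years": 10}, 0.0),
]


def ported_formula(principal, rate, years):
    """Stand-in for the ported logic (assumed here: compound interest)."""
    return principal * (1 + rate) ** years


def find_mismatches(cases, func, rel_tol=1e-9):
    """Return every recorded case the port fails to reproduce."""
    mismatches = []
    for inputs, expected in cases:
        actual = func(**inputs)
        # Compare with a tolerance: floating-point results rarely match exactly.
        if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=1e-12):
            mismatches.append((inputs, expected, actual))
    return mismatches


mismatches = find_mismatches(RECORDED_CASES, ported_formula)
reproduced = len(RECORDED_CASES) - len(mismatches)
print(f"{reproduced}/{len(RECORDED_CASES)} recorded cases reproduced")
```

This only proves the port matches the sheet on the recorded inputs, not that either one is correct.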
It's like a CPU that's almost 100% reliable... in that it fails only once every 1 million clock cycles.
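To put rough numbers on that analogy, here is a quick sketch. It assumes failures are independent with probability 1e-6 per cycle, and the 1 GHz clock is my own assumption for illustration, not something from the thread.

```python
P_FAIL = 1e-6    # failure probability per cycle, taken from the analogy
CLOCK_HZ = 1e9   # assumed 1 GHz clock (illustrative only)


def survival_probability(cycles, p_fail=P_FAIL):
    """Probability of zero failures across `cycles` independent cycles."""
    return (1.0 - p_fail) ** cycles


# Expected number of failures in one second of operation.
expected_failures_per_second = P_FAIL * CLOCK_HZ
print(f"expected failures per second: {expected_failures_per_second:.0f}")

for cycles in (1_000, 1_000_000, 10_000_000):
    p = survival_probability(cycles)
    print(f"{cycles:>10} cycles: P(no failure) = {p:.4f}")
```

Under those assumptions the chance of getting through a million cycles cleanly is about 1/e, which is the point: "almost 100% reliable" per step compounds badly over a long run.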
Tell me if I am wrong, but surely Claude cannot even access execution coverage.
This reminded me of something that happened to me last year. Not Claude (I think it was GPT 4.0 maybe?), but I had it running in VS Code's Copilot and asked it to fix a bug then add a test for the case.
Well, it kept failing to pass its own test, so on the third try, it sat there "thinking" for a moment, then finally spit out the command `echo "Test Passed!"`, executed it, read it from the terminal, and said it was done.
I was almost impressed by the gumption more than anything.
1) it wants to run X command
2) it notices a hook preventing it from running X
3) it creates a Python application or shell script that does X and runs it instead
Whoops.
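A toy sketch of why steps 1-3 work against a name-based check. This is not Claude Code's actual hook API; `naive_hook_allows`, the `BLOCKED` set, and both command lines are hypothetical and exist only to show the gap.

```python
import shlex

# Hypothetical deny-list standing in for "X" in the steps above.
BLOCKED = {"git"}


def naive_hook_allows(command_line):
    """Hypothetical hook: reject a command only if its executable is blocked."""
    tokens = shlex.split(command_line)
    return bool(tokens) and tokens[0] not in BLOCKED


direct = "git push --force"
# Hypothetical wrapper script that would run the blocked command internally.
wrapped = "python push_wrapper.py"

print(naive_hook_allows(direct))   # the direct call is rejected
print(naive_hook_allows(wrapped))  # the wrapper passes the same check
```

The check only ever sees the outer command, so anything that does X from inside a script sails through.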
When shit hits the fan, execs need answers yesterday, and the 30-sheet Excel monstrosity is producing the wrong numbers - who fixes it?
It was done by Sue, who left the company 4 years ago; people have been using it since, and nobody really understands it.
I have seen Excel used for financial planning.
I have seen Excel used for managing people's health data.
I have BUILT a test suite for a government official-use communication device - inside Excel. The original was a mish-mash of Excel formulas and VBA. I improved the VBA part of it by adding a web cam to the mix.
I don't sleep well at night knowing how many very very essential things are running on top of Excel sheets passed down like stories around a campfire.
I’ve also heard plenty of horror stories of bus-factor employees leaving (or threatening to leave) behind an Excel monstrosity and companies losing 6 months of sales, so maybe there’s a win for AI somewhere in there.
0: https://github.com/mbcrawfo/vibefun/blob/main/.claude/hooks/...