zlacker

1. moregr+(OP)[view] [source] 2026-02-02 02:11:22
I think the skepticism here is this: without tests or a _lot_ of manual QA, how would you know that it did it correctly?

Maybe you did one or the other, but “nearly one-shotted” doesn’t tend to mean that.

Claude Code more than occasionally makes weird assumptions, and it’s well known that it hallucinates quite a bit more as it nears the context length limit, and that compaction only partially helps with this.

replies(1): >>skybri+Zk
2. skybri+Zk[view] [source] 2026-02-02 06:02:10
>>moregr+(OP)
If you’re porting some formulas from one language to another, “correct” can be defined as “gets the same answers as before.” Assuming you can run both easily, this is easy to write a property test for.

Sure, maybe that’s just building something that’s bug-for-bug compatible, but it’s something Claude can work with.
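
A minimal sketch of that kind of check, using only the standard library. The two formula functions are hypothetical stand-ins: in a real port the reference side would be the original implementation (or outputs recorded from it), and the input ranges and tolerance here are assumptions you would tune. A dedicated property-testing library such as Hypothesis would generate and shrink inputs more thoroughly than this hand-rolled loop.

```python
import math
import random


def reference_formula(principal, rate, periods):
    """Hypothetical stand-in for the original implementation."""
    return principal * (1 + rate) ** periods


def ported_formula(principal, rate, periods):
    """Hypothetical stand-in for the ported implementation."""
    return principal * math.exp(periods * math.log1p(rate))


def check_port(reference, port, trials=1000, seed=0):
    """Run both functions on the same random inputs and collect mismatches."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    mismatches = []
    for _ in range(trials):
        # Assumed input ranges; pick ones that match the real workbook.
        args = (rng.uniform(0, 1e6), rng.uniform(0, 0.5), rng.randint(0, 40))
        expected = reference(*args)
        actual = port(*args)
        # Tolerance is an assumption; exact equality is usually too strict.
        if not math.isclose(expected, actual, rel_tol=1e-9, abs_tol=1e-9):
            mismatches.append((args, expected, actual))
    return mismatches


if __name__ == "__main__":
    bad = check_port(reference_formula, ported_formula)
    print(f"{len(bad)} mismatches")
```

Each mismatch carries the inputs that triggered it, which gives Claude (or a human) a concrete failing case to work from.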

replies(1): >>gregor+yw
3. gregor+yw[view] [source] [discussion] 2026-02-02 08:10:37
>>skybri+Zk
For starters, Python floats are IEEE 754 doubles, and Excel uses IEEE 754 too (with caveats). I wonder if those caveats are being emulated.
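
One of those caveats is that Excel shows at most 15 significant digits, so a value copied out of a sheet may not be bit-for-bit equal to what Python computes. A rough sketch of comparing at that precision follows; `round_sig` is a hypothetical helper, and this only approximates one caveat rather than emulating Excel's arithmetic.

```python
import math


def round_sig(x, digits=15):
    """Round x to the given number of significant digits (hypothetical helper)."""
    if x == 0 or not math.isfinite(x):
        return x
    # Scientific notation with (digits - 1) decimals keeps `digits` significant digits.
    return float(f"{x:.{digits - 1}e}")


def same_at_15_digits(a, b):
    """Compare two floats after rounding both to 15 significant digits."""
    return round_sig(a) == round_sig(b)


if __name__ == "__main__":
    print(0.1 + 0.2 == 0.3)                   # False: raw doubles differ
    print(same_at_15_digits(0.1 + 0.2, 0.3))  # True: equal at 15 digits
```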