Even on my usual toy coding problems it would get simple things wrong and need some poking to get to a working answer.
A few times it got stuck in thinking loops and I had to cancel prompts.
This was using the recommended settings from the unsloth repository. It's always possible that early implementations have bugs that will get fixed later, but so far I don't see any reason to believe this is actually a Sonnet 4.5-level model.
3.7 was not all that great. 4 was decent for specific things, especially self-contained stuff like tests, but couldn't do a good job with more complex work. 4.5 is now excellent at many things.
If it's around the perf of 3.7, that's interesting but not amazing. If it's around 4, that's useful.