zlacker

1. storus (OP) 2026-02-03 21:33:32
That's why you can run the latest open coding models locally; some reportedly match the performance of Sonnet 4.5, so they are almost SOTA. You can then use tricks like the one I mentioned above and directly manipulate GPU RAM to clean up the context when needed, which is not possible with cloud models unless the provider enables it.
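To make the context-cleanup idea concrete, here is a rough sketch of what "removing part of the context" could look like at the level of a key/value cache. Everything in it is an assumption for illustration: `ToyKVCache`, `append`, and `evict_span` are hypothetical names, the cache lives in plain Python lists rather than GPU memory, and it stores labels instead of attention tensors. A real inference engine would also have to deal with positional information for the tokens that remain.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKVCache:
    """Toy stand-in for a per-layer key/value cache (hypothetical, CPU only)."""
    num_layers: int
    # keys[layer][pos] and values[layer][pos] hold one cached token each.
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def __post_init__(self):
        self.keys = [[] for _ in range(self.num_layers)]
        self.values = [[] for _ in range(self.num_layers)]

    def append(self, token_id):
        # A real model would store attention tensors; here we store labels.
        for layer in range(self.num_layers):
            self.keys[layer].append(("k", layer, token_id))
            self.values[layer].append(("v", layer, token_id))

    def __len__(self):
        return len(self.keys[0]) if self.keys else 0

    def evict_span(self, start, end):
        """Drop cached positions [start, end) from every layer."""
        if not 0 <= start <= end <= len(self):
            raise ValueError("span out of range")
        for layer in range(self.num_layers):
            del self.keys[layer][start:end]
            del self.values[layer][start:end]


if __name__ == "__main__":
    cache = ToyKVCache(num_layers=2)
    for tok in range(10):
        cache.append(tok)
    cache.evict_span(3, 7)  # e.g. forget a stale chunk of the conversation
    print(len(cache), [k[2] for k in cache.keys[0]])
```

The point is only that with a local model the cache is yours to edit, so a stale span can be dropped without re-running the whole prompt; with a hosted model you only get whatever cache controls the provider chooses to expose.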