zlacker

1. buggle (OP) 2025-01-21 23:49:25
Uh, they invented multi-head latent attention, and since the method for creating o1 was never published, they’re the only documented example of producing a model of comparable quality. They also demonstrated massive gains in the performance of smaller models through distillation of this model and these methods, so no, not really. I know this is the internet, but we should try not to just say things.