zlacker

1. throwa+(OP) 2025-05-23 17:56:08
Why is it unreasonable to expect that giving the LLM a place to think, collate long-range attention, and summarize, without the pressure of producing a meaningful next token right away, would result in higher effectiveness?
replies(1): >>x_flyn+r01
2. x_flyn+r01 2025-05-24 04:42:34
>>throwa+(OP)
It's more about the lack of semantic meaning in the intermediate tokens, not that they aren't effective (they help even when the intermediates are wrong).