zlacker

why is it unreasonable that giving the llm a spot to think and collate long range attention and summarize without the pressure of building a meaningful next token so quickly would result in higher effectiveness?

replies(1): >>x_flyn+r01

>>throwa+(OP)
It's more about the lack of semantic meaning in the intermediate tokens, not that they aren't effective (even when the intermediates are wrong)