zlacker

1. averev (OP) 2023-11-20 21:38:17
Reading tokens is faster than generating them: a model can process an existing sequence in one parallel pass, but has to generate new tokens one at a time. So having a small model produce the whole output and then having Goliath produce only a single-token "good/bad" evaluation would be faster than having Goliath generate everything. This would be the extreme, ad hoc, iterative version of speculative decoding. Speculative decoding itself is already a thing and would probably give the best compromise.
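A rough sketch of the draft-then-judge loop described above, to make the control flow concrete. Everything here is a stand-in: `draft_then_judge`, `toy_draft`, `toy_judge`, and `toy_fallback` are hypothetical names, and the "models" are plain Python functions rather than real LLM calls. The retry limit and the fallback to the big model generating the answer itself are assumptions, not something the comment specifies.

```python
from typing import Callable, Tuple


def draft_then_judge(
    prompt: str,
    draft: Callable[[str, int], str],   # small model: writes a full answer
    judge: Callable[[str, str], str],   # big model: returns "good" or "bad"
    fallback: Callable[[str], str],     # big model: writes the answer itself
    max_tries: int = 3,                 # assumed retry budget
) -> Tuple[str, str]:
    """Return (answer, source), where source says who produced the answer."""
    for attempt in range(max_tries):
        candidate = draft(prompt, attempt)
        # The big model only reads the candidate and emits a single verdict.
        if judge(prompt, candidate) == "good":
            return candidate, f"draft#{attempt}"
    # Every draft was rejected, so pay the full cost of big-model generation.
    return fallback(prompt), "fallback"


# Toy stand-ins so the sketch runs without any real model.
def toy_draft(prompt: str, attempt: int) -> str:
    answers = ["4 + 4 = 9", "4 + 4 = 8"]  # first draft is wrong on purpose
    return answers[min(attempt, len(answers) - 1)]


def toy_judge(prompt: str, candidate: str) -> str:
    return "good" if candidate.endswith("8") else "bad"


def toy_fallback(prompt: str) -> str:
    return "4 + 4 = 8"


result = draft_then_judge("What is 4 + 4?", toy_draft, toy_judge, toy_fallback)
print(result)
```

In this toy run the first draft is rejected and the second is accepted, so the big model never generates the answer. Real speculative decoding differs in that the large model verifies the draft token by token and keeps the accepted prefix, instead of accepting or rejecting the whole output.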