zlacker

1. thepti+(OP)[view] [source] 2025-05-24 16:53:07
Thanks! So, how does this impact Deliberative Alignment [1], where, IIUC, the intermediate tokens are assessed (e.g., for referencing the appropriate policy fragment)?

Do you see your result as calling that paradigm into question, or does the explicit reasoning assessment perhaps ameliorate the issue?

[1]: https://arxiv.org/html/2412.16339v2
