LLMs cannot find reasoning errors, but can correct them

>>koie+(OP)
I was just testing Bard with some very simple coding exercises and it did well.

I noticed that they automatically create at least three other draft responses.

I assume that this is a technique that allows them to try multiple times and then select the best one.

Just mentioning it because it seems like another example of not strictly "zero-shot"ing a response. Which seems important for getting good results with these models.

I'm guessing they use batching for this. I wonder if it might become more common to run multiple inference subtasks for the same main task inside of a batch, for purposes of self-correcting agent swarms or something. The outputs from step one are reviewed by the group in step 2, then they try again in step 3.

I guess that only applies for a small department where there is frequently just one person using it at a time.

>>ilaksh+Hm
I don't like this. It forces me to read 2 prompts instead of 1 so that I can help train their LLM. ChatGPT and Bard already have regenerate buttons if I don't like their response, it doesn't need to be that in my face.

>>erhaet+VA
How else do you expect the LLM you use to become better? I'm more than happy to provide feedback. Unless you want it to only scrape data, I can't imagine why you'd be opposed to improving a product you use especially when that's really the only way to do it. If you don't care, just pick one and don't think about it, they're usually extremely similar anyway. I'm not sure I've come across an option where one was acceptable and one wasn't. They are literally giving you options that that don't need to give and you're complaining.

zlacker