I noticed that they automatically create at least three other draft responses.
I assume that this is a technique that allows them to try multiple times and then select the best one.
Just mentioning it because it seems like another example of not strictly "zero-shot"ing a response, which seems important for getting good results with these models.
I'm guessing they use batching for this. I wonder if it might become more common to run multiple inference subtasks for the same main task inside a batch, for purposes of self-correcting agent swarms or something: the outputs from step 1 are reviewed by the group in step 2, then they try again in step 3.
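To make the drafting part concrete, here's a rough sketch of best-of-n generation with a single batched generate() call, using Hugging Face transformers. The model name, sampling settings, and especially the "pick the longest draft" selection step are placeholders I've chosen for illustration; whatever Bard actually does for drafting and ranking isn't public.

```python
# Minimal best-of-n sketch: one batched generate() call produces several candidate
# drafts, then a stand-in scoring step picks one. The selection heuristic here is
# a placeholder; a real system would presumably use a reward model or other ranker.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # any causal LM; stand-in for whatever model the product uses
N_DRAFTS = 3

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def draft_and_select(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        # num_return_sequences generates all drafts in a single batched forward pass
        outputs = model.generate(
            **inputs,
            do_sample=True,
            temperature=0.9,
            max_new_tokens=128,
            num_return_sequences=N_DRAFTS,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Strip the prompt tokens, keep only the newly generated continuations
    drafts = tokenizer.batch_decode(
        outputs[:, inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    # Placeholder "best" criterion: longest draft. Swap in a reward model here.
    return max(drafts, key=len)

print(draft_and_select("Explain why the sky is blue in one paragraph."))
```

The appeal of doing it this way is that the extra drafts ride along in the same batch, so the marginal cost is mostly idle batch capacity rather than extra round trips.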
I guess that only applies to a small deployment where there is frequently just one person using it at a time, so there's spare batch capacity to fill.
It forces you to remind yourself of the stochastic nature of the model and of RLHF; maybe the data from draft selection even helps to improve the latter.
I liked this trait of Bard from the start and hope they keep it.
It provides a sense of agency and reminds you not to anthropomorphize the transformer chatbot too much.