zlacker

[return to "GitHub accused of varying Copilot output to avoid copyright allegations"]
1. Shamel+ba[view] [source] 2023-06-10 14:56:50
>>belter+(OP)
Eh, their argument is simply that they tuned temperature settings to encourage the model to output slight variations on memorized data. But that's just one of many standard things you do with a language model, and it certainly doesn't imply intent to avoid copyright allegations.

Just implies they tuned it for user experience.

I was expecting there to be some discovery around them deliberately fine-tuning their model to output modifications if and only if the code carried a certain license.

2. Brian_+Ud[view] [source] 2023-06-10 15:20:46
>>Shamel+ba
Why else bother with such an input? Are randomizations more likely to be correct or more useful?
3. ianbut+Ne[view] [source] 2023-06-10 15:26:09
>>Brian_+Ud
Potentially more correct, yes. It frees the model to choose lower-probability tokens to some degree (technically, raising the temperature rescales the distribution so those tokens' probabilities are boosted), which can be more correct depending on the task.
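
Rough sketch of what temperature does at the sampling step (toy logits and vocabulary, numpy only; nothing here is Copilot's actual code):

    import numpy as np

    def sample_with_temperature(logits, temperature=1.0, rng=np.random.default_rng()):
        # Dividing the logits by a temperature > 1 flattens the softmax,
        # boosting the relative probability of lower-scoring tokens;
        # temperature < 1 sharpens it back toward greedy behaviour.
        scaled = np.asarray(logits, dtype=float) / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    vocab = ["return", "yield", "raise"]
    logits = [4.0, 2.0, 1.0]
    print(vocab[sample_with_temperature(logits, temperature=1.5)])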

There are also sampling schemes, top_p and top_k, each of which can individually help choose tokens that are less probable (but still among the most probable) yet more correct, and they are often used together for best effect.
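
A toy version of that filtering, assuming you already have a probability vector from the model (the numbers are invented):

    import numpy as np

    def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
        probs = np.asarray(probs, dtype=float).copy()
        if top_k > 0:
            # top_k: zero out everything but the k highest-probability tokens.
            cutoff = np.sort(probs)[-top_k]
            probs[probs < cutoff] = 0.0
        if top_p < 1.0:
            # top_p (nucleus): keep the smallest set of tokens whose cumulative
            # probability reaches top_p, and zero out the long tail.
            order = np.argsort(probs)[::-1]
            cumulative = np.cumsum(probs[order])
            keep = order[: np.searchsorted(cumulative, top_p) + 1]
            kept = np.zeros_like(probs)
            kept[keep] = probs[keep]
            probs = kept
        return probs / probs.sum()

    probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
    filtered = top_k_top_p_filter(probs, top_k=4, top_p=0.9)
    token_id = np.random.default_rng().choice(len(filtered), p=filtered)

The two cuts trim the candidate pool in different ways (fixed count vs. cumulative mass), which is why they combine well.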

And then there are various decoding methods like beam search, where the best overall beam may not be built from the single most probable token at each step.
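
Minimal illustration of that point with a made-up two-step "model" (the probability table is fabricated just to show the effect):

    import math

    # Stand-in for a real model: given a prefix, return {token: probability}.
    # The numbers are invented so that the greedy first pick is not part of
    # the best overall sequence.
    def next_probs(prefix):
        table = {
            (): {"the": 0.5, "a": 0.4, "<eos>": 0.1},
            ("the",): {"dog": 0.6, "cat": 0.3, "<eos>": 0.1},
            ("a",): {"cat": 0.9, "dog": 0.05, "<eos>": 0.05},
        }
        return table.get(prefix, {"<eos>": 1.0})

    def beam_search(beam_width=2, max_len=3):
        beams = [((), 0.0)]  # (sequence, cumulative log-probability)
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                if seq and seq[-1] == "<eos>":
                    candidates.append((seq, score))
                    continue
                for tok, p in next_probs(seq).items():
                    candidates.append((seq + (tok,), score + math.log(p)))
            # Keep the best sequences overall; the winner here is ("a", "cat")
            # even though "the" was the single most probable first token.
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        return beams[0]

    print(beam_search())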

By default, a simple greedy search is used, which always chooses the highest-probability next token.
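
Greedy decoding in one line, for contrast (same toy logits as above):

    import numpy as np

    def greedy_next(logits):
        # Always take the single highest-probability token; deterministic,
        # so the same prompt reproduces the same continuation every time.
        return int(np.argmax(logits))

    print(greedy_next([4.0, 2.0, 1.0]))  # -> 0, i.e. the top-ranked token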
