Just implies they tuned it for user experience.
I was expecting there to be some discovery around them deliberately fine-tuning their model to output modifications if and only if the code had a certain license.
There are also sampling schemes, top_p and top_k, which can each individually help choose tokens that are slightly less probable (but still highly probable) yet more correct; they are often used together for the best effect.
And then there are various decoding methods like beam search, where choosing the best beam may not mean choosing the most probable individual token at each step.
By default a simple greedy search is used, which always chooses the highest-probability next token.
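To illustrate, here is a rough sketch of beam search over a made-up toy next-token distribution (all the probabilities are invented for illustration); with a beam width of 2, the best two-token sequence starts with the second most probable first token, so greedy decoding would have missed it:

    # Rough sketch of beam search over a hypothetical toy model, just to show that
    # the best *sequence* need not start with the single most probable token.
    import numpy as np

    start = np.array([0.50, 0.40, 0.10])    # made-up P(first token)
    P = np.array([                           # made-up P(next token | previous token)
        [0.10, 0.45, 0.45],                  # token 0 is tempting but leads nowhere good
        [0.05, 0.05, 0.90],                  # token 1 is less likely but pays off
        [0.34, 0.33, 0.33],
    ])

    def beam_search(steps=2, beam_width=2):
        beams = [((), 1.0)]                  # (sequence, probability)
        for _ in range(steps):
            candidates = []
            for seq, prob in beams:
                dist = start if not seq else P[seq[-1]]
                for tok, p in enumerate(dist):
                    candidates.append((seq + (tok,), prob * p))
            # keep only the beam_width most probable partial sequences
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        return beams

    # Greedy would pick token 0 first (p=0.5), but the best 2-token beam is (1, 2)
    # with probability 0.4 * 0.9 = 0.36, versus at most 0.5 * 0.45 = 0.225 after token 0.
    print(beam_search())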
Specifically for GPT models, the temperature parameter is used to get outputs which are a bit more "creative" and less deterministic. https://help.promptitude.io/en/ai-providers/gpt-temperature
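As a minimal sketch in plain NumPy (not any particular provider's API), this is roughly how greedy decoding, temperature, top_k and top_p interact when picking the next token from a vector of logits:

    # Minimal sketch of greedy decoding vs. temperature / top_k / top_p sampling.
    import numpy as np

    def sample_next(logits, temperature=1.0, top_k=None, top_p=None, seed=None):
        rng = np.random.default_rng(seed)
        logits = np.asarray(logits, dtype=float)
        if temperature == 0:                       # greedy: always take the argmax
            return int(np.argmax(logits))
        scaled = logits / temperature
        probs = np.exp(scaled - np.max(scaled))    # softmax over temperature-scaled logits
        probs /= probs.sum()
        if top_k is not None:                      # keep only the k most probable tokens
            kth = np.sort(probs)[-top_k]
            probs = np.where(probs >= kth, probs, 0.0)
        if top_p is not None:                      # keep the smallest set with mass >= top_p
            order = np.argsort(probs)[::-1]
            cut = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
            keep = np.zeros_like(probs)
            keep[order[:cut]] = probs[order[:cut]]
            probs = keep
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    logits = [2.0, 0.5, 3.0, 1.5]
    print(sample_next(logits, temperature=0))               # always token 2
    print(sample_next(logits, temperature=0.8, top_p=0.9))  # usually 2, sometimes 0 or 3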
> When evaluating pass@k, it is important to optimize sampling temperature for the particular value of k. In Figure 5, we plot pass@k against the number of samples k and the sampling temperature. We find that higher temperatures are optimal for larger k, because the resulting set of samples has higher diversity, and the metric rewards only whether the model generates any correct solution.
> In particular, for a 679M parameter model, the optimal temperature for pass@1 is T∗ = 0.2 and the optimal temperature for pass@100 is T∗ = 0.8. With these temperatures, we find that pass@1 and pass@100 scale smoothly as a function of model size (Figure 6).
So even for pass@1 (the likelihood of getting the right answer in a single attempt) you don't use T=0, so there will be slight variations in the output each time.
But here we are talking about autocompleting code. I don't think programmers want the autocompleter to be creative. They want the exact same solution everyone uses, hopefully the right one, with only minor changes so that it matches their style and uses their own variable names. In my case, I am the programmer and I decide what to do; I just want my autocompleter to save me some keystrokes and some copy-pasting of boilerplate from the web, and the more it looks like existing code the better. I have enough work fixing my own bugs, thank you.
Speaking of bugs, how come everyone talks about code generation, which, I think, doesn't bring that much value? Sure, it saves a few keystrokes and some copy-pasting from StackOverflow, but I don't feel like it is what programmers spend most of their time doing. Dealing with bugs is. By bugs, I mean both the big ones that have tickets and can take days to analyze and fix, and the ones that are just a normal part of writing code, like simple typos that result in compiler errors. I think that machine learning could be of great help here.
Just a system that tells me "hey, look here, this is not what I expected to see" would be of great help. Unexpected doesn't mean there is a bug, but it is something worth paying attention to. I know it has been done, but few people seem to talk about it. Or maybe a classifier trained on bug-fix commits: if a piece of code looks like code that has been changed in a bug-fix commit, there is a good chance it is also a bug. Have it integrated into the IDE and highlight the suspicious part as I type, just as modern IDEs highlight compilation errors in real time.
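As a rough sketch of that classifier idea (everything below, including the tiny training set, is invented purely for illustration), something along these lines with scikit-learn could be a starting point:

    # Hypothetical sketch of a "trained on bug-fix commits" classifier.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hunks labeled 1 if they were later modified by a bug-fix commit, 0 otherwise.
    # A real dataset would be mined from version-control history.
    hunks = [
        "if (i <= len)",                  # off-by-one, later fixed
        "free(ptr); use(ptr);",           # use-after-free, later fixed
        "return a + b;",
        "for (int i = 0; i < n; i++)",
    ]
    labels = [1, 1, 0, 0]

    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(),
    )
    model.fit(hunks, labels)

    # An IDE plugin would score each hunk as you type and highlight anything above
    # a threshold, the same way compilation errors are highlighted in real time.
    for line in ["if (j <= count)", "return x * y;"]:
        print(f"{model.predict_proba([line])[0][1]:.2f}  {line}")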