Just implies they tuned it for user experience.
I was expecting there to be some discovery around them deliberately fine-tuning their model to output modifications if and only if the code had a certain license.
There are also sampling schemes, top_p and top_k, which can each individually help choose tokens that are slightly less probable (but still highly probable) yet more correct; they are often used together for the best effect.
And then there are various decoding methods like beam search, where choosing the best beam may not mean choosing the most probable individual token at each step.
By default a simple greedy search is used, which always chooses the highest-probability next token.
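To illustrate, here is a rough sketch of beam search over a made-up toy next-token distribution (all the probabilities are invented for illustration); with a beam width of 2, the best two-token sequence starts with the second most probable first token, so greedy decoding would have missed it:

    # Rough sketch of beam search over a hypothetical toy model, just to show that
    # the best *sequence* need not start with the single most probable token.
    import numpy as np

    start = np.array([0.50, 0.40, 0.10])    # made-up P(first token)
    P = np.array([                           # made-up P(next token | previous token)
        [0.10, 0.45, 0.45],                  # token 0 is tempting but leads nowhere good
        [0.05, 0.05, 0.90],                  # token 1 is less likely but pays off
        [0.34, 0.33, 0.33],
    ])

    def beam_search(steps=2, beam_width=2):
        beams = [((), 1.0)]                  # (sequence, probability)
        for _ in range(steps):
            candidates = []
            for seq, prob in beams:
                dist = start if not seq else P[seq[-1]]
                for tok, p in enumerate(dist):
                    candidates.append((seq + (tok,), prob * p))
            # keep only the beam_width most probable partial sequences
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        return beams

    # Greedy would pick token 0 first (p=0.5), but the best 2-token beam is (1, 2)
    # with probability 0.4 * 0.9 = 0.36, versus at most 0.5 * 0.45 = 0.225 after token 0.
    print(beam_search())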
Specifically for GPT models, the temperature parameter is used to get outputs which are a bit more "creative" and less deterministic. https://help.promptitude.io/en/ai-providers/gpt-temperature
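As a minimal sketch in plain NumPy (not any particular provider's API), this is roughly how greedy decoding, temperature, top_k and top_p interact when picking the next token from a vector of logits:

    # Minimal sketch of greedy decoding vs. temperature / top_k / top_p sampling.
    import numpy as np

    def sample_next(logits, temperature=1.0, top_k=None, top_p=None, seed=None):
        rng = np.random.default_rng(seed)
        logits = np.asarray(logits, dtype=float)
        if temperature == 0:                       # greedy: always take the argmax
            return int(np.argmax(logits))
        scaled = logits / temperature
        probs = np.exp(scaled - np.max(scaled))    # softmax over temperature-scaled logits
        probs /= probs.sum()
        if top_k is not None:                      # keep only the k most probable tokens
            kth = np.sort(probs)[-top_k]
            probs = np.where(probs >= kth, probs, 0.0)
        if top_p is not None:                      # keep the smallest set with mass >= top_p
            order = np.argsort(probs)[::-1]
            cut = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
            keep = np.zeros_like(probs)
            keep[order[:cut]] = probs[order[:cut]]
            probs = keep
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    logits = [2.0, 0.5, 3.0, 1.5]
    print(sample_next(logits, temperature=0))               # always token 2
    print(sample_next(logits, temperature=0.8, top_p=0.9))  # usually 2, sometimes 0 or 3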
> When evaluating pass@k, it is important to optimize sampling temperature for the particular value of k. In Figure 5, we plot pass@k against the number of samples k and the sampling temperature. We find that higher temperatures are optimal for larger k, because the resulting set of samples has higher diversity, and the metric rewards only whether the model generates any correct solution.
> In particular, for a 679M parameter model, the optimal temperature for pass@1 is T∗ = 0.2 and the optimal temperature for pass@100 is T∗ = 0.8. With these temperatures, we find that pass@1 and pass@100 scale smoothly as a function of model size (Figure 6).
So even for pass@1 (the likelihood of getting the right answer in a single attempt) you don't use T=0, so there will be slight variations in the output each time.
But here we are talking about autocompleting code. I don't think programmers want the autocompleter to be creative. They want the exact same solution everyone uses, hopefully the right one, with only minor changes so that it matches their style and uses their own variable names. In my case, I am the programmer and I decide what to do; I just want my autocompleter to save me some keystrokes and some copy-pasting of boilerplate from the web, and the more it looks like existing code the better. I have enough work fixing my own bugs, thank you.
Speaking of bugs, how come everyone talks about code generation, which, I think, doesn't bring that much value? Sure, it saves a few keystrokes and some copy-pasting from StackOverflow, but I don't feel like it is what programmers spend most of their time doing. Dealing with bugs is. By bugs, I mean both the big ones that have tickets and can take days to analyze and fix, and the ones that are just a normal part of writing code, like simple typos that result in compiler errors. I think that machine learning could be of great help here.
Just a system that tells me "hey, look here, this is not what I expected to see" would be of great help. Unexpected doesn't mean there is a bug, but it is something worth paying attention to. I know it has been done, but few people seem to talk about it. Or maybe a classifier trained on bug-fix commits: if a piece of code looks like code that has been changed in a bug-fix commit, there is a good chance it is also a bug. Have it integrated into the IDE and highlight the suspicious part as I type, just as modern IDEs highlight compilation errors in real time.
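As a rough sketch of that classifier idea (everything below, including the tiny training set, is invented purely for illustration), something along these lines with scikit-learn could be a starting point:

    # Hypothetical sketch of a "trained on bug-fix commits" classifier.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hunks labeled 1 if they were later modified by a bug-fix commit, 0 otherwise.
    # A real dataset would be mined from version-control history.
    hunks = [
        "if (i <= len)",                  # off-by-one, later fixed
        "free(ptr); use(ptr);",           # use-after-free, later fixed
        "return a + b;",
        "for (int i = 0; i < n; i++)",
    ]
    labels = [1, 1, 0, 0]

    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(),
    )
    model.fit(hunks, labels)

    # An IDE plugin would score each hunk as you type and highlight anything above
    # a threshold, the same way compilation errors are highlighted in real time.
    for line in ["if (j <= count)", "return x * y;"]:
        print(f"{model.predict_proba([line])[0][1]:.2f}  {line}")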