zlacker

1. moyix (OP) 2023-06-10 18:22:10
The lowest temperature isn't always the one that results in working code! This was shown in the original Codex paper:

> When evaluating pass@k, it is important to optimize sampling temperature for the particular value of k. In Figure 5, we plot pass@k against the number of samples k and the sampling temperature. We find that higher temperatures are optimal for larger k, because the resulting set of samples has higher diversity, and the metric rewards only whether the model generates any correct solution.

> In particular, for a 679M parameter model, the optimal temperature for pass@1 is T∗ = 0.2 and the optimal temperature for pass@100 is T∗ = 0.8. With these temperatures, we find that pass@1 and pass@100 scale smoothly as a function of model size (Figure 6).
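
For anyone unfamiliar with the metric: pass@k is usually estimated by drawing n >= k samples per problem, counting the c samples that pass the tests, and computing 1 - C(n-c, k) / C(n, k). Here's a minimal sketch of that estimator. The function name `pass_at_k` is my own; n, c and k are, as far as I recall, the paper's notation.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples drawn, c: samples that passed the tests,
    k: attempt budget being scored.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, i.e. every
    # size-k subset contains at least one passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For k=1 this reduces to c/n, the plain fraction of passing samples. For larger k a single passing sample anywhere in the subset counts as a success, which is why more diverse (higher temperature) sampling can score better.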

So even for pass@1 (the likelihood of getting the right answer in a single attempt) the best temperature isn't T=0, which means the output will vary slightly from run to run.
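
To make the temperature point concrete, here's a toy sketch of temperature sampling over a handful of logits. This is not how any particular model or API implements it; `sample_token` and the example logits are made up for illustration, and treating T=0 as a plain argmax is a common convention that I'm assuming here.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick an index from `logits` using softmax sampling at `temperature`."""
    if temperature == 0:
        # Assumed convention: T=0 means greedy decoding (always the argmax).
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
logits = [2.0, 1.9, 0.1]  # two near-tied candidates and one unlikely one
greedy = {sample_token(logits, 0, rng) for _ in range(200)}
warm = {sample_token(logits, 0.2, rng) for _ in range(200)}
```

With these logits `greedy` only ever contains index 0, while `warm` also picks up the near-tied index 1 some of the time. That is the run-to-run variation you get at T=0.2.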
