zlacker

[parent] [thread] 5 comments
1. attemp+(OP)[view] [source] 2025-06-02 22:05:11
>I feel the opposite, and pretty much every metric we have shows basically linear improvement of these models over time.

Wait, what kind of metrics are you talking about? When I did my master's in 2023, SOTA models were trying to push the boundaries by minuscule amounts, and sometimes blatantly changing the way they measured "success" to beat the previous SOTA.

replies(1): >>mounta+Wb
2. mounta+Wb[view] [source] 2025-06-02 23:18:51
>>attemp+(OP)
Almost every single major benchmark. And yes, progress is incremental, but it adds up; this has always been the case.
replies(1): >>attemp+PS
3. attemp+PS[view] [source] [discussion] 2025-06-03 06:35:10
>>mounta+Wb
We were talking about linear improvements, and I have yet to see them.
replies(1): >>mounta+Md2
4. mounta+Md2[view] [source] [discussion] 2025-06-03 17:04:48
>>attemp+PS
check the benchmarks or make one of your own
replies(1): >>attemp+mL2
5. attemp+mL2[view] [source] [discussion] 2025-06-03 20:20:04
>>mounta+Md2
I checked the BLEU scores and perplexity of popular models, and both have stagnated since around 2021. As a disclaimer, this was a cursory check and I didn't dive into the details of how individual scores were evaluated.
replies(1): >>mounta+NV4
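(For anyone unfamiliar with the second metric above: perplexity is just the exponentiated average negative log-likelihood per token, so lower is better. A minimal stdlib sketch, with hypothetical per-token probabilities standing in for real model outputs:)

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean log-probability per token."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Hypothetical per-token probabilities for two models on the same text
# (made up for illustration, not measured values):
older = [math.log(p) for p in [0.10, 0.12, 0.08, 0.11]]
newer = [math.log(p) for p in [0.25, 0.30, 0.22, 0.28]]

print(perplexity(older))  # higher value: worse fit to the text
print(perplexity(newer))  # lower value: better fit to the text
```

Comparing perplexities across models is only apples-to-apples when they share a tokenizer and evaluation corpus, which is one reason such cursory checks can mislead.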
6. mounta+NV4[view] [source] [discussion] 2025-06-04 16:37:03
>>attemp+mL2
On what benchmarks? Pretty much every major one shows linear improvement.