zlacker

[return to "My AI skeptic friends are all nuts"]
1. ofjcih+21[view] [source] 2025-06-02 21:18:27
>>tablet+(OP)
I feel like we get one of these articles addressing valid AI criticisms with poor arguments every week, and at this point I’m ready to write a boilerplate response because I already know what they’re going to say.

Interns don’t cost 20 bucks a month, but training users in the specifics of your org is important.

Knowing what is important or pointless comes with understanding the skill set.

◧◩
2. mounta+S3[view] [source] 2025-06-02 21:33:43
>>ofjcih+21
I feel the opposite, and pretty much every metric we have shows basically linear improvement of these models over time.

The criticisms I hear are almost always gotchas, and when confronted with the benchmarks they either don’t actually know how they are built or don’t want to contribute to them. They just want to complain or seem like a contrarian from what I can tell.

Are LLMs perfect? Absolutely not. Do we have metrics to tell us how good they are? Yes.

I’ve found very few critics who actually understand ML on a deep level. For instance, Gary Marcus didn’t know what a test/train split was. Unfortunately, rage bait like this makes money.
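(For anyone unfamiliar: a test/train split just holds out part of the data so the model is evaluated on examples it never saw during training. In practice you'd use something like sklearn's `model_selection.train_test_split`; this pure-Python sketch with synthetic data is just to show the idea.)

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle the data and hold out a fraction for evaluation only.

    The held-out test set must never be used for training, otherwise
    the evaluation just measures memorization.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(100)))
print(len(train), len(test))  # 80 20
```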

◧◩◪
3. attemp+g9[view] [source] 2025-06-02 22:05:11
>>mounta+S3
>I feel the opposite, and pretty much every metric we have shows basically linear improvement of these models over time.

Wait, what kind of metric are you talking about? When I did my master’s in 2023, SOTA models were trying to push the boundaries by minuscule amounts, and sometimes blatantly changing the way they measured "success" to beat the previous SOTA.

◧◩◪◨
4. mounta+cl[view] [source] 2025-06-02 23:18:51
>>attemp+g9
Almost every single major benchmark. Yes, progress is incremental, but it adds up; this has always been the case.
◧◩◪◨⬒
5. attemp+521[view] [source] 2025-06-03 06:35:10
>>mounta+cl
We were talking about linear improvement, and I have yet to see it.
◧◩◪◨⬒⬓
6. mounta+2n2[view] [source] 2025-06-03 17:04:48
>>attemp+521
check the benchmarks or make one of your own
◧◩◪◨⬒⬓⬔
7. attemp+CU2[view] [source] 2025-06-03 20:20:04
>>mounta+2n2
I checked the BLEU score and perplexity of popular models, and both have stagnated since around 2021. As a disclaimer, this was a cursory check and I didn't dive into the details of how individual scores were evaluated.
◧◩◪◨⬒⬓⬔⧯
8. mounta+355[view] [source] 2025-06-04 16:37:03
>>attemp+CU2
on what benchmarks? pretty much every major one is linear improvement