A statistical approach to model evaluations

>>RobinH+(OP)
This does feel a bit like an undergrad introduction to statistical analysis, and it's surprising anyone felt the need to explain these things. But I also suspect most AI people out there nowadays have limited math skills, so maybe it’s helpful?

>>fnordp+Are
As an ML researcher who started in physics (this seems common among physics/math people turned ML people, Evan included), I cannot tell you how bad it is... One year at CVPR, when diffusion models hit the scene, I was asking people what their covariance was (I had overestimated the model complexity), and the most common answer I got was "how do I calculate that?" People do not understand things like what "pdf" means. People at top schools! I've been told I'm "gatekeeping" for saying that you should learn math (I say "you don't need math to build good models, but you do to understand why they're wrong"). Not that you need to, but you should. (I guess this explains why Mission Impossible Language Models won best paper...)

I swear, the big reason models are black boxes is that we _want_ them to be. There's a clear sentiment against people doing theory, and the results show it. I remember not too long ago Yi Tay (under @agihippo but main is @YiTayML) said "fuck theorists". I guess it's not a surprise DeepMind recently hired him after that "get good" stuff.

Also, I'd like to point out, the author uses "we" but the paper only has one author on it. So may I suggest adding their cat as a coauthor? [0]

[0] https://en.wikipedia.org/wiki/F._D._C._Willard

>>godels+5Ke
Personal sad story, but hopefully relevant: during my recent PhD I worked on a problem where I used a Dirichlet Process in my solution. That paper has been bouncing around for the past few years, getting rejected from every venue I have submitted it to. My interpretation is that most reviewers (there are exceptions - too few to impact the final voting) don't understand any non-DL theory anymore and are not willing to read up for the sake of a fair review. This is based on their comments, where we have been told that our solution is complex (maybe? - but no one suggests an alternative), that the exposition is not clear (we have rewritten the paper a few times - we rewrite it based on comments from venue i to submit to venue i+1 - it's a wild goose chase), and in one case, someone said the paper is derivative because it uses Blackwell-MacQueen sampling; their evidence? - they skimmed through a paper we had cited that also used the sampling algorithm. This is like saying a paper is derivative because it uses SGD.

I am on the review panel of some conferences too, and it is not uncommon to be assigned a paper outside of my comfort zone. That doesn't mean I cut and bail. You set aside time, read up on the area, ask the authors questions, and judge accordingly. Unfortunately this doesn't happen most of the time - people seem to be in a rush to finish their review no matter the quality. At this point, we just mechanically keep resubmitting the paper every once in a while.

Sorry, end of rant :)

>>abhgh+Rrf
Ah, Dirichlet Processes, such lovely things.

Reading this paper, I was struck by how obvious most of the solutions were given my own background from grad school benchmarking quantum annealers and other classical solvers for spin lattices (mostly thermal sampling inspired approaches). I'd argue one could do an even better job than the analysis in Anthropic's paper, but it's astonishing how basic questions like "well, how sure are we this is real?" seemingly just aren't asked in ML papers.

I developed a passion for Bayesian statistics approaches in grad school, and had a lovely time specifically thinking quite a bit about DPs, Bayesian bootstraps, etc. I'm sorry your paper is bouncing around. I think folks underestimate these days the value of really thinking about what you know and how you know it, and how to really model uncertainty, and definitely underrate non-DL approaches to problems.
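For anyone who hasn't seen one, here's a minimal sketch of a Bayesian bootstrap applied to the "how sure are we this is real?" question for an eval score. The data is made up (70 correct out of 100 questions) and the function names are my own, so treat it as an illustration under those assumptions rather than anything from the paper.

```python
import random

def bayesian_bootstrap_mean(scores, n_draws=2000, seed=0):
    """Return posterior draws of the mean score under a Bayesian bootstrap."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Normalized Exponential(1) draws are Dirichlet(1, ..., 1) weights.
        g = [rng.expovariate(1.0) for _ in scores]
        total = sum(g)
        draws.append(sum(w * s for w, s in zip(g, scores)) / total)
    return draws

def credible_interval(draws, level=0.95):
    """Equal-tailed interval read off the sorted posterior draws."""
    s = sorted(draws)
    lo = s[int((1 - level) / 2 * len(s))]
    hi = s[int((1 + level) / 2 * len(s)) - 1]
    return lo, hi

# Hypothetical eval results: 1 = correct answer, 0 = wrong answer.
scores = [1] * 70 + [0] * 30
draws = bayesian_bootstrap_mean(scores)
lo, hi = credible_interval(draws)
print(f"accuracy = {sum(scores) / len(scores):.2f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```

The point is just that a single accuracy number hides a fairly wide interval at this sample size, and that two models whose intervals overlap heavily may not be meaningfully different.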

>>joshjo+Nch
Thanks, yes, a lot of good ideas in ML seem to be slowly vanishing from the collective awareness. I have nothing against the current spate of methodologies, which are empirically great - and if one needs proof, I am a "happy customer" at my day job, which is mostly DL and a lot of LLMs - but it seems we are buying into a world where it is one versus the other. And it need not be. Great ideas are great ideas irrespective of age, and there is value in preserving them.

Anyway, since this thread surprisingly evoked a mini-discussion on Dirichlet Processes (DP), if someone needs an intro, I have tried to balance math and intuition in a description in my thesis: Section 2.2 in [1].

[1] https://drive.google.com/file/d/1zf_MIWyLY7nxEr5UioUQ7KhOQ1_...

EDIT: I looked at the description and I confess it still has a lot of math (since it is part of a thesis). I will probably translate this to be more friendly and put it on my blog.
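In the meantime, for the code-inclined, the Blackwell-MacQueen urn mentioned upthread fits in a few lines. This is a rough sketch, not the code from the thesis or the paper: the function name, the standard normal base measure, and alpha = 2.0 are all assumptions picked for illustration.

```python
import random

def blackwell_macqueen(n, alpha, base_draw, seed=0):
    """Draw n values whose joint law is that of samples from a DP(alpha, base)."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        # With probability alpha / (alpha + i), draw a fresh value from the base
        # measure; otherwise repeat one of the i earlier values, chosen uniformly.
        if rng.random() < alpha / (alpha + i):
            samples.append(base_draw(rng))
        else:
            samples.append(rng.choice(samples))
    return samples

# Assumed base measure: standard normal. Assumed concentration: alpha = 2.0.
samples = blackwell_macqueen(1000, alpha=2.0, base_draw=lambda rng: rng.gauss(0.0, 1.0))
n_clusters = len(set(samples))
print(f"{n_clusters} distinct values among {len(samples)} draws")
```

The "rich get richer" behaviour falls out of the uniform choice over earlier samples: values that have already been repeated often are more likely to be repeated again, so the number of distinct values grows only roughly logarithmically in n.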
