zlacker

[return to "A statistical approach to model evaluations"]
1. fnordp+Are[view] [source] 2024-11-29 18:56:21
>>RobinH+(OP)
This does feel a bit like an undergrad introduction to statistical analysis, and it's surprising anyone felt the need to explain these things. But I also suspect most AI people out there nowadays have limited math skills, so maybe it's helpful?
2. godels+5Ke[view] [source] 2024-11-29 21:26:36
>>fnordp+Are
As an ML researcher who started in physics (this seems common among physics/math-turned-ML people, Evan included), I cannot tell you how bad it is... One year at CVPR, when diffusion models hit the scene, I was asking what people's covariance was (I had overestimated the model complexity), and the most common answer I got was "how do I calculate that?" People do not understand things like what "pdf" means. People at top schools! I've been told I'm "gatekeeping" for saying that you should learn math (I say "you don't need math to build good models, but you do to understand why they're wrong"). Not that you need to, but you should. (I guess this explains why Mission Impossible Language Models won best paper...)
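
(For anyone wondering what that calculation looks like, here is a minimal numpy sketch - the feature matrix is made up, standing in for whatever a model actually produces, and the point is that it is basically one library call.)

  import numpy as np

  # Hypothetical stand-in: 1000 samples of 8-dimensional features from some model.
  features = np.random.randn(1000, 8)

  # Covariance matrix of the features: rows are observations, columns are variables.
  cov = np.cov(features, rowvar=False)  # shape (8, 8)

  # The diagonal holds per-dimension variances; off-diagonals measure linear co-variation.
  print(cov.shape, np.diag(cov))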

I swear, the big reason models are black boxes is that we _want_ them to be. There's a clear sentiment against people doing theory, and the results of that show. I remember not too long ago Yi Tay (under @agihippo, but his main is @YiTayML) saying "fuck theorists". I guess it's no surprise DeepMind recently hired him after that "get good" stuff.

Also, I'd like to point out that the author uses "we", but the paper only has one author on it. So may I suggest adding their cat as a coauthor? [0]

[0] https://en.wikipedia.org/wiki/F._D._C._Willard

3. abhgh+Rrf[view] [source] 2024-11-30 06:22:58
>>godels+5Ke
Personal sad story, but hopefully relevant: during my recent PhD I worked on a problem where I used a Dirichlet Process in my solution. That paper has been bouncing around for the past few years, getting rejected from every venue I have submitted it to. My interpretation is that most reviewers (there are exceptions - too few to affect the final voting) don't understand any non-DL theory anymore and are not willing to read up for the sake of a fair review. This is based on their comments: we have been told that our solution is complex (maybe? - but no one suggests an alternative), that the exposition is not clear (we have rewritten the paper a few times - we rewrite it based on comments from venue i before submitting to venue i+1 - it's a wild goose chase), and in one case that the paper is derivative because it uses Blackwell-MacQueen sampling. Their evidence? They skimmed a paper we had cited that also used the sampling algorithm. This is like saying a paper is derivative because it uses SGD.

I am on the review panel of some conferences too, and it is not uncommon to be assigned a paper outside my comfort zone. That doesn't mean I cut and bail. You set aside time, read up on the area, ask the authors questions, and judge accordingly. Unfortunately this doesn't happen most of the time - people seem to be in a rush to finish their reviews, no matter the quality. At this point, we just mechanically keep resubmitting the paper every once in a while.

Sorry, end of rant :)

4. godels+JAf[view] [source] 2024-11-30 09:08:20
>>abhgh+Rrf

  > Sorry, end of rant :)
Don't be. Issues like this are the reason I haven't defended yet. The fact that an AC didn't laugh at that "critique" is itself indicative of a problem; they're as checked out as the reviewers. I was doing work in a more mathy area and could not get assigned reviewers who understood what was being done. To try to get something through, I moved to a more popular domain and won a bet with my advisor that I could get SOTA on a very popular dataset in a few months, but now I have no compute left. I can beat big labs on one dataset with far less compute, but how can I compete when reviewers want dozens? Even if others weren't held to that standard... there's not enough compute for that. You can always ask for "more experiments".

For reviews, I set aside hours for each paper, and more the further they are outside my domain. I'm also happy to raise my score after a rebuttal and to mark lower confidence, and I frequently write out what would change my mind to help the authors. (My best post-rebuttal response ever: "The authors answered all my questions, but due to the lack of novelty I'm lowering my score.") I'll keep doing this, but to be honest, after I defend I have no intention of pushing to conferences or journals. I just fail to see the value. It has me spending more time rewriting, which takes time away from research. It just makes me upset and isn't making me a better researcher. I crave someone actually _criticizing_ my work. I have a good citation count and h-index. My best paper is "unpublished", has hundreds of citations, resulted in a very popular blog post, and years later people are still messaging me about using it in their work. I don't think I'm a top researcher, but I don't think I'm well below the pack either.

I just hate that my research directions are pigeonholed. That you need to work on topics people care about. That you need to evaluate at large scale. As if we can't have conclusions beyond the empirical. As if this isn't about communicating our work. That I need to write for those who are not my "peers" (niche domain experts, as opposed to broader domain experts). As if experiments aren't proxies but demonstrations of a product. I think this significantly slows progress toward AGI, since it railroads us into building on large models from big companies, and there is so little interest in anything else. How can we explore more architectures, learning methods, and all the rest if we're required to get SOTA out of the gate?

I don't want to say too much about my work since it is still bouncing around in review and I don't want to dox myself. But I'll say something about a work that I __reviewed__. It was on neural PDEs. The review was for a workshop, and it was clear to me that the paper had been rejected from the main conference. What was not clear was why - until I got to see the reviews from my peers. Their complaints had the standard "novelty" and "not well written" (it was very well written, btw), but the kicker for them was that the datasets were synthetic... Like... what?! Why does that even matter? They're solving equations! Luckily they had low confidence and I got the paper through. I wasn't surprised when, a few months later, I stumbled upon the paper again and found out it was from Welling's group.

  > At this point, we just mechanically keep resubmitting the paper every once in a while.
I really wonder how long it will take conference organizers to recognize that the noise in the review process is a significant contributor to the rising number of submissions. It seems a rather obvious connection, but I rarely hear it discussed. Not to mention that it can damage the quality of papers (this certainly happened to mine, and I suspect yours). Reviews can improve a paper when they contain actual critiques. But hey, why do the work when no one questions a reject?

I feel like mine was more ranty lol. But it helps to not feel alone.

5. abhgh+zEg[view] [source] 2024-11-30 22:44:02
>>godels+JAf
In my view, in most cases ACs now perform one of two roles: (1) summarize what the reviewers have said, and/or (2) if there is high variance in the scores, ask the reviewers to sort it out among themselves, and THEN summarize :). I am exaggerating here - sometimes they do engage, but those cases are few and far between. In my personal experience, it has only happened twice: (a) during an ARR review where I was an author, we requested that the committee intervene because one set of reviews went wildly off-track (along the lines of "more experiments", like you mention), and (b) as a reviewer, I pointed out some glaring benchmarking omissions, and the AC took the time to understand the concern and decided to bring it up for discussion.

  > I crave someone actually _criticizing_ my work
Yes! But if you are not doing mainstream DL, good constructive criticism is almost impossible to obtain. I feel like many reviewers expect half the paper to be a tutorial if it is not a trendy topic, which I find unfair for multiple reasons: (1) for a trendy topic, far more complex material goes unexplained, because it is assumed the reviewer has heard of it; (2) yes, expecting some background makes sense, but that is what cited material and the appendix are for; (3) most conferences have a page limit for the main paper, so you cannot go about rambling and arbitrarily explaining ideas; and (4) this is supposed to be a rigorous review process - it is not supposed to be easy. It shouldn't come as a shock that some papers take (sometimes a lot of) work to review!
6. godels+HFg[view] [source] 2024-11-30 22:57:39
>>abhgh+zEg

  > I feel like many reviewers expect half the paper to be a tutorial if it is not a trendy topic. 
I tried to push a paper using a GAN to a workshop. I was asked to spend more time explaining what a GAN is "for those unfamiliar with the topic." I was baffled. Sure, ML moves fast, but that's catastrophic forgetting right there... (and in a fucking workshop?!)

I honestly believe that if someone is not intimately familiar with GANs then they should not be reviewing for a generative vision workshop.

  > I feel like many reviewers expect half the paper to be a tutorial if it is not a trendy topic. 
How hard this is varies by topic. Generative works should show samples - but how many? How representative? These are crucial to evaluating the work, but they devour your text limits.

It is always easy to ask for more, but with page limits there are clear constraints. I think it is too subjective. I wish we went more in the direction of math papers, which are often VERY short: use as much space as you need, no more, no less. I think the formats are just too limiting (not to mention that paper isn't a great medium for a lot of topics like video, point clouds, audio, pose estimation, and many others). But momentum is a powerful force.
