zlacker

[return to "A statistical approach to model evaluations"]
1. fnordp+Are[view] [source] 2024-11-29 18:56:21
>>RobinH+(OP)
This does feel a bit like an undergrad introduction to statistical analysis, and it's surprising anyone felt the need to explain these things. But I also suspect most AI people out there nowadays have limited math skills, so maybe it's helpful?
◧◩
2. godels+5Ke[view] [source] 2024-11-29 21:26:36
>>fnordp+Are
As an ML researcher who started in physics (this seems common among physics/math-turned-ML people, Evan included), I cannot tell you how bad it is... One year at CVPR, when diffusion models hit the scene, I was asking what people's covariance was (I had overestimated the model complexity), and the most common answer I got was "how do I calculate that?" People do not understand things like what "pdf" means. People at top schools! I've been told I'm "gatekeeping" for saying that you should learn math (I say "you don't need math to build good models, but you do to understand why they're wrong"). Not that you need to, but that you should. (I guess this explains why Mission: Impossible Language Models won best paper...)
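
(To be clear, the thing people could not answer is a couple of lines of numpy. This is only a toy sketch with made-up data, not anything from an actual model:)

    import numpy as np

    # Toy data: 1000 samples of a 3-dimensional representation.
    X = np.random.default_rng(0).normal(size=(1000, 3))
    Xc = X - X.mean(axis=0)                 # center each dimension
    cov = Xc.T @ Xc / (X.shape[0] - 1)      # unbiased sample covariance matrix
    # equivalently: np.cov(X, rowvar=False)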

I swear, the big reason models are black boxes is that we _want_ them to be. There's a clear anti-theory sentiment against people doing theory, and the results show it. I remember not too long ago Yi Tay (posting under @agihippo, but his main is @YiTayML) said "fuck theorists". I guess it's not a surprise DeepMind recently hired him after that "get good" stuff.

Also, I'd like to point out that the author uses "we", but the paper has only one author. So may I suggest adding their cat as a coauthor? [0]

[0] https://en.wikipedia.org/wiki/F._D._C._Willard

◧◩◪
3. abhgh+Rrf[view] [source] 2024-11-30 06:22:58
>>godels+5Ke
Personal sad story, but hopefully relevant: during my recent PhD I worked on a problem where I used a Dirichlet Process in my solution. That paper has been bouncing around for the past few years, getting rejected from every venue I have submitted it to. My interpretation is that most reviewers (there are exceptions - too few to impact the final voting) don't understand any non-DL theory anymore and are not willing to read up for the sake of a fair review. This is based on their comments, where we have been told that our solution is complex (maybe? - but no one suggests an alternative), that the exposition is not clear (we have rewritten the paper a few times - we rewrite it based on comments from venue i to submit to venue i+1 - it's a wild goose chase), and in one case, someone said the paper is derivative because it uses Blackwell-MacQueen sampling; their evidence? They skimmed through a paper we had cited that also used the sampling algorithm. This is like saying a paper is derivative because it uses SGD.

I am on the review panel of some conferences too, and it is not uncommon to be assigned a paper outside of my comfort zone. That doesn't mean I cut and bail. You set aside time, read up on the area, ask the authors questions, and judge accordingly. Unfortunately this doesn't happen most of the time - people seem to be in a rush to finish their reviews no matter the quality. At this point, we just mechanically keep resubmitting the paper every once in a while.

Sorry, end of rant :)

◧◩◪◨
4. someth+C0g[view] [source] 2024-11-30 15:34:29
>>abhgh+Rrf
Just a note:

> the exposition is not clear (we have rewritten the paper a few times - we rewrite it based on comments from venue i to submit to venue i+1 - it's a wild goose chase)

This does not mean that the paper is invalid, but maybe the storyline is difficult to follow, the results are not easy to interpret, or the paper is overall badly written or missing justifications. Even if you take the reviews into account when rewriting it, that doesn't mean the paper is clear and easy to understand.

As you noted, researchers need to read material outside of their comfort zone, and publications have shifted in focus. Before, you could expect a reader to be familiar with the topic; now you need to educate them as clearly as possible.

I picked some text at random from the paper:

> The workings of the technique itself are presented at a high-level in Figure 2.

Annoying to read.

> Instead of learning the training distribution directly, which might be expensive because of the dimensionality of the data, we first project the data down to one dimension.

Why is that good enough? The justification is missing.

> This is done just once, and is shown in the left panel in Figure 2. Since we are solving for classification, we pick this dimension to be a numeric indicator of how close an instance is to a class boundary.

Why is it a good indicator? Again, justification is missing.

> As a convenient proxy, we train a separate highly accurate probabilistic

OK, but where are the references to previous research showing this can work?
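
For what it's worth, here is the mental sketch I had to build to follow the quoted passage. This is purely my own reconstruction of what might be meant (an off-the-shelf probabilistic classifier, with the margin between its two largest predicted class probabilities as the "closeness to a class boundary" score), not the author's actual method:

    # Illustrative reconstruction only; the model choice, function name and
    # margin-based score are my assumptions, not taken from the paper.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def project_to_1d(X_train, y_train, X):
        # the "separate highly accurate probabilistic" proxy model
        proxy = RandomForestClassifier(n_estimators=300, random_state=0)
        proxy.fit(X_train, y_train)
        proba = proxy.predict_proba(X)           # shape (n_instances, n_classes)
        top2 = np.sort(proba, axis=1)[:, -2:]    # two largest class probabilities
        return top2[:, 1] - top2[:, 0]           # small margin => near a class boundary

If the intended idea is roughly this simple, the paper should say so and justify it; if it isn't, that gap is exactly the missing justification I'm talking about.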

So, in essence, I'm not saying you need to explain everything, but the text could be clearer about the choices and why they make sense.

My gut feeling is that you know and understand what you are doing, but you are missing too many of the justifications that would prove your work valuable.

I didn't read the whole thing, so maybe I'm missing the bigger picture, but from sampling the text at random I expect the rest to follow the same pattern.

When I read the introduction, I don't want to read 'we did this and this and this', but rather 'there was this issue, and we solved it in this way for this reason'.

Following issue -> solution -> why should give me enough understanding of what you are trying to achieve.

Follow-up sections should then refine the solutions.

◧◩◪◨⬒
5. abhgh+2Pg[view] [source] 2024-12-01 00:48:55
>>someth+C0g
Thank you for these comments. I appreciate them and I'll consider them in my next draft. However, I would like to point out a few things, just so that we have the larger picture in mind. Again, I do appreciate that you took the time to look up the paper.

1. When I said we revise the paper between submissions, I wasn't implying it was becoming "better". The message was that there is no general consensus around what should be expanded and what can stay concise. Someone believes you should discuss prior work more, someone thinks the main algorithm requires more elaboration, someone wants you to talk more about BayesOpt, etc., but you have <10 pages in the main paper, and putting this material in the Appendix, or citing a source, doesn't seem to be good enough in many cases (another comment in a sibling thread gives an example wrt GANs, and my experiences have been no different).

2. You say you randomly picked a few sentences to read; that's fine for a casual discussion, but that should not be how a review process functions. Some of the best reviewers I've encountered (and I hope I am continuing in that tradition) come back to say something like "I see what you're getting at, but your intro doesn't sell it well enough; think about writing it like this ...". Rejecting based on random skimming is exactly one of the things I'm calling out. Let's face it - like a lot of things, high-quality reviewing is hard. It isn't supposed to be quick or easy.

3. Predicting how much to elaborate: this is probably an extension of the first point, but I feel like it has become much harder in recent years. The rule that mostly works seems to be: if it's not a trending topic, explain it as much as you can, because cited background material is overlooked. This is unfair for areas that are not trending - the goal of research should be to situate itself closer to "explore" on the explore-exploit spectrum, but the review system today heavily favors "exploit". And like I mentioned, a page limit means the publication game is stacked against people not working on mainstream ideas. This should not be the case.

◧◩◪◨⬒⬓
6. someth+Bkh[view] [source] 2024-12-01 08:59:25
>>abhgh+2Pg
> 1. When I said we revise the paper between submissions, I wasn't implying it was becoming "better". The message was that there is no general consensus around what should be expanded and what can stay concise. Someone believes you should discuss prior work more, someone thinks the main algorithm requires more elaboration, someone wants you to talk more about BayesOpt, etc., but you have <10 pages in the main paper, and putting this material in the Appendix, or citing a source, doesn't seem to be good enough in many cases (another comment in a sibling thread gives an example wrt GANs, and my experiences have been no different).

That's exactly my point: the reviews do not converge because the message is too diffuse or not justified enough. I recently had a paper rejected because it was too difficult to understand; it was 4 pages, and it has now been sent to a better journal, expanded to 20 pages. The content was too big for 4 pages - we couldn't fit enough justification in. But in your paper you still have many places where the text could be shorter and clearer, gaining at least 1 page of content. Learning to write good research takes a lot of time, and a PhD is ideally the place where this happens. It's difficult, but you'll get there if you work on it enough! Read the best paper awards of good conferences, notice how much material there is in the same number of pages, and reverse-engineer what they did to make the paper clear, concise, and easy to follow.

> 2. You say you randomly picked a few sentences to read; that's fine for a casual discussion, but that should not be how a review process functions. Some of the best reviewers I've encountered (and I hope I am continuing in that tradition) come back to say something like "I see what you're getting at, but your intro doesn't sell it well enough; think about writing it like this ...". Rejecting based on random skimming is exactly one of the things I'm calling out. Let's face it - like a lot of things, high-quality reviewing is hard. It isn't supposed to be quick or easy.

You cannot choose who will read. But even for the more thorough readers, if the paper is difficult to understand or missing justifications from the beginning, they will give a bad review even if they read the whole thing. Reading should be like a conversation with the author; if I find that conversation through the paper too sloppy or erratic, I will not understand the message. That's what happens when I ask the author for more justification on some part: it's because I couldn't follow the logic well enough, or I didn't agree with some part, so I require more justification.

> 3. Predicting how much to elaborate: this is probably an extension of the first point, but I feel like it has become much harder in recent years. The rule that mostly works seems to be: if it's not a trending topic, explain it as much as you can, because cited background material is overlooked. This is unfair for areas that are not trending - the goal of research should be to situate itself closer to "explore" on the explore-exploit spectrum, but the review system today heavily favors "exploit". And like I mentioned, a page limit means the publication game is stacked against people not working on mainstream ideas. This should not be the case.

I agree: there are no general experts anymore, everyone works in a very niche subfield, and you don't get reviewers who know the SOTA. Learning the right tradeoff is difficult. My rule of thumb is: don't explain the math unless the "why" is not self-evident. For example, for an equation I might give more insight into how it affects my method, and if a parameter of the equation is very important to my method, a complete analysis of its effects, with analogies and experiments to show its impact. I try to make the main storyline as crystal clear as possible; if I deviate too much, that material is better left for a second paper. My experiments should show non-trivial things. Finally, I make sure the abstract corresponds to the text. I mostly don't work in deep learning, so by default it's extremely hard to find reviewers for my topics - I feel the pain. But it's my job to make them understand what I'm achieving and why it's important.

Hope that helps :)

◧◩◪◨⬒⬓⬔
7. abhgh+Pmh[view] [source] 2024-12-01 09:36:56
>>someth+Bkh
I feel like we are debating slightly different perspectives, and with that lens I agree with what you say. Here is the difference (and this is decoupled from this particular paper): your take is that reviews today work a certain way, and here are some things we can do to maximize our chances of acceptance; my take is that reviews shouldn't work this way.

To take some examples:

1. > You cannot choose who will read.

Specifically, no, but generally, yes. I'd expect the reviewer to understand ML, and if this is not the brand of ML they're familiar with, I'd expect them to put in the work to familiarize themselves during the review process, in the interest of fairness. After all, are we not seeking out qualified reviewers? This is not just anyone who stumbles across the paper on the internet.

2. > message is too diffuse

Any message will appear diffuse/opaque/abstract to someone unfamiliar with the area. This is exactly why an objective review process must equalize such communication biases. This is partly facilitated by the conference picking the right reviewers and making sensible review assignments, and partly it is the duty of a reviewer to fill in whatever gaps in comprehension remain.

3. > Read the best paper awards of good conferences, notice how much material there is in the same number of pages, and reverse-engineer what they did to make the paper clear, concise, and easy to follow.

Good general advice, but you are preaching to the choir. I do read best papers from various conferences, and I run reading groups where we discuss papers from ongoing conferences. I run an applied ML research group in industry - this pretty much comes with the job. Further, I don't think best papers are head-and-shoulders above non-best papers; they are often voted to the top because they solve a broadly known problem, or further the understanding of such a problem. Writing plays some role here, but it is not the deciding factor.

4. Requiring justifications. Yes, there is a rebuttal phase for that.

Just to be doubly clear, I am not saying papers (including this one) can't be improved. But that is not the argument I am making.

◧◩◪◨⬒⬓⬔⧯
8. someth+kqh[view] [source] 2024-12-01 10:28:10
>>abhgh+Pmh
I totally understand your take, and you are right in some respects. Reviews SHOULD NOT WORK IN THIS WAY - I totally agree on that. But consider this: the research population is growing, the number of topics is growing, and we are no longer in a world where the review system works as it was designed to.

When it was designed, only a few people got the chance to read, and information was not easily available. So people became experts, and knowing the SOTA was mandatory. Now, given the sheer quantity of (good and bad) research, you cannot expect the review system to work properly.

But you are still stuck in this system. So consider what is important:

- Do you want to write and hope that, by chance, the right people will read it, that they are educated enough in your topic, and that they have enough time at their disposal? Or

- Do you think your idea is good and should be more widely known?

If it's the latter, it's your job to make your idea as clear as possible so that any (good) researcher can understand it, and therefore use it. We must work within the reality of the current system if we want to spread interesting ideas to the community. The publication system is a social system, and it evolves with the people inside it. You want to write to spread knowledge - how can you do that if the odds are that the reader will not fully understand?

My time is very limited and I always have many things to do, so I only read what I filter as worthy enough. That filter is based on the quality of the writing. If a paper is important but badly written, it automatically falls into my 'if I have time to read it' pile, and most of the time it never reaches the 'to read' category, because there are many, many well-written papers with good ideas inside.

We work in a biased system. It's extremely difficult to find reviewers; we do what we can with what we have.

I was also infuriated when I got a review saying 'you didn't explain structure from motion' at a conference whose topic was structure from motion. But the reality is this: if I want my papers to be read, I must adapt to my audience.

> Read the best paper awards of good conferences

With that sentence, I did not mean 'read it for knowledge', but read it through the lens of the writer: why did they present the topic in this way, and what makes this paper clear while another on the same topic is not clear at all? Reverse-engineer the writing style. It's not about knowledge of the content of the paper, it's about communication. Best papers do not always have the best ideas inside, but they are presented in a way that provides insight even when the topic is difficult. And often those insights are what readers want to read. The math is not the important thing; the important thing is the insight you give to the readers. That insight can be translated into their own field if they have internalized it enough.

◧◩◪◨⬒⬓⬔⧯▣
9. godels+VYi[view] [source] 2024-12-02 06:02:54
>>someth+kqh

  > I was also infuriated when I got a review saying 'you didn't explain structure from motion' at a conference whose topic was structure from motion. But the reality is this: if I want my papers to be read, I must adapt to my audience.

Honestly, I hate this take. I don't think it's good for science or academia. Papers do not need to be readable by everyone; the point is to be readable by other experts in your niche. Otherwise, I don't know who to write to, and that's exactly the same problem the parent is having.

Writing to too broad an audience also makes papers unnecessarily long. You have to spend more time motivating the work and more time on the background. This has spinoff effects where reviewers can demand you cite them, contributing to the citation-mining nightmare. I've seen 8-page papers with 100+ references (the paper I referenced has 78); that is more what we expect from a survey paper. When background sections are minimal, you can't justify asking for a citation unless your work is critical to the exact problem being solved.

Every paper rewrite is time and money that would be better spent on research or other activities. Every rewrite is also an additional submission to the next conference. I don't think labs are submitting 20+ papers per round because they wrote 20 papers in those 3 months (with some exceptions), but rather because they wrote a bunch and are recycling work from the previous few years. This increases reviewer load as well.

The question then is how people enter a topic. Truth be told, I don't think it's any easier now than when papers had under 30 references. For reference, that one Cybenko paper we all know has under 30 references but is 10.5 pages. What I think we should do instead is allow citing blogs and encourage people to write tutorials. This would actually be a really useful task for 2nd- and 3rd-year PhD students: you learn a lot when writing those things, and that's the stage where you should be reaching expert level in your niche. The problem is that we have no incentive to do any of the other critical tasks in academia. This is why I personally hate it. We are hyper-focused on this novelty thing, but in reality it doesn't really exist and is highly subjective. It just encourages obfuscation, which we've routinely seen from high-profile labs.

We work in ML; how are we not all keenly aware of reward hacking and knock-on effects?! I honestly think the fact that we cannot get our own house in order is evidence that we can't build safe AGI yet. This is a task that is orders of magnitude easier, not to mention one with significant reward (selfishly, it highly benefits us too!). Everyone feels that something is off, but no one wants to do anything about it. We've only implemented half-baked measures that are more about signaling. You can't let an LLM review for you, but the author is responsible for proving the reviewer used one? We're all ML experts... we all know this isn't possible except in the most blatant cases. It's as if you got shot while blindfolded and the police won't investigate until you bring them evidence of who shot you and with what kind of gun. It shouldn't matter whether a review is bad because it was written by an LLM or because it was written by a human, just like it shouldn't matter whether you were shot or stabbed.

  > The math is not the important thing; the important thing is the insight you give to the readers.

I also hate this take. The math often __is__ the insight. I agree that a lot of papers have needless math (look at any diffusion paper, or any paper with attention copy-pasting the same equations), but other works need it. The reason to use math is the same reason we program: if there were an easier way to communicate, we would use it (note: math isn't just symbols, it can be words too). Math and programming are hard because they are languages that are precise and dense. The precision is important when communicating. Yes, it might take longer to parse, but it is unambiguous when interpreted (and it is also easier to parse when you're trained and in the habit, just like any other language).

I think we lost our way in academia. We got caught up in the excitement. We let the bureaucrats take too much control and dictate to the universities. We got lost in our egos (definitely not new) and became too focused on prestige and fame. Our systems should be fighting these things, not enabling or encouraging them. Yes, the people at the top benefit from these systems, but the truth is that even they would benefit from fixing things.
