AlphaFold reveals the structure of the protein universe

>>MindGo+(OP)
Before my comment gets dismissed, I will disclaim I am a professional structural biologist that works in this field every day.

These threads are always the same: lots of comments about protein folding, how amazing DeepMind is, how AlphaFold is a success story, how it has flipped an entire field on it's head, etc. The language from Google is so deceptive about what they've actually done, I think it's actually intentionally disingenuous.

At the end of the day, AlphaFold is amazing homology modeling. I love it, I think it's an awesome application of machine learning, and I use it frequently. But it's doing the same thing we've been doing for 2 decades: pattern matching sequences of proteins with unknown structure to sequences of proteins with known structure, and about 2x as well as we used to be able to.

That's extremely useful, but it's not knowledge of protein folding. It can't predict a fold de novo, it can't predict folds that haven't been seen (EDIT: this is maybe not strictly true, depending on how you slice it), it fails in a number of edge cases (remember, in biology, edge cases are everything) and again, I can't stress this enough, we have no new information on how proteins fold. We know all the information (most of at least) for a proteins final fold is in the sequence. But we don't know much about the in-between.

I like AlphaFold, it's convenient and I use it (although for anything serious or anything interacting with anything else, I still need a real structure), but I feel as though it has been intentionally and deceptively oversold. There are 3-4 other deep learning projects I think have had a much greater impact on my field.

EDIT: See below: https://news.ycombinator.com/item?id=32265662 for information on predicting new folds.

>>COGlor+JD
Not sure if you should be reminded of how alpha fold started, it started by winning a competition thought un winnable by academics. Top labs working in protein structure prediction have fundamentally changed direction after alpha fold and are working to do the same even better.

This is not the first (or even tenth) time I’m seeing an academic trying to undermine genuine progress almost to the level of gaslighting. Comparing alphafold to conventional homology modeling is disingenuous at its most charitable interpretation.

Not sure what else to say. Structural biology has always been the weirdest field I’ve seen, the way students are abused (crystallize and publish in nature or go bust), and how every nature issue will have three structure papers as if that cures cancer every day. I suppose it warps one’s perception of outsiders after being in such a bubble?

signed, someone with a PhD in biomedical engineering, did a ton of bio work.

>>ramraj+oJ
> Not sure if you should be reminded of how alpha fold started, it started by winning a competition thought un winnable by academics. Top labs working in protein structure prediction have fundamentally changed direction after alpha fold and are working to do the same even better.

Not sure what part of "it does homology modeling 2x better" you didn't see in my comment? AlphaFold scored something like 85% in CASP in 2020, in CASP 2016, I-TASSER had I think 42%? So it's ~2x as good as I-TASSER which is exactly what I said in my comment.

>This is not the first (or even tenth) time I’m seeing an academic trying to undermine genuine progress almost to the level of gaslighting. Comparing alphafold to conventional homology modeling is disingenuous at its most charitable interpretation.

It literally is homology modeling. The deep learning aspect is to boost otherwise unnoticed signal that most homology modeling software couldn't tease out. Also, I don't think I'm gaslighting, but maybe I'm wrong? If anything, I felt gaslit by the language around AlphaFold.

>Not sure what else to say. Structural biology has always been the weirdest field I’ve seen, the way students are abused (crystallize and publish in nature or go bust), and how every nature issue will have three structure papers as if that cures cancer every day. I suppose it warps one’s perception of outsiders after being in such a bubble?

What on earth are you even talking about? The vast, VAST majority of structures go unpublished ENTIRELY, let alone published in nature. There are almost 200,000 structures on deposit in the PDB.

>>COGlor+XK
What ramraj is talking about: if you go into a competitive grad program to get a PhD in structural biology, your advisor will probably expect that in 3-4 years you will: crystallize a protein of interest, collect enough data to make a model, and publish that model in a major journal. Many people in my program could not graduate until they had a Nature or Science paper (my advisor was not an asshole, I graduated with just a paper in Biochemistry).

In a sense both of you are right- DeepMind is massively overplaying the value of what they did, trying to expand its impact far beyond what they actually achieved (this is common in competitive biology), but what they did was such an improvement over the state of the art that it's considered a major accomplishment. It also achieved the target of CASP- which was to make predictions whose scores are indistinguishable from experimentally determined structures.

I don't think academics thought CASP was unwinnable but most groups were very surprised that an industrial player using 5 year old tech did so well.

>>dekhn+xR
> What ramraj is talking about: if you go into a competitive grad program to get a PhD in structural biology, your advisor will probably expect that in 3-4 years you will: crystallize a protein of interest, collect enough data to make a model, and publish that model in a major journal.

All of that is generally applicable to molecular biology in general, and I don't see how the field of structural biology is especially egregious, the way ramraj is making it out to be.

>>valara+VY
Protein crystallization can be very difficult and there is no general solution. Kits that screen for crystal growth conditions usually help but optimization is needed in most cases. Then, that crystal must have certain properties that allow for good data acquisition at the X-ray facility. That’s another problem by itself and months or years can pass until you get a suitable protein crystal and X-ray diffraction dataset where you can model your structure.

>>flobos+Q21
I'm familiar with protein crystallization and the difficulties associated with it. What I don't agree with is the characterization of the field as especially difficult, above and beyond modern biology in general. Nor can I support the assertion that structural biology students are subject to special abuse that regular grad students are not.

> ... can be very difficult and there is no general solution

This is true of pretty much any graduate work in molecular biology.

>>valara+y51
In most other sub fields you don’t get to not publish if exactly one endpoint never comes to pass. I know I didn’t have something like that, and most of my non crystallographer friends didn’t.

There’s a lot of structural biology apologists here in this thread. Happy to crap on DeepMind but not ready to take criticism of their own field.

For anyone outside of the field wanting to learn more, check out this documentary: https://en.m.wikipedia.org/wiki/Naturally_Obsessed

>>ramraj+Bn1
> In most other sub fields you don’t get to not publish if exactly one endpoint never comes to pass. I know I didn’t have something like that, and most of my non crystallographer friends didn’t.

How is this a problem unique to structural biology? In every subfield we're hoping to publish interesting results, and that endpoint is defined by the nature of the field. As a geneticist, in the early 90s, sequencing & characterizing a single bacterial gene would have been the focus of an ambitious PhD thesis and would yield multiple papers. Sequencing at that time period had a dozen points of failure and high risk to set as the goal for a thesis. Today, sequencing a whole genome is unlikely to yield a single publication. If you're setting the ability to crystallize as the single point of failure endpoint, that logic applies to every subfield. We all have something that could potentially derail our plans, and I fail to see how structural biology is unique in that respect.

> There’s a lot of structural biology apologists here in this thread. Happy to crap on DeepMind but not ready to take criticism of their own field.

I'm not a structural biologist - I'm a Geneticist who disagrees with your characterization of SB. The issues you've mentioned are not unique to SB, but apply to pretty much all subfields. I see grad students in general lament their life choices when their cell culture fails, their mice die, protocols just don't work, or their results just don't make sense.

>>valara+Ey1
> If you're setting the ability to crystallize as the single point of failure endpoint, that logic applies to every subfield.

I agree that there are other fields with similar issues. What baffles me is how long protein crystallization has been a problem.

I’ll use your example:

Nowadays, sequencing a gene is unlikely to yield a single publication by itself but is no early point of failure. It’s a solved problem with protocols that have been thoroughly developed and explained to boredom. New early points of failure arise (sample related, maybe?).

Nowadays, determining the structure of a protein is unlikely to yield a single publication by itself but has a clear, early, unsolved point of failure. No understandable protocol other than buying $creening plate$, fetching cat whiskers, drawing a theoretical phase diagram that tells you nothing, and pray that your crystallization tray doesn’t show a scrambled egg tomorrow or in six weeks. This has been an issue for more than fifty years and almost 200k published structures. The jump you mentioned in sequencing hasn’t happened yet in protein crystallography and might never happen because our understanding of macromolecular crystallization is lacking and thus we cannot predict proper crystallization conditions.

>>flobos+BG1
Sure, I agree that crystallization in particular has faced this particular bottleneck for a long time. The field of SB, however, has still managed to advance massively too. For example, Cryo-EM can do things we could barely imagine a decade ago.

The point I'm trying to make is that from the perspective of a grad student, no field is devoid of risk, and it's surprisingly easy to be stuck by something that's a solved problem on paper. For example, I know of a grad student that's been trying to develop a mouse line for about a year now, and has now discovered that this strain just won't work for what they have in mind - and must now recreate the mutant combinations in a different strain that's at least a year's work - if it even works. I've heard stories of entire mouse lines die, and you're back to square one - years of work lost.

The other thing that complicates some of these fields is the massive pace of innovation they're undergoing that it is very hard for an individual lab to keep up to date. Grad students are using techniques that were published less than 5 years ago, and there's no locally available expertise to tap into. What remains the same is the level of grunt work grad students and postdocs have to do, even if the techniques get more sophisticated over time.

zlacker