AlphaFold reveals the structure of the protein universe

>>MindGo+(OP)
I got a 5th grader question about how proteins are used/represented graphically that I've never been able to find a satisfying answer for.

Basically, you see these 3D representations of specific proteins as a crumple of ribbons-- literally like someone ran multi-colored ribbons though scissors to make curls and dumped it on the floor (like a grade school craft project).

So... I understand that proteins are huge organic molecules composed of thousands of atoms, right? Their special capabilities arise from their structure/shape. So basically the molecule contorts itself to a low energy state which could be very complex but which enables it to "bind?" to other molecules expressly because of this special shape and do the special things that proteins do-- that form the basis of living things. Hence the efforts, like Alphafold, to compute what these shapes are for any given protein molecule.

But what does one "do" with such 3D shapes?

They seem intractably complex. Are people just browsing these shapes and seeing patterns in them? What do the "ribbons" signify? Are they just some specific arrangement of C,H,O? Why are some ribbons different colors? Why are there also thread-like things instead of all ribbons?

Also, is that what proteins would really look like if you could see at sub-optical wavelength resolutions? Are they really like that? I recall from school the equipartition theorem-- 1/2 KT of kinetic energy for each degree of freedom. These things obviously have many degrees of freedom. So wouldn't they be "thrashing around" like rag doll in a blender at room temperature? It seems strange to me that something like that could be so central to life, but it is.

Just trying to get myself a cartoonish mental model of how these shapes are used! Anyone?

>>crispy+Ww
The ribbons and helices you see in those pictures are abstract representations of the underlying positions of specific arrangements of carbon atoms along the backbone.

There are tools such as DSSP https://en.wikipedia.org/wiki/DSSP_(hydrogen_bond_estimation... which will take out the 3d structure determined by crystallography and spit out hte ribbons and helices- for example, for helices, you can see a specific arrangement of carbons along the protein's backbone in 3d space (each carbon interacts with a carbon 4 amino acids down the chain).

Protein motion at room temperature varies depending on the protein- some proteins are rocks that stay pretty much in the same single conformation forever once they fold, while others do thrash around wildly and others undergo complex, whole-structure rearrangements that almost seem magical if you try to think about them using normal physics/mechanical rules.

Having a magical machine that could output the full manifold of a protein during the folding process at subatomic resolution would be really nice! but there would be a lot of data to process.

>>dekhn+jy
Thanks, awesome! So what do molecular biologists do with these 3D representations once they have them? Do they literally just see how they fit to other proteins?

>>crispy+ez
There are many uses for structure. Personally, I find the 3d structures to be useful as a mental guide for picturing things, and certainly people do try to "dock" proteins that have complementary structures, but unfortunately, the biophysics of protein complexes suggests that the conformation change on binding is so large that the predicted structures aren't super-helpful.

Certainly, in a corpo like mine (Genentech/Roche) protein structures have a long history of being used in drug discovery- not typically a simple "dock a ligand to a protein" but more for constructing lab experiments that help elucidate the actual mechanistic biology going on. That is only a tiny part of a much larger process to work on disease targets to come up with effective treatments. Genentech is different from most pharma in that their treatments are themselves typically proteins, rather than small molecules.

>>dekhn+vB
How bad is our understanding of force fields?

It seems like that's the basic principle to understand.

>>edmcnu+L81
I think many people would say that in principle, you could make a QM force field with an accurate enough basis function that an infinitely long simulation would recapitulate the energy landscape of a protein, and that information could be used to predict the kinetically accessible structures the protein adopts.

In practice, the force fields are well understood but to be computationally efficient, they have to approximate just about everything. Examples: since number of inter-atom distance pairs goes up with N**2 atoms, you need to have tricks to avoid that and instead scale around n log n or even n if you can do it. When I started, we just neglected atoms more than 9 angstrom apart, but for highly charged molecules like DNA, that leads to errors in the predicted structure. Next, typically the force fields avoid simulating polarizability (the ability of an atom's electron cloud to be drawn towards another atom with opposite charge), also because expensive. They use simplified spring models (lterally hooke's equation) for bond lengths, bond angles. The torsions (the angle formed by 4 atoms in a row) haev a simplified form. The interatomic relationships are not handled in a principled way, instead treating atoms as mushy spheres....

After having made major contributions in this area, I don't think that improvements to force fields are going to be the most effective investment in time and energy. There are other bits of data that can get us to accurate structures with less work.

>>dekhn+6e1
That's interesting. Didn't realize that. It sounds like we're just working around slower computation speed.

In an fantasy world if we had infinite computation speed/space we'd be able to just model the force field and predict from there.

>>edmcnu+Sa2
Yes, that's a fantasy world. I explored this using the Exacycle system at Google and we did actually do a couple things that nobody else could have at the time, but even that extraordinary amount of computing power really is tiny. The problem is the "force field" isn't just the enthalpic contributions I listed above, but also depends intimately on much more subtle entropic details- things like the cost of rearranging water into a more ordered structure have to be paid for. Estimating those is very expensive- far worse than just enumerating over large numbers of proteins "in vacuo", and probably cannot be surmounted, unless quantum computing somehow becomes much better.

Instead, after spending an ordinate amount of Google's revenue on extra energy, I recommended that Google instead apply machine learning to protein structure prediction and just do a better job of extracting useful structural information (note: this was around the time CNNs were in vogue, and methods like Transformers didn't exist yet) from the two big databases (all known proteins/their superfamily alignments, and the PDB).

Note that this conclusion was a really hard one for me since I had dedicated my entire scientific career up to that point in attempting to implement that fantasy world (or a coarse approximation of it), and my attempts at having people develop better force fields (ones that didn't require as much CPU time) using ML weren't successful. What DeepMind did was, in some sense, the most parsimonious incremental step possible to demonstrate their supremacy, which is far more efficient. Also, once you have a trained model, inference is nearly free compared to MD simulations!

>>dekhn+Sj2
That's interesting. Thanks for the info. They're getting better at Quantum. It's going to be fascinating to see the future of this field and all the potential medicine waiting to be discovered and the lifespan improvements and just sheer biological discoveries. It feels almost like the new panning for gold. :) It's pretty crazy to see how human advancement parallels computing power in so many areas.

zlacker