Why is protein folding important? Theoretical importance? Can we do something with protein folding knowledge? If so, what?
I've been hearing about AlphaFold from the CS side. There they seem to focus on protein folding primarily as an interesting space to apply their CS efforts.
So yeah the applications are enormous, from medicine to better industrial chemical processes, from warfare to food manufacturing.
The importance here is to figure out potential targets for treatments that take into account particularities of certain proteins. That could produce better drugs with fewer side effects.
(a) the structure of every protein (what DeepMind is doing here)
(b) how different protein structures interact (i.e. protein complexes - DeepMind is working on this but not there yet)
Then we could use those two building blocks to design new proteins (drugs) that do what we want. If we solve those two problems with very high accuracy, we can also reduce the time it takes to go from starting a drug discovery programme to approved medicine.
Obtaining all protein structures and determining how they interact is a key step towards making biology more predictable. Previously, solving the structure of a protein was very time consuming. As a result, we didn’t know the structure for a majority of proteins. Now that it’s much faster, downstream research can move faster.
Caveat: we should remember that these are all computational predictions. AlphaFold’s predictions can be wrong and protein structures will still need to be validated. Having said that, lots of validation has already occurred and confidence in the predictions grows with every new iteration of AlphaFold.
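To make that caveat a bit more concrete: AlphaFold reports a per-residue confidence score (pLDDT, on a 0 to 100 scale), and in the PDB-format files it distributes, that score is stored in the B-factor column. Here is a minimal sketch for pulling it out, assuming a standard fixed-column PDB file with a single chain. The function names are hypothetical, and the cutoff of 70 is just the commonly quoted boundary for "confident", not a guarantee that anything above it is right.

```python
def ca_plddt(pdb_text):
    """Return {residue_number: pLDDT} read from C-alpha ATOM records.

    Assumes fixed-column PDB format and a single chain, so residue
    numbers do not collide between chains.
    """
    scores = {}
    for line in pdb_text.splitlines():
        # Atom name lives in columns 13-16, residue number in 23-26,
        # and the B-factor field (pLDDT in AlphaFold files) in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence_residues(pdb_text, cutoff=70.0):
    """List residue numbers whose pLDDT falls below the cutoff."""
    return sorted(r for r, s in ca_plddt(pdb_text).items() if s < cutoff)


def mean_plddt(pdb_text):
    """Average pLDDT over all residues, or NaN if none were found."""
    scores = ca_plddt(pdb_text)
    return sum(scores.values()) / len(scores) if scores else float("nan")
```

Regions that come back with low scores are the ones you would least want to trust without experimental validation.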
Misfolded proteins are contributors to numerous pathological conditions and the more we can understand about how and why this folding happens, the better we can treat these conditions.
Another aspect is that while we can at least partially determine the primary structure (the amino acid sequence) of proteins from DNA and RNA, we don't necessarily know their secondary or tertiary structures (three-dimensional conformation). This is a key piece of the puzzle for figuring out how these proteins do their proteiny things, how they interact with other proteins, and even how they form quaternary structures with other proteins (an assembly of multiple proteins that performs some function; many pores are assemblies like this). Once we know these structures and understand how they work on a structural and chemical level, we can manipulate them far more easily.
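To show how mechanical the "sequence from DNA" step is compared with the folding step, here is a toy sketch that translates a coding sequence into amino acids using the standard genetic code. It assumes the input is already an in-frame coding sequence (no introns, no start-codon search, no alternative genetic codes), and the names `CODON_TABLE` and `translate` are my own for illustration.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering:
# the i-th letter of AMINO_ACIDS is the residue for the i-th codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate an in-frame DNA or RNA coding sequence to one-letter
    amino acid codes, stopping at the first stop codon ('*').

    Trailing bases that do not fill a codon are ignored; characters
    other than A, C, G, T/U raise KeyError.
    """
    dna = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGGCCAAATGA")` gives `"MAK"`. That string is all the primary structure tells you; nothing in a lookup table like this says how the chain ends up arranged in space, which is the part that needs experiments or a predictor like AlphaFold.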
In order to do rational drug design, which is designing a drug for a specific target or active site on a protein, we need to understand these structures. Working to solve protein folding is a key step in treating disease states and understanding how cells work on a fundamental level. The impact is hard to overstate.
Apart from X-ray crystallography there are other methods for structure determination such as nuclear magnetic resonance (NMR) or cryo-electron microscopy (cryo-EM). The latter has seen a dramatic improvement in resolution over the last decade.
At this time, we create drugs, test them on animals, and see what the side effects and results actually are. We are very limited in our capabilities and basically throw mud at the wall and see what sticks. This would allow us to try potential drug candidates without so much randomness.
Another idea is that these may come into play for anti-verification when you are drug screening against a known structure. You could potentially use these more flawed structures of proteins you don't want to target, but which may be similar, and try to reduce the drug's efficacy at binding them. Or something to that effect. These are all fun ideas currently being explored in that space, but we'll see where it takes us.
Does that imply proteins have some dynamics that need to be predicted too? I remember seeing animations of molecular machines that appeared to be "walking" inside the body - are those proteins or more complex structures?
Drugs are usually not proteins, but instead small molecules that are designed to help or interfere with the operation of proteins instead.
Every two years there is a massive competition called CASP where labs submit previously unresolved protein structures derived from experimental EM, X-ray crystallography, or NMR studies, and other labs attempt to predict these structures using their software. AlphaFold2 absolutely destroyed the other labs in the main contest (regular monomeric targets, predominantly globular) for structure resolution two years ago, in CASP14.
https://predictioncenter.org/casp14/zscores_final.cgi
The latest contest, CASP15, is currently underway and expected to end this year. As with all ML, the usual caveats apply to the models Google generated -- the dangers of overfitting to existing structures, artifacts based on the way the problem was modelled, etc.
A classic example is haemoglobin, which 'just' binds oxygen at the iron in the middle of the haem. Other binding sites, remote from the oxygen binding one, can bind other molecules - notably carbon dioxide. The 'Bohr effect' mechanism is outlined here: https://en.wikipedia.org/wiki/Bohr_effect#Allosteric_interac...
Even at the lowest level, there is some evidence that ligand binding can affect the structure of the backbone of the protein. For example, peptide plane flipping https://en.wikipedia.org/wiki/Peptide_plane_flipping although I'm not sure where the research is on this nowadays.
Researchers can come up with candidate molecule formulas that might work as good drugs, but the problem is that these proteins organize/fold themselves physically in a hard-to-predict way. And how they fold directly affects their properties as drugs.
If AlphaFold can accurately predict folding, it’ll allow researchers to prioritize drug candidates more accurately, which will reduce research time and costs. Supposedly the major pharmaceutical companies can spend up to billions when designing a single drug. Optimistically, predicting protein folding better will allow for much more rapid and cheaper drug development.
I think AlphaFold gives us more of a static picture that doesn't tell us much about how the protein will interact with the world. Predicting these dynamic aspects, protein-to-protein interactions, and ultimately an entire cell, is probably what they'd like to do next.
I don't know if the walking buddy is a single protein, it may be a handful of different ones joined together.
Biology has traditionally studied these things from the outside in, top down. Advances in biology are correlated with advances in microscope technology. We went from seeing with the naked eye to being able to magnify tissue to see cells and even what's inside of them.
Studying biology from the inside out, bottom up, is significantly harder. You have chemistry, which is hard science. On top of that you have biochemistry, which is equally rigorous. Above that level of abstraction things start to get a lot less well defined. We go from organic chemistry (carbon, hydrogen, oxygen, etc.) to more complex things found in life (amino acids, carbohydrates, lipids) to polymers of those things (proteins, polysaccharides, triglycerides), to a point where they get so big and complex that their three-dimensional structure starts influencing how they interact with other molecules. Then we start seeing aggregates (quaternary proteins) of these complex things!
Important proteins like ATP synthase are well understood (it's a friggin' turbine like in a hydroelectric power plant!) but there are so many others that haven't received the same attention. Bioinformatics technology is changing that. The human genome's been mapped. Now it's time to map the proteome.
I often recommend the book 'Cats' Paws and Catapults' by Steven Vogel, which discusses the differences between human and biological engineering. There are very different constraints, as well as the more obvious difference between intelligently directed design (by humans) and evolutionary design.
It's not totally true that our engineering is simpler than biological systems. After all, we can now manipulate matter down to the atomic level. There have been attempts to move towards self-assembling systems, or even self-repairing ones. Not quite on the same scale or sophistication as biological systems, of course.
Is a modern computer chip more complex than a virus? Is a large connected system of computers more complex than a mycelial network? Are there more parts connected in more ways in a nuclear power station than there are in a simple bacterial cell?
I think it is definitely worth considering the similarities and differences.
I think my sense that biology is more complex than human engineering comes from our engineering seeming much more homogeneous. Computer chips are certainly very dense, but the number of different component types is very small. Biology, on the other hand, seems to have a huge number of fundamental building blocks, e.g. proteins, with different behaviors. I suppose that maybe that's the wrong comparison though, and software is where our engineering diversity is present.
It may well just be the case that my lack of familiarity with biology makes it seem more complex to me.
But nobody will ever understand everything about a natural cell. Levels are an ad hoc phenomenon adhered to in varying degrees in certain places to contain variation. But few elements have only one role at one identifiable level, and you can never be sure one doesn't have another. And huge amounts of apparatus might radically change behavior in a subtly different environment.
Software is where the diversity is, for computers. A cell's systems are a set of interlocking networks of DNA, RNA, protein, small molecules, and signalling. As you say, these diverse parts then interact in diverse ways.
Another aspect is the large number of 'levels' in biological systems - humans have around seven: atoms, molecules, organelles, cells, tissues, organs, whole system. The interactions within and between these levels are a further source of complexity.
Yes, once you have identified a target protein, its structure is useful to selectively target it. But the main bottleneck is identifying such targets. In other words, the main difficulty is to figure out what to hit, not how to hit it, and protein folding mostly helps with how at the moment.
They are called motor proteins because they convert chemical energy into kinetic energy. In the case of kinesin, it forms a dimer (two copies of itself bind together to form the two "legs") and also binds to light chains (accessory proteins that modulate its behavior) so that it can walk along filaments and drag cargo around your cells. They are both proteins and more complex structures because multiple proteins are interacting, as well as binding small molecules and catalyzing them into chemical products, all to produce the motion.
A machine learning approach for predicting toxicity would have a far greater impact on public health than AF2 does.
One of my favorite quotes about biology goes something like:
"An alien spaceship lands in your yard. You are kindly invited in to study and tour the ship. The technology is clearly billions of years more advanced than our own. Not decades more, not millennia more, not megayears more, but clearly gigayears more.
This is biology"
Sorry, can't find the exact quote and citation right now!