
Can someone put AlphaFold's problem space into perspective for me?

Why is protein folding important? Theoretical importance? Can we do something with protein folding knowledge? If so, what?

I've been hearing about AlphaFold from the CS side. There they seem to focus on protein folding primarily as an interesting space to apply their CS efforts.

replies(12): >>fabios+u >>epups+21 >>pbw+B1 >>axg11+i2 >>feet+P2 >>andrep+79 >>flobos+t9 >>empiri+Na >>Freeby+oc >>green-+3j >>turkey+dk >>matheu+so

>>dalbas+(OP)
You are basically made of proteins, which are basically folded sequences of amino acids. Proteins are molecular machines that are the fundamental building blocks of animals, plants, bacteria, fungi, viruses, etc.

So yeah the applications are enormous, from medicine to better industrial chemical processes, from warfare to food manufacturing.

replies(1): >>jebark+Ah

>>dalbas+(OP)
Proteins are what makes everything in a cell work. They are produced as a "linear" structure that must fold into a proper shape to execute its function, such as acting as a pore that only lets a specific chemical through the cell membrane.

The importance here is to figure out potential targets for treatments that take into account particularities of certain proteins. That could produce better drugs with fewer side effects.

>>dalbas+(OP)
The genome, all of our DNA combined, is just a bunch of 1D strings like "cgtattctgcttgta". Those strings encode proteins, which fold up into a 3D shape once created. This 3D shape is what determines what the protein actually does inside the cell. Without understanding protein folding we don't understand what the DNA actually does.
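To make the "1D string encodes a protein" step concrete, here is a minimal Python sketch that translates a DNA string into an amino acid sequence using the standard genetic code. It assumes the reading frame starts at the first base and ignores everything real cells add on top (introns, regulation, the reverse strand); the function name `translate` is just for illustration. Getting from this 1D amino acid string to the 3D shape is the hard part that AlphaFold tackles.

```python
# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map every three-letter codon to its one-letter amino acid ("*" marks a stop codon).
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon.

    Assumes the reading frame starts at the first base; trailing bases that
    do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("cgtattctgcttgta"))  # RILLV
```

The example string from the comment comes out as five amino acids (Arg-Ile-Leu-Leu-Val); a real protein is typically hundreds of residues long.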

>>dalbas+(OP)
If we knew:

(a) the structure of every protein (what DeepMind is doing here)

(b) how different protein structures interact (i.e. protein complexes - DeepMind is working on this but not there yet)

Then we could use those two building blocks to design new proteins (drugs) that do what we want. If we solve those two problems with very high accuracy, we can also reduce the time it takes to go from starting a drug discovery programme to approved medicine.

Obtaining all protein structures and determining how they interact is a key step towards making biology more predictable. Previously, solving the structure of a protein was very time consuming. As a result, we didn’t know the structure for a majority of proteins. Now that it’s much faster, downstream research can move faster.

Caveat: we should remember that these are all computational predictions. AlphaFold’s predictions can be wrong and protein structures will still need to be validated. Having said that, lots of validation has already occurred and confidence in the predictions grows with every new iteration of AlphaFold.

replies(2): >>lamena+Y7 >>jcranm+Wh

>>dalbas+(OP)
As others have already mentioned, proteins are the machinery of the cell. They perform an immense array of functions and they must fold in a certain way to perform these functions. This is part of what's known as the structure-function relationship.

Misfolded proteins are contributors to numerous pathological conditions and the more we can understand about how and why this folding happens, the better we can treat these conditions.

Another aspect is that while we can at least partially determine the primary structure (the amino acid sequence) of proteins from DNA and RNA, we don't necessarily know their secondary or tertiary structures (3 dimensional conformation). This is a key piece of the puzzle for figuring out how these proteins do their proteiny things and how they interact with other proteins and even how they form quaternary structures with other proteins (an assembly of multiple proteins that perform some function, many pores are assemblies like this). Once we know these structures and understand how they work on a structural and chemical level, we can manipulate them far more easily.

In order to do rational drug design, which is designing a drug for a specific target or active site on a protein, we need to understand these structures. Working to solve protein folding is a key step in treating disease states and understanding how cells work on a fundamental level. The impact is hard to overstate.

>>axg11+i2
How are the predictions validated? Waiting for the old fashioned way for... very difficult crystal structure experiments? Or something else?

replies(4): >>flobos+69 >>Random+ub >>f38zf5+1i >>misnom+mu

>>lamena+Y7
> very difficult crystal structure experiments?

Apart from X-ray crystallography there are other methods for structure determination such as nuclear magnetic resonance (NMR) or cryo-electron microscopy (cryo-EM). The latter has seen a dramatic improvement in resolution over the last decade.

>>dalbas+(OP)
https://en.m.wikipedia.org/wiki/Protein_folding

>>dalbas+(OP)
This might be an interesting resource for you: https://pdb101.rcsb.org/

>>dalbas+(OP)
The endgame of protein folding is nanotechnology. Life is also nanotechnology. So basically the end game is to take full control of life: change existing life forms, design new life forms, new ecosystems. Eat the earth first, then the universe. Not yet there, but getting much closer.

>>lamena+Y7
If the predictions are generally good enough, one could also skip the validation and directly try to get a desired effect or reaction. That isn't strictly speaking validating the structure, but depending on the use case it might be easier to just go for an outcome - really a question of application and cost efficiency.

replies(1): >>siver_+oh

>>dalbas+(OP)
The applications and importance have been discussed, but let me explain why what we are doing right now does not work, which will also emphasize the importance of this.

At this time, we create drugs, test them on animals, and see what the side effects and results actually are. We are very limited in our capabilities and basically throw mud at the wall and see what sticks. This would allow us to try potential drug candidates without so much randomness.

>>Random+ub
I mean, nothing is stopping you from skipping validation with pre-AlphaFold techniques and, say, for drug discovery, already doing drug screening using the predicted structure. It's just that the drug screening software is already error prone, so you are still going to have to do some validation. However, having an idea of a potential structure means that you can validate it with other techniques that are simpler and less expensive/time consuming (I'm thinking of things similar to FRET).

Another idea is that these may come into play for anti-verification when you are drug screening against a known structure: you could potentially use these more flawed structures of proteins you don't want to target but that may be similar, and try to reduce the drug's efficacy at binding them. Or something to that effect. These are all fun ideas that are currently being explored in that space, but we'll see where it takes us.

>>fabios+u
> proteins are molecular machines

Does that imply proteins have some dynamics that need to be predicted too? I remember seeing animations of molecular machines that appeared to be "walking" inside the body - are those proteins or more complex structures?

replies(3): >>gillea+9k >>fabios+Uk >>Alop3x+oN1

>>axg11+i2
> Then we could use those two building blocks to design new proteins (drugs) that do what we want. If we solve those two problems with very high accuracy, we can also reduce the time it takes to go from starting a drug discovery programme to approved medicine.

Drugs are usually not proteins, but instead small molecules that are designed to help or interfere with the operation of proteins instead.

replies(1): >>axg11+Aq

>>lamena+Y7
Most of them are not; they are just estimations based on previous results from sequences with known structure.

Every couple of years there is a massive competition called CASP where labs submit previously unresolved protein structures derived from experimental EM, X-ray crystallography, or NMR studies and other labs attempt to predict these structures using their software. AlphaFold2 absolutely destroyed the other labs in the main contest (regular monomeric targets, predominantly globular) for structure resolution two years ago, in CASP14.

https://predictioncenter.org/casp14/zscores_final.cgi

The latest contest, CASP15, is currently underway and expected to end this year. As with all ML, the usual caveats apply to the models Google generated -- the dangers of overfitting to existing structures, artifacts based on the way the problem was modelled, etc.
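As a rough illustration of what "comparing a prediction to an experimental structure" means numerically, here is a minimal Python sketch of root-mean-square deviation (RMSD) between two sets of atom coordinates. This is a simplification and an assumption on my part, not how CASP ranks entries: CASP scoring relies on other measures such as GDT_TS, and a real comparison would first superimpose the two structures. The function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(predicted, experimental):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the two structures are already superimposed and that the i-th
    point in each list refers to the same atom.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("both structures need the same, non-zero number of atoms")
    total = 0.0
    for (px, py, pz), (ex, ey, ez) in zip(predicted, experimental):
        # Squared distance between the predicted and experimental position.
        total += (px - ex) ** 2 + (py - ey) ** 2 + (pz - ez) ** 2
    return math.sqrt(total / len(predicted))


# Toy coordinates (made up): the single predicted atom is 5 units off.
print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # 5.0
```

A lower value means the predicted atoms sit closer to where the experiment found them.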

>>dalbas+(OP)
There are a million articles and podcasts explaining exactly your question. Those will be better than HN responses. I suggest you take 15 seconds to Google it.

replies(1): >>dev_tt+Ip

>>jebark+Ah
Yes, very much so. Even proteins that seem like they are just scaffolding for a catalytic centre can have important dynamics.

A classic example is haemoglobin, that 'just' binds to oxygen at the iron in the middle of the haem. Other binding sites remote from the oxygen binding one can bind to other molecules - notably carbon dioxide. The 'Bohr effect' mechanism is outlined here : https://en.wikipedia.org/wiki/Bohr_effect#Allosteric_interac...

Even at the lowest level, there is some evidence that ligand binding can affect the structure of the backbone of the protein. For example, peptide plane flipping https://en.wikipedia.org/wiki/Peptide_plane_flipping although I'm not sure where the research is on this nowadays.

replies(1): >>jebark+rm

>>dalbas+(OP)
My understanding is that protein folding is a major cost bottleneck in drug design.

Researchers can come up with candidate molecule formulas that might work as good drugs, but the problem is that these proteins organize/fold themselves physically in a hard-to-predict way. And how they fold directly affects their properties as drugs.

If AlphaFold can accurately predict folding, it’ll allow researchers to prioritize drug candidates more accurately, which will reduce research time and costs. Supposedly the major pharmaceutical companies can spend up to billions when designing a single drug. Optimistically, predicting protein folding better will allow for much more rapid and cheaper drug development.

replies(2): >>sanxiy+uk1 >>ray__+DQ1

>>jebark+Ah
Yes proteins can be dynamic, and multiple proteins can come together to form bigger structures (like a cell's membrane for example), and they can very much interact with other proteins and molecules in complex ways.

I think AlphaFold gives us more of a static picture that doesn't tell us much about how the protein will interact with the world. Predicting these dynamic aspects, protein-to-protein interactions, and ultimately an entire cell, is probably what they'd like to do next.

I don't know if the walking buddy is a single protein, it may be a handful of different ones joined together.

>>gillea+9k
Fascinating. I'm a mathematician turned AI researcher by day. The more I learn about biology the more I feel that our human engineering (hard and soft) is trivially simple compared to what evolution has already created and left for us to study.

replies(2): >>gillea+9u >>Balgai+vh2

>>dalbas+(OP)
Proteins are the molecular machines of all living beings on this planet. They do almost everything. We need to understand their structure and how that structure changes when they perform some function. Essentially we have to reverse engineer nature's machines.

Biology has traditionally studied these things from the outside in, top down. Advances in biology are correlated with advances in microscope technology. We went from seeing with the naked eye to being able to magnify tissue to see cells and even what's inside of them.

Studying biology from the inside out, bottom up is significantly harder. You have chemistry, which is hard science. On top of that you have biochemistry, which is equally rigorous. Above that level of abstraction things start to get a lot less well defined. We go from organic chemistry (carbon, hydrogen, oxygen, etc.) to more complex things found in life (amino acids, carbohydrates, lipids) to polymers of those things (proteins, polysaccharides, triglycerides), to a point where they get so big and complex that their three-dimensional structure starts influencing how they interact with other molecules. Then we start seeing aggregates (quaternary proteins) of these complex things!

Important proteins like ATP synthase are well understood (it's a friggin' turbine like in a hydroelectric power plant!) but there are so many others that haven't received the same attention. Bioinformatics technology is changing that. The human genome's been mapped. Now it's time to map the proteome.

>>green-+3j
Yes there are a million articles. That is why asking a question here on HN is useful. The HN community more often than not offers intelligent insight as well as curated recommended links for learning more about a topic. Yes, the signal-to-noise ratio isn't perfect on HN, but it is a lot better than random Google searches. If nothing else, it often leads to enough basic understanding so that someone can then perform more refined and therefore more productive Google searches. I appreciate the basic questions and the generous nature of many HN commenters who offer thoughtful responses.

>>jcranm+Wh
That is only true because of our current tools and capabilities. With improved manufacturing techniques and AlphaFold++ I think biologics will dominate. Even still, there are ~2000 approved biologics [0].

[0] - https://purplebooksearch.fda.gov/advanced-search

replies(1): >>bckr+0v

>>jebark+rm
Heh, I'm somewhat the opposite, as I went from a degree in biochemistry into bioinformatics (protein folds and structure) and have an amateur interest in maths.

I often recommend the book 'Cats' Paws and Catapults' by Steven Vogel, which discusses the differences between human and biological engineering. There are very different constraints, as well as the more obvious difference between intelligently directed design (by humans) and evolutionary design.

It's not totally true that our engineering is simpler than biological systems. After all, we can now manipulate matter down to the atomic level. There have been attempts to move towards self-assembling systems, or even self-repairing ones. Not quite on the same scale or sophistication as biological systems, of course.

Is a modern computer chip more complex than a virus? Is a large connected system of computers more complex than a mycelial network? Are there more parts connected in more ways in a nuclear power station than there are in a simple bacterial cell?

I think it is definitely worth considering the similarities and differences.

replies(1): >>jebark+pQ

>>lamena+Y7
For a lot of X-ray crystallography cases, some of the difficulty is working out the actual structure from the collected data with no prior information. This makes a lot of that... much easier, because with https://en.wikipedia.org/wiki/Molecular_replacement something that is "close, but not correct" can be used to bootstrap the actual structure.

>>axg11+Aq
Yep, proteins are so much more flexible / precise than small molecules. Also we can get the body to produce them. Think mRNA vaccines.

>>gillea+9u
Definitely interesting comparisons.

I think my sense that biology is more complex than human engineering comes from our engineering seeming much more homogeneous. Computer chips are certainly very dense, but the number of different component types is very small. Biology on the other hand seems to have a huge number of fundamental building blocks, e.g. proteins, with different behaviors. I suppose that maybe that's the wrong comparison though and software is where our engineering diversity is present.

It may well just be the case that my lack of familiarity with biology makes it seem more complex to me.

replies(2): >>ncmncm+Y31 >>gillea+Gi1

>>jebark+pQ
One person can, at least in principle, understand every last thing in a nuclear reactor. A few elements have roles at different levels of abstraction. And of course they have computers in them that invariably do stuff radically simpler than they could be doing.

But nobody will ever understand everything about a natural cell. Levels are an ad hoc phenomenon adhered to in varying degrees in certain places to contain variation. But few elements have only one role at one identifiable level, and you can never be sure one doesn't have another. And huge amounts of apparatus might radically change behavior in a subtly different environment.

>>jebark+pQ
You're right in focusing on the diversity of the parts. After all, a large pile of sand has a lot of parts, but they are all (to a decent approximation) the same. Therefore (?) they have the same small set of interactions, and the whole system has a 'small' set of states. (However, see 'self-organized criticality').

Software is where the diversity is, for computers. A cell's systems are a set of interlocking networks of DNA, RNA, protein, small molecules, and signalling. As you say, these diverse parts then interact in diverse ways.

Another aspect is the large number of 'levels' in biological systems - humans have around seven: atoms, molecules, organelles, cells, tissues, organs, whole system. The interactions within and between these levels are a further source of complexity.

>>turkey+dk
My understanding is that protein folding is not a bottleneck in drug design.

Yes, once you have identified a target protein, its structure is useful to selectively target it. But the main bottleneck is identifying such targets. In other words, the main difficulty is to figure out what to hit, not how to hit it, and protein folding mostly helps with how at the moment.

>>jebark+Ah
Reference: https://en.wikipedia.org/wiki/Kinesin , https://en.wikipedia.org/wiki/Dynein , https://en.wikipedia.org/wiki/Myosin

They are called motor proteins because they convert chemical energy into kinetic energy. In the case of kinesin, it forms a dimer (two copies of itself bind together to form the two "legs") and also binds to light chains (accessory proteins that modulate its behavior) so that it can walk along filaments and drag cargo around your cells. They are both proteins and more complex structures because multiple proteins are interacting, as well as binding small molecules and catalyzing them into chemical products, all to produce the motion.

>>turkey+dk
I love AlphaFold, but this is a big misconception. The biggest cost bottleneck in drug development and design, by orders of magnitude, is associated with assaying (and potentially reducing) off-target binding or toxicity and assaying (and potentially increasing) efficacy. Determining a protein structure empirically with cryoEM, NMR, or crystallography will generally cost less than $1M (sometimes far less), which is tiny compared to the many millions or billions of dollars that get poured into clinical trials for a single drug. AF2 is useful in some basic research cases but isn't really that useful for traditional drug design and development.

A machine learning approach for predicting toxicity would have a far greater impact on public health than AF2 does.

>>jebark+rm
My career path took me from the very hard sciences to the biological ones. Though bio is experimentally trivial compared to the complexity of astrophysics, bio theory is millennia harder.

One of my favorite quotes about biology goes something like:

"An alien spaceship lands in your yard. You are kindly invited in to study and tour the ship. The technology is clearly billions of years more advanced than our own. Not decades more, not millennia more, not megayears more, but clearly gigayears more.

This is biology"

Sorry, can't find the exact quote and citation right now!

replies(1): >>flobos+Tj2

>>Balgai+vh2
Bert Hubert, “Our amazing immune system”: https://berthub.eu/articles/posts/immune-system/

replies(1): >>Balgai+8p2

>>flobos+Tj2
Thank you!