---------------
Yes, the idea of a 'protein universe' seems like it should at least encompass 'fold space'.
For example, WR Taylor : https://pubmed.ncbi.nlm.nih.gov/11948354/
I think the rough estimate was that there were around 1000 folds - depending on how fine-grained you want to go.
Absolutely agree, though, that a lot of proteins are hard to crystalise (i understand) due to being trans-membrane or just the difficulty of getting the right parameters for the experiment.
For example for transmembrane proteins, there is a gross under-representation of structures derived from experimental evidence, so we would expect that whatever your algorithm is "solving" is going to have a much higher degree of error than globular proteins, and likely artifacts associated with learning from much more abundant globular proteins.
edit: As an example, "Sampling the conformational landscapes of transporters and receptors with AlphaFold2". AF2 was able to reproduce the alternative conformations of GPCRs, but only with non-default settings. With default settings there is clear evidence of overfitting.
> Overall, these results demonstrate that highly accurate models adopting both conformations of all eight protein targets could be predicted with AF2 by using MSAs that are far shallower than the default. However, because the optimal MSA depth and choice of templates varied for each protein, they also argue against a one-size-fits-all approach for conformational sampling.
At a guess, the core packing in non-globular proteins might be different? Also the distribution of secondary structure might also vary between classes. Might be worth someone studying how much structural constraints depend on fold (if they have not already).