Talk:50,000!

From PDBWiki

Some text to pick over...

The very first prediction of protein conformation, calculated by Linus Pauling and Robert Corey in 1951 (i), implied common structural elements in all proteins. This was verified in 1960, when John Kendrew solved the first high-resolution crystal structure, that of sperm-whale myoglobin (ii), confirming the presence of the right-handed α-helix. At the same time Max Perutz solved the structure of horse haemoglobin (iii), showing that despite their different amino acid composition, the subunits of haemoglobin have essentially the same tertiary structure as myoglobin. At the time Kendrew noted, “Myoglobin possesses a structure the significance of which extends beyond a particular species and even beyond a particular protein”.

It took 26 years and approximately 300 more protein structures before the relationship between sequence and structure was quantified by Cyrus Chothia and Arthur Lesk (iv). They showed that the RMS deviation between the structures of related proteins is monotonically related to their sequence similarity, suggesting that structure represents an additional layer of redundancy between sequence and function. This means that a functionally constrained protein will rarely change its core fold over the course of evolution, even though its sequence may diverge to less than 20% similarity with the original (v).
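To make the two quantities in that comparison concrete, here is a minimal Python sketch of how sequence identity and RMS deviation might be computed for one pair of proteins. It is an illustration only, not the procedure used in (iv): the function names are made up for this example, the sequences are assumed to be already aligned (with "-" for gaps), and the coordinates are assumed to be already superposed and paired residue by residue.

```python
import math


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned, non-gap positions that carry the same residue.

    Assumes both strings come from the same pairwise alignment, with '-'
    marking gaps (a convention chosen for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned positions to compare")
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def rmsd(coords_a, coords_b) -> float:
    """RMS deviation between two equal-length lists of (x, y, z) tuples.

    Assumes the two structures are already optimally superposed; this
    function does not perform the superposition itself.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Plotting `rmsd` against `percent_identity` for many pairs of related proteins would give the kind of scatter in which a monotonic trend could be looked for. In practice the superposition step (left out here) is the harder part.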

Back in 1976 only 56 structures were available in the PDB (vi), yet tertiary structure had already been broadly classified into four secondary structure classes and three different ‘folding units’ or supersecondary structures (5), speculated to be the ‘building blocks’ of tertiary structure (vii). The prevalence of common secondary and supersecondary structures in proteins is explained by the thermodynamic stability these conformations confer. Thus, for any given amino-acid sequence, only a few stable secondary and supersecondary conformations are available. For a protein to reliably assume a functional topology in a biological system, evolution can converge on these structural units as consistent ‘means to an overall functional end’.

The predominant theory used to explain the similarities seen in sequence and tertiary structure is homology. The relative simplicity of sequence information has allowed the development of simple criteria for assessing the similarity between sequences. Within specific boundaries set by these criteria it is possible to reliably estimate the probability of homology between two proteins. However, in the case of very divergent proteins, structural similarity alone cannot be used to reliably infer homology, as similar structures can be the result of convergence. A qualitative measure of structural and functional similarity can be used as an additional criterion for assessing homology in these cases. Without precise criteria for the classification of tertiary structure it is impossible to unambiguously infer homology.

As of 4 September 2001 there are 14,600 protein structures in the PDB. This number has grown at an exponential rate with an estimated doubling time of just 18 months (viii). Unsurprisingly there has been a concomitant explosion in the number of protein classification systems available (see ix, x and xi), based on a variety of different classification criteria. Although the amount of biological data is increasing exponentially, it can be assumed that the biologically relevant information it represents is asymptotically approaching a limit. Thus the aim of any classification system is to condense this information out of the data (for example see figure 1).
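As a rough illustration of what that growth rate implies, the Python sketch below reads the 18-month figure quoted above as a doubling time and treats the growth as purely exponential. Both are simplifying assumptions, and the function names are invented for this example. It is a back-of-the-envelope extrapolation, not a model of actual PDB deposition.

```python
import math


def projected_count(n0: float, months_elapsed: float,
                    doubling_months: float) -> float:
    """Number of entries after `months_elapsed`, starting from `n0`.

    Assumes the count doubles every `doubling_months` months.
    """
    return n0 * 2.0 ** (months_elapsed / doubling_months)


def months_to_reach(n0: float, target: float,
                    doubling_months: float) -> float:
    """Months needed to grow from `n0` to `target` under the same assumption."""
    if n0 <= 0 or target <= 0 or doubling_months <= 0:
        raise ValueError("all arguments must be positive")
    return doubling_months * math.log2(target / n0)


if __name__ == "__main__":
    # Figures taken from the text: 14,600 structures, 18-month doubling time.
    months = months_to_reach(14600, 50000, 18)
    print(f"Months from September 2001 to 50,000 entries: {months:.1f}")
```

Comparing this estimate with the date the archive actually reached 50,000 entries is a quick check on whether the growth rate stayed constant.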

Some context

Maybe it would be useful to explain the context of the massive bioinformatical databases, such as in the context of computational biology. For example, a few weeks ago somebody at my local high school was complaining about a disease they have, and I was frankly sick of hearing about it for the umpteenth time, so I spent the lunch period reviewing the databases and emailed the poor bloke a total, complete solution. The informatics age allows intense, serious problem solving, and I don't think this understanding has trickled down to any level yet. There are not many resources focusing on the 'meta' aspect of these databases. -- Kanzure 06:54, 11 April 2008 (KST)
