Proteins are long chains of amino acids that fold into complex shapes. Those shapes allow them to do their jobs, from giving cells structural support to binding a specific target molecule like a key fitting into a lock.
Understanding how proteins fold is essential for understanding life itself. Last year, an artificial intelligence system called AlphaFold, developed by DeepMind, a subsidiary of Google parent company Alphabet, dominated the CASP protein structure prediction competition.
Amino acid sequence
The three-dimensional structure of a protein is encoded by its amino acid sequence. The protein folding problem can therefore be framed as a search for the three-dimensional structure that best fits the sequence. Structural biologists call this task protein structure prediction; threading, which fits a sequence onto already known folds, is one approach to it. The problem has been a central focus of computational biology since the late 1960s.
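The idea of folding as a search over structures can be illustrated with a toy model. The sketch below is not how real structure prediction works. It assumes a simplified two-dimensional lattice model in which each residue is either hydrophobic (H) or polar (P), and the "best" structure is the self-avoiding chain with the most H-H contacts. The function names and the scoring rule of -1 per contact are illustrative assumptions, and the exhaustive search is only feasible for very short chains.

```python
from itertools import product

# Unit moves on a 2D square lattice: right, up, left, down.
MOVES = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def walk(moves):
    """Convert a sequence of moves into lattice coordinates.

    Returns None if the chain would visit the same site twice.
    """
    pos = (0, 0)
    coords = [pos]
    seen = {pos}
    for dx, dy in moves:
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in seen:
            return None
        seen.add(pos)
        coords.append(pos)
    return coords


def energy(sequence, coords):
    """Score -1 for each pair of H residues that touch on the lattice
    without being direct neighbours along the chain (toy assumption)."""
    index = {c: i for i, c in enumerate(coords)}
    total = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips bonded neighbours.
            if j is not None and j > i + 1 and sequence[j] == "H":
                total -= 1
    return total


def best_fold(sequence):
    """Try every self-avoiding chain and return (lowest energy, coordinates)."""
    best_e, best_coords = None, None
    for moves in product(MOVES, repeat=len(sequence) - 1):
        coords = walk(moves)
        if coords is None:
            continue
        e = energy(sequence, coords)
        if best_e is None or e < best_e:
            best_e, best_coords = e, coords
    return best_e, best_coords


if __name__ == "__main__":
    e, coords = best_fold("HHPPHH")
    print(e)  # -2: the chain bends into a hairpin so both H pairs touch
    print(coords)
```

The number of candidate chains grows roughly fourfold with every added residue, which is why brute-force search stops being practical almost immediately and why real methods rely on better search strategies and scoring.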
An early step in protein folding is the formation of regular secondary structures, mainly alpha helices and beta sheets. These are stabilized by hydrogen bonds between the carbonyl (C=O) and amide (N-H) groups of the polypeptide backbone, the repeating chain of atoms that links one amino acid to the next. Then comes tertiary structure, the packing of helices and sheets into a compact three-dimensional fold. Finally, quaternary structure describes how several folded chains assemble into a larger complex.
It is thought that the amino acid sequence of a protein is optimized by evolution to permit fast and reliable folding. Interactions that slow down the folding process are selected against, while those that speed it up are promoted. This gives rise to globally “funneled energy landscapes” that are largely directed towards the native folded state of the protein.
The exact mechanisms by which a protein folds are not fully understood. However, a number of observations have been made. For example, a series of pulse-labeling experiments with the protein cytochrome c have shown that specific regions of the chain fold within milliseconds. This suggests that the chain can fold on its own, although in the cell many proteins are also assisted by molecular chaperones.
Hydrophobic and hydrophilic amino acids
The side chains of amino acids can be either hydrophobic (non-polar) or hydrophilic (polar or charged). These side-chain characteristics influence protein folding, because they determine whether a residue prefers to interact with other amino acids or with water. The backbone amino and carboxyl groups are generally tied up in peptide bonds and secondary-structure hydrogen bonds, so they are not considered when characterizing an individual amino acid. The characteristics of a side chain include its polarity, aromaticity, and sulphur content.
Hydrophobic interactions are a dominant contribution to protein stability. They arise because non-polar side chains are driven to cluster together, away from the highly polar water molecules surrounding the protein. Each such interaction is individually weak, but together they contribute strongly. The hydrophobicity of an amino acid can be estimated by measuring how it partitions between water and an organic solvent, or by analysing the residues using a protein-solvent interaction map. Tryptophan, for example, has a large area of non-polar surface and is very hydrophobic.
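One common way to put numbers on hydrophobicity is a hydropathy scale that assigns each amino acid a score. The sketch below uses the Kyte-Doolittle scale, which is a choice made for this example and not something the discussion above prescribes. The function names and the window size are illustrative assumptions. Scales disagree on some residues: Kyte-Doolittle gives tryptophan a slightly negative score, while partitioning-based scales rank it as strongly hydrophobic.

```python
# Kyte-Doolittle hydropathy values; positive means more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average hydropathy of a one-letter amino acid sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence.upper()) / len(sequence)


def hydropathy_profile(sequence, window=3):
    """Mean hydropathy of every consecutive window along the sequence.

    Returns len(sequence) - window + 1 values; the window size is an
    illustrative default, not a recommended setting.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        mean_hydropathy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide: a hydrophobic stretch followed by charged residues.
    peptide = "ILVAKDE"
    for start, score in enumerate(hydropathy_profile(peptide, window=3)):
        print(peptide[start:start + 3], round(score, 2))
```

Stretches with consistently high scores are candidates for the buried core of a protein, while low-scoring stretches are more likely to sit on the surface, in contact with water.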
Another important feature is the formation of hydrophobic pockets in proteins. These are regions of excluded volume where the amino acids and peptide backbone form parallel or anti-parallel beta sheets.
The emergence of these regions is essential for the overall structure of proteins. They provide rigidity, which is necessary for sharp polypeptide turns in loop structures, and they provide channels to transport substances within the protein. The formation of these hydrophobic pockets is aided by disulfide bonds, which are sulfur-sulfur covalent bonds that link nonadjacent cysteine residues in the protein.
The folding funnel
The classic principle of protein folding is that all the information a protein needs to adopt its three-dimensional structure is encoded in its amino acid sequence. However, recent work has shown that this simple view of protein folding is incomplete.
A fundamental insight is that the protein folding process can be described in terms of a funnel-shaped energy landscape. This concept replaces the old assumption that the process must take a single pathway with clearly defined chemical intermediates (1, 2).
Good folding sequences have a landscape that is rugged and has low-energy states with structurally similar configurations. Around the critical temperature, the kinetics of these sequences is exponential and very robust to reasonable environmental changes and mutations (2, 3).
The rugged landscape also allows a protein to be trapped in more than one low-energy state and to explore a range of pathways to its native state (4, 5).
The ideal asymptotic free-energy profile as a function of a reaction coordinate measuring progress down the funnel is broad, suggesting that fluctuations away from the ideal profile can significantly affect the rate at which proteins fold (6, 7). This feature explains why computer simulations that address size scaling issues, and simulations of player-designed proteins, are often difficult to interpret. The capillarity theory discussed here unifies these disparate aspects of protein folding by showing how the phenomenological funnel descriptions used in mean field and capillarity approximations are remarkably consistent with the rugged landscape picture that Thirumalai has developed for glasses (20, 21). These features suggest that the ideal asymptotic free-energy profiles will be broadly similar for all proteins.
The development of fast techniques for triggering and monitoring protein folding has greatly advanced the study of the dynamics of protein folding. These fast techniques include neutron scattering, ultrafast mixing of solutions, laser temperature jump spectroscopy and photochemical methods.
Simulating protein folding on a computer, by contrast, requires a large amount of computational resources. This has led to the growth of distributed computing projects that use idle processing time from personal computers and PlayStation 3s (see Folding@home).
Proteins fold spontaneously in a cell, often while they are still being made on a ribosome. Many are assisted by other proteins called chaperones, which help prevent chains from misfolding and aggregating into inclusion bodies.
During protein folding, many different conformational states can be formed. The energy landscape of these states is often compared to the configuration space of an unsolved puzzle. To reach the native state, the chain must find the most favourable states in this energy landscape. This search is often described using Markov state models, which treat folding as random jumps between a set of conformational states.
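A Markov state model can be sketched in a few lines. The example below assumes just three states (unfolded, intermediate and native) and a made-up table of transition probabilities; real models are built from simulation data and typically have many more states. The state names, the numbers and the function names are all hypothetical, chosen only to show how a population of molecules drifts toward the native state over time.

```python
STATES = ["unfolded", "intermediate", "native"]

# Hypothetical transition probabilities per time step.
# Row i gives the probabilities of moving from state i to each state,
# so every row sums to 1.
TRANSITIONS = [
    [0.90, 0.10, 0.00],  # unfolded mostly stays unfolded
    [0.05, 0.75, 0.20],  # intermediate can go back or fold
    [0.00, 0.01, 0.99],  # native state is very stable
]


def step(populations, transitions):
    """Advance the state populations by one time step."""
    n = len(transitions)
    return [
        sum(populations[i] * transitions[i][j] for i in range(n))
        for j in range(n)
    ]


def propagate(populations, transitions, steps):
    """Advance the state populations by the given number of time steps."""
    for _ in range(steps):
        populations = step(populations, transitions)
    return populations


if __name__ == "__main__":
    start = [1.0, 0.0, 0.0]  # every molecule begins unfolded
    for t in (0, 10, 100, 1000):
        p = propagate(start, TRANSITIONS, t)
        print(t, {name: round(x, 3) for name, x in zip(STATES, p)})
```

With these assumed numbers the populations settle at 1/43 unfolded, 2/43 intermediate and 40/43 native, so about 93 percent of the molecules end up folded regardless of where they started.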
A simple example is a helix-turn-helix motif, in which the first helix forms at one end of the segment and the turn forms near its centre. The next step is for the second helix to form and pack against the first. This process is driven by hydrophobic interactions and van der Waals forces, and it is opposed by the loss of conformational entropy as the chain becomes ordered.