Protein Folding Prediction: Secondary Structure

Presented By Vincent

Table of Contents

Protein Folding Prediction: Secondary Structure

Introduction

Steps Involved in Prediction

Prediction in 1D involves

Prediction in 2D involves

Prediction in 3D involves

Prediction in 1D

Secondary structure prediction

First generation SSP

Second generation SSP

Third generation SSP

Prediction of solvent accessibility

Prediction of Trans-membrane helices

Prediction in 2D

Prediction of inter-residue and strand contacts

Prediction in 3D

Summary


Protein Folding Prediction: Secondary Structure

Introduction

Proteins are the building blocks of life. They exhibit greater sequence and chemical complexity than DNA or RNA. A protein sequence is a linear heteropolymer built from the 20 standard amino acids. Proteins perform a wide variety of functions in living organisms, including the catalytic, structural, regulatory, differentiation, replication and signaling roles required for cellular development. The key to this functional diversity is not the linear sequence of a protein alone but its three-dimensional structure.

The 3D structure of a protein can be studied either experimentally (for example, by X-ray crystallography or NMR spectroscopy) or by computational structure prediction.

Protein structure prediction is harder than it sounds. Several factors make it a difficult task:

• A protein can follow several folding pathways to reach its native state.

• The physical basis of protein structural stability is not fully understood.

• The primary sequence may not fully specify the tertiary structure.

In addition, proteins called chaperones help other proteins fold in specific ways.

Steps Involved in Prediction

Although there are many methods and algorithms to predict the structure, the general steps involved can be summarized as follows:

Prediction in 1D involves

• Prediction of secondary structure (SSP)

• Prediction of solvent accessibility

• Prediction of trans-membrane helices

Prediction in 2D involves

• Prediction of the inter-residue and strand contacts

Prediction in 3D involves

• Searching a database to find a suitable template for modeling

Prediction in 1D

Secondary structure prediction

Protein secondary structure is commonly described by three states: α-helix, β-sheet (strand), and loop or turn. SSP involves predicting the secondary structure state of each amino acid residue. The most widely used accuracy index for SSP is the three-state accuracy, Q3, which gives the percentage of residues whose state is predicted correctly:

$$Q_3 = \frac{P_{\alpha} + P_{\beta} + P_{\text{loop}}}{T} \times 100$$

where T is the total number of residues, Pα is the number of residues correctly predicted to be in an α-helix, Pβ is the number correctly predicted to be in a β-sheet, and Ploop is the number correctly predicted to be in loops or turns.
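The three-state accuracy formula above can be computed directly from a predicted and an observed state string. Below is a minimal sketch. It assumes a one-letter encoding (H for helix, E for strand/sheet, L for loop/turn) that is not part of the original text.

```python
def q3_accuracy(predicted, observed):
    """Three-state accuracy Q3, in percent.

    Both arguments are equal-length strings over H (helix), E (strand/sheet)
    and L (loop/turn); this encoding is an assumption of the sketch.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty sequence")
    # P_alpha + P_beta + P_loop is simply the number of matching positions.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def per_state_correct(predicted, observed):
    """Return the individual terms P_alpha, P_beta and P_loop as a dict."""
    counts = {"H": 0, "E": 0, "L": 0}
    for p, o in zip(predicted, observed):
        if p == o:
            counts[o] += 1
    return counts
```

For example, `q3_accuracy("HHHEELLL", "HHLEELLL")` returns 87.5, because 7 of the 8 residues are predicted correctly.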

Beyond per-residue accuracy, prediction quality is also assessed at the segment level: by the number of secondary structure segments in a protein, the average segment length, and the distribution of segment lengths.
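The segment-level measures in the paragraph above can be read off a state string by grouping consecutive identical states. This is a minimal sketch. It assumes the H/E/L one-letter encoding and, as a hypothetical choice, counts only helix and strand runs as segments.

```python
from collections import Counter
from itertools import groupby


def segments(ss, states="HE"):
    """Return (state, length) for each run of a state listed in `states`."""
    return [(state, len(list(run))) for state, run in groupby(ss) if state in states]


def segment_summary(ss, states="HE"):
    """Number of segments, mean segment length, and length distribution."""
    segs = segments(ss, states)
    lengths = [length for _, length in segs]
    return {
        "count": len(segs),
        "mean_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "length_distribution": Counter(lengths),  # length -> number of segments
    }
```

Comparing the summaries of a prediction and of the observed structure shows fragmentation that per-residue accuracy alone can hide.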

First generation SSP

Most methods in this generation were based on single-residue statistics. In the Chou-Fasman method, developed in 1974, the amino acids were classified according to their propensity to form or break a given secondary structure. Chou and Fasman identified an α-helix by locating a cluster of four helix-forming residues within six consecutive residues. This nucleus was then extended in both directions until it was terminated by a tetrapeptide with an average α-propensity of less than 1.

For β-sheets, they looked for a cluster of three β-forming residues out of five and then extended it in both directions. Turns were predicted over a window of four residues in two steps. First, the overall turn score had to be significantly greater than that for helix. Second, a position-specific score was applied to each of the four residues in the reverse turn.
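The helix nucleation and extension rule above can be sketched as follows. This is a simplified illustration, not a full Chou-Fasman implementation. It assumes that a residue counts as a helix former when its propensity exceeds 1.0, and it omits the sheet and turn rules and the boundary refinements. The propensity table holds approximate helix propensities for illustration only; check them against the published table before relying on them.

```python
from statistics import mean

# Approximate Chou-Fasman helix propensities (P_alpha); illustrative only.
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
    "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
    "R": 0.98, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
    "P": 0.57, "G": 0.57,
}


def predict_helices(seq, p=P_ALPHA, window=6, min_formers=4, former_cutoff=1.0):
    """Return a per-residue True/False mask of predicted helix positions."""
    scores = [p[aa] for aa in seq]
    n = len(scores)
    helix = [False] * n
    for i in range(n - window + 1):
        # Nucleation: at least `min_formers` helix formers among `window` residues.
        if sum(s > former_cutoff for s in scores[i:i + window]) < min_formers:
            continue
        start, end = i, i + window  # end is exclusive
        # Extend right until the next tetrapeptide averages below 1.0.
        while end < n and mean(scores[end:end + 4]) >= 1.0:
            end += 1
        # Extend left until the preceding tetrapeptide averages below 1.0.
        while start > 0 and mean(scores[max(0, start - 4):start]) >= 1.0:
            start -= 1
        for j in range(start, end):
            helix[j] = True
    return helix
```

On `"PPPPPPAELKMAPPPPPP"` the predicted helix covers positions 4 to 13. That is two residues beyond the AELKMA core on each side, because a six-residue nucleus may contain up to two non-formers. This is one reason the full method adds boundary rules.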

The GOR algorithm (Garnier, Osguthorpe, and Robson) was developed in 1978 to improve upon the Chou-Fasman method. The GOR method took into account not only the relative occurrence of each residue in a particular structural element but also the accuracy of the data. The method first analyzed proteins of known structure to derive these statistics. It then considered the effect that a residue has on the secondary structure of another residue n positions away from it, within a window of eight residues on either side. This gave the likelihood that a residue, given its neighbors, is in a particular secondary structure.
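The window-based scoring behind GOR can be sketched as follows. For each candidate state, the sketch sums the contributions of every residue within eight positions of the residue being predicted and then picks the highest-scoring state. The information table below is a tiny hypothetical example. A real GOR table is derived from proteins of known structure and has an entry for every state, offset and amino acid. The H/E/L state letters are also an assumption.

```python
def gor_predict(seq, info, half_window=8):
    """Predict one state per residue by summing windowed information values.

    `info` maps each state to a dict keyed by (offset, amino_acid);
    missing keys contribute 0.
    """
    states = list(info)
    prediction = []
    for i in range(len(seq)):
        best_state, best_score = None, float("-inf")
        for state in states:
            score = 0.0
            # Contributions from residues at offsets -half_window..+half_window.
            for m in range(-half_window, half_window + 1):
                j = i + m
                if 0 <= j < len(seq):
                    score += info[state].get((m, seq[j]), 0.0)
            if score > best_score:  # ties keep the earlier state
                best_state, best_score = state, score
        prediction.append(best_state)
    return "".join(prediction)


# Hypothetical table: A favors helix (also at neighboring positions),
# V favors strand, G favors loop.
TOY_INFO = {
    "H": {(0, "A"): 1.0, (-1, "A"): 0.5, (1, "A"): 0.5},
    "E": {(0, "V"): 1.0},
    "L": {(0, "G"): 1.0},
}
```

With this table, `gor_predict("AAVGG", TOY_INFO)` returns `"HHELL"`.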

Second generation SSP

These methods exploited the relationship between sequence and structure. They modeled it with algorithms based on statistical information, physicochemical properties, sequence patterns, multilayer neural networks, graph theory, multivariate statistics and nearest-neighbor methods. The neural-network algorithm of Qian and Sejnowski predicted α-helix and β-sheet structure for 15 test proteins.
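Window-based neural networks such as that of Qian and Sejnowski see each residue through a fixed window of its neighbors, with each position encoded as a one-hot vector. Below is a minimal sketch of that input encoding only; the network itself is omitted. It assumes a 13-residue window (the size commonly reported for that network) and a padding symbol `-` for positions beyond the chain ends.

```python
SYMBOLS = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus '-' for padding


def encode_windows(seq, window=13):
    """One input vector per residue: one-hot codes for the window centered on it."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a center residue")
    half = window // 2
    padded = "-" * half + seq + "-" * half
    vectors = []
    for i in range(len(seq)):
        vec = []
        # padded[i + half] is seq[i], so this slice is centered on residue i.
        for aa in padded[i:i + window]:
            one_hot = [0] * len(SYMBOLS)
            one_hot[SYMBOLS.index(aa)] = 1
            vec.extend(one_hot)
        vectors.append(vec)
    return vectors
```

Each vector has 13 × 21 = 273 entries. A classifier trained on these vectors outputs one secondary structure state per residue.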

First-generation methods gave an accuracy of 50-60%, and second-generation methods improved this to about 70%, but they still had drawbacks. The observed secondary structure differs even between crystals of the same protein, which limits the attainable accuracy. Moreover, long-range interactions, which window-based methods cannot capture, play a role in secondary structure formation.

Third generation SSP

Third-generation SSP is superior in accuracy (about 76%) and addresses the drawbacks of the previous two generations. The approach developed by Rost and Sander in 1993 is composed of several cascaded neural networks. Aligned homologous sequences of known structure are used to “train” the network, which can then predict the secondary structure of the aligned sequences of an unknown protein. The homologous sequences are found by BLAST and aligned using MaxHom.
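Third-generation methods feed the network evolutionary information from the alignment instead of a single sequence. Below is a minimal sketch of one common way to do this, a per-column amino-acid frequency profile. It assumes the alignment is a list of equal-length strings with `-` for gaps. The exact input encoding used by Rost and Sander is not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def alignment_profile(alignment):
    """Per-column amino-acid frequencies; gap characters ('-') are ignored."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for k in range(len(alignment[0])):
        column = [seq[k] for seq in alignment if seq[k] != "-"]
        total = len(column)
        # A column made only of gaps gets an all-zero row.
        profile.append([column.count(aa) / total if total else 0.0 for aa in AMINO_ACIDS])
    return profile
```

Each row of the profile can replace the one-hot code of a single residue in a window-based network input.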

The cascade is also trained not to predict unreasonably short segments of secondary structure. A further step averages the outputs of independently trained networks. Some of the best secondary structure prediction programs are PHD (about 72% accuracy), Jpred (about 73-75%), PREDATOR (about 75%) and SAM-T99 (about 74%).
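The averaging step and the suppression of short segments can be sketched as post-processing on per-residue probabilities. This is a simplification built on assumptions. PHD uses a second-level network rather than the rule-based length filter shown here, and the minimum lengths (3 for helices, 2 for strands) and the H/E/L encoding are hypothetical.

```python
STATES = "HEL"  # helix, strand, loop (assumed encoding)


def average_outputs(outputs):
    """outputs[n][i] is network n's [pH, pE, pL] for residue i; returns the mean."""
    n_nets = len(outputs)
    n_res = len(outputs[0])
    return [
        [sum(net[i][k] for net in outputs) / n_nets for k in range(len(STATES))]
        for i in range(n_res)
    ]


def to_states(probs):
    """Pick the most probable state per residue (ties go to the earlier state)."""
    return "".join(STATES[max(range(len(STATES)), key=lambda k: p[k])] for p in probs)


def remove_short_segments(ss, min_len=None, fill="L"):
    """Replace helix/strand runs shorter than min_len[state] with `fill`."""
    if min_len is None:
        min_len = {"H": 3, "E": 2}  # hypothetical minimum lengths
    out = list(ss)
    i = 0
    while i < len(ss):
        j = i
        while j < len(ss) and ss[j] == ss[i]:
            j += 1
        if ss[i] in min_len and j - i < min_len[ss[i]]:
            out[i:j] = fill * (j - i)
        i = j
    return "".join(out)
```

For example, `remove_short_segments("LLHHLLHHHHLEL")` keeps the four-residue helix but turns the two-residue helix and the one-residue strand into loop, giving `"LLLLLLHHHHLLL"`.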

One difficulty in predicting secondary structure with high accuracy is the presence of non-local contacts in protein folding: amino acids that are far apart in the primary sequence may be close to each other in the folded 3D structure. A Bayesian network based on a parameterization of the sequence-structure relationship in terms of structural segments can also be used to predict secondary structure.

Prediction of solvent accessibility

Non-polar amino acids usually tend to be buried inside the protein, whereas polar amino acids are in contact with the solvent. Although residue solvent accessibility is less well conserved within a structural family than secondary structure, prediction can be improved by including evolutionary information. Neural-network prediction of accessibility has been shown to be superior to simple hydrophobicity analyses. Predicted solvent accessibility has also been used successfully in prediction-based threading.

The average accuracy of solvent accessibility prediction is around 70-75%.
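The first observation in this subsection suggests the simplest possible baseline, the kind of crude hydrophobicity-style rule that neural-network predictors outperform. Below is a minimal sketch. It assumes a two-state (buried/exposed) classification and a hypothetical grouping of non-polar residues; textbooks differ on where residues such as G, C, P and W belong.

```python
# Hypothetical grouping of non-polar residues; groupings vary between sources.
NONPOLAR = set("AVLIMFWC")


def predict_accessibility(seq):
    """Two-state baseline: 'b' (buried) for non-polar residues, else 'e' (exposed)."""
    return "".join("b" if aa in NONPOLAR else "e" for aa in seq)


def two_state_accuracy(predicted, observed):
    """Percentage of residues whose buried/exposed state matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty strings")
    return 100.0 * sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

In practice the observed states would come from thresholding the relative accessibility computed on a known structure. The threshold is a further assumption not covered here.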


Prediction of Trans-membrane helices

There are two main classes of membrane proteins:

• Helical proteins, whose transmembrane helices of 17 to 27 residues span the membrane
• β-barrel proteins, whose barrel of β-strands (16 in porins, for example) forms a pore through the membrane
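Transmembrane helices are long hydrophobic stretches, so a classic first pass for the helical class is a hydropathy scan. Below is a minimal sketch using the Kyte-Doolittle hydropathy scale with a 19-residue window and an average-hydropathy threshold of 1.6, the values Kyte and Doolittle suggested for membrane-spanning segments. The merging of overlapping windows into segments is a choice made for this sketch. Because it is a window method, the reported segments extend past the true helix ends.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def tm_candidates(seq, window=19, threshold=1.6):
    """Return (start, end) segments, end exclusive, whose windows exceed threshold."""
    segments = []
    for i in range(len(seq) - window + 1):
        avg = sum(KD[aa] for aa in seq[i:i + window]) / window
        if avg > threshold:
            if segments and i < segments[-1][1]:
                segments[-1][1] = i + window  # overlapping window: extend segment
            else:
                segments.append([i, i + window])
    return [tuple(s) for s in segments]
```

For a 40-residue test chain of 10 lysines, 20 leucines and 10 lysines, the scan reports one segment, `(5, 35)`, which is wider than the leucine stretch at positions 10 to 29.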


Prediction in 2D

Prediction of inter-residue and strand contacts

NMR spectroscopy produces experimental data on distances between protons. From these distances, the 3D structure can be reconstructed using distance geometry or molecular dynamics. Hence, if the secondary structure can be predicted successfully, some fraction of the contacts is known: those within helices and strands, which can be assigned from their hydrogen-bonding pattern. In principle, the 3D structure could then be determined by distance geometry. However, the contacts implied by secondary structure are short-range. For distance geometry to work, contacts between residues far apart in sequence must also be known. One method to predict such long-range inter-residue contacts is the analysis of correlated mutations.
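Correlated-mutation analysis rests on the idea that residues in contact tend to change together across homologous sequences. Below is a minimal sketch that scores pairs of alignment columns by mutual information and ranks them. It assumes the alignment is a list of equal-length strings. The minimum sequence separation of 6 is a hypothetical choice to skip trivially close pairs. Raw mutual information is only the simplest such score; practical contact predictors apply further corrections.

```python
from collections import Counter
from math import log


def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_i)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def rank_pairs(alignment, min_separation=6):
    """Score column pairs at least min_separation apart, highest MI first."""
    length = len(alignment[0])
    cols = [[seq[k] for seq in alignment] for k in range(length)]
    scores = []
    for i in range(length):
        for j in range(i + min_separation, length):
            scores.append((mutual_information(cols[i], cols[j]), i, j))
    return sorted(scores, reverse=True)
```

In the alignment `["ACDEFGHL", "ACDEFGHL", "GCDEFGHV", "GCDEFGHV"]`, columns 0 and 7 change together (A with L, G with V), so that pair is ranked first.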


Prediction in 3D

The tertiary structure of a protein involves the folding of its secondary structural elements. The physical properties that determine the fold are backbone rigidity and the interactions between amino acids. These include electrostatic interactions, van der Waals interactions, hydrogen and disulphide bonds, and interactions with water.

There are three main methods for protein structure prediction:

1. Homology modeling
2. Fold recognition, or threading
3. Ab initio prediction

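Homology modeling and fold recognition both involve searching a database for a homologue or a structurally compatible template for the target protein. Ab initio methods instead attempt to predict the structure without a template.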
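Once a database search has returned candidate templates with pairwise alignments to the target, a simple first step is to rank them by sequence identity. Below is a minimal sketch. The input format, a dict mapping a hypothetical template name to a pair of aligned strings with `-` for gaps, is an assumption. Identity here is computed over aligned, ungapped columns, which is only one of several definitions in use.

```python
def percent_identity(target_aln, template_aln):
    """Identity over columns where neither aligned sequence has a gap."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned strings must have equal length")
    pairs = [(a, b) for a, b in zip(target_aln, template_aln) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def rank_templates(alignments):
    """alignments: {template_name: (target_aln, template_aln)} -> names, best first."""
    return sorted(
        alignments,
        key=lambda name: percent_identity(*alignments[name]),
        reverse=True,
    )
```

For example, `percent_identity("AC-DE", "ACGDF")` is 75.0: the gapped column is skipped, and three of the remaining four columns match.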

Summary

Proteins are formed by the polymerization of amino acids into a primary structure, which folds locally into secondary structures such as helices and sheets. Protein secondary structure prediction remains an important step toward full tertiary structure prediction in computational biology. Predicting the structure of a protein is a difficult task, and different approaches take into account different chemical and physical properties. This has given rise to a number of tools and techniques, some of which are specialized for particular aspects of prediction or particular categories of proteins. Nevertheless, none of them is yet accurate or reliable enough to predict the structure of all kinds of proteins.
