Predictors of Natural Disordered Regions


   

Introduction

Disordered regions (DRs) are entire proteins, or regions of proteins, that lack a fixed tertiary structure and are essentially partially or fully unfolded. Such regions have been shown to be involved in a variety of functions, including DNA recognition, modulation of the specificity and affinity of protein binding, molecular threading, activation by cleavage, and control of protein lifetimes. Although DRs lack a defined 3-D structure in their native states, they frequently undergo disorder-to-order transitions upon binding to their partners.

Because amino acid sequence is known to determine structure, we hypothesized that sequence would also determine lack of structure. To test this, we developed a series of neural network predictors (NNPs) that use amino acid sequence data to predict disorder in a given region. This collection of Predictors of Natural Disordered Regions is termed PONDR®.

PONDR® methods

PONDR® works from primary sequence data alone. The predictors are feedforward neural networks that use sequence information from windows of typically 21 amino acids. Attributes, such as the fractional composition of particular amino acids or hydropathy, are calculated over each window, and these values are used as inputs to the predictor. The neural network, which has been trained on a specific set of ordered and disordered sequences, then outputs a value for the central amino acid of the window. The predictions are then smoothed over a sliding window of 9 amino acids. If a residue's smoothed value exceeds a threshold of 0.5 (the threshold used during training), the residue is considered disordered.
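The window, predict, smooth, and threshold steps above can be sketched in Python. This is a minimal sketch, not PONDR®'s implementation: the trained network weights are not given here, so `toy_model` is a hypothetical stand-in scoring function, and `window_attributes` computes only fractional amino acid composition as an example attribute set. Residues whose full 21-residue input window or 9-residue smoothing window would run past the sequence ends simply receive no score.

```python
from typing import Callable, Dict, List, Optional

WINDOW = 21      # input window length (from the description above)
SMOOTH = 9       # smoothing window length
THRESHOLD = 0.5  # residues scoring above this are called disordered

Scores = List[Optional[float]]


def window_attributes(window: str) -> Dict[str, float]:
    """Hypothetical attribute set: fractional composition of each residue type."""
    return {aa: window.count(aa) / len(window) for aa in set(window)}


def raw_scores(seq: str, model: Callable[[Dict[str, float]], float]) -> Scores:
    """Score the central residue of every full-length input window."""
    half = WINDOW // 2
    scores: Scores = [None] * len(seq)
    for i in range(half, len(seq) - half):
        scores[i] = model(window_attributes(seq[i - half:i + half + 1]))
    return scores


def smooth(scores: Scores, width: int = SMOOTH) -> Scores:
    """Centered moving average; None where the window is incomplete."""
    half = width // 2
    out: Scores = [None] * len(scores)
    for i in range(half, len(scores) - half):
        vals = scores[i - half:i + half + 1]
        if all(v is not None for v in vals):
            out[i] = sum(vals) / width
    return out


def call_disorder(scores: Scores, threshold: float = THRESHOLD) -> List[bool]:
    """True where the (smoothed) score exceeds the threshold."""
    return [s is not None and s > threshold for s in scores]


# Hypothetical stand-in for the trained network: NOT PONDR's model.
def toy_model(attrs: Dict[str, float]) -> float:
    return attrs.get("E", 0.0) + attrs.get("K", 0.0)


seq = "A" * 40 + "E" * 40
smoothed = smooth(raw_scores(seq, toy_model))
calls = call_disorder(smoothed)
```

With a 21-residue input window and a 9-residue smoothing window, the first and last 14 positions get no call in this sketch; the VL-XT predictor handles the termini with dedicated networks instead.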

The default PONDR® predictor is VL-XT; the XL1 and CaN predictors are also available.

The VL-XT predictor integrates three feedforward neural networks: the VL1 predictor from Romero et al. 2000 and the N- and C-terminal predictors (XT) from Li et al. 1999. VL1 was trained using 8 disordered regions identified from missing electron density in X-ray crystallographic studies and 7 disordered regions characterized by NMR. The XT predictors were also trained using X-ray crystallographic data. The attributes used by these predictors are listed in Table 1 (taken from Romero et al. 2000). Output for the VL1 predictor starts and ends 11 amino acids from the termini. The XT predictors output predictions up to 14 amino acids from their respective ends. A simple average is taken where predictions overlap, and a sliding window of 9 amino acids is used to smooth the prediction values along the length of the sequence. Unsmoothed prediction values from the XT predictors are used for the first and last 4 sequence positions.
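The VL-XT combination step can be sketched as follows, assuming the three networks' outputs are already aligned to the sequence, with `None` wherever a predictor gives no output. The sketch averages whichever predictions are available at each position, applies a centered 9-residue moving average, and then substitutes the unsmoothed XN and XC values for the first and last 4 positions, which are exactly the positions a centered 9-residue window cannot cover. It illustrates the description above and is not the production code; the function names and toy values are hypothetical.

```python
from typing import List, Optional

Scores = List[Optional[float]]


def average_tracks(*tracks: Scores) -> Scores:
    """Per-position mean of whichever predictors produced a value."""
    merged: Scores = []
    for values in zip(*tracks):
        present = [v for v in values if v is not None]
        merged.append(sum(present) / len(present) if present else None)
    return merged


def smooth(scores: Scores, width: int = 9) -> Scores:
    """Centered moving average; None where the window is incomplete."""
    half = width // 2
    out: Scores = [None] * len(scores)
    for i in range(half, len(scores) - half):
        vals = scores[i - half:i + half + 1]
        if all(v is not None for v in vals):
            out[i] = sum(vals) / width
    return out


def combine_vl_xt(vl1: Scores, xn: Scores, xc: Scores, width: int = 9) -> Scores:
    """Average overlapping outputs, smooth, then use raw XT values at the ends."""
    out = smooth(average_tracks(vl1, xn, xc), width)
    half = width // 2
    n = len(out)
    for i in range(min(half, n)):
        out[i] = xn[i]  # first 4 positions: unsmoothed N-terminal (XN) output
    for i in range(max(n - half, 0), n):
        out[i] = xc[i]  # last 4 positions: unsmoothed C-terminal (XC) output
    return out


# Toy tracks for a 40-residue sequence (made-up values, not real predictions).
n = 40
vl1 = [None] * 11 + [0.2] * (n - 22) + [None] * 11
xn = [0.8] * 15 + [None] * (n - 15)
xc = [None] * (n - 15) + [0.6] * 15
combined = combine_vl_xt(vl1, xn, xc)
```

Because the VL1 and XT coverage overlaps, every position has at least one value before smoothing, so the combined profile has no gaps.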

The XL1 predictor is a feedforward neural network optimized to predict regions of disorder 40 amino acids or longer (Romero et al., 1997). It was trained on 7 of the 8 disordered regions, identified from missing electron density, that were used to train the VL1 predictor. The attributes used by this predictor are listed in Table 1 (taken from Romero et al., 2000). This predictor uses a sliding window of 9 amino acids to smooth the prediction values along the length of the sequence, so predictions are only provided starting and ending 15 amino acids from the termini.

The CaN predictor is a feedforward neural network trained on regions of 13 calcineurin proteins that were identified by sequence homology with the known disordered region of human calcineurin (Romero et al., 1997). The attributes used by this predictor are listed in Table 1. This predictor shows poor out-of-sample accuracy, but in some cases the contrast between its output and that of other predictors provides insight into the binding regions of disordered sequences (Garner et al., 1999).

Table 1. Attributes used by the various PONDR® predictors
Predictor   Attributes
XL1         Flexibility, Hydropathy, C, W, Y, H, D, E, K, S
VL1         Coordination number, Net charge, WFY, W, Y, F, D, E, K, R
XN          Coordination number, V, VIYFW, M, N, H, D, PEVK
XC          Coordination number, Hydropathy, VIYFW, M, T, H, PEVK, R
CaN         beta-moment, V, F, W, Y, H, C, E, S, R
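As an illustration of the composition-style attributes in Table 1, the sketch below computes the VL1 attributes other than coordination number for a single window. It assumes that a single letter denotes that amino acid's fractional composition in the window, that a letter group such as WFY denotes the combined composition of its members, and that net charge is the (K + R) composition minus the (D + E) composition; the published predictors may define these differently. The Kyte-Doolittle scale used for hydropathy is likewise an assumed choice, and coordination number, flexibility, and beta-moment are not computed.

```python
from typing import Dict

# Kyte-Doolittle hydropathy scale (an assumed choice of scale, not confirmed for PONDR).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(window: str, residues: str) -> float:
    """Combined fractional composition of the listed residue types."""
    return sum(window.count(aa) for aa in residues) / len(window)


def net_charge(window: str) -> float:
    """Assumed definition: (K + R) composition minus (D + E) composition."""
    return composition(window, "KR") - composition(window, "DE")


def mean_hydropathy(window: str) -> float:
    """Average Kyte-Doolittle value over the window (hypothetical hydropathy attribute)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in window) / len(window)


def vl1_like_attributes(window: str) -> Dict[str, float]:
    """VL1 attributes from Table 1, minus coordination number."""
    attrs = {"net_charge": net_charge(window), "WFY": composition(window, "WFY")}
    for aa in "WYFDEKR":
        attrs[aa] = composition(window, aa)
    return attrs


example = vl1_like_attributes("KKRRDDEEWF")
```

In a full pipeline, a function like `vl1_like_attributes` would be applied to every 21-residue window and the resulting values fed to the network as inputs.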

Interpreting PONDR®'s Output

PONDR® outputs two forms of data, a graph and a log file.

The graph shows the residue-by-residue output of the neural network. Any region that exceeds 0.5 on the y-axis is considered disordered. Note that as the length of a predicted disordered region increases, the accuracy of the prediction also increases; extremely long predictions of disorder have a very high level of confidence. In some cases, pronounced minima within regions structurally characterized as disordered correlate with binding regions (see Garner et al., 1999).

The log file shows the sequence with a capital "D" directly beneath each residue where the output of the predictor exceeded the threshold. At the top of the log output are the residue locations of contiguous predictions of disorder, their lengths, and their average scores.
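The log-file summary can be reproduced from per-residue scores with a short sketch like the one below. It assumes 1-based inclusive residue positions and a blank under residues at or below the threshold; the exact layout of the real log file may differ, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Region = Tuple[int, int, int, float]  # (start, end, length, average score), 1-based


def disorder_track(scores: List[Optional[float]], threshold: float = 0.5) -> str:
    """'D' under each residue whose score exceeds the threshold, blank otherwise."""
    return "".join("D" if s is not None and s > threshold else " " for s in scores)


def disordered_regions(scores: List[Optional[float]], threshold: float = 0.5) -> List[Region]:
    """Contiguous runs of above-threshold residues with their length and mean score."""
    regions: List[Region] = []
    start: Optional[int] = None
    for i, s in enumerate(list(scores) + [None]):  # sentinel closes a trailing run
        above = s is not None and s > threshold
        if above and start is None:
            start = i
        elif not above and start is not None:
            run = scores[start:i]
            regions.append((start + 1, i, i - start, sum(run) / len(run)))
            start = None
    return regions


# Toy scores for a 5-residue sequence (not real predictor output).
seq = "MKTAY"
scores = [0.1, 0.6, 0.8, 0.2, 0.9]
for start, end, length, avg in disordered_regions(scores):
    print(f"{start}-{end}  length {length}  average {avg:.2f}")
print(seq)
print(disorder_track(scores))
```

Appending a sentinel value to the scores lets a disordered run that reaches the C-terminus be closed by the same branch as any other run.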

PONDR® Accuracy

Aside from the five-fold cross-validation accuracies determined during training, the various predictors were applied to different data sets to determine their accuracy on a per-residue basis. False negatives were determined by application to a data set of disordered proteins (dis_ALL). False positives were determined by application to a non-redundant database called O_PDB_S25, developed by finding all of the ordered segments of the crystal structures from PDB_SELECT_25.
Note that these accuracies are computed only on a per-residue basis. The XL1 and VL1 predictors were trained to recognize regions of disorder 40 amino acids or longer. As the length of a prediction increases, so does the confidence in the prediction.

Table 2. PONDR® accuracies
Predictor   False negative (dis_ALL)   False positive (O_PDB_S25)   5-fold cross-validation accuracy
VL-XT       40%                        22%                          75-83%
XL1         62%                        19%                          73 ± 4%
CaN         39%                        34%                          83 ± 5%
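The per-residue error rates in Table 2 correspond to simple counts. A minimal sketch, assuming per-residue disorder calls have already been pooled across all proteins in each data set: the false negative rate is the fraction of residues in the disordered set (dis_ALL) predicted to be ordered, and the false positive rate is the fraction of residues in the ordered set (O_PDB_S25) predicted to be disordered. The function names and toy calls are hypothetical.

```python
from typing import Iterable, List, Sequence


def pool(per_protein_calls: Iterable[Sequence[bool]]) -> List[bool]:
    """Concatenate per-residue calls from many proteins into one list."""
    return [call for calls in per_protein_calls for call in calls]


def false_negative_rate(calls_on_disordered: Sequence[bool]) -> float:
    """Fraction of known-disordered residues NOT called disordered."""
    return sum(1 for c in calls_on_disordered if not c) / len(calls_on_disordered)


def false_positive_rate(calls_on_ordered: Sequence[bool]) -> float:
    """Fraction of known-ordered residues called disordered."""
    return sum(1 for c in calls_on_ordered if c) / len(calls_on_ordered)


disordered_set = pool([[True, True, False], [False]])  # toy calls, not real data
ordered_set = pool([[False, False], [True, False]])
print(false_negative_rate(disordered_set))  # 0.5
print(false_positive_rate(ordered_set))     # 0.25
```

Pooling before dividing weights every residue equally rather than every protein, which matches the per-residue basis of the reported accuracies.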



PONDR® References

When using the VL-XT predictor, please cite:

Predicting protein disorder for N-, C-, and internal regions.
Li, X., P. Romero, M. Rani, A. K. Dunker, and Z. Obradovic, Genome Informatics, 1999, 10:30-40.

Sequence complexity of disordered protein.
Romero, P., Z. Obradovic, X. Li, E. Garner, C. Brown, and A. K. Dunker, Proteins: Struct. Funct. Gen., 2001, 42:38-48.

Sequence data analysis for long disordered regions prediction in the calcineurin family.
Romero, P., Z. Obradovic, and A. K. Dunker, Genome Informatics, 1997, 8:110-124.

When using the XL1 predictor, please cite the following reference:

Identifying Disordered Regions in Proteins from Amino Acid Sequences.
Romero, P., Z. Obradovic, C. R. Kissinger, J. E. Villafranca, and A. K. Dunker, Proc. IEEE International Conference on Neural Networks, 1997, 90-95.

For more reading on the development, use, and applications of PONDR®, please refer to our bibliography page.

A comprehensive review of disordered proteins was written by Peter Wright and Jane Dyson.

PONDR® development

PONDR® was developed by P. Romero, A.K. Dunker, X. Li, and Z. Obradovic.
CGI, web interface, and miscellaneous coding was done by E. Garner, C. Crosetto and J. Mueller.


 

To contact us:

Molecular Kinetics, Inc.

351 West 10th Street; Suite 318
Indianapolis, Indiana 46202
Tel: 317-638-0244
Fax: 317-638-0295
e-mail: main@molecularkinetics.com


Access to PONDR® is provided by Molecular Kinetics (6201 La Pas Trail - Ste 160, Indianapolis, IN 46268;
www.molecularkinetics.com; main@molecularkinetics.com) under license from the WSU Research Foundation.
PONDR® is copyright ©1999 by the WSU Research Foundation, all rights reserved.

Molecular Kinetics, Inc., Washington State University and the WSU Research Foundation and their several employees and consultants assume no liability, either real or implied, from the use of PONDR® in any of its forms or the results of its predictions for any damage, loss of time, loss of profit, either real or potential, or any other damage or loss that may arise from the use of PONDR® in any of its forms or from the results of its predictions.