|
I. Protein Structure and Function Prediction Services
(folding, threading, potential, contact, torsion, docking etc)

Introduction: The I-TASSER server is an Internet service for protein structure and function prediction.
Models are built based on multiple-threading alignments by LOMETS and iterative
TASSER simulations. I-TASSER (as 'Zhang-Server') was ranked as the No 1 server
in the recent CASP7 and CASP8 experiments. The server is under active development, with
the goal of providing accurate structure and function predictions using
state-of-the-art algorithms.
References:
Ambrish Roy, Alper Kucukural, Yang Zhang.
I-TASSER: a unified platform for automated protein structure and function prediction.
Nature Protocols, vol 5, 725-738 (2010).
(download the PDF file).
Yang Zhang. I-TASSER server for protein 3D structure prediction.
BMC Bioinformatics, vol 9, 40 (2008).
(download the PDF file).
|

Introduction:
QUARK is a computer algorithm for ab initio protein folding and protein structure
prediction, which aims to construct the correct protein 3D model from amino acid
sequence only. QUARK models are built from small fragments (1-20 residues long)
by replica-exchange Monte Carlo simulation guided by an atomic-level
knowledge-based force field. QUARK was ranked as the No 1 server in Free-modeling
(FM) in CASP9. Since no global template information is used in the QUARK simulation,
the server is suitable for proteins that have no homologous templates.
References:
D. Xu, Y. Zhang, QUARK Ab Initio Protein Structure Prediction I: Methodology developments (in preparation)
D. Xu, Y. Zhang, QUARK Ab Initio Protein Structure Prediction II: Results of benchmark and blind tests (in preparation)
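To illustrate the replica-exchange step mentioned in the introduction, here is a minimal Python sketch of the standard Metropolis criterion for swapping two replicas held at different temperatures. It is a generic textbook form, not QUARK's own code; the function name `swap_accepted`, its parameters, and the convention that the Boltzmann constant equals 1 are assumptions made for this example.

```python
import math
import random


def swap_accepted(energy_i, energy_j, temp_i, temp_j, rng=random.random):
    """Decide whether replicas i and j exchange their conformations.

    Generic replica-exchange rule (not QUARK-specific): accept with
    probability min(1, exp((beta_i - beta_j) * (E_i - E_j))).
    Temperatures are in energy units (Boltzmann constant = 1, an assumption).
    """
    beta_i = 1.0 / temp_i
    beta_j = 1.0 / temp_j
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0.0:
        # The swap moves the lower-energy conformation to the colder replica.
        return True
    return rng() < math.exp(delta)
```

A swap that hands the lower-energy conformation to the colder replica is always accepted; the reverse swap is accepted only occasionally, which lets conformations trapped at low temperature escape through the hotter replicas.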
|

Introduction: LOMETS (Local Meta-Threading-Server) is a locally installed
meta-server for protein structure prediction. It generates 3D models by collecting
consensus target-to-template alignments from 9 locally-installed threading programs
(FUGUE, HHsearch, PAINT, PPA-I, PPA-II, PROSPECT2, SAM-T02, SPARKS, SP3).
References: S. Wu, Y. Zhang.
LOMETS: A local meta-threading-server for protein structure prediction.
Nucleic Acids Research 2007; 35: 3375-3382
(download the PDF file).
|

Introduction:
COFACTOR is an automated method for biological function annotation of protein
molecules, based on protein 3D structures. When a user provides a structure model
of the target protein, COFACTOR will match the target protein to the known
proteins (templates) in three comprehensive protein function libraries by global
and local structure comparisons. Functional insights, including ligand-binding site,
gene-ontology term, and enzyme classification, are then derived from the best
template proteins of the highest confidence score (C-score). The COFACTOR
algorithm was ranked as the best method for ligand-binding site predictions
in the community-wide CASP9 experiments.
References:
A Roy and Y Zhang,
Recognizing protein-ligand binding sites by global structural
alignment and local geometry refinement. 2011(Submitted)
|

Introduction: MUSTER (MUlti-Sources ThreadER) is a new protein threading
algorithm to identify the template structures from the PDB library. It generates
sequence-template alignments by combining sequence profile-profile alignment
with multiple sources of structural information.
References:
S. Wu, Y. Zhang.
MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information.
Proteins: Structure, Function, and Bioinformatics 2008; 72: 547-556.
(download the PDF file)
|

Introduction:
SEGMER is a segmental threading algorithm designed to recognize substructure motifs
from the Protein Data Bank (PDB) library. It first splits target sequences into segments
that consist of 2-4 consecutive or non-consecutive secondary structure elements
(alpha-helix, beta-strand). The sequence segments are then threaded through the PDB to
identify conserved substructures. It often identifies better conserved structure motifs
than the whole-chain threading methods, especially when there is no similar global fold
existing in the PDB.
References:
S. Wu, Y. Zhang.
SEGMER: identifying protein sub-structural similarity by segmental threading. Structure, vol 18, 858-867 (2010).
(download the PDF file)
|

Introduction:
FG-MD is a molecular dynamics (MD) based algorithm for high-resolution protein structure
refinement. Given an initial protein or protein complex 3D model (either in C-alpha or
full-atom), FG-MD first identifies analogous fragments from the PDB by the structural
alignment program TM-align. Spatial restraints extracted from the fragments are then
used to guide the molecular dynamics simulations. In general, FG-MD aims to refine
the initial models closer to the native structure. It also improves the local geometry
of the structures by removing steric clashes and improving the torsion angles and
the hydrogen-bonding networks.
References:
J Zhang, Y Zhang. High-resolution protein structure refinement using
fragment guided molecular dynamics simulations (2011), submitted.
|

Introduction:
ModRefiner is an algorithm for atomic-level, high-resolution protein structure refinement. It can
start from a C-alpha trace, a main-chain model, or a full-atomic model. Both side-chain and
backbone atoms are completely flexible during structure refinement simulations, where the
conformational search is guided by a composite physics- and knowledge-based force field.
ModRefiner optionally accepts a second structure, which is used as a reference
toward which the refinement simulations are driven.
One aim of ModRefiner is to draw the initial starting models closer to their native state.
It also significantly improves the physical quality of local structures.
References:
Dong Xu and Yang Zhang.
Improving Physical Realism and Structural Accuracy of Protein Models by
a Two-step Atomic-level Energy Minimization, Biophysical Journal, 2011 (in press).
|

Introduction:
REMO is a new algorithm for constructing protein atomic structures
from C-alpha traces by optimizing the backbone hydrogen-bonding networks.
References:
Yunqi Li and Yang Zhang.
REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks.
Proteins, 2009, 76: 665-676.
(download the PDF file).
|

Introduction:
SVMSEQ is a new algorithm for protein residue-residue contact prediction
using Support Vector Machines.
References:
S. Wu, Y. Zhang.
A comprehensive assessment of sequence-based and template-based methods for protein contact prediction.
Bioinformatics, vol 24, 924-931 (2008).
(download the PDF file)
|

Introduction:
ANGLOR is a machine-learning based algorithm for ab initio prediction
of protein backbone torsion angles. For a given amino acid sequence,
the real-value backbone torsion angles (phi and psi) for each residue
are predicted by a combination of neural network training and
support vector machines.
References:
S. Wu, Y. Zhang.
ANGLOR: A Composite Machine-Learning Algorithm for Protein Backbone Torsion Angle Prediction.
PLoS ONE 2008; 3: e3400.
(download the PDF file)
|

Introduction:
COTH (CO-THreader) is a multiple-chain protein threading algorithm to identify
and recombine the protein complex structures from both tertiary and complex
structure libraries. It first generates complex query-template alignments by
sequence profile-profile alignment assisted by the ab initio binding-site
predictions from BSpred. The monomer structures from the tertiary template library
are then combined into the complex framework by structure superposition.
References:
S Mukherjee, Y Zhang
Protein-protein complex structure prediction by multimeric threading and template recombination.
Structure, in press (2011).
|

Introduction:
BSpred is a neural network based algorithm for predicting the binding sites of proteins
from amino acid sequences. The algorithm was extensively trained on sequence-based
features including protein sequence profile, secondary structure prediction,
and hydrophobicity scales of amino acids.
References:
S Mukherjee, Y Zhang
Protein-protein complex structure prediction by multimeric threading and template recombination.
Structure, in press (2011).
|

Introduction:
BSP-SLIM is a blind molecular docking method for low-resolution protein structures.
The method first identifies putative ligand binding sites by structurally matching the
target to the template holo-structures. The ligand-protein docking conformation is then
constructed by local shape and chemical feature complementarities between the ligand
and the negative image of the binding pockets.
References:
H S Lee, Y Zhang. BSP-SLIM: A blind low-resolution ligand-protein docking approach using
theoretically predicted protein structures (2011) submitted.
|

Introduction:
SAXSTER is a new algorithm to combine small-angle X-ray scattering (SAXS) data and
threading for high-resolution protein structure determination. Given a query sequence,
SAXSTER first generates a list of template alignments from the PDB library using the
MUSTER threading program. The SAXS data are then used to prioritize the best template
alignments based on the SAXS profile match, and these alignments are finally used for
full-length atomic protein structure construction.
References:
M. dos Reis, R. Aparicio and Y. Zhang. Improving protein template recognition by using small angle X-ray scattering profiles. Biophysical Journal, 2011, in press.
|
|
|
II. Bioinformatics Tools
(structure alignment, sequence alignment, 3D visualization, surface, and clustering, etc)

Introduction:
TM-score is an algorithm to calculate the topological similarity of two
protein structures. It can be used to quantitatively assess the quality of
protein structure predictions relative to the native structure. Because TM-score weights
close matches more strongly than distant matches, it is more sensitive
to the global topology of structures than the often-used
root-mean-square deviation (RMSD).
References:
Y. Zhang, J. Skolnick, Scoring function for automated assessment of
protein structure template quality. Proteins, 2004 57: 702-710
(download the PDF file and Correction).
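The distance weighting described above can be sketched in a few lines of Python. The example evaluates the TM-score sum from Zhang and Skolnick (2004) for a single, already fixed superposition; the full TM-score is the maximum of this quantity over all superpositions, which the sketch does not search for. The function name `tm_score`, its inputs, and the 0.5 Angstrom floor on the distance scale for very short chains are assumptions made for this illustration.

```python
def tm_score(distances, target_length):
    """TM-score sum for one fixed superposition (no optimization).

    distances: distances in Angstroms between aligned residue pairs.
    target_length: number of residues in the target (native) structure.
    """
    if target_length > 15:
        # Length-dependent distance scale d0 from Zhang & Skolnick (2004).
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    # Floor for short chains: an assumption of this sketch.
    d0 = max(d0, 0.5)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

Each aligned pair contributes a value between 0 and 1 that falls off with distance, so a few badly placed residues lower the score only slightly, whereas in RMSD they can dominate the average.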
|

Introduction:
TM-align is a computer algorithm for quick and accurate protein structure
alignment using dynamic programming and the TM-score rotation matrix. An optimal
alignment between two proteins, as well as the TM-score, will be reported for
each comparison.
References:
Y. Zhang, J. Skolnick, TM-align: A protein structure alignment algorithm
based on TM-score. Nucleic Acids Research, 2005 33: 2302-2309 (download the PDF file).
|

Introduction:
MM-align is designed to structurally align multimeric protein complexes using
heuristic iteration of dynamic programming based on the TM-score rotation matrix.
The multiple chains in each complex are first joined, in every possible order,
and then simultaneously aligned with cross-chain alignment prevented. The
alignment of interface structures can be enhanced by an
interface-specific weighting factor. A TM-score is reported for
assessing the structural similarity of the two complexes.
References:
S. Mukherjee, Y. Zhang,
MM-align: a quick algorithm for aligning multiple-chain protein complex
structures using iterative dynamic programming.
Nucleic Acids Research 2009; 37: e83
(download the PDF file and supporting materials).
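The chain-joining step described above can be pictured with a short Python sketch that enumerates every order in which the chains of one complex can be concatenated. It illustrates only the enumeration, not the alignment itself, and it is not MM-align's code; the function name `joined_chain_orders` and the dictionary input format are hypothetical.

```python
from itertools import permutations


def joined_chain_orders(chains):
    """Yield (order, joined_residues) for every ordering of the chains.

    chains: dict mapping a chain ID to its list of residues
    (a hypothetical input format chosen for this sketch).
    """
    for order in permutations(sorted(chains)):
        joined = []
        for chain_id in order:
            joined.extend(chains[chain_id])
        yield order, joined
```

A complex with n chains has n! possible orders, so a trimer gives 6 joined sequences and a hexamer already gives 720.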
|

Introduction:
NW-align is a simple and robust program for protein sequence-to-sequence
alignment based on the standard Needleman-Wunsch dynamic programming algorithm.
The mutation matrix is BLOSUM62, with a gap opening penalty of -11 and a gap extension
penalty of -1. The source code, which can be easily modified for different purposes,
can be downloaded at the bottom of the NW-align website.
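For readers unfamiliar with the method, the following Python sketch computes the optimal global alignment score by Needleman-Wunsch dynamic programming with separate gap opening and extension penalties. It is not the NW-align source code. To stay short it takes the substitution score as a function argument, and the usage example below uses a hypothetical `toy_score` as a stand-in for BLOSUM62. It also assumes that the first residue of a gap costs the opening penalty and each further residue costs the extension penalty; the real program's convention may differ.

```python
NEG_INF = float("-inf")


def nw_affine_score(seq_a, seq_b, score, gap_open=-11, gap_extend=-1):
    """Optimal global alignment score with affine gap penalties."""
    n, m = len(seq_a), len(seq_b)
    # match[i][j]: best score with seq_a[i-1] aligned to seq_b[j-1].
    # gap_a[i][j]: best score ending with a gap in seq_a.
    # gap_b[i][j]: best score ending with a gap in seq_b.
    match = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    gap_a = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    gap_b = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    match[0][0] = 0
    for j in range(1, m + 1):
        gap_a[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        gap_b[i][0] = gap_open + (i - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(seq_a[i - 1], seq_b[j - 1])
            match[i][j] = s + max(match[i - 1][j - 1],
                                  gap_a[i - 1][j - 1],
                                  gap_b[i - 1][j - 1])
            gap_a[i][j] = max(match[i][j - 1] + gap_open,
                              gap_a[i][j - 1] + gap_extend,
                              gap_b[i][j - 1] + gap_open)
            gap_b[i][j] = max(match[i - 1][j] + gap_open,
                              gap_b[i - 1][j] + gap_extend,
                              gap_a[i - 1][j] + gap_open)
    return max(match[n][m], gap_a[n][m], gap_b[n][m])


def toy_score(a, b):
    # Hypothetical stand-in for a BLOSUM62 lookup.
    return 4 if a == b else -4
```

With `toy_score`, `nw_affine_score("ACGT", "AGT", toy_score)` pays one gap opening against three matches. Swapping in a real BLOSUM62 lookup for `toy_score` changes the numbers but not the recursion.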
|

Introduction:
EDTSurf is an open-source program to construct triangulated
surfaces for macromolecules. It can generate three major macromolecular
surfaces (the van der Waals surface, the solvent-accessible surface, and the molecular
surface, also called the solvent-excluded surface) and identify cavities inside
macromolecules.
References:
Dong Xu, Yang Zhang (2009) Generating Triangulated Macromolecular
Surfaces by Euclidean Distance Transform. PLoS ONE 4(12): e8140
(download the PDF file).
|

Introduction:
MVP (Macromolecular Visualization and Processing) is a convenient tool
for visualizing macromolecular structures and their derived
information. It supports the PDB format and EM density maps and has many
drawing styles and color modes. Its features include computation of
triangulated surfaces, depth, and principal axes, as well as estimation
of the secondary structure of protein structures.
References:
Dong Xu, Yang Zhang (2009) Generating Triangulated Macromolecular
Surfaces by Euclidean Distance Transform. PLoS ONE 4(12): e8140. (download
the PDF file).
|

Introduction:
MVP-Fit is a tool to combine and fit multiple monomer structures
into EM density maps.
While most current tools can only achieve rigid-body docking and fitting,
MVP-Fit can flexibly move and dock the monomer structures
into the EM density maps while preserving the physical and geometric restraints
of the individual structural models.
References:
Dong Xu, Yang Zhang, MVP-Fit: A Convenient Tool for Flexible Fitting of
Protein Domain Structures with Cryo-Electron Microscopy Density Map,
(2011, in preparation).
|

Introduction:
SPICKER is a clustering algorithm to identify the near-native models
from a pool of protein structure decoys. Clusters are defined by the
pairwise RMSD between the structural decoys.
References:
Y. Zhang, J. Skolnick, SPICKER: Approach to clustering protein structures for near-native model selection,
Journal of Computational Chemistry, 2004 25: 865-871.
(download the PDF file).
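The idea of clustering decoys by pairwise RMSD can be sketched as follows. The Python example repeatedly picks the decoy with the most neighbors within an RMSD cutoff, records it with its neighbors as a cluster, and removes them from the pool. It is a simplified illustration rather than the SPICKER program: the cutoff is fixed here, whereas choosing it is part of the published method, and the function name `cluster_by_rmsd` and the matrix input format are assumptions.

```python
def cluster_by_rmsd(rmsd, cutoff):
    """Greedy neighbor-count clustering on a pairwise RMSD matrix.

    rmsd: symmetric list of lists, rmsd[i][j] in Angstroms, zero diagonal.
    cutoff: fixed RMSD cutoff in Angstroms (an assumption of this sketch).
    Returns a list of (center_index, member_indices), largest cluster first.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        best_center, best_members = None, []
        for i in sorted(remaining):
            # A decoy is always its own neighbor because rmsd[i][i] is 0.
            members = [j for j in sorted(remaining) if rmsd[i][j] <= cutoff]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)
    return clusters
```

The center of the largest cluster is the natural candidate for the near-native model, since conformations that the simulation visits most often accumulate the most neighbors.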
|

Introduction:
HAAD is a computer algorithm for constructing hydrogen atoms
from protein heavy-atom structures. The hydrogen atoms are added
by minimizing atomic overlap and encouraging hydrogen bonding.
References:
Yunqi Li, Roy Ambrish and Yang Zhang,
HAAD: A Quick Algorithm for Accurate Prediction of Hydrogen Atoms in Protein Structures,
PLoS One, 2009 4: e6701 (download the PDF file).
|

Introduction:
PSSpred is a multiple neural network training algorithm for accurate
protein secondary structure prediction.
The program is freely downloadable.
References:
http://zhanglab.ccmb.med.umich.edu/PSSpred
|
|
|
III. Databases and Potentials

Introduction:
GPCRRD is a primary database of experimental restraints for G protein-coupled
receptors (GPCRs) that are systematically collected from literature and experimental
reports. It contains thousands of spatial restraints from mutagenesis, disulfide
mapping distances, electron cryomicroscopy, and FTIR experiments. The data can be
conveniently used for assisting GPCR structure prediction and functional
annotations.
References:
Jian Zhang, Yang Zhang, "GPCRRD: An experimental restraint database for GPCR structure modeling" 2009 (submitted).
|

Introduction:
TM-fold is an on-line server to estimate the posterior
probability of two protein structures belonging to the same family.
For a given pair of protein structures, the server calculates
the structural similarity using structural alignment algorithms and
reports a posterior probability that the structures belong to the
same SCOP/CATH fold family.
References:
J Xu, Y Zhang,
How significant is a protein structure similarity with TM-score=0.5?
Bioinformatics, 2010, doi:10.1093.
(download the PDF file).
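As a rough illustration of what a posterior probability means here, the Python sketch below applies Bayes' rule to two likelihoods: how probable the observed similarity score is among same-fold pairs and among different-fold pairs. The function name, its arguments, and the numbers in the usage note are hypothetical. The actual score distributions and priors used by the server are not given in this description.

```python
def posterior_same_fold(likelihood_same, likelihood_diff, prior_same):
    """Posterior probability of 'same fold' given an observed score.

    likelihood_same: P(score | same fold), hypothetical input.
    likelihood_diff: P(score | different fold), hypothetical input.
    prior_same: P(same fold) before seeing the score, hypothetical input.
    """
    weight_same = likelihood_same * prior_same
    weight_diff = likelihood_diff * (1.0 - prior_same)
    return weight_same / (weight_same + weight_diff)
```

For example, with made-up likelihoods of 0.8 and 0.2 and an even prior of 0.5, the posterior is 0.8. With a small prior such as 0.01, the same likelihoods give a posterior below 0.04, which shows how strongly the prior shapes the result.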
|

Introduction:
Atomic structure decoys of 56 non-homologous small proteins.
The backbone structures are generated by I-TASSER ab initio
modeling; the side-chain and other atoms are added using Pulchra.
References:
Sitao Wu, Jeffrey Skolnick, Yang Zhang: Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biology 2007, 5: 17.
(download PDF file)
|

Introduction:
The interaction parameters and the knowledge-based force field used by I-TASSER.
References:
1. Yang Zhang, Andrzej Kolinski, Jeffrey Skolnick.
Touchstone II: A new approach to ab initio protein Structure Prediction.
Biophysical Journal, vol 85, 1145 (2003).
[download the PDF file]
2. Yang Zhang, Jeffrey Skolnick.
Automated structure prediction of weakly homologous proteins on a genomic scale.
Proceedings of the National Academy of Sciences of USA, vol 101, 7594 (2004).
[download the PDF file]
3. Sitao Wu, Jeffrey Skolnick, Yang Zhang.
Ab initio modeling of small proteins by iterative TASSER simulations
BMC Biology, vol 5, 17 (2007).
[download the PDF file]
|

Introduction:
RW is a distance-dependent atomic potential for protein structure modeling and structure
decoy recognition. It is calculated from 1,383 high-resolution PDB structures using
an ideal random-walk chain as the reference state.
References:
Jian Zhang and Yang Zhang, A distance-dependent atomic potential from random-walk ideal chain reference state for protein fold selection and structure prediction. (2009) submitted.
|

Introduction:
An automated assessment of protein structure predictions generated by 189 human and server groups
in the CASP7 experiments. The assessment is based on TM-score, MaxSub and GDT-TS score
where 124 domains are split into HA (high accuracy), TBM (template-based modeling),
and FM (free-modeling) targets.
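Of the three scores named above, GDT-TS is the simplest to sketch. The Python example below averages the fraction of target residues whose model-to-native distance falls within 1, 2, 4, and 8 Angstroms. It assumes the distances come from one fixed superposition and reports the result on a 0-100 scale; the standard GDT procedure instead searches for the superposition that maximizes each count, so this sketch gives a lower bound. The function name and inputs are assumptions.

```python
def gdt_ts(distances, target_length):
    """GDT-TS for one fixed superposition, on a 0-100 scale.

    distances: model-to-native distances in Angstroms for aligned residues.
    target_length: number of residues in the target domain.
    """
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    fractions = [
        sum(1 for d in distances if d <= cutoff) / target_length
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)
```

Like TM-score, this measure rewards the share of residues that are placed well rather than penalizing the worst deviations, which is why both are used alongside each other in the assessment.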
|

Introduction:
An automated assessment of protein structure predictions generated by 81 server groups
in the CASP8 experiments. The assessment is based on TM-score, MaxSub and GDT-TS score
where 172 domains are split into Easy and Hard targets.
|

Introduction:
An automated assessment of protein structure predictions generated by 81 server groups
in the CASP9 experiments. The assessment is based on TM-score, MaxSub and GDT-TS score
where 144 domains are split into Easy and Hard targets.
|
|
|
|
|