Computationally generated structure conformations, called decoys, are widely used for training protein
force fields and folding methods. However, generating high-quality structure decoys has turned out
to be a challenging problem.
A systematic examination of the decoy sets used in the literature shows that,
surprisingly, almost all existing decoy sets have serious flaws in the evenness
and diversity of their structure distributions (Deng et al., 2015). Many decoy sets are biased toward specific secondary-structure
and compactness patterns, which makes the native structure easy to distinguish using trivial
potentials. Meanwhile, most structure decoy sets have been pre-calculated for a fixed set of proteins,
but many researchers need to generate structure decoys for the specific proteins they are interested in.
3DRobot is a new algorithm for creating structure decoys by free fragment assembly, using
enhanced hydrogen-bonding and compactness interactions together with balanced conformational
selection. The flowchart of 3DRobot is depicted in Figure 1; the procedure consists of four major steps.
- Step 1: Starting from the target structure, TM-align is used to thread the structure through
a representative PDB library of 27,822 non-redundant protein structures with a
pairwise sequence identity below 70%. Up to 100 non-redundant structure scaffolds (or templates)
are selected from the top structural alignments, ranked by RMSD to the target structure.
- Step 2: Starting from the TM-align scaffolds, full-length structure models are
assembled by replica-exchange Monte Carlo simulations, using a protocol extended from I-TASSER.
The target sequence is split into structurally aligned regions (modeled off-lattice) and unaligned regions
(modeled on a lattice system with a grid spacing of 0.87 Å). Several new energy terms are introduced to enhance
the hydrogen-bonding networks and compactness of the decoy conformations during the simulations.
- Step 3: Decoy structures are divided into n bins according to their RMSD to the native structure,
each bin spanning an RMSD interval of 1 Å. The decoys in each RMSD bin are normally taken from different templates,
enumerating over the templates until the number of decoys in the bin reaches N/n (N is the total number
of decoys requested by the user). If some bins do not have enough decoys, additional decoys are selected
from their adjacent bins.
- Step 4: Starting from the reduced models selected in Step 3, in which each residue is
represented by its C-alpha atom and side-chain center of mass, an extended version of
ModRefiner is used to construct full-atom structures from the C-alpha traces and to
remove local structural clashes and refine hydrogen-bonding networks. The backbone atoms are first
quickly constructed using a look-up table based on 4 neighboring C-alpha atoms. The overall
structures are then relaxed and refined iteratively by a two-step energy minimization procedure.
Figure 1. Flowchart of 3DRobot for protein structure decoy generation.
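The scaffold selection in Step 1 amounts to a sort-and-filter over TM-align hits. The following is a minimal Python sketch, not 3DRobot's code: it assumes the alignments have already been computed, and the `Alignment` record, its `cluster_id` field, `select_scaffolds`, and the template names are hypothetical. Because the text does not say how 3DRobot decides that two scaffolds are redundant, a precomputed cluster label stands in for that test.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """One TM-align hit of the target against a library structure (fields are illustrative)."""
    template_id: str  # hypothetical library chain identifier
    cluster_id: str   # hypothetical redundancy label standing in for the unspecified criterion
    rmsd: float       # RMSD (Å) of the structural alignment to the target


def select_scaffolds(hits: list[Alignment], max_templates: int = 100) -> list[Alignment]:
    """Return up to max_templates lowest-RMSD hits, keeping at most one hit per cluster."""
    chosen: list[Alignment] = []
    seen_clusters: set[str] = set()
    for hit in sorted(hits, key=lambda h: h.rmsd):  # best (lowest RMSD) first
        if len(chosen) >= max_templates:
            break
        if hit.cluster_id in seen_clusters:
            continue  # redundant with a scaffold that is already chosen
        chosen.append(hit)
        seen_clusters.add(hit.cluster_id)
    return chosen


hits = [
    Alignment("T1", "c1", 2.5),
    Alignment("T2", "c1", 1.8),
    Alignment("T3", "c2", 3.1),
]
print([h.template_id for h in select_scaffolds(hits)])  # ['T2', 'T3']
```

T1 is dropped because T2 belongs to the same hypothetical cluster and aligns with a lower RMSD.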
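Step 2 combines two standard ingredients: replica-exchange Monte Carlo (REMC) sampling and a discretized lattice for the unaligned regions. The sketch below illustrates both on a toy one-dimensional energy, under stated assumptions: the lattice is assumed to be cubic with the 0.87 Å spacing quoted in Step 2, the swap test is the textbook REMC Metropolis criterion, and the double-well energy, move size, and inverse temperatures are hypothetical. It does not reproduce the I-TASSER force field, its move set, or 3DRobot's new hydrogen-bonding and compactness terms.

```python
import math
import random


def snap_to_lattice(xyz, grid=0.87):
    """Round a 3D coordinate to the nearest point of a cubic lattice with spacing `grid` (Å)."""
    return tuple(round(c / grid) * grid for c in xyz)


def swap_probability(beta_i, beta_j, e_i, e_j):
    """Textbook REMC acceptance: min(1, exp[(beta_i - beta_j) * (E_i - E_j)])."""
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def replica_exchange(energy, betas, n_sweeps=1000, step=0.5, seed=0):
    """Toy REMC on a 1D coordinate; returns the final state of each replica."""
    rng = random.Random(seed)
    states = [0.0] * len(betas)
    for _ in range(n_sweeps):
        # Metropolis move inside each replica at its own inverse temperature.
        for k, beta in enumerate(betas):
            trial = states[k] + rng.uniform(-step, step)
            d_e = energy(trial) - energy(states[k])
            if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
                states[k] = trial
        # Attempt to swap conformations between neighboring temperatures.
        for k in range(len(betas) - 1):
            p = swap_probability(betas[k], betas[k + 1],
                                 energy(states[k]), energy(states[k + 1]))
            if rng.random() < p:
                states[k], states[k + 1] = states[k + 1], states[k]
    return states


def double_well(x):
    """Hypothetical toy energy with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


final = replica_exchange(double_well, betas=[20.0, 5.0, 1.0], n_sweeps=2000)
print(snap_to_lattice((1.0, -0.5, 0.0)))  # (0.87, -0.87, 0.0)
```

Swapping conformations between neighboring temperatures lets the coldest replica escape local minima in which a single low-temperature Metropolis run could stay trapped.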
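The balanced selection in Step 3 can be sketched as follows. This is a minimal Python sketch under assumptions the description leaves open: each decoy is a hypothetical `(decoy_id, template_id, rmsd)` tuple, the n bins run from the lowest to the highest occupied 1 Å interval, "enumerating over the templates" is read as round-robin interleaving with templates visited in sorted order, any remainder of N // n is ignored, and shortfalls are filled from the nearest neighboring bins first. The function names are illustrative.

```python
from collections import defaultdict
from itertools import chain, zip_longest


def round_robin(groups):
    """Interleave lists, taking one item from each group in turn."""
    gap = object()  # filler for groups that run out early
    merged = chain.from_iterable(zip_longest(*groups, fillvalue=gap))
    return [item for item in merged if item is not gap]


def select_balanced_decoys(decoys, n_total, bin_width=1.0):
    """decoys: iterable of (decoy_id, template_id, rmsd_to_native) tuples."""
    by_bin = defaultdict(lambda: defaultdict(list))
    for decoy_id, template_id, rmsd in decoys:
        by_bin[int(rmsd // bin_width)][template_id].append(decoy_id)
    if not by_bin:
        return []
    lo, hi = min(by_bin), max(by_bin)
    bins = range(lo, hi + 1)      # n consecutive bins, empty ones included
    quota = n_total // len(bins)  # N / n (remainder ignored in this sketch)

    # Candidate order within each bin: cycle over templates so that
    # no single template dominates the bin.
    queues = {b: round_robin([by_bin[b][t] for t in sorted(by_bin[b])]) for b in bins}
    selected = {b: queues[b][:quota] for b in bins}
    leftover = {b: queues[b][quota:] for b in bins}

    # Top up short bins with surplus decoys, nearest bins first.
    for b in bins:
        for d in range(1, len(bins)):
            for nb in (b - d, b + d):
                while len(selected[b]) < quota and leftover.get(nb):
                    selected[b].append(leftover[nb].pop(0))
    return [decoy for b in bins for decoy in selected[b]]


decoys = [
    ("d1", "T1", 0.5), ("d2", "T1", 0.7), ("d3", "T2", 0.9), ("d4", "T2", 0.2),
    ("d5", "T1", 1.5),
    ("d6", "T1", 2.1), ("d7", "T2", 2.4), ("d8", "T3", 2.8),
]
print(select_balanced_decoys(decoys, n_total=6))
# ['d1', 'd3', 'd5', 'd2', 'd6', 'd7']
```

Here the 1-2 Å bin holds only one decoy, so it borrows `d2` from the surplus of the adjacent 0-1 Å bin.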
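The look-up step in Step 4 needs a key describing the local geometry of 4 consecutive C-alpha atoms. The text does not say how ModRefiner indexes its table, so the key below is a hypothetical choice: three binned inter-atomic distances plus the binned C-alpha virtual torsion angle. The bin widths, the names `geometry_key` and `lookup_backbone`, and the table contents are assumptions for illustration; placing the actual backbone atoms from a table entry is not shown.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle (degrees, in [-180, 180]) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def geometry_key(ca4, dist_bin=0.5, angle_bin=30.0):
    """Hypothetical table key: 3 binned C-alpha distances plus the binned virtual torsion."""
    p0, p1, p2, p3 = ca4
    n_angle_bins = int(360 // angle_bin)
    # Shift to [0, 360] and wrap so that -180 and +180 share a bin.
    torsion_bin = int((dihedral(p0, p1, p2, p3) + 180.0) // angle_bin) % n_angle_bins
    return (int(math.dist(p0, p2) // dist_bin),
            int(math.dist(p1, p3) // dist_bin),
            int(math.dist(p0, p3) // dist_bin),
            torsion_bin)


def lookup_backbone(table, ca4):
    """Return the stored backbone fragment for this local geometry, or None if unseen."""
    return table.get(geometry_key(ca4))


# Toy unit-scale coordinates in a trans arrangement (real C-alpha spacing is about 3.8 Å).
trans = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0)]
print(geometry_key(trans))  # (2, 2, 4, 0)
```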
Haiyou Deng, Ya Jia and Yang Zhang. Automated Generation of Diverse and Well-packed Protein Structure Decoys. Submitted, 2015.