EvoDesign is an evolution-based approach to de novo design of proteins and protein-protein interactions.
It takes the full-atomic model of a scaffold in PDB format and outputs a list of designed sequences along
with the percent sequence identity of each sequence to the scaffold. EvoDesign also provides
normalized relative errors for predicted secondary structure, solvent accessibility, and backbone torsional
angles with respect to the input. This helps the user understand the quality of the designed sequences.
Overview of EvoDesign
Making Input to EvoDesign
Figure 1. The flow diagram of EvoDesign.
EvoDesign has two design options: monomer and protein-protein interface design. Both design strategies consist of three stages – pre-processing, simulation, and analysis of the data generated during the simulation phase to generate the final designed sequences (Fig. 1).
Pre-processing: For monomer design, the program starts with a given scaffold and searches the Protein Data Bank (PDB)
for proteins with similar folds using our in-house TM-align program. The sequences of the proteins with similar folds are then aligned to generate a profile.
DSSP is used to assign secondary structure, solvent accessibility, and backbone torsional angles for the scaffold.
For protein interface design, the user must supply not only the scaffold of interest but also its protein binding partner in PDB format. A dimeric profile is then constructed from multiple structure alignments of protein-protein interfaces.
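As a rough illustration of what "aligning sequences to generate a profile" means, the sketch below turns a set of equal-length aligned sequences into per-position amino acid frequencies. The function name `build_profile`, the pseudocount, and the toy alignment are assumptions made for illustration; the actual EvoDesign profile (its weighting and scoring) is not described here and may differ.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs, pseudocount=1.0):
    """Per-position amino acid frequencies from equal-length aligned sequences.

    Gap characters (anything outside the 20 standard residues) are ignored.
    The pseudocount is an illustrative assumption that keeps every frequency
    non-zero even when few sequences are available.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_seqs if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


# Toy alignment of three sequences; '-' marks a gap.
profile = build_profile(["ACD", "ACE", "G-D"])
print(round(profile[0]["A"], 3), round(profile[0]["G"], 3))
```

Each column of the result sums to one, and residues seen more often at a position receive a higher frequency there.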
Simulation: The core of EvoDesign is a Monte Carlo (MC) search of sequence space. Each MC simulation starts from a random seed
sequence. Ten independent runs, each with 30,000 steps, are initiated in parallel. At each step of the simulation, the sequence is altered by
replacing the amino acids at randomly selected positions with randomly selected amino acids. The energy of the new sequence is then computed, and the sequence is
accepted or rejected based on the Metropolis criterion. EvoDesign uses replica-exchange Monte Carlo to search for low free-energy states.
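The acceptance step can be illustrated with a minimal, single-temperature Metropolis search. This is only a sketch: `toy_energy`, the temperature value, and the one-site mutation move are assumptions for illustration, and the replica-exchange scheme and the EvoDesign energy function are not reproduced.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: accept downhill moves, uphill ones with exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def mutate(seq, rng, n_sites=1):
    """Replace the residues at n_sites random positions with random amino acids."""
    new = list(seq)
    for pos in rng.sample(range(len(seq)), n_sites):
        new[pos] = rng.choice(AMINO_ACIDS)
    return "".join(new)


def run_mc(length, energy, steps=30000, temperature=1.0, seed=0):
    """One MC run from a random seed sequence; returns the final state and accepted sequences."""
    rng = random.Random(seed)
    current = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    e_current = energy(current)
    accepted = [current]
    for _ in range(steps):
        candidate = mutate(current, rng)
        e_candidate = energy(candidate)
        if metropolis_accept(e_candidate - e_current, temperature, rng):
            current, e_current = candidate, e_candidate
            accepted.append(current)
    return current, e_current, accepted


def toy_energy(seq, target="ACDEFGHIKL"):
    """Hypothetical stand-in energy: number of mismatches to an arbitrary target."""
    return sum(a != b for a, b in zip(seq, target))


final_seq, final_energy, accepted = run_mc(10, toy_energy, steps=2000, temperature=0.2, seed=1)
print(final_seq, final_energy, len(accepted))
```

Ten independent runs would correspond to calling `run_mc` with ten different seeds and pooling the accepted sequences for the analysis stage.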
Analysis: Sequences accepted during the Monte Carlo simulation are
clustered using the SPICKER algorithm, following the same procedure as used by Bazzoli et al.
The process works iteratively to identify the sequence with the maximum number of neighbors. The output of EvoDesign is the seed sequences of the
top ten (at most) clusters, sorted by cluster size.
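The iterative "most neighbors first" idea can be sketched as a simple greedy clustering. This is not the SPICKER implementation: the distance measure (fraction of differing positions), the fixed `cutoff`, and the function names are assumptions chosen to keep the example small.

```python
def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, cutoff=0.3, max_clusters=10):
    """Repeatedly pick the sequence with the most neighbors within `cutoff`.

    Returns a list of (seed_sequence, members) pairs. Because each cluster is
    removed before the next is chosen, cluster sizes are non-increasing.
    """
    remaining = list(seqs)
    clusters = []
    while remaining and len(clusters) < max_clusters:
        best_seed, best_members = None, []
        for seed in remaining:
            members = [s for s in remaining if hamming_fraction(seed, s) <= cutoff]
            if len(members) > len(best_members):
                best_seed, best_members = seed, members
        clusters.append((best_seed, best_members))
        taken = set(best_members)
        remaining = [s for s in remaining if s not in taken]
    return clusters


pool = ["AAAA", "AAAC", "AAAD", "CCCC", "CCCD", "GGGG"]
for rank, (seed, members) in enumerate(greedy_cluster(pool, cutoff=0.25), start=1):
    print(rank, seed, len(members))
```

The seed of each cluster plays the role of the reported design, and the rank follows the cluster size.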
The basic requirement for EvoDesign monomer design is a protein (or protein complex) structure in
PDB format. Users can copy and
paste PDB-format protein structures into the given text box, or upload PDB-format protein structure files from the local computer.
Currently, EvoDesign includes two interfaces, "Monomer Design" and "Interface Design", which are developed
for protein fold design and protein-protein interaction design, respectively.
Each interface provides the following Advanced Options.
Understanding output of EvoDesign
Structural profile cutoff:
By default, EvoDesign looks for highly similar folds in the PDB for structural profile
construction. The number of proteins found may therefore differ from one scaffold to another, depending
on the fold type. It has been shown that at least ten proteins are required to avoid sequence bias
during the design simulation.
Users may therefore opt for a lower cutoff for profile construction if the fold of
the scaffold is novel. It should be noted that evolutionary information content decreases
as the TM-score cutoff decreases, so this threshold should be chosen to balance
the two requirements.
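One way to reason about this trade-off is to keep the default cutoff whenever it already yields at least ten proteins, and otherwise lower it only as far as needed. The sketch below assumes a list of TM-scores for candidate proteins is available; the function `choose_tm_cutoff`, the example default of 0.7, and the floor of 0.5 are hypothetical values for illustration and are not the server's settings.

```python
def choose_tm_cutoff(tm_scores, default_cutoff, min_templates=10, floor=0.5):
    """Return (cutoff, number_of_proteins_selected).

    Keeps `default_cutoff` if it already selects `min_templates` proteins;
    otherwise lowers the cutoff just enough to reach that number, but never
    below `floor` (an assumed lower bound on useful fold similarity).
    """
    ranked = sorted(tm_scores, reverse=True)
    n_default = sum(score >= default_cutoff for score in ranked)
    if n_default >= min_templates:
        return default_cutoff, n_default
    if len(ranked) >= min_templates:
        cutoff = max(ranked[min_templates - 1], floor)
    else:
        cutoff = floor
    return cutoff, sum(score >= cutoff for score in ranked)


scores = [0.90, 0.85, 0.80, 0.75, 0.72, 0.68, 0.66, 0.64, 0.61, 0.58, 0.55, 0.40]
print(choose_tm_cutoff(scores, default_cutoff=0.7))
```

In this toy case only five proteins pass the assumed default, so the cutoff is lowered until ten are included.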
For computational efficiency, the default energy setting for monomer design is to use the evolution-based
energy function only. Our benchmarking validates the sufficiency of evolution-based energy functions to
design reasonably good sequences. Nonetheless, for experimental purposes, we suggest that users use both
the evolution- and physics-based energy functions. Furthermore, interface design requires the use of both the
evolution- and physics-based energy functions. Note that using both energy functions
is 2-3 times slower than evolution-based-only design. The extra time is due to
computationally intensive side-chain refinement. A detailed discussion of the evolution-based energy function
can be found in Mitra et al. and Pearce et al.
It is important to note that the server no longer uses FoldX, but our own physical energy function that has been
optimized for our pipeline.
Restrictions on residues: Users can control the design by restricting one or more residue
types (for instance, users may wish to restrict
CYS in the designed sequence) or by fixing some of the residues by specifying their residue IDs.
Model designed sequences using I-TASSER:
Users can model the designed sequences using I-TASSER by checking 'Yes' under the I-TASSER modeling option.
Since this step demands a great deal of computing resources, the default option is 'No'. We urge
users to use this option judiciously. Alternatively, the user can use the
webserver to model the designed sequences.
Name of your protein: This is purely for bookkeeping purposes. We suggest naming each protein so that
you can distinguish
between different EvoDesign runs for different proteins.
Email: Your email address will be used to send you a job completion notification
with a link to the results page. If you do not wish
to provide your email address, please bookmark the link that is displayed immediately after successful job submission.
Figure 2 depicts the results pages of the EvoDesign server
for monomer design (A) and interface design (B), respectively.
For both monomer and interface design, Region A and Region B appear as soon as
the job is submitted. "Job running" status is displayed below Region B as long as the job is running on the
back-end server. Upon successful completion of the job, Region C and Region D will appear.
Figure 2. Screenshots of the result pages for EvoDesign. Left panel: "Monomer Design" module;
Right panel: "Interface Design" module.
The explanation of different Regions is given as follows:
- Region A: This region displays the input information for EvoDesign.
While the job is running, the input scaffold structure submitted to EvoDesign is
hyperlinked. Upon completion of the job, the scaffold structure used
for design is hyperlinked. These two structures differ if the uploaded
structure is C-alpha only. The second line in this region indicates the lower cutoff on the
TM-score used for profile construction; the third line indicates the energy
function used by EvoDesign.
- Region B: The scaffold, receptor, and complex structures are displayed using
JSmol (an open-source
- Region C: This region tabulates a summary of the EvoDesign data, with links to download the files, along with the
designed sequences. There are several columns:
- EvoDesign Rank: The first column links to the alignment of the EvoDesign sequences
(seed sequences of the clusters) with the scaffold
sequences and features of the scaffold (see Region D).
The column is sorted by the EvoDesign Rank. At
most, ten design sequences will be displayed, ranked 1 through 10.
- EvoDesign Score: The confidence of each designed sequence is
denoted by the EvoDesign Score in the second column.
The lower the EvoDesign Score, the higher the confidence.
- Sequence Identity: The percent sequence identity of each designed
sequence to the scaffold is calculated and displayed in the third column.
- Normalized Relative Error (NRE): The normalized relative error,
or NRE, measures the error of each designed sequence relative to that of the scaffold sequence. If
the error for the scaffold sequence is Error(scaffold) and the error for the designed sequence
is Error(design), then the NRE of the designed sequence is defined
as [ Error(design) - Error(scaffold) ] / Error(scaffold). A negative NRE value thus indicates
that the designed sequence has a smaller error than the
scaffold sequence. If the error on the scaffold sequence is zero, the error for the designed
sequence is instead normalized by the number of residues.
The quality of each design is estimated by the NRE of that sequence for secondary structure,
solvent accessibility, and backbone torsional
angles. EvoDesign outputs all of the NREs along with the designed sequences. For each sequence
(each row), columns 4-7 report these NREs:
- NRE for Secondary Structure: The secondary structure of the scaffold is assigned by
DSSP. For the scaffold as well as
the designed sequences, the secondary
structure is predicted by PSSPred.
The Q3 error of each prediction is calculated against the DSSP-assigned secondary structure of the scaffold.
The NRE for the secondary structure is computed from the Q3 errors of the designed and scaffold sequences.
- NRE for Solvent Accessibility: The solvent accessibility of the scaffold is assigned by
DSSP. For the scaffold as well as the designed
sequences, the solvent accessibility
is predicted by neural networks. The Pearson correlation coefficient between the
assigned and the predicted solvent accessibility is computed for each sequence. The normalized relative
error on solvent accessibility is defined as the difference between the correlation coefficient
of the designed sequence and that of the scaffold sequence, normalized by the correlation coefficient of the
scaffold sequence.
- NRE for Backbone Torsion Angles: The backbone torsion angles (φ and ψ) of the scaffold are assigned by the
DSSP program. For the scaffold as well as the designed sequences, the torsion angles
are predicted by neural networks. The mean absolute error of each prediction is calculated with respect to the assigned torsion angles,
and the normalized relative error is computed from this mean absolute error.
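The NRE definition and the error measures described above can be sketched as follows. The function names are illustrative, the torsion error is a plain mean absolute error with no handling of angle periodicity, and the Pearson helper only computes the correlation itself, since the sign convention used for the solvent accessibility NRE is not spelled out here.

```python
import math


def q3_error(predicted_ss, assigned_ss):
    """Fraction of residues whose predicted 3-state (H/E/C) label differs from the assigned one."""
    return sum(p != a for p, a in zip(predicted_ss, assigned_ss)) / len(assigned_ss)


def mean_absolute_error(predicted, assigned):
    """Mean absolute difference between predicted and assigned values (e.g. torsion angles)."""
    return sum(abs(p - a) for p, a in zip(predicted, assigned)) / len(assigned)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def nre(error_design, error_scaffold, n_residues):
    """[Error(design) - Error(scaffold)] / Error(scaffold).

    When the scaffold error is zero, the design error is normalized by the
    number of residues instead, as described in the text.
    """
    if error_scaffold == 0:
        return error_design / n_residues
    return (error_design - error_scaffold) / error_scaffold


assigned = "HHHHCC"  # DSSP-style assignment on the scaffold (toy data)
err_scaffold = q3_error("HHEECC", assigned)  # prediction for the scaffold sequence
err_design = q3_error("HHHECC", assigned)  # prediction for a designed sequence
print(nre(err_design, err_scaffold, len(assigned)))
```

In this toy case the designed sequence has half the Q3 error of the scaffold sequence, so its NRE is negative.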
- I-TASSER prediction: The structures of the
EvoDesign sequences are predicted by the I-TASSER sequence-to-structure
prediction webserver [6,7]
to validate how closely the designed sequences fold to the input scaffold.
The last two columns provide the TM-score and RMSD, respectively, between the input scaffold and the
first I-TASSER model predicted for the EvoDesign sequence. The
I-TASSER predicted model can be downloaded from the Model column.
- Data download: To download the scaffold sequence and all the designed sequences as a single file, click on SI (last row, third column).
The secondary structure of the scaffold structure (as assigned by the DSSP program) and the secondary structures of all the sequences (the scaffold sequence and up to ten designed sequences)
as predicted by the PSSPred program can be downloaded as a single file from SS (last row, fourth column). It should be noted that the secondary
structure states are classified into three categories: Helix (H), Sheet (E), and Coil (C). Solvent accessibility and the backbone torsion angles (φ and ψ) can
be downloaded from the fifth, sixth, and seventh columns of the last row, respectively. Instead of downloading files selectively, users can download all
the parameters and the sequences as a single zip file (Data.zip) from the last row, first column.
- Region D: The final column of the table links to a text file that lists hetero-atom binding site information (Lig_bind),
secondary structure as assigned by the DSSP program on the scaffold structure (DSSP_SS), the scaffold sequence (Scaffold), and the designed
sequence (Design). Identical residues are marked by '|' in the penultimate row. If the scaffold structure contains hetero atoms
within 8.0 Å of any residues, those residue positions are marked with stars
(*) in the Lig_bind row. This helps the user quickly identify whether those residues are conserved. The Lig_bind row is absent if
the scaffold structure has no hetero atoms, or if none lie within 8.0 Å of any residue.
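The rows of this alignment file can be mimicked with a few helper functions. This is a sketch under stated assumptions: each residue is represented by a single coordinate (for example its C-alpha atom), which the description above does not specify, and the function names and toy coordinates are hypothetical.

```python
import math


def identity_row(scaffold, design):
    """'|' where the scaffold and designed residues are identical, blank elsewhere."""
    return "".join("|" if s == d else " " for s, d in zip(scaffold, design))


def percent_identity(scaffold, design):
    """Percent of positions at which the designed sequence matches the scaffold."""
    matches = sum(s == d for s, d in zip(scaffold, design))
    return 100.0 * matches / len(scaffold)


def lig_bind_row(residue_coords, hetero_coords, cutoff=8.0):
    """'*' for residues with any hetero atom within `cutoff` angstroms.

    Assumes one representative (x, y, z) coordinate per residue.
    """
    return "".join(
        "*" if any(math.dist(res, het) <= cutoff for het in hetero_coords) else " "
        for res in residue_coords
    )


scaffold, design = "ACDE", "ACGE"
residues = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
hetero_atoms = [(1.0, 0.0, 0.0)]
print("Lig_bind ", lig_bind_row(residues, hetero_atoms))
print("Scaffold ", scaffold)
print("         ", identity_row(scaffold, design))
print("Design   ", design)
print(percent_identity(scaffold, design))
```

Positions carrying both a star and a '|' are binding-site residues that the design has kept unchanged.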
- Modeling of the designed sequences takes a few more hours on the I-TASSER webserver. The EvoDesign results
are made available as soon as the EvoDesign job is finished. At this stage, the I-TASSER prediction columns (last three columns) are
marked by '-' (hyphen). These cells are filled in as the I-TASSER results become available.
- All the linked data are marked by gray-colored cells.
- The name of your protein (as entered) appears at the top of the table. If you provide none, the name
"your_protein" is displayed.
1. Yang Zhang, Jeffrey Skolnick (2005). TM-align: A protein structure alignment algorithm based on TM-score.
Nucleic Acids Research, 33:2302-2309.
2. Yang Zhang, Jeffrey Skolnick (2004). SPICKER: Approach to clustering protein structures for near-native
model selection. Journal of Computational Chemistry, 25:865-871.
3. Andrea Bazzoli, Andrea G. B. Tettamanzi, Yang Zhang (2011). Computational Protein Design and Large-Scale
Assessment by I-TASSER Structure Assembly Simulations. Journal of Molecular Biology, 407:764-776.
4. Pralay Mitra, David Shultis, Jeffrey R. Brender, Jeff Czajka, David Marsh, Felicia Gray, Tomasz Cierpicki,
Yang Zhang (2013). An evolution-based approach for de novo protein design and a case study on
Mycobacterium tuberculosis. PLOS Computational Biology, 9:e1003298.
5. Robin Pearce, Dani Setiawan, Xiaoqiang Huang, Yang Zhang (2018). EvoDesign: Designing protein-protein binding
interactions using evolutionary interface profiles, in preparation.
6. Ambrish Roy, Alper Kucukural, Yang Zhang (2010). I-TASSER: a unified platform for automated protein
structure and function prediction. Nature Protocols, 5:725-738.
7. Jianyi Yang, Renxiang Yan, Ambrish Roy, Dong Xu, Jonathan Poisson, Yang Zhang (2015). The I-TASSER Suite:
Protein structure and function prediction. Nature Methods, 12:7-8.