EvoDesign is an evolution-based approach for de novo protein sequence design. It takes the full-atomic model of a scaffold in PDB format and outputs a list of design sequences along with their percentage sequence identity with the scaffold. EvoDesign also reports the normalized relative error of the predicted secondary structure, solvent accessibility, and backbone torsion angles with respect to the input, which helps the user judge the quality of the design sequences.
EvoDesign: Overview of EvoDesign.
Figure 1. The flow diagram of EvoDesign.
EvoDesign consists of three stages: pre-processing, simulation, and analysis of the data generated by the simulation stage to output the design sequences (Fig. 1).
Pre-processing: For a given scaffold, the Protein Data Bank (PDB) is searched for similar folds using our in-house TM-align program. The primary structures (sequences) of the proteins with similar folds are then aligned to generate a profile. The DSSP program is used to assign the secondary structure, solvent accessibility, and backbone torsion angles of the scaffold.
Simulation: The core of EvoDesign is a Monte Carlo (MC) search of sequence space. The MC simulation starts from a random seed sequence. Ten independent runs, each with 30,000 steps, are initiated in parallel. At each step of the simulation, the sequence is altered by replacing the residues at some randomly selected positions with randomly selected amino acids. The energy of the new sequence is then computed, and the sequence is accepted or rejected according to the Metropolis criterion. The temperature of the Metropolis criterion is set to 0.03 to achieve an average acceptance rate of 15% during the simulation.
Analysis: The sequences from the ten independent runs are combined by picking every 20th sequence, which keeps the total number of sequences to be analyzed within the limits of computer memory. Next, the combined sequences are clustered using the SPICKER algorithm, following the same procedure as used by Bazzoli et al. The process works iteratively to identify the sequence with the maximum number of neighbors. The output of EvoDesign is the seed sequence of each of the top ten (at most) clusters, sorted by cluster size.
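The simulation stage can be illustrated with a minimal sketch of a Metropolis Monte Carlo search over sequences. The energy function `toy_energy`, the single-position mutation move, and the function names are assumptions made for illustration only; they are not EvoDesign's actual energy function or move set. Only the temperature (0.03), the number of steps, and the thinning to every 20th sequence are taken from the description above, and the thinning is applied here during the run for simplicity.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_energy(seq, target):
    """Hypothetical energy: number of positions that differ from a target.
    A stand-in for EvoDesign's real energy function, which is not shown here."""
    return sum(a != b for a, b in zip(seq, target))

def metropolis_run(target, n_steps=30000, temperature=0.03, keep_every=20, rng=None):
    """Run one MC trajectory and return every `keep_every`-th (sequence, energy)."""
    rng = rng or random.Random()
    # Start from a random seed sequence with the same length as the target.
    seq = [rng.choice(AMINO_ACIDS) for _ in target]
    energy = toy_energy(seq, target)
    kept = []
    for step in range(1, n_steps + 1):
        # Propose a move: replace the residue at one randomly chosen position.
        pos = rng.randrange(len(seq))
        old_residue = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_energy = toy_energy(seq, target)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill or neutral moves;
        # accept uphill moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
        else:
            seq[pos] = old_residue  # rejected: restore the previous residue
        if step % keep_every == 0:
            kept.append(("".join(seq), energy))
    return kept

# Example: one short trajectory against a made-up 10-residue target.
trajectory = metropolis_run("ACDEFGHIKL", n_steps=2000, rng=random.Random(0))
```

In a setup like the one described, ten such trajectories would be run independently and their thinned sequences pooled before clustering.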
EvoDesign: Basic Information.
The basic requirement of EvoDesign is a protein structure in PDB format. Currently, EvoDesign supports single-chain proteins without major structural errors such as backbone breaks. The user can either paste the PDB-format structure into the text box provided or upload a PDB-format structure file from the local computer.
EvoDesign: Advanced Options.
Structural profile cutoff: By default, EvoDesign looks for highly similar folds in the PDB (TM-score > 0.7) for structural profile construction. The number of proteins in the profile therefore differs from one scaffold to another, depending on the fold type. It has been shown that at least ten proteins are required to avoid sequence bias during the design simulation, so the user may opt for a lower cutoff for profile construction if the fold of the scaffold is novel. Note, however, that the evolutionary information content decreases as the TM-score cutoff decreases, so the threshold should be chosen as a balance between these two effects.
Energy function: For computational efficiency, the default energy setting of EvoDesign is the Evolution-based energy function only. Our benchmarking shows that the evolution-based energy function alone can design reasonably good sequences. Nonetheless, for experimental purposes, we suggest switching to the Evolution and physics-based energy function. Note that the Evolution and physics-based energy function is 2-3 times slower than the Evolution-based function alone; the extra time is due to computationally intensive side-chain refinement. A detailed discussion of the energy function can be found in Mitra et al.
Restriction on residues: The user can control the design by restricting one or more residue types (for example, the user may wish to restrict CYS in the design sequences) or by fixing some of the residues, specified by their residue IDs.
Model designed sequences using I-TASSER: The user can model the designed sequences with the I-TASSER program by checking the option 'Yes'. Since this step demands substantial computing resources, the default option is 'No'. We urge users to use this option responsibly. Alternatively, the user can submit the designed sequences to the I-TASSER webserver for modeling.
EvoDesign: Optional Information.
Name of your protein: This is purely for the user's own bookkeeping. We suggest providing a protein name to distinguish between EvoDesign runs for different proteins.
Email: Your email address will be used to send you a job completion notification with a link to the result page. If you do not wish to provide your email address, please bookmark the link that is displayed immediately after successful job submission.
EvoDesign: Understanding the output.
The server automatically removes all the data which are older than 90 days. There is no way to retrieve deleted data.
Figure 2 depicts the result page of the EvoDesign server. Region A and Region B appear as soon as the job is submitted. A "Job running" status is displayed below Region B for as long as the job is running on the back-end server. On successful completion of the job, Region C appears.
Region A: This region displays the input information for EvoDesign. While the job is running, the input scaffold structure as submitted is hyperlinked. Upon completion of the job, the scaffold structure actually used for the design is hyperlinked. These two structures differ if the uploaded structure is C-alpha only.
The second line indicates the lower cutoff on the TM-score used for profile construction.
The third line of this region indicates the energy function used by EvoDesign.
Figure 2. Screen shot of the result page of EvoDesign.
Region B: The scaffold structure will be displayed using Jmol (an open-source Java viewer for chemical structures in 3D). The only requirement for visualization is that your browser must be updated with the current version of Java.
Region C: A summary of the EvoDesign data, the links to download them, and the design sequences are tabulated in this region.
EvoDesign Rank: The first column links to the alignment of each EvoDesign sequence (the seed sequence of the cluster) with the scaffold sequence and the features of the scaffold (see Region D). The table is sorted by EvoDesign rank. At most ten design sequences are output, ranked 1 through 10.
EvoDesign Score: The confidence of the design is given by the EvoDesign Score in the second column. A lower EvoDesign Score indicates higher confidence.
Sequence Identity: The percentage identity of each design sequence with the scaffold sequence is computed and reported in the Sequence Identity column of the table.
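As a rough illustration, percentage identity between a design sequence and the scaffold sequence can be computed as below. The function name is hypothetical, and the sketch assumes that the two sequences have the same length and are compared position by position without gaps, which the text above does not state explicitly.

```python
def percent_identity(design, scaffold):
    """Percentage of positions at which the design and scaffold residues match.
    Assumes an ungapped, position-by-position comparison (an assumption)."""
    if len(design) != len(scaffold):
        raise ValueError("sequences must have the same length")
    matches = sum(d == s for d, s in zip(design, scaffold))
    return 100.0 * matches / len(scaffold)

# Made-up sequences: three of the four positions are identical.
identity = percent_identity("ACDE", "ACDF")
```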
Normalized Relative Error (NRE): The normalized relative error, or NRE, measures the error of a design sequence relative to the scaffold sequence. If the error on the scaffold sequence is Error(scaffold) and that on the design sequence is Error(design), then the NRE of the design sequence is defined as [Error(design) - Error(scaffold)] / Error(scaffold). A negative NRE value therefore indicates that the design sequence has less error than the scaffold sequence. If the error on the scaffold sequence is zero, the error on the design sequence is instead normalized by the number of residues. The quality of a design is estimated by the NRE for secondary structure, solvent accessibility, and backbone torsion angles. EvoDesign outputs all of this information along with each design sequence: for each sequence (each row), columns 3-6 report these values.
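The NRE definition can be sketched in a few lines. The function name and arguments are hypothetical, and the handling of the zero-error case (dividing the design error by the number of residues) is one possible reading of the sentence above, not a confirmed implementation detail.

```python
def normalized_relative_error(error_design, error_scaffold, n_residues):
    """NRE = (Error(design) - Error(scaffold)) / Error(scaffold)."""
    if error_scaffold == 0:
        # Assumed reading of the zero-error case: normalize the design
        # error by the number of residues instead.
        return error_design / n_residues
    return (error_design - error_scaffold) / error_scaffold

# Made-up errors: the design has less error than the scaffold, so NRE < 0.
nre = normalized_relative_error(0.15, 0.20, n_residues=100)
```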
NRE on Secondary Structure: The secondary structure of the scaffold is assigned by the DSSP program. For the scaffold as well as the design sequences, the secondary structure is predicted by the PSSPred program. The Q3 error of each sequence is then calculated against the assigned secondary structure of the scaffold, and the NRE on secondary structure is computed from the Q3 errors of the design and scaffold sequences.
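A minimal sketch of this calculation follows. It assumes that the Q3 error is the fraction of residues whose predicted three-state class (H, E or C) differs from the DSSP assignment, i.e. 1 - Q3; the function name and the example strings are made up for illustration.

```python
def q3_error(predicted, assigned):
    """Fraction of residues whose predicted 3-state class differs from the
    assigned class (assumed definition of the Q3 error, i.e. 1 - Q3)."""
    if len(predicted) != len(assigned):
        raise ValueError("strings must have the same length")
    mismatches = sum(p != a for p, a in zip(predicted, assigned))
    return mismatches / len(assigned)

# Made-up example strings, not real EvoDesign output.
assigned_ss = "HHHHEEEECC"    # DSSP assignment on the scaffold structure
scaffold_pred = "HHHCEEEECC"  # prediction for the scaffold sequence
design_pred = "HHHCEEECCC"    # prediction for a design sequence

err_scaffold = q3_error(scaffold_pred, assigned_ss)
err_design = q3_error(design_pred, assigned_ss)
# NRE on secondary structure, following the NRE definition given earlier.
nre_ss = (err_design - err_scaffold) / err_scaffold
```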
NRE on Solvent Accessibility: The solvent accessibility of the scaffold is assigned by the DSSP program. For the scaffold as well as the design sequences, the solvent accessibility is predicted by a neural network. The Pearson correlation coefficient between the assigned and the predicted solvent accessibility is computed for each sequence. The normalized relative error on solvent accessibility is defined as the difference between the correlation coefficients of the design sequence and the scaffold sequence, normalized by the correlation coefficient of the scaffold sequence.
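The correlation-based measure can be sketched as follows. The function names and data are hypothetical. The sign convention is an assumption: the difference is taken as scaffold minus design so that, as with the other NRE values, a negative number means the design is predicted better than the scaffold. The text above does not fix the sign unambiguously.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def nre_solvent_accessibility(assigned, pred_scaffold, pred_design):
    """Relative loss of correlation for the design versus the scaffold."""
    r_scaffold = pearson(assigned, pred_scaffold)
    r_design = pearson(assigned, pred_design)
    # Assumed sign: positive when the design correlates worse than the scaffold.
    return (r_scaffold - r_design) / r_scaffold

# Made-up relative accessibilities for a four-residue example.
assigned = [0.1, 0.5, 0.9, 0.3]
nre_sa = nre_solvent_accessibility(assigned, [0.2, 0.6, 1.0, 0.4], [0.9, 0.5, 0.1, 0.3])
```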
NRE on Backbone Torsion Angles: The backbone torsion angles (φ and ψ) of the scaffold are assigned by the DSSP program. For the scaffold as well as the design sequences, the torsion angles are predicted by a neural network. The mean absolute error of each prediction is calculated against the assigned torsion angles, and the normalized relative error is computed from these mean absolute errors.
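A possible sketch of the mean absolute error for torsion angles is shown below. The function name and the angle values are hypothetical. Because angles are periodic, the sketch wraps each difference so that, for example, 170° and -170° are 20° apart; this wrapping is an assumption, as the text above only states that a mean absolute error is used.

```python
def angular_mae(predicted, assigned):
    """Mean absolute error between two lists of angles in degrees,
    with each difference wrapped into [0, 180] (wrapping is an assumption)."""
    if len(predicted) != len(assigned):
        raise ValueError("angle lists must have the same length")
    total = 0.0
    for p, a in zip(predicted, assigned):
        diff = abs(p - a) % 360.0
        total += min(diff, 360.0 - diff)
    return total / len(assigned)

# Made-up phi angles (degrees) for a three-residue example.
assigned_phi = [-60.0, -120.0, 170.0]
mae_scaffold = angular_mae([-50.0, -110.0, -170.0], assigned_phi)
mae_design = angular_mae([-65.0, -115.0, 175.0], assigned_phi)
# NRE on the torsion angle, following the NRE definition given earlier.
nre_phi = (mae_design - mae_scaffold) / mae_scaffold
```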
I-TASSER prediction: The structures of the EvoDesign sequences are predicted by the I-TASSER sequence-to-structure prediction webserver to check how closely the design sequences fold to the input scaffold. The last two columns provide, respectively, the TM-score and the RMSD between the input scaffold and the first model predicted by I-TASSER for the EvoDesign sequence. The I-TASSER model can be downloaded from the Model column.
Data download: To download the scaffold sequence and all the design sequences as a single file, click on SI (last row, third column). The secondary structure of the scaffold structure (as assigned by the DSSP program) and of all the sequences (the scaffold sequence and the ten design sequences, as predicted by the PSSPred program) can be downloaded as a single file from SS (last row, fourth column). Note that the secondary structure states are classified into three categories: Helix (H), Sheet (E) and Coil (C). Solvent accessibility and backbone torsion angles can be downloaded respectively from the fourth, fifth and sixth columns of the last row. Instead of downloading files selectively, the user can download all the parameters and the sequences as a single zip file (Data.zip) from the last row, first column.
Note 1: Modeling the design sequences takes a few more hours on the I-TASSER webserver. The EvoDesign result is made available as soon as the EvoDesign job is finished. At this stage, the I-TASSER prediction columns (last three columns) are marked by '-' (hyphen). These cells are filled in as the I-TASSER results become available.
Note 2: All the linked data are marked by gray colored cells.
Note 3: The name of your protein (as you entered it) is shown at the top of the table. If you did not provide one, "your_protein" is shown.
Region D: The final column of the table links to a text file that contains the hetero-atom binding site information (Lig_bind), the secondary structure assigned by the DSSP program to the scaffold structure (DSSP_SS), the scaffold sequence (Scaffold), and the design sequence (Design). Identical residues are marked by '|' in the penultimate row. If the scaffold structure contains hetero atoms within an 8.0 Å radius of any residue, those residue positions are marked with a star (*) in the Lig_bind row. This helps the user to see quickly whether those residues are conserved. The Lig_bind row is absent if the scaffold structure contains no hetero atoms, or if no hetero atom lies within 8.0 Å of any residue.
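The Lig_bind and identity rows can be illustrated with the sketch below. The function names, the coordinate layout, and the use of a space for unmarked positions are assumptions. The sketch also assumes that a residue counts as being near a hetero atom if any of its atoms lies within the cutoff, which the text above does not specify.

```python
import math

def lig_bind_row(residue_atoms, hetero_atoms, cutoff=8.0):
    """Return a string with '*' for each residue that has at least one atom
    within `cutoff` angstroms of any hetero atom, and ' ' otherwise.
    residue_atoms: one list of (x, y, z) tuples per residue (assumed layout)."""
    row = []
    for atoms in residue_atoms:
        near = any(math.dist(a, h) <= cutoff for a in atoms for h in hetero_atoms)
        row.append("*" if near else " ")
    return "".join(row)

def identity_row(scaffold, design):
    """Return a string with '|' wherever scaffold and design residues match."""
    return "".join("|" if s == d else " " for s, d in zip(scaffold, design))

# Made-up coordinates: the first residue is 5 Å from the hetero atom,
# the second is more than 17 Å away.
marks = lig_bind_row([[(0.0, 0.0, 0.0)], [(20.0, 0.0, 0.0)]], [(3.0, 4.0, 0.0)])
same = identity_row("ACDE", "ACDF")
```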
yangzhanglabumich.edu | (734) 647-1549 | 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218