I-TASSER server for protein structure and function prediction

About I-TASSER server

What is I-TASSER server?

How does I-TASSER generate structure and function predictions?

LOMETS

In the second step, the continuous fragments excised from the PDB templates are reassembled into full-length models by replica-exchange Monte Carlo simulations with the threading unaligned regions (mainly loops) built by ab initio modeling. In cases where no appropriate template is identified by LOMETS, I-TASSER will build the whole structures by ab initio modeling. The low free-energy states are identified by SPICKER through clustering the simulation decoys.
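The replica-exchange idea mentioned above can be illustrated with a minimal sketch. It shows only the textbook Metropolis criterion for exchanging two neighbouring replicas, assuming temperatures are expressed in energy units (k_B = 1). The function names `swap_probability` and `attempt_swaps` are hypothetical and are not taken from the I-TASSER code, whose actual move set and energy function are not described here.

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Textbook acceptance probability for exchanging replicas i and j.

    Accept with probability min(1, exp[(1/T_i - 1/T_j) * (E_i - E_j)]).
    Temperatures are assumed to be in energy units (k_B = 1).
    """
    delta = (1.0 / temp_i - 1.0 / temp_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(energies, temperatures, rng):
    """Try one exchange between each pair of neighbouring replicas.

    Returns the energies after the attempted swaps; in a real simulation
    the conformations would be exchanged together with their energies.
    """
    energies = list(energies)
    for k in range(len(energies) - 1):
        p = swap_probability(energies[k], energies[k + 1],
                             temperatures[k], temperatures[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies


# The cold replica (T=1) holds the higher energy, so the swap is always accepted.
swapped = attempt_swaps([10.0, 5.0], [1.0, 2.0], random.Random(0))
```

Exchanges of this kind let low-energy conformations found at high temperature migrate to the low-temperature replicas, which helps the simulation escape local minima.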

In the third step, the fragment assembly simulation is performed again starting from the SPICKER cluster centroids, where the spatial restraints collected from both the LOMETS templates and the PDB structures identified by TM-align are used to guide the simulations. The purpose of the second iteration is to remove steric clashes as well as to refine the global topology of the cluster centroids. The decoys generated in the second round of simulations are then clustered and the lowest-energy structures are selected. The final full-atomic models are obtained by REMO, which builds the atomic details from the selected I-TASSER decoys through the optimization of the hydrogen-bonding network (see Figure 1).

Figure 1. I-TASSER protocol for protein structure and function prediction.

To predict the biological function of the protein (the last column in Figure 1), the I-TASSER server matches the predicted 3D models to the proteins in three independent libraries, which consist of proteins with known enzyme classification (EC) numbers, gene ontology (GO) vocabulary, and ligand-binding sites. The final function predictions are deduced from the consensus of the top structural matches, with function scores calculated from the confidence score of the I-TASSER structural models, the structural similarity between model and templates as evaluated by TM-score, and the sequence identity in the structurally aligned regions.

How does the I-TASSER server perform compared with other methods?

CASP (or Critical Assessment of Techniques for Protein Structure Prediction)

The I-TASSER server (as "Zhang-Server") participated in the Server Section of the 7th (2006), 8th (2008), 9th (2010), and 10th (2012) CASP experiments, and was ranked as the No. 1 server in CASP7 and CASP8. In CASP9 and CASP10, the I-TASSER server and QUARK (another server from our lab) were ranked as the No. 1 and No. 2 servers, respectively. Detailed ranking results are available for CASP7, CASP8, CASP9, and CASP10. Figure 2 shows histograms of the Z-scores of the GDT-TS scores of all servers in CASP7 (68 servers), CASP8 (72 servers), CASP9 (81 servers), and CASP10 (72 servers).

Figure 2. Histogram of Z-scores of all server groups at CASP7, CASP8, CASP9 and CASP10.

Figure 3 is a summary of COFACTOR, a component of I-TASSER server, in the function prediction section of CASP9, where COFACTOR was registered as "I-TASSER_FUNCTION" and "Zhang" in the server and human prediction sections, respectively. The picture was taken from the presentation by the CASP9 assessor Dr. T Schwede, see http://predictioncenter.org/casp9/doc/presentations/CASP9_FN.pdf.

Figure 3. Mean MCC Z-scores of the best ten groups in the Function Prediction in CASP9.

What is the output of the I-TASSER server if you submit a sequence?

Up to five full-length atomic models (ranked based on cluster density)
Estimated accuracy of the predicted models (including a confidence score of all models, and predicted TM-score and RMSD for the first model)
GIF images of the predicted models
Predicted secondary structures
Predicted solvent accessibility
Top 10 threading alignments from LOMETS
Top 10 proteins in PDB which are structurally closest to the predicted models
Predicted Enzyme Classification and the confidence score
Predicted GO terms and the confidence score
Predicted ligand-binding sites and the confidence score
An image of the predicted ligand-binding sites

How to interpret the output data generated by the I-TASSER server?

What are the 'top 10 templates used by I-TASSER'?
I-TASSER modeling starts from the structure templates identified by LOMETS from the PDB library. LOMETS is a meta-server threading approach containing multiple threading programs, where each program can generate tens of thousands of templates. I-TASSER only uses the templates of the highest significance in the threading alignments, which are measured by the Z-score (the difference between the raw and average scores in the unit of standard deviation). The top 10 templates are the 10 templates selected from the LOMETS threading programs. Usually, one (or two) template of the highest Z-score is selected from each threading program, where the threading programs are sorted by the average performance in the large-scale benchmark test experiments.
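The Z-score definition above (raw score minus the average, in units of standard deviation) can be sketched as follows. The template names and raw scores are hypothetical, and the sketch assumes that a higher raw score means a better alignment, which may not hold for every threading program.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Convert raw threading scores to Z-scores: (raw - mean) / std."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)
    if sigma == 0:
        raise ValueError("all scores are identical; Z-score is undefined")
    return [(s - mu) / sigma for s in raw_scores]


def top_template(scores_by_template):
    """Return the (name, Z-score) of the most significant template."""
    names = list(scores_by_template)
    zs = z_scores([scores_by_template[n] for n in names])
    best = max(range(len(names)), key=lambda i: zs[i])
    return names[best], zs[best]


# Hypothetical raw scores from a single threading program.
name, z = top_template({"tplA": 1.0, "tplB": 5.0, "tplC": 3.0})
```

Because each threading program has its own raw-score scale, normalizing to Z-scores makes the significance of hits comparable within a program before the top templates are collected.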
What are the 'top 5 models predicted by I-TASSER'?
For each target, the I-TASSER simulations generate tens of thousands of conformations (called decoys). To select the final models, I-TASSER uses the SPICKER program to cluster all the decoys based on pairwise structural similarity, and reports up to five models, which correspond to the five largest structure clusters. In Monte Carlo theory, the largest clusters correspond to the states with the largest partition function (or lowest free energy) and therefore have the highest confidence. The confidence of each model is quantitatively measured by the C-score (see below). Since the top 5 models are ranked by cluster size, it is possible for lower-rank models to have a higher C-score. Although the first model has a higher C-score and better quality in most cases, it is not unusual for a lower-rank model to have better quality than a higher-rank model. If the I-TASSER simulations converge, fewer than 5 clusters may be generated. This is usually an indication that the models have good quality because the simulations have converged.
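Ranking by cluster size can be illustrated with a simplified greedy clustering sketch. This is not the SPICKER implementation: it assumes a precomputed pairwise distance matrix and a fixed cutoff, repeatedly takes the decoy with the most neighbours as a cluster center, and removes that cluster before continuing. The function name `greedy_clusters` and the toy one-dimensional "decoys" are hypothetical.

```python
def greedy_clusters(dist, cutoff, max_clusters=5):
    """Greedy clustering on a symmetric pairwise distance matrix.

    Returns a list of (center_index, member_indices), largest cluster first.
    """
    remaining = set(range(len(dist)))
    clusters = []

    def neighbours(i):
        # Decoys still unassigned that lie within the cutoff of decoy i.
        return [j for j in remaining if dist[i][j] <= cutoff]

    while remaining and len(clusters) < max_clusters:
        # Sorting makes the tie-break deterministic (lowest index wins).
        center = max(sorted(remaining), key=lambda i: len(neighbours(i)))
        members = sorted(neighbours(center))
        clusters.append((center, members))
        remaining -= set(members)
    return clusters


# Toy example: five "decoys" placed on a line; distance is |x_i - x_j|.
positions = [0.0, 0.1, 0.2, 5.0, 5.1]
dist = [[abs(a - b) for b in positions] for a in positions]
clusters = greedy_clusters(dist, cutoff=0.5)
```

In this toy case the decoys fall into only two clusters even though up to five are allowed, which mirrors the situation where a converged simulation yields fewer than five models.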
What are 'Proteins structurally close to the target in the PDB'?
After the structure-assembly simulation, I-TASSER uses the TM-align program to match the first I-TASSER model against all structures in the PDB library. This section reports the top 10 proteins from the PDB that have the closest structural similarity (i.e. the highest TM-score) to the predicted I-TASSER model. Because of the structural similarity, these proteins often have a function similar to that of the target. However, users are encouraged to use the data in 'Predicted function using COACH' to infer the biological function of the target protein, since COACH has been extensively trained to derive function from multiple sources of sequence and structure features and has, on average, a much higher accuracy than function annotations derived only from global structure comparison.
What is C-score?
C-score is a confidence score for estimating the quality of the models predicted by I-TASSER. It is calculated based on the significance of the threading template alignments and the convergence parameters of the structure assembly simulations. C-score is typically in the range of [-5, 2], where a higher C-score signifies a model with higher confidence and vice versa.
What is TM-score?
TM-score is a scale proposed for measuring the structural similarity between two structures (see Zhang and Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 2004 57: 702-710). TM-score was proposed to address a weakness of RMSD, which is sensitive to local errors. Because RMSD is an average distance over all residue pairs in two structures, a local error (e.g. a misorientation of the tail) gives rise to a large RMSD value even though the global topology is correct. In TM-score, however, small distances are weighted more strongly than large distances, which makes the score insensitive to local modeling errors. A TM-score >0.5 indicates a model of correct topology and a TM-score <0.17 means a random similarity. These cutoffs do not depend on the protein length.
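The contrast between RMSD and TM-score can be illustrated with a minimal sketch that uses the TM-score form from Zhang and Skolnick (2004), with d0 = 1.24 (L - 15)^(1/3) - 1.8. It assumes the two structures are already superposed and that per-residue distances are given; the real TM-score also searches for the superposition that maximizes the score. The fallback d0 = 0.5 for very short chains is an assumption made here to keep the formula well defined, and the distances in the example are hypothetical.

```python
import math


def tm_score(distances, target_length):
    """TM-score for a fixed superposition.

    distances: per-residue distances (in Angstroms) of aligned residue pairs.
    target_length: number of residues in the target (native) structure.
    """
    if target_length > 21:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # assumed floor for very short chains
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def rmsd(distances):
    """Root-mean-square deviation over the same aligned pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical 100-residue model: 90 residues within 1 A of the native,
# plus a misoriented 10-residue tail that is 20 A away.
distances = [1.0] * 90 + [20.0] * 10
score = tm_score(distances, target_length=100)
deviation = rmsd(distances)
```

In this example the misplaced tail pushes the RMSD above 6 Å, while the TM-score stays well above 0.5 because each tail residue contributes almost nothing to the sum and the well-modeled core dominates.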
What is the difference and relationship between C-score and TM-score?
TM-score (or RMSD) is a known standard for measuring the structural similarity between two structures, and is usually used to measure the accuracy of structure modeling when the native structure is known, while C-score is a metric that I-TASSER developed to estimate the confidence of the modeling. When the native structure is not known, it becomes necessary to predict the quality of the modeling prediction, i.e. what is the distance between the predicted model and the native structure? To answer this question, we tried to predict the TM-score and RMSD of the predicted models relative to the native structures based on the C-score.
In a benchmark test set of 500 non-homologous proteins, we found that C-score is highly correlated with TM-score and RMSD. The correlation coefficient of the C-score of the first model with the TM-score to the native structure is 0.91, while the correlation coefficient of the C-score with the RMSD to the native structure is 0.75. These data lay the basis for the reliable prediction of TM-score and RMSD using C-score. In the output section, I-TASSER only reports the quality prediction (TM-score and RMSD) for the first model, because the correlation between C-score and TM-score was found to be weak for lower-rank models. However, the C-score is listed for all models for reference.
Why do some lower-rank models have a higher C-score?
We have found that cluster size is more robust than C-score for ranking the predicted models. The final I-TASSER models are therefore ranked based on cluster size rather than C-score in the output. Nevertheless, the C-score has a strong correlation with the quality of the final models, and it has been used to quantitatively estimate the RMSD and TM-score of the final models relative to the native structure. Unfortunately, such a strong correlation only occurs for the first predicted model from the largest cluster. Thus, the C-scores of the lower-rank models (i.e., models 2-5) are listed only for reference, and a comparison among them is not advised. In other words, even though a lower-rank model may have a higher C-score than the first model in some cases, the first model is on average the most reliable and should be preferred unless there are special reasons to choose another model (e.g., biological insight or experimental data).

How to use known information (e.g. templates and function) to improve I-TASSER modeling?

If users know some information about the structure of the modeled proteins, the information can be conveniently uploaded to the I-TASSER server. The information can significantly improve the quality of structural and function predictions.

The I-TASSER server currently accepts two types of user-specified restraints:

Assign contact/distance restraints: If you know which atom pairs should be in contact or at specific distances, you can use this option to upload a text file containing the contact and/or distance information for those atom pairs.
Specify template without alignment: If you want I-TASSER to use a specific PDB structure as a template, you can use this option to specify the PDB structure. You only need to type in the PDBID:ChainID, e.g. 1wor:A, without specifying the target-template alignment. If the chain information is not present in the PDB file, indicate the ChainID using "_". I-TASSER will first fetch the structure from the PDB library and then generate the target-template alignment using our in-house alignment tool, MUSTER.
Specify template without alignment: You can also use any 3D structure as the template, even one that does not exist in the PDB library. In this case, you can use this option to upload the 3D structure. The structure file must be in the standard PDB format. You do not need to input the target-template alignment. I-TASSER will generate the target-template alignment using our in-house alignment tool, MUSTER.
Specify template with alignment: This option allows you (usually the advanced users) to specify both template structure and the target-template alignment.

Can I exclude some proteins from the I-TASSER template library?

I-TASSER needs templates to generate high-resolution structure predictions. In general, excluding close templates will decrease the quality of the I-TASSER modeling. However, users can exclude some templates from the I-TASSER template library for special purposes (e.g. knowing that some templates differ from the target, or benchmark testing of the current algorithms).

The I-TASSER server accepts two ways of excluding templates:

Exclude templates that are homologous to the query protein: Users can use this option to exclude templates that are homologous to the query protein from the I-TASSER template library. Homology is defined by a sequence identity cutoff, where sequence identity is the number of identical residues between template and query divided by the total number of residues in the query sequence. For example, if you type "60%", I-TASSER will automatically exclude all templates that have a sequence identity >60% to the query protein. The minimum cutoff is set at 25%, and any value below 25% is treated as 25%.
Exclude specific template proteins: This option allows users to upload a list of template structures that will be excluded from the I-TASSER template library. As the PDB library is redundant and the same protein can exist as multiple entries, the I-TASSER server will by default exclude the user-specified templates as well as all templates that have a sequence identity >90% to the specified templates. Users can also specify a different sequence identity cutoff, e.g. 70%, in which case I-TASSER will exclude all templates with a sequence identity >70% to the specified template proteins.
The format of the file should be "PDBID:ChainID %Sequence_Identity", e.g.
OR
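The sequence-identity definition and the 25% minimum cutoff described above can be sketched as follows. This is an illustration only: it assumes the query and template are supplied as pre-aligned strings of equal length with "-" for gaps, and the function names are hypothetical rather than part of the I-TASSER server.

```python
def sequence_identity(aligned_query, aligned_template):
    """Identical aligned residues divided by the number of query residues."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    query_length = sum(1 for c in aligned_query if c != "-")
    if query_length == 0:
        raise ValueError("query sequence is empty")
    identical = sum(
        1 for q, t in zip(aligned_query, aligned_template)
        if q == t and q != "-"
    )
    return identical / query_length


def effective_cutoff(requested, minimum=0.25):
    """Cutoffs below the minimum (25%) are raised to the minimum."""
    return max(requested, minimum)


def is_excluded(identity, requested_cutoff):
    """A template is excluded when its identity exceeds the cutoff."""
    return identity > effective_cutoff(requested_cutoff)


# Hypothetical alignment: 4 of the 5 query residues are identical (80%).
identity = sequence_identity("AC-DEF", "ACWDQF")
excluded_at_60 = is_excluded(identity, 0.60)
```

Note that the denominator is the query length rather than the alignment length, so a short template that matches only part of a long query receives a low identity even if every aligned residue is identical.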

How long does it take for I-TASSER to generate the predictions for your protein?

Currently, the most time-consuming part of the I-TASSER protocol is the structure assembly and refinement simulations. For users who want a quicker response or who do not need refined models, we recommend our LOMETS (meta-server) or MUSTER (single-server fold-recognition) servers. Because these two servers do not attempt to refine the threading models, their response time is faster than that of the I-TASSER server.

What is new?

2022/12/15: An updated version of I-TASSER, D-I-TASSER (as 'UM-TBM'), was ranked as the No. 1 server in two categories of protein structure prediction in the 15th CASP experiment (Multi-domain Targets and Single-domain Targets), and DMFold (as 'Zheng') was ranked as the No. 1 predictor in the protein complex structure prediction category (i.e., Multi-chain Targets).
2022/04/13: A new platform, I-TASSER-MTD, specifically designed to model structure and function of multi-domain proteins, was accepted for publication in Nature Protocols.
2022/02/08: CR-I-TASSER couples I-TASSER simulation with cryo-EM density maps and significantly improves accuracy of protein structure determination; the article was published in Nature Methods.
2021/07/26: A new version of I-TASSER, C-I-TASSER, was published in Cell Reports Methods. It incorporates deep-learning contact maps and significantly improves I-TASSER's ability to model non-homologous proteins.
2020/12/01: I-TASSER (as 'Zhang-Server') was ranked as the No. 1 protein structure prediction server in the 14th CASP experiment.

How to cite I-TASSER

J Yang, R Yan, A Roy, D Xu, J Poisson, Y Zhang. The I-TASSER Suite: Protein structure and function prediction. Nature Methods, 12: 7-8 (2015).
A Roy, A Kucukural, Y Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, 5: 725-738 (2010).
Y Zhang. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics, 9: 40 (2008).

Funding support

NSF Career Award 0746198

1027394

NSF ABI Award 1564756

NSF III Award 1901191

Contact information

I-TASSER Message Board

yangzhanglab

umich.edu | (734) 647-1549 | 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218