INSTALLATION AND IMPLEMENTATION OF I-TASSER SUITE
   (Copyright 2012 by Zhang Lab, University of Michigan, All rights reserved)
                    (Version 2.1, 2012/10/23)

1. What is I-TASSER Suite?
   
   The I-TASSER Suite is a composite package of programs for protein
   structure prediction and protein function annotations. The Suite
   includes the following programs:

   a) I-TASSER: A hierarchical program for protein structure prediction
   b) MUSTER: A threading program for protein template identification
   c) LOMETS: A meta-server approach consisting of multiple threading programs
   d) SPICKER: A clustering program for structure decoy selection
   e) HAAD: Quickly adds hydrogen atoms to protein heavy-atom structures
   f) EDTSurf: Constructs triangulated surfaces of protein molecules
   g) ModRefiner: Constructs and refines atomic models from C-alpha traces
   h) NWalign: Protein sequence alignments by Needleman-Wunsch algorithm
   i) PSSpred: A program for Protein Secondary Structure PREDiction

2. How to install the I-TASSER Suite?

   a) Download the I-TASSER Suite 'I-TASSER2.1.tar.gz' from
      http://zhanglab.dcmb.med.umich.edu/I-TASSER/download/download.php

   b) Unpack 'I-TASSER2.1.tar.gz' by
      > tar -zxvf I-TASSER2.1.tar.gz
      The root path of this package is called $libdir, e.g.
      /home/yourname/I-TASSER. You should see all the programs in this
      directory.

3. Bug report:

   Please report bugs and problems to yangzhanglab@umich.edu or post them
   at the I-TASSER message board:
   http://zhanglab.dcmb.med.umich.edu/forum

   #######################################################
   #                                                     #
   #  4. Installation and implementation of I-TASSER     #
   #                                                     #
   #######################################################
   
4.1. Introduction of I-TASSER
   
   I-TASSER is an integrated package for protein structure prediction. For a 
   given sequence, I-TASSER first identifies template proteins from the 
   Protein Data Bank (PDB) by multiple threading techniques (LOMETS). The 
   continuous fragments excised from the template alignments are used to 
   assemble full-length models by iterative Monte Carlo simulations. The best 
   models are then selected from the Monte Carlo trajectories by decoy 
   clustering. The final atomic models are rebuilt from the structure clusters 
   by atomic-level structural refinements.

4.2. Files needed to download in addition to 'I-TASSER2.1.tar.gz':
   
   a) Download the I-TASSER library files from
      http://zhanglab.dcmb.med.umich.edu/library/
      and copy the files and directories to $libdir/.
   b) Download the non-redundant sequence database from
      http://zhanglab.dcmb.med.umich.edu/cgi-bin/download_ftp.cgi?ID=nr.tar.gz
      and decompress the file to $libdir/nr. You can also download the newest
      nr database from NCBI ftp://ftp.ncbi.nlm.nih.gov/blast/db/ (recommended).

4.3. How to run I-TASSER?
   
   a) The main script for running I-TASSER is $libdir/I-TASSERmod/runI-TASSER.pl.
      Running it without arguments prints help information describing the
      arguments.

   b) Four arguments are mandatory. One example is:

      "$libdir/I-TASSERmod/runI-TASSER.pl -libdir /home/yourname/I-TASSER -seqname 1a2bA -datadir /home/yourname/1a2bA -usrname yourname"

      -libdir  means the path of the I-TASSER package 
      -seqname means the unique name of your query sequence
      -datadir means the directory which contains your sequence 
      -usrname means your user name in the computer.

   c) Thirteen other arguments are optional, with default values already set.
      You can override one or more of them. One example command line is:

      "$libdir/I-TASSERmod/runI-TASSER.pl -libdir /home/yourname/I-TASSER -seqname 1a2bA -datadir /home/yourname/1a2bA -usrname yourname -runstyle parallel -homoflag benchmark -idcut 0.3 -ntemp 15 -nmodel 3"

      -runstyle   default value is "serial" which means running I-TASSER simulation sequentially, 
                  "parallel" means running parallel simulation jobs in the cluster
      -homoflag   default value is "real" which means using all templates, "benchmark" means 
                  excluding homologous templates
      -idcut      sequence identity cutoff for "benchmark" runs, default value is 0.3, range 
                  is in [0,1]
      -ntemp      number of top templates output for each threading program, default value is 
                  20, range is in [1,50]
      -nmodel     number of final models output by I-TASSER, default value is 5, range is in [1,10]
      -restraint1 specify distance/contact restraints 
                  (read more at http://zhanglab.dcmb.med.umich.edu/I-TASSER/option1.html)
      -restraint2 specify template with alignment 
                  (read more at http://zhanglab.dcmb.med.umich.edu/I-TASSER/option4.html)
      -restraint3 specify template name without alignment 
                  (read more at http://zhanglab.dcmb.med.umich.edu/I-TASSER/option2.html)
      -restraint4 specify template file without alignment 
                  (read more at http://zhanglab.dcmb.med.umich.edu/I-TASSER/option3.html)
      -temp_excl  exclude specific templates from template library 
                  (read more at http://zhanglab.dcmb.med.umich.edu/I-TASSER/option6.html)
      -traj       deposit the trajectory files when this option is given
      -light      run I-TASSER in light mode when this option is given (each simulation
                  runs for at most five hours)
      -java_home  specify the path of java in your machine (default path is /usr/java/latest)
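   The command lines above can also be assembled programmatically. The
   following is a minimal sketch, not part of the I-TASSER Suite: the helper
   name build_itasser_command is hypothetical, and it only covers the five
   optional arguments whose values and ranges are documented above. It checks
   the documented ranges before returning an argument list that could be passed
   to a process launcher.

```python
# Hypothetical helper (not shipped with I-TASSER): builds the argument list
# for runI-TASSER.pl and checks the ranges documented in section 4.3.
RANGES = {"idcut": (0.0, 1.0), "ntemp": (1, 50), "nmodel": (1, 10)}
CHOICES = {"runstyle": ("serial", "parallel"),
           "homoflag": ("real", "benchmark")}


def build_itasser_command(libdir, seqname, datadir, usrname, **options):
    """Return the runI-TASSER.pl command as a list of strings."""
    cmd = [f"{libdir}/I-TASSERmod/runI-TASSER.pl",
           "-libdir", libdir, "-seqname", seqname,
           "-datadir", datadir, "-usrname", usrname]
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"-{name} must be in [{low},{high}]")
        elif name in CHOICES:
            if value not in CHOICES[name]:
                raise ValueError(f"-{name} must be one of {CHOICES[name]}")
        else:
            # Other options are intentionally not handled in this sketch.
            raise ValueError(f"unsupported option in this sketch: {name}")
        cmd += [f"-{name}", str(value)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_itasser_command(
        "/home/yourname/I-TASSER", "1a2bA", "/home/yourname/1a2bA",
        "yourname", runstyle="parallel", homoflag="benchmark",
        idcut=0.3, ntemp=15, nmodel=3)))
```

   Validating before launching avoids starting a long job with an
   out-of-range value such as -nmodel 11.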

   NOTE: 
   a) Outline of steps for running I-TASSER (runI-TASSER.pl):
      a1) standardize 'seq.fasta' to 'seq.txt' and get the sequence length
      a2) run 'psiblast' to generate 'chk', 'out', 'pssm', 'mtx' files
          run 'PSSpred' to get 'seq.dat', 'seq.dat.ss'
          run 'solve' to get 'exp.dat'
          run 'pairmod' to get 'pair1.dat' and 'pair3.dat'
      a3) run threading programs sequentially
          run 'mkinit.pl' to generate restraints
      a4) run I-TASSER simulation
      a5) run SPICKER clustering program
          run 'get_cscore.pl' to get confidence score
          run 'EMrefinement.pl' to get full-atomic models
   b) 'seq.fasta' is the query sequence file in fasta format, which is the
      only needed input file for running I-TASSER. This file should be
      put in $datadir before running this job.
   c) I-TASSER structure assembly simulations contain 14 independent
      runs by default. This number can be modified if the user wants to run
      more simulations, especially for large proteins without good templates.
   d) If working with a cluster of multiple nodes, it is recommended to set 
      $runstyle="parallel". Parallel jobs will run faster since jobs are 
      distributed among different nodes. The default setting $runstyle="serial"
      will run all the jobs on a single computer.
   e) If you have run part of the job and encountered an error, you can
      rerun the main script without modification. It will check the existing
      files and resume from the correct step.
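   Since 'seq.fasta' in $datadir is the only required input, preparing a job
   amounts to writing that one file. The following is a minimal sketch under
   stated assumptions: the helper names write_query and fasta_length are
   hypothetical, and the length count only illustrates the idea of step a1; it
   does not reproduce the 'seq.txt' format that runI-TASSER.pl generates.

```python
from pathlib import Path


def write_query(datadir, seqname, sequence):
    """Create datadir if needed and write the query as datadir/seq.fasta."""
    datadir = Path(datadir)
    datadir.mkdir(parents=True, exist_ok=True)
    fasta = datadir / "seq.fasta"
    fasta.write_text(f">{seqname}\n{sequence}\n")
    return fasta


def fasta_length(fasta_path):
    """Return the number of residues in the first record of a FASTA file."""
    length = 0
    seen_header = False
    for line in Path(fasta_path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header or length:
                break  # a second record starts here; stop counting
            seen_header = True
        elif line:
            length += len(line)
    return length
```

   For example, write_query("/home/yourname/1a2bA", "1a2bA", sequence) places
   the file where the -datadir argument in the examples above expects it.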

4.4. System requirement:

   a) x86_64 machine running a Linux operating system
   b) Perl and Java must be installed.
   c) Basic compression and decompression tools must be installed,
      providing /bin/tar and /usr/bin/bunzip2.
   d) If you are using a cluster, the job management software must provide
      /opt/torque/bin/qsub and /opt/torque/bin/qstat.
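   These requirements can be checked before starting a run. The following is
   a minimal sketch, not part of the I-TASSER Suite: the function name
   missing_tools is hypothetical, and looking up 'perl' and 'java' on the
   search path is an assumption, since only the tar, bunzip2, qsub and qstat
   locations are given above.

```python
import os
import shutil


def missing_tools(absolute_paths, commands):
    """Return the entries that are not available on this machine.

    absolute_paths: files that must exist and be executable at that path.
    commands: program names that must be found on the search path.
    """
    missing = [p for p in absolute_paths if not os.access(p, os.X_OK)]
    missing += [c for c in commands if shutil.which(c) is None]
    return missing


if __name__ == "__main__":
    paths = ["/bin/tar", "/usr/bin/bunzip2"]
    # On a cluster, also check the job management commands:
    # paths += ["/opt/torque/bin/qsub", "/opt/torque/bin/qstat"]
    print("missing:", missing_tools(paths, ["perl", "java"]))
```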

4.5. How to cite I-TASSER?

   If you are using the I-TASSER package, you can cite:

   1. Y Zhang. I-TASSER server for protein 3D structure prediction. 
      BMC Bioinformatics, 9: 40 (2008).
   2. A Roy, A Kucukural, Y Zhang. I-TASSER: a unified platform 
      for automated protein structure and function prediction. 
      Nature Protocols, 5: 725-738 (2010).

   #######################################################
   #                                                     #
   #  5. Installation and implementation of MUSTER       #
   #                                                     #
   #######################################################
   
5.1. Introduction of MUSTER
   
   MUSTER (MUlti-Sources ThreadER) is a protein threading algorithm to 
   identify the template structures from the PDB library. It generates 
   sequence-template alignments by combining sequence profile-profile 
   alignment with multiple structural information.

5.2. How to install MUSTER program?

   When you unpack the I-TASSER Suite, the MUSTER program is already installed.

5.3. How to run MUSTER program?

   The MUSTER main script is $libdir/I-TASSERmod/runMUSTER.pl. The run
   options of this program are similar to those of runI-TASSER.pl. Running
   the program without arguments prints all the run options.

5.4. How to cite MUSTER?

   If you are using the MUSTER program, you can cite:

   S Wu, Y Zhang. MUSTER: Improving protein sequence profile-profile 
   alignments by using multiple sources of structure information. 
   Proteins, 72: 547-556 (2008).

   #######################################################
   #                                                     #
   #  6. Installation and implementation of LOMETS       #
   #                                                     #
   #######################################################
   
6.1. Introduction of LOMETS
   
   LOMETS (Local Meta-Threading-Server) is a meta-server approach to protein
   fold-recognition. It consists of 8 individual threading programs: MUSTER,
   PPA, dPPA, dPPA2, sPPA, wPPA, wdPPA, wMUSTER. The last 7 programs are
   variants of MUSTER that include different optimized energy terms.

6.2. How to install LOMETS program?

   When you unpack the I-TASSER Suite, the LOMETS programs are already installed.

6.3. How to run LOMETS program?

   The LOMETS main script is $libdir/I-TASSERmod/runLOMETS.pl. The run
   options of this program are similar to those of 'runI-TASSER.pl'. Running
   the program without arguments prints all the run options.

6.4. How to cite LOMETS?

   If you are using the LOMETS program, you can cite:

   S Wu, Y Zhang. LOMETS: A local meta-threading-server for protein 
   structure prediction. Nucleic Acids Research, 35: 3375-3382 (2007).


   #######################################################
   #                                                     #
   #  7. Installation and implementation of SPICKER      #
   #                                                     #
   #######################################################
   
7.1. Introduction of SPICKER
   
   SPICKER is a clustering algorithm to identify the near-native models 
   from a pool of protein structure decoys.

7.2. How to install SPICKER program?

   When you unpack the I-TASSER Suite, the SPICKER program is already installed
   at $libdir/I-TASSERmod/spicker45d

7.3. How to run SPICKER program?

   To run SPICKER, you need to prepare the following input files:
       'rmsinp'---Mandatory, length of protein & piece for RMSD calculation;
       'seq.dat'--Mandatory, sequence file, for output of PDB models.
       'tra.in'---Mandatory, list of trajectory names used for clustering.
                  In the first line of 'tra.in', there are 3 parameters:
                  par1: number of decoy files
                  par2: 1, default cutoff, best for decoys from template-based 
                           modeling; 
                       -1, cutoff based on variation, best for decoys from 
                           ab initio modeling.
                  par3: 1, closc from all decoys; -1, closc from clustered decoys
                  The second and following lines give the names of the files that
                  contain coordinates of 3D structure decoys. All these files are
                  mandatory. See the attached 'rep1.tra1' for the format of decoys.
       'CA'-------Optional, native structure, for comparison to native.

     Output files of SPICKER include:
       'str.txt'-----list of structure in cluster;
       'combo*.pdb'--PDB format of cluster centroids;
       'closc*.pdb'--PDB format of structures closest to centroids;
       'rst.dat'-----summary of clustering results;

    A detailed readme file can be found at
    http://zhanglab.dcmb.med.umich.edu/SPICKER/readme
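   The layout of 'tra.in' described above can be written by a short script.
   The following is a minimal sketch under stated assumptions: the helper name
   write_tra_in is hypothetical, and separating the three parameters on the
   first line with single spaces is an assumption; consult the SPICKER readme
   for the exact format.

```python
from pathlib import Path


def write_tra_in(path, decoy_files, cutoff_mode=1, closc_mode=1):
    """Write a 'tra.in' file: a parameter line followed by decoy file names.

    cutoff_mode (par2): 1 for the default cutoff, -1 for a cutoff based on
                        variation.
    closc_mode  (par3): 1 for closc from all decoys, -1 for closc from
                        clustered decoys.
    """
    decoy_files = list(decoy_files)
    if not decoy_files:
        raise ValueError("at least one decoy file is required")
    if cutoff_mode not in (1, -1) or closc_mode not in (1, -1):
        raise ValueError("par2 and par3 must be 1 or -1")
    # par1 is the number of decoy files listed below the first line.
    lines = [f"{len(decoy_files)} {cutoff_mode} {closc_mode}"] + decoy_files
    Path(path).write_text("\n".join(lines) + "\n")
```

   For decoys from ab initio modeling, write_tra_in("tra.in", names,
   cutoff_mode=-1) selects the variation-based cutoff.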

7.4. How to cite SPICKER?

   If you are using the SPICKER program, you can cite:

   Y Zhang, J Skolnick, SPICKER: Approach to clustering protein structures 
   for near-native model selection, Journal of Computational Chemistry, 
   25: 865-871 (2004).

   #######################################################
   #                                                     #
   #  8. Installation and implementation of HAAD         #
   #                                                     #
   #######################################################
   
8.1. Introduction of HAAD
   
   HAAD is a computer algorithm for constructing hydrogen atoms from 
   protein heavy-atom structures. Hydrogen atoms are added by minimizing
   atomic overlap and encouraging hydrogen bonding. 

8.2. How to install HAAD program?

   When you unpack the I-TASSER Suite, the HAAD program is already installed
   at $libdir/abs/mybin/HAAD

8.3. How to run HAAD program?

   Hydrogen atoms can be added to a PDB file (xx.pdb) by running
   "./HAAD xx.pdb"; the output is "xx.pdb.h".

   In "xx.pdb.h", column 57 holds a label for each atom that has been
   added by HAAD. When the value of the label is less than 2, the
   position of the added atom is reliable.
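   The label can be read back with a few lines of code. The following is a
   rough sketch under stated assumptions, not a description of the HAAD output
   format: it assumes column 57 is counted from 1, that the label is a single
   digit, and that the atom name sits in columns 13-16 as in standard PDB ATOM
   records. The function name added_atom_labels is hypothetical.

```python
def added_atom_labels(pdb_h_lines, label_column=57):
    """Return (atom_name, label, reliable) for ATOM lines carrying a label.

    Assumptions: label_column is 1-based and holds a single digit; lines
    without a digit in that column are skipped.
    """
    results = []
    for line in pdb_h_lines:
        if not line.startswith("ATOM") or len(line) < label_column:
            continue
        flag = line[label_column - 1]
        if not flag.isdigit():
            continue
        label = int(flag)
        # Per the text above, a label below 2 marks a reliable position.
        results.append((line[12:16].strip(), label, label < 2))
    return results
```

   Reading the file with open("xx.pdb.h") and passing the lines to
   added_atom_labels gives a list that can be filtered on the third field.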

8.4. How to cite HAAD?

   If you are using the HAAD program, you can cite:

   Y Li, A Roy, Y Zhang, HAAD: A Quick Algorithm for Accurate Prediction 
   of Hydrogen Atoms in Protein Structures, PLoS One, 4: e6701 (2009).


   #######################################################
   #                                                     #
   #  9. Installation and implementation of EDTSurf      #
   #                                                     #
   #######################################################
   
9.1. Introduction of EDTSurf
   
   EDTSurf is a program to construct triangulated surfaces for macromolecules. 
   It generates three major macromolecular surfaces: van der Waals surface, 
   solvent-accessible surface and molecular surface (solvent-excluded 
   surface). EDTSurf also identifies cavities which are inside of 
   macromolecules. 

9.2. How to install EDTSurf program?

   When you unpack the I-TASSER Suite, the EDTSurf program is already installed
   at $libdir/bin/EDTSurf

9.3. How to use EDTSurf program?

   EDTSurf -i inputfile ...
   Specific options:
         -o prefix of output files (default is the prefix of inputfile)
         -t triangulation type, 1-MC 2-VCMC (default is 2)
         -s surface type, 1-VWS 2-SAS 3-MS (default is 3)
         -c color mode, 1-pure 2-atom 3-chain (default is 2)
         -p probe radius, float point in [0,2.0] (default is 1.4)
         -h inner or outer surface for output, 1-inner and outer 2-outer
            3-inner (default is 1)
         -f scale factor, float point in (0,20.0] (default is 4.0)

      The molecule is scaled by this factor to fit in a bounding box. A larger
      scale factor is better but increases memory use. Our strategy is to
      first enlarge the molecule and check whether it exceeds the maximum
      bounding box. If it does, a suitable scale factor is reset so that
      the molecule fits in the maximum bounding box.
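      The scale-factor strategy can be illustrated with a small calculation.
      The following is a rough sketch, not EDTSurf's actual code: the function
      name effective_scale is hypothetical, the maximum box edge of 256 grid
      units is an invented placeholder, and the molecule extent is taken as
      the largest coordinate span without any padding for atom or probe radii.

```python
def effective_scale(coords, requested_scale=4.0, max_box=256.0):
    """Return the scale factor after fitting the molecule into the box.

    coords: iterable of (x, y, z) atom coordinates.
    max_box: hypothetical maximum bounding-box edge, in grid units.
    """
    if not 0 < requested_scale <= 20.0:
        raise ValueError("scale factor must be in (0,20.0]")
    coords = list(coords)
    # Largest span of the molecule along any of the three axes.
    extent = max(max(c[axis] for c in coords) - min(c[axis] for c in coords)
                 for axis in range(3))
    if extent * requested_scale > max_box:
        # The enlarged molecule would exceed the box: shrink the factor.
        return max_box / extent
    return requested_scale
```

      With these placeholder numbers, a molecule spanning 100 units cannot
      use the default factor 4.0 and is scaled by 2.56 instead, while a
      molecule spanning 10 units keeps the requested factor.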

   Running EDTSurf without arguments prints a brief description of how
   to use the program. A detailed description of EDTSurf is available at
   http://zhanglab.dcmb.med.umich.edu/EDTSurf/

9.4. How to cite EDTSurf?

   If you are using the EDTSurf program, you can cite:

   D Xu, Y Zhang, Generating Triangulated Macromolecular Surfaces by Euclidean 
   Distance Transform. PLoS ONE 4: e8140 (2009).


   #######################################################
   #                                                     #
   #  10. Installation and implementation of ModRefiner  #
   #                                                     #
   #######################################################
   
10.1. Introduction of ModRefiner
   
   ModRefiner is a standalone program for atomic-level protein structure 
   construction and refinement. It includes two steps: (1) construct
   main-chain models from C-alpha trace; (2) build side-chain models
   and perform atomic-level structure refinement.

10.2. How to install ModRefiner program?

   When you unpack the I-TASSER Suite, the ModRefiner program is already installed
   at $libdir/I-TASSERmod/ModRefiner.pl

10.3. How to use ModRefiner program?

   ModRefiner supports the following four options:
   
   a) add side-chain heavy atoms to main-chain model without refinement
      > ModRefiner.pl 1 ID MD IM ON

   b) build main-chain model from C-alpha trace model
      > ModRefiner.pl 2 ID MD IM RM ON

   c) build full-atomic model from main-chain model
      > ModRefiner.pl 3 ID MD IM RM ON

   d) build full-atomic model from C-alpha trace model
      > ModRefiner.pl 4 ID MD IM RM ON

   ID: the path of the I-TASSER package, e.g. '/home/yourname/I-TASSER'
   MD: directory which contains the initial model, e.g. '/home/yourname/1a2bA'
   IM: the initial model to be refined, e.g. 'model1.pdb'
   RM: reference model that the refined model is driven to, e.g. 'combo1.pdb'.
       Only the CA trace is needed, and it need not cover the full length,
       which leaves the missing regions flexible during refinement. If you
       don't have a reference model, use the name of IM instead.
   ON: the output name of the refined model, e.g. 'model1_ref.pdb'

   Running the program without arguments prints a brief description
   of how to use the program.
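   The four call patterns differ only in the option number and in whether RM
   is passed. The following is a minimal sketch that assembles them; the
   helper name build_modrefiner_command is hypothetical and not part of the
   package. Following the note on RM above, IM is used as the reference when
   none is given.

```python
def build_modrefiner_command(option, libdir, model_dir, initial_model,
                             output_name, reference_model=None):
    """Return the ModRefiner.pl command as a list of strings.

    option 1 takes ID MD IM ON; options 2, 3 and 4 take ID MD IM RM ON.
    """
    if option not in (1, 2, 3, 4):
        raise ValueError("option must be 1, 2, 3 or 4")
    cmd = [f"{libdir}/I-TASSERmod/ModRefiner.pl", str(option),
           libdir, model_dir, initial_model]
    if option != 1:
        # Without a reference model, fall back to the initial model name.
        cmd.append(reference_model or initial_model)
    cmd.append(output_name)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_modrefiner_command(
        4, "/home/yourname/I-TASSER", "/home/yourname/1a2bA",
        "model1.pdb", "model1_ref.pdb", reference_model="combo1.pdb")))
```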
   
10.4. How to cite ModRefiner?

   If you are using the ModRefiner program, you can cite:

   D Xu, Y Zhang, Atomic-level protein structure construction and refinement
   from C-alpha traces (in preparation).


   #######################################################
   #                                                     #
   #  11. Installation and implementation of NWalign     #
   #                                                     #
   #######################################################
   
11.1. Introduction of NWalign
   
   NW-align is a simple and robust alignment program for protein
   sequence-to-sequence alignments based on the standard Needleman-Wunsch
   dynamic programming algorithm. The mutation matrix is from BLOSUM62
   with gap opening penalty=-11 and gap extension penalty=-1.
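   For readers unfamiliar with the algorithm, the following is a minimal
   sketch of Needleman-Wunsch global alignment scoring with affine gap
   penalties. It is not the NW-align source code. Several points are
   assumptions: a simple match/mismatch function stands in for the BLOSUM62
   matrix, the first position of a gap costs the opening penalty and each
   further position costs the extension penalty, end gaps are penalized like
   internal gaps, and only the optimal score is returned (no traceback).

```python
NEG_INF = float("-inf")


def simple_score(x, y):
    """Stand-in substitution score (assumption; NW-align uses BLOSUM62)."""
    return 2 if x == y else -1


def nw_affine_score(a, b, score=simple_score, gap_open=-11, gap_ext=-1):
    """Return the optimal global alignment score of sequences a and b."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] against a gap;
    # Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_ext
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_ext
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_ext,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_ext,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

   Keeping three tables is what lets a long gap cost less per position than
   several short ones, which is the point of separate opening and extension
   penalties.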

11.2. How to install NWalign program?

   When you unpack the I-TASSER Suite, the NWalign program is already installed
   at $libdir/bin/align.

11.3. How to use NWalign program?
   
   > align F1.fasta F2.fasta (align two sequences in fasta file)
   > align F1.pdb F2.pdb 1   (align two sequences in PDB file)
   > align F1.fasta F2.pdb 2 (align Sequence 1 in fasta and 2 in pdb)
   > align GKDGL EVADELVSE 3 (align sequences typed by keyboard)
   > align GKDGL F.fasta 4   (align Seq-1 by keyboard and 2 in fasta)
   > align GKDGL F.pdb 5     (align Seq-1 by keyboard and 2 in pdb)

   Running the program without arguments prints the usage options of
   the program.

11.4. How to cite NWalign?

   There is no published paper associated with this program. If you are using
   the NWalign program, you can cite it as 

   Y Zhang, http://zhanglab.dcmb.med.umich.edu/NW-align

   #######################################################
   #                                                     #
   #  12. Installation and implementation of PSSpred     #
   #                                                     #
   #######################################################
   
12.1. Introduction of PSSpred

   PSSpred (Protein Secondary Structure PREDiction) is a simple neural network 
   training algorithm for accurate protein secondary structure prediction. It first 
   collects multiple sequence alignments using PSI-BLAST. Amino-acid frequency and 
   log-odds data with Henikoff weights are then used, separately, to train secondary 
   structure predictors based on the Rumelhart error backpropagation method. The final 
   secondary structure prediction result is a combination of 7 neural network 
   predictors from different profile data and parameters.
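   To make the Henikoff weighting step concrete, the following is a minimal
   sketch of position-based sequence weights and the weighted residue
   frequencies of one alignment column. It is not PSSpred code: the function
   names are hypothetical, gap characters are treated as ordinary symbols for
   simplicity, and how PSSpred builds its actual profiles is not described in
   this document.

```python
from collections import Counter


def henikoff_weights(alignment):
    """Return normalized position-based weights, one per aligned sequence.

    In each column, a sequence receives 1 / (k * n), where k is the number
    of distinct symbols in the column and n is the number of sequences that
    share this sequence's symbol there.
    """
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain non-empty sequences")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    weights = [0.0] * len(alignment)
    for col in range(length):
        counts = Counter(row[col] for row in alignment)
        k = len(counts)
        for idx, row in enumerate(alignment):
            weights[idx] += 1.0 / (k * counts[row[col]])
    total = sum(weights)
    return [w / total for w in weights]


def weighted_frequencies(alignment, weights, column):
    """Return {symbol: weighted frequency} for one alignment column."""
    freqs = {}
    for row, w in zip(alignment, weights):
        freqs[row[column]] = freqs.get(row[column], 0.0) + w
    return freqs
```

   The weights down-weight groups of near-identical sequences so that they
   do not dominate the frequency profile.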

12.2. How to install PSSpred program?

   When you unpack the I-TASSER Suite, the PSSpred program is already installed
   at $libdir/PSSpred
   
12.3. How to use PSSpred program?

   $libdir/PSSpred/mPSSpred.pl seq.txt $libdir

   Please note that 'seq.txt' should be in the current directory and the script will
   generate two files 'seq.dat' and 'seq.dat.ss' in the current folder. Here, 
   $libdir is the root path of I-TASSER package.
 
12.4. How to cite PSSpred?

   If you are using the PSSpred program, you can cite:
   http://zhanglab.dcmb.med.umich.edu/PSSpred