We are interested in:



Protein Structure Prediction
    Protein structure prediction is the effort to construct the three-dimensional shape of protein molecules from their amino acid sequences by computation. Our lab has developed a number of algorithms for protein 3D structure prediction, including I-TASSER for iterative protein structure assembly, QUARK for ab initio protein folding, and MUSTER and LOMETS for protein template structure identification. Some of these have been regarded as the world's best and are widely used by the community.

    The Critical Assessment of Structure Prediction (CASP) is a community-wide experiment, held every two years since 1994, that is designed to benchmark the state of the art in protein structure prediction. Our lab participated as "Zhang-Server" in the automated structure prediction section of the experiments held in 2006, 2008 and 2010. The method ranked first in all three experiments (CASP 7-9) (Table 1).

    Table 1. Top ten groups in automated structure prediction in CASP 7-9, ranked by the cumulative GDT-TS score of the first model.
    (Data were taken from http://predictioncenter.org. When multiple servers came from the same lab, only the best server is listed.)
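
    For readers unfamiliar with the ranking metric, GDT-TS averages the fraction of residues whose C-alpha atoms lie within 1, 2, 4 and 8 Angstroms of the experimental structure. The sketch below is a simplified illustration only: it assumes the model has already been superposed on the experimental structure and that per-residue distances are available, whereas the official score searches over many superpositions. The function names are hypothetical and not part of any CASP tool.

```python
def gdt_ts_fixed_superposition(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT-TS (0-100) from per-residue C-alpha distances in Angstroms.

    Assumption: distances come from ONE fixed superposition; the official
    score maximizes each fraction over many superpositions.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def cumulative_gdt_ts(per_target_distances):
    """Sum the per-target scores, as in a cumulative ranking over targets."""
    return sum(gdt_ts_fixed_superposition(d) for d in per_target_distances)
```

    A group's cumulative score would then be obtained by calling cumulative_gdt_ts on the first-model distances for every target.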

    The most difficult problem in protein structure prediction is the modeling of proteins that have no solved structures usable as templates, commonly referred to as "ab initio" or "free modeling (FM)". Figure 1 shows a successful example of ab initio modeling of an FM target (T0604_1) in CASP9, where the first model by the I-TASSER server has an RMSD of 2.66 Angstroms to the X-ray crystal structure.


    Figure 1. The first model by the I-TASSER server versus the crystal structure of T0604_1, an FM target in CASP9.
    This is the VP0956 protein from Vibrio parahaemolyticus, solved by the Northeast Structural Genomics Consortium.
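
    The RMSD quoted for a model is normally computed after optimally superposing the model on the experimental structure. The following is a minimal sketch of that calculation using the Kabsch algorithm with NumPy. It assumes two equal-length arrays of matched atom coordinates (for example C-alpha atoms) and is not the evaluation code used in CASP; the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove the translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

    Applied to the matched atoms of a model and its crystal structure, this returns a single value in Angstroms when the input coordinates are in Angstroms.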

    Despite these successes, significant problems in protein structure prediction remain unsolved, and they will be the focus of our lab over the next few years. These include:

    1. How to build structures of experimental resolution (below 1-2 Angstroms, useful for drug screening) when homologous templates are available?
    2. How to identify distantly homologous templates with accurate query-template alignments?
    3. How to fold proteins (especially the beta-proteins) with correct topology by ab initio modeling, when no templates exist?
    4. How to fold membrane proteins?
Protein Function Prediction
    Given the amino acid sequence, can we tell what the protein molecule does in living cells? We have developed COFACTOR for protein function prediction, based on the sequence-to-structure-to-function paradigm. From the amino acid sequence, 3D structures are first constructed by I-TASSER. The functional insights (including enzyme classification, gene ontology, and ligand binding specificity) are then deduced by the local and global comparison of the structural models with proteins of known functions (Figure 2).


    Figure 2. Protein function annotation based on the sequence-to-structure-to-function paradigm. The right
    panel shows the functional homologs identified by global (a) and local (b) matches of I-TASSER models.

    COFACTOR was tested in the community-wide CASP9 experiment as "I-TASSER_FUNCTION" in the Server section and as "ZHANG" in the Human section. The two groups took the first two positions in both the Z-score and the Matthews correlation coefficient (MCC) rankings against the experimental data (Figure 2a).



    Figure 2a. Mean MCC Z-scores of the best ten groups in the Function Prediction in CASP9.
    (The picture was taken from the presentation by the CASP9 assessor Dr. T Schwede).
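
    As a reminder of how the MCC is defined, the sketch below computes it from the counts of true and false positives and negatives. It assumes each residue is labeled as either binding or non-binding; the helper binding_site_mcc and its arguments are hypothetical and only illustrate one way such a prediction could be scored.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # By convention, return 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def binding_site_mcc(predicted, observed, n_residues):
    """Score a predicted set of residue indices against the observed set."""
    predicted, observed = set(predicted), set(observed)
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    fn = len(observed - predicted)
    tn = n_residues - tp - fp - fn
    return matthews_corrcoef(tp, tn, fp, fn)
```

    The MCC ranges from -1 (every residue mislabeled) to 1 (perfect prediction), which makes it more informative than plain accuracy when binding residues are a small minority.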

Protein Design
    Protein design is the effort to design new protein molecules with a desired 3D structure and function. It is the reverse of protein structure prediction, and solving it therefore relies heavily on how well we understand the principles of protein folding (Figure 3).

    Figure 3. Protein design is the reverse procedure of protein structure prediction.

    We recently designed a number of new protein sequences using a physics-based atomic force field, with the lowest free-energy state searched by Monte Carlo simulation, followed by sequence-based clustering. The designed sequences can be folded by I-TASSER to an RMSD below 2 Angstroms in 62% of cases, even though the I-TASSER force field differs significantly from the one used in the design. Figure 4 shows three representative examples of the target protein structure and the I-TASSER model of the designed sequence.

    Figure 4. I-TASSER models of designed sequences (red) versus crystal structures of target proteins (green)
    for the calcium-binding domain of Calx (3E9TA), odorant binding protein (2ERBA), and peptidyl-tRNA
    hydrolase (1WN2A). The sequence identities between the designed and target sequences are all below 30%.
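
    The Monte Carlo search over sequences mentioned above can be illustrated with a generic Metropolis sampler. This is a toy sketch, not the lab's design method: the real work used a physics-based atomic force field, whereas toy_energy below is a hypothetical stand-in that only rewards hydrophobic residues at buried positions and polar residues at exposed ones. The sequence-based clustering step is not shown, and all names and parameters are assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")


def toy_energy(seq, buried):
    """Hypothetical energy: -1 per position whose residue type matches its
    burial state, +1 otherwise (lower is better)."""
    return sum(
        -1.0 if (aa in HYDROPHOBIC) == is_buried else 1.0
        for aa, is_buried in zip(seq, buried)
    )


def metropolis_design(buried, steps=5000, temperature=1.0, seed=0):
    """Search sequence space with single-site mutations and Metropolis acceptance."""
    if not buried:
        raise ValueError("need at least one position")
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in buried]
    energy = toy_energy(seq, buried)
    best_seq, best_energy = list(seq), energy
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        delta = toy_energy(seq, buried) - energy
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy += delta
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old  # reject the move and restore the previous residue
    return "".join(best_seq), best_energy
```

    Calling metropolis_design with a list of True/False burial flags returns the lowest-energy sequence visited and its energy under the toy function; a fixed seed makes the run reproducible.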

    We are now working on the design of protein molecules associated with human breast cancer and are seeking experimental validation. To speed up the design-and-test procedure, we are developing new protein design methods with the aid of protein threading and fold-recognition methods.

Modeling of G Protein-Coupled Receptor and Ligand-Receptor Interactions
Amyloid Diseases and Fiber Aggregation
    When a cell creates a protein, it may produce either the complete protein or peptide fragments. The fragments can sometimes "mis-fold" into insoluble protein fibers of uniform beta-pleated sheets, called amyloid fibers. When these fibrils accumulate abnormally in tissues and organs, the patient may suffer from serious amyloidosis. If this happens in the brain, for example, degeneration of neuronal processes and synaptic abnormalities may appear, resulting in amyloid diseases such as Alzheimer's, Parkinson's, and Huntington's diseases (Figure 7).

    Figure 7. A normal aged brain (left) versus an Alzheimer's patient's brain with aggregated amyloid fibers (right).

    To understand the mechanisms of amyloid fiber formation, we developed a new approach to the asymptotic solution of the fiber aggregation master equation. We found that four distinct stages (lag phase, exponential growth phase, breaking phase and static phase) dominate the fiber formation process. Amyloid proteins can thus be classified into four hierarchical groups based on the fiber formation half-time and growth rate (Figure 8).

    Figure 8. Amyloid proteins fall into four distinct types according to nucleation mechanism.

Modeling of Protein-Protein Interactions
RNA Alternative Splicing
Ligand Screening and Structure-Based Drug Design
    In terms of the lock-and-key metaphor, drug design is essentially a procedure to find an appropriate compound molecule (the key) that fits well into the active site pocket of the target protein (the lock). An important step of structure-based rational drug design is therefore to use the experimental or predicted 3D structure of the target protein to screen compound databases, with the purpose of identifying appropriate drugs that can inhibit or activate the protein (Figure 13).

    Figure 13. A successful example of structure-based drug design by Bugg et al. in the 1990s, in designing a molecule
    that inhibits the enzyme purine nucleoside phosphorylase (PNP). PNP normally takes up individual nucleosides (a)
    and cleaves the purine from the sugar, giving rise to a free purine base and a phosphorylated sugar (b).
    A tightly fitting compound blocks the binding pocket and therefore inhibits the activity of the PNP enzyme (c).

    We recently developed a composite approach for druglike compound identification, which combines structure-based virtual screening with quantitative structure-activity relationship (QSAR) modeling. When applying the approach to the epidermal growth factor receptor (EGFR), an important target protein associated with brain, lung, bladder and colon tumors, we found that two compounds (2 and 21) have significant EGFR-inhibitory activities (Figure 14). An experimental assay to test the ability of the compounds to inhibit the receptor protein is in progress.

    Figure 14. Binding structure of two compounds screened from the ZINC library which have inhibitory
    activity on the epidermal growth factor receptor (EGFR), an important tumor target protein.

 


yangzhanglabumich.edu | (734) 647-1549 | 100 Washtenaw Avenue Ann Arbor, MI 48109-2218