%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%   The features designed by DAMpred prediction   %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

1. What information did DAMpred consider?

DAMpred collected 70 features extracted from physicochemical properties, biological assembly and I-TASSER structure prediction. They are categorized into four groups based on their properties:

1) The physicochemical property features in DAMpred include the pharmacophore of the target residues and the mutation-induced environmental pharmacophore changes.

2) Evolution is a major driving force for protein structure and function determination, and sequence profiles from multiple sequence alignments contain information on how protein families evolve. To identify distant-homology relations between sequences, three sequence profiles are collected in DAMpred, by PSI-BLAST, LOMETS and Pfam separately.

3) The contact-environment based features are deduced from the complex structural models built by the dimeric threader, SPRING.

4) I-TASSER was used to construct 3D models for both wild-type and mutant sequences. Two groups of structure-based features, on protein surface and physics-based energy terms, are extracted from the I-TASSER models.

2. Detailed explanations of the 70 features in DAMpred model.

DAMpred examined the power of individual features by calculating the p-value of the Mann-Whitney (MW) test on their distributions between the disease-associated and neutral datasets. In the feature list, the top 20 features with the lowest p-values are labeled "*".

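The feature ranking described above can be illustrated with a minimal sketch. It assumes a two-sided Mann-Whitney test computed with the normal approximation and tie correction; the source does not say which variant DAMpred used. The function names and the toy values are hypothetical and are not taken from the DAMpred datasets.

```python
import math


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic of sample x, p-value). No continuity correction.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    # Two-sided tail probability of the standard normal distribution.
    return u_x, math.erfc(abs(z) / math.sqrt(2.0))


def rank_features(disease, neutral):
    """Sort feature names by ascending MW p-value (most discriminative first)."""
    pvals = {name: mann_whitney_p(disease[name], neutral[name])[1]
             for name in disease}
    return sorted(pvals.items(), key=lambda kv: kv[1])


# Toy, invented values for two of the feature names in the list.
disease = {"dPSIC": [2.1, 2.5, 3.0, 2.8, 3.3],
           "Volw": [88.6, 140.0, 108.5, 166.7, 111.1]}
neutral = {"dPSIC": [0.2, 0.5, 0.1, 0.9, 0.4],
           "Volw": [89.0, 138.4, 116.1, 162.9, 108.5]}

ranking = rank_features(disease, neutral)
for name, p in ranking:
    print(name, round(p, 4))
```

With real data, the features at the top of such a ranking would correspond to those marked with "*". For small samples an exact test may be preferable to the normal approximation used here.
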
Pharmacophore for the wild-type residues
    1) HPw        Hydrophobic residue
    2) noHPw      Non-hydrophobic residue
    3) ARw        Aromatic rings
    4) noARw      Non-aromatic rings
    5) PCw        Positive charge
    6) NCw        Negative charge
  * 7) noCw       Neutral charge
    8) BPw        Both wild-type and neighbor AA are polar
    9) OPw        Either wild-type or neighbor AA is polar
   10) NPw        Both wild-type and neighbor AA are nonpolar
   11) ACw        The count of the residue acting as a hydrogen-bond acceptor
   12) DOw        The count of the residue acting as a hydrogen-bond donor

Pharmacophore for the mutant residues
   13) HPm        Hydrophobic residue
   14) noHPm      Non-hydrophobic residue
   15) ARm        Aromatic rings
  *16) noARm      Non-aromatic rings
   17) PCm        Positive charge
   18) NCm        Negative charge
  *19) noCm       Neutral charge
   20) BPm        Both mutant and neighbor AA are polar
   21) OPm        Either mutant or neighbor AA is polar
   22) NPm        Both mutant and neighbor AA are nonpolar
   23) ACm        The count of the residue acting as a hydrogen-bond acceptor
   24) DOm        The count of the residue acting as a hydrogen-bond donor

Mutation-induced environmental pharmacophore changes
  *25) cosWM      Cosine between the pharmacophores of wild-type and mutant residues
   26) rmsWM      RMSD between the pharmacophores of wild-type and mutant residues
   27) cosNWM     Cosine between the neighbor pharmacophores of wild-type and mutant residues
  *28) rmsNWM     RMSD between the neighbor pharmacophores of wild-type and mutant residues
   29) cosNSWM    Cosine between the neighbor pharmacophores of wild-type and mutant residues, related to single residues
  *30) rmsNSWM    RMSD between the neighbor pharmacophores of wild-type and mutant residues, related to single residues
   31) cosNPWM    Cosine between the neighbor pharmacophores of wild-type and mutant residues, related to residue pairs
   32) rmsNPWM    RMSD between the neighbor pharmacophores of wild-type and mutant residues, related to residue pairs

Other physicochemical properties
   33) Volw       The volume of the wild-type residue
   34) Volm       The volume of the mutant residue
   35) dVol       The volume difference
   36) Ww         The molecular weight of the wild-type residue
   37) Wm         The molecular weight of the mutant residue
   38) dW         The molecular weight difference

PSI-BLAST profile scores
  *39) PSICw      The PSIC score for the wild-type residue
  *40) PSICm      The PSIC score for the mutant residue
  *41) dPSIC      The PSIC score difference
   42) JSDw       The JSD score for the wild-type residue
   43) JSDm       The JSD score for the mutant residue
   44) dJSD       The JSD score difference
  *45) JSDi       The JSD score at the mutated position i

LOMETS profile scores
  *46) tPSICw     The PSIC score for the wild-type residue
   47) tPSICm     The PSIC score for the mutant residue
  *48) dtPSIC     The PSIC score difference

Pfam profile scores
  *49) Pfamw      The Pfam score for the wild-type residue
  *50) Pfamm      The Pfam score for the mutant residue
  *51) dPfam      The Pfam score difference

Directly contacted residues
  *52) Intra      The number of intramolecular contacts
   53) FunIntra   The number of intramolecular functional contacts
   54) Inter      The number of intermolecular contacts
   55) FunInter   The number of intermolecular functional contacts

Indirectly contacted residues
  *56) CIntra     The number of indirect intramolecular contacts
   57) CFunIntra  The number of indirect intramolecular functional contacts
   58) CInter     The number of indirect intermolecular contacts
   59) CFunInter  The number of indirect intermolecular functional contacts

Protein surface regions favorable for interactions
   60) CS         The ConCavity score for the wild-type residue
  *61) Depth      The average distance from the atoms of the wild-type residue to the closest molecule of bulk solvent

The energy function
   62) ED         The EvoDesign score
   63) ddG        The stability change upon mutation
   64) VDWw       van der Waals potential of the wild-type residue from CIS-RR
   65) VDWm       van der Waals potential of the mutant residue from CIS-RR
  *66) dVDW       van der Waals potential difference
   67) RTw        Rotamer term from CIS-RR, which measures the preferences of the wild-type side-chain conformers
   68) CISRRw     CIS-RR score for the wild-type residue
  *69) CISRRm     CIS-RR score for the mutant residue
   70) dCISRR     CIS-RR score difference

For more information about these features, please refer to the following article:

Lijun Quan, Hongjie Wu, Qiang Lyu, and Yang Zhang, Recognizing disease-associated nsSNP mutations in human genome through a Bayes-guided artificial neural-network model built on BioUnit and protein structure predictions, Submitted (2018)
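
Features 25 to 32 compare pharmacophore descriptions of the wild-type and mutant residues by cosine and RMSD. The sketch below shows the two measures on plain numeric vectors. It assumes that each pharmacophore is encoded as a fixed-length vector (here, 12 values in the order HP, noHP, AR, noAR, PC, NC, noC, BP, OP, NP, AC, DO); that encoding, the function names and the toy values are illustrative assumptions and are not taken from the DAMpred implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rmsd(u, v):
    """Root-mean-square deviation between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


# Hypothetical 12-value pharmacophore vectors (invented for illustration).
wild_type = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
mutant = [0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 2]

cos_wm = cosine(wild_type, mutant)  # analogous in spirit to cosWM
rms_wm = rmsd(wild_type, mutant)    # analogous in spirit to rmsWM
print(round(cos_wm, 3), round(rms_wm, 3))
```

A cosine near 1 together with an RMSD near 0 would indicate that the mutation leaves the pharmacophore description largely unchanged, while a low cosine or a high RMSD would indicate a larger change.
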