These datasets are subsets of ProTherm.
Dataset 1
1784 variations from 80 proteins with experimentally determined ΔΔG values in ProTherm. 1154 positive cases, of which 931 are destabilizing (ΔΔG ≥ 0.5 kcal/mol) and 222 are stabilizing (ΔΔG ≤ -0.5 kcal/mol), and 631 neutral cases (-0.5 kcal/mol < ΔΔG < 0.5 kcal/mol).
F
Reference:
Khan S, Vihinen M. Performance of protein stability predictors. Hum Mutat. 2010, 31(6):675-684. PUBMED
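The three-state labelling described for Dataset 1 can be illustrated with a minimal sketch. It assumes a symmetric cutoff of 0.5 kcal/mol and the sign convention in which a positive ΔΔG means destabilization; the function name `classify_ddg` and the sample values are hypothetical and are not taken from the dataset files.

```python
def classify_ddg(ddg, cutoff=0.5):
    """Assign a three-state stability label to a ddG value in kcal/mol.

    Assumption: positive ddG is destabilizing and the cutoff is symmetric.
    """
    if ddg >= cutoff:
        return "destabilizing"
    if ddg <= -cutoff:
        return "stabilizing"
    return "neutral"


# Hypothetical ddG values, including both boundary cases.
labels = [classify_ddg(value) for value in (1.2, -0.8, 0.1, 0.5, -0.5)]
print(labels)
```

Values exactly at the cutoff fall into the destabilizing or stabilizing class here; a source that uses the opposite sign convention would need the two labels swapped.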
Dataset 2
2156 variations combined from a list of 964 single variations (Guerois et al. 2002) and a set of 2972 single variations from ProTherm, after filtering for duplicate entries. Structures determined by NMR were excluded, and when several ΔΔG values were present for a single variation, only their average was given.
Reference: Potapov V, Cohen M, Schreiber G. Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details. Protein Eng Des Sel. 2009, 22(9):553-560. PUBMED
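The averaging step described for Dataset 2 can be sketched as follows. This is only an illustration under stated assumptions: the record layout (protein identifier, variation, ΔΔG), the function name `average_duplicates`, and the example entries are all hypothetical and do not reflect the actual file format.

```python
from collections import defaultdict
from statistics import mean


def average_duplicates(records):
    """Collapse repeated measurements of one variation into their mean ddG.

    Each record is assumed to be a (protein_id, variation, ddg) tuple.
    """
    grouped = defaultdict(list)
    for protein_id, variation, ddg in records:
        grouped[(protein_id, variation)].append(ddg)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical entries: the first variation was measured twice.
records = [
    ("1ABC", "A12G", 1.0),
    ("1ABC", "A12G", 2.0),
    ("1ABC", "L30V", -0.5),
]
averaged = average_duplicates(records)
print(averaged)
```

Keying on both the protein and the variation keeps identical substitutions in different proteins apart.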
Dataset 3
Training dataset of 339 experimentally studied variants in nine proteins and 625 variants from ProTherm.
Reference: Guerois R, Nielsen JE, Serrano L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol. 2002, 320(2):369-387. PUBMED
Dataset 4
S1615 was used for training and testing the neural network system. S388, a subset of S1615, was used as the test data and contains 388 variations collected only at physiological conditions. Only single variations with ΔΔG values in ProTherm and structures deposited in the PDB were included.
Reference: Capriotti E, Fariselli P, Casadio R. A neural-network-based method for predicting protein stability changes upon single point mutations. Bioinformatics. 2004, 20 Suppl 1:i63-68. PUBMED
Dataset 5
The correctness and quality of each variant was checked manually. The dataset contains 1564 variations from 99 proteins.
Reference: Yang, Y., Urolagin, S., Niroula, A., Ding, X., Shen, B., & Vihinen, M. (2018). PON-tstab: Protein Variant Stability Predictor. Importance of Training Data Quality. International journal of molecular sciences, 19(4), 1009. doi:10.3390/ijms19041009. PUBMED
Dataset 6
Datasets used for I-Mutant2.0.
Reference: Capriotti, E.; Fariselli, P.; Casadio, R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res 2005, 33:W306-W310. PUBMED
Dataset 7
Datasets used by Saraboji and coworkers.
Reference: Saraboji, K.; Gromiha, M. M.; Ponnuswamy, M. N. Average assignment method for predicting the stability of protein mutants. Biopolymers 2006, 82:80-92 doi: 10.1002/bip.20462. PUBMED
Dataset 8
Dataset used for iPTREE-STAB
Reference: Huang, L. T.; Gromiha, M. M.; Ho, S. Y. iPTREE-STAB: interpretable decision tree based method for predicting protein stability changes upon mutations. Bioinformatics 2007, 23:1292-1293. PUBMED
Dataset 9
Datasets used for SVM-WIN31 and SVM-3D12
Reference: Capriotti, E.; Fariselli, P.; Rossi, I.; Casadio, R. A three-state prediction of single point mutations on protein stability changes. BMC Bioinformatics 2008, 9 Suppl 2:S6. doi: 10.1186/1471-2105-9-S2-S6 PUBMED
Dataset 10
Dataset used for PoPMuSiC-2.0
Reference: Dehouck, Y.; Grosfils, A.; Folch, B.; Gilis, D.; Bogaerts, P.; Rooman, M. Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0. Bioinformatics 2009, 25:2537-2543 doi: 10.1093/bioinformatics/btp445 PUBMED
Dataset 11
Dataset used for sMMGB
Reference: Zhang, Z.; Wang, L.; Gao, Y.; Zhang, J.; Zhenirovskyy, M.; Alexov, E. Predicting folding free energy changes upon single point mutations. Bioinformatics 2012, 28:664-671. doi: 10.1093/bioinformatics/bts005 PUBMED
Dataset 12
Dataset used for M8 and M47
Reference: Yang, Y.; Chen, B.; Tan, G.; Vihinen, M.; Shen, B. Structure-based prediction of the effects of a missense variant on protein stability. Amino Acids 2013, 44:847-855 doi: 10.1007/s00726-012-1407-7 PUBMED
Dataset 13
Dataset used for EASE-MM
Reference: Folkman, L.; Stantic, B.; Sattar, A. Feature-based multiple models improve classification of mutation-induced stability changes. BMC Genomics 2014, 15 Suppl 4:S6 doi: 10.1186/1471-2164-15-S4-S6 PUBMED
Dataset 14
Dataset used for HoTMuSiC
Reference: Pucci, F.; Bourgeas, R.; Rooman, M. Predicting protein thermal stability changes upon point mutations using statistical potentials: Introducing HoTMuSiC. Sci Rep 2016, 6:23257 doi: 10.1038/srep23257 PUBMED
Dataset 15
Dataset used for SAAFEC
Reference: Getov, I.; Petukh, M.; Alexov, E. SAAFEC: Predicting the Effect of Single Point Mutations on Protein Folding Free Energy Using a Knowledge-Modified MM/PBSA Approach. Int J Mol Sci 2016, 17:512 doi: 10.3390/ijms1704051 PUBMED
Dataset 16
Dataset used for STRUM
Reference: Quan, L.; Lv, Q.; Zhang, Y. STRUM: structure-based prediction of protein stability changes upon single-point mutation. Bioinformatics 2016, 32:2936-2946 doi: 10.1093/bioinformatics/btw361 PUBMED
Dataset 17
Dataset used for a metapredictor
Reference: Broom, A.; Jacobi, Z.; Trainor, K.; Meiering, E. M. Computational tools help improve protein stability but with a solubility tradeoff. J Biol Chem 2017, 292:14349-14361 doi: 10.1074/jbc.M117.784165 PUBMED
Dataset 18
Dataset used for Automute
Reference: Masso, M.; Vaisman, II. Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis. Bioinformatics 2008, 24:2002-2009 doi: 10.1093/bioinformatics/btn353 PUBMED
Dataset 19
Dataset for TP53 variants
Reference: Pires, DE.; Ascher, DB.; Blundell, TL. mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics 2014, 30:335-342 doi: 10.1093/bioinformatics/btt691 PUBMED
Dataset 20
Dataset Ssym composed of 684 single-site variations inserted in 357 protein structures
Reference: Pucci, F.; Bernaerts, KV.; Kwasigroch, JM.; Rooman, M. Quantification of biases in predictions of protein stability changes upon mutations. Bioinformatics 2018, bty348 doi: 10.1093/bioinformatics/bty348 PUBMED
Dataset 21
F1 is an alanine-scanning mutagenesis dataset including 768 “hot spots,” or amino acid side chains that are predicted to significantly destabilize the interface when altered to alanine. F2 is 2971 ProTherm single variations, F3 is 2154 variations from Potapov et al. [PMID:19561092], F4 is 1005 variations from Guerois et al. [PMID:12079393] and F5 is 380 variations from the Kortemme and Baker dataset.
References: Kortemme, T.; Kim, D.E.; Baker, D. Computational Alanine Scanning of Protein-Protein Interfaces. Science's STKE, 10 Feb 2004: pl2. PUBMED
Tanja Kortemme, David Baker, A simple physical model for binding energy hot spots in protein–protein complexes, Proceedings of the National Academy of Sciences Oct 2002, 99 (22) 14116-14121; DOI: 10.1073/pnas.202485799. PUBMED
Dataset 22
The file contains 1210 single mutations obtained from ProTherm.
Reference: Kellogg, E. H., Leaver-Fay, A., & Baker, D. (2010). Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins, 79(3), 830-8. PUBMED
Dataset 23
Dataset for PreTherMut
Contains both single and multiple variants. The M-dataset has 3366 variants: 836 stability-increasing and 2530 stability-decreasing.
Reference: Tian, J., Wu, N., Chu, X., Fan, Y., Predicting changes in protein thermostability brought about by single- or multi-site mutations, BMC Bioinformatics;11:370. doi: 10.1186/1471-2105-11-370. PUBMED
Dataset 24
Dataset for iStable
F1 contains the positive (stability-increasing) cases of M3131 and F2 contains the negative (stability-decreasing) cases. F3 is a training dataset with 1311 entries and F4 is a training dataset with 1820 variants.
Reference: Chen, C., Lin, J., Chu, Y., iStable: off-the-shelf predictor integration for predicting protein stability changes, BMC Bioinformatics;14 Suppl 2:S5. doi: 10.1186/1471-2105-14-S2-S5. PUBMED
Dataset 25
CAGI frataxin benchmark cases
F contains experimentally-determined ΔΔG values (in kcal / mol)
Reference: Strokach, A., Corbi-Verge, C., Kim, P.M., Predicting changes in protein stability caused by mutation using sequence- and structure-based methods in a CAGI5 blind challenge, Hum Mutat;40(9):1414-1423. doi: 10.1002/humu.23852. PUBMED
Dataset 26
Dataset for iStable2
F1 is a training dataset (S3568), F2 is a test set (S630)
Reference: Chen, C.W., Lin, M.H., Liao, C.C., Chang, H.P., Chu, Y.W., iStable 2.0: Predicting protein thermal stability changes by integrating various characteristic modules, Comput Struct Biotechnol J;18:622-630. doi: 10.1016/j.csbj.2020.02.021. PUBMED
Dataset 27
Dataset for benchmarking study
1024 variants, 585 destabilizing, 168 slightly destabilizing, 103 slightly stabilizing, 147 stabilizing, 21 no effect
Reference: Marabotti, A., Prete, E.D., Scafuri, B., Facchiano, A., Performance of Web tools for predicting changes in protein stability caused by mutations, BMC Bioinformatics;22(Suppl 7):345. doi: 10.1186/s12859-021-04238-w. PUBMED
Dataset 28
Dataset for Thermonet
The datasets consist of the Q3214 and Q1744 variants and their associated experimental ΔΔG values.
F1 contains Q3214 data, F2 contains Q3214 reverse variants, F3 contains Q3214 direct variants, F4 contains Q1744 data, F5 contains Q1744 reverse variants, F6 contains Q1744 direct variants
Reference: Li, B., Yang, Y.T., Capra, J.A., Gerstein, M.B., Predicting changes in protein thermodynamic stability upon point mutation with deep 3D convolutional neural networks, PLoS Comput Biol. 2020 Nov 30;16(11):e1008291. doi: 10.1371/journal.pcbi.1008291. PUBMED
Dataset 29
Dataset for ACDC-NN, free energy change prediction
S2648 contains 2,648 manually curated variants with experimentally measured ∆∆G values
Ssym provides variations on proteins whose wildtype and variant 3D structures are solved by X-ray crystallography. It contains 684 variations, and half of them are reverse variations
vb1423 variants
Reference: Benevenuta, S., Pancotti, C., Fariselli, P., Birolo, G., Sanavia, T., An antisymmetric neural network to predict free energy changes in protein variants, J. Phys. D: Appl. Phys. 2021, 54:245403. PUBMED
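The relation between the direct and reverse variations mentioned for Ssym can be sketched in a few lines. The sketch relies on the thermodynamic antisymmetry ΔΔG(reverse) = -ΔΔG(direct) and assumes that a variation is written as wild-type residue, position, variant residue (for example `A12G`); the function name `reverse_variant` and the example values are hypothetical.

```python
def reverse_variant(variation, ddg):
    """Return the reverse of a single substitution and its sign-flipped ddG.

    Assumption: `variation` looks like 'A12G' (wild type, position, variant),
    and the reverse ddG is the negative of the direct one.
    """
    wild_type, position, variant = variation[0], variation[1:-1], variation[-1]
    return f"{variant}{position}{wild_type}", -ddg


# Hypothetical direct variation with a ddG of 1.3 kcal/mol.
direct = ("A12G", 1.3)
reverse = reverse_variant(*direct)
print(reverse)
```

The sketch keeps the residue number unchanged; if the variant structure uses a different numbering, the position would have to be mapped separately.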
Dataset 30
Dataset for benchmark study. The file contains 19 experimental structures for the direct variants and 342 experimental structures for each of the reverse variants are known.
Reference: C Pancotti, S Benevenuta, G Birolo, V Alberini, V Repetto, T Sanavia, E Capriotti, P Fariselli, Predicting protein stability changes upon single-point mutation: a thorough comparison of the available tools on a new dataset, Brief Bioinform. 2022 Mar 10;23(2):bbab555. doi: 10.1093/bib/bbab555. PUBMED
These datasets contain cases with double variants
Dataset 1
Dataset used for WET-STAB
Reference: Huang, LT.; Gromiha, MM. Reliable prediction of protein thermostability change upon double mutation from amino acid sequence. Bioinformatics 2009, 25:2181-2187 doi: 10.1093/bioinformatics/btp370 PUBMED
Last updated: 2022-02-22 by Niloofar Shirvanizadeh.