DATASET 1
168 experimentally verified mismatch repair system (MMR) amino acid substitutions with known functional effect from the literature. There are
Reference: Ali H, Olatubosun A, Vihinen M. Classification of mismatch repair gene missense variants with PON-MMR. Hum Mutat. 2012 Apr; 33(4):642-50. doi: 10.1002/humu.22038. PUBMED
DATASET 2
224 validated MMR amino acid substitutions with known functional effect from InSiGHT database. There are
Reference: Niroula A, Vihinen M. 2015. Classification of amino acid substitutions in mismatch repair proteins using PON-MMR2. Hum Mutat 36(12):1128-1134 PUBMED
DATASET 3
146 single nucleotide substitutions in human mitochondrial tRNAs. There are 91 pathogenic and 55 neutral variations in the dataset.
Reference: Niroula A, Vihinen M. 2016. PON-mt-tRNA: a multifactorial probability-based method for classification of mitochondrial tRNA variations. Nucleic Acids Res 44(5):2020-2027. PUBMED
DATASET 4
152 disease (XLA)-associated single-nucleotide-change-caused amino acid substitutions (SNAVs) in 91 residues.
Reference: Valiaho, J. , Faisal, I. , Ortutay, C. , Smith, C. I. and Vihinen, M. (2015), Characterization of all Possible Single-Nucleotide Change Caused Amino Acid Substitutions in the Kinase Domain of Bruton Tyrosine Kinase. Human Mutation, 36: 638-647. doi:10.1002/humu.22791. PUBMED
DATASET 5
Dataset used for Kinact
384 amino acid substitutions in protein kinases in F1, 258 of which were mapped to experimentally solved 3D structures in F2.
Reference: Rodrigues, C. H., Ascher, D. B., & Pires, D. E. (2018). Kinact: a computational approach for predicting activating missense mutations in protein kinases. Nucleic acids research, 46(W1), W127-W132. PUBMED
DATASET 6
Dataset used for KinMutBase
KinMutBase is a comprehensive knowledge base for human disease-related variations in protein kinase domains. The latest version contains 1414 variations.
Reference: Ortutay, C. , Valiaho, J. , Stenberg, K. and Vihinen, M. (2005), KinMutBase: A registry of disease-causing mutations in protein kinase domains. Hum. Mutat., 25: 435-442. doi:10.1002/humu.20166. PUBMED
DATASET 7
Dataset used for Kin-Driver
Somatic variations in protein kinases with experimental evidence demonstrating their functional role. Database v82 contains 783 variations.
Reference: Simonetti, F. L., Tornador, C., Nabau-Moret, N., Molina-Vila, M. A., & Marino-Buslje, C. (2014). Kin-Driver: a database of driver mutations in protein kinases. Database : the journal of biological databases and curation, 2014, bau104. doi:10.1093/database/bau104. PUBMED
DATASET 8
Nonsynonymous coding SNVs in protein kinases. F1 contains 1463 disease-causing variants, F2 contains 999 unknown disease-causing (uDC) variants, and F3 contains 302 benign variants from Swiss-Prot.
References: A Torkamani, N J. Schork; Accurate prediction of deleterious protein kinase polymorphisms, Bioinformatics, Volume 23, Issue 21, 1 November 2007, Pages 2918-2925, https://doi.org/10.1093/bioinformatics/btm437. PUBMED
A Torkamani, N J. Schork, (2007) Distribution analysis of nonsynonymous polymorphisms within the human kinase gene family. Genomics, Volume 90, Issue 1, 2007, Pages 49-58, ISSN 0888-7543, https://doi.org/10.1016/j.ygeno.2007.03.006. PUBMED
DATASET 9
Dataset for wKinMut
865 disease-causing and 2627 neutral non-synonymous variants in human protein kinases.
Reference: Izarzugaza, J. M., Vazquez, M., del Pozo, A., & Valencia, A. (2013). wKinMut: an integrated tool for the analysis and interpretation of mutations in human protein kinases. BMC bioinformatics, 14, 345. doi:10.1186/1471-2105-14-345. PUBMED
DATASET 10
Dataset used for PTENpred
676 nonsynonymous SNVs in the tumor suppressor PTEN.
Reference: Johnston, S. B., & Raines, R. T. (2016). PTENpred: A Designer Protein Impact Predictor for PTEN-related Disorders. Journal of computational biology : a journal of computational molecular cell biology, 23(12), 969-975. PUBMED
DATASET 11
Pathogenic and neutral variants for 82 proteins, used to compare generic and protein-specific predictors.
Reference: Riera C, Padilla N and de la Cruz X, 2016. The Complementarity Between Protein-Specific and General Pathogenicity Predictors for Amino Acid Substitutions. Hum Mutat 37:1013–1024 PUBMED
DATASET 12
166 damaging and 21 benign amino acid substitutions in the neurodegenerative disorder Niemann-Pick disease type C (NP-C).
Reference: Adebali, O., Reznik, A. O., Ory, D. S., & Zhulin, I. B. (2016). Establishing the precise evolutionary history of a gene improves prediction of disease-causing missense mutations. Genetics in medicine : official journal of the American College of Medical Genetics, 18(10), 1029-36. PUBMED
DATASET 13
Dataset used for DPYDVarifier
Deleterious variants in dihydropyrimidine dehydrogenase (DPD, DPYD gene). F1 contains 69 variants with 30% or greater reduction in activity compared to wild type DPD. F2 contains 295 germline variants reported in dbSNP.
References: Hamzic S, Amstutz U, Largiader C, Come a long way, still a ways to go: from predicting and preventing fluoropyrimidine toxicity to increased efficacy?, Pharmacogenomics, 10.2217/pgs-2018-0040, 19, 8, (689-692), (2018). PUBMED
Shrestha S, Zhang C, Jerde C, Nie Q, Li H, Offer S, Diasio R (2018). Gene-Specific Variant Classifier (DPYD-Varifier) to Identify Deleterious Alleles of Dihydropyrimidine Dehydrogenase, Clinical Pharmacology & Therapeutics, 104(4), 709-718. PUBMED
DATASET 14
Database of BRCA1/2 missense variants
F1 contains 201 sequence alterations and F2 contains 68 missense variants in BRCA1 or BRCA2, both from a cohort of 523 index patients of families with HBOC.
Reference: Sadowski C, Kohlstedt D, Meisel C, Keller K, Becker K, Mackenroth L, Rump A, Schröck E, Wimberger P, Kast K, BRCA1/2 missense mutations and the value of in-silico analyses, European Journal of Medical Genetics, Volume 60, Issue 11, 2017, Pages 572-577, ISSN 1769-7212, https://doi.org/10.1016/j.ejmg.2017.08.005. PUBMED
DATASET 15
F1 contains 20 cystic fibrosis transmembrane conductance regulator (CFTR) nucleotide-binding domain (NBD) variants. F2 contains 11 newly characterized NBD variants.
Reference: Masica, D. L., Sosnay, P. R., Raraigh, K. S., Cutting, G. R., & Karchin, R. (2014). Missense variants in CFTR nucleotide-binding domains predict quantitative phenotypes associated with cystic fibrosis disease severity. Human molecular genetics, 24(7), 1908-17. PUBMED
DATASET 16
Dataset for HApredictor
1138 factor VIII amino acid substitutions from hemophilia A (HA) patients.
Reference: Hamasaki-Katagiri, N., Salari, R., Wu, A., Qi, Y., Schiller, T., Filiberto, A. C., Schisterman, E. F., Komar, A. A., Przytycka, T. M., Kimchi-Sarfaty, C. (2013). A gene-specific method for predicting hemophilia-causing point mutations. Journal of molecular biology, 425(21), 4023-33. PUBMED
DATASET 17
Dataset for MutaCYP
Cytochrome P450 monooxygenase (CYP) variation datasets. F1 is the control set CS30, and F2 is a training dataset of 285 variants in 15 CYPs. F3 is a blind dataset of 328 variants whose association with disease is not entirely clear.
Reference: Fechter, K., & Porollo, A. (2014). MutaCYP: Classification of missense mutations in human cytochromes P450. BMC medical genomics, 7, 47. doi:10.1186/1755-8794-7-47. PUBMED
DATASET 18
Non-synonymous single nucleotide variants in voltage-gated potassium (Kv) channels causing diseases. F1 contains 1259 variants in the training dataset and F2 contains 176 variants in the test dataset.
Reference: L. F. Stead, I. C. Wood, D. R. Westhead (2011) KvSNP: accurately predicting the effect of genetic variants in voltage-gated potassium channels, Bioinformatics, Volume 27, Issue 16, 15 August 2011, Pages 2181-2186, https://doi.org/10.1093/bioinformatics/btr365. PUBMED
DATASET 19
Dataset for CFTR-MetaPred
Cystic fibrosis transmembrane conductance regulator (CFTR) variants. F1 contains 1899 variants of clinical significance and F2 contains a subset of 1210 amino acid substitutions.
Reference: Rychkova A, Buu M, Scharfe C, Lefterova M, Odegaard J, Schrijver I, Milla C, Bustamante C, Developing Gene-Specific Meta-Predictor of Variant Pathogenicity, doi: https://doi.org/10.1101/115956 PUBMED
DATASET 20
Dataset for CYSMA, CFTR amino acid substitution predictor
Dataset of 128 disease-causing and 13 non-disease-causing variants.
Reference: Sasorith S, Baux D, Bergougnoux A, Paulet D, Lahure A, Bareil C, Taulan-Cadars M, Roux A, Koenig M, Claustres M, Raynal C, The CYSMA web server: An example of integrative tool for in silico analysis of missense variants identified in Mendelian disorders, Hum Mutat;41(2):375-386. doi: 10.1002/humu.23941. PUBMED
DATASET 21
Dataset for KinMutRF: disease-related variants in the protein kinase family.
Reference: Pons T, Vazquez M, Matey-Hernandez M, Brunak S, Valencia A, Izarzugaza J, KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily, BMC Genomics;17 Suppl 2(Suppl 2):396. doi: 10.1186/s12864-016-2723-1. PUBMED
DATASET 22
Cardiac sodium channel variants
1392 variants: 370 pathogenic, 602 benign, and 420 UVs.
Reference: Tarnovskaya S, Korkosh V, Zhorov B, Frishman D, Predicting novel disease mutations in the cardiac sodium channel, Biochem Biophys Res Commun;521(3):603-611. doi: 10.1016/j.bbrc.2019.10.142 PUBMED
DATASET 23
Dataset for SCN9A variants
31 pathogenic and 54 neutral variants
Reference: Toffano A, Chiarot G, Zamuner S, Marchi M, Salvi E, Waxman S, Faber C, Lauria G, Giacometti A, Simeoni M, Computational pipeline to probe NaV1.7 gain-of-function variants in neuropathic painful syndromes, Sci Rep;10(1):17930. doi: 10.1038/s41598-020-74591-y. PUBMED
DATASET 24
Dataset for troponin variants
136 pathogenic or likely pathogenic amino acid substitutions in Tn genes: 13 in cardiac TnC (TNNC1), 65 in cardiac TnT (TNNT2) and 58 in cardiac TnI (TNNI3)
Reference: Shakur R, Ochoa J, Robinson A, Niroula A, Chandran A, Rahman T, Vihinen M, Monserrat L, Prognostic implications of troponin T variations in inherited cardiomyopathies using systems biology, NPJ Genom Med;6(1):47. doi: 10.1038/s41525-021-00204-w. PUBMED
DATASET 25
Dataset for IDUA
Reference: Borges P, Pasqualim G, Matte U, Which Is the Best In Silico Program for the Missense Variations in IDUA Gene? A Comparison of 33 Programs Plus a Conservation Score and Evaluation of 586 Missense Variants, Front Mol Biosci. 2021 Oct 21;8:752797. doi: 10.3389/fmolb.2021.752797. PUBMED
Last updated: 2021-02-24 by Niloofar Shirvanizadeh.