A. Gain of function datasets
Dataset 1
Dataset for funNCion
F1 contains the pathogenic variants used in training: 518 loss-of-function (lof) and 309 gain-of-function (gof) variants in voltage-gated sodium and calcium channels. F2 contains variants from gnomAD in the SCN and CACNA1 genes (the neutral variants used in training).
Reference: H Heyne, D Baez-Nieto et al, Predicting functional effects of missense variants in voltage-gated sodium and calcium channels, Sci Transl Med;12(556):eaay6848. doi: 10.1126/scitranslmed.aay6848. PUBMED
B. Deep mutational datasets
Dataset 1
Dataset for DeepSequence
42 experimental datasets, 712218 variants in 34 proteins and RNA, 108 experiments
Reference: Riesselman A, Ingraham J, Marks D, Deep generative models of genetic variation capture the effects of mutations, Nat Methods;15(10):816-822. doi: 10.1038/s41592-018-0138-4. PUBMED
Dataset 2
Dataset for fuNTRp
Training data: 11130 substitutions in 822 amino acids in five proteins. Test data: 11807 variants in three proteins.
Reference: Miller M, Vitale D, Kahn P, Rost B, Bromberg Y, funtrp: identifying protein positions for variation driven functional tuning, Nucleic Acids Res;47(21):e142. doi: 10.1093/nar/gkz818. PUBMED
Dataset 3
Dataset for functional effects
Nine deep mutational scanning data sets.
Reference: Reeb J, Wirth T, Rost B, Variant effect predictions capture some aspects of deep mutational scanning experiments, BMC Bioinformatics;21(1):107. doi: 10.1186/s12859-020-3439-4. PUBMED
Dataset 4
Analysis of deep mutational landscape
28 deep mutational scanning studies, variants in 6321 positions in 30 proteins.
Reference: Dunham A, Beltrao P, Exploring amino acid functions in a deep mutational landscape, Mol Syst Biol;17(7):e10305. doi: 10.15252/msb.202110305. PUBMED
Dataset 5
Dataset for pathogenic variant benchmarking
Reference: Livesey B, Marsh J, Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations, Mol Syst Biol. 2020 Jul;16(7):e9380. doi: 10.15252/msb.20199380. PUBMED
Dataset 6
Dataset for LacI variants
102 variants in 12 positions, 4303 variants in 52 positions
Reference: M Miller, Y Bromberg, L Swint-Kruse, Computational predictors fail to identify amino acid substitution effects at rheostat positions, Sci Rep. 2017 Jan 30;7:41329. doi: 10.1038/srep41329. PUBMED
Dataset 7
Neutral positions in liver pyruvate kinase
117 variants in nine positions
Reference: Martin T, Wu T, Tang Q, Dougherty L, Parente D, Swint-Kruse L, Identification of biochemically neutral positions in liver pyruvate kinase, Proteins. 2020 Oct;88(10):1340-1350. doi: 10.1002/prot.25953. PUBMED
Last updated: 2022-02-21 by Niloofar Shirvanizadeh.