A benchmark database for variations




Synonymous and unsense variants

Datasets of synonymous variants and of variants affecting RNA structure and transcription (misleadingly called synonymous or silent)

DATASET 1

Dataset used for dbDSM

2021 synonymous disease-associated amino acid substitutions.

    F1

Reference: Wen P, Xiao P, Xia J, dbDSM: a manually curated database for deleterious synonymous mutations, Bioinformatics, Volume 32, Issue 12, 15 June 2016, Pages 1914–1916, https://doi.org/10.1093/bioinformatics/btw086.  PUBMED  

DATASET 2

Dataset used for IDSV

600 sSNVs in the training dataset (F1) and 5331 sSNVs in the independent test set (F2).

    F1,     F2

Reference: Shi F, Yao Y, Bin Y, Zheng C, Xia J, Computational identification of deleterious synonymous variants in human genomes using a feature-based approach, BMC Medical Genomics, 2019, Jan, 12(1), pages 12, doi: 10.1186/s12920-018-0455-6.  PUBMED  

DATASET 3

Dataset for Silva

A dataset of 33 rare (allele frequency <5%) synonymous variants selected by the following criteria: each has been implicated in a disorder and experimentally validated to affect splicing, transcript abundance, mRNA stability or translational efficiency.

    F1

Reference: Buske O, Manickaraj A, Mital S, Ray P, Brudno M, Identification of deleterious synonymous variants in human genomes, Bioinformatics, 2013, 29(15):1843-50, doi: 10.1093/bioinformatics/btt308.  PUBMED  

DATASET 4

Dataset for TraP

F1 contains 401 benign de novo synonymous variants that lie within the consensus coding sequence (CCDS) and were identified in individuals not ascertained for any specific disorder. F2 contains 97 de novo variations from an obsessive-compulsive disorder (OCD) dataset consisting of 436 OCD family trios. F3 contains 97 de novo synonymous variants from the Epi4K de novo variations.

    F1     F2     F3

Reference: Gelfman S, Wang Q, McSweeney M, Ren Z, Carpia F, Halvorsen M, Schoch K, Ratzon F, Heinzen E, Boland M, Petrovski S, Goldstein D, Annotating pathogenic non-coding variants in genic regions, Nature Communications, 2017, 8(1):236, doi: 10.1038/s41467-017-00141-2.  PUBMED  

DATASET 5

Dataset for usDSM

Training data: 1201 deleterious and 238158 neutral variants. First test set: 96 deleterious and 2348 benign variants. Second test set: 30 deleterious and 5025 benign variants.

F1 contains deleterious and benign mutations (full version). F2 contains deleterious and benign variations (undersampling version). F3 contains deleterious and benign variations (full version). F4 contains deleterious and benign variations (undersampling version). F5 contains deleterious and benign variations of the second test dataset.

    Training data (F1     F2)     Test data (F3     F4     F5)

Reference: Tang X, Zhang T, Cheng N, Wang H, Zheng C, Xia J, Zhang T, usDSM: a novel method for deleterious synonymous mutation prediction using undersampling scheme, Briefings in Bioinformatics, 2021 Sep 2;22(5):bbab123. doi: 10.1093/bib/bbab123.  PUBMED  
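
The training data above is heavily imbalanced (1201 deleterious against 238158 neutral variants), which is why undersampled versions of the files are provided. The following is a minimal sketch of plain random undersampling in Python, only to illustrate the idea. It is an assumption for illustration and may differ from the scheme the usDSM authors actually used. The function name `undersample` and the placeholder identifiers are hypothetical and do not come from the dataset files.

```python
import random


def undersample(deleterious, neutral, seed=0):
    """Randomly reduce the larger class so that both classes have equal size."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    n = min(len(deleterious), len(neutral))
    return rng.sample(deleterious, n), rng.sample(neutral, n)


# Placeholder identifiers standing in for variant records (hypothetical, not real data).
# Only the class sizes are taken from the dataset description.
deleterious = [f"del_{i}" for i in range(1201)]
neutral = [f"neu_{i}" for i in range(238158)]

# Imbalance ratio before undersampling (neutral variants per deleterious variant).
print(round(len(neutral) / len(deleterious), 1))  # 198.3

balanced_del, balanced_neu = undersample(deleterious, neutral)
print(len(balanced_del), len(balanced_neu))  # 1201 1201
```

Random undersampling discards most of the majority class, so results can vary with the seed. Evaluating on the full (not undersampled) test files gives a more realistic picture of performance on imbalanced data.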

DATASET 6

Dataset for ensemble predictor

F contains 243 pathogenic and 243 benign variants.

    F

Reference: Ganakammal S, Alexov E, An Ensemble Approach to Predict the Pathogenicity of Synonymous Variants, Genes, 2020;11(9):1102. doi: 10.3390/genes11091102.  PUBMED  

DATASET 7

Dataset for predictor review

1048575 observed and generated variants

    F1

Reference: Zeng Z, Bromberg Y, Predicting Functional Effects of Synonymous Variants: A Systematic Review and Perspectives, Front Genet. 2019;10:914. doi: 10.3389/fgene.2019.00914. eCollection 2019.  PUBMED  


Last updated: 2021-02-07 by Niloofar Shirvanizadeh.