VariBench

Synonymous and unsense variants

Dataset of synonymous and RNA structure and transcription affecting variants (misleadingly called synonymous or silent)

DATASET 1

Dataset used for dbDSM

2021 synonymous disease-associated amino acid substitutions.

Reference: Wen P, Xiao P, Xia J, dbDSM: a manually curated database for deleterious synonymous mutations, Bioinformatics, Volume 32, Issue 12, 15 June 2016, Pages 1914–1916, https://doi.org/10.1093/bioinformatics/btw086. PUBMED

DATASET 2

Dataset used for IDSV

600 sSNVs in training dataset F1 and 5331 sSNVs in the independent test set.

F1,

Reference: Shi F, Yao Y, Bin Y, Zheng C, Xia J, Computational identification of deleterious synonymous variants in human genomes using a feature-based approach, BMC Medical Genomics, 2019, Jan, 12(1), pages 12, doi: 10.1186/s12920-018-0455-6. PUBMED

DATASET 3

Dataset for Silva

A dataset of 33 rare (allele frequency <5%) synonymous variants that have been implicated in a disorder and experimentally validated to affect splicing, transcript abundance, mRNA stability or translational efficiency.
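The inclusion criteria above can be read as a simple filter. The following is a minimal sketch only: the `Variant` record, its field names, and the toy entries are hypothetical and are not taken from the Silva dataset or its file format.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # All field names are hypothetical, chosen for illustration only.
    variant_id: str
    allele_frequency: float       # fraction between 0.0 and 1.0
    disease_implicated: bool      # implicated in a disorder
    experimentally_validated: bool  # validated effect on splicing, mRNA, etc.


def meets_criteria(v: Variant, max_af: float = 0.05) -> bool:
    """Return True if the variant is rare, disease-implicated and validated."""
    return (
        v.allele_frequency < max_af
        and v.disease_implicated
        and v.experimentally_validated
    )


# Toy records, not real variants.
candidates = [
    Variant("v1", 0.001, True, True),   # passes all three criteria
    Variant("v2", 0.200, True, True),   # too common
    Variant("v3", 0.010, True, False),  # no experimental validation
]
selected = [v.variant_id for v in candidates if meets_criteria(v)]
print(selected)
```

The 5% threshold is kept as a parameter so that a stricter rarity cutoff can be tried without changing the function.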

Reference: Buske O, Manickaraj A, Mital S, Ray P, Brudno M, Identification of deleterious synonymous variants in human genomes, Bioinformatics, 2013, 29(15):1843-50, doi: 10.1093/bioinformatics/btt308. PUBMED

DATASET 4

Dataset for TraP

F1 contains 401 benign de novo synonymous variants within the consensus coding sequence (CCDS), identified from individuals not ascertained for any specific disorder. F2 contains 97 de novo variants from an obsessive-compulsive disorder (OCD) dataset consisting of 436 OCD family trios. F3 contains 97 de novo synonymous variants from the Epi4K de novo variation dataset.

Reference: Gelfman S, Wang Q, McSweeney M, Ren Z, Carpia F, Halvorsen M, Schoch K, Ratzon F, Heinzen E, Boland M, Petrovski S, Goldstein D, Annotating pathogenic non-coding variants in genic regions, Nature Communications, 2017, 8(1):236, doi: 10.1038/s41467-017-00141-2. PUBMED

DATASET 5

Dataset for usDSM

Training set: 1201 deleterious and 238158 neutral variants. First test set: 96 deleterious and 2348 benign variants. Second test set: 30 deleterious and 5025 benign variants.

F1 contains deleterious and benign mutations (full version). F2 contains deleterious and benign variations (undersampling version). F3 contains deleterious and benign variations (full version). F4 contains deleterious and benign variations (undersampling version). F5 contains deleterious and benign variations of the second test dataset.
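To illustrate how a "full version" and an "undersampling version" can differ, here is a minimal sketch of plain random undersampling, which discards majority-class records until the classes are the same size. This is an assumption made for illustration: it is not necessarily the undersampling scheme used by usDSM, and the record layout (`id`, `label`) and the function name `random_undersample` are hypothetical. Only the class counts (1201 deleterious, 238158 neutral) come from the description above.

```python
import random


def random_undersample(records, label_key="label", seed=0):
    """Randomly keep an equal number of records from every class."""
    # Group the records by their class label.
    by_label = {}
    for rec in records:
        by_label.setdefault(rec[label_key], []).append(rec)

    # The smallest class sets the number of records kept per class.
    target = min(len(group) for group in by_label.values())

    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Toy stand-in records that reproduce only the class sizes of the training set.
records = (
    [{"id": f"del_{i}", "label": "deleterious"} for i in range(1201)]
    + [{"id": f"neu_{i}", "label": "neutral"} for i in range(238158)]
)

balanced = random_undersample(records)
counts = {}
for rec in balanced:
    counts[rec["label"]] = counts.get(rec["label"], 0) + 1
print(counts)
```

With roughly 198 neutral variants per deleterious one in the full training set, a classifier can score well by always predicting the majority class; balancing the classes removes that shortcut at the cost of discarding most neutral records.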

Reference: Tang X, Zhang T, Cheng N, Wang H, Zheng C, Xia J, Zhang T, usDSM: a novel method for deleterious synonymous mutation prediction using undersampling scheme, Briefings in Bioinformatics, 2021 Sep 2;22(5):bbab123. doi: 10.1093/bib/bbab123. PUBMED

DATASET 6

Dataset for ensemble predictor

F contains 243 pathogenic and 243 benign variants.

Reference: Ganakammal S, Alexov E, An Ensemble Approach to Predict the Pathogenicity of Synonymous Variants, Genes, 2020;11(9):1102. doi: 10.3390/genes11091102. PUBMED

DATASET 7

Dataset for predictor review

1048575 observed and generated variants

Reference: Zeng Z, Bromberg Y, Predicting Functional Effects of Synonymous Variants: A Systematic Review and Perspectives, Front Genet. 2019;10:914. doi: 10.3389/fgene.2019.00914. eCollection 2019. PUBMED

Last updated: 2021-02-07 by Niloofar Shirvanizadeh.

A benchmark database for variations