VariBench

A benchmark database for variations




1. Variation datasets affecting protein tolerance

DATASET 1

Dataset of neutral single nucleotide polymorphisms

This is the neutral (non-synonymous coding SNV) dataset, comprising 23,683 human non-synonymous coding SNVs from dbSNP build 131 with allele frequency >0.01 and chromosome sample count >49. Disease-associated SNVs were filtered out of this dataset. The variant position mappings for this dataset were extracted from dbSNP.
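The selection criteria above amount to a simple record filter. The Python sketch below illustrates that logic only; it assumes a hypothetical `Variant` record whose field names are illustrative and are not taken from dbSNP or from the released file, and it is not the procedure actually used to build the dataset.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one non-synonymous coding SNV (illustrative fields)."""
    variant_id: str
    allele_frequency: float   # allele frequency as reported in dbSNP
    chrom_sample_count: int   # number of chromosomes sampled
    disease_associated: bool  # True if the SNV is known to be disease-associated


def is_neutral_candidate(v: Variant) -> bool:
    """Apply the stated criteria: AF > 0.01, sample count > 49, not disease-associated."""
    return (
        v.allele_frequency > 0.01
        and v.chrom_sample_count > 49
        and not v.disease_associated
    )


def select_neutral(variants):
    """Keep only the records that pass all three criteria."""
    return [v for v in variants if is_neutral_candidate(v)]


# Made-up example records, not real dbSNP entries.
variants = [
    Variant("var_a", 0.25, 120, False),   # passes all criteria
    Variant("var_b", 0.005, 200, False),  # allele frequency too low
    Variant("var_c", 0.30, 40, False),    # too few chromosomes sampled
    Variant("var_d", 0.15, 90, True),     # disease-associated, excluded
]
print([v.variant_id for v in select_neutral(variants)])  # ['var_a']
```

Both thresholds are strict inequalities, so a variant with allele frequency exactly 0.01 or a sample count of exactly 49 is excluded.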

Download: Neutral_Dataset*

Download: Neutral_Dataset annotated with VariO**

Dataset of pathogenic single nucleotide polymorphisms

This is the pathogenic dataset of 19,335 amino acid substitutions obtained from the PhenCode database (downloaded in June 2009), IDbases and 18 individual locus-specific databases (LSDBs). For this dataset, the variations are available together with their position mappings to RefSeq protein (≥99% match), RefSeq mRNA and RefSeq genomic sequences.

Download: Pathogenic_Dataset

Download: Pathogenic_Dataset annotated with VariO**

Reference: Thusberg J, Olatubosun A, Vihinen M. Performance of mutation pathogenicity prediction methods on missense variants. Hum Mutat. 2011;32(4):358-68.

* Last updated: 2017-07-06.

** Tab-delimited file, updated: 2013-11-12.