
A benchmark database for variations




Phenotype dataset

DATASET 1

Dataset for PON-PS

This dataset consists of amino acid substitutions that lead to severe or non-severe disease phenotypes. The variations were collected from the published literature and several databases. The training dataset consists of 885 mild, 463 moderate, and 1179 severe disease-causing amino acid substitutions in 83 proteins. The test dataset consists of 143 mild, 38 moderate, and 220 severe disease-causing amino acid substitutions in 8 proteins.
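The class counts above can be checked with a little arithmetic. The sketch below totals each split and gives the share of each severity grade. It also collapses the three grades into the two-class severe/non-severe framing. It assumes that mild and moderate together form the non-severe class. That grouping is an assumption made for illustration, and the helper names are hypothetical.

```python
# PON-PS class counts as stated in the dataset description
TRAIN = {"mild": 885, "moderate": 463, "severe": 1179}
TEST = {"mild": 143, "moderate": 38, "severe": 220}


def summarize(counts):
    """Return the total number of substitutions and each class's share in percent."""
    total = sum(counts.values())
    shares = {grade: round(100 * n / total, 1) for grade, n in counts.items()}
    return total, shares


def binary_split(counts):
    """Collapse the three grades into (non_severe, severe).

    Assumption: mild + moderate are treated together as non-severe.
    """
    return counts["mild"] + counts["moderate"], counts["severe"]


if __name__ == "__main__":
    for name, counts in (("training", TRAIN), ("test", TEST)):
        total, shares = summarize(counts)
        print(name, total, shares, binary_split(counts))
```

Under these counts, the training set holds 2527 substitutions (1348 non-severe, 1179 severe) and the test set holds 401 (181 non-severe, 220 severe).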

    PON-PS training and test datasets

Reference: Niroula A, Vihinen M. 2017. Predicting severity of disease-causing variants. Hum Mutat 38(4):357-364.  PUBMED  

DATASET 2

Benchmark datasets for VusPrize:

F1 contains variants classified as variants of uncertain significance (VUS) in ClinVar on 8 August 2020. F2 contains variants classified as Pathogenic in ClinVar on 8 August 2020. F3 contains variants that had been classified as VUS but were reclassified as Pathogenic with a review status of at least two gold stars in ClinVar as of 8 August 2020. F4 contains variants that had been classified as VUS but were reclassified as Pathogenic with a review status of at least two gold stars in ClinVar as of 8 August 2020. F5 contains variants classified as Benign in ClinVar on 8 August 2020.
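The selection criteria for F1, F2, F3 and F5 can be read as simple filters over a ClinVar snapshot. The sketch below applies them to a hypothetical flattened record type, `ClinVarRecord`. This is not ClinVar's actual schema. The sketch assumes each record carries its current classification, an earlier classification if there is one, and its review-status star count. It reads the criteria literally, so a reclassified variant lands in both F2 and F3. The original study may have handled such overlaps differently.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

VUS = "Uncertain significance"


@dataclass(frozen=True)
class ClinVarRecord:
    """Hypothetical flattened view of one ClinVar variant (not ClinVar's real schema)."""
    variant_id: str
    significance: str                     # classification in the 2020-08-08 snapshot
    previous_significance: Optional[str]  # earlier classification, if any
    stars: int                            # review-status gold stars (0-4)


def select_subsets(records: Iterable[ClinVarRecord]) -> Dict[str, List[str]]:
    """Assign variant IDs to the F1, F2, F3 and F5 subsets, read literally."""
    subsets: Dict[str, List[str]] = {"F1": [], "F2": [], "F3": [], "F5": []}
    for r in records:
        if r.significance == VUS:
            subsets["F1"].append(r.variant_id)
        elif r.significance == "Pathogenic":
            subsets["F2"].append(r.variant_id)
            # F3: formerly VUS, now Pathogenic with at least two gold stars
            if r.previous_significance == VUS and r.stars >= 2:
                subsets["F3"].append(r.variant_id)
        elif r.significance == "Benign":
            subsets["F5"].append(r.variant_id)
    return subsets


if __name__ == "__main__":
    demo = [
        ClinVarRecord("v1", VUS, None, 1),
        ClinVarRecord("v2", "Pathogenic", VUS, 2),
        ClinVarRecord("v3", "Pathogenic", VUS, 1),  # too few stars for F3
        ClinVarRecord("v4", "Benign", None, 2),
    ]
    print(select_subsets(demo))
```

On the four demo records, `v2` is assigned to both F2 and F3. `v3` goes only to F2 because its review status has fewer than two stars.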

    F1      F2      F3      F4      F5

Reference: Mahecha D, Nuñez H, Lattig M, Duitama J. 2022. Machine learning models for accurate prioritization of variants of uncertain significance. Hum Mutat 43(4):449-460. doi: 10.1002/humu.24339.  PUBMED


Last updated: 2019-04-09 by Niloofar Shirvanizadeh.