A benchmark database for variations




RNA splicing

DATASET 1

Datasets of variations affecting mRNA splice sites

13 MLH1 and 6 MSH2 gene variants.

mlh1 msh2 variants
Reference: Arnold S, Buchanan DD, Barker M, Jaskowski L, Walsh MD, Birney G, Woods MO, Hopper JL, Jenkins MA, Brown MA et al. Classifying MLH1 and MSH2 variants using bioinformatic prediction, splicing assays, segregation, and tumor characteristics.  Hum. Mutat. 2009, 30, 757-770.   PUBMED  


DATASET 2

DBASS3 and DBASS5

DBASS3 is a database of aberrant 3' splice sites induced by human disease-causing mutations. It currently contains 381 records (191 in exons and 192 in introns). DBASS5 is a similar database of aberrant 5' splice sites induced by human disease-causing variations. It contains 693 records (330 in exons and 363 in introns). Both databases are regularly updated.

http://www.som.soton.ac.uk/research/geneticsdiv/dbass5/
http://www.som.soton.ac.uk/research/geneticsdiv/dbass3/

References

Buratti E, Chivers M, Kralovicova J, Romano M, Baralle M, Krainer AR, Vorechovsky I. Aberrant 5' splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization. Nucleic Acids Res. 2007, 35(13):4250-4263.   PUBMED

Vorechovsky I. Aberrant 3' splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization. Nucleic Acids Res. 2006, 34(16):4630-4641.   PUBMED  


DATASET 3

In silico prediction of splice-altering single nucleotide variants in the human genome

2959 single nucleotide variants within splicing consensus regions.

Supplementary_Table_S1-S6.xlsx
Reference: Jian, X., Boerwinkle, E., Liu, X., 2014. In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res. 42: 13534-13544  PUBMED  


DATASET 4

BRCA1 and BRCA2 variants

BRCA1 and BRCA2 splice site study of 272 variants of unknown significance.

    F1     F2     F3     F4     F5     F6
Reference: Houdayer, C. et al., 2012. Guidelines for splicing analysis in molecular diagnosis derived from a set of 327 combined in silico/in vitro studies on BRCA1 and BRCA2 variants. Hum Mutat. 33: 1228-1238  PUBMED  


DATASET 5

Variants influencing mRNA splicing

F1 contains 424 variants from dbSNP resulting in the usage of cryptic splice sites. F2 contains 57 intronic variants causing exon skipping and 12 variants resulting in the usage of cryptic splice sites. F3 contains 15 exonic variations known to result in splicing defects. F4 contains 20 Exonic Splicing Enhancers (ESEs) and Exonic Splicing Silencers (ESSs).

    F1     F2     F3     F4
Reference: Desmet, F. O., Hamroun, D., Lalande, M., Collod-Béroud, G., Claustres, M., & Béroud, C. (2009). Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic acids research, 37(9), e67.   PUBMED  


DATASET 6

Dataset used for MutPred Splice

2354 putative disease-causing splice altering variants and 638 unseen test set of 352 variants (238 SAVs and 114 SNVs).

    F1     F2
Reference: Mort, M., Sterne-Weiler, T., Li, B., Ball, E.V., Cooper, D.N., Radivojac, P., Sanford, J.R., Mooney, S.D. (2014). MutPred Splice: machine learning-based prediction of exonic variants that disrupt splicing. Genome Biology, 15(1), R19. doi: 10.1186/gb-2014-15-1-r19.  PUBMED


DATASET 7

Dataset used for ASSEDA

41 mRNA splice-altering variations, 8 mRNA splice-altering variations by qRT-PCR and 12 regulatory ESE/ISS variations altering mRNA splicing by exon definition analysis.

    F1     F2     F3
Reference: Mucaki, E.J., Shirley, B.C., Rogan, P.K. (2013). Prediction of mutant mRNA splice isoforms by information theory-based exon definition. Hum. Mutat., 34(4): 557-565.  PUBMED


DATASET 8

This dataset contains 39 RB1 gene variants (31 intronic and 8 exonic): 17 disruptions of the canonical AG/GT splice sites of the RB1 gene, 13 deleterious intronic variants, 6 exonic variants and 3 negative variants.

    F1     F2     F3     F4
Reference: Houdayer, C., Dehainault, C., Mattler, C., Michaux, D., Caux‐Moncoutier, V., Pagès‐Berhouet, S., d'Enghien, C. D., Laugé, A., Castera, L., Gauthier‐Villars, M. and Stoppa‐Lyonnet, D. (2008), Evaluation of in silico splice tools for decision‐making in molecular diagnosis. Hum. Mutat., 29: 975-982. doi:10.1002/humu.20765.  PUBMED


DATASET 9

Effects of 18 intronic variations in the LDLR gene on pre-mRNA splicing.

    F1     F2
Reference: Holla, O.L., Nakken, S, Mattingsdal, M., Ranheim, T., Berge, KE, Defesche, JC, Leren, TP, Effects of intronic mutations in the LDLR gene on pre-mRNA splicing: Comparison of wet-lab and bioinformatics analyses, Molecular Genetics and Metabolism, Volume 96, Issue 4, 2009, Pages 245-252, ISSN 1096-7192, https://doi.org/10.1016/j.ymgme.2008.12.014.  PUBMED  


DATASET 10

Splice-site predictions for intronic variants: 29 intronic variants in BRCA1 and BRCA2, and 19 intronic variants in BRCA1.

    F1     F2     F3     F4
Reference: Vreeswijk, M. P., Kraan, J. N., van der Klift, H. M., Vink, G. R., Cornelisse, C. J., Wijnen, J. T., Bakker, E., van Asperen, C. J. and Devilee, P. (2009), Intronic variants in BRCA1 and BRCA2 that affect RNA splicing can be reliably selected by splice‐site prediction programs. Hum. Mutat., 30: 107-114. doi:10.1002/humu.20811.  PUBMED


DATASET 11

53 unclassified variants of the BRCA genes, 4 BRCA1 splice-altering variants, 6 variants that do not alter splicing and 5 BRCA2 splice-altering variants.

    F1     F2     F3     F4     F5
Reference: Théry, J. C., Krieger, S., Gaildrat, P., Révillion, F., Buisine, M. P., Killian, A., Duponchel, C., Rousselin, A., Vaur, D., Peyrat, J. P., Berthet, P., Frébourg, T., Martins, A., Hardouin, A., … Tosi, M. (2011). Contribution of bioinformatics predictions and functional splicing assays to the interpretation of unclassified variants of the BRCA genes. European journal of human genetics : EJHG, 19(10), 1052-8.   PUBMED  


DATASET 12

24 unclassified variants at BRCA1 and BRCA2 splice sites.

    F1     F2     F3     F4     F5     F6     F7
Reference: Colombo, M., De Vecchi, G., Caleca, L., Foglia, C., Ripamonti, C. B., Ficarazzi, F., Barile, M., Varesco, L., Peissel, B., Manoukian, S., … Radice, P. (2013). Comparative in vitro and in silico analyses of variants in splicing regions of BRCA1 and BRCA2 genes and characterization of novel pathogenic mutations. PloS one, 8(2), e57173.   PUBMED  


DATASET 13

Variations at the first nucleotide position of exons in 39 AG-dependent splice sites. F1, F2 and F3 contain the test, borderline and evaluation sets with the exon border preserved. F4, F5 and F6 contain the test, borderline and evaluation sets with the exon border not preserved. F7 contains the E+1 test borderline set. F8 contains the splicing-affecting set.

    F1     F2     F3     F4     F5     F6     F7     F8
Reference: Grodecká, L., Lockerová, P., Ravčuková, B., Buratti, E., Baralle, F. E., Dušek, L., & Freiberger, T. (2014). Exon first nucleotide mutations in splicing: evaluation of in silico prediction tools. PloS one, 9(2), e89570. doi:10.1371/journal.pone.0089570.  PUBMED  


DATASET 14

This dataset contains 222 pathogenic variations in F1 and 50 benign ones in F2 within the consensus splice regions of major U2-type introns.

    F1     F2
Reference: Tang, R., Prosser, D.O., Love, D.R. Evaluation of Bioinformatic Programmes for the Analysis of Variants within Splice Site Consensus Regions. Adv Bioinformatics. 2016;2016:5614058. doi:10.1155/2016/5614058. PMID: 27313609; PMCID PMC4894998.   PUBMED


DATASET 15

Dataset for splice-altering variant prediction with scdbNSFP

Training data of 2959 variants and test set of 45 variants

    F
Reference: Jian, X., Boerwinkle, E., Liu, X. In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res. 2014 Dec 16; 42(22): 13534–13544. doi: 10.1093/nar/gku1206 PMID: 25416802; PMCID PMC4267638.   PUBMED  


DATASET 16

Dataset for EX-SKIP and HOT-SKIP

F1 contains 37 exon skipping and 37 control variants, F2 contains 12 CFTR exon inclusions and 42 investigated minigenes

    F1    F2
Reference: Raponi, M., Kralovicova, J., Copson, E. et al. Prediction of single-nucleotide substitutions that result in exon skipping: identification of a splicing silencer in BRCA1 exon 6. Hum Mutat. 2011;32(4):436-44. doi: 10.1002/humu.21458.  PUBMED


DATASET 17

Dataset for SQUIRLS

    F
Reference: Danis, D., Jacobsen, J.O.B., Carmody, L.C. et al. Interpretable prioritization of splice variants in diagnostic next-generation sequencing. Am J Hum Genet. 2021;108(9):1564-1577. doi: 10.1016/j.ajhg.2021.06.014.  PUBMED


DATASET 18

Dataset for cancer gene analysis

Discovery set of 99 variants for HBOC and Lynch syndrome, and validation set of 346 variants.

    F
Reference: Bonache, S., Esteban, I., Moles-Fernández, A. et al. Multigene panel testing beyond BRCA1/2 in breast/ovarian cancer Spanish families and clinical actionability of findings. J Cancer Res Clin Oncol. 2018;144(12):2495-2513. doi: 10.1007/s00432-018-2763-9.  PUBMED


DATASET 19

Dataset for splice-altering variant prediction with scdbNSFP

Training data of 2959 variants and test set of 45 variants

    F
Reference: Jian, X., Boerwinkle, E., Liu, X. In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res. 2014;42(22):13534-44. doi: 10.1093/nar/gku1206.  PUBMED


DATASET 20

Dataset for SPiCE

Training data of 142 variants in BRCA1 and BRCA2, test set of 163 BRCA1 and BRCA2 variants, and test set of 90 variants in other genes.

    F
Reference: Leman, R., Gaildrat, P., Le Gac, G. et al. Novel diagnostic tool for prediction of variant spliceogenicity derived from a set of 395 combined in silico/in vitro studies: an international collaborative effort. Nucleic Acids Res. 2018;46(15):7913-7923. doi: 10.1093/nar/gky372.  PUBMED


DATASET 21

Dataset for CADD-Splice

The files contain training data for GRCh38 in compressed VCF format (vcf.gz) with tabix indexes (.tbi). F1 contains human-derived indels, F2 contains human-derived SNVs, F3 contains simulated indels and F4 contains simulated SNVs.

    F1     F2     F3     F4
Reference: Rentzsch, P., Schubach, M., Shendure, J., Kircher, M. CADD-Splice-improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med. 2021 Feb 22;13(1):31. doi: 10.1186/s13073-021-00835-9.  PUBMED
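
Because the Dataset 21 files are compressed VCF files, a quick sanity check is to count SNVs and indels in a downloaded file. The following is a minimal sketch, assuming standard tab-separated VCF data lines (CHROM, POS, ID, REF, ALT in the first five columns). The function names and the file name in the usage note are hypothetical and are not part of the dataset or of CADD-Splice.

```python
import gzip
import io


def iter_vcf_records(handle):
    """Yield (chrom, pos, ref, alt) tuples from the text lines of a VCF file."""
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        chrom, pos, _id, ref, alt = fields[:5]
        yield chrom, int(pos), ref, alt


def variant_kind(ref, alt):
    """Classify a record as 'SNV' or 'indel' from allele lengths (assumption:
    anything that is not a single-base substitution is counted as an indel)."""
    alleles = alt.split(",")  # ALT may list several comma-separated alleles
    if len(ref) == 1 and all(len(a) == 1 for a in alleles):
        return "SNV"
    return "indel"


def count_kinds(path):
    """Count SNVs and indels in a gzip- or bgzip-compressed VCF file."""
    counts = {"SNV": 0, "indel": 0}
    with gzip.open(path, "rt") as handle:
        for _chrom, _pos, ref, alt in iter_vcf_records(handle):
            counts[variant_kind(ref, alt)] += 1
    return counts
```

For example, `count_kinds("humanDerived_SNVs.vcf.gz")` (hypothetical file name) would return a dictionary of counts. For large files or region queries, an indexed reader that uses the .tbi index would likely be a better fit than this line-by-line scan.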


Last updated: 2021-02-07 by Niloofar Shirvanizadeh.