#############################################################################################################################
Manuscript Title: Evaluation of tools used to predict the impact of amino acid substitutions is hindered by two types of circularity
#############################################################################################################################

Manuscript Authors: Dominik G. Grimm, Chloe Agathe Azencott, Fabian Aicheler, Udo Gieraths, Daniel G. MacArthur, Kaitlin E. Samocha, David N. Cooper, Peter D. Stenson, Mark J. Daly, Jordan W. Smoller, Laramie E. Duncan, Karsten M. Borgwardt

This archive contains filtered versions of five publicly available benchmark datasets for pathogenicity prediction:

- filtered subset of HumVar (Adzhubei et al. 2010)
- filtered subset of ExoVar (Li et al. 2013)
- filtered subset of VariBench (Thusberg et al. 2011; Nair and Vihinen 2013)
- filtered subset of predictSNP (Bendl et al. 2014)
- filtered subset of SwissVar Dec. 2014 (Mottaz et al. 2010)

###############################################
Header Explanation for Data files in ToolScores
###############################################

The first row is the header!
Each row contains one variant, together with the scores and predicted labels from the different tools.

Here is a description of the different columns:
------------------------------------------------
Column 1: True Label - the true label of this variant (1 = pathogenic, -1 = neutral)
Column 2: #RS-ID - the rs identifier for this variant, if available
Column 3: CHR - the chromosome on which the variant is located
Column 4: Nuc-Pos - the nucleotide position of the variant
Column 5: REF-Nuc - the reference nucleotide
Column 6: ALT-Nuc - the alternative nucleotide
Column 7: MAF - the minor allele frequency, if available
Column 8: Ensembl-Gene-ID - the Ensembl gene ID for this variant
Column 9: Ensembl-Protein-ID - the Ensembl protein ID for this variant
Column 10: Ensembl-Transcript-ID - the Ensembl transcript ID for this variant
Column 11: UniProt-Accession - the UniProt accession ID
Column 12: AA-Pos - the amino acid position on the transcript
Column 13: REF-AA - the reference amino acid
Column 14: ALT-AA - the alternative amino acid for this variant
Column 15: MutationTaster - the score retrieved from the MutationTaster2 website for this variant
Column 16: MutationTaster Predicted Label for this variant
Column 17: MutationAssessor - the score retrieved from the MutationAssessor website for this variant
Column 18: MutationAssessor Predicted Label for this variant
Column 19: PolyPhen2 - the score retrieved from the PolyPhen2 website for this variant
Column 20: PolyPhen2 Predicted Label for this variant
Column 21: CADD - the score retrieved from the CADD website for this variant
Column 22: SIFT - the score retrieved from the SIFT website for this variant
Column 23: SIFT Predicted Label for this variant
Column 24: LRT - the score retrieved from the LRT website for this variant
Column 25: LRT Predicted Label for this variant
Column 26: FatHMM-U - the score retrieved from the FatHMM-U website for this variant
Column 27: FatHMM-U Predicted Label for this variant
Column 28: FatHMM-W - the score retrieved from the FatHMM-W website for this variant
Column 29: FatHMM-W weighting feature ln(Wd)
Column 30: FatHMM-W weighting feature ln(Wn)
Column 31: FatHMM-W Predicted Label for this variant
Column 32: GERP++ - the GERP++ score
Column 33: phyloP - the phyloP score
Column 34: Condel (PP2 + MutationAssessor + SIFT) - the Condel score
Column 35: Condel Predicted Label for this variant
Column 36: Condel+ (PP2 + MutationAssessor + SIFT + FatHMM-W) - the Condel+ score
Column 37: Condel+ Predicted Label for this variant
Column 38: Logit (PP2 + MutationAssessor + SIFT) - the Logit score
Column 39: Logit Predicted Label for this variant
Column 40: Logit+ (PP2 + MutationAssessor + SIFT + FatHMM-W) - the Logit+ score
Column 41: Logit+ Predicted Label for this variant
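
Example: reading a ToolScores file by column number

The column layout described above can be read with a few lines of Python. The following is only a minimal sketch, not part of the original archive. It assumes that the files are tab-separated and that missing scores appear as empty or non-numeric fields; neither is stated in this README, so adjust the delimiter and the missing-value handling to the actual files. The function names (read_variants, count_labels, parse_score) are hypothetical helpers introduced for illustration, and the demo rows at the bottom are synthetic placeholders, not real variants.

```python
import csv
import io

# 1-based column numbers, taken from the column description in this README.
COL_TRUE_LABEL = 1
COL_POLYPHEN2_SCORE = 19


def read_variants(handle, delimiter="\t"):
    """Yield each data row as a list of strings, skipping the header row.

    The tab delimiter is an assumption; pass another one if needed.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # the first row is the header
    for row in reader:
        if row:  # ignore completely blank lines
            yield row


def count_labels(rows):
    """Count pathogenic (1) and neutral (-1) variants using column 1."""
    counts = {1: 0, -1: 0}
    for row in rows:
        counts[int(row[COL_TRUE_LABEL - 1])] += 1
    return counts


def parse_score(row, column):
    """Return the score in the given 1-based column as a float.

    Returns None when the field is absent, empty, or not numeric
    (assumed here to mean that the tool returned no score).
    """
    try:
        return float(row[column - 1])
    except (ValueError, IndexError):
        return None


if __name__ == "__main__":
    # Synthetic two-variant table with 19 columns, for illustration only.
    header = "\t".join(["h"] * 19)
    row_a = "\t".join(["1"] + [""] * 17 + ["0.98"])
    row_b = "\t".join(["-1"] + [""] * 17 + ["NA"])
    rows = list(read_variants(io.StringIO("\n".join([header, row_a, row_b]))))
    print(count_labels(rows))
    print([parse_score(r, COL_POLYPHEN2_SCORE) for r in rows])
```

To use it on a real file, open the file with open(path, newline="") and pass the handle to read_variants. Columns are addressed by position rather than by header name because the exact header strings are not guaranteed to match the labels listed above.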