A. Generic protein-protein interactions
DATASET 1
Dataset for mmCSM-PPI
1344 entries: 12 neutral, 347 binding affinity-increasing, and 985 binding affinity-decreasing variants.
Reference: Rodrigues C, Pires D, Ascher D, mmCSM-PPI: predicting the effects of multiple point mutations on protein-protein interactions, Nucleic Acids Res;49(W1):W417-W424. doi: 10.1093/nar/gkab273. PUBMED
DATASET 2
Dataset for CC/PBSA
582 variations in 7 proteins.
Reference: Benedix A, Becker C, Groot B, Caflisch A, Böckmann R, Predicting free energy changes using structural ensembles, Nat Methods.:3-4. doi: 10.1038/nmeth0109-3. PUBMED
DATASET 3
Dataset for prediction of effects on protein-protein binding affinity
Reference: Li M, Petukh M, Alexov E, Panchenko A, Predicting the Impact of Missense Mutations on Protein-Protein Binding Affinity, J Chem Theory Comput:1770-1780. doi: 10.1021/ct401022c. PUBMED
DATASET 4
Dataset for MutaBind
1925 substitutions in 80 protein–protein complexes
Reference: Li M, Simonetti F, Goncearenco A, Panchenko A, MutaBind estimates and interprets the effects of sequence variants on protein-protein interactions, Nucleic Acids Res.(W1):W494-501. doi: 10.1093/nar/gkw374. PUBMED
DATASET 5
Dataset for BindProfX
1,402 variants in 116 proteins: 1,131 single-point variations, 195 double-point variations, and 76 three or higher-order variations
Reference: Zhang C, Zheng W, Zhang Y, BindProfX: Assessing Mutation-Induced Binding Affinity Change by Protein Interface Profiles with Pseudo-Counts, J Mol Biol.;429(3):426-434. doi: 10.1016/j.jmb.2016.11.022. PUBMED
DATASET 6
Dataset for iSEE
1102 single variants in 57 proteins. NM dataset of 19 variants. S487, 478 variants in 56 protein complexes. 33 variants in the MDM2–p53 complex.
Reference: Geng C, Vangone A, Folkers G, Xue L, Bonvin A, iSEE: Interface structure, evolution, and energy-based machine learning predictor of binding affinity changes upon mutations, Proteins:110-119. doi: 10.1002/prot.25630. PUBMED
DATASET 7
Dataset for mCSM-PPI2
F2, 4196 variants in 319 complexes. F1, 378 alanine scanning variations from SKEMPI2.
Reference: Rodrigues C, Myung Y, Pires D, Ascher D, mCSM-PPI2: predicting the effects of mutations on protein-protein interactions, Nucleic Acids Res.;47(W1):W338-W344. doi: 10.1093/nar/gkz383. PUBMED
DATASET 8
Dataset for MutaBind2
S4191, 4191 single variants in 265 protein complexes; M1707, 1707 multiple variants in 120 protein complexes.
Reference: Zhang N, Chen Y, Lu H, Zhao F, Alvarez R, Goncearenco A, Panchenko A, Li M, MutaBind2: Predicting the Impacts of Single and Multiple Mutations on Protein-Protein Interactions, iScience.;23(3):100939. doi: 10.1016/j.isci.2020.100939. PUBMED
DATASET 9
Dataset for SSIPe
Training set of 1470 variants in 118 structures; training sets of 734 variants in 59 structures and 888 variants in 86 structures (data in Tables S6-S8). CAPRI dataset: T55, 285 variants in 15 positions; T56, 285 variants in 15 positions (data in Tables S9 and S10).
Reference: Huang X, Zheng W, Pearce R, Zhang Y, SSIPe: accurately estimating protein-protein binding affinity change upon mutations using evolutionary profiles in combination with an optimized physical energy function, Bioinformatics;36(8):2429-2437. doi: 10.1093/bioinformatics/btz926. PUBMED
DATASET 10
Dataset for NetTree
Reference: Wang M, Cang Z, Wei G, A topology-based network tree for the prediction of protein-protein binding affinity changes following mutation, Nat Mach Intell:116-123. doi: 10.1038/s42256-020-0149-6. PUBMED
DATASET 11
Dataset for ProAffiMuSeq
Training dataset of 1061 variants, test dataset of 112 variants.
Reference: Jemimah S, Sekijima M, Gromiha M, ProAffiMuSeq: sequence-based method to predict the binding free energy change of protein–protein complexes upon mutation using functional classification, Bioinformatics;36(6):1725-1730. doi: 10.1093/bioinformatics/btz829. PUBMED
DATASET 12
Dataset for ELASPIC2
Training dataset: >250000 variants associated with stability and >50000 variants associated with protein binding affinity. Test datasets: 3749 SARS-CoV-2 spike protein variants and 3669 human ACE2 receptor variants.
Reference: Strokach A, Lu T, Kim P, ELASPIC2 (EL2): Combining Contextualized Language Models and Graph Neural Networks to Predict Effects of Mutations, J Mol Biol;433(11):166810. doi: 10.1016/j.jmb.2021.166810. PUBMED
DATASET 13
Dataset for interaction networks in e-MutPath
Reference: Li Y, Burgman B, Khatri I, Pentaparthi S, Su Z, McGrail D, Li Y, Wu E, Eckhardt S, Sahni N, Yi S, e-MutPath: computational modeling reveals the functional landscape of genetic mutations rewiring interactome networks, Nucleic Acids Res. 2021 Jan 11;49(1):e2. doi: 10.1093/nar/gkaa1015. PUBMED
B. Antibody-antigen affinity changes
DATASET 1
Dataset for mCSM-AB
558 variants in 24 antibody-antigen complexes
Reference: Pires D, Ascher D, mCSM-AB: a web server for predicting antibody-antigen affinity changes upon mutation with graph-based signatures, Nucleic Acids Res. doi: 10.1093/nar/gkw458. PUBMED
DATASET 2
Dataset for SiPMAB
212 amino acid substitutions
Reference: Sulea T, Vivcharuk V, Corbeil C, Deprez C, Purisima E, Assessment of Solvated Interaction Energy Function for Ranking Antibody-Antigen Binding Affinities, J Chem Inf Model;56(7):1292-303. doi: 10.1021/acs.jcim.6b00043. PUBMED
DATASET 3
Dataset for free energy perturbation method
Reference: Kuhn M, Firth-Clark S, Tosco P, Mey A, Mackey M, Michel J, Assessment of Binding Affinity via Alchemical Free-Energy Calculations, J Chem Inf Model;60(6):3120-3130. doi: 10.1021/acs.jcim.0c00165. PUBMED
DATASET 4
Test dataset for consensus predictor
34 variants in the complex of anti-VEGF, 12 variants in the complex of anti-MCP
Reference: Kurumida Y, Saito Y, Kameda T, Predicting antibody affinity changes upon mutations by combining multiple predictors, Sci Rep;10(1):19533. doi: 10.1038/s41598-020-76369-8. PUBMED
DATASET 5
Dataset for mCSM-AB2
Reference: Myung Y, Rodrigues C, Ascher D, Pires D, mCSM-AB2: guiding rational antibody design using graph-based signatures, Bioinformatics;36(5):1453-1459. doi: 10.1093/bioinformatics/btz779. PUBMED
C. Protein-nucleic acid interactions
DATASET 1
Dataset for mCSM-NA
331 variants in 38 complexes
Reference: Pires D, Ascher D, mCSM-NA: predicting the effects of mutations on protein-nucleic acids interactions, Nucleic Acids Res;45(W1):W241-W246. doi: 10.1093/nar/gkx236. PUBMED
DATASET 2
Dataset for SAMPDI
104 amino acid substitutions in 13 proteins
Reference: Peng Y, Sun L, Jia Z, Li L, Alexov E, Predicting protein-DNA binding free energy change upon missense mutations using modified MM/PBSA approach: SAMPDI webserver, Bioinformatics;34(5):779-786. doi: 10.1093/bioinformatics/btx698. PUBMED
DATASET 3
Dataset for PremPDI
219 single amino acid substitutions in 49 complexes
Reference: Zhang N, Chen Y, Zhao F, Yang Q, Simonetti F, Li M, PremPDI estimates and interprets the effects of missense mutations on protein-DNA interactions, PLoS Comput Biol;14(12):e1006615. doi: 10.1371/journal.pcbi.1006615. PUBMED
DATASET 4
Dataset for DeepCLIP
Reference: Grønning A, Doktor T, Larsen S, Petersen U, Holm L, Bruun G, Hansen M, Hartung A, Baumbach J, Andresen B, DeepCLIP: predicting the effect of mutations on protein-RNA binding with deep learning, Nucleic Acids Res;48(13):7099-7118. doi: 10.1093/nar/gkaa530. PUBMED
DATASET 5
Dataset for iPNHOT
86 hot spot residues and 207 non-hot spot residues
Reference: Zhu X, Liu L, He J, Fang T, Xiong Y, Mitchell J, iPNHOT: a knowledge-based approach for identifying protein-nucleic acid interaction hot spots, BMC Bioinformatics;21(1):289. doi: 10.1186/s12859-020-03636-w. PUBMED
DATASET 6
Dataset for SAMPDI-3D
Dataset S419 contains 147 disruptive and 272 non-disruptive variants; S200 contains 53 disruptive and 147 non-disruptive variants; D463 contains 149 disruptive and 314 non-disruptive variants; D101 contains 50 disruptive and 51 non-disruptive variants.
Reference: Li G, Panday S, Peng Y, Alexov E, SAMPDI-3D: predicting the effects of protein and DNA mutations on protein-DNA interactions, Bioinformatics;btab567. doi: 10.1093/bioinformatics/btab567. PUBMED
DATASET 7
Dataset in Nabe
2506 variants in 473 complexes
Reference: Liu J, Liu S, Liu C, Zhang Y, Pan Y, Wang Z, Wang J, Wen T, Deng L, Nabe: an energetic database of amino acid mutations in protein-nucleic acid binding interfaces, Database (Oxford);2021:baab050. doi: 10.1093/database/baab050. PUBMED
Last updated: 2022-02-20 by Niloofar Shirvanizadeh.