VariBench

Protein interaction

A. Generic protein-protein interactions

DATASET 1

Dataset for mmCSM-PPI

1344 entries: 12 neutral, 347 binding affinity-increasing, and 985 binding affinity-decreasing variants.

Reference: Rodrigues C, Pires D, Ascher D, mmCSM-PPI: predicting the effects of multiple point mutations on protein-protein interactions, Nucleic Acids Res;49(W1):W417-W424. doi: 10.1093/nar/gkab273. PUBMED

DATASET 2

Dataset for CC/PBSA

582 variations in 7 proteins.

Reference: Benedix A, Becker C, Groot B, Caflisch A, Böckmann R, Predicting free energy changes using structural ensembles, Nat Methods:3-4. doi: 10.1038/nmeth0109-3. PUBMED

DATASET 3

Dataset for prediction of effects on protein-protein binding affinity

Reference: Li M, Petukh M, Alexov E, Panchenko A, Predicting the Impact of Missense Mutations on Protein-Protein Binding Affinity, J Chem Theory Comput:1770-1780. doi: 10.1021/ct401022c. PUBMED

DATASET 4

Dataset for MutaBind

1925 substitutions in 80 protein–protein complexes

Reference: Li M, Simonetti F, Goncearenco A, Panchenko A, MutaBind estimates and interprets the effects of sequence variants on protein-protein interactions, Nucleic Acids Res.(W1):W494-501. doi: 10.1093/nar/gkw374. PUBMED

DATASET 5

Dataset for BindProfX

1,402 variants in 116 proteins, 1,131 single-point variations, 195 double-point variations, and 76 three or higher-order variations

Reference: Zhang C, Zheng W, Zhang Y, BindProfX: Assessing Mutation-Induced Binding Affinity Change by Protein Interface Profiles with Pseudo-Counts, J Mol Biol.;429(3):426-434. doi: 10.1016/j.jmb.2016.11.022. PUBMED

DATASET 6

Dataset for iSEE

1102 single variants in 57 proteins. NM dataset of 19 variants. S487, 478 variants in 56 protein complexes. 33 variants in MDM2–p53 complex.

Reference: Geng C, Vangone A, Folkers G, Xue L, Bonvin A, iSEE: Interface structure, evolution, and energy-based machine learning predictor of binding affinity changes upon mutations, Proteins:110-119. doi: 10.1002/prot.25630. PUBMED

DATASET 7

Dataset for mCSM-PPI2

F2, 4196 variants in 319 complexes. F1, 378 alanine scanning variations from SKEMPI2.

Reference: Rodrigues C, Myung Y, Pires D, Ascher D, mCSM-PPI2: predicting the effects of mutations on protein-protein interactions, Nucleic Acids Res.;47(W1):W338-W344. doi: 10.1093/nar/gkz383. PUBMED

DATASET 8

Dataset for MutaBind2

S4191, 4191 single variants in 265 protein complexes. M1707, 1707 multiple variants in 120 protein complexes.

Reference: Zhang N, Chen Y, Lu H, Zhao F, Alvarez R, Goncearenco A, Panchenko A, Li M, MutaBind2: Predicting the Impacts of Single and Multiple Mutations on Protein-Protein Interactions, iScience.;23(3):100939. doi: 10.1016/j.isci.2020.100939. PUBMED

DATASET 9

Dataset for SSIPe

Training set of 1470 variants in 118 structures; training sets of 734 variants in 59 structures and 888 variants in 86 structures (data in Tables S6-S8). CAPRI datasets: T55, 285 variants in 15 positions; T56, 285 variants in 15 positions (data in Tables S9 and S10).

Reference: Huang X, Zheng W, Pearce R, Zhang Y, SSIPe: accurately estimating protein-protein binding affinity change upon mutations using evolutionary profiles in combination with an optimized physical energy function, Bioinformatics;36(8):2429-2437. doi: 10.1093/bioinformatics/btz926. PUBMED

DATASET 10

Dataset for NetTree

Reference: Wang M, Cang Z, Wei G, A topology-based network tree for the prediction of protein-protein binding affinity changes following mutation, Nat Mach Intell:116-123. doi: 10.1038/s42256-020-0149-6. PUBMED

DATASET 11

Dataset for ProAffiMuSeq

Training dataset of 1061 variants, test dataset of 112 variants.

Reference: Jemimah S, Sekijima M, Gromiha M, ProAffiMuSeq: sequence-based method to predict the binding free energy change of protein–protein complexes upon mutation using functional classification, Bioinformatics;36(6):1725-1730. doi: 10.1093/bioinformatics/btz829. PUBMED

DATASET 12

Dataset for ELASPIC2

Training dataset: >250000 variants associated with stability and >50000 variants associated with protein binding affinity. Test datasets: 3749 SARS-CoV-2 spike protein variants and 3669 human ACE2 receptor variants.

Reference: Strokach A, Lu T, Kim P, ELASPIC2 (EL2): Combining Contextualized Language Models and Graph Neural Networks to Predict Effects of Mutations, J Mol Biol;433(11):166810. doi: 10.1016/j.jmb.2021.166810. PUBMED

DATASET 13

Dataset for interaction networks in e-MutPath

Reference: Li Y, Burgman B, Khatri I, Pentaparthi S, Su Z, McGrail D, Li Y, Wu E, Eckhardt S, Sahni N, Yi S, e-MutPath: computational modeling reveals the functional landscape of genetic mutations rewiring interactome networks, Nucleic Acids Res. 2021 Jan 11;49(1):e2. doi: 10.1093/nar/gkaa1015. PUBMED

B. Antibody-antigen affinity changes

DATASET 1

Dataset for mCSM-AB

558 variants in 24 antibody-antigen complexes

Reference: Pires D, Ascher D, mCSM-AB: a web server for predicting antibody-antigen affinity changes upon mutation with graph-based signatures, Nucleic Acids Res. doi: 10.1093/nar/gkw458. PUBMED

DATASET 2

Dataset for SiPMAB

212 amino acid substitutions

Reference: Sulea T, Vivcharuk V, Corbeil C, Deprez C, Purisima E, Assessment of Solvated Interaction Energy Function for Ranking Antibody-Antigen Binding Affinities, J Chem Inf Model;56(7):1292-303. doi: 10.1021/acs.jcim.6b00043. PUBMED

DATASET 3

Dataset for free energy perturbation method

Reference: Kuhn M, Firth-Clark S, Tosco P, Mey A, Mackey M, Michel J, Assessment of Binding Affinity via Alchemical Free-Energy Calculations, J Chem Inf Model;60(6):3120-3130. doi: 10.1021/acs.jcim.0c00165. PUBMED

DATASET 4

Test dataset for consensus predictor

34 variants in the anti-VEGF complex and 12 variants in the anti-MCP complex

Reference: Kurumida Y, Saito Y, Kameda T, Predicting antibody affinity changes upon mutations by combining multiple predictors, Sci Rep;10(1):19533. doi: 10.1038/s41598-020-76369-8. PUBMED

DATASET 5

Dataset for mCSM-AB2

Reference: Myung Y, Rodrigues C, Ascher D, Pires D, mCSM-AB2: guiding rational antibody design using graph-based signatures, Bioinformatics;36(5):1453-1459. doi: 10.1093/bioinformatics/btz779. PUBMED

C. Protein-nucleic acid interactions

DATASET 1

Dataset for mCSM-NA

331 variants in 38 complexes

Reference: Pires D, Ascher D, mCSM-NA: predicting the effects of mutations on protein-nucleic acids interactions, Nucleic Acids Res;45(W1):W241-W246. doi: 10.1093/nar/gkx236. PUBMED

DATASET 2

Dataset for SAMPDI

104 amino acid substitutions in 13 proteins

Reference: Peng Y, Sun L, Jia Z, Li L, Alexov E, Predicting protein-DNA binding free energy change upon missense mutations using modified MM/PBSA approach: SAMPDI webserver, Bioinformatics;34(5):779-786. doi: 10.1093/bioinformatics/btx698. PUBMED

DATASET 3

Dataset for PremPDI

219 single amino acid substitutions in 49 complexes

Reference: Zhang N, Chen Y, Zhao F, Yang Q, Simonetti F, Li M, PremPDI estimates and interprets the effects of missense mutations on protein-DNA interactions, PLoS Comput Biol;14(12):e1006615. doi: 10.1371/journal.pcbi.1006615. PUBMED

DATASET 4

Dataset for DeepCLIP

Reference: Grønning A, Doktor T, Larsen S, Petersen U, Holm L, Bruun G, Hansen M, Hartung A, Baumbach J, Andresen B, DeepCLIP: predicting the effect of mutations on protein-RNA binding with deep learning, Nucleic Acids Res;48(13):7099-7118. doi: 10.1093/nar/gkaa530. PUBMED

DATASET 5

Dataset for iPNHOT

86 hot spot residues and 207 non-hot spot residues

Reference: Zhu X, Liu L, He J, Fang T, Xiong Y, Mitchell J, iPNHOT: a knowledge-based approach for identifying protein-nucleic acid interaction hot spots, BMC Bioinformatics;21(1):289. doi: 10.1186/s12859-020-03636-w. PUBMED

DATASET 6

Dataset for SAMPDI-3D

Dataset S419 contains 147 disruptive and 272 non-disruptive variants, S200 contains 53 disruptive and 147 non-disruptive variants, D463 contains 149 disruptive and 314 non-disruptive variants, D101 contains 50 disruptive and 51 non-disruptive variants

Reference: Li G, Panday S, Peng Y, Alexov E, SAMPDI-3D: predicting the effects of protein and DNA mutations on protein-DNA interactions, Bioinformatics;btab567. doi: 10.1093/bioinformatics/btab567. PUBMED

DATASET 7

Dataset in Nabe

2506 variants in 473 complexes

Reference: Liu J, Liu S, Liu C, Zhang Y, Pan Y, Wang Z, Wang J, Wen T, Deng L, Nabe: an energetic database of amino acid mutations in protein-nucleic acid binding interfaces, Database (Oxford);2021:baab050. doi: 10.1093/database/baab050. PUBMED

Last updated: 2022-02-20 by Niloofar Shirvanizadeh.
