Citing PON-All

  1. Yang Y, Shao A, Vihinen M. PON-All: Amino Acid Substitution Tolerance Predictor for All Organisms. Frontiers in Molecular Biosciences, 2022, 9. (https://www.frontiersin.org/articles/10.3389/fmolb.2022.867572/full)

Start prediction

To start using PON-All, provide the protein sequence(s) and the amino acid substitution(s). Protein sequences can be given in two ways: directly as FASTA sequences, or as a list of IDs (GI, Ensembl ID or UniProt). The data can be typed or pasted into the boxes of the input forms, or uploaded as a file. Both input types allow variants for multiple proteins to be submitted at the same time.

In Protein Prediction mode, the routine predicts all 19 possible single amino acid substitutions at each position of the protein.

To see an input example, click the "Example" text on the input page.

Input FASTA sequences

If the complete FASTA sequence(s) are available, paste them into the input FASTA sequences box. FASTA sequence(s) and amino acid substitution(s) must be provided; e-mail is optional. If an e-mail address is provided, the results will be sent to it when ready.

  1. Each FASTA sequence must contain a header line starting with a greater-than sign (>), followed by the amino acid sequence. The amino acid sequence must start on a new line.
  2. The amino acid substitution information must use the same header line as the sequence. A substitution consists of three parts in HGVS format: original amino acid, position, and new amino acid. For example, "A2M" means that the second amino acid, A (alanine), is substituted by M (methionine). Use single-letter amino acid codes. Each protein sequence can have multiple amino acid substitutions, each on a separate line.
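
The two rules above can be checked locally before submitting. The following is a minimal sketch, not part of PON-All itself: the function names (`parse_fasta`, `parse_substitutions`, `check_input`) and the toy sequence are hypothetical, and it assumes the 20 standard single-letter amino acid codes and 1-based positions.

```python
import re

# Assumption: only the 20 standard single-letter amino acid codes are allowed.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# A substitution such as "A2M": original residue, 1-based position, new residue.
SUBSTITUTION = re.compile(rf"^([{AMINO_ACIDS}])([1-9]\d*)([{AMINO_ACIDS}])$")


def parse_fasta(text):
    """Return {header: sequence}; sequence lines under a header are concatenated."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is None:
            raise ValueError("sequence found before any '>' header line")
        else:
            records[header] += line.upper()
    return records


def parse_substitutions(text):
    """Return {header: [(original, position, new), ...]} from the substitution text."""
    variants, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            variants.setdefault(header, [])
            continue
        match = SUBSTITUTION.match(line.upper())
        if header is None or match is None:
            raise ValueError(f"cannot interpret line: {line!r}")
        original, position, new = match.groups()
        variants[header].append((original, int(position), new))
    return variants


def check_input(fasta_text, substitution_text):
    """Return a list of problems; an empty list means the input looks consistent."""
    sequences = parse_fasta(fasta_text)
    problems = []
    for header, subs in parse_substitutions(substitution_text).items():
        if header not in sequences:
            problems.append(f"{header}: no sequence with this header")
            continue
        seq = sequences[header]
        for original, position, new in subs:
            label = f"{header} {original}{position}{new}"
            if position > len(seq):
                problems.append(f"{label}: position beyond sequence end")
            elif seq[position - 1] != original:
                problems.append(f"{label}: sequence has {seq[position - 1]}")
    return problems


fasta = ">toy\nMAKR\n"                          # toy sequence, not a real protein
print(check_input(fasta, ">toy\nA2M\nK3R\n"))   # []
print(check_input(fasta, ">toy\nG2M\n"))        # ['toy G2M: sequence has A']
```

A check like this catches the most common input mistake, a substitution whose original amino acid does not match the residue at the stated position, before the job is submitted.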

Example

  FASTA Sequence:
    >Q13563
    MVNSSRVQPQQPGDAKRPPAPRAPDPGRLMAGCAAVGASLAAPGGLCEQRGLEIEMQRIRQAAARDPPAGAAASPSPPLSSCSRQAWSRDNPGFEAEEEEEEVEGEEGGMVVEMDVEWRPGSRRSAASSAVSSVGARSRGLGGYHGAGHPSGRRRRREDQGPPCPSPVGGGDPLHRHLPLEGQPPRVAWAERLVRGLRGLWGTRLMEESSTNREKYLKSVLRELVTYLLFLIVLCILTYGMMSSNVYYYTRMMSQLFLDTPVSKTEKTNFKTLSSMEDFWKFTEGSLLDGLYWKMQPSNQTEADNRSFIFYENLLLGVPRIRQLRVRNGSCSIPQDLRDEIKECYDVYSVSSEDRAPFGPRNGTAWIYTSEKDLNGSSHWGIIATYSGAGYYLDLSRTREETAAQVASLKKNVWLDRGTRATFIDFSVYNANINLFCVVRLLVEFPATGGVIPSWQFQPLKLIRYVTTFDFFLAACEIIFCFFIFYYVVEEILEIRIHKLHYFRSFWNCLDVVIVVLSVVAIGINIYRTSNVEVLLQFLEDQNTFPNFEHLAYWQIQFNNIAAVTVFFVWIKLFKFINFNRTMSQLSTTMSRCAKDLFGFAIMFFIIFLAYAQLAYLVFGTQVDDFSTFQECIFTQFRIILGDINFAEIEEANRVLGPIYFTTFVFFMFFILLNMFLAIINDTYSEVKSDLAQQKAEMELSDLIRKGYHKALVKLKLKKNTVDDISESLRQGGGKLNFDELRQDLKGKGHTDAEIEAIFTKYDQDGDQELTEHEHQQMRDDLEKEREDLDLDHSSLPRPMSSRSFPRSLDDSEEDDDEDSGHSSRRRGSISSGVSYEEFQVLVRRVDRMEHSIGSIVSKIDAVIVKLEIMERAKLKRREVLGRLLDGVAEDERLGRDSEIHREQMERLVREELERWESDDAASQISHGLGTPVGLNGQPRPRSSRPSSSQSTEGMEGAGGNGSSNVHV
    
    >Q99972
    MRFFCARCCSFGPEMPAVQLLLLACLVWDVGARTAQLRKANDQSGRCQYTFSVASPNESSCPEQSQAMSVIHNLQRDSSTQRLDLEATKARLSSLESLLHQLTLDQAARPQETQEGLQRELGTLRRERDQLETQTRELETAYSNLLRDKSVLEEEKKRLRQENENLARRLESSSQEVARLRRGQCPQTRDTARAVPPGSREVSTWNLDTLAFQELKSELTEVPASRILKESPSGYLRSGEGDTGCGELVWVGEPLTLRTAETITGKYGVWMRDPKPTYPYTQETTWRIDTVGTDVRQVFEYDLISQFMQGYPSKVHILPRPLESTGAVVYSGSLYFQGAESRTVIRYELNTETVKAEKEIPGAGYHGQFPYSWGGYTDIDLAVDEAGLWVIYSTDEAKGAIVLSKLNPENLELEQTWETNIRKQSVANAFIICGTLYTVSSYTSADATVNFAYDTGTGISKTLTIPFKNRYKYSSMIDYNPLEKKLFAWDNLNMVTYDIKLSKM
    
  Amino Acid Substitution:
    >Q13563
    R322W
    R440S
    >Q99972
    Q48H
    

Input Protein IDs

PON-All also accepts sequence IDs. As with FASTA input, the protein IDs, the amino acid substitutions and the ID type must be provided; e-mail is optional. If an e-mail address is provided, the results will be sent to it when ready.

Each ID must be preceded by a greater-than sign (>). The amino acid substitutions follow, starting on the next line, with all the variants of a protein in a single list, one per line. After that, the details for another sequence can be provided. Substitutions are given in the HGVS format.

Example

  ID and amino acid substitution:
    >P05062
    A338V
    C135R
  ID Type:
    UniProtKB/Swiss-Prot ID
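
When variants are kept in a spreadsheet or script, the ID input text can be generated rather than typed. This is a small sketch under that assumption; `format_id_input` is a hypothetical helper, not a PON-All function, and it only reproduces the layout described above.

```python
def format_id_input(variants_by_id):
    """Render {protein_id: [substitution, ...]} as '>ID' blocks, one substitution per line."""
    lines = []
    for protein_id, substitutions in variants_by_id.items():
        lines.append(f">{protein_id}")   # ID preceded by a greater-than sign
        lines.extend(substitutions)      # all variants of that protein in one list
    return "\n".join(lines)


print(format_id_input({"P05062": ["A338V", "C135R"]}))
# >P05062
# A338V
# C135R
```

The resulting text can be pasted into the input box or saved to a file for upload; the ID type still has to be selected on the form.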
    

Protein Prediction

Protein Prediction predicts all 19 possible single amino acid substitutions. Only one protein can be submitted at a time. Provide the sequence either in FASTA format or as an ID (GI, Ensembl ID or UniProt). The prediction takes a long time, so please be patient; the results will be sent by e-mail.
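
To give a sense of the size of such a job, the sketch below enumerates the variants this mode covers, assuming that "all 19 possible substitutions" means replacing the residue at each position with each of the other 19 standard amino acids. The function name and the two-residue toy sequence are illustrative only; the server's own implementation is not described here.

```python
# Assumption: the 20 standard amino acids, single-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_single_substitutions(sequence):
    """Yield every substitution (e.g. 'M1A') that replaces a residue with one of the 19 others."""
    for position, original in enumerate(sequence.upper(), start=1):
        for new in AMINO_ACIDS:
            if new != original:
                yield f"{original}{position}{new}"


variants = list(all_single_substitutions("MA"))
print(len(variants))   # 38, i.e. 2 positions x 19 substitutions
print(variants[:3])    # ['M1A', 'M1C', 'M1D']
```

Under this assumption a protein of length n yields 19 × n variants, which is consistent with the long running time mentioned above.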

Prediction results

PON-All presents the results on separate web pages and, if an e-mail address is provided, also mails them to the submitter. There are detailed pages for each submission, called a "Task", and for each amino acid substitution, called a "Record".

Data sets

Extensive data mining was performed to obtain cases for training and testing. Variations were collected from several sources. In the table below, each cell gives the number of proteins and the number of variants, separated by a slash.

Dataset  10CV                                   BlindTest                              Total
         pathogenic   neutral      total        pathogenic   neutral      total        pathogenic   neutral      total
Human    2173/17504   12141/23600  13383/41104  170/1980     669/1967     740/3926     2343/19484   12810/25567  14123/45030
Animal   117/162      116/144      232/306      109/155      125/169      233/324      226/317      241/313      465/630
Plant    913/2601     629/1562     1150/4163    228/736      152/374      288/1110     1141/3337    781/1936     1438/5273
Total    3203/20267   12886/25306  14765/45573  507/2871     946/2510     1261/5360    3710/23138   13832/27816  16026/50933
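
As a reading aid for the "proteins/variants" cells, the sketch below parses the three 10CV cells of each organism row and sums them per column, which reproduces the 10CV part of the Total row. The variable and function names are illustrative, and only the 10CV columns are included to keep the example short.

```python
# 10CV cells (pathogenic, neutral, total) copied from the table above.
rows = {
    "Human": "2173/17504 12141/23600 13383/41104",
    "Animal": "117/162 116/144 232/306",
    "Plant": "913/2601 629/1562 1150/4163",
}


def parse_row(cells):
    """Turn 'proteins/variants' cells into a list of (proteins, variants) integer pairs."""
    return [tuple(int(n) for n in cell.split("/")) for cell in cells.split()]


def column_totals(table):
    """Sum proteins and variants column by column over all organism rows."""
    parsed = [parse_row(cells) for cells in table.values()]
    return [
        (sum(r[i][0] for r in parsed), sum(r[i][1] for r in parsed))
        for i in range(len(parsed[0]))
    ]


print(column_totals(rows))
# [(3203, 20267), (12886, 25306), (14765, 45573)]
```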

Performance of PON-All

We compared the performance of PON-All to several widely used generic variant tolerance predictors. The compared tools included CADD (Kircher et al., 2014), FATHMM (Rogers et al., 2018), MetaLR and MetaSVM (Dong et al., 2015), MutationTaster (Schwarz et al., 2014), PolyPhen2 (Adzhubei et al., 2010), PON-P2 (Niroula et al., 2015), PROVEAN (Choi et al., 2012) and SIFT (Vaser et al., 2016).

Measure   PON-All wGO  PON-All woGO  PON-P2  Sift4G  Polyphen2  MutationTaster  FATHMM  PROVEAN  MetaSVM  MetaLR  CADD_10*  CADD_15*  CADD_20*
TP        1274         789           831     1391    1530       1544            1113    1364     1239     1234    1630      1599      1545
TN        1421         1052          1032    1197    1003       1044            1472    1320     1651     1639    498       710       1020
FP        138          148           141     591     790        771             323     490      166      178     1319      1107      797
FN        94           154           132     288     149        135             563     313      440      445     49        80        134
PPV       0.902        0.842         0.855   0.702   0.659      0.667           0.775   0.736    0.882    0.874   0.553     0.591     0.660
NPV       0.938        0.872         0.887   0.806   0.871      0.885           0.723   0.808    0.790    0.786   0.910     0.899     0.884
TPR       0.931        0.837         0.863   0.828   0.911      0.920           0.664   0.813    0.738    0.735   0.971     0.952     0.920
TNR       0.911        0.877         0.880   0.669   0.559      0.575           0.820   0.729    0.909    0.902   0.274     0.391     0.561
ACC       0.921        0.859         0.872   0.746   0.730      0.741           0.745   0.770    0.827    0.822   0.609     0.660     0.734
MCC       0.841        0.714         0.742   0.503   0.500      0.523           0.491   0.543    0.659    0.649   0.337     0.410     0.512
OPM       0.781        0.630         0.661   0.423   0.416      0.436           0.414   0.459    0.570    0.559   0.291     0.341     0.426
Coverage  0.746        0.546         0.544   0.883   0.884      0.890           0.884   0.888    0.890    0.890   0.890     0.890     0.890

* For CADD, 10, 15 and 20 are three commonly used thresholds.
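
The measures in the table can be recomputed from the TP, TN, FP and FN counts. The sketch below assumes the standard definitions of PPV, NPV, TPR (sensitivity), TNR (specificity), ACC and MCC. The page does not state the OPM formula; the form used here, ((PPV + NPV)(TPR + TNR)(ACC + nMCC)) / 8 with nMCC = (1 + MCC) / 2, is an assumption that reproduces the tabulated values. The function name `performance` is illustrative.

```python
from math import sqrt


def performance(tp, tn, fp, fn):
    """Compute the table's measures from confusion-matrix counts."""
    ppv = tp / (tp + fp)                    # positive predictive value
    npv = tn / (tn + fn)                    # negative predictive value
    tpr = tp / (tp + fn)                    # sensitivity
    tnr = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Assumed OPM definition: MCC is rescaled to [0, 1] before being combined.
    opm = (ppv + npv) * (tpr + tnr) * (acc + (1 + mcc) / 2) / 8
    return {"PPV": ppv, "NPV": npv, "TPR": tpr, "TNR": tnr,
            "ACC": acc, "MCC": mcc, "OPM": opm}


# Counts from the "PON-All wGO" column.
measures = performance(tp=1274, tn=1421, fp=138, fn=94)
print({name: round(value, 3) for name, value in measures.items()})
# {'PPV': 0.902, 'NPV': 0.938, 'TPR': 0.931, 'TNR': 0.911, 'ACC': 0.921, 'MCC': 0.841, 'OPM': 0.781}
```

Coverage is not derived here because it depends on the total number of test variants, which the counts alone do not give.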

Mirror website

Contact

If you have any problems, please contact Aibin Shao (20194227016@stu.suda.edu.cn).