YANG Y, SHAO A, VIHINEN M. PON-All: Amino Acid Substitution Tolerance Predictor for All Organisms [J]. Frontiers in Molecular Biosciences, 2022, 9. https://www.frontiersin.org/articles/10.3389/fmolb.2022.867572/full
To start using PON-All, provide a protein sequence and the amino acid substitution(s) of interest. Protein sequences can be supplied in two ways: as FASTA sequences or as a list of IDs (GI, Ensembl ID or UniProt). The data can be typed or pasted into the boxes in the input forms or uploaded as a file. Both input types accept variants for multiple proteins in a single submission.
In protein prediction mode, PON-All predicts all 19 possible single amino acid substitutions at each position of the submitted protein.
To see an input example, click the "Example" text on the input page.
If complete FASTA sequences are available, paste them into the input FASTA sequences box. The FASTA sequence(s) and amino acid substitution(s) must be provided; e-mail is optional. If an e-mail address is provided, the results will be sent to it when ready. Example sequences, followed by the substitutions for each protein:
>Q13563
MVNSSRVQPQQPGDAKRPPAPRAPDPGRLMAGCAAVGASLAAPGGLCEQRGLEIEMQRIRQAAARDPPAGAAASPSPPLSSCSRQAWSRDNPGFEAEEEEEEVEGEEGGMVVEMDVEWRPGSRRSAASSAVSSVGARSRGLGGYHGAGHPSGRRRRREDQGPPCPSPVGGGDPLHRHLPLEGQPPRVAWAERLVRGLRGLWGTRLMEESSTNREKYLKSVLRELVTYLLFLIVLCILTYGMMSSNVYYYTRMMSQLFLDTPVSKTEKTNFKTLSSMEDFWKFTEGSLLDGLYWKMQPSNQTEADNRSFIFYENLLLGVPRIRQLRVRNGSCSIPQDLRDEIKECYDVYSVSSEDRAPFGPRNGTAWIYTSEKDLNGSSHWGIIATYSGAGYYLDLSRTREETAAQVASLKKNVWLDRGTRATFIDFSVYNANINLFCVVRLLVEFPATGGVIPSWQFQPLKLIRYVTTFDFFLAACEIIFCFFIFYYVVEEILEIRIHKLHYFRSFWNCLDVVIVVLSVVAIGINIYRTSNVEVLLQFLEDQNTFPNFEHLAYWQIQFNNIAAVTVFFVWIKLFKFINFNRTMSQLSTTMSRCAKDLFGFAIMFFIIFLAYAQLAYLVFGTQVDDFSTFQECIFTQFRIILGDINFAEIEEANRVLGPIYFTTFVFFMFFILLNMFLAIINDTYSEVKSDLAQQKAEMELSDLIRKGYHKALVKLKLKKNTVDDISESLRQGGGKLNFDELRQDLKGKGHTDAEIEAIFTKYDQDGDQELTEHEHQQMRDDLEKEREDLDLDHSSLPRPMSSRSFPRSLDDSEEDDDEDSGHSSRRRGSISSGVSYEEFQVLVRRVDRMEHSIGSIVSKIDAVIVKLEIMERAKLKRREVLGRLLDGVAEDERLGRDSEIHREQMERLVREELERWESDDAASQISHGLGTPVGLNGQPRPRSSRPSSSQSTEGMEGAGGNGSSNVHV
>Q99972
MRFFCARCCSFGPEMPAVQLLLLACLVWDVGARTAQLRKANDQSGRCQYTFSVASPNESSCPEQSQAMSVIHNLQRDSSTQRLDLEATKARLSSLESLLHQLTLDQAARPQETQEGLQRELGTLRRERDQLETQTRELETAYSNLLRDKSVLEEEKKRLRQENENLARRLESSSQEVARLRRGQCPQTRDTARAVPPGSREVSTWNLDTLAFQELKSELTEVPASRILKESPSGYLRSGEGDTGCGELVWVGEPLTLRTAETITGKYGVWMRDPKPTYPYTQETTWRIDTVGTDVRQVFEYDLISQFMQGYPSKVHILPRPLESTGAVVYSGSLYFQGAESRTVIRYELNTETVKAEKEIPGAGYHGQFPYSWGGYTDIDLAVDEAGLWVIYSTDEAKGAIVLSKLNPENLELEQTWETNIRKQSVANAFIICGTLYTVSSYTSADATVNFAYDTGTGISKTLTIPFKNRYKYSSMIDYNPLEKKLFAWDNLNMVTYDIKLSKM
>Q13563
R322W
R440S
>Q99972
Q48H
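Before submitting, it can help to check locally that the reference residue of each substitution matches the sequence at the stated position. The following is a minimal Python sketch and not part of PON-All: the helper names `parse_fasta` and `check_substitution` are hypothetical, and the sketch assumes that substitutions are written as reference residue, 1-based position, variant residue (as in R322W) and that the text after ">" is used as the identifier.

```python
import re

# Hypothetical helpers for checking input locally; they are not part of PON-All.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_fasta(text):
    """Return {identifier: sequence}; the identifier is the text after '>'."""
    sequences = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            sequences[header] = ""
        elif header is not None:
            sequences[header] += line  # sequences may span several lines
    return sequences


def check_substitution(sequence, substitution):
    """Return None if the substitution fits the sequence, else an error message."""
    match = SUBSTITUTION.match(substitution.strip())
    if match is None:
        return f"{substitution}: not in the form R322W"
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if ref not in AMINO_ACIDS or alt not in AMINO_ACIDS:
        return f"{substitution}: unknown amino acid code"
    if ref == alt:
        return f"{substitution}: reference and variant residues are identical"
    if pos < 1 or pos > len(sequence):
        return f"{substitution}: position {pos} is outside the sequence"
    if sequence[pos - 1] != ref:
        return f"{substitution}: sequence has {sequence[pos - 1]} at position {pos}"
    return None


sequences = parse_fasta(">demo\nMKRTW\n")  # toy sequence, not a real protein
print(check_substitution(sequences["demo"], "R3W"))  # None: R is at position 3
```

Running the same check over the pasted FASTA text and the substitution list would flag numbering mismatches before a job is submitted.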
PON-All also accepts sequence IDs. As with FASTA input, the protein IDs, the amino acid substitutions and the ID type must be provided; e-mail is optional. If an e-mail address is provided, the results will be sent to it when ready.
Each ID must be preceded by a greater-than sign (>). List the amino acid substitutions for that protein on the following lines, one per line, keeping all variants of a protein in a single list. Details for another protein can then follow in the same way. Substitutions are provided in the HGVS format.
>P05062
A338V
C135R
ID type: UniProtKB/Swiss-Prot ID
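The list format described above (an identifier prefixed with ">" followed by one substitution per line) is straightforward to read programmatically, for instance when preparing or checking an upload file. This is a minimal sketch only: the function name `parse_variant_list` is hypothetical, and ignoring blank lines is an assumption rather than documented PON-All behaviour.

```python
def parse_variant_list(text):
    """Return {identifier: [substitutions]} from a '>'-delimited variant list."""
    variants = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        if line.startswith(">"):
            current = line[1:].strip()
            variants.setdefault(current, [])
        elif current is None:
            raise ValueError(f"substitution {line!r} appears before any '>' identifier")
        else:
            variants[current].append(line)
    return variants


example = """>P05062
A338V
C135R
"""
print(parse_variant_list(example))  # {'P05062': ['A338V', 'C135R']}
```

Using `setdefault` means that a protein listed twice has its substitutions merged into one list, in line with the requirement that all variants of a protein are given together.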
Protein prediction predicts all 19 possible single amino acid substitutions. Only one protein can be submitted at a time. Provide the sequence either in FASTA format or as an ID (GI, Ensembl ID or UniProt). This analysis can take a long time, so please be patient; the results will be sent by e-mail.
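To illustrate what "all 19 possible single substitutions" means, the sketch below enumerates them for a sequence. It is an illustration only and says nothing about how PON-All generates variants internally; the function name `all_single_substitutions` is hypothetical, and skipping non-standard residue codes (such as X) is an assumption.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def all_single_substitutions(sequence):
    """Yield every single amino acid substitution (e.g. 'M1A') for a sequence."""
    for position, ref in enumerate(sequence, start=1):
        if ref not in AMINO_ACIDS:
            continue  # assumption: positions with non-standard codes are skipped
        for alt in AMINO_ACIDS:
            if alt != ref:
                yield f"{ref}{position}{alt}"


variants = list(all_single_substitutions("MK"))  # toy two-residue sequence
print(len(variants))  # 38, i.e. 19 substitutions at each of 2 positions
```

A protein of n standard residues yields 19 × n substitutions, so a 500-residue protein already gives 9,500 variants to evaluate.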
PON-All presents results on separate web pages and, if an e-mail address is provided, also mails them to the submitter. There are detailed pages for each submission (called a "Task") and for each amino acid substitution (called a "Record").
Extensive data mining was performed to obtain cases for training and testing.
Variations were collected from several sources. In the table below, each cell gives the number of proteins and the number of variants, separated by a slash.
| Dataset | 10CV: pathogenic | 10CV: neutral | 10CV: total | BlindTest: pathogenic | BlindTest: neutral | BlindTest: total | Total: pathogenic | Total: neutral | Total: total |
|---|---|---|---|---|---|---|---|---|---|
| Human | 2173/17504 | 12141/23600 | 13383/41104 | 170/1980 | 669/1967 | 740/3926 | 2343/19484 | 12810/25567 | 14123/45030 |
| Animal | 117/162 | 116/144 | 232/306 | 109/155 | 125/169 | 233/324 | 226/317 | 241/313 | 465/630 |
| Plant | 913/2601 | 629/1562 | 1150/4163 | 228/736 | 152/374 | 288/1110 | 1141/3337 | 781/1936 | 1438/5273 |
| Total | 3203/20267 | 12886/25306 | 14765/45573 | 507/2871 | 946/2510 | 1261/5360 | 3710/23138 | 13832/27816 | 16026/50933 |
We compared the performance of PON-All to several widely used generic variant tolerance predictors. The compared tools included CADD (Kircher et al., 2014), FATHMM (Rogers et al., 2018), MetaLR and MetaSVM (Dong et al., 2015), MutationTaster (Schwarz et al., 2014), PolyPhen2 (Adzhubei et al., 2010), PON-P2 (Niroula et al., 2015), PROVEAN (Choi et al., 2012) and SIFT (Vaser et al., 2016).
| Measure | PON-All wGO | PON-All woGO | PON-P2 | Sift4G | Polyphen2 | MutationTaster | FATHMM | PROVEAN | MetaSVM | MetaLR | CADD_10* | CADD_15* | CADD_20* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TP | 1274 | 789 | 831 | 1391 | 1530 | 1544 | 1113 | 1364 | 1239 | 1234 | 1630 | 1599 | 1545 |
| TN | 1421 | 1052 | 1032 | 1197 | 1003 | 1044 | 1472 | 1320 | 1651 | 1639 | 498 | 710 | 1020 |
| FP | 138 | 148 | 141 | 591 | 790 | 771 | 323 | 490 | 166 | 178 | 1319 | 1107 | 797 |
| FN | 94 | 154 | 132 | 288 | 149 | 135 | 563 | 313 | 440 | 445 | 49 | 80 | 134 |
| PPV | 0.902 | 0.842 | 0.855 | 0.702 | 0.659 | 0.667 | 0.775 | 0.736 | 0.882 | 0.874 | 0.553 | 0.591 | 0.660 |
| NPV | 0.938 | 0.872 | 0.887 | 0.806 | 0.871 | 0.885 | 0.723 | 0.808 | 0.790 | 0.786 | 0.910 | 0.899 | 0.884 |
| TPR | 0.931 | 0.837 | 0.863 | 0.828 | 0.911 | 0.920 | 0.664 | 0.813 | 0.738 | 0.735 | 0.971 | 0.952 | 0.920 |
| TNR | 0.911 | 0.877 | 0.880 | 0.669 | 0.559 | 0.575 | 0.820 | 0.729 | 0.909 | 0.902 | 0.274 | 0.391 | 0.561 |
| ACC | 0.921 | 0.859 | 0.872 | 0.746 | 0.730 | 0.741 | 0.745 | 0.770 | 0.827 | 0.822 | 0.609 | 0.660 | 0.734 |
| MCC | 0.841 | 0.714 | 0.742 | 0.503 | 0.500 | 0.523 | 0.491 | 0.543 | 0.659 | 0.649 | 0.337 | 0.410 | 0.512 |
| OPM | 0.781 | 0.630 | 0.661 | 0.423 | 0.416 | 0.436 | 0.414 | 0.459 | 0.570 | 0.559 | 0.291 | 0.341 | 0.426 |
| Coverage | 0.746 | 0.546 | 0.544 | 0.883 | 0.884 | 0.890 | 0.884 | 0.888 | 0.890 | 0.890 | 0.890 | 0.890 | 0.890 |
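The rate measures in the comparison can be recomputed from the four confusion-matrix counts (TP, TN, FP, FN). The sketch below uses the standard definitions of positive and negative predictive value (PPV, NPV), sensitivity (TPR), specificity (TNR), accuracy (ACC) and the Matthews correlation coefficient (MCC). The formula for the overall performance measure (OPM) is not given on this page; the form used here, with MCC normalised to the range 0 to 1, is an assumption, as is the function name `performance_measures`.

```python
import math


def performance_measures(tp, tn, fp, fn):
    """Compute binary-classification measures from confusion-matrix counts."""
    ppv = tp / (tp + fp)                   # positive predictive value
    npv = tn / (tn + fn)                   # negative predictive value
    tpr = tp / (tp + fn)                   # sensitivity
    tnr = tn / (tn + fp)                   # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Assumed OPM definition (not stated on this page).
    nmcc = (1 + mcc) / 2
    opm = (ppv + npv) * (tpr + tnr) * (acc + nmcc) / 8
    return {"PPV": ppv, "NPV": npv, "TPR": tpr, "TNR": tnr,
            "ACC": acc, "MCC": mcc, "OPM": opm}


# Counts reported for PON-All wGO.
for name, value in performance_measures(1274, 1421, 138, 94).items():
    print(f"{name}: {value:.3f}")
```

With the PON-All wGO counts, these formulas should reproduce the reported values to three decimals, which supports the assumed OPM definition but does not confirm it.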
If you have any problems, please contact Aibin Shao (20194227016@stu.suda.edu.cn).