To start using PON-IDR, provide UniProt IDs and amino acid substitution information. For protein sequences, there are two ways to input: directly by providing sequences in Fasta format or list of IDs (UniProt). These data can be written or pasted to the boxes in the input forms or uploaded as a file. Both types allow the submission of variants in multiple queries simultaneously. To look for an input example, please click the "Example" text on the input page.
FASTA sequences have to contain a header line starting with greater than sign (>) followed by amino acids sequence. Amino acids sequence has to start from a new line.
Information for amino acid substitutions has to contain the same header line as the sequence. An amino acid substitution consists of three parts in HGVS format: original amino acid, position, and new amino acid. For example, "A61T" means 61st amino acid A (alanine) is substituted by T (threonine). Use single letter amino acid codes. Each protein sequence can contain multiple amino acid substitutions, each one indicated in a different line.
>P06400
MPPKTPRKTAATAAAAAAEPPAPPPPPPPEEDPEQDSGPEDLPLVRLEFEETEEPDFTALCQKLKIPDHVRERAWLTWEKVSSVDGVLGGYIQKKKELWGICIFIAAVDLDEMSFTFTELQKNIEISVHKFFNLLKEIDTSTKVDNAMSRLLKKYDVLFALFSKLERTCELIYLTQPSSSISTEINSALVLKVSWITFLLAKGEVLQMEDDLVISFQLMLCVLDYFIKLSPPMLLKEPYKTAVIPINGSPRTPRRGQNRSARIAKQLENDTRIIEVLCKEHECNIDEVKNVYFKNFIPFMNSLGLVTSNGLPEVENLSKRYEEIYLKNKDLDARLFLDHDKTLQTDSIDSFETQRTPRKSNLDEEVNVIPPHTPVRTVMNTIQQLMMILNSASDQPSENLISYFNNCTVNPKESILKRVKDIGYIFKEKFAKAVGQGCVEIGSQRYKLGVRLYYRVMESMLKSEEERLSIQNFSKLLNDNIFHMSLLACALEVVMATYSRSTSQNLDSGTDLSFPWILNVLNLKAFDFYKVIESFIKAEGNLTREMIKHLERCEHRIMESLAWLSDSPLFDLIKQSKDREGPTDHLESACPLNLPLQNNHTAADMYLSPVRSPKKKGSTTRVNSTANAETQATSAFQTQKPLKSTSLSLFYKKVYRLAYLRLNTLCERLLSEHPELEHIIWTLFQHTLQNEYELMRDRHLDQIMMCSMYGICKVKNIDLKFKIIVTAYKDLPHAVQETFKRVLIKEEEYDSIIVFYNSVFMQRLKTNILQYASTRPPTLSPIPHIPRSPYKFPSSPLRIPGGNIYISPLKSPYKISEGLPTPTKMTPRSRILVSIGESFGTSEKFQKINQMVCNSDRVLKRSAEGSNPPKPLKKLRFDIEGSDEADGSKHLPGESKFQQKLAEMTSTRTRMQKQKMNDSMDTSNKEEK
>P06400
R358G
Y93C
PON-IDR accepts also sequence IDs.The IDs should be preceded by greater than sign (>). After that, provide amino acid substitutions starting from the next line. All the variants in a protein in a single list. After that, details for another sequence can be provided. Substitutions are provided in the HGVS format.
>P06400
R358G
Y93C
Dataset | pathogenic | benign | total |
---|---|---|---|
Training dataset | 544 | 464 | 1008 |
Blind Test dataset | 50 | 50 | 100 |
noIDR of PON-ALL test set | 2080 | 2267 | 4347 |
noIDR of ClinVar test set | 7167 | 2089 | 9976 |
The dataset can be download from here.
Training set
Blind test set
If you have any problems, please contact Wei Wang (20215227100@stu.suda.edu.cn).