Exercises for bioinformatics.psc.edu:
Classification Libraries - Results
If your sequence matched any of the Pfam hidden Markov models, your output will contain the results of a global search, similar to the following:
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /home/biomed/db/pfam/current/Pfam_ls
Sequence file: carp_rhich.FASTA
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query sequence: CARP_RHICH
Accession: [none]
Description: STANDARD; PRT; 393 AA., 393 bases, 6385C502 checksum.

Scores for sequence family classification (score includes all domains):
Model        Description                     Score   E-value    N
--------     -----------                     -----   -------  ---
Asp          Eukaryotic aspartyl protease    544.9  6.7e-161    1
ApbE         ApbE family                    -167.8       5.1    1
RNA_pol_Rpb8 RNA polymerase Rpb8             -71.9       5.2    1
Cystatin     Cystatin domain                 -16.1       5.9    1
GTP_EFTU_D2  Elongation factor Tu domain 2    -9.6       6.2    1

Parsed for domains:
Model        Domain  seq-f seq-t    hmm-f hmm-t      score   E-value
--------     ------- ----- -----    ----- -----      -----   -------
ApbE         1/1         5   312 ..     1   345 []  -167.8       5.1
Cystatin     1/1        29   141 ..     1   100 []   -16.1       5.9
Asp          1/1        79   391 ..     1   359 []   544.9  6.7e-161
RNA_pol_Rpb8 1/1        88   177 ..     1   152 []   -71.9       5.2
GTP_EFTU_D2  1/1       273   372 ..     1    75 []    -9.6       6.2

Alignments of top-scoring domains:
ApbE: domain 1 of 1, from 5 to 312: score -167.8, E = 5.1
            *->lalaflallalglrAcraatkavsleGkaMGtlyrvrilgassaaea
               l++ + a++al +++ + a+ +k + +++ ++++sa
CARP_RHICH   5 LISSCIAIAALAVAVDA-APG-----EKKISIPLAKNPNYKPSA--- 42

               aeileevitreldrlErllSlyrkDSeLsrlNrna.gqpvavspelaelL
               + +++++i+ +++ ++ +N +++g+ + ++++
CARP_RHICH  43 KNAIQKAIA--------KYNKHK-------INTSTgGIVPDAGVGTVPMT 77

               kesldfaekTgGaFDPTVGPlvnLWgfgfdsedkPptiPspdalke..al
               + + d+ +G +T+G + +++ fd+ s+d + +++
CARP_RHICH  78 DYGNDVEY--YGQ--VTIGTPGKKFNLDFDT-------GSSDLWIAstLC 116

               alvGwkkleLsa....................neekvtlqkanPgmaLDL
               +++G ++++ +++++++ + ++++ + + +++ ++ l+k+n ++L
CARP_RHICH 117 TNCGSRQTKYDPkqsstyqadgrtwsisygdgSSASGILAKDN----VNL 162

               nG.iakGfAaDrlaelLe..aegienylVdlGGeiralGkrpeGrpWrVa
               +G + kG ++ +++++ a g ++l+ lG + ++ ++
CARP_RHICH 163 GGlLIKGQTIELAKREAAsfANGPNDGLLGLG--FDTITTVRG------- 203

               irdPtdagegavgavidlrdrAv....aTSGpygryf...drdGkrfsHI
               +P d + +i++++ +v ++a++G g y+ +++++ k f+
CARP_RHICH 204 VKTPMDN--LISQGLISRPIFGVylgkASNGGGGEYIfggYDSTK-FK-- 248

               lDPrTGRrPlehnSyPvrsVSViAptaaeADAlaTALf....vlgekksa
               + +T +P+ S + ++V ++a++ + +T++++ +++l+
CARP_RHICH 249 -GSLTT-VPIDN-SRGWWGITV--DRATV--GTSTVASsfdgILDTGT-- 289

               riaalrevwAvlrliddgsvfaenlavlriekass<-*
               + +l + +A + a+++ +++
CARP_RHICH 290 TLLILPNNVA------------ASVARAYGASDNG 312

Cystatin: domain 1 of 1, from 29 to 141: score -16.1, E = 5.9
            *->GglspaddNendpevqeaadfAvakyNeks.dgykfelv........
               + + ++ + p ++ a +A+akyN++ ++++ +v++ + ++
CARP_RHICH  29 SIPLAKNP-NYKPSAKNAIQKAIAKYNKHKiNTSTGGIVpdagvgtv 74

               ......evveaksQvVaGt.ltnYyievevgettCskeskkdledCplld
               + ++ ++ ve+ Qv Gt++ ++ +++++g ++ ++ ++C +
CARP_RHICH  75 pmtdygNDVEYYGQVTIGTpGKKFNLDFDTGSSDLWI-ASTLCTNCGSR- 122

               qpeeawegfCkfqvfkkpw<-*
               q++ +++ ++q++ + w
CARP_RHICH 123 QTKYDPKQSSTYQADGRTW 141

Asp: domain 1 of 1, from 79 to 391: score 544.9, E = 6.7e-161
            RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
            *->yldDaeYygtIsIGTPpQkFtVvFDTGSSDLWVPDsSvyCtssySaq
               y +D+eYyg+++IGTP++kF+++FDTGSSDLW+ S+ Ct+++
CARP_RHICH  79 YGNDVEYYGQVTIGTPGKKFNLDFDTGSSDLWIA--STLCTNCG--- 120

            RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
               TniACkshgtFdPskSSTYkslGttIffsIsYGdGSSasGflgqDTVtvG
               ++++++dP++SSTY+++G+t + sIsYGdGSSasG+l++D+V++G
CARP_RHICH 121 -----SRQTKYDPKQSSTYQADGRT-W-SISYGDGSSASGILAKDNVNLG 163

            RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
               GisvtnQqFGlatkePGsfFvtavfDGILGlGfpsieavggssaftytpV
               G+ +++Q+++la++e++s F++ ++DG+LGlGf++i++v+g +++
CARP_RHICH 164 GLLIKGQTIELAKREAAS-FANGPNDGLLGLGFDTITTVRG-----VKTP 207

            RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
               fdnlksQGlIdspaFSvYLNsddgsaqasgGeiiFGGvDpskYtGsltwv
               +dnl+sQGlI++p+F+vYL++ ++ ++gGe+iFGG+D++k++Gslt+v
CARP_RHICH 208 MDNLISQGLISRPIFGVYLGKASN---GGGGEYIFGGYDSTKFKGSLTTV 254

            RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
               pVtssddegdivsqgyWqitldsitvggsaCHttfcssGcqAIlDTGTsL
               p+++ s+g+W+it+d+ tvg s t+ ss +++IlDTGT+L
CARP_RHICH 255 PIDN--------SRGWWGITVDRATVGTS----TVASS-FDGILDTGTTL 291

            RF xxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx
               lygPssivskiakavGAsese.GeYvvdCdsisslpdvtFfigGkkitVP
               l++P++++++ a+a+GAs++ +G+Y+++Cd++ +++ ++F+i+G++++V+
CARP_RHICH 292 LILPNNVAASVARAYGASDNGdGTYTISCDTS-RFKPLVFSINGASFQVS 340

            RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
               psayvlqnseggssPndiClsGfqssddipppggplwILGDvFLRsyYvV
               p+++v+++++g +C+ Gf + +++++ I+GD+FL+++YvV
CARP_RHICH 341 PDSLVFEEYQG------QCIAGFGY------GNFDFAIIGDTFLKNNYVV 378

            RF xxxxxxxxxxxxx
               FDrdNnrvGlApa<-*
               F+++ ++v++Ap+
CARP_RHICH 379 FNQGVPEVQIAPV 391

RNA_pol_Rpb8: domain 1 of 1, from 88 to 177: score -71.9, E = 5.2
            *->dDIFkVksvDPDGkKydkVSRieAeSesldqMeLiLDINsqlYPlav
               +V+ P GkK++ + + +s d
CARP_RHICH  88 ----QVTIGTP-GKKFN----LDFDTGSSD----------------- 108

               gDkfrLviAstLNlEDgtaddgsatreynPtkaddrpsYLaDkYEYvMYG
               L iAstL + + gs y+P + +Y aD G
CARP_RHICH 109 -----LWIASTL-----CTNCGSRQTKYDP--KQSS-TYQAD-------G 138

               KvYriegDEtsiegktPklsLvYvSFGGLLMrLqGdarnLhgFelDsrlY
               i+ + +g + ++s G L + d nL g++
CARP_RHICH 139 RTWSIS---YG-DG-S-SAS-------GILAK---DNVNLGGLLIKGQTI 172

               LLlrR<-*
               L +R
CARP_RHICH 173 ELAKR 177

GTP_EFTU_D2: domain 1 of 1, from 273 to 372: score -9.6, E = 6.2
            *->GtVatgRvesGtlkkGdeveiggngtkkyvpevtrVtslemfhg.ld
               Gt ++ ++G+l G ++ i++n + s+ + g d
CARP_RHICH 273 GTSTVASSFDGILDTGTTLLILPN---------NVAASVARAYGaSD 310

               ea................................vaGdnaGlivagiglk
               ++++ + + ++++ ++ + ++ + + ++++ v +++ G +ag g
CARP_RHICH 311 NGdgtytiscdtsrfkplvfsingasfqvspdslVFEEYQGQCIAGFGYG 360

               daa.ikrGdvla<-*
               + +Gd++
CARP_RHICH 361 NFDfAIIGDTFL 372
//
To find out more about the matching families, note the name of the matching hidden Markov model. In the example above, the best-matching model (i.e., the one with the smallest E-value) was Asp. To look at the alignment of the matching family, go to http://pfam.wustl.edu/browse.shtml and click on Asp under section A.
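Picking the best family by E-value can also be done in a script. The following is a minimal Python sketch, assuming the plain-text hmmpfam (HMMER 2.3.2) report layout shown above; `FamilyHit`, `parse_family_scores` and `best_family` are hypothetical helpers, not part of HMMER or Pfam, and the embedded `REPORT` string is the score table copied from the example output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FamilyHit:
    model: str
    description: str
    score: float
    evalue: float
    n_domains: int


def parse_family_scores(report: str) -> List[FamilyHit]:
    """Parse the per-family score table of an hmmpfam (HMMER 2) text report."""
    lines = report.splitlines()
    start = next((i for i, line in enumerate(lines)
                  if line.startswith("Scores for sequence family classification")), None)
    if start is None:
        return []
    hits = []
    for line in lines[start + 3:]:  # skip the title, column-name and dashes lines
        # The table ends at a blank line. When nothing matches, HMMER 2 prints a
        # bracketed note instead of rows (assumed here to start with "[").
        if not line.strip() or line.lstrip().startswith("["):
            break
        fields = line.split()
        hits.append(FamilyHit(
            model=fields[0],
            description=" ".join(fields[1:-3]),  # descriptions may contain spaces
            score=float(fields[-3]),
            evalue=float(fields[-2]),
            n_domains=int(fields[-1]),
        ))
    return hits


def best_family(report: str) -> Optional[FamilyHit]:
    """Return the family with the smallest E-value, or None if there are no hits."""
    hits = parse_family_scores(report)
    return min(hits, key=lambda hit: hit.evalue) if hits else None


REPORT = """\
Scores for sequence family classification (score includes all domains):
Model        Description                     Score   E-value    N
--------     -----------                     -----   -------  ---
Asp          Eukaryotic aspartyl protease    544.9  6.7e-161    1
ApbE         ApbE family                    -167.8       5.1    1
RNA_pol_Rpb8 RNA polymerase Rpb8             -71.9       5.2    1
Cystatin     Cystatin domain                 -16.1       5.9    1
GTP_EFTU_D2  Elongation factor Tu domain 2    -9.6       6.2    1
"""

best = best_family(REPORT)
print(best.model, best.evalue)  # Asp 6.7e-161
```

The E-value estimates how many hits this good would be expected by chance, so smaller is better: Asp is far below 1, while the other four families (E-values between 5 and 7) are most likely chance matches.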
The alignments can be read as follows. In each block, the line above the middle line is the consensus of the hidden Markov model: a capital letter marks a highly conserved residue, while a lowercase letter marks the most common residue at a position where conservation is weaker. The line below the middle line (labeled CARP_RHICH) is your sequence. The middle line shows the residue itself where your sequence matches the consensus exactly, a + where the match is relatively good (a positive score), and a space where the match is poor.
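To make those reading rules concrete, here is a small Python sketch that tallies the columns of one alignment block. It assumes the *-> and <-* markers, the sequence name and the coordinates have already been stripped, so the three strings line up column for column. It also assumes the HMMER 2 conventions that a . in the consensus marks an insertion and a - in your sequence marks a deletion, which this page does not describe. `summarize_block` is a hypothetical helper. The example is the last block of the RNA_pol_Rpb8 alignment (consensus LLlrR, sequence ELAKR), with the middle line's leading blank restored.

```python
from typing import Dict


def summarize_block(consensus: str, match: str, query: str) -> Dict[str, int]:
    """Tally the columns of one hmmpfam (HMMER 2) alignment block.

    consensus: model line (capitals = highly conserved, lowercase = weaker)
    match:     middle line (letter = identical, '+' = positive score, ' ' = poor)
    query:     your sequence, aligned column for column with the consensus
    """
    if len(consensus) != len(query):
        raise ValueError("consensus and query must be the same length")
    match = match.ljust(len(consensus))  # trailing blanks are often trimmed
    tally = {"identical": 0, "similar": 0, "poor": 0, "gap": 0,
             "conserved": 0, "conserved_identical": 0}
    for model_res, mark, query_res in zip(consensus, match, query):
        if model_res == "." or query_res == "-":
            tally["gap"] += 1  # insertion in the query, or deletion against the model
            continue
        if model_res.isupper():
            tally["conserved"] += 1
        if mark == "+":
            tally["similar"] += 1
        elif mark.isalpha():
            tally["identical"] += 1
            if model_res.isupper():
                tally["conserved_identical"] += 1
        else:
            tally["poor"] += 1
    return tally


print(summarize_block("LLlrR", " L +R", "ELAKR"))
# {'identical': 2, 'similar': 1, 'poor': 2, 'gap': 0, 'conserved': 3, 'conserved_identical': 2}
```

In this block two of the three highly conserved (capital) consensus positions are identical in CARP_RHICH, and the K opposite the lowercase r scores as a good match.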
If there are no matches in Pfam to your sequence, you will get no results.
PROSITE
The EMBOSS program patmatmotifs searches your protein against the ambiguous patterns in the PROSITE database. A PROSITE pattern either matches your protein or it does not; no score or E-value is reported. All matching sites should therefore be investigated, especially the longer ones, because short patterns are more likely to match by chance. Below is an example of output from searching PROSITE:
########################################
# Program: patmatmotifs
# Rundate: Fri Jul 09 15:55:24 2004
# Report_format: dbmotif
# Report_file: carp_rhich.ProSite-Out
########################################

#=======================================
#
# Sequence: CARP_RHICH from: 1 to: 393
# HitCount: 3
#
# Full: Yes
# Prune: Yes
# Data_file: /home/biomed/emboss/share/EMBOSS/data/PROSITE/prosite.lines
#
#=======================================

Length = 4
Start = position 94 of sequence
End = position 97 of sequence

Motif = AMIDATION

VTIGTPGKKFNLDF
     |  |
     94 97

Length = 12
Start = position 100 of sequence
End = position 111 of sequence

Motif = ASP_PROTEASE

GKKFNLDFDTGSSDLWIASTLC
     |          |
     100        111

Length = 12
Start = position 283 of sequence
End = position 294 of sequence

Motif = ASP_PROTEASE

ASSFDGILDTGTTLLILPNNVA
     |          |
     283        294

#---------------------------------------
#
# Motif: AMIDATION
# Count: 1
#
# ******************
# * Amidation site *
# ******************
#
# The precursor of hormones and other active peptides which are C-terminally
# amidated is always directly followed [1,2] by a glycine residue which provides
# the amide group, and most often by at least two consecutive basic residues
# (Arg or Lys) which generally function as an active peptide precursor cleavage
# site. Although all amino acids can be amidated, neutral hydrophobic residues
# such as Val or Phe are good substrates, while charged residues such as Asp or
# Arg are much less reactive. C-terminal amidation has not yet been shown to
# occur in unicellular organisms or in plants.
#
# -Consensus pattern: x-G-[RK]-[RK]
#                     [x is the amidation site]
# -Last update: June 1988 / First entry.
#
# [ 1] Kreil G.
#      Meth. Enzymol. 106:218-223(1984).
# [ 2] Bradbury A.F., Smyth D.G.
#      Biosci. Rep. 7:907-916(1987).
#
# +----------------------------------------------------------------------------+
# | This PROSITE entry is copyright by the Swiss Institute of Bioinformatics   |
# | (SIB). There are no restrictions on its use by non-profit institutions as  |
# | long as its content is in no way modified and this statement is not        |
# | removed. Usage by and for commercial entities requires a license agreement |
# | (See http://www.isb-sib.ch/announce/ or email to license@isb-sib.ch).      |
# +----------------------------------------------------------------------------+
#
# ***************
#
# Motif: ASP_PROTEASE
# Count: 2
#
# *****************************************************************
# * Eukaryotic and viral aspartyl proteases signature and profile *
# *****************************************************************
#
# Aspartyl proteases, also known as acid proteases, (EC 3.4.23.-) are a widely
# distributed family of proteolytic enzymes [1,2,3] known to exist in
# vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate
# proteases of eukaryotes are monomeric enzymes which consist of two domains.
# Each domain contains an active site centered on a catalytic aspartyl residue.
# The two domains most probably evolved from the duplication of an ancestral
# gene encoding a primordial domain. Currently known eukaryotic aspartyl
# proteases are:
#
# - Vertebrate gastric pepsins A and C (also known as gastricsin).
# - Vertebrate chymosin (rennin), involved in digestion and used for making
#   cheese.
# - Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34).
# - Mammalian renin (EC 3.4.23.15) whose function is to generate angiotensin I
#   from angiotensinogen in the plasma.
# - Fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin
#   (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin
#   (EC 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizopuspepsin
#   (EC 3.4.23.21).
# - Yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4). PEP4 is
#   implicated in posttranslational regulation of vacuolar hydrolases.
# - Yeast barrierpepsin (EC 3.4.23.35) (gene BAR1); a protease that cleaves
#   alpha-factor and thus acts as an antagonist of the mating pheromone.
# - Fission yeast sxa1 which is involved in degrading or processing the mating
#   pheromones.
#
# Most retroviruses and some plant viruses, such as badnaviruses, encode for an
# aspartyl protease which is an homodimer of a chain of about 95 to 125 amino
# acids. In most retroviruses, the protease is encoded as a segment of a
# polyprotein which is cleaved during the maturation process of the virus. It
# is generally part of the pol polyprotein and, more rarely, of the gag
# polyprotein.
#
# Conservation of the sequence around the two aspartates of eukaryotic aspartyl
# proteases and around the single active site of the viral proteases allows us
# to develop a single signature pattern for both groups of protease. A profile
# was developed to specifically detect viral aspartyl proteases, which are
# missed by the pattern.
#
# -Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-
#                     x-[LIVMFSTNC]-x-[LIVMFGTA]
#                     [D is the active site residue]
# -Sequences known to belong to this class detected by the pattern: ALL.
# -Other sequence(s) detected in Swiss-Prot: 37.
#
# -Sequences known to belong to this class detected by the profile: ALL viral-
#  type proteases.
# -Other sequence(s) detected in Swiss-Prot: 3.
#
# -Note: these proteins belong to families A1 and A2 in the classification of
#  peptidases [4,E1].
#
# -Last update: December 2001 / Text revised; profile added.
#
# [ 1] Foltmann B.
#      Essays Biochem. 17:52-84(1981).
# [ 2] Davies D.R.
#      Annu. Rev. Biophys. Biophys. Chem. 19:189-215(1990).
# [ 3] Rao J.K.M., Erickson J.W., Wlodawer A.
#      Biochemistry 30:4663-4671(1991).
# [ 4] Rawlings N.D., Barrett A.J.
#      Meth. Enzymol. 248:105-120(1995).
# [E1] http://www.expasy.org/cgi-bin/lists?peptidas.txt
#
# +----------------------------------------------------------------------------+
# | This PROSITE entry is copyright by the Swiss Institute of Bioinformatics   |
# | (SIB). There are no restrictions on its use by non-profit institutions as  |
# | long as its content is in no way modified and this statement is not        |
# | removed. Usage by and for commercial entities requires a license agreement |
# | (See http://www.isb-sib.ch/announce/ or email to license@isb-sib.ch).      |
# +----------------------------------------------------------------------------+
#
# ***************
#
#
#--------------------------------------
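A PROSITE consensus pattern behaves like a regular expression, which is why it simply matches or does not. The Python sketch below translates the basic pattern syntax (- between positions, x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, < and > for the sequence ends) into Python `re` syntax and reports every match, including overlapping ones. It is an illustration, not how patmatmotifs is implemented; `prosite_to_regex` and `scan` are hypothetical names, and forms such as [G>] are not handled. The test sequence is residues 79 to 391 of CARP_RHICH, read off the query lines of the Asp alignment with the gap characters removed.

```python
import re
from typing import Dict, List, Tuple

# One pattern position: x, a residue, [allowed] or {excluded}, plus an optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE consensus pattern into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):    # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def scan(sequence: str, patterns: Dict[str, str],
         first_position: int = 1) -> List[Tuple[str, int, int]]:
    """Return (name, start, end) for every match, using 1-based positions."""
    hits = []
    for name, pattern in patterns.items():
        # A zero-width lookahead lets overlapping matches be reported too.
        finder = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
        for m in finder.finditer(sequence):
            start = m.start() + first_position
            hits.append((name, start, start + len(m.group(1)) - 1))
    return sorted(hits, key=lambda hit: hit[1])


PATTERNS = {
    "AMIDATION": "x-G-[RK]-[RK]",
    "ASP_PROTEASE": "[LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-"
                    "x-[LIVMFSTNC]-x-[LIVMFGTA]",
}

# Residues 79-391 of CARP_RHICH, from the Asp alignment with gaps removed.
CARP_79_391 = (
    "YGNDVEYYGQVTIGTPGKKFNLDFDTGSSDLWIASTLCTNCG"         # 79-120
    "SRQTKYDPKQSSTYQADGRTWSISYGDGSSASGILAKDNVNLG"        # 121-163
    "GLLIKGQTIELAKREAASFANGPNDGLLGLGFDTITTVRGVKTP"       # 164-207
    "MDNLISQGLISRPIFGVYLGKASNGGGGEYIFGGYDSTKFKGSLTTV"    # 208-254
    "PIDNSRGWWGITVDRATVGTSTVASSFDGILDTGTTL"              # 255-291
    "LILPNNVAASVARAYGASDNGDGTYTISCDTSRFKPLVFSINGASFQVS"  # 292-340
    "PDSLVFEEYQGQCIAGFGYGNFDFAIIGDTFLKNNYVV"             # 341-378
    "FNQGVPEVQIAPV"                                      # 379-391
)

for hit in scan(CARP_79_391, PATTERNS, first_position=79):
    print(hit)
```

Run as written, it prints the same three hits as the patmatmotifs report: ('AMIDATION', 94, 97), ('ASP_PROTEASE', 100, 111) and ('ASP_PROTEASE', 283, 294). The four-position AMIDATION pattern fixes only three residues, which is why such short patterns deserve extra scepticism.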
For more information about the matching PROSITE patterns, see the files /biomed/db/prosite/prosite.dat and /biomed/db/prosite/prosite.doc.
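The prosite.dat file stores each entry as lines tagged with two-letter codes: the entry name on an ID line, the pattern on one or more PA lines, and // closing the entry. Below is a minimal Python sketch for pulling the patterns out, assuming that standard layout. `read_patterns` is a hypothetical helper, and the two-entry `EXCERPT` is a hand-abbreviated stand-in, not a verbatim copy of the file. Real entries carry more line types (AC, DE and others), which the function simply skips.

```python
from typing import Dict, Iterable


def read_patterns(lines: Iterable[str]) -> Dict[str, str]:
    """Map each PROSITE entry ID to its consensus pattern (PA lines joined)."""
    patterns = {}
    entry_id, pa_parts = None, []
    for line in lines:
        code, value = line[:2], line[5:].rstrip()  # "XX   value" layout
        if code == "ID":
            entry_id, pa_parts = value.split(";")[0], []
        elif code == "PA":
            pa_parts.append(value)  # long patterns continue on further PA lines
        elif code == "//":
            if entry_id and pa_parts:  # profile-only entries have no PA lines
                patterns[entry_id] = "".join(pa_parts).rstrip(".")
            entry_id, pa_parts = None, []
    return patterns


# Hand-abbreviated stand-in for two prosite.dat entries (not a verbatim copy).
EXCERPT = """\
ID   AMIDATION; PATTERN.
PA   x-G-[RK]-[RK].
//
ID   ASP_PROTEASE; PATTERN.
PA   [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-
PA   [LIVMFSTNC]-x-[LIVMFGTA].
//
"""

print(read_patterns(EXCERPT.splitlines()))
```

For the real file, pass an open file handle instead, for example `with open("/biomed/db/prosite/prosite.dat") as fh: patterns = read_patterns(fh)`; the trailing newline on each line is removed by `rstrip()`. The returned patterns use the same syntax as the consensus patterns printed in the patmatmotifs report.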





