Exercises for bioinformatics.psc.edu:
Classification Libraries - Results

If your sequence matches any of the Pfam hidden Markov models, the output will contain the results of a global search, similar to the following:


HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:                 /home/biomed/db/pfam/current/Pfam_ls
Sequence file:            carp_rhich.FASTA
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Query sequence: CARP_RHICH
Accession:      [none]
Description:    STANDARD;      PRT;   393 AA., 393 bases, 6385C502 checksum.

Scores for sequence family classification (score includes all domains):
Model           Description                             Score    E-value  N
--------        -----------                             -----    ------- ---
Asp             Eukaryotic aspartyl protease            544.9   6.7e-161   1
ApbE            ApbE family                            -167.8        5.1   1
RNA_pol_Rpb8    RNA polymerase Rpb8                     -71.9        5.2   1
Cystatin        Cystatin domain                         -16.1        5.9   1
GTP_EFTU_D2     Elongation factor Tu domain 2            -9.6        6.2   1

Parsed for domains:
Model           Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------        ------- ----- -----    ----- -----      -----  -------
ApbE              1/1       5   312 ..     1   345 []  -167.8      5.1
Cystatin          1/1      29   141 ..     1   100 []   -16.1      5.9
Asp               1/1      79   391 ..     1   359 []   544.9 6.7e-161
RNA_pol_Rpb8      1/1      88   177 ..     1   152 []   -71.9      5.2
GTP_EFTU_D2       1/1     273   372 ..     1    75 []    -9.6      6.2

Alignments of top-scoring domains:
ApbE: domain 1 of 1, from 5 to 312: score -167.8, E = 5.1
                   *->lalaflallalglrAcraatkavsleGkaMGtlyrvrilgassaaea
                      l++ + a++al +++ + a+      +k + +++    ++++sa
  CARP_RHICH     5    LISSCIAIAALAVAVDA-APG-----EKKISIPLAKNPNYKPSA--- 42

                   aeileevitreldrlErllSlyrkDSeLsrlNrna.gqpvavspelaelL
                   + +++++i+        +++ ++       +N +++g+  +  ++++
  CARP_RHICH    43 KNAIQKAIA--------KYNKHK-------INTSTgGIVPDAGVGTVPMT 77

                   kesldfaekTgGaFDPTVGPlvnLWgfgfdsedkPptiPspdalke..al
                   + + d+    +G   +T+G   + +++ fd+        s+d + +++
  CARP_RHICH    78 DYGNDVEY--YGQ--VTIGTPGKKFNLDFDT-------GSSDLWIAstLC 116

                   alvGwkkleLsa....................neekvtlqkanPgmaLDL
                   +++G ++++ +++++++ + ++++ + + +++ ++   l+k+n    ++L
  CARP_RHICH   117 TNCGSRQTKYDPkqsstyqadgrtwsisygdgSSASGILAKDN----VNL 162

                   nG.iakGfAaDrlaelLe..aegienylVdlGGeiralGkrpeGrpWrVa
                   +G + kG    ++ +++++ a g  ++l+ lG  + ++   ++
  CARP_RHICH   163 GGlLIKGQTIELAKREAAsfANGPNDGLLGLG--FDTITTVRG------- 203

                   irdPtdagegavgavidlrdrAv....aTSGpygryf...drdGkrfsHI
                     +P d      + +i++++ +v  ++a++G  g y+ +++++ k f+
  CARP_RHICH   204 VKTPMDN--LISQGLISRPIFGVylgkASNGGGGEYIfggYDSTK-FK-- 248

                   lDPrTGRrPlehnSyPvrsVSViAptaaeADAlaTALf....vlgekksa
                    + +T  +P+   S  +  ++V  ++a++  + +T++++ +++l+
  CARP_RHICH   249 -GSLTT-VPIDN-SRGWWGITV--DRATV--GTSTVASsfdgILDTGT-- 289

                   riaalrevwAvlrliddgsvfaenlavlriekass<-*
                    + +l + +A            +  a+++   +++
  CARP_RHICH   290 TLLILPNNVA------------ASVARAYGASDNG    312

Cystatin: domain 1 of 1, from 29 to 141: score -16.1, E = 5.9
                   *->GglspaddNendpevqeaadfAvakyNeks.dgykfelv........
                        + + ++ +  p ++ a  +A+akyN++  ++++  +v++ + ++
  CARP_RHICH    29    SIPLAKNP-NYKPSAKNAIQKAIAKYNKHKiNTSTGGIVpdagvgtv 74

                   ......evveaksQvVaGt.ltnYyievevgettCskeskkdledCplld
                   + ++ ++ ve+  Qv  Gt++ ++ +++++g ++      ++ ++C  +
  CARP_RHICH    75 pmtdygNDVEYYGQVTIGTpGKKFNLDFDTGSSDLWI-ASTLCTNCGSR- 122

                   qpeeawegfCkfqvfkkpw<-*
                   q++ +++   ++q++ + w
  CARP_RHICH   123 QTKYDPKQSSTYQADGRTW    141

Asp: domain 1 of 1, from 79 to 391: score 544.9, E = 6.7e-161
                RF    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                   *->yldDaeYygtIsIGTPpQkFtVvFDTGSSDLWVPDsSvyCtssySaq
                      y +D+eYyg+++IGTP++kF+++FDTGSSDLW+   S+ Ct+++
  CARP_RHICH    79    YGNDVEYYGQVTIGTPGKKFNLDFDTGSSDLWIA--STLCTNCG--- 120

                RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                   TniACkshgtFdPskSSTYkslGttIffsIsYGdGSSasGflgqDTVtvG
                        ++++++dP++SSTY+++G+t + sIsYGdGSSasG+l++D+V++G
  CARP_RHICH   121 -----SRQTKYDPKQSSTYQADGRT-W-SISYGDGSSASGILAKDNVNLG 163

                RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                   GisvtnQqFGlatkePGsfFvtavfDGILGlGfpsieavggssaftytpV
                   G+ +++Q+++la++e++s F++ ++DG+LGlGf++i++v+g     +++
  CARP_RHICH   164 GLLIKGQTIELAKREAAS-FANGPNDGLLGLGFDTITTVRG-----VKTP 207

                RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                   fdnlksQGlIdspaFSvYLNsddgsaqasgGeiiFGGvDpskYtGsltwv
                   +dnl+sQGlI++p+F+vYL++ ++   ++gGe+iFGG+D++k++Gslt+v
  CARP_RHICH   208 MDNLISQGLISRPIFGVYLGKASN---GGGGEYIFGGYDSTKFKGSLTTV 254

                RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                   pVtssddegdivsqgyWqitldsitvggsaCHttfcssGcqAIlDTGTsL
                   p+++        s+g+W+it+d+ tvg s    t+ ss +++IlDTGT+L
  CARP_RHICH   255 PIDN--------SRGWWGITVDRATVGTS----TVASS-FDGILDTGTTL 291

                RF xxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx
                   lygPssivskiakavGAsese.GeYvvdCdsisslpdvtFfigGkkitVP
                   l++P++++++ a+a+GAs++ +G+Y+++Cd++ +++ ++F+i+G++++V+
  CARP_RHICH   292 LILPNNVAASVARAYGASDNGdGTYTISCDTS-RFKPLVFSINGASFQVS 340

                RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
                   psayvlqnseggssPndiClsGfqssddipppggplwILGDvFLRsyYvV
                   p+++v+++++g      +C+ Gf +      +++++ I+GD+FL+++YvV
  CARP_RHICH   341 PDSLVFEEYQG------QCIAGFGY------GNFDFAIIGDTFLKNNYVV 378

                RF xxxxxxxxxxxxx
                   FDrdNnrvGlApa<-*
                   F+++ ++v++Ap+
  CARP_RHICH   379 FNQGVPEVQIAPV    391

RNA_pol_Rpb8: domain 1 of 1, from 88 to 177: score -71.9, E = 5.2
                   *->dDIFkVksvDPDGkKydkVSRieAeSesldqMeLiLDINsqlYPlav
                          +V+   P GkK++    +  + +s d
  CARP_RHICH    88    ----QVTIGTP-GKKFN----LDFDTGSSD----------------- 108

                   gDkfrLviAstLNlEDgtaddgsatreynPtkaddrpsYLaDkYEYvMYG
                        L iAstL     + + gs    y+P  +    +Y aD       G
  CARP_RHICH   109 -----LWIASTL-----CTNCGSRQTKYDP--KQSS-TYQAD-------G 138

                   KvYriegDEtsiegktPklsLvYvSFGGLLMrLqGdarnLhgFelDsrlY
                       i+    + +g + ++s       G L +   d  nL g++
  CARP_RHICH   139 RTWSIS---YG-DG-S-SAS-------GILAK---DNVNLGGLLIKGQTI 172

                   LLlrR<-*
                    L +R
  CARP_RHICH   173 ELAKR    177

GTP_EFTU_D2: domain 1 of 1, from 273 to 372: score -9.6, E = 6.2
                   *->GtVatgRvesGtlkkGdeveiggngtkkyvpevtrVtslemfhg.ld
                      Gt ++   ++G+l  G ++ i++n          +  s+ +  g  d
  CARP_RHICH   273    GTSTVASSFDGILDTGTTLLILPN---------NVAASVARAYGaSD 310

                   ea................................vaGdnaGlivagiglk
                    ++++ + + ++++ ++   + ++ + + ++++ v +++ G  +ag g
  CARP_RHICH   311 NGdgtytiscdtsrfkplvfsingasfqvspdslVFEEYQGQCIAGFGYG 360

                   daa.ikrGdvla<-*
                   +     +Gd++
  CARP_RHICH   361 NFDfAIIGDTFL    372

//

To find out more about the matching families, note the name of the matching hidden Markov model. In the example above, the best-matching model (i.e., the one with the smallest E-value) was Asp. To look at the alignment of the matching family, go to http://pfam.wustl.edu/browse.shtml and click on Asp under section A.
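Picking the best match by smallest E-value can also be done programmatically. The following is a minimal sketch, not part of HMMER itself: it assumes the report text has the same layout as the example above, and the function name parse_family_scores is a hypothetical helper introduced only for illustration.

```python
# Sketch: read the "Scores for sequence family classification" table from an
# HMMER 2 report and pick the model with the smallest E-value.
# REPORT is copied from the example output above; parse_family_scores is a
# hypothetical helper, not an HMMER function.

REPORT = """\
Scores for sequence family classification (score includes all domains):
Model           Description                             Score    E-value  N
--------        -----------                             -----    ------- ---
Asp             Eukaryotic aspartyl protease            544.9   6.7e-161   1
ApbE            ApbE family                            -167.8        5.1   1
RNA_pol_Rpb8    RNA polymerase Rpb8                     -71.9        5.2   1
Cystatin        Cystatin domain                         -16.1        5.9   1
GTP_EFTU_D2     Elongation factor Tu domain 2            -9.6        6.2   1

Parsed for domains:
"""


def parse_family_scores(report):
    """Return one dict per row of the per-family score table."""
    hits = []
    in_table = False
    for line in report.splitlines():
        if line.startswith("Scores for sequence family classification"):
            in_table = True
            continue
        if not in_table:
            continue
        if not line.strip():
            break  # a blank line ends the table
        if line.startswith(("Model", "-", "[")):
            continue  # column header, ruler, or a bracketed "no hits" notice
        # The last three fields are always Score, E-value and N; everything
        # before them is the model name followed by a free-text description.
        head, score, evalue, n_domains = line.rsplit(None, 3)
        model, _, description = head.partition(" ")
        hits.append({
            "model": model,
            "description": description.strip(),
            "score": float(score),
            "evalue": float(evalue),
            "n_domains": int(n_domains),
        })
    return hits


hits = parse_family_scores(REPORT)
best = min(hits, key=lambda hit: hit["evalue"])
print(best["model"], best["evalue"])
```

In this example only Asp has an E-value far below 1; the other four rows have E-values above 5 and negative scores, so they are best treated as chance matches.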

The alignments shown can be interpreted as follows. In the top (model) line, a capital letter shows the dominant, strongly conserved residue within the hidden Markov model, while a lowercase letter indicates the most common residue at a position where conservation is not as strong. The middle line shows a letter where the query residue matches the model residue, a + to indicate a relatively good match with the hidden Markov model, and a space to indicate a poor match.
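To make the reading of the middle line concrete, the sketch below tallies the columns of the first block of the Asp alignment shown above. It is an illustration only: the three strings are copied from that block, the category names are labels chosen here, and classify_columns is a hypothetical helper rather than anything HMMER provides.

```python
from collections import Counter

# First block of the Asp alignment above (query residues 79 to 120).
MODEL  = "yldDaeYygtIsIGTPpQkFtVvFDTGSSDLWVPDsSvyCtssySaq"
MIDDLE = "y +D+eYyg+++IGTP++kF+++FDTGSSDLW+   S+ Ct+++"
QUERY  = "YGNDVEYYGQVTIGTPGKKFNLDFDTGSSDLWIA--STLCTNCG---"


def classify_columns(model, middle, query):
    """Count alignment columns by the symbol on the middle line."""
    # Trailing spaces are often stripped from the middle line, so pad it
    # back to the width of the model line before zipping the three rows.
    middle = middle.ljust(len(model))
    counts = Counter()
    for model_char, mid_char, query_char in zip(model, middle, query):
        if mid_char == "+":
            counts["similar"] += 1      # relatively good match
        elif mid_char.isalpha():
            counts["identical"] += 1    # query residue equals model residue
        elif query_char == "-" or model_char == ".":
            counts["gap"] += 1          # deletion or insertion column
        else:
            counts["mismatch"] += 1     # aligned, but a poor match
    return counts


counts = classify_columns(MODEL, MIDDLE, QUERY)
for category in ("identical", "similar", "mismatch", "gap"):
    print(category, counts[category])
```

A high share of identical and + columns, as in this block, is what a genuine family match looks like; the ApbE and RNA_pol_Rpb8 alignments above are dominated by spaces and gaps instead.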

If there are no matches in Pfam to your sequence, you will get no results.

PROSITE

The EMBOSS patmatmotifs program searches a protein sequence against the patterns in the PROSITE database. These patterns can be ambiguous, in that a single position may allow several residues. A PROSITE pattern either matches your protein or it does not, so all matching sites should be investigated, especially the longer ones. Below is an example of output from searching PROSITE:

########################################
# Program: patmatmotifs
# Rundate: Fri Jul 09 15:55:24 2004
# Report_format: dbmotif
# Report_file: carp_rhich.ProSite-Out
########################################

#=======================================
#
# Sequence: CARP_RHICH     from: 1   to: 393
# HitCount: 3
#
# Full: Yes
# Prune: Yes
# Data_file: /home/biomed/emboss/share/EMBOSS/data/PROSITE/prosite.lines
#
#=======================================

Length = 4
Start = position 94 of sequence
End = position 97 of sequence

Motif = AMIDATION

VTIGTPGKKFNLDF
     |  |
    94  97

Length = 12
Start = position 100 of sequence
End = position 111 of sequence

Motif = ASP_PROTEASE

GKKFNLDFDTGSSDLWIASTLC
     |          |
   100          111

Length = 12
Start = position 283 of sequence
End = position 294 of sequence

Motif = ASP_PROTEASE

ASSFDGILDTGTTLLILPNNVA
     |          |
   283          294


#---------------------------------------
#
# Motif: AMIDATION
# Count: 1
#
# ******************
# * Amidation site *
# ******************
#
# The precursor of  hormones  and other active  peptides  which are C-terminally
# amidated is always directly followed [1,2] by a glycine residue which provides
# the amide group, and  most often by at  least two  consecutive  basic residues
# (Arg or Lys) which generally function as an active peptide  precursor cleavage
# site.  Although all amino acids can be amidated,  neutral hydrophobic residues
# such as Val or Phe are good substrates, while  charged residues such as Asp or
# Arg  are much less reactive.  C-terminal  amidation has not  yet been shown to
# occur in unicellular organisms or in plants.
#
# -Consensus pattern: x-G-[RK]-[RK]
#                     [x is the amidation site]
# -Last update: June 1988 / First entry.
#
# [ 1] Kreil G.
#      Meth. Enzymol. 106:218-223(1984).
# [ 2] Bradbury A.F., Smyth D.G.
#      Biosci. Rep. 7:907-916(1987).
#
# +----------------------------------------------------------------------------+
# | This PROSITE entry  is copyright  by the Swiss Institute of Bioinformatics |
# | (SIB).  There are no restrictions on its use by non-profit institutions as |
# | long as its  content  is  in  no  way  modified  and this statement is not |
# | removed. Usage by and for commercial entities requires a license agreement |
# | (See http://www.isb-sib.ch/announce/ or email to license@isb-sib.ch).      |
# +----------------------------------------------------------------------------+
#
# ***************
#
# Motif: ASP_PROTEASE
# Count: 2
#
# *****************************************************************
# * Eukaryotic and viral aspartyl proteases signature and profile *
# *****************************************************************
#
# Aspartyl  proteases, also  known as acid proteases, (EC 3.4.23.-) are a widely
# distributed family   of   proteolytic   enzymes  [1,2,3]  known  to  exist  in
# vertebrates, fungi, plants, retroviruses and some plant viruses.     Aspartate
# proteases of  eukaryotes are  monomeric  enzymes which consist of two domains.
# Each domain contains an  active site centered on a catalytic aspartyl residue.
# The two  domains  most  probably  evolved from the duplication of an ancestral
# gene encoding   a  primordial  domain.  Currently  known  eukaryotic  aspartyl
# proteases are:
#
#  - Vertebrate gastric pepsins A and C (also known as gastricsin).
#  - Vertebrate  chymosin  (rennin),  involved  in digestion and used for making
#    cheese.
#  - Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34).
#  - Mammalian renin (EC 3.4.23.15) whose function  is to generate angiotensin I
#    from angiotensinogen in the plasma.
#  - Fungal proteases such as aspergillopepsin A (EC 3.4.23.18),   candidapepsin
#    (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin),  endothiapepsin
#    (EC 3.4.23.22),   polyporopepsin    (EC 3.4.23.29),    and   rhizopuspepsin
#    (EC 3.4.23.21).
#  - Yeast saccharopepsin (EC 3.4.23.25)  (proteinase A)  (gene PEP4).   PEP4 is
#    implicated in posttranslational regulation of vacuolar hydrolases.
#  - Yeast  barrierpepsin  (EC 3.4.23.35)  (gene BAR1);  a protease that cleaves
#    alpha-factor and thus acts as an antagonist of the mating pheromone.
#  - Fission  yeast sxa1 which is involved in degrading or processing the mating
#    pheromones.
#
# Most retroviruses and some plant  viruses, such as badnaviruses, encode for an
# aspartyl protease  which  is an homodimer of  a chain of about 95 to 125 amino
# acids. In  most  retroviruses, the  protease  is  encoded  as  a  segment of a
# polyprotein which  is  cleaved  during the maturation process of the virus. It
# is generally  part  of  the  pol  polyprotein  and,  more  rarely,  of the gag
# polyprotein.
#
# Conservation of  the sequence around the two aspartates of eukaryotic aspartyl
# proteases and  around  the single active site of the viral proteases allows us
# to  develop a single signature pattern for both groups of protease.  A profile
# was developed  to  specifically  detect  viral  aspartyl  proteases, which are
# missed by the pattern.
#
# -Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-
#                     x-[LIVMFSTNC]-x-[LIVMFGTA]
#                     [D is the active site residue]
# -Sequences known to belong to this class detected by the pattern: ALL.
# -Other sequence(s) detected in Swiss-Prot: 37.
#
# -Sequences known to belong to this class detected by the profile: ALL viral-
#  type proteases.
# -Other sequence(s) detected in Swiss-Prot: 3.
#
# -Note: these  proteins  belong  to families A1 and A2 in the classification of
#  peptidases [4,E1].
#
# -Last update: December 2001 / Text revised; profile added.
#
# [ 1] Foltmann B.
#      Essays Biochem. 17:52-84(1981).
# [ 2] Davies D.R.
#      Annu. Rev. Biophys. Biophys. Chem. 19:189-215(1990).
# [ 3] Rao J.K.M., Erickson J.W., Wlodawer A.
#      Biochemistry 30:4663-4671(1991).
# [ 4] Rawlings N.D., Barrett A.J.
#      Meth. Enzymol. 248:105-120(1995).
# [E1] http://www.expasy.org/cgi-bin/lists?peptidas.txt
#
# +----------------------------------------------------------------------------+
# | This PROSITE entry  is copyright  by the Swiss Institute of Bioinformatics |
# | (SIB).  There are no restrictions on its use by non-profit institutions as |
# | long as its  content  is  in  no  way  modified  and this statement is not |
# | removed. Usage by and for commercial entities requires a license agreement |
# | (See http://www.isb-sib.ch/announce/ or email to license@isb-sib.ch).      |
# +----------------------------------------------------------------------------+
#
# ***************
#
#
#--------------------------------------

For more information about the matching PROSITE patterns, see the files /biomed/db/prosite/prosite.dat and /biomed/db/prosite/prosite.doc.
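The all-or-nothing nature of a PROSITE pattern match can be reproduced with an ordinary regular expression. The sketch below is an assumption-laden illustration, not how patmatmotifs is implemented: prosite_to_regex and find_sites are hypothetical helpers, the conversion handles only the common pattern syntax, and the fragment is residues 79 to 120 of CARP_RHICH taken from the Asp alignment above with the gap characters removed.

```python
import re

# Consensus patterns copied from the PROSITE entries in the report above.
ASP_PROTEASE = ("[LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-"
                "x-[LIVMFSTNC]-x-[LIVMFGTA]")
AMIDATION = "x-G-[RK]-[RK]"

# Residues 79..120 of CARP_RHICH, read off the Asp alignment (gaps removed).
FRAGMENT = "YGNDVEYYGQVTIGTPGKKFNLDFDTGSSDLWIASTLCTNCG"
FRAGMENT_START = 79


def prosite_to_regex(pattern):
    """Translate common PROSITE pattern syntax into a Python regex string.

    Simplified: x -> any residue, {ABC} -> none of ABC, (n) or (n,m) ->
    repeat counts, < and > -> sequence start and end anchors.
    """
    pattern = "".join(pattern.split()).rstrip(".")
    return (pattern.replace("-", "")
                   .replace("x", ".")
                   .replace("{", "[^").replace("}", "]")
                   .replace("(", "{").replace(")", "}")
                   .replace("<", "^").replace(">", "$"))


def find_sites(pattern, sequence, first_position=1):
    """Return (start, end, matched_text) for every site, overlaps included."""
    # The lookahead lets overlapping sites be reported as separate matches.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    sites = []
    for match in regex.finditer(sequence):
        start = match.start(1) + first_position
        end = start + len(match.group(1)) - 1
        sites.append((start, end, match.group(1)))
    return sites


print(find_sites(AMIDATION, FRAGMENT, FRAGMENT_START))
print(find_sites(ASP_PROTEASE, FRAGMENT, FRAGMENT_START))
```

On this fragment the sketch reports the AMIDATION site at positions 94 to 97 and the first ASP_PROTEASE site at positions 100 to 111, the same coordinates as the patmatmotifs report. Short patterns such as the four-residue amidation site match by chance far more often than the twelve-residue protease signature, which is why longer sites deserve more weight.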
