Exercises for bioinformatics.psc.edu:
Classification Libraries - Results
If your sequence matches any of the Pfam hidden Markov models, the output will contain the results of a global search, similar to the following:
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file: /home/biomed/db/pfam/current/Pfam_ls
Sequence file: carp_rhich.FASTA
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query sequence: CARP_RHICH
Accession: [none]
Description: STANDARD; PRT; 393 AA., 393 bases, 6385C502 checksum.
Scores for sequence family classification (score includes all domains):
Model        Description                     Score  E-value   N
--------     -----------                     -----  ------- ---
Asp          Eukaryotic aspartyl protease    544.9 6.7e-161   1
ApbE         ApbE family                    -167.8      5.1   1
RNA_pol_Rpb8 RNA polymerase Rpb8             -71.9      5.2   1
Cystatin     Cystatin domain                 -16.1      5.9   1
GTP_EFTU_D2  Elongation factor Tu domain 2    -9.6      6.2   1
Parsed for domains:
Model        Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
--------     ------- ----- -----    ----- -----      -----  -------
ApbE           1/1       5   312 ..     1   345 []  -167.8      5.1
Cystatin       1/1      29   141 ..     1   100 []   -16.1      5.9
Asp            1/1      79   391 ..     1   359 []   544.9 6.7e-161
RNA_pol_Rpb8   1/1      88   177 ..     1   152 []   -71.9      5.2
GTP_EFTU_D2    1/1     273   372 ..     1    75 []    -9.6      6.2
Alignments of top-scoring domains:
ApbE: domain 1 of 1, from 5 to 312: score -167.8, E = 5.1
*->lalaflallalglrAcraatkavsleGkaMGtlyrvrilgassaaea
l++ + a++al +++ + a+ +k + +++ ++++sa
CARP_RHICH 5 LISSCIAIAALAVAVDA-APG-----EKKISIPLAKNPNYKPSA--- 42
aeileevitreldrlErllSlyrkDSeLsrlNrna.gqpvavspelaelL
+ +++++i+ +++ ++ +N +++g+ + ++++
CARP_RHICH 43 KNAIQKAIA--------KYNKHK-------INTSTgGIVPDAGVGTVPMT 77
kesldfaekTgGaFDPTVGPlvnLWgfgfdsedkPptiPspdalke..al
+ + d+ +G +T+G + +++ fd+ s+d + +++
CARP_RHICH 78 DYGNDVEY--YGQ--VTIGTPGKKFNLDFDT-------GSSDLWIAstLC 116
alvGwkkleLsa....................neekvtlqkanPgmaLDL
+++G ++++ +++++++ + ++++ + + +++ ++ l+k+n ++L
CARP_RHICH 117 TNCGSRQTKYDPkqsstyqadgrtwsisygdgSSASGILAKDN----VNL 162
nG.iakGfAaDrlaelLe..aegienylVdlGGeiralGkrpeGrpWrVa
+G + kG ++ +++++ a g ++l+ lG + ++ ++
CARP_RHICH 163 GGlLIKGQTIELAKREAAsfANGPNDGLLGLG--FDTITTVRG------- 203
irdPtdagegavgavidlrdrAv....aTSGpygryf...drdGkrfsHI
+P d + +i++++ +v ++a++G g y+ +++++ k f+
CARP_RHICH 204 VKTPMDN--LISQGLISRPIFGVylgkASNGGGGEYIfggYDSTK-FK-- 248
lDPrTGRrPlehnSyPvrsVSViAptaaeADAlaTALf....vlgekksa
+ +T +P+ S + ++V ++a++ + +T++++ +++l+
CARP_RHICH 249 -GSLTT-VPIDN-SRGWWGITV--DRATV--GTSTVASsfdgILDTGT-- 289
riaalrevwAvlrliddgsvfaenlavlriekass<-*
+ +l + +A + a+++ +++
CARP_RHICH 290 TLLILPNNVA------------ASVARAYGASDNG 312
Cystatin: domain 1 of 1, from 29 to 141: score -16.1, E = 5.9
*->GglspaddNendpevqeaadfAvakyNeks.dgykfelv........
+ + ++ + p ++ a +A+akyN++ ++++ +v++ + ++
CARP_RHICH 29 SIPLAKNP-NYKPSAKNAIQKAIAKYNKHKiNTSTGGIVpdagvgtv 74
......evveaksQvVaGt.ltnYyievevgettCskeskkdledCplld
+ ++ ++ ve+ Qv Gt++ ++ +++++g ++ ++ ++C +
CARP_RHICH 75 pmtdygNDVEYYGQVTIGTpGKKFNLDFDTGSSDLWI-ASTLCTNCGSR- 122
qpeeawegfCkfqvfkkpw<-*
q++ +++ ++q++ + w
CARP_RHICH 123 QTKYDPKQSSTYQADGRTW 141
Asp: domain 1 of 1, from 79 to 391: score 544.9, E = 6.7e-161
RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
*->yldDaeYygtIsIGTPpQkFtVvFDTGSSDLWVPDsSvyCtssySaq
y +D+eYyg+++IGTP++kF+++FDTGSSDLW+ S+ Ct+++
CARP_RHICH 79 YGNDVEYYGQVTIGTPGKKFNLDFDTGSSDLWIA--STLCTNCG--- 120
RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TniACkshgtFdPskSSTYkslGttIffsIsYGdGSSasGflgqDTVtvG
++++++dP++SSTY+++G+t + sIsYGdGSSasG+l++D+V++G
CARP_RHICH 121 -----SRQTKYDPKQSSTYQADGRT-W-SISYGDGSSASGILAKDNVNLG 163
RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
GisvtnQqFGlatkePGsfFvtavfDGILGlGfpsieavggssaftytpV
G+ +++Q+++la++e++s F++ ++DG+LGlGf++i++v+g +++
CARP_RHICH 164 GLLIKGQTIELAKREAAS-FANGPNDGLLGLGFDTITTVRG-----VKTP 207
RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
fdnlksQGlIdspaFSvYLNsddgsaqasgGeiiFGGvDpskYtGsltwv
+dnl+sQGlI++p+F+vYL++ ++ ++gGe+iFGG+D++k++Gslt+v
CARP_RHICH 208 MDNLISQGLISRPIFGVYLGKASN---GGGGEYIFGGYDSTKFKGSLTTV 254
RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
pVtssddegdivsqgyWqitldsitvggsaCHttfcssGcqAIlDTGTsL
p+++ s+g+W+it+d+ tvg s t+ ss +++IlDTGT+L
CARP_RHICH 255 PIDN--------SRGWWGITVDRATVGTS----TVASS-FDGILDTGTTL 291
RF xxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx
lygPssivskiakavGAsese.GeYvvdCdsisslpdvtFfigGkkitVP
l++P++++++ a+a+GAs++ +G+Y+++Cd++ +++ ++F+i+G++++V+
CARP_RHICH 292 LILPNNVAASVARAYGASDNGdGTYTISCDTS-RFKPLVFSINGASFQVS 340
RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
psayvlqnseggssPndiClsGfqssddipppggplwILGDvFLRsyYvV
p+++v+++++g +C+ Gf + +++++ I+GD+FL+++YvV
CARP_RHICH 341 PDSLVFEEYQG------QCIAGFGY------GNFDFAIIGDTFLKNNYVV 378
RF xxxxxxxxxxxxx
FDrdNnrvGlApa<-*
F+++ ++v++Ap+
CARP_RHICH 379 FNQGVPEVQIAPV 391
RNA_pol_Rpb8: domain 1 of 1, from 88 to 177: score -71.9, E = 5.2
*->dDIFkVksvDPDGkKydkVSRieAeSesldqMeLiLDINsqlYPlav
+V+ P GkK++ + + +s d
CARP_RHICH 88 ----QVTIGTP-GKKFN----LDFDTGSSD----------------- 108
gDkfrLviAstLNlEDgtaddgsatreynPtkaddrpsYLaDkYEYvMYG
L iAstL + + gs y+P + +Y aD G
CARP_RHICH 109 -----LWIASTL-----CTNCGSRQTKYDP--KQSS-TYQAD-------G 138
KvYriegDEtsiegktPklsLvYvSFGGLLMrLqGdarnLhgFelDsrlY
i+ + +g + ++s G L + d nL g++
CARP_RHICH 139 RTWSIS---YG-DG-S-SAS-------GILAK---DNVNLGGLLIKGQTI 172
LLlrR<-*
L +R
CARP_RHICH 173 ELAKR 177
GTP_EFTU_D2: domain 1 of 1, from 273 to 372: score -9.6, E = 6.2
*->GtVatgRvesGtlkkGdeveiggngtkkyvpevtrVtslemfhg.ld
Gt ++ ++G+l G ++ i++n + s+ + g d
CARP_RHICH 273 GTSTVASSFDGILDTGTTLLILPN---------NVAASVARAYGaSD 310
ea................................vaGdnaGlivagiglk
++++ + + ++++ ++ + ++ + + ++++ v +++ G +ag g
CARP_RHICH 311 NGdgtytiscdtsrfkplvfsingasfqvspdslVFEEYQGQCIAGFGYG 360
daa.ikrGdvla<-*
+ +Gd++
CARP_RHICH 361 NFDfAIIGDTFL 372
//
To find out more about the matching families, note the name of the matching hidden Markov model. In the example above, the best-matching model (i.e., the one with the smallest E-value) was Asp. To look at the alignment of the matching family, go to http://pfam.wustl.edu/browse.shtml and click on Asp under section A.
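Picking the best-matching model by hand is easy for five hits, but the same rule (smallest E-value wins) can be applied programmatically. The following is a minimal sketch in Python, not part of HMMER itself: the names SCORE_ROWS, parse_scores and best_hit are made up for this illustration, and the regular expression assumes each row has the layout shown in the example output (model, description, score, E-value, N).

```python
import re

# Rows copied from the "Scores for sequence family classification"
# table in the example output above.
SCORE_ROWS = """\
Asp Eukaryotic aspartyl protease 544.9 6.7e-161 1
ApbE ApbE family -167.8 5.1 1
RNA_pol_Rpb8 RNA polymerase Rpb8 -71.9 5.2 1
Cystatin Cystatin domain -16.1 5.9 1
GTP_EFTU_D2 Elongation factor Tu domain 2 -9.6 6.2 1
"""

# Assumed row layout: model name, free-text description, score,
# E-value, number of domains. The description may itself contain
# spaces and digits, so it is matched lazily.
ROW = re.compile(r"^(\S+)\s+(.*?)\s+(-?\d+(?:\.\d+)?)\s+(\S+)\s+(\d+)\s*$")


def parse_scores(text):
    """Return one dict per table row that matches the assumed layout."""
    hits = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            model, description, score, evalue, n = m.groups()
            hits.append({
                "model": model,
                "description": description,
                "score": float(score),
                "evalue": float(evalue),
                "domains": int(n),
            })
    return hits


def best_hit(hits):
    """The best match is the hit with the smallest E-value."""
    return min(hits, key=lambda h: h["evalue"])


if __name__ == "__main__":
    best = best_hit(parse_scores(SCORE_ROWS))
    print(best["model"], best["evalue"])
```

For the example output this selects Asp. The other four hits all have E-values above 5, so they are expected to occur by chance and are not meaningful matches.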
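The alignments shown can be interpreted as follows. Each block has three lines. The top line is the consensus of the hidden Markov model: a capital letter shows the dominant, strongly conserved residue at that position, while a lowercase letter shows the most common residue at a position where conservation is weaker. The middle line shows how well your sequence fits the model: a letter means your residue is identical to the consensus, a + means a relatively good match with the hidden Markov model, and a space means a poor match. The bottom line is your own sequence, with - marking gaps.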
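To make these conventions concrete, the sketch below tallies the symbols of the middle line for a short stretch of the Asp alignment. It is an illustration only: summarize_alignment is a hypothetical helper, not a HMMER function, and the three strings are the first 32 columns of the Asp alignment block shown above (consensus, middle line, and CARP_RHICH residues starting at position 79).

```python
def summarize_alignment(consensus, match_line, query):
    """Count identical, conservative (+) and poor (space) columns.

    The three strings must be the same length, one character per
    alignment column, as in the HMMER output blocks.
    """
    if not (len(consensus) == len(match_line) == len(query)):
        raise ValueError("all three alignment lines must have the same length")

    counts = {"identical": 0, "conservative": 0, "poor": 0}
    for symbol in match_line:
        if symbol == "+":
            counts["conservative"] += 1   # good match, but not identical
        elif symbol == " ":
            counts["poor"] += 1           # poor match to the model
        else:
            counts["identical"] += 1      # residue equals the consensus
    return counts


# First 32 columns of the Asp alignment in the example output.
CONSENSUS = "yldDaeYygtIsIGTPpQkFtVvFDTGSSDLW"
MATCH     = "y +D+eYyg+++IGTP++kF+++FDTGSSDLW"
QUERY     = "YGNDVEYYGQVTIGTPGKKFNLDFDTGSSDLW"

if __name__ == "__main__":
    print(summarize_alignment(CONSENSUS, MATCH, QUERY))
```

A stretch dominated by letters and + signs, like this one, is what a genuine family member looks like. Compare it with the ApbE or RNA_pol_Rpb8 alignments above, where the middle line is mostly spaces.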
If there are no matches in Pfam to your sequence, you will get no results.
PROSITE
The EMBOSS patmatmotifs program searches your protein against the ambiguous patterns in the PROSITE database. A PROSITE pattern either matches your protein or it does not; there is no score. All matching sites should therefore be investigated, especially the longer ones, because short patterns are more likely to match by chance. Below is an example of output from searching PROSITE:
########################################
# Program: patmatmotifs
# Rundate: Fri Jul 09 15:55:24 2004
# Report_format: dbmotif
# Report_file: carp_rhich.ProSite-Out
########################################
#=======================================
#
# Sequence: CARP_RHICH from: 1 to: 393
# HitCount: 3
#
# Full: Yes
# Prune: Yes
# Data_file: /home/biomed/emboss/share/EMBOSS/data/PROSITE/prosite.lines
#
#=======================================
Length = 4
Start = position 94 of sequence
End = position 97 of sequence
Motif = AMIDATION
VTIGTPGKKFNLDF
| |
94 97
Length = 12
Start = position 100 of sequence
End = position 111 of sequence
Motif = ASP_PROTEASE
GKKFNLDFDTGSSDLWIASTLC
| |
100 111
Length = 12
Start = position 283 of sequence
End = position 294 of sequence
Motif = ASP_PROTEASE
ASSFDGILDTGTTLLILPNNVA
| |
283 294
#---------------------------------------
#
# Motif: AMIDATION
# Count: 1
#
# ******************
# * Amidation site *
# ******************
#
# The precursor of hormones and other active peptides which are C-terminally
# amidated is always directly followed [1,2] by a glycine residue which provides
# the amide group, and most often by at least two consecutive basic residues
# (Arg or Lys) which generally function as an active peptide precursor cleavage
# site. Although all amino acids can be amidated, neutral hydrophobic residues
# such as Val or Phe are good substrates, while charged residues such as Asp or
# Arg are much less reactive. C-terminal amidation has not yet been shown to
# occur in unicellular organisms or in plants.
#
# -Consensus pattern: x-G-[RK]-[RK]
# [x is the amidation site]
# -Last update: June 1988 / First entry.
#
# [ 1] Kreil G.
# Meth. Enzymol. 106:218-223(1984).
# [ 2] Bradbury A.F., Smyth D.G.
# Biosci. Rep. 7:907-916(1987).
#
# +----------------------------------------------------------------------------+
# | This PROSITE entry is copyright by the Swiss Institute of Bioinformatics |
# | (SIB). There are no restrictions on its use by non-profit institutions as |
# | long as its content is in no way modified and this statement is not |
# | removed. Usage by and for commercial entities requires a license agreement |
# | (See http://www.isb-sib.ch/announce/ or email to license@isb-sib.ch). |
# +----------------------------------------------------------------------------+
#
# ***************
#
# Motif: ASP_PROTEASE
# Count: 2
#
# *****************************************************************
# * Eukaryotic and viral aspartyl proteases signature and profile *
# *****************************************************************
#
# Aspartyl proteases, also known as acid proteases, (EC 3.4.23.-) are a widely
# distributed family of proteolytic enzymes [1,2,3] known to exist in
# vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate
# proteases of eukaryotes are monomeric enzymes which consist of two domains.
# Each domain contains an active site centered on a catalytic aspartyl residue.
# The two domains most probably evolved from the duplication of an ancestral
# gene encoding a primordial domain. Currently known eukaryotic aspartyl
# proteases are:
#
# - Vertebrate gastric pepsins A and C (also known as gastricsin).
# - Vertebrate chymosin (rennin), involved in digestion and used for making
# cheese.
# - Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34).
# - Mammalian renin (EC 3.4.23.15) whose function is to generate angiotensin I
# from angiotensinogen in the plasma.
# - Fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin
# (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin
# (EC 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizopuspepsin
# (EC 3.4.23.21).
# - Yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4). PEP4 is
# implicated in posttranslational regulation of vacuolar hydrolases.
# - Yeast barrierpepsin (EC 3.4.23.35) (gene BAR1); a protease that cleaves
# alpha-factor and thus acts as an antagonist of the mating pheromone.
# - Fission yeast sxa1 which is involved in degrading or processing the mating
# pheromones.
#
# Most retroviruses and some plant viruses, such as badnaviruses, encode for an
# aspartyl protease which is an homodimer of a chain of about 95 to 125 amino
# acids. In most retroviruses, the protease is encoded as a segment of a
# polyprotein which is cleaved during the maturation process of the virus. It
# is generally part of the pol polyprotein and, more rarely, of the gag
# polyprotein.
#
# Conservation of the sequence around the two aspartates of eukaryotic aspartyl
# proteases and around the single active site of the viral proteases allows us
# to develop a single signature pattern for both groups of protease. A profile
# was developed to specifically detect viral aspartyl proteases, which are
# missed by the pattern.
#
# -Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-
# x-[LIVMFSTNC]-x-[LIVMFGTA]
# [D is the active site residue]
# -Sequences known to belong to this class detected by the pattern: ALL.
# -Other sequence(s) detected in Swiss-Prot: 37.
#
# -Sequences known to belong to this class detected by the profile: ALL viral-
# type proteases.
# -Other sequence(s) detected in Swiss-Prot: 3.
#
# -Note: these proteins belong to families A1 and A2 in the classification of
# peptidases [4,E1].
#
# -Last update: December 2001 / Text revised; profile added.
#
# [ 1] Foltmann B.
# Essays Biochem. 17:52-84(1981).
# [ 2] Davies D.R.
# Annu. Rev. Biophys. Biophys. Chem. 19:189-215(1990).
# [ 3] Rao J.K.M., Erickson J.W., Wlodawer A.
# Biochemistry 30:4663-4671(1991).
# [ 4] Rawlings N.D., Barrett A.J.
# Meth. Enzymol. 248:105-120(1995).
# [E1] http://www.expasy.org/cgi-bin/lists?peptidas.txt
#
# +----------------------------------------------------------------------------+
# | This PROSITE entry is copyright by the Swiss Institute of Bioinformatics |
# | (SIB). There are no restrictions on its use by non-profit institutions as |
# | long as its content is in no way modified and this statement is not |
# | removed. Usage by and for commercial entities requires a license agreement |
# | (See http://www.isb-sib.ch/announce/ or email to license@isb-sib.ch). |
# +----------------------------------------------------------------------------+
#
# ***************
#
#
#--------------------------------------
For more information about the matching PROSITE patterns, see the files /biomed/db/prosite/prosite.dat and /biomed/db/prosite/prosite.doc.
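The all-or-nothing nature of PROSITE patterns is easiest to see by treating a pattern as a regular expression. The sketch below is an illustration, not how patmatmotifs is implemented: prosite_to_regex is a hypothetical helper that handles only the basic PROSITE syntax (x, [..], {..}, repeat counts and the < > anchors), and the sequence fragments are the ones printed in the example output above.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    regex = pattern.replace(" ", "").rstrip(".")
    regex = regex.replace("-", "")                     # element separator
    regex = regex.replace("x", ".")                    # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")  # {AB} = not A or B
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) = repeat
    regex = regex.replace("<", "^").replace(">", "$")   # terminal anchors
    return regex


# Consensus patterns copied from the PROSITE entries shown above.
ASP_PROTEASE = ("[LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-"
                "x-[LIVMFSTNC]-x-[LIVMFGTA]")
AMIDATION = "x-G-[RK]-[RK]"

# Sequence fragments as printed by patmatmotifs for each hit.
FRAGMENTS = {
    "ASP_PROTEASE (first hit)": (ASP_PROTEASE, "GKKFNLDFDTGSSDLWIASTLC"),
    "ASP_PROTEASE (second hit)": (ASP_PROTEASE, "ASSFDGILDTGTTLLILPNNVA"),
    "AMIDATION": (AMIDATION, "VTIGTPGKKFNLDF"),
}

if __name__ == "__main__":
    for name, (pattern, fragment) in FRAGMENTS.items():
        match = re.search(prosite_to_regex(pattern), fragment)
        # A pattern either matches or it does not; there is no score.
        print(name, "->", match.group() if match else "no match")
```

Note the difference in specificity: the 4-residue AMIDATION pattern accepts any residue followed by G and two basic residues, so it will match many unrelated proteins, whereas the 12-residue ASP_PROTEASE pattern constrains most positions and hits both active-site regions of this aspartyl protease.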





