Exercises for bioinformatics.psc.edu:
Classification Libraries

Classification libraries assist in identifying and classifying sequences. Searching them can help a researcher predict the structure, function, family, or superfamily of an unknown sequence.

This exercise takes you through the steps of searching classification libraries. Please read each step in full before typing anything on the computer. Please also make a hardcopy of this exercise and fill in the blank lines, because later steps often refer back to your responses. Note that you will need to substitute the actual filenames for names enclosed in square brackets (e.g. [infilename]). This exercise assumes that the participant has successfully completed the Digital Unix Hands-On.

Get search sequence

  1. In this example we will use as our search sequence RHIZOPUSPEPSIN PRECURSOR (EC ...). Copy this sequence, which is in the file named carp_rhich.swiss in the directory /biomed/lib/example. Enter: cp /biomed/lib/example/carp_rhich.swiss carp_rhich.swiss
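
The copy step above can be wrapped in a small check that the file really arrived. This is only a sketch: the function name copy_sequence is hypothetical and not part of the workshop tools, and the default source directory is the one named in the step above.

```shell
# copy_sequence NAME [SRC_DIR]
# Hypothetical helper: copies NAME from SRC_DIR into the current directory
# and succeeds only if the copy exists and is not empty.
# SRC_DIR defaults to the example directory named in this exercise.
copy_sequence() {
    name=$1
    src_dir=${2:-/biomed/lib/example}
    if [ ! -r "$src_dir/$name" ]; then
        echo "cannot read $src_dir/$name" >&2
        return 1
    fi
    cp "$src_dir/$name" "$name" && [ -s "$name" ]
}

# Example usage (commented out so nothing runs until you choose to):
#   copy_sequence carp_rhich.swiss
```

If the function returns a non-zero status, check the spelling of the file name and the directory before continuing.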

Search the data libraries

  1. Enter: makseq to run the makseq program.
     Note: To run the readseq program, place /home/biomed/bin in your UNIX path. Otherwise, you will have to substitute /biomed/bin/readseq wherever "readseq" is mentioned below (Step 6.1). People who have successfully completed the Unix Operating System Hands-On will already have /home/biomed/bin added to their UNIX paths.
  2. Choose to search the classification libraries.
  3. Enter the name of the file containing the search sequence (see step 1.1).
  4. Write the name of the script file (the file ending in .job) below:
  5. Submit the script file to the PBS queue. Enter: qsub [scriptfile] -o [logfile] where [scriptfile] is the file name from step 2.4 and [logfile] is a file name of your choosing. A good practice is to substitute .log for .job. For example, if your [scriptfile] is named Classify.job, a good [logfile] name is Classify.log, and you would submit the job with the command "qsub Classify.job -o Classify.log". Write the name of the [logfile] below:
  6. When the script file is successfully submitted, the system will respond with an identifier (e.g. 132.codon.psc.edu). Write that identifier here:
  7. The script file will take between 5 and 30 minutes to run, depending on how many other workshop participants are running jobs. (Remember, you can check on the status of your job by typing "qstat".) When your job is complete, examine the log file (step 2.5) for errors.
  8. After the script file is complete, you should notice several new files in your directory.
  9. ProSite:
    1. A ProSite output file should now be in your directory. This file should end in .ProSite-Out. Write the name of the file here:
    2. Examine the output file listed in step 2.9.1. Which ProSite motifs match the motifs in your query sequence?
  10. Pfam library:
    1. Two output files from searching Pfam should now be in your directory. These files should end in .HMM-local-Out and .HMM-global-Out. Write the names of the files here:
    2. Examine the output files listed in step 2.10.1. Which Pfam hidden Markov models match the motifs in your query sequence?
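
The log file naming convention and the check for new output files described in the steps above can be sketched in shell. The function names log_name_for and check_outputs are hypothetical helpers, not part of makseq or PBS; the three suffixes are the ones listed in this exercise.

```shell
# log_name_for SCRIPTFILE
# Hypothetical helper: derives a log file name by swapping .job for .log.
log_name_for() {
    printf '%s\n' "${1%.job}.log"
}

# check_outputs [DIR]
# Hypothetical helper: reports, for each output suffix named in this
# exercise, whether a matching file exists in DIR (default: current directory).
check_outputs() {
    dir=${1:-.}
    for suffix in ProSite-Out HMM-local-Out HMM-global-Out; do
        found=no
        for f in "$dir"/*."$suffix"; do
            # An unmatched pattern stays literal, so test that the file exists.
            [ -e "$f" ] && found=yes && break
        done
        printf '%s: %s\n' "$suffix" "$found"
    done
}

# Example usage (commented out so that no job is submitted by accident):
#   qsub Classify.job -o "$(log_name_for Classify.job)"
#   qstat
#   check_outputs .
```

A "no" next to any suffix after the job has finished suggests looking in the log file for errors before moving on.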

Interpret results

  1. What structure or function is the query sequence most likely to have (or to what family does it most likely belong)?
