Exercises for bioinformatics.psc.edu:
Multiple Sequence Alignment

A multiple sequence alignment shows the juxtaposition of residues across a set of sequences. This juxtaposition can highlight which residues are conserved or selectively mutated, pointing to residues that may be structurally or functionally important. A multiple sequence alignment can be thought of as representing the best guess as to the detailed evolutionary history of the sequences being aligned.
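To make "conserved" concrete, here is a minimal Python sketch (not part of the exercise software) that lists the alignment columns in which every sequence carries the same residue. The three toy sequences are hypothetical, loosely modeled on the Asp-Thr/Ser-Gly active-site motif of aspartyl proteases. The set of gap characters ('-', '.', '~') is an assumption, and gap-only columns never count as conserved.

```python
# Minimal sketch: find fully conserved columns in an alignment.
# The gap characters below are an assumption; MSF files commonly use '.'
# or '~', while other formats use '-'.
GAP_CHARS = set("-.~")


def conserved_columns(aligned):
    """Return 0-based indices of columns where all sequences share one residue.

    Columns made up of gaps are never reported as conserved.
    """
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    conserved = []
    for index, column in enumerate(zip(*aligned)):
        residues = set(column)
        # Exactly one distinct character, and it is not a gap symbol.
        if len(residues) == 1 and not residues & GAP_CHARS:
            conserved.append(index)
    return conserved


if __name__ == "__main__":
    toy_alignment = [
        "DTGS-S",  # hypothetical sequences, not taken from asp.pir
        "DTGSSS",
        "DSGT-S",
    ]
    print(conserved_columns(toy_alignment))  # prints [0, 2, 5]
```

Columns 1 and 3 differ (T/S), and column 4 mixes residues with gaps, so only columns 0, 2, and 5 are reported.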

This exercise is designed to take you through the steps of performing multiple sequence alignments. It is not a complete step-by-step example; you will have to think about the problem and what you are trying to accomplish before moving on to the next step. Please read each step in its entirety before typing anything on the computer. Also, please fill in the blank lines; your recorded responses are often referred to in later steps. This exercise assumes that you are using the C shell (csh) and have successfully completed the Unix Operating System Hands-On exercises.


Get related sequences to align

  1. In this example we will align a set of aspartyl protease sequences. Copy the file containing these sequences, called asp.pir, from the directory /biomed/lib/examples. Enter: cp /biomed/lib/examples/asp.pir asp.pir Write the name of the file in your directory containing the aspartyl protease sequences here:
    ________________________________________________________________________
  2. Copy the file containing the additional aspartyl protease sequences, called addasp.pir, from the directory /biomed/lib/examples. Enter: cp /biomed/lib/examples/addasp.pir addasp.pir Write the name of the file in your directory containing the additional aspartyl protease sequences here:
    ________________________________________________________________________
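Before aligning, it can help to confirm how many sequences each copied file holds. The following is a minimal Python sketch, assuming the usual NBRF-PIR layout: a header line such as ">P1;IDENT", one description line, then sequence lines terminated by "*". The function name read_pir and the script name in the usage comment are hypothetical, not part of the PSC software.

```python
import sys


def read_pir(text):
    """Parse NBRF-PIR text into (identifier, description, sequence) tuples.

    Assumes each entry is: a '>XX;IDENT' header line, one description line,
    and sequence lines ending with '*'. Whitespace inside sequences is dropped.
    """
    lines = text.splitlines()
    entries = []
    i = 0
    while i < len(lines):
        if not lines[i].startswith(">"):
            i += 1  # skip blank or stray lines between entries
            continue
        # The identifier follows the two-letter type code and ';' (e.g. P1;).
        ident = lines[i][1:].split(";", 1)[-1].strip()
        description = lines[i + 1].strip() if i + 1 < len(lines) else ""
        i += 2
        chunks = []
        # Collect sequence lines up to the next header (or end of file).
        while i < len(lines) and not lines[i].startswith(">"):
            chunks.append(lines[i])
            i += 1
        # Keep only residues before the terminating '*', minus whitespace.
        residues = "".join(chunks).split("*", 1)[0]
        entries.append((ident, description, "".join(residues.split())))
    return entries


if __name__ == "__main__":
    # Usage: python count_pir.py asp.pir addasp.pir   (script name hypothetical)
    for path in sys.argv[1:]:
        with open(path) as handle:
            records = read_pir(handle.read())
        print(f"{path}: {len(records)} sequences")
        for ident, _description, sequence in records:
            print(f"  {ident}\t{len(sequence)} residues")
```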

Use the MSA program to create a multiple sequence alignment

  1. Use the MAKSEQ program to write a script file. Enter makseq to start the MAKSEQ program.
  2. Select Multiple sequence alignment.
  3. Select MSA.
  4. Enter the file written down in step 1.1 as the query file.
  5. Enter YES; these sequences are proteins.
  6. Enter NO. This will perform both the heuristic alignment and the optimal alignment. Normally one would run the optimal alignment only after reviewing the results of the heuristic alignment.
  7. Enter NO. You will not need the heuristic alignment returned as an MSF file.
  8. Answer YES to the question asking for an optimal alignment returned as an MSF file.
  9. Elect not to penalize terminal gaps the same as internal gaps.
  10. Enter YES to use the evolutionary tree.
  11. Enter NO. We are not specifying a maximum divergence.
  12. Enter NO. We are not defining the epsilons.
  13. Select the MSA-250 matrix.
  14. Select NO; we are not adjusting the epsilons.
  15. Use the default output file. Write that file name below:
    ___________________________________________________________
  16. Write the script file name below:
    _______________________________________________________
  17. Write the filename for the optimal alignment in MSF format below:
    __________________________________________________________________
  18. Submit the script file to the PBS queue. Enter: qsub [scriptfile] -o [logfile] where [scriptfile] is the filename from step 2.16 and [logfile] is a file name that you make up. A good practice when naming a [logfile] is simply to substitute .log for .job. For example, if your [scriptfile] was named Fasta.job, a good [logfile] name would be Fasta.log, and you would submit the job with the command "qsub Fasta.job -o Fasta.log". Write the name of the [logfile] below:
    ______________________________________________________________
  19. When the script file is successfully submitted, the system will respond with an identifier (e.g. 132.codon.psc.edu). Write that identifier here:
    ____________________________________________________________________
  20. The script will take several minutes to run, depending on how many other workshop participants are running jobs. Unless the system is particularly heavily loaded, the script should complete within about fifteen minutes. (Remember, you can check on the status of your job by typing "qstat".) When your job is complete, examine the log file (step 2.18) for errors. Next, examine the optimal multiple alignment file (step 2.17).
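The log-file naming convention in step 18 is mechanical, so it can be scripted. Below is a minimal Python sketch (function and script names hypothetical) that derives the log name, builds the qsub command in the order step 18 gives, and pulls out log lines mentioning "error". The error check is only a crude text search, since the exact messages MSA writes to the log are not described here. On many PBS installations options are conventionally placed before the script name (qsub -o Fasta.log Fasta.job), so try that form if your system rejects the order shown in step 18.

```python
import shlex
import subprocess
import sys


def log_name_for(script_name):
    """Apply the exercise's convention: substitute .log for .job."""
    if script_name.endswith(".job"):
        return script_name[: -len(".job")] + ".log"
    return script_name + ".log"  # fallback for names without .job (assumption)


def qsub_command(script_name, log_name=None):
    """Build 'qsub [scriptfile] -o [logfile]' as an argument list."""
    return ["qsub", script_name, "-o", log_name or log_name_for(script_name)]


def error_lines(log_text):
    """Return log lines that mention 'error' in any letter case."""
    return [line for line in log_text.splitlines() if "error" in line.lower()]


if __name__ == "__main__":
    # Usage: python submit.py Fasta.job [--submit]   (script name hypothetical)
    if len(sys.argv) < 2:
        sys.exit("usage: python submit.py SCRIPTFILE [--submit]")
    cmd = qsub_command(sys.argv[1])
    print(" ".join(shlex.quote(arg) for arg in cmd))
    if "--submit" in sys.argv[2:]:
        # qsub prints the job identifier (e.g. 132.codon.psc.edu) on success.
        subprocess.run(cmd, check=True)
```

Once the job finishes, passing the log file's text to error_lines gives a quick first pass before you read the log yourself.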

Use the T-Coffee Program to produce a multiple sequence alignment

  1. Use the MAKSEQ program to write a script file. Enter makseq to start the MAKSEQ program.
  2. Select Multiple sequence alignment.
  3. Select T-COFFEE.
  4. Enter the file written down in step 1.1 as the query file.
  5. Enter YES; these sequences are proteins.
  6. Select Gotoh's.
  7. Select the Slow tree computational method.
  8. Select the Slow normalize method.
  9. Accept the default output file name. Write that file name below:
    ________________________________________________________
  10. Write the script file name below:
    ________________________________________________________
  11. Write the MSF filename below:
    ________________________________________________________
  12. Write the Score filename below:
    ________________________________________________________
  13. Write the dendrogram filename below:
    ________________________________________________________
  14. Submit the script file to the PBS queue. Enter: qsub [scriptfile] -o [logfile] where [scriptfile] is the filename from step 3.10 and [logfile] is a file name that you make up. A good practice when naming a [logfile] is simply to substitute .log for .job. For example, if your [scriptfile] was named Fasta.job, a good [logfile] name would be Fasta.log, and you would submit the job with the command "qsub Fasta.job -o Fasta.log". Write the name of the [logfile] below:
    _______________________________________________________________________
  15. When the script file is successfully submitted, the system will respond with an identifier (e.g. 132.codon.psc.edu). Write that identifier here:
    _____________________________________________________________________
  16. The script will take several minutes to run, depending on how many other workshop participants are running jobs. Unless the system is particularly heavily loaded, the script should complete within about fifteen minutes. (Remember, you can check on the status of your job by typing "qstat".) When your job is complete, examine the log file (step 3.14) for errors. Next, examine the multiple alignment file (step 3.11). Finally, examine the score file (step 3.12). Note that the score file is a PostScript file; review it by printing it on a PostScript printer or viewing it with a PostScript previewer such as Ghostscript.
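PostScript files normally begin with the characters "%!", so a quick check can confirm that the T-Coffee score file really is PostScript before you try to view it. This is a minimal Python sketch with hypothetical function and script names; it assumes the Ghostscript tools ps2pdf or gs may be installed and only suggests a command for whichever one it finds on the PATH.

```python
import shutil
import sys


def is_postscript(path):
    """True if the file starts with the '%!' PostScript signature."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"%!"


def viewer_command(ps_path):
    """Suggest a Ghostscript-based command for the file, or None.

    ps2pdf converts file.ps to file.pdf in the current directory; gs opens
    the file in Ghostscript's own viewer when a display is available.
    """
    if shutil.which("ps2pdf"):
        return ["ps2pdf", ps_path]
    if shutil.which("gs"):
        return ["gs", ps_path]
    return None  # neither tool is on the PATH


if __name__ == "__main__":
    # Usage: python check_score.py SCOREFILE   (script name hypothetical)
    for path in sys.argv[1:]:
        if not is_postscript(path):
            print(f"{path}: does not start with %!, probably not PostScript")
            continue
        cmd = viewer_command(path)
        hint = " ".join(cmd) if cmd else "no Ghostscript tool found"
        print(f"{path}: PostScript; suggested command: {hint}")
```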

Use the Clustal program to produce a multiple sequence alignment and a phylogenetic tree

  1. Enter clustalw
  2. Enter 1 to load your sequences.
  3. Enter the filename containing the sequences in NBRF-PIR format (see step 1.1).
  4. Enter 2 for the multiple alignment menu.
  5. Enter 9 for the output format menu.
  6. Select 3 to produce an MSF file.
  7. Hit the Enter key to leave the output format menu.
  8. Enter 8 to turn the screen display off.
  9. Enter 1 to produce a complete multiple sequence alignment.
  10. Take the default Clustal output file name. Write that name below: ________________________________________________________________________
  11. Take the default GCG MSF file name. Write that name here: ________________________________________________________________________
  12. Take the default Clustal guide tree name. Write that name here: ________________________________________________________________________
  13. Hit the Enter key to leave the multiple alignment menu.
  14. Select 4 for Phylogenetic trees.
  15. Select 2 to exclude positions with gaps.
  16. Select 3 to correct for multiple substitutions.
  17. Draw the tree. Enter 4.
  18. Take the default PHYLIP tree output name. Write that name here: ________________________________________________________________________
  19. Hit the Enter key to leave the phylogenetic tree menu.
  20. Enter X to leave the program.
  21. Examine the MSF output file (its name is recorded in step 4.11).
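Besides reading the MSF file by eye, a short script can summarize it. This is a minimal Python sketch of an MSF reader (read_msf and the script name are hypothetical), assuming the usual GCG layout: "Name:" lines in the header, a line beginning with "//", then blocks in which each line starts with a sequence name. Recognizing sequence lines by their header names lets it skip position-ruler lines. It treats '.', '~', and '-' as gap characters, which is an assumption.

```python
import sys

GAP_CHARS = set(".~-")


def read_msf(text):
    """Return {name: aligned sequence} from GCG MSF text, in header order."""
    names = []
    pieces = {}
    in_alignment = False
    for line in text.splitlines():
        tokens = line.split()
        if not in_alignment:
            # Header: collect names from lines like ' Name: seqA  Len: 8 ...'.
            if len(tokens) > 1 and tokens[0] == "Name:":
                names.append(tokens[1])
                pieces[tokens[1]] = []
            elif line.strip().startswith("//"):
                in_alignment = True  # alignment blocks follow this line
            continue
        # Alignment block: keep lines whose first token is a known name;
        # ruler lines (position numbers) and blank lines are ignored.
        if tokens and tokens[0] in pieces:
            pieces[tokens[0]].extend(tokens[1:])
    return {name: "".join(pieces[name]) for name in names}


if __name__ == "__main__":
    # Usage: python msf_summary.py FILE.msf   (script name hypothetical)
    with open(sys.argv[1]) as handle:
        alignment = read_msf(handle.read())
    for name, seq in alignment.items():
        gaps = sum(ch in GAP_CHARS for ch in seq)
        print(f"{name}\tlength {len(seq)}\tgaps {gaps}")
```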

Add more sequences to an already existing alignment

  1. Edit the optimal alignment file (.msf) produced by the MSA program in step 2.17 so that the Clustal W program can recognize it as a GCG MSF file. Open the file using your preferred text editor and insert the word "PileUp" (without the quotes) at the left edge of the first line in the file. Save the file.
  2. Start the CLUSTAL W program by entering the command: clustalw
  3. Select Profile / Structure Alignments
  4. From the Profile Alignment menu, select Input 1st profile
  5. You will be asked to enter the name of a file containing a multiple sequence alignment. Enter the name of the file you edited in step 5.1 above (originally recorded in step 2.17). The menu should reappear with the notation "loaded" after item 1 in the menu.
  6. Now select Input 2nd profile/sequences from the Profile Alignment menu.
  7. You will be asked for the name of a file containing a multiple sequence alignment or individual sequences. Type in the name of the file containing the additional aspartyl protease sequences (step 1.2). The names of the sequences should scroll across the screen, and the menu should reappear with the notation "loaded" after item 2 in the menu.
  8. Select Output format options from the menu.
  9. Select Toggle GCG/MSF format output from the output format menu so that the item is marked "ON".
  10. Press the Enter key to return to the Profile Alignment menu.
  11. Select Align sequences to 1st profile from the menu. You will be asked to enter names for the alignment output files and the guide tree output file. Write the Clustal output file name below:
    _________________________________________________________
  12. Write the Clustal GCG MSF file name below:
    ______________________________________________________
  13. Write the Clustal guide tree file name below:
    _______________________________________________________________
  14. Exit the Clustal W program.
  15. An optional but informative exercise is to put all eight aspartyl proteases included in this alignment into a single file, align them as a single set using the Clustal W program, and compare the result to that obtained in step 5.12 above.
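Steps 1 and 15 of this section are simple text manipulations that can also be scripted. The minimal Python sketch below (helper and script names hypothetical) adds "PileUp" as a separate first line, as MSF files written by PileUp typically begin; treating a separate line as acceptable to Clustal W is an assumption about how step 1 is meant. Files that already start with PileUp are left unchanged. It also concatenates the two PIR files for the optional exercise; in csh, cat asp.pir addasp.pir > allasp.pir does the same, where allasp.pir is a made-up name.

```python
import sys


def add_pileup_header(msf_text):
    """Prepend a 'PileUp' line unless the text already starts with it.

    Assumption: a separate first line is acceptable to Clustal W here.
    """
    if msf_text.lstrip().upper().startswith("PILEUP"):
        return msf_text
    return "PileUp\n" + msf_text


def combine_files(paths, out_path):
    """Concatenate text files (like csh 'cat a b > out'), ensuring newlines."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                text = handle.read()
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")  # keep the next file's first line separate


if __name__ == "__main__":
    # Usage: python prep.py OPTIMAL.msf   (script name hypothetical)
    msf_path = sys.argv[1]
    with open(msf_path) as handle:
        fixed = add_pileup_header(handle.read())
    with open(msf_path, "w") as handle:
        handle.write(fixed)  # edit in place, as step 1 asks
    combine_files(["asp.pir", "addasp.pir"], "allasp.pir")
```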

NRBSC projects are made possible by these sponsors: NIH, the Pittsburgh Supercomputing Center, and NCRR.