Exercises for bioinformatics.psc.edu:
Phylogenetic Analysis

This exercise is designed to introduce you to the steps involved in creating and viewing phylogenetic trees. It is not a complete step-by-step recipe; you will have to think about the problem and what you are trying to accomplish before moving on to the next step. Please read each step in full before typing anything on the computer. Also, please make a printout of this web page and fill in the blank lines; your responses will often be referred to in later steps.


Program Setup

  1. Make a directory for the phylogenetics exercise by typing: mkdir phylo
  2. Change your directory to the phylo directory. Enter: cd phylo
  3. Copy a sample multiple sequence alignment to use for this exercise. Enter the command cp /biomed/lib/example/sprot30.msf sprot30.msf and write the name of the copied file below (i.e. sprot30.msf). _________________________________________________________________________
  4. First, convert the alignment to the PHYLIP interleaved format with the readseq program. Enter the command: readseq
  5. Enter an output file name of your own choosing, such as sprot30.phylip. Write that name below: _________________________________________________________________________
  6. Select output file format 12 (PHYLIP).
  7. Enter the file name you wrote down in step 1.3 as the input file.
  8. Choose to convert all sequences.
  9. Press Enter to quit readseq.
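To make the target layout concrete, the sketch below writes aligned sequences in PHYLIP interleaved form: a header line giving the number of sequences and the alignment length, a first block in which every line starts with a 10-character name field, and later blocks (separated by blank lines) that carry only sequence. It is a minimal Python illustration under stated assumptions, not a replacement for readseq; the function name write_phylip_interleaved, the 60-residue block width and the toy sequences are hypothetical.

```python
def write_phylip_interleaved(seqs, width=60):
    """Return aligned sequences (dict: name -> sequence) as PHYLIP interleaved text."""
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("all sequences must have the same aligned length")
    lines = [f"{len(names)} {length}"]  # header: number of sequences, number of sites
    for start in range(0, length, width):
        if start > 0:
            lines.append("")  # blank line between interleaved blocks
        for name in names:
            # Names fill a fixed 10-character field in the first block only;
            # later blocks are simply indented (PHYLIP ignores blanks in sequence data).
            field = name[:10].ljust(10) if start == 0 else " " * 10
            lines.append(field + seqs[name][start:start + width])
    return "\n".join(lines) + "\n"


print(write_phylip_interleaved({"seqA": "MKV-LA", "seqB": "MKIQLA"}, width=4))
```

readseq handles many input formats and options that this sketch ignores; it only shows what the converted file should look like.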

Create a file of PHYLIP pairwise distances using protdist

  1. Enter the command protdist.
  2. The program will respond that it can't read infile. At the Please enter a new filename> prompt, enter the file listed in step 1.5.
  3. You will now see a list of options. You may want to use the P option to cycle through the model choices until the Dayhoff PAM matrix is listed. Next, indicate that the settings are correct by entering Y.
  4. The program will then compute distances, which requires only a few seconds of CPU time. The results will be written to a file named outfile.
  5. Rename the output file created by the protdist program. Use the mv command to do this by entering mv outfile [newoutname] where [newoutname] is a filename that you made up. (For example, if the [newoutname] that you made up was sprot30.protdist, then you would enter: mv outfile sprot30.protdist). Write the name that you selected for the [newoutname] file below:
    ________________________________________________________________
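The idea behind protdist can be sketched as: for every pair of aligned sequences, measure how often they differ and convert that proportion into an estimated evolutionary distance. The Dayhoff PAM option used above fits the distance by maximum likelihood under the PAM model, which is more than a short example can show, so this Python sketch swaps in Kimura's simpler protein formula, D = -ln(1 - p - 0.2 p^2), which protdist also offers as a model choice. The function names, the gap handling (columns with '-' in either sequence are skipped) and the output layout are assumptions for illustration only.

```python
import math


def p_distance(a, b):
    """Proportion of differing residues, skipping columns with a gap ('-') in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def kimura_protein_distance(a, b):
    """Kimura's protein distance approximation: D = -ln(1 - p - 0.2 p^2)."""
    p = p_distance(a, b)
    if p == 0:
        return 0.0
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0:
        raise ValueError("sequences too divergent for the Kimura approximation")
    return -math.log(arg)


def distance_matrix(seqs):
    """Full square matrix of pairwise distances for a dict of aligned sequences."""
    names = list(seqs)
    matrix = [[kimura_protein_distance(seqs[i], seqs[j]) for j in names] for i in names]
    return names, matrix


def format_square_matrix(names, matrix):
    """Species count, then one line per species: 10-character name plus its row of distances."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        lines.append(name[:10].ljust(10) + "".join(f"{d:10.6f}" for d in row))
    return "\n".join(lines) + "\n"
```

The layout written by format_square_matrix only approximates a protdist outfile; the real program also wraps long rows.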

Create a neighbor-joining tree from the file of PHYLIP pairwise distances using the neighbor program

  1. Compute the tree by entering the neighbor command.
  2. The program will respond that it can't read infile. At the Please enter a new filename> prompt, enter the file listed in step 2.5.
  3. You will now see a list of options. Once again these options should all be correct, but you may want to check that the Neighbor-joining tree option, and not the UPGMA tree option, is selected. Next, indicate that the settings are correct by entering Y.
  4. The program will then compute the tree, which requires only a few seconds of CPU time. The results will be written to two files: outfile and outtree.
  5. Rename the outfile file created by the neighbor program. Use the mv command to do this by entering mv outfile [newoutname] where [newoutname] is a filename that you made up. (For example, if the [newoutname] that you made up was sprot30.neighbor, then you would enter: mv outfile sprot30.neighbor). Write the name that you selected for the [newoutname] file below: _________________________________________________________________________
  6. Use the more command to examine the output file listed in step 3.5.
  7. Rename the outtree file created by the neighbor program. Use the mv command to do this by entering mv outtree [newtreename] where [newtreename] is a filename that you made up. (For example, if the [newtreename] that you made up was sprot30.tree, then you would enter: mv outtree sprot30.tree). Write the name that you selected for the [newtreename] file below: _________________________________________________________________________ The outtree file is a New Hampshire formatted file that can be viewed with a variety of freeware tree viewers.
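Neighbor-joining repeatedly merges the pair of clusters that minimizes the Q criterion, places a new internal node between them, and recomputes distances from that node until only three clusters remain. The Python sketch below is a plain textbook version of the Saitou and Nei method written for this exercise, not PHYLIP's own code; the function name, the tie-breaking rule (the first minimal pair found wins) and the five-taxon test matrix are assumptions, and negative branch lengths are passed through unchanged.

```python
def neighbor_joining(names, d):
    """Neighbor-joining on a symmetric distance matrix.

    names: taxon labels; d: list of lists with d[i][j] the distance between
    names[i] and names[j]. Returns an unrooted tree as a Newick string,
    written with a three-way split at the top.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)                # Newick text for each current cluster
    dist = [list(row) for row in d]    # working copy, shrinks as clusters merge
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in dist]  # total distance from each cluster to all others
        # Choose the pair minimizing Q(i, j) = (n - 2) * d(i, j) - r[i] - r[j].
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * dist[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to the two joined clusters.
        li = 0.5 * dist[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = dist[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        # Distance from the new node to every cluster that is left.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (dist[i][k] + dist[j][k] - dist[i][j]) for k in keep]
        dist = [[dist[a][b] for b in keep] for a in keep]
        for row, dk in zip(dist, new_row):
            row.append(dk)
        dist.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three clusters remain: solve for their three branch lengths directly.
    x, y, z = nodes
    lx = 0.5 * (dist[0][1] + dist[0][2] - dist[1][2])
    ly = 0.5 * (dist[0][1] + dist[1][2] - dist[0][2])
    lz = 0.5 * (dist[0][2] + dist[1][2] - dist[0][1])
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"
```

For a five-taxon matrix with a-b 5, a-c 9, a-d 9, a-e 8, b-c 10, b-d 10, b-e 9, c-d 8, c-e 7 and d-e 3, the sketch returns (d:2,e:1,(c:4,(a:2,b:3):3):2); which is the same kind of New Hampshire text that neighbor writes to outtree (neighbor may order the branches differently).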

View the tree file
There are a number of ways to view the tree file. First, the outtree file can be read by the drawtree program, which can display the tree and write out a postscript file of it (the postscript file can then be previewed with a postscript previewer such as ghostscript, or downloaded and printed on a local postscript printer). Second, the ATV program can be used to display the tree file on a remote X Window-compatible computer. Finally, the outtree file can be downloaded to a local machine and viewed with a variety of freeware viewers, such as TreeView from Rod Page, or the ATV program from the Eddy group at Washington University in St. Louis.

  1. Displaying trees using drawtree
    1. In order to create graphics files using the drawtree program, you must first copy a required font file to your working directory. Execute the command: cp /biomed/lib/phylip/font2 fontfile
    2. The drawtree program can let you preview the tree directly from bioinformatics ONLY IF you are using a workstation or PC that can receive X Window graphics. If you are using such a machine, do the following:
      1. On bioinformatics.psc.edu, tell the computer where to send the remote graphics display by giving it the name of your local computer. Enter the command setenv DISPLAY localcomputer:0.0 where localcomputer is the Internet name of your local computer; it will have a form similar to computer.site.sitetype, such as bioinformatics.psc.edu. (For example, setenv DISPLAY ctc01.psc.edu:0.0) If you are unsure of the Internet name of the computer that you are using, issue the command who -m. Your localcomputer name will be listed within parentheses at the end of the line starting with your user ID. If you are doing this exercise at a workshop held on-site at the Pittsburgh Supercomputing Center, this name will be something like ctc01.psc.edu.
      2. Make sure that your localcomputer is set up to accept and display remote windows from bioinformatics.psc.edu. If you are doing this exercise at a workshop held on-site at the Pittsburgh Supercomputing Center, follow the separate instructions given earlier on what needs to be done with the X-WIN32 software.
    3. Run drawtree. Enter the command drawtree
    4. The program will respond that it can't find the input tree file intree. At the Please enter a new filename> prompt, enter the New Hampshire outtree name listed in step 3.7 (or step 5.33).
    5. Enter V to display the preview options.
    6. IF YOU ARE USING A WORKSTATION THAT CAN DISPLAY X-WINDOWS GRAPHICS, select option X OTHERWISE select option N - will not be previewed.
    7. Enter P to display the plotting device options.
    8. Select L to create a postscript file.
    9. Enter Y to accept the default settings
    10. Select the option indicating that the plot will not be previewed. Enter: N
    11. Accept the default settings. Enter: Y
    12. The program will show you a preview of the results if a preview was selected in step 4.1.6. Once you are done viewing the preview, select menu option FILE then QUIT.
    13. The (postscript) results will be written to the file named: plotfile
    14. Rename the plotfile file created by the drawtree program. Use the mv command to do this by entering mv plotfile [newplotname] where [newplotname] is a filename that you made up. (For example, if the [newplotname] that you made up was sprot30.ps, then you would enter: mv plotfile sprot30.ps). Write the name that you selected for the [newplotname] file below:
      ______________________________________________________________
    15. The postscript file can now be transferred to your local computer and printed on a local postscript printer or displayed with a postscript previewer such as ghostscript.
  2. Displaying trees using the ATV tree viewing program.
    1. The ATV tree viewing program is installed on bioinformatics; however, in order to use it you MUST be using a workstation or PC that can receive X Window graphics. If you are using such a machine, do the following:
      1. On bioinformatics.psc.edu, tell the computer where to send the remote graphics display by giving it the name of your local computer. Enter the command setenv DISPLAY localcomputer:0.0 where localcomputer is the Internet name of your local computer; it will have a form similar to computer.site.sitetype, such as bioinformatics.psc.edu. (For example, setenv DISPLAY ctc01.psc.edu:0.0) If you are unsure of the Internet name of the computer that you are using, issue the command who -m. Your localcomputer name will be listed within parentheses at the end of the line starting with your user ID. If you are doing this exercise at a workshop held on-site at the Pittsburgh Supercomputing Center, this name will be something like ctc01.psc.edu. (NOTE: YOU DO NOT NEED TO DO THIS STEP IF YOU ALREADY DID STEP 4.1.2.1 ABOVE)
      2. Make sure that your localcomputer is set up to accept and display remote windows from bioinformatics.psc.edu. If you are doing this exercise at a workshop held on-site at the Pittsburgh Supercomputing Center, follow the separate instructions given earlier on what needs to be done with the X-WIN32 software. (NOTE: YOU DO NOT NEED TO DO THIS STEP IF YOU ALREADY DID STEP 4.1.2.2 ABOVE)
    2. Use the ATV program to view the New Hampshire outtree file. Enter atv [outtree] where [outtree] is the file you listed in step 3.7 (or step 5.33).
    3. To quit the ATV program, select menu option File then Exit.
  3. Displaying the outtree file and/or a drawtree postscript file on a local computer is beyond the scope of this hands-on exercise. However, if you want to pursue this option, consider the following:
    1. The postscript file can be viewed with the ghostscript program on many different types of computers and operating systems. Ghostscript can be downloaded from: http://www.ghostscript.com/
    2. The New Hampshire outtree file can be viewed with the ATV program on many different types of computers. ATV can be downloaded from: http://www.genetics.wustl.edu/eddy/atv/
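If no graphical viewer is available, a New Hampshire (Newick) tree can still be inspected as an indented text outline. The Python sketch below is a minimal recursive-descent reader written for illustration: parse_newick and render are hypothetical helper names, and the sketch handles plain labels and branch lengths only (no quoted labels or bracketed comments). It removes all whitespace first, which assumes the species names contain no blanks; this normally holds for PHYLIP tree files.

```python
def parse_newick(text):
    """Parse a Newick (New Hampshire) tree into nested (label, length, children) tuples."""
    text = "".join(text.split())  # outtree files may wrap lines; drop all whitespace
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        label = read_until(",():;")
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_until(",();"))
        return (label, length, children)

    tree = node()
    if pos >= len(text) or text[pos] != ";":
        raise ValueError("tree must end with ';'")
    return tree


def render(tree, depth=0):
    """Return an indented outline of the tree, one node per line ('+' marks internal nodes)."""
    label, length, children = tree
    line = "  " * depth + (label or "+")
    if length is not None:
        line += f" ({length:g})"
    lines = [line]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(parse_newick("(d:2,e:1,(c:4,(a:2,b:3):3):2);"))))
```

A file such as sprot30.tree could be shown the same way by passing its contents, read with open(...).read(), to parse_newick.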

Create a bootstrap consensus tree

  1. Run seqboot to create 10 resampled alignments. Enter: seqboot
  2. The program will respond that it can't read infile. At the Please enter a new filename> prompt, enter the file listed in step 1.5.
  3. You will now be presented with a menu of options. Select R (replicates).
  4. Enter 10 as the number of replicates.
  5. You will again be presented with the menu of options. Select Y to indicate that the settings are correct.
  6. You will now be asked to enter a random number seed. Enter an odd number such as 459863.
  7. The program will then write the results to the file named outfile.
  8. Rename the outfile file created by the program. Use the mv command to do this by entering mv outfile [newoutname] where [newoutname] is a filename that you made up. (For example, if the [newoutname] that you made up was sprot30_boot.seqboot, then you would enter: mv outfile sprot30_boot.seqboot). Write the name that you selected for the [newoutname] file below:
    ______________________________________________________________
  9. Enter the command protdist.
  10. The program will respond that it can't read infile. At the Please enter a new filename> prompt, enter the file listed in step 5.8.
  11. You will now see a list of options. Select option M to indicate that multiple data sets are to be analyzed.
  12. You will now be asked whether you have multiple data sets or multiple weights. Select option D to indicate that multiple data sets are to be analyzed.
  13. Enter 10 because there are 10 data sets.
  14. You may want to use the P option to cycle through the model choices until the Dayhoff PAM matrix is listed (to be consistent with step 2.3 above).
  15. Next, indicate that the settings are correct by entering Y.
  16. The program will then compute distances for all 10 data sets, which will take a few minutes. The results will be written to a file named outfile.
  17. Rename the output file created by the protdist program. Use the mv command to do this by entering mv outfile [newoutname] where [newoutname] is a filename that you made up. (For example, if the [newoutname] that you made up was sprot30_boot.protdist, then you would enter: mv outfile sprot30_boot.protdist). Write the name that you selected for the [newoutname] file below:
    _____________________________________________________________
  18. Enter the command neighbor.
  19. The program will respond that it can't read infile. At the Please enter a new filename> prompt, enter the file listed in step 5.17.
  20. You will now see a list of options. Select option M to indicate that multiple data sets are to be analyzed.
  21. Enter 10 because there are 10 data sets.
  22. You will now be asked to enter a random number seed. Enter an odd number such as 459863.
  23. Next, indicate that the settings are correct by entering Y.
  24. The program will then require a few minutes to run. The results will be written to two files: outfile and outtree.
  25. Rename the outfile file created by the neighbor program. Use the mv command to do this by entering mv outfile [newoutname] where [newoutname] is a filename that you made up. (For example, if the [newoutname] that you made up was sprot30_boot.neighbor, then you would enter: mv outfile sprot30_boot.neighbor). Write the name that you selected for the [newoutname] file below:
    ______________________________________________________________
  26. Use the more command to examine the output file listed in step 5.25.
  27. Rename the outtree file created by the neighbor program. Use the mv command to do this by entering mv outtree [newtreename] where [newtreename] is a filename that you made up. (For example, if the [newtreename] that you made up was sprot30_boot.tree, then you would enter: mv outtree sprot30_boot.tree). Write the name that you selected for the [newtreename] file below:
    ______________________________________________________________
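The resampling performed by seqboot can be sketched in a few lines. Each replicate is a new alignment of the original length, built by drawing alignment columns at random with replacement, and the same drawn columns are used for every sequence so that aligned residues stay together. This is a hypothetical Python illustration (bootstrap_replicates is not a PHYLIP function); Python's random number generator differs from seqboot's, so the same seed will not reproduce seqboot's replicates.

```python
import random


def bootstrap_replicates(seqs, n_reps, seed):
    """Build n_reps bootstrap pseudo-alignments from a dict of aligned sequences.

    Each replicate has the original length; its columns are drawn from the
    original alignment uniformly at random with replacement, and the same
    column choices are applied to every sequence.
    """
    rng = random.Random(seed)  # private generator, so a given seed repeats its output
    names = list(seqs)
    length = len(seqs[names[0]])
    replicates = []
    for _ in range(n_reps):
        cols = [rng.randrange(length) for _ in range(length)]
        replicates.append({name: "".join(seqs[name][c] for c in cols) for name in names})
    return replicates


for rep in bootstrap_replicates({"x": "ABCD", "y": "EFGH"}, 3, 459863):
    print(rep)
```

Because some columns are drawn several times and others not at all, groups that are supported by many columns tend to reappear in most replicate trees, while weakly supported groups come and go.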
    Finally, the 10 trees (in the New Hampshire formatted file) must be summarized by the PHYLIP program consense, which produces a consensus tree. Enter: consense
  28. The program will respond that it can't read infile. At the Please enter a new filename> prompt, enter the file listed in step 5.27.
  29. Next, indicate that the settings are correct by entering Y.
  30. The program will take a few seconds to run and will produce two output files: outfile and outtree.
  31. Rename the outfile file created by the consense program. Use the mv command to do this by entering mv outfile [newoutname] where [newoutname] is a filename that you made up. (For example, if the [newoutname] that you made up was sprot30_boot.consense, then you would enter: mv outfile sprot30_boot.consense). Write the name that you selected for the [newoutname] file below:
    _______________________________________________________________
  32. Use the more command to examine the output file listed in step 5.31.
  33. Rename the outtree file created by the consense program. Use the mv command to do this by entering mv outtree [newtreename] where [newtreename] is a filename that you made up. (For example, if the [newtreename] that you made up was sprot30_boot.constree, then you would enter: mv outtree sprot30_boot.constree). Write the name that you selected for the [newtreename] file below:
    ____________________________________________________________
    The outtree file can be viewed or converted into a postscript file by following step 4 (View the tree file) with the outtree file listed in step 5.33. How does the tree from the single alignment compare to the bootstrap consensus tree?
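The core of a majority-rule consensus is counting how many of the bootstrap trees contain each group of species, and keeping the groups found in more than half of them. The Python sketch below does only that counting for unrooted trees: each internal branch splits the species into two sides, and the split is reported by the side that does not contain a reference species (the alphabetically first name). The helper names leaf_groups and majority_splits are hypothetical. As far as the PHYLIP documentation describes it, consense's default is an extended majority-rule method, which also adds less frequent groups that are compatible with those already chosen; that step and the building of the final tree are left out here.

```python
from collections import Counter


def leaf_groups(newick):
    """Return (set of leaf names, list of leaf sets below each internal node) for a Newick tree."""
    text = "".join(newick.split())
    stack, groups, leaves = [], [], set()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "(":
            stack.append(set())       # open a new internal node
            i += 1
        elif ch == ")":
            group = stack.pop()       # close it and record its leaves
            groups.append(group)
            if stack:
                stack[-1] |= group    # its leaves also belong to the parent
            i += 1
        elif ch == ":":
            i += 1
            while i < len(text) and text[i] not in ",();":
                i += 1                # skip the branch length
        elif ch in ",;":
            i += 1
        else:
            j = i
            while j < len(text) and text[j] not in ",():;":
                j += 1
            if text[i - 1] != ")":    # a name right after ')' labels an internal node, not a leaf
                leaves.add(text[i:j])
                stack[-1].add(text[i:j])
            i = j
    return leaves, groups


def majority_splits(trees):
    """Count each non-trivial split across unrooted trees; keep those in more than half."""
    counts = Counter()
    for tree in trees:
        taxa, groups = leaf_groups(tree)
        ref = min(taxa)               # report each split by the side without this species
        splits = set()
        for group in groups:
            side = frozenset(taxa - group if ref in group else group)
            if 2 <= len(side) <= len(taxa) - 2:  # skip single leaves and the whole tree
                splits.add(side)
        counts.update(splits)
    return {split: n for split, n in counts.items() if n > len(trees) / 2}
```

For example, given the three trees ((a:1,b:2):0.5,c:1,(d:1,e:1):0.3); then ((a,b),d,(c,e)); then ((a,c),b,(d,e)); the groups {c,d,e} (that is, a and b together) and {d,e} each occur in two of the three trees and are kept, while {c,e} and {b,d,e} are dropped.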


NRBSC projects are made possible by these sponsors: NIH, the Pittsburgh Supercomputing Center, and NCRR.