Running and Using GeneDoc

GeneDoc installs as a regular Windows program, into a GeneDoc program group. After starting GeneDoc, use File/Open to read in an MSF (Multiple Sequence Format) file. GeneDoc saves configuration information for these files in the comment section of the MSF, so a file that was saved by GeneDoc is reopened with the settings it was last saved with.

Reading/Importing Data

You can import non-MSF files. Use the File/New menu, then select File/Import. The import dialog allows import from the Clipboard, from disk files, or by manual input. Clustal, FASTA and a few other file types can be read this way.

GeneDoc Web Viewer

GeneDoc accepts a file name as a command line argument. This allows GeneDoc to be run from another program, such as a web browser or database program.
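
As a minimal sketch of launching GeneDoc from another program, the Python fragment below builds the command line and starts the process. The install path is a hypothetical placeholder, not something GeneDoc guarantees; adjust it to your own installation.

```python
import subprocess
from pathlib import Path

# Hypothetical install location (an assumption); change it to match your machine.
GENEDOC_EXE = Path(r"C:\Program Files\GeneDoc\GeneDoc.exe")

def build_command(alignment_path, exe=GENEDOC_EXE):
    """Return the argument list: the executable followed by the file name."""
    return [str(exe), str(alignment_path)]

def open_in_genedoc(alignment_path, exe=GENEDOC_EXE):
    """Start GeneDoc on the given file without waiting for it to exit."""
    return subprocess.Popen(build_command(alignment_path, exe))
```

Calling open_in_genedoc with the path of an MSF file would start GeneDoc with that file as its only argument.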

DDE support

GeneDoc provides minimal DDE support: File Open and Print. This allows you to set up Windows to have GeneDoc open or print files automatically when you double-click or right-click an MSF file icon.

GeneDoc Display Modes

GeneDoc has several view modes. These are accessed through the Windows menu or the Project toolbar. The Alignment view, Summary view and Tree view can be opened and used at any time. The Report view and Plot view will be empty, or will not open, if you have not selected a report to show or a plot to view.

Alignment View

When GeneDoc is first opened, you will see the Alignment view. The Alignment view should be considered the primary view of GeneDoc. In this view you can see the properties of each residue in the alignment with the various shading modes, and you can arrange the residues to improve your alignment. You can also score sections of the alignment and add manual comments and manual shading.

Summary View

A useful view is the Summary view. This view shows you the presence or absence of a residue, but not its value, thus the alignment is viewed in a compressed fashion. GeneDoc will also apply shading to the Summary view. You can control the summary view settings in the Project Configuration dialog. GeneDoc can compress the Summary View up to one dot per column, so on a high resolution printer the amount of compression can be considerable.
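
The idea of showing presence or absence without residue values can be sketched in a few lines of Python. This is an illustration only, not GeneDoc's drawing code; the gap characters and the marker symbols are assumptions.

```python
# Characters treated as gaps (an assumption; adjust to match your files).
GAP_CHARS = "-."

def presence_mask(sequence, present="#", absent=" "):
    """Replace each residue with a marker and each gap with a blank."""
    return "".join(absent if ch in GAP_CHARS else present for ch in sequence)

def summary(rows):
    """Build the presence/absence mask for every row of an alignment."""
    return [presence_mask(row) for row in rows]
```

For example, presence_mask("AC--G.T") marks five residues and leaves three blanks where the gaps are.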

Tree View

GeneDoc provides a phylogenetic tree view. This view can be used to construct the phylogenetic tree with a point and click interface, and the Manage Expression dialog will edit, load and save phylogenetic information in Vermont syntax. GeneDoc does not attempt to compute any phylogenetic information for the alignment, though it will use it to score a selected set of columns. The view is used only to build and view phylogenetic trees.

Report View

GeneDoc creates several text reports and shows them in the Report view. The Report view holds the information, but does not save it to disk. The Reports menu has an entry for saving the report to a text file of your choice.

Plot View

GeneDoc displays a few Cumulative Distribution plots in the Plot view. At its simplest, the plot view can be used to show Percent Identity or Percent Favorable Substitutions for the alignment as a whole. Using the Super Family Groups of GeneDoc, the plot can be used to show that the scoring within groups is significantly better than scoring between different groups, thus demonstrating that your super family groups are well chosen. These functions are controlled through the Plot menu.

Gel View

GeneDoc displays a simple Enzyme Gel Simulation view. This view contains two lists, the list of sequences and the list of loaded enzymes. Select multiple sequences and multiple enzymes and click the 'Run Gel' button. The resulting sequence fragments are plotted on the view on a log scale.
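
The digest step behind such a simulation can be sketched as follows. This is a simplified illustration under stated assumptions: the sequence is linear, the enzyme is given as a plain recognition string with a cut offset, and the function names are hypothetical rather than part of GeneDoc.

```python
import math

def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of `site`.

    `cut_offset` is the cut position counted from the start of the site.
    Returns the fragment lengths in order along the sequence.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

def log_positions(lengths):
    """Map fragment lengths onto a log10 axis, as a log scale plot would."""
    return [math.log10(n) for n in lengths]
```

With the site GAATTC and a cut offset of 1 (the cut falls after the G), a 20 base sequence containing two sites yields three fragments.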

Project Settings

GeneDoc has a rich set of Project Configuration settings. While some of these settings are controlled through menus, all of the settings are found in the Configuration Dialog. This dialog is accessed either through the Project menu or the Project toolbar.

Configure Dialog

The Configuration Dialog holds ten tabs. Each tab holds a set of related GeneDoc settings described by the tab title. Tab functions fall into three groups: Project Setup, Print Control and Shading Control. The first tab, Project, controls font size, consensus lines, alignment blocking and other settings that apply to every display. The Print tab controls printer margins, page headers, footers, page numbers and the like. The Shade tab mirrors many of the entries found in the Shade menu, with a few other settings for the conserved and quantified shading styles. The Scoring tab allows you to select which Dayhoff or PAM scoring tables and substitution groups you want to use. The rest of the tabs control individual shading modes: Properties, Physiochemical properties, Pattern Search, Log Odds, Identities and Structure. All aspects of these display modes are controlled through these configuration tabs. Here is where you change colors, add, edit and delete patterns or properties, and load data files for display modes. The Configuration Dialog does not have anything to do with manual sequence arranging, though scoring settings can be controlled here.

Sequence Edit Dialog

The Project menu also holds the Edit Sequences dialog. In this dialog, sequences can be added, imported or deleted. You can Complement, Reverse and Duplicate sequences here. Comments about the sequences can be entered. Sequence weights, which are used by the Log Odds displays, can also be changed.

Project Titling Facility

The Project menu also has the Titling Facility. The titling facility gives you a convenient way to enter comments at the top of the MSF file. These comments are not saved in the usual GeneDoc encoded header, but above it in ASCII text, so anyone or any program will have access to them.

Save and Load User Defaults

Save and Load User Defaults is a way to save the current settings as GeneDoc's default settings. These defaults apply when you open an MSF file that has not been previously saved by GeneDoc. If you want to apply them to a file that already has GeneDoc settings, load the file and then use Load User Defaults. The defaults will replace whatever GeneDoc's current settings are.

Arranging Alignments

Arranging alignments is a primary function of GeneDoc. There are many features provided to help accomplish this task. Both mouse and keyboard operations are supported. GeneDoc's Grab and Drag arrangement mode allows you to move residues around like beads on a string. You can Slide residues, which only inserts or deletes gaps immediately in front of the selected residue, preserving other gap placements. You can insert and delete a gap with either the mouse or the INS and DEL keys. You can insert gaps into every sequence other than the one clicked on, and insert columns of gaps. GeneDoc also provides the ability to select groups of sequences to work on. See Tips on Arrangement and Alignment View.

Using The Mouse

Using the mouse for arrangements gives you quick adjustment abilities. Grab and Drag and Slide modes work by clicking on the desired residue and while holding the mouse button down moving the mouse back and forth. Other modes, such as insert single gaps or insert gaps into other sequences, use a single mouse click. You can select columns for certain alignment functions by enabling the column select mode then clicking on the start and end columns to be selected. Moving the mouse over the alignment updates the location indicator in the lower right corner of the GeneDoc window. This will show you what sequence and residue number the mouse is positioned over.

Using the Keyboard

The keyboard is very useful for arrangement of alignments. The keyboard is more precise than the mouse, especially if the mouse is dirty or jumpy. The keyboard can be slower than the mouse, but every effort has been made to help keep use of the keyboard efficient. There are a few shortcut keys that are helpful as well. After you become more familiar with GeneDoc, the keyboard is often a faster way to work with GeneDoc than the mouse. See Tips on Arrangement and Alignment View.

Additional Arrangement Functions

In addition to the movement of residues for arrangement purposes, there are a few other helpful functions. You can select a range of columns to be deleted. As mentioned before, you can select a set of sequences to be worked upon. When you select sequences, clicking within the selected set performs work upon the selected set. When you click outside of the selected set, work is performed upon the unselected set.

Copy To & From Other Projects

A rather useful function, for specific cases, is the ability to copy a range of selected columns to another project, edit just that range, then replace them back into the original project. This can be useful for a variety of purposes, from manually arranging a restricted area, to exporting an area of the alignment to be realigned or aligned with a different alignment program.

Editing functions for DNA projects

There are a few editing functions specific to DNA projects as well. These include complementing and reversing sequences, copying data between sequences, and creating a FASTA style output of the consensus row. A specific use of the Properties mode is very helpful for DNA projects as well. If the configuration dialog has the project set to a non-protein type, then Property Shading Mode level 3 is preconfigured to shade each nucleotide in a unique color.
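
What complementing and reversing do to a sequence can be shown with a short Python sketch. It illustrates the operations themselves, not GeneDoc's code, and it handles only the four unambiguous bases; gap characters pass through unchanged.

```python
# Translation table for the four unambiguous bases, upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def complement(seq):
    """Swap each base for its partner; other characters are left as they are."""
    return seq.translate(COMPLEMENT)

def reverse(seq):
    """Return the sequence read in the opposite direction."""
    return seq[::-1]

def reverse_complement(seq):
    """Complement, then reverse: the opposite strand read 5' to 3'."""
    return reverse(complement(seq))
```

Applying reverse_complement twice returns the original sequence, which is a quick sanity check for any implementation.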

DNA Ambiguity Support

GeneDoc has built-in ambiguity support for DNA projects. This support is used when determining conservation for columns and wherever else it is appropriate.
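
One plausible way to take ambiguity codes into account when judging a column is sketched below, using the standard IUPAC nucleotide codes. The rule shown (a column is compatible when at least one base is consistent with every symbol) is an assumption made for illustration; GeneDoc's own rule is not documented here.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def shared_bases(column):
    """Return the set of bases consistent with every symbol in the column.

    Symbols outside the IUPAC table (gaps, for example) match no base,
    so a column containing one yields an empty set.
    """
    common = set("ACGT")
    for symbol in column.upper():
        common &= set(IUPAC.get(symbol, ""))
    return common

def is_compatible(column):
    """True when at least one base fits every symbol in the column."""
    return bool(shared_bases(column))
```

Under this rule the column A, R, A counts as conserved for A, while A, Y, A does not.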

DNA/Protein Translation/ReGap

A DNA to Protein translation feature is incorporated into GeneDoc. You can choose from a set of translation tables or modify one to suit your needs. There are several Frame Options for the translation process. If you make arrangement changes to a Protein alignment, you can ReGap the corresponding DNA project. Using the cycle of Translate/ReGap, you can keep both the DNA and Protein projects in sync.
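
The ReGap idea can be sketched as threading the ungapped DNA back onto the gapped protein, three bases per residue. The sketch assumes the protein was translated from the first base of the DNA with no frame shift, and the function name is hypothetical.

```python
def regap_dna(ungapped_dna, aligned_protein, gap="-"):
    """Insert gaps into DNA so that it lines up with a gapped protein.

    Each residue consumes one codon (three bases) of the DNA, and each
    protein gap becomes three gap characters.
    """
    out = []
    pos = 0
    for residue in aligned_protein:
        if residue == gap:
            out.append(gap * 3)
        else:
            out.append(ungapped_dna[pos:pos + 3])
            pos += 3
    return "".join(out)
```

For example, the DNA ATGAAATTT regapped against the protein row M-KF gains one codon-sized gap after the first codon.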

Scoring while Arranging

GeneDoc provides scoring functions for the alignment view. Scores can be calculated for a selected range, presumably a range of columns you are rearranging, and then have the area recalculated conveniently after you have made arrangement changes. This is intended to give you an objective basis for determining whether your alignment changes are actually better or not. Scoring can be done with Sum of Pairs, Phylogenetic Tree, or Log Odds methods. For larger numbers of sequences, phylogenetic scoring can be rather slow.
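
A Sum of Pairs score over a column range can be sketched as below. The scoring function is a toy stand-in (an assumption): identical residues score 1, mismatches score -1 and any pair involving a gap scores 0, whereas GeneDoc would use one of its Dayhoff or PAM tables.

```python
from itertools import combinations

def toy_score(a, b, gap="-"):
    """Toy pair score: 1 for identity, -1 for mismatch, 0 if either is a gap."""
    if a == gap or b == gap:
        return 0
    return 1 if a == b else -1

def sum_of_pairs(rows, start, end, score=toy_score):
    """Add the pair score of every sequence pair in columns start..end-1."""
    total = 0
    for col in range(start, end):
        for s1, s2 in combinations(rows, 2):
            total += score(s1[col], s2[col])
    return total
```

Calling sum_of_pairs on the same column range before and after an arrangement change gives two numbers to compare, which is the workflow described above.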

Scoring Tables

Several Dayhoff and PAM similarity scoring tables are built into GeneDoc. These tables are used anywhere a score between two residues is needed. For example, while the conserved and quantify display modes depend mostly on the residue counts within a column, ties are broken by determining which of a set scores better against the rest of the column. The scoring tables are also used in the scoring features of GeneDoc.

Similarity Tables

Also, Similarity tables have been built for each scoring table. These similarity tables represent selections of favorable substitutions of residues for a given scoring table. These scoring tables can be altered and saved in user defaults. They are used in the conserved and quantify display modes. When used, they show conservation of favorable substitutions as well as identities.

Find and Replace functions

GeneDoc also includes Find and Replace functions. The Find function is the more useful, as the replace is simple and replaces only segments of the same length. The Find function has features for mismatches and insertions and deletions, and you can select which sequences to search. There are also functions for moving the cursor to the found location or shading all found locations.
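
Searching with a mismatch allowance can be sketched with a sliding window. This simplified version counts substitutions only; it does not handle the insertions and deletions that the Find function also supports, and the function name is hypothetical.

```python
def find_with_mismatches(sequence, query, max_mismatches=0):
    """Return start positions where `query` matches within the allowance.

    Only substitutions are counted; insertions and deletions are not handled.
    """
    hits = []
    for start in range(len(sequence) - len(query) + 1):
        window = sequence[start:start + len(query)]
        mismatches = sum(1 for a, b in zip(window, query) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits
```

Raising max_mismatches from 0 to 1 typically adds near matches to the list of exact hits.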

Manual Editing

You can enter comments, change residues and apply manual shading within your alignment. These changes are saved for you. Manual comments can be entered on any non-sequence line, and will overwrite any other characters which may be displayed on those lines, such as column numbers or the consensus. Manual shading is done with the mouse, by selecting a color and clicking and dragging over any residues you want shaded. Comment lines may not be shaded. You can change the values of residues in your alignment as well. These changes will change the data in your MSF file, so you should have good reason for doing this.

Shading Alignments

Shading alignments is the other major function of GeneDoc. There are quite a few shading modes, and with super family group functions, the possibilities are extended quite a bit. Each shading mode represents the application of a different algorithm to the alignment. Each shading mode is controlled in the Configuration Dialog, as previously mentioned. Many of the toolbar buttons of the alignment view control the shading modes. You can also apply differences mode to the alignment view. This mode changes the display of residues, but not the shading. It highlights where residues are different from the consensus of residues for the column.
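
The effect of differences mode can be approximated in a few lines: compute a consensus per column, then show a residue only where it departs from that consensus. The sketch uses a simple majority consensus and a dot for matching positions; both are assumptions, and ties are not broken with a score table as GeneDoc does.

```python
from collections import Counter

def consensus(rows):
    """Most common character in each column (simple majority)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def differences(rows, same="."):
    """Replace every residue that equals the column consensus with `same`."""
    cons = consensus(rows)
    return ["".join(same if ch == c else ch for ch, c in zip(row, cons))
            for row in rows]
```

Rows that agree with the consensus everywhere come back as a run of dots, so the remaining letters stand out.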

Pre Configured Shading Modes

There are three shading modes that don't require any setup by the user, though they can be adjusted. These are the Conserved, Quantified, and Physiochemical display modes. The Conserved and Quantified modes are based on the identity of residues, the similarity tables, and the score table for breaking ties in identity. These modes show the percentage of conservation within a column in the alignment. Physiochemical display mode is based on the physical and chemical properties of amino acids and identifies where those properties are conserved within an alignment. These groups were originally proposed and presented as a Venn diagram by W.R. Taylor (1986), though GeneDoc has changed them somewhat.
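
The percentage of conservation within a column can be sketched as the share of sequences carrying the most common residue. This version counts identities only; it ignores the similarity tables and tie breaking described above, and the gap characters are an assumption.

```python
from collections import Counter

def column_conservation(rows, gaps="-."):
    """Percent of sequences sharing the most common residue, per column.

    Gaps are not counted as residues, but they still count in the
    denominator, so a gappy column scores lower.
    """
    result = []
    for column in zip(*rows):
        residues = [ch for ch in column if ch not in gaps]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        result.append(100.0 * top / len(column))
    return result
```

A shading mode would then compare each percentage with its thresholds to choose a shade for the column.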

User Configured Shading Modes

Two other shading modes require some user setup. These are Properties and Identities. Properties mode is in fact configured for some obvious chemical properties by default, but it is expected that you would want to set up your own property groups. Properties mode is useful for highlighting sets of amino acids, either where conserved or wherever found. For example, you could enter a set of hydrophobic amino acids and have them highlighted. Identities mode compares the identity of one or more sequences to the rest of the alignment, so you have to choose one or more sequences to get any output from this display mode. If more than one sequence is in the chosen set, then the chosen set of sequences must be conserved before they will be compared to the rest of the alignment.

Shading With External Files

There are three shading modes that require external files to be loaded into GeneDoc for them to function. These are Search, Log Odds and Structure modes. Files that need to be loaded would be ReBase or ProSite Files, Meme or similarly formatted Log Odds matrices, and Protein Structure files such as PDB or DSSP files. GeneDoc will reload structure files automatically whenever the alignment is reopened if the file is located in the same disk directory as the MSF file. Search mode is configurable by the user, but you will need to enter search strings in ReBase or ProSite syntax.

Search: ProSite and ReBase files

In Search mode, you load a ReBase or ProSite file, and GeneDoc locates the enzymes or motifs found in your alignment and highlights them. The configuration dialog then allows you to delete or disable patterns, change coloring, etc. Search mode stores the located patterns in your alignment, so you do not need to keep a copy of the ReBase or ProSite file with the alignment. You can also export a list of found Enzymes in ReBase syntax for your alignment.
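
To give a feel for ProSite pattern syntax, the sketch below converts a pattern into a Python regular expression and lists where it matches. It covers the common elements (x, [..], {..}, repeat counts and the < and > anchors) in a simplified way and is not GeneDoc's parser; ReBase syntax is not covered.

```python
import re

def prosite_to_regex(pattern):
    """Translate a ProSite pattern into regular expression text (simplified)."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("<", "^").replace(">", "$")   # sequence anchors
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} means "not P"
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) is a repeat
    return regex.replace("x", ".")                      # x means any residue

def find_motif(sequence, pattern):
    """Start positions of all matches, including overlapping ones."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence)]
```

The N-glycosylation pattern N-{P}-[ST]-{P} becomes N[^P][ST][^P] under this translation.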

Log Odds: Meme files

In Log Odds mode, motifs described by Log Odds arrays, such as are produced by the Meme program maintained at San Diego Supercomputing Center (www.sdsc.edu/MEME), are loaded into GeneDoc and used to highlight your alignment. Again, the Configuration Dialog contains several options for controlling usage of the motifs and shading of the alignment. You can then shade the alignment based on the motifs found in the Log Odds file.

Secondary Structure Files

Structure shading mode uses several different external file types. A file type can contain more than one set of information. A PDB file can be read in, for example, and the secondary structure information applied to the sequence in the alignment that corresponds to the PDB file. DSSP files from EMBL can be read as well. The output of some EMBL sponsored programs that predict secondary structure can be read in, and the prediction or other information, such as prediction probability, can be applied to the appropriate sequence as well.

User Defined Files

While this shading mode is called Structure mode, any information can be used, such as accessibility. GeneDoc reads in PSDB files, which are derived from the PDB information, and contain quite a large set of calculations that may be applied to a sequence in the alignment. More than one file type can be applied to each sequence, and each sequence can have file types applied to it. There is a scheme for allowing the user to define a custom data set and read these user files into GeneDoc for shading. This allows considerable flexibility, though at the expense of some initial effort.

Super Family Group Features

GeneDoc supports Super Family Groups. It does this first by allowing you to define groups of sequences within the alignment as belonging to the same family through the Group Configuration Dialog. In this dialog, you can apply a subset of the shading modes to each of the groups. For example, you could show conserved shading for each group in the alignment. The Shade Group Configuration function switches you between non-group and group shading.

Specialized Super Family displays

Additionally, there are a number of shading functions specialized for super family groups. These functions typically contrast scores or identities between the groups, or shade across all groups before shading within groups. These specialized shading modes can be specific to either Protein or DNA alignments as well.

Plots

Some analysis functions are available that plot the results in graphical format. These tell about inter sequence identities or scores in graphical format. This can be convenient for alignments with a lot of sequences, since the equivalent report would be too large to be read or printed easily. GeneDoc will also use its score tables to show you favorable substitution levels as well as identity levels.

The DStat function

You can do a more complicated super family group analysis, using statistical functions to show the validity of your groupings. After the super family groups have been set up, you can select a range and make a DStat plot. You will get two lines on your plot, one for scores within groups and one for scores between groups. If your groups are properly selected, they will show statistical differences.
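
The two populations that such a plot compares can be gathered as sketched below. Percent identity is used as a stand-in score (an assumption), and the statistical test itself is not reproduced; the sketch only separates pair scores into within-group and between-group lists.

```python
from itertools import combinations

def percent_identity(a, b, gap="-"):
    """Percent of identical positions among columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def within_and_between(groups):
    """Split all pair scores by whether both sequences share a group.

    `groups` maps a group name to a list of aligned sequences.
    """
    labelled = [(name, seq) for name, seqs in groups.items() for seq in seqs]
    within, between = [], []
    for (g1, s1), (g2, s2) in combinations(labelled, 2):
        target = within if g1 == g2 else between
        target.append(percent_identity(s1, s2))
    return within, between
```

Well chosen groups should give within-group scores that sit clearly above the between-group scores.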

Reports: Stats, Score, Composition

GeneDoc provides a few reports for your alignment. A common Statistics report is given, though GeneDoc also includes favorable substitutions as determined by the current scoring table in this report. GeneDoc shows a score report, which gives scores between sequences in your alignment. There is a Base Composition report. There is a specialized enzyme report, based on the sequence fragments identified by using the ReBase Search mode, that sorts fragment lengths, finds unique ones, and identifies unique fragment lengths across super family groups. You can select a range of columns and compute a Log Odds matrix for it.

Exporting and Copying Figures

GeneDoc provides a lot of ways to write shaded (or non-shaded) alignments to the clipboard or a file for use in other programs. You must first select a block within your alignment. GeneDoc typically breaks up the alignment into blocks and displays them vertically on the screen for scrolling, and after you select one or more blocks you can then copy or export them. Export types include RTF, PICT, Meta Files, HTML, Bitmap and Text. Every attempt has been made to include all shading and other display features, though of course formats like text won't support them.

Getting postscript file output

There is no built-in support for PostScript. You should find it convenient enough to load a PostScript driver and set the output to a file for creating PostScript files.

Toolbars

GeneDoc puts many common functions into Toolbars. There are a couple sets of toolbars. The two most common are the Project Toolbar and the Alignment View Toolbar.

Project Toolbar

The Project Toolbar is the upper toolbar and is always visible regardless of which view you are showing. This toolbar shows functions that are applicable for whatever view, such as file save and print. The Project Toolbar has buttons for the Configuration Dialog, the Edit Sequence List Dialog and the Group Configuration Dialog. The Project Toolbar also has buttons for switching between the various views of GeneDoc.

Alignment Toolbar

The Alignment Toolbar controls features of the Alignment view. The Alignment view is where most of the work gets done in GeneDoc. Mainly, buttons here control the shading and arranging features of the alignment view. This toolbar is only visible when the alignment view is active.

Tree Toolbar

The Tree View Toolbar has a few buttons for use in constructing phylogenetic relationships with the GeneDoc interface. GeneDoc provides a graphical interface for creating and deleting nodes of the tree.

Tips on using the Toolbar

The toolbar can be used to access the configuration dialog in a quick fashion. If you click on a display mode, such as conserved, then GeneDoc will switch to that shading mode. If you click on the conserved toolbar button again, the configuration dialog will be opened with the conserved setup tab selected. See Toolbar Tips.

RasMol Scripts

GeneDoc has the ability to create simple scripts for the RasMol program. These scripts can be imported into RasMol and used to color the molecule with the shading done in GeneDoc. This is useful for coloring molecules in ways RasMol does not support, or applying the information of an alignment to the molecule for visualization purposes.