Tools to Support Phylogenetic Analysis
Introduction
Over the coming year FIG and its friends plan on developing and releasing a number of tools
to support phylogenetic analysis. At this point, we are making
available a tool for inserting taxa into a tree, assuming that one has
an alignment of SSU rRNA and a tree that includes some subset of the
taxa in the alignment. At one time this technology was used to extend
the tree distributed by the Ribosomal Database Project. We have
revived it and make it available now as a first step in supporting the
development of a large SSU-based phylogenetic tree. By itself, it is
not adequate for many tasks. A few key tools are needed to complement
the set we are making available. We plan on making these additional
tools available over the coming year.
Extending an Existing Tree by Insertion of One Sequence at a Time
Assuming that you have
- a new alignment (call it ssu_alignment.fasta),
- a table giving a correspondence between IDs and organisms (call
it ssu.names), and
- an old tree (call it old.ssu.tree),
you should follow these steps to extend the tree:
- First, you need to verify that all of the IDs in the tree are still
in the alignment. To do this, run
compare_tree_and_alignment ssu_alignment.fasta old.ssu.tree tmp.tree.only tmp.ali.only tmp.both
If tmp.tree.only is not empty, run
mv old.ssu.tree old.ssu.tree.BAK
subtree_of old.ssu.tree.BAK < tmp.both > old.ssu.tree
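The check in this step comes down to set differences between the IDs in the tree and the IDs in the alignment. The following is only a rough sketch of that idea using standard POSIX tools, not the implementation of compare_tree_and_alignment. It assumes that each FASTA header starts with ">" followed by the ID, and that the tree IDs are already available one per line; the helper names fasta_ids and compare_ids are hypothetical.

```shell
# Print the ID (first token after '>') of every FASTA record on stdin.
fasta_ids() {
    sed -n 's/^>[[:space:]]*\([^[:space:]]*\).*/\1/p'
}

# compare_ids TREE_IDS ALI_IDS TREE_ONLY ALI_ONLY BOTH
# Hypothetical helper: both input files hold one ID per line.
compare_ids() {
    t=$(mktemp) && a=$(mktemp) || return 1
    LC_ALL=C sort -u "$1" > "$t"
    LC_ALL=C sort -u "$2" > "$a"
    LC_ALL=C comm -23 "$t" "$a" > "$3"   # IDs found only in the tree
    LC_ALL=C comm -13 "$t" "$a" > "$4"   # IDs found only in the alignment
    LC_ALL=C comm -12 "$t" "$a" > "$5"   # IDs found in both
    rm -f "$t" "$a"
}
```

Note that this sketch writes its three output files in sorted order, which may differ from the order that compare_tree_and_alignment produces.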
- If you are inserting into a tree that may contain fragments, you should
probably consider removing short sequences and then inserting sequences in
descending order of length (i.e., number of nonambiguous characters).
First, get the counts for the sequences in the alignment, sorted in
descending order:
count_bases < ssu_alignment.fasta | sort -n -r -k 2 > nonambiguous
Then build the initial tree and the ordered list of sequences to insert:
initial_set tmp.both nonambiguous > initial.ids
subtree_of old.ssu.tree < initial.ids > initial.tree
mv initial.tree old.ssu.tree
to_insert nonambiguous initial.ids > tmp.ali.only
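To make "length in nonambiguous characters" concrete, here is a minimal sketch of such a count in awk. It is an illustration only and not the count_bases program: it assumes that "nonambiguous" means A, C, G, T, or U in either case, that each header has the form ">ID", and that the output is "ID count" on one line per sequence. The function name count_unambiguous is hypothetical.

```shell
# Read FASTA on stdin; print "ID count" for each sequence, where count
# is the number of A/C/G/T/U characters (assumed meaning of "nonambiguous").
count_unambiguous() {
    awk '
        /^>/ {
            if (id != "") print id " " n   # finish the previous record
            id = substr($1, 2); n = 0
            next
        }
        { n += gsub(/[ACGTUacgtu]/, "") }  # gsub returns the number of matches
        END { if (id != "") print id " " n }
    '
}
```

Piping the result through `sort -k 2,2nr` lists the longest sequences first.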
- Now you need to get weights and rates for each column of the
alignment. To do this, run
make_rates ssu_alignment.fasta old.ssu.tree > weights_and_rates
- Now you are ready to do insertions. To accomplish this, run
insert_all ssu_alignment.fasta old.ssu.tree weights_and_rates < tmp.ali.only > new.ssu.tree
As the insertion runs, it updates old.ssu.tree. This means that if
the run gets terminated, you can remove the IDs that have already been
inserted from the start of tmp.ali.only and just restart it.
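Trimming tmp.ali.only before a restart can be sketched as follows. This assumes that the ID is the first field on each line of tmp.ali.only and that you know the last ID that was successfully inserted; the function name remaining_after is hypothetical.

```shell
# remaining_after LAST_ID
# Print the lines of stdin that follow the line whose first field is LAST_ID.
# Prints nothing if LAST_ID never appears, so check the result before using it.
remaining_after() {
    awk -v last="$1" 'found { print } $1 == last { found = 1 }'
}
```

For example, `remaining_after SomeId < tmp.ali.only > tmp.ali.rest` (with SomeId standing in for the last inserted ID) would leave the IDs still to be inserted in tmp.ali.rest, which can then be given to insert_all in place of tmp.ali.only.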
You can display your tree (very crudely) using
display_tree new.ssu.tree ssu.names
Note that the tree is unrooted. We supply a command for rooting it,
but you do need to understand exactly where you wish to place the
root. To root it, use
root_at Node1 Node2 FractionBetween < UnrootedTree > RootedTree
where the nodes are specified as either
- a tip id, or
- three ids separated by commas (which gives a unique point in the
tree).
You can extract a representative tree by using
representative_tree big.tree N
where big.tree is the file containing a Newick tree and
N is the number of nodes desired in the representative tree.