A given protein family can be divided into fundamental units of analysis called cohesion groups. Within a cohesion group, most members are phylogenetically pure, and their protein tree parallels a small section of a 16S rRNA tree for the corresponding organisms, as expected for a vertical genealogy. Some members, denoted intruder sequences, have been translocated via lateral gene transfer (LGT) to genomes distant from the genomes that contain the phylogenetically pure members. In such cases, the LGT scenario is clearly defined: the recipient lineage is apparent, and the donor lineage of the transferred gene is localized to the organisms that host the cohesion group. As such, cohesion groups are rigorous units for making bioinformatic and evolutionary inferences about various character states (e.g., substrate specificity) of the protein. Once this foundation is firmly in place, the scope of the analysis can be progressively enlarged, because the continual availability of sequences from new genomes is expected to result not only in the formulation of new cohesion groups but also in the merging of cohesion groups as phylogenetic gaps are progressively filled. In addition, as exemplified by previous work with the seven proteins of tryptophan biosynthesis, concatenation of functionally coordinated proteins has been shown to confer greatly expanded resolving power. The assembly of such supercohesion groups, which correspond to metabolic segments, is envisioned as an advanced step that will support ever more rigorous bioinformatic and evolutionary analysis. The cohesion-group approach promises to enable a focused and accurate knowledge base that is amenable to systematic and semi-automated expansion, both within a given metabolic subsystem and ultimately to connected subsystems of biochemical networks.
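The distinction between phylogenetically pure members and intruder sequences can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the procedure used to define cohesion groups: the actual analysis compares a protein tree with a 16S rRNA tree, whereas here a single host-clade label per member stands in for that comparison. The function name, the `min_fraction` threshold, and the protein identifiers are all hypothetical.

```python
from collections import Counter


def split_cohesion_group(members, min_fraction=0.5):
    """Split a cohesion group into pure members and intruder sequences.

    members: list of (protein_id, host_clade) tuples. The clade label is an
    assumed stand-in for placement on a 16S rRNA tree.
    min_fraction: hypothetical threshold; one clade must hold more than this
    fraction of the members to count as the host clade.
    Returns (host_clade, pure_ids, intruder_ids).
    """
    if not members:
        raise ValueError("cohesion group has no members")

    # The host clade is taken to be the clade that contributes most members.
    counts = Counter(clade for _, clade in members)
    host_clade, n_host = counts.most_common(1)[0]
    if n_host / len(members) <= min_fraction:
        raise ValueError("no single clade hosts most members")

    # Members in the host clade are treated as vertically inherited (pure);
    # members found in other clades are treated as candidate LGT intruders.
    pure = [pid for pid, clade in members if clade == host_clade]
    intruders = [pid for pid, clade in members if clade != host_clade]
    return host_clade, pure, intruders


# Hypothetical example data, not taken from any real cohesion group.
example = [
    ("prot_A", "Gammaproteobacteria"),
    ("prot_B", "Gammaproteobacteria"),
    ("prot_C", "Gammaproteobacteria"),
    ("prot_D", "Firmicutes"),
]
host, pure_ids, intruder_ids = split_cohesion_group(example)
```

In this toy case the donor lineage is localized to the host clade and the recipient is the genome carrying `prot_D`, mirroring the LGT scenario described above.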
So far, supercohesion groups have been assembled for tryptophan-protein concatenates (Xie et al., 2004), and cohesion groups have been determined for TyrA proteins (Song et al., 2005; Bonner et al., 2008). Completion of equivalent work with the remaining aromatic-pathway segments will identify the repertoire of bacterial organisms in possession of a pure vertical genealogy with respect to aromatic biosynthesis, as well as a list of those which are mosaic or partially mosaic (http://aropath.lanl.gov/Phylogeny/CG/index.html). Each cohesion group is clickable and leads to a table in which the membership of the various cohesion groups is listed. A listing is also given of orphan proteins that belong to none of the current cohesion groups or supercohesion groups. Such orphans reflect the lack of sufficient genome representation in particular phylogenetic regions and undoubtedly will become the nucleus for additional cohesion groups.