In this section we take a closer look at the results of running Graphitour
on raw data obtained from a PDB file of a DNA molecule by stripping
off all the information except for the list of atoms and the list of of covalent bonds. To simplify the representation of results, hydrogen atoms and corresponding edges were removed from the graph. Additionally, to support the symmetry, we artificially linked the DNA into a loop by extra bonds from 3' to 5' ends.
The resulting list of nodes and
edges in the incidence matrix
is presented as input to Graphitour and is depicted in Figure 5.
Figure 6.left and 6.right show two sample non-terminals
from the actual output from the Graphitour software, illustrating the
discovery of basic DNA elements.
[width=12.5cm,height=9.cm]WholeDNA.ps |
[height=7cm]Guanine.eps[height=7cm]SugarPhosphate.eps |
Figure 6.left shows the compound object induced by the Graphitour
algorithm, which corresponds to the base of Guanine. All four DNA bases
were identified as non-terminal nodes in the resulting graph grammar.
This particular structure is identified as a non-terminal by the number
in a list of non-terminals and described as type
in the
lexicon of compounds. It is at the depth
in the hierarchy tree of
the whole graph, being joined with the corresponding backbone phosphate
& sugar one level higher in the upstream;
subsequently joined with the complete nucleotide adjacent to it, and so forth.
Figure 6.right shows the compound object induced by the Graphitour
algorithm, which corresponds to the backbone of the molecule: joined phosphate
and sugar. This particular structure is identified as a non-terminal by
the number in a list of non-terminals and described as type
in the lexicon of compounds. It is at the depth
in the
hierarchy tree of the whole graph, same depth as compound in
Figure 6.left which makes sense. Note that although the number of
lexicon entries is over a hundred, the vast majority of numbers were only used for
intermediate entries, which got merged into composite elements. There is no reusing of the available lexicon slots. Such re-use could give some extra compression efficiency (compare to Sequitour [6]). The resulting lexicon is small and all the entries in the final lexicon are biologically relevant compounds.
Graphitour actually discovers the hierarchical structure corresponding to the kind of zoom-in hierarchical description we gave in the introduction. The algorithm identifies all of these structural elements without exception and does not select any elements which would not make biological sense. The significance of such result might not be immediately clear if one does not take into account that the algorithm does not have any background knowledge, any initial bias, nothing at all except for the list of nodes and edges.
[width=6cm]Star4Alon.eps[width=6cm]Star3Alon.eps
[width=6cm]Star2Alon.eps[width=6cm]X1Alon.eps |
As an example of a rather different application domain, we tried Graphitour on escherichia coli transcriptional regulation network, since it is one of the well known attempts to discover "network motifs" or building blocks in a graph as over-represented sub-graphs by Alon et al. [9]. In this network there are nodes corresponding to the genes or groups of genes jointly transcribed (operons). Edges correspond to the
known interactions:
each edge is directed from an operon that encodes a transcription factor to an operon that it directly regulates. For more information on the biological significance of the motif discovery in regulatory networks we refer you to the cited paper [9] and supplementary materials at the Nature Internet site.
Alon et al. find that much of the network could be composed of repeated appearances of three highly significant motifs, that is a sub-graph, of a fixed topology, in which nodes are instantiated with different node labels. One example is what they call a feedforward loop
combined with
, where
are instantiated by particular genes.
In order to compare to the results of Alon et al. we label nodes simply according to cardinality, omitting gene identities, and take into account directionality of the edges. No hierarchical structure or feedforward loops were distilled from the data. Rather, Graphitour curiously disassembled the network back into the sets of star-like components as illustrated by Figure 7, which ought to have been initially put together to create such network. It leaves open the question of what is a useful definition of a network component in biological networks, if not the one requiring the most parsimonious description of the network.
Leonid Peshkin 2007-03-23