Structure induction by lossless graph compression

Leonid Peshkin

Center for Biomedical Informatics

Harvard Medical School

Abstract:

This work is motivated by the need to automate the discovery of structure in vast and ever-growing collections of relational data commonly represented as graphs, for example genomic networks. A novel algorithm, dubbed Graphitour, for structure induction by lossless graph compression is presented and illustrated by a clear and broadly known case of nested structure in a DNA molecule. This work extends to graphs some well-established approaches to grammatical inference previously applied only to strings. The bottom-up graph compression problem is related to the maximum cardinality (non-bipartite) matching problem. The algorithm accepts a variety of graph types, including directed graphs and graphs with labeled nodes and arcs. The resulting structure could be used for representation and classification of graphs.
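To make the link between bottom-up compression and matching concrete, here is a minimal sketch of a single contraction step on an undirected graph with labeled nodes. It is not the Graphitour algorithm itself: the function name contract_most_frequent, the string node identifiers, and the new label "S1" are hypothetical, and a greedy maximal matching stands in for a true maximum cardinality matching. The sketch also omits the bookkeeping (which endpoint of a merged pair each surviving edge attaches to) that a lossless scheme would have to record.

```python
from collections import Counter


def contract_most_frequent(labels, edges, new_label):
    """Contract a greedy matching over the most frequent edge type.

    labels: dict mapping node id (str) -> label (str)
    edges:  list of (u, v) pairs, undirected, no self-loops
    Returns (new_labels, new_edges, rule); rule is the label pair replaced.
    """
    def edge_type(u, v):
        # An edge type is the unordered pair of its endpoint labels.
        return tuple(sorted((labels[u], labels[v])))

    edges = sorted(edges)
    counts = Counter(edge_type(u, v) for u, v in edges)
    if not counts:
        return dict(labels), [], None

    # Most frequent type; ties are broken by label pair for determinism.
    rule = min(counts, key=lambda t: (-counts[t], t))

    # Greedy maximal matching restricted to edges of the chosen type
    # (an assumption of this sketch, not a maximum cardinality matching).
    merged = {}  # old node id -> id of the node it was merged into
    for u, v in edges:
        if edge_type(u, v) == rule and u not in merged and v not in merged:
            merged[u] = merged[v] = f"({u},{v})"

    # Unmatched nodes keep their labels; each merged pair gets new_label.
    new_labels = {n: lab for n, lab in labels.items() if n not in merged}
    for new_node in set(merged.values()):
        new_labels[new_node] = new_label

    # Re-attach surviving edges to the merged nodes, dropping contracted ones.
    new_edges = set()
    for u, v in edges:
        a, b = merged.get(u, u), merged.get(v, v)
        if a != b:
            new_edges.add(tuple(sorted((a, b))))
    return new_labels, sorted(new_edges), rule


# Toy input: a path whose node labels alternate A, B, A, B, A, B.
toy_labels = {"0": "A", "1": "B", "2": "A", "3": "B", "4": "A", "5": "B"}
toy_edges = [("0", "1"), ("1", "2"), ("2", "3"), ("3", "4"), ("4", "5")]
new_labels, new_edges, rule = contract_most_frequent(toy_labels, toy_edges, "S1")
```

On this toy path every edge has the type (A, B), so the step merges three disjoint pairs into nodes labeled "S1" and leaves a shorter path between them. Repeating the step on its own output would build a nested hierarchy of rules, which is the sense in which compression induces structure.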



Acknowledgement

A substantial part of this work was done while the author was visiting MIT CSAIL with Prof. Leslie Kaelbling, supported by DARPA through the Department of the Interior, NBC, Acquisition Services Division, under Contract No. NBCH D030010.

Leonid Peshkin 2007-03-23