Using graph modularity analysis to identify transcription factor binding sites
Abstract
Despite the remarkable success of computational biology methods in some areas of application, such as gene finding and sequence alignment, there are still topics for which no definitive approaches have been proposed. One of these is the accurate detection of biologically significant cis-regulatory motifs, which remains an open problem despite intensive research in the field. Probabilistic motif finders are the most popular, mainly because combinatorial motif finders generate extensive and hard-to-understand lists of potential motifs. In this work, we present Needle, a method for de novo motif discovery that works by post-processing the output of a combinatorial motif finder using graph analysis techniques. The method builds a graph in which each node corresponds to a motif and two nodes are connected if their motifs are co-located in the sequences under analysis, and then identifies highly connected modules in that graph. We have tested this method against several well-known motif finders, using a recently published large-scale compendium of transcription factors derived from diverse high-throughput experiments in several metazoans. Preliminary results show that the method is highly competitive with state-of-the-art methods that use much more extensive information. We expect that future versions of the algorithm, which will include a number of improvements, will become one of the methods of choice for identifying significant cis-regulatory motifs that include only a small conserved core.
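As an illustration of the graph construction described in the abstract, the following minimal sketch links motifs that are co-located in the same sequences and then groups them into modules. It is not Needle's implementation: the abstract does not specify the co-location criterion or the module-detection algorithm, so the overlap rule, the `min_shared` threshold, the use of connected components as a stand-in for modularity-based module detection, and all function names and toy data are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_colocation_graph(occurrences, motif_len, min_shared=2):
    """Link two motifs when their occurrences overlap in >= min_shared sequences.

    occurrences maps motif -> set of (sequence_id, start_position).
    The overlap rule and threshold are assumptions, not taken from the paper.
    """
    graph = {motif: set() for motif in occurrences}
    for a, b in combinations(sorted(occurrences), 2):
        shared = set()
        for seq_a, pos_a in occurrences[a]:
            for seq_b, pos_b in occurrences[b]:
                if seq_a == seq_b and abs(pos_a - pos_b) < motif_len:
                    shared.add(seq_a)
        if len(shared) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_modules(graph):
    """Group nodes into connected components (a simple stand-in for
    modularity-based module detection)."""
    seen, modules = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, module = [start], set()
        while stack:
            node = stack.pop()
            if node in module:
                continue
            module.add(node)
            stack.extend(graph[node] - module)
        seen |= module
        modules.append(module)
    return modules


def modularity(graph, modules):
    """Newman modularity Q = sum over modules of e_c/m - (d_c/(2m))^2."""
    m = sum(len(neighbours) for neighbours in graph.values()) / 2
    if m == 0:
        return 0.0
    q = 0.0
    for module in modules:
        internal = sum(1 for u in module for v in graph[u] if v in module) / 2
        degree = sum(len(graph[u]) for u in module)
        q += internal / m - (degree / (2 * m)) ** 2
    return q


# Hypothetical output of a combinatorial motif finder on four sequences.
occurrences = {
    "ACGT": {("s1", 10), ("s2", 40), ("s3", 7)},
    "CGTA": {("s1", 11), ("s2", 41)},
    "GTAC": {("s1", 12), ("s2", 42), ("s3", 9)},
    "TTGA": {("s1", 90), ("s4", 5)},
    "TGAC": {("s1", 91), ("s4", 6)},
}

graph = build_colocation_graph(occurrences, motif_len=4)
modules = connected_modules(graph)
q = modularity(graph, modules)
print(modules, q)
```

On this toy input the overlapping variants of each site collapse into one module per site, which is the kind of reduction that makes a long combinatorial motif list easier to interpret. A real analysis would replace the connected-components step with a proper community-detection method.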
Origin: Files produced by the author(s)