Motif finding methods

The first step in MMF is launching the motif discovery programs. Currently, four of them are available: BioProspector, MDscan, MEME and Weeder.

MMF lets the user set the following parameters suitable for all mentioned programs:

At the end of this step, the results are gathered together with the motifs from an external database. Multiple copies of the same motif returned by any of the programs are then removed so as not to bias the results.
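The gathering and de-duplication step can be pictured with the short Python sketch below. It is only an illustration: how MMF decides that two motifs are "the same" is not described here, so the sketch assumes that motifs are position frequency matrices and that duplicates are matrices identical after rounding. The names motif_key and merge_unique are hypothetical.

```python
def motif_key(matrix, digits=3):
    """Hashable fingerprint of a motif: its matrix rounded to `digits` places.

    Assumption: a motif is a sequence of rows, one per position, each row
    holding the (A, C, G, T) frequencies.
    """
    return tuple(tuple(round(value, digits) for value in row) for row in matrix)


def merge_unique(results):
    """Pool motifs from all sources and drop repeated copies.

    `results` maps a source name (a program or the external database) to a
    list of motif matrices. The first occurrence of each motif is kept.
    """
    seen = set()
    unique = []
    for source, motifs in results.items():
        for matrix in motifs:
            key = motif_key(matrix)
            if key not in seen:
                seen.add(key)
                unique.append((source, matrix))
    return unique
```

For example, if two programs report the same matrix, merge_unique keeps only the copy from the source listed first, so that motif is counted once in later steps.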

BioProspector is a program that uses a Gibbs sampling strategy together with a Markov background model, which captures the dependencies between neighbouring non-motif bases.
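To make the idea of a Markov background concrete, the Python sketch below estimates such a model from a set of sequences and uses it to score a stretch of non-motif bases. It is not BioProspector's code: the function names, the default order of 1 and the pseudocount are assumptions chosen for illustration, and the sequences are assumed to contain only the upper-case letters A, C, G and T.

```python
import math


def train_markov_background(sequences, order=1, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from the given sequences."""
    counts = {}
    for seq in sequences:
        for i in range(order, len(seq)):
            context = seq[i - order:i]
            row = counts.setdefault(context, dict.fromkeys("ACGT", pseudocount))
            row[seq[i]] += 1
    model = {}
    for context, row in counts.items():
        total = sum(row.values())
        model[context] = {base: count / total for base, count in row.items()}
    return model


def background_log_prob(model, seq, order=1):
    """Log-probability of `seq` under the background model.

    The first `order` bases are skipped; contexts never seen in training
    fall back to a uniform probability of 0.25.
    """
    total = 0.0
    for i in range(order, len(seq)):
        row = model.get(seq[i - order:i])
        total += math.log(row[seq[i]] if row else 0.25)
    return total
```

With order=1 each base depends on the base before it. A higher order captures longer dependencies but needs more training sequence to estimate reliably.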

Reference: Liu X, Brutlag DL, Liu JS. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pacific Symposium on Biocomputing 2001;:127-38.


License: MIT license

MDscan is a program designed specifically for ChIP-array experiments; however, it can also be used in other experiments where some of the sequences may contain motif sites. The algorithm combines the advantages of two search strategies: word enumeration and iterative updating of the motif's PSSM (position-specific scoring matrix).
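The combination of the two strategies can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not MDscan itself: seeds are ranked simply by the number of similar words found in the top sequences, and the update step adds a site from a remaining sequence whenever its log-odds score against a uniform background is positive. All function names, the pseudocount and the threshold are hypothetical.

```python
import math

BASES = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def words(sequences, width):
    """Yield every word of the given width from every sequence."""
    for seq in sequences:
        for i in range(len(seq) - width + 1):
            yield seq[i:i + width]


def count_matrix(seed, sequences, max_mismatch):
    """Count matrix built from all words within `max_mismatch` of the seed."""
    counts = [dict.fromkeys(BASES, 0) for _ in seed]
    for word in words(sequences, len(seed)):
        if hamming(seed, word) <= max_mismatch:
            for pos, base in enumerate(word):
                counts[pos][base] += 1
    return counts


def best_seed(top_sequences, width, max_mismatch=1):
    """Word enumeration: try every word as a seed, keep the one with most matches."""
    best_word, best_counts, best_n = None, None, -1
    for seed in sorted(set(words(top_sequences, width))):
        counts = count_matrix(seed, top_sequences, max_mismatch)
        n = sum(counts[0].values())
        if n > best_n:
            best_word, best_counts, best_n = seed, counts, n
    return best_word, best_counts


def log_odds(counts, word, pseudocount=0.5, background=0.25):
    """Log-odds score of a word under the matrix versus a uniform background."""
    n = sum(counts[0].values())
    return sum(
        math.log((counts[pos][base] + pseudocount) / (n + 4 * pseudocount) / background)
        for pos, base in enumerate(word)
    )


def update_matrix(counts, remaining_sequences):
    """Matrix updating: add the best word of each remaining sequence if it scores > 0."""
    counts = [dict(row) for row in counts]  # work on a copy
    width = len(counts)
    for seq in remaining_sequences:
        candidates = list(words([seq], width))
        if not candidates:
            continue
        best = max(candidates, key=lambda w: log_odds(counts, w))
        if log_odds(counts, best) > 0:
            for pos, base in enumerate(best):
                counts[pos][base] += 1
    return counts
```

In this sketch, best_seed would be run on the highest-ranking sequences only, and update_matrix on the rest, which mirrors the idea that the strongest ChIP-array signals are the most likely to contain the motif.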

Reference: Liu XS, Brutlag DL, Liu JS. An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nature Biotechnology 2002 Aug;20(8):835-9.


License: MIT license

MEME (Multiple EM for Motif Elicitation) uses a statistical method, expectation maximisation (EM), to identify highly conserved regions.
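The EM idea can be illustrated with the minimal Python sketch below. It assumes exactly one motif occurrence per sequence, a uniform background and a starting point supplied as a seed word; MEME's actual models and its choice of starting points are more elaborate. The function name em_one_per_sequence and all parameter values are assumptions, and every sequence must be at least as long as the seed.

```python
def em_one_per_sequence(sequences, seed, iterations=20, pseudocount=0.1, background=0.25):
    """Refine a motif matrix by EM, starting from a seed word.

    Returns one dict per motif position, mapping each base to its probability.
    """
    width = len(seed)
    # Starting matrix: the seed base gets 0.7, each other base 0.1.
    pwm = [{b: (0.7 if b == s else 0.1) for b in "ACGT"} for s in seed]
    for _ in range(iterations):
        counts = [dict.fromkeys("ACGT", pseudocount) for _ in range(width)]
        for seq in sequences:
            # E-step: likelihood ratio of the motif starting at each offset.
            weights = []
            for i in range(len(seq) - width + 1):
                w = 1.0
                for pos in range(width):
                    w *= pwm[pos][seq[i + pos]] / background
                weights.append(w)
            total = sum(weights)
            # Accumulate expected base counts, weighted by the posterior
            # probability of each offset.
            for i, w in enumerate(weights):
                z = w / total
                for pos in range(width):
                    counts[pos][seq[i + pos]] += z
        # M-step: re-estimate the matrix from the expected counts.
        pwm = [
            {b: c / sum(row.values()) for b, c in row.items()}
            for row in counts
        ]
    return pwm
```

The consensus of the result can be read off with "".join(max(row, key=row.get) for row in pwm). Because EM only finds a local optimum, the outcome depends on the seed.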

References: Timothy L. Bailey, Charles Elkan, Fitting a mixture model by expectation maximization to discover motifs in biopolymers, Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, (28-36), AAAI Press, 1994.

Timothy L. Bailey, Nadya Williams, Chris Misleh, and Wilfred W. Li, MEME: discovering and analyzing DNA and protein sequence motifs, Nucleic Acids Research, Vol. 34, pp. W369-W373, 2006.


License: MEME is copyrighted software and can be licensed for commercial use.

Weeder searches for candidate motifs by scanning a suffix tree built from the input sequences. Additionally, the program uses a background model based on pre-computed frequencies of all possible 6- and 8-bp subsequences from several of the most important organisms.
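The role of such a background model can be illustrated with the Python sketch below, which compares how often each subsequence occurs in the input with its pre-computed background frequency. It is not Weeder's algorithm: a plain counter stands in for the suffix tree, mismatches are ignored, and the log-ratio score and the floor value are assumptions made for illustration.

```python
import math
from collections import Counter


def kmer_frequencies(sequences, k):
    """Relative frequency of every k-bp subsequence in the sequences."""
    counts = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}


def log_enrichment(input_sequences, background_freq, k, floor=1e-6):
    """Log-ratio of observed to background frequency for each observed k-mer.

    `background_freq` maps k-mers to pre-computed frequencies; k-mers missing
    from it are given the small `floor` frequency to avoid division by zero.
    """
    observed = kmer_frequencies(input_sequences, k)
    return {
        kmer: math.log(freq / max(background_freq.get(kmer, 0.0), floor))
        for kmer, freq in observed.items()
    }
```

A positive score means the subsequence is more frequent in the input than the background predicts. In practice k would be 6 or 8, to match the pre-computed tables, and the background would come from the organism under study.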

Reference: Giulio Pavesi, Giancarlo Mauri, Graziano Pesole, An algorithm for finding signals of unknown length in DNA sequences, Bioinformatics, Vol. 17, Suppl. 1, June 2001, pp. S207-S214.


License: Please see Weeder license.