L-SME
Loosely-Structured Motif Extractor
The discovery of information
encoded in biological sequences is assuming a prominent role in identifying
genetic diseases and in deciphering biological mechanisms. This information
is usually represented by patterns frequently occurring in the sequences,
also called motifs. In fact, motif
discovery has received much attention in the literature, and several
algorithms have already been proposed that, in particular, deal with motifs
exhibiting some kinds of “regular structure”. Motivated by biological
observations, this work focuses on more general classes of motifs where
several “exceptions” may be tolerated in pattern repetitions. To mine these loosely structured motifs, two
algorithms can be exploited: an exhaustive version using data structures
specifically designed to deal with pattern variabilities, and a randomized
version requiring time (and space) linear in the size of
the input database, with a theoretical guarantee on its performance. Despite
their ability to mine very complex kinds of patterns, the algorithms'
performance allows genome-wide application.

To access the system click here. If the link does not work, please contact us.

Main Reference (please cite this):
F. Fassetti, G. Greco, G. Terracina, Mining Loosely Structured Motifs from
Biological Data, IEEE Transactions on Knowledge and Data Engineering
(TKDE), 20(11), 1472-1489, 2008. IEEE Computer

Other references:
F. Fassetti, G. Greco, G. Terracina, Discovering Loosely Structured Motifs From Biological Data Sets,
Proc. of the 21st Annual ACM Symposium on Applied Computing (SAC 2006),