L-SME

Loosely-Structured Motif Extractor

The discovery of information encoded in biological sequences plays an increasingly prominent role in identifying genetic diseases and in deciphering biological mechanisms. This information is usually represented by patterns that occur frequently in the sequences, called motifs. Motif discovery has received much attention in the literature, and several algorithms have been proposed that deal, in particular, with motifs exhibiting some kind of “regular structure”. Motivated by biological observations, this work focuses on more general classes of motifs in which several “exceptions” may be tolerated in pattern repetitions. To mine these loosely structured motifs, two algorithms can be exploited: an exhaustive version, which uses data structures specifically designed to deal with pattern variability, and a randomized version, which requires time and space linear in the size of the input database and comes with a theoretical guarantee on its performance. Despite their ability to mine very complex kinds of patterns, the algorithms perform well enough to be applied genome-wide.
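
To give an intuition of what tolerating “exceptions” in pattern repetitions can mean, the following is a minimal Python sketch. It is not the L-SME algorithm and does not reproduce its data structures or its motif model; it only assumes, for illustration, that an exception is a single-character mismatch and that a motif is a fixed-length string. The function names (hamming, occurs_with_exceptions, support) and the toy sequences are hypothetical.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_exceptions(pattern: str, sequence: str, max_mismatches: int) -> bool:
    """True if some window of the sequence matches the pattern
    with at most max_mismatches differing characters."""
    k = len(pattern)
    return any(
        hamming(pattern, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )


def support(pattern: str, sequences: List[str], max_mismatches: int) -> int:
    """Number of sequences in which the pattern occurs, tolerating exceptions."""
    return sum(occurs_with_exceptions(pattern, s, max_mismatches) for s in sequences)


# Toy data, invented for this illustration only.
sequences = ["ACGTTGCA", "TTACGATG", "GGGACCTA"]

print(support("ACGT", sequences, 0))  # exact matches only: 1
print(support("ACGT", sequences, 1))  # one mismatch tolerated: 3
```

In this toy example the pattern is frequent only when one exception is tolerated, which is the kind of situation that motivates looser motif definitions. This naive scan checks one candidate pattern at a time and says nothing about how candidates are enumerated or how the exhaustive and randomized versions achieve their performance.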

To access the system click here

If the link does not work, please contact us.

Main Reference (please cite this):

F. Fassetti, G. Greco, G. Terracina, Mining Loosely Structured Motifs from Biological Data, IEEE Transactions on Knowledge and Data Engineering (TKDE), 20(11), 1472-1489, 2008, IEEE Computer Society, USA. (http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.65)

Other references:

F. Fassetti, G. Greco, G. Terracina, Discovering Loosely Structured Motifs From Biological Data Sets, Proc. of the 21st Annual ACM Symposium on Applied Computing (SAC 2006), 151-155, Dijon, France, 2006, ACM Press.