Text Classification

Classification is the task of learning a model that predicts a discrete target variable as a function of a set of explanatory variables.

Text classification (also known as text categorization or topic spotting) is the special case in which the target variable is a thematic category and the explanatory variables are the terms occurring in a text. In other words, the aim of text classification is to assign natural language texts to one or more thematic categories on the basis of their content.

A number of machine learning methods have been proposed in recent years, including k-nearest neighbors (k-NN), probabilistic Bayesian classifiers, neural networks, and support vector machines (SVMs). Along a different line, rule learning algorithms such as Ripper and C4.5 have become a successful strategy for classifier induction. Rule-based classifiers have the desirable property of being readable and thus easy for people to understand (and possibly modify).
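To make the readability point concrete, here is a minimal Python sketch of a rule-based text classifier. It assumes a simple, hypothetical rule form (assign a category when a document contains at least one "positive" term and none of a set of "negative" terms) and a naive whitespace tokenizer. The names `Rule`, `tokenize`, and `classify`, as well as the example rules, are illustrative only and do not reproduce the rule language of Olex-GA or OlexGreedy.

```python
from dataclasses import dataclass

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text: str) -> set[str]:
    """Represent a document as the set of lowercase terms it contains (naive whitespace split)."""
    terms = {token.strip(PUNCTUATION).lower() for token in text.split()}
    terms.discard("")  # tokens made only of punctuation become empty strings
    return terms


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule form: assign `category` if the document contains at least
    one term in `positive` and no term in `negative`."""
    category: str
    positive: frozenset[str]
    negative: frozenset[str] = frozenset()

    def fires(self, terms: set[str]) -> bool:
        # The rule fires on any positive term, unless a negative term vetoes it.
        return bool(self.positive & terms) and not (self.negative & terms)

    def __str__(self) -> str:
        # Render the rule as a human-readable IF/THEN statement.
        text = f"IF any of {sorted(self.positive)} occurs"
        if self.negative:
            text += f" AND none of {sorted(self.negative)} occurs"
        return f"{text} THEN {self.category}"


def classify(rules: list[Rule], text: str) -> set[str]:
    """Return every category whose rule fires (a text may get several categories)."""
    terms = tokenize(text)
    return {rule.category for rule in rules if rule.fires(terms)}


# Illustrative rules, written by hand rather than learned.
rules = [
    Rule("grain", frozenset({"wheat", "corn"}), frozenset({"futures"})),
    Rule("energy", frozenset({"oil", "gas"})),
]

if __name__ == "__main__":
    for rule in rules:
        print(rule)
    print(sorted(classify(rules, "Wheat exports rose while oil prices fell.")))
```

Because each rule prints as a plain IF/THEN statement (for example, `IF any of ['corn', 'wheat'] occurs AND none of ['futures'] occurs THEN grain`), a person can inspect it and, say, add a negative term by hand. A document may match several rules and so receive several categories.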

On this website you can download prototypes of two rule induction methods (rule-based classifiers) for text classification, namely Olex-GA and OlexGreedy. The former is based on a genetic algorithm, while the latter uses a greedy approach.
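The greedy idea can be illustrated with a toy Python sketch. It builds the positive-term set of a single hypothetical "any selected term occurs" rule by repeatedly adding the term that most improves F1 on a labeled training set, and stops as soon as no term helps. This is a generic greedy hill-climbing example under assumed inputs (documents as sets of terms, one boolean label per document); the function names are hypothetical, and it is not the OlexGreedy algorithm, whose actual search procedure is described in the publications.

```python
def f1_score(selected: set[str], docs: list[set[str]], labels: list[bool]) -> float:
    """F1 of the rule 'a document is positive if it contains any selected term'."""
    tp = fp = fn = 0
    for terms, is_positive in zip(docs, labels):
        predicted = bool(selected & terms)
        if predicted and is_positive:
            tp += 1
        elif predicted:
            fp += 1
        elif is_positive:
            fn += 1
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no true positives.
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def greedy_positive_terms(
    docs: list[set[str]], labels: list[bool], max_terms: int = 10
) -> tuple[set[str], float]:
    """Greedily add the term that most improves F1; stop when no term improves it."""
    vocabulary = set().union(*docs)
    selected: set[str] = set()
    best = 0.0
    while len(selected) < max_terms:
        candidates = vocabulary - selected
        if not candidates:
            break
        # Highest F1 wins; ties go to the alphabetically smallest term.
        neg_score, term = min(
            (-f1_score(selected | {t}, docs, labels), t) for t in candidates
        )
        if -neg_score <= best:
            break  # no single term improves the current rule: stop
        selected.add(term)
        best = -neg_score
    return selected, best


if __name__ == "__main__":
    docs = [
        {"wheat", "export"},
        {"corn", "harvest"},
        {"oil", "price"},
        {"wheat", "price"},
        {"gas", "export"},
    ]
    labels = [True, True, False, True, False]  # True = document belongs to "grain"
    print(greedy_positive_terms(docs, labels))
```

On this made-up data the search first picks `wheat` (training F1 0.8), then `corn` (F1 1.0), and then stops because no further term improves the score. Each step commits to one term and never revisits it, which is what makes the search cheap but potentially short-sighted. A genetic algorithm, by contrast, evolves a population of candidate solutions through selection, crossover, and mutation instead of committing to one choice at a time.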

Please see the Publications section for more details about the proposed methods, and see the Olex-GA license and the OlexGreedy license for a full description of the license terms and usage conditions.