Classification is the task of learning a model of a discrete target variable as a function of the explanatory variables.
Text Classification (also known as text categorization, or topic spotting) is the special case where
the target variable is a thematic category and the explanatory variables are the terms occurring in a text.
That is, text classification aims to assign natural language texts to one or more thematic
categories on the basis of their contents.
A number of machine learning methods have been proposed in recent years, including k-NN,
probabilistic Bayesian methods, neural networks and SVMs.
Along a different line, rule learning algorithms, such as Ripper and C4.5, have become a successful
strategy for classifier induction.
Rule-based classifiers provide the desirable property of being readable and, thus, easy for people
to understand (and, possibly, modify).
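To illustrate what "readable" means here, the following is a minimal Python sketch of a toy rule-based text classifier. It is not the rule format or the learning procedure of any particular system: the Rule class, the classify function, the rule shape (at least one positive term present, no negative term present) and the example terms are all assumptions made for illustration, and the rules are written by hand rather than induced from data.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: assign `category` if any positive term occurs
    in the text and no negative term occurs."""
    category: str
    positive_terms: FrozenSet[str]
    negative_terms: FrozenSet[str]

    def matches(self, text: str) -> bool:
        # Naive tokenization (lowercase, split on whitespace); a real
        # system would also handle punctuation, stemming, and so on.
        terms = set(text.lower().split())
        has_positive = bool(terms & self.positive_terms)
        has_negative = bool(terms & self.negative_terms)
        return has_positive and not has_negative

    def __str__(self) -> str:
        # The rule can be printed in a form a person can read and edit.
        return (f"IF any of {sorted(self.positive_terms)} occurs "
                f"AND none of {sorted(self.negative_terms)} occurs "
                f"THEN {self.category}")


def classify(text: str, rules: List[Rule]) -> List[str]:
    """Return every category whose rule fires (a text may get several)."""
    return [rule.category for rule in rules if rule.matches(text)]


# Hand-written example rules; the terms are invented for this sketch.
RULES = [
    Rule("grain", frozenset({"wheat", "corn"}), frozenset()),
    Rule("finance", frozenset({"bank", "interest"}), frozenset({"river"})),
]

if __name__ == "__main__":
    for rule in RULES:
        print(rule)
    print(classify("Wheat prices rose sharply", RULES))       # ['grain']
    print(classify("The bank raised interest rates", RULES))  # ['finance']
    print(classify("The river bank flooded", RULES))          # []
```

Because each rule is just a category plus two term sets, a person can inspect why a text was assigned to a category and change the behaviour by editing the terms directly.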
On this website you can download prototypes of two rule induction methods
(rule-based classifiers) for text classification, namely
Olex-GA and
OlexGreedy.
The former is based on a genetic algorithm, while the latter follows a greedy approach.
Please see the
publication
section for more details about the proposed methods, and
have a look at the
Olex-GA license and the
OlexGreedy license
for a full description of the license terms and usage conditions.