Text Classification

Classification is the task of learning a model that predicts a discrete target variable as a function of the explanatory variables.

Text classification (also known as text categorization or topic spotting) is the special case where the target variable is a thematic category and the explanatory variables are the terms occurring in a text. That is, the aim of text classification is to assign natural language texts to one or more thematic categories on the basis of their contents.
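
As a minimal sketch of the setup described above, a text can be reduced to the set of terms it contains, and those terms then act as the explanatory variables. The tokenizer, the function name `extract_terms`, and the toy documents and categories below are illustrative assumptions, not part of the Olex tools.

```python
import re


def extract_terms(text):
    """Return the set of lowercase alphabetic terms occurring in a text.

    Assumption: a crude regex tokenizer, used here only for illustration.
    """
    return set(re.findall(r"[a-z]+", text.lower()))


# Hypothetical toy documents, each assigned one or more thematic categories.
labeled_docs = [
    ("Wheat and corn exports rose sharply", {"grain", "trade"}),
    ("The central bank cut interest rates", {"finance"}),
]

for text, categories in labeled_docs:
    # Explanatory variables (terms) -> target variable (categories).
    print(sorted(extract_terms(text)), "->", sorted(categories))
```

Note that the first document carries two categories, which reflects the "one or more thematic categories" wording above.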

A number of machine learning methods have been proposed in recent years, including k-NN, probabilistic Bayesian classifiers, neural networks and SVMs. Along a different line, rule learning algorithms, such as Ripper and C4.5, have become a successful strategy for classifier induction. Rule-based classifiers have the desirable property of being readable and thus easy for people to understand (and, possibly, modify).
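
To illustrate why a rule-based classifier is easy to read and modify, here is a small sketch of a hand-written rule over document terms. The rule shape (some positive terms present, no negative term present), the helper `make_rule`, and the example terms are assumptions chosen for illustration; they are not taken from Olex-GA, OlexGreedy, Ripper or C4.5.

```python
def make_rule(category, positive_terms, negative_terms=()):
    """Build a hypothetical term-based rule for one category.

    The rule fires when at least one positive term occurs in the
    document and none of the negative terms occurs.
    """
    positive = set(positive_terms)
    negative = set(negative_terms)

    def classify(terms):
        terms = set(terms)
        return bool(terms & positive) and not (terms & negative)

    # A human-readable rendering of the rule, attached for inspection.
    classify.description = (
        f"IF any of {sorted(positive)} occurs"
        + (f" AND none of {sorted(negative)} occurs" if negative else "")
        + f" THEN {category}"
    )
    return classify


grain_rule = make_rule("grain", ["wheat", "corn"], ["popcorn"])
print(grain_rule.description)
print(grain_rule({"wheat", "exports", "rose"}))  # True
print(grain_rule({"popcorn", "corn", "movie"}))  # False
```

Because the rule is just two explicit term lists, a person can inspect it and adjust it by adding or removing a term, which is much harder to do with the weights of a statistical model.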

From this website you can download:

  • The prototype of the evolutionary rule induction method Olex-GA
  • The prototype of the greedy-based rule induction method OlexGreedy
  • A filter for text feature selection
Please see the publication section for more details about the proposed methods, and refer to the Olex-GA license and the OlexGreedy license for a full description of the license terms and usage conditions.