MetaBJTool - A package for metaquery evaluation
Introduction
The MetaBJTool prototype implements the stages into which the whole metarule mining process is divided. The prototype consists of two principal modules:
· Management of the communication
· Algorithms for computing a metaquery
Our prototype accepts a metaquery provided in XML format. Results are sent back in XML format as well.
Communication Protocol
The proposed communication protocol provides the interface between external modules (GUI, remote clients, etc.) and the "heart" of the prototype, which is the metaquery computation module. The instantiation algorithm, which we will call MEE, takes as input a metarule together with a set of parameters and constraints.
This information, encoded in XML format, is checked against the Metarules Interchange Format. The answer set to the metarule is then computed, and this set is output in XML format as well.
Here is a sketch of the MetaBJTool architecture:
The structure of the MIFIn and MIFOut formats follows report D3.R2.
For instance, a document in MIFIn format begins as follows:
<?xml version="1.0" encoding="UTF-8"?>
<metaquery
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="MIFIn.xsd"
    support="50.00"
    confidence="75.00">
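As an illustration of how a client or the communication module might read the thresholds carried by such a document, here is a minimal Java sketch that parses the root element with the standard DOM API. The class name MifInHeaderSketch and the method readThresholds are hypothetical and not part of MetaBJTool. The sketch reads only the support and confidence attributes shown in the fragment, and it performs no validation against MIFIn.xsd.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hypothetical helper: reads the two thresholds from a MIFIn root element. */
final class MifInHeaderSketch {

    /** Returns {support, confidence} as found on the root element. */
    static double[] readThresholds(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the document declares the xsi prefix
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return new double[] {
                Double.parseDouble(root.getAttribute("support")),
                Double.parseDouble(root.getAttribute("confidence"))
            };
        } catch (Exception e) {
            // Malformed XML or a missing/non-numeric attribute ends up here.
            throw new IllegalArgumentException("not a readable MIFIn header", e);
        }
    }
}
```

The document passed in must be complete XML. The fragment above shows only the opening tag, so any test input has to close the metaquery element itself.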
MetaBJTool Internals
Our prototype consists of four stages:
· Access to the database
· Overlap stage
· Compute algorithm
· Evaluation algorithm
Overlap stage
Among the various steps that lead to the acquisition of new patterns, the preprocessing and the analysis of the data play an important role in the whole mining system.
The aim of this phase is to discard information that is considered useless and that can slow down the following steps. It is necessary to suitably choose the pairs of attributes that could participate in an instantiated rule with sufficient support and confidence values.
The same variable appearing in different literal patterns is usually bound to attributes of different relations but of the same type. However, even when the data types are the same, it is possible to find pairs of table columns that have very few values in common (overlapping values). This is why the concept of overlap was introduced in a formal way.
The instantiation step will generate rules considering only pairs of attributes that have enough overlapping values.
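The formal definition of overlap is not reproduced here, so the following Java sketch rests on an assumption: it measures overlap as the number of distinct values two columns share, divided by the number of distinct values in the smaller column. The class OverlapSketch, its methods, and this particular measure are illustrative only and may differ from what MetaBJTool actually computes.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical overlap measure between two table columns (assumed definition). */
final class OverlapSketch {

    /** Shared distinct values divided by the smaller distinct-value count. */
    static double overlap(Collection<?> columnA, Collection<?> columnB) {
        Set<Object> a = new HashSet<>(columnA);
        Set<Object> b = new HashSet<>(columnB);
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0; // an empty column shares nothing
        }
        Set<Object> common = new HashSet<>(a);
        common.retainAll(b); // keep only values present in both columns
        return (double) common.size() / Math.min(a.size(), b.size());
    }

    /** True when the pair of columns is worth passing to the instantiation step. */
    static boolean hasEnoughOverlap(Collection<?> columnA, Collection<?> columnB,
                                    double threshold) {
        return overlap(columnA, columnB) >= threshold;
    }
}
```

Under this assumed measure, a pair of columns with the same type but disjoint values scores 0 and would be discarded for any positive threshold.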
Instantiation algorithm
The purpose of this step is to instantiate each metapattern ( R_1(X_1), ... R_n(X_n) ) of a given metaquery. At the end of the process, each R_i will be bound to a database relation and each variable to an attribute of some relation. To reduce computation times, the instantiation algorithm exploits the well-known technique of forward checking with backjumping. Like traditional backtracking, this algorithm performs an exhaustive search of all the possible solutions in the space of the database attributes, but over a reduced search space.
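To make the search concrete, here is a simplified Java sketch of backtracking with forward checking over relation bindings. It is not the MEE implementation. It omits backjumping, it binds the k-th variable of a pattern to the k-th attribute of the chosen relation, and it uses attribute type equality as the only compatibility test, whereas the prototype also relies on the overlap analysis. The class InstantiationSketch and the string-based data model are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified instantiation: forward checking only, no backjumping. */
final class InstantiationSketch {

    /**
     * patterns: variable names of each literal pattern, e.g. [[X, Y], [Y, Z]].
     * db: relation name mapped to its attribute types, in column order.
     * Returns every consistent binding as a list of relation names.
     */
    static List<List<String>> instantiate(List<List<String>> patterns,
                                          Map<String, List<String>> db) {
        List<List<String>> domains = new ArrayList<>();
        for (List<String> vars : patterns) {
            List<String> candidates = new ArrayList<>();
            for (Map.Entry<String, List<String>> rel : db.entrySet()) {
                if (rel.getValue().size() == vars.size()) candidates.add(rel.getKey());
            }
            domains.add(candidates); // initial domain: relations of matching arity
        }
        List<List<String>> solutions = new ArrayList<>();
        search(0, patterns, db, domains, new HashMap<>(), new ArrayList<>(), solutions);
        return solutions;
    }

    private static void search(int i, List<List<String>> patterns,
                               Map<String, List<String>> db, List<List<String>> domains,
                               Map<String, String> varTypes, List<String> chosen,
                               List<List<String>> solutions) {
        if (i == patterns.size()) {
            solutions.add(new ArrayList<>(chosen));
            return;
        }
        for (String rel : domains.get(i)) {
            Map<String, String> types = new HashMap<>(varTypes);
            if (!bind(patterns.get(i), db.get(rel), types)) continue;
            // Forward checking: prune the domains of all not-yet-bound patterns.
            List<List<String>> pruned = new ArrayList<>(domains);
            boolean deadEnd = false;
            for (int j = i + 1; j < patterns.size() && !deadEnd; j++) {
                List<String> kept = new ArrayList<>();
                for (String cand : domains.get(j)) {
                    if (bind(patterns.get(j), db.get(cand), new HashMap<>(types))) kept.add(cand);
                }
                pruned.set(j, kept);
                deadEnd = kept.isEmpty(); // a future pattern has no candidate left
            }
            if (deadEnd) continue;
            chosen.add(rel);
            search(i + 1, patterns, db, pruned, types, chosen, solutions);
            chosen.remove(chosen.size() - 1);
        }
    }

    /** Records the type of each variable; fails if a variable would get two types. */
    private static boolean bind(List<String> vars, List<String> attrTypes,
                                Map<String, String> types) {
        for (int k = 0; k < vars.size(); k++) {
            String previous = types.putIfAbsent(vars.get(k), attrTypes.get(k));
            if (previous != null && !previous.equals(attrTypes.get(k))) return false;
        }
        return true;
    }
}
```

The pruning step is what separates this from plain backtracking: a choice is rejected as soon as it leaves some later pattern without candidates, before the search descends into it. Backjumping would additionally jump straight back to the binding responsible for a dead end instead of undoing choices one level at a time.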
The indices evaluation algorithm
Package Interface
Our prototype can be tested over the network using a small set of simple procedures that make it possible to start the computation of a metaquery remotely.
2. boolean setValues(double overlap, int levels)
3. void setDataInput(File f)
4. boolean overlapCompute(String nomeFileLoad)
6. public void upSupportCompute()
7. public void supportCompute()
8. void getResult(File f)
9. String progress()
1. Sets the parameters needed to connect to the data source.
2. This function sets the overlap threshold value and the number of intervals into which the ranges of numeric columns are discretized.
3. This function submits an XML file in MIFIn format containing a metaquery to be computed.
4. Starts the overlap process. It is possible to specify where to store results.
5. Starts the instantiation computation for the last submitted metarules, on the basis of the overlap analysis stored in the file indicated as parameter.
6. Starts the process of computing the up-support for the current instantiation set. Invalid rules are filtered out from the current result set.
7. Computes support values for the current rule set. Invalid rules are filtered out from the current result set.
8. Produces the current result set in MIFOut format.
9. Gets information about the current progress of the computation.
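The order in which a client might call these procedures can be sketched in Java as follows. The interface name MetaQueryApi, the driver PipelineSketch, and the recording stub are hypothetical. Only the seven signatures listed above are used, and procedures 1 and 5 are left out because their signatures are not given. The sketch also assumes that a boolean result of true means success, which the list above does not state.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interface gathering the seven listed signatures. */
interface MetaQueryApi {
    boolean setValues(double overlap, int levels);
    void setDataInput(File f);
    boolean overlapCompute(String nomeFileLoad);
    void upSupportCompute();
    void supportCompute();
    void getResult(File f);
    String progress();
}

/** Hypothetical driver that calls the procedures in their listed order. */
final class PipelineSketch {
    static boolean run(MetaQueryApi api, double overlap, int levels,
                       File mifIn, String overlapFile, File mifOut) {
        // Procedure 1 (connection parameters) is not called: signature not listed.
        if (!api.setValues(overlap, levels)) return false; // assumes true = success
        api.setDataInput(mifIn);
        if (!api.overlapCompute(overlapFile)) return false; // assumes true = success
        // Procedure 5 (instantiation) would run here: signature not listed.
        api.upSupportCompute();
        api.supportCompute();
        api.getResult(mifOut);
        return true;
    }
}

/** Stub used only to show the call order; it performs no computation. */
final class RecordingApi implements MetaQueryApi {
    final List<String> calls = new ArrayList<>();
    public boolean setValues(double overlap, int levels) { calls.add("setValues"); return true; }
    public void setDataInput(File f) { calls.add("setDataInput"); }
    public boolean overlapCompute(String nomeFileLoad) { calls.add("overlapCompute"); return true; }
    public void upSupportCompute() { calls.add("upSupportCompute"); }
    public void supportCompute() { calls.add("supportCompute"); }
    public void getResult(File f) { calls.add("getResult"); }
    public String progress() { return calls.size() + " calls made"; }
}
```

A real client would replace RecordingApi with whatever remote binding the prototype exposes and could poll progress() while the longer steps are running.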