D1.1: Review of models and systems for information integration
Short Description:
A data integration system consists of three main components: a
global schema, a source schema comprising the schemas of all the
sources, and a mapping between the two. There are two main approaches
for specifying the mapping: in the local-as-view (LAV) approach the
source structures are defined as views over the global schema;
conversely, in the global-as-view (GAV) approach each global element is
defined as a view over the source schemas. The problem of
query processing is to find efficient methods for answering queries
posed to the global schema on the basis of the data stored at sources.
In LAV there are two approaches to query processing: query
rewriting, in which one computes a rewriting of the query in
terms of the views and then evaluates that rewriting, and query
answering, in which one aims to answer the query directly from
the view extensions. In GAV, existing systems deal with query
processing by simply unfolding each global concept in the query with
its definition in terms of the sources. In this report, we survey the
most important query processing algorithms proposed in the literature
for LAV, and we describe the principal GAV data integration systems and
the form of query processing they adopt. Furthermore, we review recent
studies showing that, in the presence of incomplete data sources and
integrity constraints on the global schema, query processing in GAV is
harder than simple unfolding. Finally, we analyze the main approaches
to query processing in the presence of data that are inconsistent with
respect to the global schema.
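
As an illustration of the unfolding step mentioned above, the following is a minimal sketch in Python, not the algorithm of any particular system. It assumes that queries and views are conjunctive and are represented as lists of (relation, arguments) atoms. The global relations (Movie, Review), the source relations (s1_film, s2_title, s2_director, s3_review) and the function name unfold are all hypothetical and chosen only for the example.

```python
from itertools import count, product

# Hypothetical GAV mapping: each global relation is defined by one or more
# conjunctive views over source relations (several views mean their union).
# A view is (head_variables, body), where body is a list of (relation, args).
GAV_MAPPING = {
    "Movie": [
        (("t", "d"), [("s1_film", ("t", "y", "d"))]),
        (("t", "d"), [("s2_title", ("i", "t")), ("s2_director", ("i", "d"))]),
    ],
    "Review": [
        (("t", "r"), [("s3_review", ("t", "r"))]),
    ],
}


def unfold(query_body, mapping):
    """Return the union of source-level conjunctive queries obtained by
    replacing every global atom with each of its view definitions."""
    fresh = count()
    per_atom = []
    for relation, args in query_body:
        alternatives = []
        for head_vars, body in mapping[relation]:
            # Bind the view's head variables to the atom's arguments and
            # rename the remaining view variables so they never clash.
            subst = dict(zip(head_vars, args))
            suffix = next(fresh)
            renamed = [
                (src, tuple(subst.get(v, f"{v}_{suffix}") for v in src_args))
                for src, src_args in body
            ]
            alternatives.append(renamed)
        per_atom.append(alternatives)
    # One conjunctive query per combination of chosen view definitions.
    return [sum(choice, []) for choice in product(*per_atom)]


# Global query: q(T, R) :- Movie(T, D), Review(T, R)
query = [("Movie", ("T", "D")), ("Review", ("T", "R"))]
unfolded = unfold(query, GAV_MAPPING)
for body in unfolded:
    print(body)
```

Each element of the result is the body of one conjunctive query over the sources, and their union is the unfolded query. The sketch covers only plain unfolding; it does not handle the case with incomplete sources and integrity constraints on the global schema, where, as noted above, unfolding alone is not sufficient.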