D1.1 - Review of models and systems for information integration

Short Description:


A data integration system consists of three main components: a global schema, a source schema (comprising the schemas of all the sources), and a mapping between the two. There are two main approaches to specifying the mapping: in the local-as-view (LAV) approach, the source structures are defined as views over the global schema; conversely, in the global-as-view (GAV) approach, each global element is defined as a view over the source schemas. The problem of query processing is to find efficient methods for answering queries posed to the global schema on the basis of the data stored at the sources. In LAV, there are two approaches to query processing: query rewriting, in which one computes a rewriting of the query in terms of the views and then evaluates that rewriting, and query answering, in which one aims to answer the query directly from the view extensions. In GAV, existing systems handle query processing by simply unfolding the query, that is, by replacing each global concept in it with its definition in terms of the sources. In this report, we survey the most important query processing algorithms proposed in the literature for LAV, and we describe the principal GAV data integration systems and the form of query processing they adopt. Furthermore, we review recent studies showing that, in the presence of incomplete data sources and integrity constraints on the global schema, query processing in GAV is harder than simple unfolding. Finally, we analyze the main approaches to query processing in the presence of data that is inconsistent with respect to the global schema.
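
To make the notion of unfolding in GAV concrete, the following is a minimal sketch in Python, not taken from any of the surveyed systems. The source relations (`s1_movies`, `s2_directors`), the global relation `movie`, and the helper `answer_by_unfolding` are all hypothetical names chosen for illustration. The sketch assumes complete sources and no integrity constraints on the global schema, which is exactly the setting in which simple unfolding suffices.

```python
# Hypothetical source relations, each a set of tuples (illustrative data only).
sources = {
    "s1_movies": {("Alien", 1979), ("Heat", 1995)},        # (title, year)
    "s2_directors": {("Alien", "Scott"), ("Heat", "Mann")},  # (title, director)
}


def movie_view(src):
    """GAV definition of the global relation movie(title, year, director)
    as a view (here, a join on title) over the two source relations."""
    return {
        (title, year, director)
        for (title, year) in src["s1_movies"]
        for (title2, director) in src["s2_directors"]
        if title == title2
    }


# GAV mapping: every global element is associated with a view over the sources.
gav_mapping = {"movie": movie_view}


def answer_by_unfolding(global_relation, predicate, src):
    """Answer a selection query on a global relation by unfolding:
    substitute the relation with its source-level definition,
    evaluate that definition, then apply the selection."""
    tuples = gav_mapping[global_relation](src)
    return {t for t in tuples if predicate(t)}


# Query posed to the global schema: movies released before 1990.
result = answer_by_unfolding("movie", lambda t: t[1] < 1990, sources)
print(result)
```

In LAV the direction of the mapping is reversed (each source would be described as a view over `movie`), so no such direct substitution is available and a rewriting or query answering algorithm is needed instead.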