Consistent Answers to SQL Queries

Project Award Number IIS-0119186


Principal Investigator

Jan Chomicki
Department of Computer Science and Engineering, University at Buffalo
Buffalo, NY 14260
716-645-3180 ext.103
716-645-3464
chomicki@cse.buffalo.edu
http://www.cse.buffalo.edu/~chomicki


Collaborator

Leopoldo Bertossi
School of Computer Science, Carleton University
Ottawa, Canada
bertossi@scs.carleton.ca


Collaborator

Jerzy Marcinkowski
Institute of Informatics
, Wroclaw University
Wroclaw, Poland
Jerzy.Marcinkowski@ii.uni.wroc.pl

Keywords

Inconsistency, Integrity Constraints, Query Languages, Data Integration, SQL.

Project Summary

As the amount of information available in online data sources explodes, there is a growing concern about the consistency and quality of answers to user queries. This project addresses the issue of using logical integrity constraints to gauge the consistency and quality of query answers. Although it is impractical to enforce global integrity constraints across different data sources and correct integrity violations by updating individual sources, integrity constraints capture important semantic properties of data. This project studies the formal notions of database repair and consistent query answer: a consistent answer is true in every minimal repair of the database. The information about answer consistency serves as an important indication of its quality and reliability.

A variety of procedures for computing consistent query answers in the context of the relational data model and SQL are developed, and their computational complexity analyzed. The procedures exploit the properties of specific subsets of SQL and specific classes of integrity constraints. By providing information about query answer consistency, such procedures will enhance the functionality of existing DBMS in a non-intrusive way, particularly in the context of data integration applications.

Publications and Products

[1] Jan Chomicki, J. Marcinkowski. Minimal-Change Integrity Maintenance Using Tuple Deletions. Information and Computation, to appear. Early version: CoRR paper cs.DB/0212004.

[2] Jan Chomicki, J. Marcinkowski, S. Staworko. Computing Consistent Query Answers using Conflict Hypergraphs. Proc. Thirteenth Conference on Information and Knowledge Management (CIKM), 2004, to appear. Short version in IIWeb'04.

[3] Jan Chomicki, J. Marcinkowski. On the Computational Complexity of Minimal-Change Integrity Maintenance in Relational Databases. To appear in Inconsistency Tolerance, L. Bertossi, A. Hunter, T. Schaub, editors, Springer, 2005, 32pp.

[4] L. Bertossi, Jan Chomicki. Query Answering in Inconsistent Databases. In Logics for Emerging Applications of Databases, J. Chomicki, R. van der Meyden, G. Saake, editors, Springer-Verlag, 2003, 41pp.

[5] M. Arenas, L. Bertossi, Jan Chomicki. Consistent Query Answers in Inconsistent Databases. Proc. ACM Symposium on Principles of Database Systems (PODS), 1999, pp. 68-79.

[6] M. Arenas, L. Bertossi, Jan Chomicki, Xin He, Vijay Raghavan, Jeremy Spinrad. Scalar Aggregation in Inconsistent Databases. Theoretical Computer Science 296(3), 2003, pp. 405-434. Early version: ICDT'01.

[7] M. Arenas, L. Bertossi, Jan Chomicki. Answer Sets for Consistent Query Answering in Inconsistent Databases. Theory and Practice of Logic Programming, 3(4&5), July/September 2003, pp. 393-424. arXiv.org paper cs.DB/0207094. Early version: FQAS'00.

[8] L. Bertossi, Jan Chomicki, Alvaro Cortes, Claudio Gutierrez. Consistent Answers from Integrated Data Sources. Proc. International Conference on Flexible Query Answering Systems, Springer-Verlag, 2002, LNCS 2522, pp.71-85.

Project Impact

Human Resources. Research assistants at the University at Buffalo: Slawomir Staworko (Fall 03, Summer 04), Priyanka Kasliwal (Summer 04).

Education and curriculum development. The PI has incorporated the results of the project in the syllabus of an advanced graduate database course on Data Integration. The course is now offered at the University at Buffalo every academic year. The PI also gave several talks about the project.

Other. The PI is one of the editors of the book "Logics for Emerging Applications of Databases," published by Springer-Verlag in August 2003. He presented a keynote talk on consistent query answering at the IFIP TC-11 WG 11.5 Working Conference on Integrity and Internal Control in Information Systems, Nov. 2003, Lausanne, Switzerland. The notion of consistent query answer proposed in the course of this research has been applied and extended by research groups at 6 European and 2 Canadian universities.

Goals, Objectives and Targeted Activities

The PI, together with S. Staworko (a Ph.D. student supported by the current project) has extended the algorithms proposed in [1] to form the foundation of HIPPO - a front-end system for the computation of consistent query answers. The system uses PostgreSQL as a backend and scales up to large databases. It handles projection-free queries (projection leads to co-NP-completeness [1]) and denial constraints. The PI and S. Staworko have developed optimization techniques that reduce the number of backend accesses. Those techniques, the implementation of HIPPO, and some experimental results are described in [2]. The current version of HIPPO is operational and has been demonstrated at the IDM'03 workshop and EDBT'04. HIPPO will be made available to the public.

Together with J. Marcinkowski, Wroclaw University, Poland, the PI obtained a complete characterization of the computational complexity of consistent query answers (and related problems) for first-order queries and integrity constraints that are denial constraints, functional dependencies, and inclusion dependencies. The results are reported in [1] and [3] (a survey of the area).

Current work in the project includes: computing consistent query answers in the presence of preferences, dealing with more general queries and constraints, repairing XML documents, experimental comparisons with other systems that compute consistent query answers, generation of synthetic inconsistent databases.

Area Background

The problem of obtaining consistent information from inconsistent databases has been studied in the context of multiple-source data integration [9], disjunctive databases [10], and conflict resolution [11]. For a more complete discussion, see the references [4] and [5]. Our approach is unique in that it is not limited to specific application scenarios or particular classes of constraints or queries, and addresses relevant computational problems in detail.

Area References

[9] A. Motro. Multiplex: A Formal Model for Multidatabases and Its Implementation. Proc. NGITS'99, Springer-Verlag, LNCS 1649, pp. 138--158.

[10] S. Agarwal, A.M. Keller, G. Wiederhold, and K. Saraswat. Flexible Relation: An Approach for Integrating Data from Multiple, Possibly Inconsistent Databases. Proc. IEEE International Conference on Data Engineering, 1995.

[11] J. Lin and A. O. Mendelzon. Merging Databases under Constraints. International Journal of Cooperative Information Systems, 7(1):55--76, 1996.

Potential Related Projects

Data Integration and Cleaning, Query Answering Using Distributed Data Sources, Materialized Views, Semantic Query Optimization.

Project Websites


CQA: Consistent Query Answers.