Consistent Answers to SQL Queries

Project Award Number IIS-0119186


Principal Investigator

Jan
Chomicki
Department of Computer Science and Engineering
University at Buffalo
201 Bell Hall
Buffalo
NY
14260
716-645-3180 ext.103
716-645-3464
chomicki@cse.buffalo.edu
http://www.cse.buffalo.edu/~chomicki


Collaborator

Leopoldo Bertossi
School of Computer Science
Carleton University
Ottawa
Canada
bertossi@scs.carleton.ca


Collaborator

Jerzy Marcinkowski
Institute of Informatics
Wroclaw University
Wroclaw
Poland
Jerzy.Marcinkowski@ii.uni.wroc.pl

Keywords

Inconsistency
Integrity Constraints
Query Languages
Data Integration
SQL

Project Summary

As the amount of information available in online data sources explodes, there is a growing concern about the consistency and quality of answers to user queries. This project addresses the issue of using logical integrity constraints to gauge the consistency and quality of query answers. Although it is impractical to enforce global integrity constraints across different data sources and correct integrity violations by updating individual sources, integrity constraints capture important semantic properties of data. This project studies the formal notions of database repair and consistent query answer: a consistent answer is true in every minimal repair of the database. The information about answer consistency serves as an important indication of its quality and reliability.

A variety of procedures for computing consistent query answers in the context of the relational data model and SQL are developed, and their computational complexity analyzed. The procedures exploit the properties of specific subsets of SQL and specific classes of integrity constraints. By providing information about query answer consistency, such procedures will enhance the functionality of existing DBMS in a non-intrusive way, particularly in the context of data integration applications.

Publications and Products

[1] Jan Chomicki, J. Marcinkowski. Minimal-Change Integrity Maintenance Using Tuple Deletions. Submitted for journal publication, December 2002, 26pp. CoRR paper cs.DB/0212004.

[2] M. Arenas, L. Bertossi, Jan Chomicki, Xin He, Vijay Raghavan, Jeremy Spinrad. Scalar Aggregation in Inconsistent Databases. Theoretical Computer Science 296(3), 2003, pp. 405-434.

[3] M. Arenas, L. Bertossi, Jan Chomicki. Scalar Aggregation in FD-Inconsistent Databases. Proc. International Conference on Database Theory (ICDT), Springer-Verlag, 2001, LNCS 1973, pp. 39-53.

[4] L. Bertossi, Jan Chomicki. Query Answering in Inconsistent Databases. To appear in Logics for Emerging Applications of Databases, J. Chomicki, R. van der Meyden, G. Saake, editors, Springer-Verlag, 2003, 41pp.

[5] M. Arenas, L. Bertossi, Jan Chomicki. Consistent Query Answers in Inconsistent Databases. Proc. ACM Symposium on Principles of Database Systems (PODS), 1999, pp. 68-79.

[6] M. Arenas, L. Bertossi, Jan Chomicki. Specifying and Querying Database Repairs Using Logic Programs with Exceptions. Proc. International Conference on Flexible Query Answering Systems, Springer-Verlag, 2000, pp.27-41.

[7] M. Arenas, L. Bertossi, Jan Chomicki. Answer Sets for Consistent Query Answering in Inconsistent Databases. Theory and Practice of Logic Programming, 3(4&5), July/September 2003, pp. 393-424. arXiv.org paper cs.DB/0207094.

[8] L. Bertossi, Jan Chomicki, Alvaro Cortes, Claudio Gutierrez. Consistent Answers from Integrated Data Sources. Proc. International Conference on Flexible Query Answering Systems, Springer-Verlag, 2002, LNCS 2522, pp.71-85.

Project Impact

Human Resources. Two research assistants at the University at Buffalo were funded by the project for the total of one year.

Education and curriculum development. The PI has incorporated the results of the project in the syllabus of an advanced graduate database course on Data Integration. The course has become permanent and is offered at the University at Buffalo every academic year. It has been taught twice so far, in fall 2001 and spring 2003, with the total enrollment of 28. The PI also gave talks about the project at a Dagstuhl seminar and 7 universities in U.S., Canada, and Europe.

Other. The PI is one of the editors of the book "Logics for Emerging Applications of Databases," to be published by Springer-Verlag. He also co-organized the IJCAI 2001 Workshop on Inconsistency in Data and Knowledge and served on the program committee of the FLOC 2002 Workshop on Paraconsistent Computational Logic. He has been invited to present a talk at the workshop on Logic-based Methods for Information Integration, Aug. 2003, Vienna, Austria, and a keynote talk at the Conference on Integrity and Internal Control in Information Systems, Nov. 2003, Lausanne, Switzerland. The notion of consistent query answer proposed in the course of this research has been applied and extended by research groups at 4 European and 1 Canadian universities.

Goals, Objectives and Targeted Activities

Together with J. Marcinkowski, Wroclaw University, Poland, the PI obtained a complete characterization of the computational complexity of consistent query answers (and related problems) for first-order queries and integrity constraints that are denial constraints, functional dependencies, and inclusion dependencies. They have studied the impact of the number of constraints, as well as the type and the size of the query on the complexity of this problem. They have identified several new tractable cases (compared to [5]) and proposed new algorithms. The results are reported in the paper [1], currently under submission to a journal. The PI has extended the algorithms proposed in [1] to form the foundation of HIPPO - a front-end system for the computation of consistent query answers. The first version of HIPPO, developed by Slawomir Staworko (a Ph. D. student supported by the current project), is operational. It uses the PostgresSQL back-end. Some preliminary experimental results have been obtained. Current work includes: optimizing HIPPO, broadening the class of queries that it can handle, experimental comparison with other approaches to the computation of consistent query answers.

In collaboration with several researchers, the PI has studied the computation of consistent query answers in the presence of distributed data sources [8]. The PI expanded (in collaboration with several researchers) the conference papers [3] and [6] to journal versions. The paper [2] studies the issue of computing consistent query answers for aggregation queries and functional dependencies. The paper [7] considers logical specification of repairs as answer sets of logic programs. Together with L. Bertossi, Carleton University, Ottawa, Canada, the PI has completed a survey [4] summarizing the up-to-date research on computing consistent query answers.

Area Background

The problem of obtaining consistent information from inconsistent databases has been studied in the context of multiple-source data integration [9], disjunctive databases [10], and conflict resolution [11]. For a more complete discussion, see the references [5] and [2]. Our approach is unique in that it is not limited to specific application scenarios or particular classes of constraints or queries, and addresses detailed computational problems.

Area References

[9] A. Motro. Multiplex: A Formal Model for Multidatabases and Its Implementation. Proc. NGITS'99, Springer-Verlag, LNCS 1649, pp. 138--158.

[10] S. Agarwal, A.M. Keller, G. Wiederhold, and K. Saraswat. Flexible Relation: An Approach for Integrating Data from Multiple, Possibly Inconsistent Databases. Proc. IEEE International Conference on Data Engineering, 1995.

[11] J. Lin and A. O. Mendelzon. Merging Databases under Constraints. International Journal of Cooperative Information Systems, 7(1):55--76, 1996.

Potential Related Projects

Data Integration and Cleaning, Query Answering Using Distributed Data Sources, Materialized Views, Semantic Query Optimization.

Project Websites


CQA: Consistent Query Answers.

Main project website.