CSE 703 Data Quality

Registration #22316

Instructor

Dr. Jan Chomicki, Professor. Office hours: W 2-4 pm.

Time and location

Tue 5:30-8:00pm, Davis 338A.

Piazza

Class page

Talks

Date | Topics | Presenter | Bibliographic information
02/07/2017 | Conditional functional dependencies | Ning Deng
  1. Wenfei Fan, Floris Geerts, Xibei Jia, Anastasios Kementsietsidis: Conditional functional dependencies for capturing data inconsistencies. ACM Trans. Database Syst. 33(2) (2008)
  2. Wenfei Fan, Floris Geerts, Jianzhong Li, Ming Xiong: Discovering Conditional Functional Dependencies. IEEE Trans. Knowl. Data Eng. 23(5): 683-698 (2011)
02/14/2017 | Record matching and data repairing | Ladan Golshanara
  1. Wenfei Fan, Shuai Ma, Nan Tang, Wenyuan Yu: Interaction between Record Matching and Data Repairing. J. Data and Information Quality 4(4): 16:1-16:38 (2014)
02/21/2017 | Data cleaning | Meghana Ananth Gad, Poonam Kumari
  1. Venkatesh Ganti, Anish Das Sarma: Data Cleaning: A Practical Perspective. Synthesis Lectures on Data Management, Morgan & Claypool Publishers 2013, ISBN 9781608456772, pp. 1-85

02/28/2017 | Data quality of temporal records and streams | Deepti Chavan, Sushmita Sinha
  1. Furong Li, Mong-Li Lee, Wynne Hsu, Wang-Chiew Tan: Linking Temporal Records for Profiling Entities. SIGMOD Conference 2015: 593-605
  2. Tamraparni Dasu, Rong Duan, Divesh Srivastava: Data Quality for Temporal Streams. IEEE Data Eng. Bull. 39(2): 78-92 (2016)

03/28/2017 | Crowdsourcing | Pruthvi Mulagala, Vaibhav Sinha
  1. Guoliang Li, Jiannan Wang, Yudian Zheng, Michael J. Franklin: Crowdsourced Data Management: A Survey. IEEE Trans. Knowl. Data Eng. 28(9): 2296-2319 (2016).
  2. Michael J. Franklin, Donald Kossmann, Tim Kraska, Sukriti Ramesh, Reynold Xin: CrowdDB: answering queries with crowdsourcing. SIGMOD Conference 2011: 61-72.
04/04/2017 | Crowdsourcing Algorithms for Entity Resolution | Jay Narendra Shah
  1. Norases Vesdapunt, Kedar Bellare, Nilesh N. Dalvi: Crowdsourcing Algorithms for Entity Resolution. PVLDB 7(12): 1071-1082 (2014).
04/04/2017 | Causality in databases | George Gunner
  1. Alexandra Meliou, Wolfgang Gatterbauer, Katherine F. Moore, Dan Suciu: The Complexity of Causality and Responsibility for Query Answers and non-Answers. PVLDB 4(1): 34-45 (2010).
  2. Alexandra Meliou, Wolfgang Gatterbauer, Joseph Y. Halpern, Christoph Koch, Katherine F. Moore, Dan Suciu: Causality in Databases. IEEE Data Eng. Bull. 33(3): 59-67 (2010).
04/18/2017 | Crowdsourcing queries | Prashanth Seralathan, Shreya Ravi Kumar
  1. Susan B. Davidson, Sanjeev Khanna, Tova Milo, Sudeepa Roy: Using the crowd for top-k and group-by queries. ICDT 2013: 225-236.
  2. Lei Chen, Cyrus Shahabi: Spatial Crowdsourcing: Challenges and Opportunities. IEEE Data Eng. Bull. 39(4): 14-25 (2016)
04/25/2017 | Stream data cleaning | Rajeev Vaswani
  1. Shaoxu Song, Aoqian Zhang, Jianmin Wang, Philip S. Yu: SCREEN: Stream Data Cleaning under Speed Constraints. SIGMOD Conference 2015: 827-841
04/25/2017 | Approximate entity extraction | Anuradha Ashavatha Rao
  1. Wei Wang, Chuan Xiao, Xuemin Lin, Chengqi Zhang: Efficient approximate entity extraction with edit distance constraints. SIGMOD Conference 2009: 759-770.
05/02/2017 | Traffic monitoring and management | Himal Dwarakanath, Neeharika Nelaturu
  1. Nikolaos Zygouras, Nikos Zacheilas, Vana Kalogeraki, Dermot Kinane, Dimitrios Gunopulos: Insights on a Scalable and Dynamic Traffic Management System. EDBT 2015: 653-664.
  2. Nikolaos Panagiotou, Nikolas Zygouras, Ioannis Katakis, Dimitrios Gunopulos, Nikos Zacheilas, Ioannis Boutsis, Vana Kalogeraki, Stephen Lynch, Brendan O'Brien: Intelligent Urban Data Monitoring for Smart Cities. ECML/PKDD (3) 2016: 177-192.
05/02/2017 | Data fusion | Arun Sharma
  1. Jens Bleiholder, Felix Naumann: Data fusion. ACM Comput. Surv. 41(1): 1:1-1:41 (2008)
05/02/2017 | Finding related tables | Shad Ullah Khan
  1. Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Y. Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Cong Yu: Finding related tables. SIGMOD Conference 2012: 817-828
05/09/2017 | Cleaning Urban Data | Omkar Guruprasad Neogi
  1. Juliana Freire, Aline Bessa, Fernando Chirigati, Huy T. Vo, Kai Zhao: Exploring What not to Clean in Urban Data: A Study Using New York City Taxi Trips. IEEE Data Eng. Bull. 39(2): 63-77 (2016)
05/09/2017 | Querying Raw Data Files | Mythri Jonnavittula, Barath Eswer Nagasubramaniyan
  1. Ioannis Alagiannis, Renata Borovica, Miguel Branco, Stratos Idreos, Anastasia Ailamaki: NoDB: efficient query execution on raw data files. SIGMOD Conference 2012: 241-252.
05/09/2017 | Data errors | Senthil Kumar Laguduva Yadindra Kumar
  1. Ziawasch Abedjan, Xu Chu, Dong Deng, Raul Castro Fernandez, Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Michael Stonebraker, Nan Tang: Detecting Data Errors: Where are we and what needs to be done? PVLDB 9(12): 993-1004 (2016).

Resources

To access the papers in the UB digital library, you may need to use the proxy server and reload the appropriate page:
 http://libweb.lib.buffalo.edu/help/help.asp?ID1=442
Many papers can be found on the authors' web pages or retrieved from dblp.

Workload

  1. Prepare and present a talk based on one or more papers from the current computer science literature (I will distribute the papers and help with the presentation).
  2. The presentations will be problem-oriented, not paper-oriented. It may be necessary to read more than one paper and/or split the work with another presenter. For example, one presenter may give a general introduction to the area and the other present a specific approach in depth.
  3. Prepare a report based on the same material.
  4. Attend all the classes and participate in the discussions.
  5. There may also be presentations by the instructor and/or invited speakers.

Prerequisites

Required background: a course in databases. Some knowledge of logic or knowledge representation is helpful.

Grading

The seminar is graded S/U and can be taken for 3 credits. An implementation project is a possibility: see the instructor.

Topics