Conflicts to Harmony: Integrating Massive Data by Trustworthiness Estimation and Truth Discovery

NSF IIS-1319973
Principal Investigator

Students
  • Qi Li. PhD Student.
  • Yaliang Li. PhD Student.
  • Houping Xiao. PhD Student.
  • Chuishi Meng. PhD Student.
  • Fenglong Ma. PhD Student.
  • Ben Reid. Undergrad Student.
  • Stephanie Richter. Undergrad Student.
  • Tri Nguyen. Undergrad Student.
  • DeSean Abraham. Undergrad Student.

Award Information

This website is based upon work supported by the National Science Foundation under Grant No. IIS-1319973, collaborative with NSF IIS-1320617. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


Project Background and Goals

Big data brings big challenges, not only in the volume of data but also in its dynamics and variety. Multiple descriptions of the same set of objects or events from different sources unavoidably lead to inconsistent data or information. Among conflicting pieces of data or information, it is crucial to tell which data source is reliable and which piece of information is correct. Accurate information is referred to as the truth, and the chance of a source providing accurate information is referred to as source reliability or trustworthiness. The objective of this project is to detect truths without supervision by integrating source reliability estimation and truth finding. A unified framework is developed to capture complex trustworthiness factors in truth discovery from multiple conflicting sources of heterogeneous, streaming, and large-scale data.
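
To make the interplay between source reliability estimation and truth finding concrete, the following is a minimal Python sketch of a generic iterative truth discovery loop for categorical claims. It is an illustration only, not the framework developed in this project: the function name discover_truths, the smoothed error rate, the negative-log weighting, and the toy data are all assumptions chosen for brevity.

```python
import math
from collections import defaultdict


def discover_truths(claims, iterations=10):
    """Alternate between truth finding and source reliability estimation.

    claims maps source -> {object: claimed categorical value}.
    Returns (truths, weights). Hypothetical helper for illustration only.
    """
    weights = {source: 1.0 for source in claims}  # assume equal reliability at first
    truths = {}
    for _ in range(iterations):
        # Truth step: each object takes the value with the largest weighted vote.
        votes = defaultdict(lambda: defaultdict(float))
        for source, observations in claims.items():
            for obj, value in observations.items():
                votes[obj][value] += weights[source]
        truths = {obj: max(tally, key=tally.get) for obj, tally in votes.items()}

        # Reliability step: sources that disagree with the current truths
        # get a higher error rate and therefore a lower weight.
        for source, observations in claims.items():
            wrong = sum(1 for obj, value in observations.items() if truths[obj] != value)
            error_rate = (wrong + 1) / (len(observations) + 2)  # smoothed, always in (0, 1)
            weights[source] = -math.log(error_rate)
    return truths, weights


# Toy data (made up): three sources describe the same three objects.
claims = {
    "s1": {"x": "red", "y": "cat", "z": "7"},
    "s2": {"x": "red", "y": "cat", "z": "7"},
    "s3": {"x": "blue", "y": "dog", "z": "7"},
}
truths, weights = discover_truths(claims)
print(truths)
print(weights)
```

In this toy run, the two sources that agree with each other end up with higher weights than the third, so their claims dominate the weighted vote. No labeled data is used at any point, which is the sense in which the estimation is unsupervised.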


Project Impact

This project makes tangible contributions to data integration, information understanding, and decision making, and benefits many applications in which critical decisions must be made based on correct information extracted from diverse sources. Research results of this project are integrated into course materials and projects, and into the training of students and a new generation of researchers, especially female and minority students.


Publications

KDD16

Houping Xiao, Jing Gao, Zhaoran Wang, Shiyu Wang, Lu Su, Han Liu. A Truth Discovery Approach with Theoretical Guarantee. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, August 2016, to appear. Acceptance Rate: 142/784 = 18.1%. [Paper in PDF]

KDD16

Houping Xiao, Jing Gao, Qi Li, Fenglong Ma, Lu Su, Yunlong Feng, Aidong Zhang. Towards Confidence in the Truth: A Bootstrapping based Truth Discovery Approach. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, August 2016, to appear. [Paper in PDF]

SoCG16

Hu Ding, Jing Gao, Jinhui Xu. Finding Global Optimum for Truth Discovery: Entropy Based Geometric Variance. International Symposium on Computational Geometry, Boston, MA, June 2016, 34:1-34:16.

TKDE

Yaliang Li, Qi Li, Jing Gao, Lu Su, Bo Zhao, Wei Fan, Jiawei Han. Conflicts to Harmony: A Framework for Resolving Conflicts in Heterogeneous Data by Truth Discovery. IEEE Transactions on Knowledge and Data Engineering, accepted, March 2016.

KDD16

Mengting Wan, Xiangyu Chen, Lance Kaplan, Jiawei Han, Jing Gao, Bo Zhao. From Truth Discovery to Trustworthy Opinion Discovery: An Uncertainty-Aware Quantitative Modeling Approach. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, August 2016, to appear.

Survey

Yaliang Li, Jing Gao, Chuishi Meng, Qi Li, Lu Su, Bo Zhao, Wei Fan, Jiawei Han. A Survey on Truth Discovery. SIGKDD Explorations, 17(2): 1-16, December 2015. [Paper]

SenSys15

Chuishi Meng, Wenjun Jiang, Yaliang Li, Jing Gao, Lu Su, Hu Ding, Yun Cheng. Truth Discovery on Crowd Sensing of Correlated Entities. ACM International Conference on Embedded Networked Sensor Systems, Seoul, South Korea, November 2015, 169-182. Acceptance Rate: 27/132 = 20.5%.

SenSys15

Chenglin Miao, Wenjun Jiang, Lu Su, Yaliang Li, Suxin Guo, Zhan Qin, Houping Xiao, Jing Gao, Kui Ren. Cloud-Enabled Privacy-Preserving Truth Discovery in Crowd Sensing Systems. ACM International Conference on Embedded Networked Sensor Systems, Seoul, South Korea, November 2015, 183-196. Acceptance Rate: 27/132 = 20.5%.

VLDB15

Qi Li, Yaliang Li, Jing Gao, Lu Su, Bo Zhao, Murat Demirbas, Wei Fan, Jiawei Han. A Confidence-Aware Approach for Truth Discovery on Long-Tail Data. International Conference on Very Large Data Bases, Kohala Coast, HI, August 2015, 8(4): 425-436.

KDD15

Yaliang Li, Qi Li, Jing Gao, Lu Su, Bo Zhao, Wei Fan, Jiawei Han. On the Discovery of Evolving Truth. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, August 2015, 675-684.

KDD15

Fenglong Ma, Yaliang Li, Qi Li, Minghui Qiu, Jing Gao, Shi Zhi, Lu Su, Bo Zhao, Heng Ji, Jiawei Han. FaitCrowd: Fine Grained Truth Discovery for Crowdsourced Data Aggregation. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, August 2015, 745-754.

KDD15

Shi Zhi, Bo Zhao, Wenzhu Tong, Jing Gao, Dian Yu, Heng Ji, Jiawei Han. Modeling Truth Existence in Truth Discovery. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, August 2015, 1543-1552.

SDM15

Houping Xiao, Yaliang Li, Jing Gao, Fei Wang, Liang Ge, Wei Fan, Long Vu, Deepak Turaga. Believe It Today or Tomorrow? Detecting Untrustworthy Information from Dynamic Multi-Source Data. SIAM International Conference on Data Mining, Vancouver, Canada, April 2015, 397-405.

SDM15

Bowen Dong, Sihong Xie, Jing Gao, Wei Fan, Philip S. Yu. OnlineCM: Real-time Consensus Classification with Missing Values. SIAM International Conference on Data Mining, Vancouver, Canada, April 2015, 685-693.

KDD14

Sihong Xie, Jing Gao, Wei Fan, Deepak Turaga, Philip S. Yu. Class-Distribution Regularized Consensus Maximization for Alleviating Overfitting in Model Combination. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, August 2014, 303-312. Acceptance Rate: 151/1036 = 14.6%. [Paper in PDF] [BIBTEX]

IAAI14

Bahadir Aydin, Yavuz Yilmaz, Yaliang Li, Qi Li, Jing Gao, Murat Demirbas. Crowdsourcing for Multiple-Choice Question Answering. Annual Conference on Innovative Applications of Artificial Intelligence, Quebec City, Canada, July 2014, 2946-2953. [Paper in PDF]

SIGMOD14

Qi Li, Yaliang Li, Jing Gao, Bo Zhao, Wei Fan, Jiawei Han. Resolving Conflicts in Heterogeneous Data by Truth Discovery and Source Reliability Estimation. ACM SIGMOD International Conference on Management of Data, Snowbird, UT, June 2014, 1187-1198. [Paper in PDF] [Code&Data in ZIP] [More Information] [BIBTEX]


Courses

Code & Dataset
  • CATD: "A Confidence-Aware Truth Discovery Approach" in [VLDB15]

Resources

Last updated: June 2014.