Mining Reliable Information from Crowdsourced Data

NSF IIS-1553411
Principal Investigator

Students
Award Information

This website is based upon work supported by the National Science Foundation under Grant No. IIS-1553411. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


Project Summary

With the proliferation of mobile devices and social media platforms, any person can publicize observations about any activity, event or object anywhere and at any time. The confluence of this enormous volume of crowdsourced data can contribute to an inexpensive, sustainable and large-scale decision system that has never been possible before. Such a system could vastly improve the efficiency and reduce the cost of transportation, healthcare, and many other applications. The main obstacle in building such a system is information veracity: individual users might provide unreliable or even misleading information. This project identifies important research questions in the task of mining reliable information from noisy and unreliable crowdsourced data, and pursues an integrated research and education plan to address these questions. By integrating data from various sources, this project addresses information veracity, which will benefit the many applications where crowdsourced data are ubiquitous but their veracity can be suspect.

In particular, this project develops novel methods to mine reliable information by taking into consideration various properties of crowdsourcing: 1) Crowdsourcing platforms collect users' observations about certain objects. Other valuable information sources, such as spatio-temporal data, user influence, and textual data, are leveraged to effectively detect reliable information from these observations. 2) Effective privacy protection and budget allocation mechanisms are designed to better motivate active participation in crowdsourcing. These investigations are integrated with the exploration of both theoretical and practical aspects of the proposed methods. From the theoretical perspective, fundamental questions regarding the confidence in the estimated reliability and the convergence of the proposed methods are explored. From the practical perspective, the proposed methods are adapted to tackle challenging problems in various applications such as transportation, healthcare and education to enable new insights into these domains. In addition to the research advances, this project contributes to educational innovation, as the proposed methods are applied to educational methodologies such as peer assessment and question answering.


Publications

WSDM17

Yaliang Li, Nan Du, Chaochun Liu, Yusheng Xie, Wei Fan, Qi Li, Jing Gao, Huan Sun. Reliable Medical Diagnosis from Crowdsourcing: Discover Trustworthy Answers from Non-Experts. ACM International Conference on Web Search and Data Mining, Cambridge, UK, February 2017, 253-261. Acceptance Rate: 80/505 = 16%.

CIKM16

Hengtong Zhang, Qi Li, Fenglong Ma, Houping Xiao, Yaliang Li, Jing Gao, Lu Su. Influence-Aware Truth Discovery. ACM International Conference on Information and Knowledge Management, Indianapolis, IN, October 2016, 851-860. Acceptance Rate: 165/935 = 17.6%.

TBD

Yaliang Li, Chaochun Liu, Nan Du, Wei Fan, Qi Li, Jing Gao, Chenwei Zhang, Hao Wu. Extracting Medical Knowledge from Crowdsourced Question Answering Website. IEEE Transactions on Big Data, accepted, September 2016.

KDD16

Houping Xiao, Jing Gao, Zhaoran Wang, Shiyu Wang, Lu Su, Han Liu. A Truth Discovery Approach with Theoretical Guarantee. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, August 2016, 1925-1934. Acceptance Rate: 142/784 = 18.1%. [Paper in PDF]

KDD16

Houping Xiao, Jing Gao, Qi Li, Fenglong Ma, Lu Su, Yunlong Feng, Aidong Zhang. Towards Confidence in the Truth: A Bootstrapping based Truth Discovery Approach. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, August 2016, 1935-1944. Acceptance Rate: 142/784 = 18.1%. [Paper in PDF]


Courses

Last updated: April 2017.