Large-scale Data Integration Using Graph Probabilistic Dependencies (GPDs)
Conference item
Authors | Zada, Muhammad Sadiq Hassan; Yuan, Bo; Anjum, Ashiq; Azad, Muhammad Ajmal; Khan, Wajahat Ali; Reiff-Marganiec, Stephan |
---|---|
Abstract | The diversity and proliferation of knowledge bases have made data integration one of the key challenges in the data science domain. Imperfect representations of entities, particularly in graphs, pose additional challenges for data integration. Graph dependencies (GDs) have been investigated in existing studies for data integration and the maintenance of data quality on graphs. However, the majority of graphs contain many duplicates with high diversity. Consequently, the existence of dependencies over these graphs becomes highly uncertain. In this paper, we propose graph probabilistic dependencies (GPDs), a novel class of dependencies for graphs, to address the issue of uncertainty over these large-scale graphs. GPDs can provide a probabilistic explanation for dealing with uncertainty while discovering dependencies over graphs. Furthermore, a case study is provided to verify the correctness of the data integration process based on GPDs. Preliminary results demonstrate the effectiveness of GPDs in reducing redundancies and inconsistencies over the benchmark datasets. |
Keywords | data integration; information retrieval; graph probabilistic dependencies |
Year | 2020 |
Journal | 2020 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT) |
Publisher | IEEE |
Digital Object Identifier (DOI) | https://doi.org/10.1109/bdcat50828.2020.00028 |
Web address (URL) | http://hdl.handle.net/10545/625607 |
http://creativecommons.org/licenses/by-nc-sa/4.0/ | |
ISBN | 9780738123967 |
File | File Access Level Open |
Publication dates | 28 Dec 2020 |
Publication process dates | |
Deposited | 08 Feb 2021, 15:54 |
Accepted | 30 Oct 2020 |
Rights | Attribution-NonCommercial-ShareAlike 4.0 International |
Contributors | University of Derby and University of Leicester |
https://repository.derby.ac.uk/item/93612/large-scale-data-integration-using-graph-probabilistic-dependencies-gpds