Review Comment:
The paper presents a survey that systematically reviews the state of the art on the link maintenance problem in linked open data. The authors analysed the related literature and identified approaches and algorithms for detecting, fixing and updating broken/invalid links among and within datasets. They also provide a categorisation of existing approaches and define open challenges.
The paper starts by introducing the work and giving an overview of related work. In Section 2, the authors formally define the link maintenance problem. Their systematic literature review methodology is then described in Section 3, where the authors define a set of five research questions. In Section 4, the authors analyse the papers resulting from the systematic survey. In Section 5, the authors answer the research questions and present a set of open challenges related to the link maintenance problem. Finally, the paper is concluded in Section 7. The paper also includes one appendix, which presents more details about the paper collection process used in the systematic survey.
Suitability: I found the presented categorization and the open challenges to be the strong points of the paper. The open challenges introduced by the paper are a good starting point for PhD students.
Comprehensiveness: The presented work covers most of the related work, except with respect to the last research question; more details follow.
Readability: The paper is written in good English and is well structured, which makes it easy to follow the presented ideas.
Importance: The covered material is important to the Semantic Web community.
RQ-3 and RQ-4 overlap to some extent. In my opinion, having one RQ about linked data and the other about other data models would make things clearer.
I think that Section 4.8 misses a lot of related research in the field of link discovery, especially the machine learning algorithms for automatic link finding (see, for example, [1-3], among many others). The authors should consider such algorithms when answering RQ-5.
Section 4: I think that adding one final category for hybrid systems, which use more than one technique from the different categories introduced in Section 4, would complete the presented categorisation. See [5] for example. Also, the DSNotify framework introduced in the paper falls into this category.
I think that RQ-5 needs more investigation. I already know of some automatic techniques for link maintenance: some based on instance linking (e.g., [1-5]), some based on ontology matching ([5-7]), and many more.
[1] R. Isele, C. Bizer, Learning linkage rules using genetic programming, Proceedings of the 6th International Conference on Ontology Matching-Volume 814, pp. 13-24, 2011
[2] A.C.N. Ngomo, K. Lyko, Eagle: Efficient active learning of link specifications using genetic programming, Extended Semantic Web Conference, pp. 149-163, 2012
[3] M. A. Sherif, A.-C. Ngonga Ngomo, J. Lehmann, WOMBAT - A Generalization Approach for Automatic Link Discovery, 14th Extended Semantic Web Conference, Portoroz, Slovenia, 28th May - 1st June 2017
[5] F. M. Suchanek, S. Abiteboul, and P. Senellart. PARIS: Probabilistic Alignment of Relations, Instances, and Schema. PVLDB, 5(3), 2011
[6] Gal, Avigdor, et al. "Automatic ontology matching using application semantics." AI magazine 26.1 (2005): 21-21.
[7] Bühmann, Lorenz, Jens Lehmann, and Patrick Westphal. "DL-Learner—A framework for inductive learning on the Semantic Web." Journal of Web Semantics 39 (2016): 15-24.
Other comments:
Abstract: “... is considered an key process” → “... as a key process”
Introduction: Add a reference to the definition of an RDF triple in Section 2 the first time you mention it
Section 2, also in other places: “A RDF triple” → “an RDF triple”
Section 2: “an unique” → “a unique”
Section 2: “SameAs” → “sameAs”, also use \texttt{} for all property names
Section 4.3: “after a number of changes in an certain elapsed time” → “after a number of changes after a certain elapsed time”
Section 4.4, also in other places: “DBPedia” → “DBpedia”
Section 4.5: “According to Galani, Papastefanatos and Stavrakas (2016)” → “According to Galani et al. (2016)”
Section 4.5: “... changed; in addition, ...” → “... changed. In addition, …”
Section 4.5: “benefits with” → “benefits from”
Section 4.5: “increasing the level of abstraction of changes is proportionally related to the quantity of these types of changes, resulting in a greater level of abstraction.” ???
Section 4.6: “a ontology” → “an ontology”
Section 4.6, also in other places: “an URL” → “a URL”
Section 4.6: “was included” → “has included” / “includes”
Section 5: “state-of-the-art” → “state of the art”
Section 5: Define the A-box and the T-box
Section 5: “The benefits of having a RDF dataset with no or very few broken links include the increase of the trust in the consistency of the dataset increases and ...” → remove “increases”
Section 5: “... matter, so if a triple is moved to the end of the file that stores the triples, it cannot be computed.” → ambiguous “it”
Section 5: Use the same numbering for tasks in the list and in the text
Comments
A very relevant piece of related work is missing:
Unsupervised Link Discovery Through Knowledge Base Repair (http://svn.aksw.org/papers/2014/ESWC_COLIBRI/public.pdf) by Axel-Cyrille Ngonga Ngomo, Mohamed Ahmed Sherif and Klaus Lyko, in Extended Semantic Web Conference (ESWC 2014)