Review Comment:
This paper presents a comprehensive overview of Link Discovery (LD) frameworks. It is very instructive and suitable as an introductory text for researchers, PhD students or practitioners getting started on the topic.
The authors categorised and explained research works based on a general workflow of LD frameworks that includes several phases: configuration, pre-processing, matching and post-processing. The different methods used in each phase by the different frameworks are specified and works are compared based on these phases and on the methods used. Works are also compared based on reported evaluations.
An interesting aspect of this paper is the list of requirements presented in Section 2. The authors specify five main requirements for LD frameworks. However, something that I am missing in this paper, and that I think would be really valuable, is a discussion of which frameworks best fit each of these requirements and under which situations or scenarios it would be more adequate to use one framework over the others. For example, if online LD is required, and effectiveness should be prioritised over efficiency, which of the listed frameworks should I use?
The other issue that I would recommend the authors address (whenever possible) is the “?” -> unclear from publication. I recommend contacting the authors of the unclear papers to clarify whether their frameworks fit the selected criteria. This would make the paper more complete.
Regarding specific issues with respect to some sections:
Section 2: “Most entity resolution approaches focus on homogeneous datasets … By contrast the resources for LD can be heterogeneous and highly interrelated”. Any reference to support this criticism?
The LD process usually involves an ontology and instance -> and an instance
Section 2.2
Simple match techniques -> matching techniques
Semantic neighbourhood of an resource -> a resource
Section 3
Workflows which consists -> consist
Section 4.1
This statement obviously does not hold for the framworsks -> frameworks
This in strong contrast -> this is in strong contrast
This section mentions that dictionaries are used for ontology matching but not so much for linking instance data. Any suggestions or insights as to why? Have dictionaries been tested and found not very useful for linking instance data? A reference discussing this issue, or any insights from the authors, would be desirable.
Table 3
(*) The legend “not in current release” is applicable to all frameworks that do not include MapReduce. I understand that the authors may be working on adding this element to LIMES. Indeed, it should be, according to reference [17]. So if it is available at the time of publication, please add it to your table; otherwise, I would advise modifying the legend to “investigated [17] but not available as part of the current release”.
“space tiling” is mentioned as a filtering mechanism of LIMES but is not explained in the paper. Please add a description of this filtering mechanism.
Section 4.6
This section mentions the use of distributed computing as a beneficial element for obtaining high efficiency and scalability. However, distributed computing may not be necessary for all scenarios, particularly when the datasets are small. A discussion of the situations/criteria in which distributed computing may be useful (e.g., size of the datasets bigger than X) would make this section more useful.
Section 4.9
This section mentions that “the high potential of utilising existing links and mappings as well as other data sources or dictionaries as background knowledge has not yet been explored”. A similar comment about the use of dictionaries was made in Section 4.1. While reading this, I wondered why this is the case, and whether there are already studies that have applied them and found them useful. If so, please add the corresponding references here. Also, learning from links generated in one domain may not help to discover links for a different domain… A discussion explaining the potential benefits/drawbacks of using these resources, and how this can be a future research direction for LD, would be desirable.
An additional brief discussion that may be interesting to add to the paper is an overview of the different areas where LD has been investigated, so that the list of studied frameworks can be positioned within the LD literature. For example, LD over relational databases, online LD performed by semantic search / question answering applications, etc.