Review Comment:
The paper is an extension of [2] and aims to "identify a set of meaningful, efficient, and non-redundant [(RDF) graph] measures, for the goal of describing RDF graph topologies more accurately". The authors further define that an "efficient" measure should be discrete with respect to other measures and should add value in describing an RDF graph (in comparison to other RDF graphs). The authors rely on two types of measures: general graph measures taken from [2] and RDF graph measures from [3]. The authors introduce the different measures before answering three research questions:
- Which measures does the set M' of efficient measures for characterizing RDF graphs comprise?
- Which subset of M' (M''_c) characterizes RDF graphs of a certain domain c?
- Which of the measures in M' show the best performance in classification tasks that aim to discriminate RDF datasets with respect to their domains?
The authors answer these three questions empirically using 280 RDF datasets of the LOD cloud. They find 29 out of 54 evaluated measures to be efficient (i.e., these measures are within M'). Of these, 13 measures are identified as having an impact on distinguishing datasets of different domains from each other. In addition, the most important features per domain are determined.
=== Positive aspects
+ The research questions the paper focuses on are very important for various research fields related to RDF graphs.
+ The approach the authors apply makes sense to me. Of course, some details might be arguable. However, it is a complex work and this automatically leads to a lot of different possibilities and decisions that have to be made by the authors.
+ The article is an extension of [2]. However, the authors clearly distinguish the two articles from each other and list the extensions made.
+ The insights the authors point out are valuable and can become important for the community.
+ It is very good that the authors exclude the three domains that had a low number of RDF datasets from their per-domain experiments.
+ The authors made the detailed analysis results as well as the framework they used for the analysis available.
+ The paper is well written.
+ The paper is a revision of swj2446. From my point of view, the authors fixed all the issues that have been identified or gave good arguments why they won't follow a reviewer's suggestion.
=== Writing Style
The paper is well written.
- Page 6, right column, line 33: It is preferable to use footnotes at the end of the sentence (unless there are several footnotes within a single sentence). At the moment, an (inattentive) reader could misunderstand the "$C_d$\footnote" as $C_d^6$. I would suggest writing it as "$C_d$.\footnote".
- Table 4, footnote: "Compressed archive containing multiple RDF files which need to be merged" --> Either there is a comma missing ("... files, which need ...") or "that" should be used ("... files that need"). In this case, I would suggest the latter solution since the relative clause is important to define the word "files".
- Table 5: The table is formatted slightly differently from the others (i.e., the top and bottom lines are missing).