Review Comment:
The paper presents a detailed analysis of the learning capabilities of the family of RDF2vec embeddings.
ORIGINALITY
The paper is an extension of three previously published workshop and conference papers by the same authors. In particular, the new parts are the hypotheses in Section 5 and their discussion in Section 7.4, along with a comprehensive suite of experiments. While parts were published before, the paper as a whole is an interesting summary of a very long and prominent research endeavor, and thus I think it is suitable for publication in an archival journal.
SIGNIFICANCE OF THE RESULTS
The results are interesting from both an applied and a (semi-)theoretical perspective. The paper can be used by a practitioner to make an informed choice of which RDF2vec variant to use. The DLCC and the benchmarks arising from it are a very valuable contribution, enabling the comparison of different embeddings in a systematic manner along multiple semantic dimensions instead of on entangled, downstream datasets. Especially due to this contribution, I expect the paper to be highly cited.
CORRECTNESS
I have no major complaints regarding correctness; however, there are some aspects that require clarification or an extension of the proposed framework:
* It is unclear to me why DLs are brought into the picture at all. Are the proposed constructors exhaustive in any sense relevant to DLs, e.g., do they cover all possible expressions of depth 1 that can be constructed in ALC?
* This is very subjective, but I also think DLs are not a suitable formalism, since you are dealing with graphs, not ontologies. There is no guarantee that the graph even has a notion of the top concept. It seems to me that SPARQL BGPs would be a much better formalism: you would stay in the graph world, you would not introduce possibly foreign concepts, you would avoid the somewhat odd-looking mix of DLs and low-level RDF in Eq. (13), and you would not need to explain how DL expressions are used to query a SPARQL endpoint in Section 6.1 (see the first sketch after this list for what I have in mind).
* I am not convinced "hypothesis" is a good term for what you have proposed in the paper. For example, compare Hypothesis 6a with the hypotheses posed in Section 3.1 of [1]. To me, it sounds quite similar to some of them. I would suggest either making them more formal or using a different, less loaded word.
* Section 6.2: Why those particular six classifiers? I suspect each is from a different family, but the selection contains both decision trees and random forests; in any case, the argument should be given explicitly instead of being left for the reader to guess.
* Section 6.4: Algorithm 1 reads like LUBM [2]. Some argument should be given as to why we need yet another generator, especially since it seems to be very rigid, e.g., it always generates a class hierarchy in the form of a balanced tree.
* Section 6.4: Why is resembling DBpedia important? Are the statistical properties of DBpedia representative of many KGs, or is there another reason?
* Section 7.2: You draw conclusions about one variant being better than another on a suite of datasets. This seems to call for a statistical test, e.g., the Friedman test, followed by paired t-tests with the necessary corrections for multiple comparisons (see the second sketch after this list).
* Section 7.3: It is not clear to me why a one-sided binomial significance test is a good choice. At the very least, I would expect you to report the null hypothesis and the alternative hypothesis (see the third sketch after this list).
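To make the BGP suggestion concrete, here is a minimal sketch of what I have in mind, assuming a depth-1 constructor of the form \exists r.\top (entities with at least one outgoing r-edge); the endpoint URL and the property IRI are placeholders, not taken from the paper:

```python
# Minimal sketch: the positive examples for a depth-1 constructor such as
# "exists author.Top" are exactly the bindings of ?x in a one-triple BGP,
# so no translation from DL to SPARQL is needed in the first place.
# Endpoint URL and property IRI are placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://dbpedia.org/sparql")  # placeholder endpoint
endpoint.setQuery("""
    SELECT DISTINCT ?x WHERE {
        ?x <http://dbpedia.org/ontology/author> ?y .  # BGP for "exists author.Top"
    }
    LIMIT 1000
""")
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()
positives = [b["x"]["value"] for b in results["results"]["bindings"]]
```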
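Regarding the statistical testing in Section 7.2, this is a sketch of the procedure I am suggesting, assuming a (datasets x variants) matrix of scores; the variable names and numbers are hypothetical:

```python
# Sketch of the suggested procedure: an omnibus Friedman test over all
# variants, followed by pairwise paired t-tests with a Bonferroni
# correction. The score matrix below is a random placeholder.
from itertools import combinations
import numpy as np
from scipy.stats import friedmanchisquare, ttest_rel

scores = np.random.rand(20, 4)  # placeholder: 20 datasets x 4 RDF2vec variants

# Omnibus test: do the variants differ at all across the datasets?
stat, p = friedmanchisquare(*[scores[:, j] for j in range(scores.shape[1])])
print(f"Friedman: chi2={stat:.3f}, p={p:.4f}")

# Only if the omnibus test rejects: pairwise comparisons, corrected for
# the number of pairs tested.
pairs = list(combinations(range(scores.shape[1]), 2))
for i, j in pairs:
    _, p_pair = ttest_rel(scores[:, i], scores[:, j])
    print(f"variant {i} vs {j}: corrected p={min(p_pair * len(pairs), 1.0):.4f}")
```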
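And for the binomial test in Section 7.3, this is the kind of explicit formulation I would expect, with placeholder counts: H0: p <= 0.5 (the model does no better than chance) against H1: p > 0.5, which is what would justify the one-sided variant:

```python
# Sketch with placeholder counts: H0: p <= 0.5 (accuracy no better than
# chance), H1: p > 0.5; "greater" makes the one-sided direction explicit.
from scipy.stats import binomtest

n_correct, n_total = 61, 100  # placeholder counts
result = binomtest(n_correct, n_total, p=0.5, alternative="greater")
print(f"one-sided binomial test: p={result.pvalue:.4f}")
```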
[1] Abraham Bernstein and Natasha Noy, "Is This Really Science? The Semantic Webber’s Guide to Evaluating Research Contributions", https://www.merlin.uzh.ch/contributionDocument/download/6915
[2] http://swat.cse.lehigh.edu/projects/lubm/
QUALITY OF WRITING
Overall, the paper reads well. However, there are some aspects that seem vague and/or inconsistent.
* Section 3.1: It is not clear if w0 is a distinguished node and a random walk spans in both directions from it, or if the elements of the vectors are indexed from -n/2 to n/2 just to confuse the reader. If the former, then it is inconsistent with Figure 1. If the latter, please don't.
* Eq. (3) represents a vector of length n/2 (since the indices are incremented by 2). Is it still a random walk of length n?
* Section 3.2: The names that CBOW and SG stand for should be given, since for a reader not intimately familiar with RDF2vec it is unclear what the original configurations were.
* You usually use the abbreviation DLCC, which stands for Description Logic Class Constructors, but Section 5's title is "DL Constructors". This is inconsistent; either unify the naming or explain the difference properly.
* Section 5/Eq. (9): There's a dot missing after \exists R_2^{-1}
* Section 5/Cardinality restrictions: "the corresponding decision problem is between the two variants" What do you mean?
* Section 5.1: This section is oddly short, and since it is the only subsection of Section 5, I would remove it. In my opinion, Table 2 is very hard to read and rather pointless, since the hypotheses without the experimental results are not that interesting.
* Section 6, introduction: What does it mean that a gold standard is "officially published"?
* Section 6.3: The notion of hard negatives requires an explicit definition, giving an example is not sufficient.
* I would make Tables 3-5 an appendix since they are quite large and not that important, but that's only a suggestion.
* Section 7.3/Figure 4: It is not clear to me what exactly is depicted in Figure 4. How do you compute complexity? Offer a formula or an algorithm.
* Table 6 (and later): What exactly is ACC? Accuracy?
* The penultimate paragraph of Section 7.3: "(...) most models are not actually learning the description logic constructor but instead are picking up cross-correlations very well." This is unclear to me. I understand the example, but I don't understand the generalization.
REPRODUCIBILITY
There seems to be no "Long-term stable URL for resources". The paper itself references multiple resources, both on GitHub and Zenodo. Based on the claims in the paper, it seems to me they should be sufficient to reproduce the results. However, for the final version of the paper, I would recommend creating a single resource containing all the necessary code and datasets, along with a README on how to reproduce the results presented in the paper.