Review Comment:
This paper describes an OLAP-style approach to modelling contextual knowledge graphs. Much like in traditional OLAP settings, the authors consider a multidimensional cube in which slice-and-dice and roll-up operations are supported. In traditional OLAP settings, the cells of this cube typically contain numeric values. Here, the cells contain knowledge graphs. The authors propose that the dimensions of the cube can then serve as contextual dimensions, thus using an OLAP-style representation to manage context for knowledge graphs. The authors first define and formalise their "KG-Cube" model, covering the overall schema of the cube, the dimensions and levels, and the cells themselves. They then define the "knowledge modules" contained in each cell, which are described using an "object language" (examples are defined using DL syntax). Thereafter they describe the main query operations considered: two contextual operations (slice-and-dice, and merge, i.e., roll-up) and three graph operations (abstraction, pivoting and reification). The authors then briefly describe a proof-of-concept implementation using an off-the-shelf SPARQL store (GraphDB). Using this implementation, they present experiments over artificially generated data for an air-traffic-control use-case, for which SPARQL translations of the key operations defined earlier are given. In general, the results show a linear relation between the cost of the principal operations and the size of the data.
The paper is clearly relevant to the journal and tackles an important problem (managing contextual data) with an interesting and technical approach. The paper is well-written. It has a good balance of motivation (the air-traffic-control use-case is very appealing), intuition, formal definitions, abstract examples and concrete examples. I found the approach itself (pleasantly) surprising. I was expecting another graph-esque representation of OLAP, but an OLAP cube whose cells contain graphs was genuinely new to me and caught my attention. The authors also establish an interesting relation between OLAP and context. Though obvious once pointed out, I had not seen it before, and for me it is a valuable observation. I also appreciate the provision of experiments for the various operators.
Overall, I like the paper quite a lot!
There are, however, some (minor-ish) points to improve upon:
* The paper never directly addresses the issue of incompleteness. Incompleteness is one of the characteristic features of knowledge-graph-style applications, though perhaps not of the specific use-case chosen. For example, what could be done if some dimension members are not known for a particular cell? This might be something for future work. Still, I think the paper would benefit from some discussion of which completeness assumptions are made and how incompleteness could be handled (either now or in the future).
* I found the mix of expressivity levels a bit distracting. In Figure 6, K0 appears to be pure RDFS. K1, however, uses some more expressive DL constructs, such as datatype facets, and these are not supported in OWL RL (Section 3.2.3). The expressiveness then drops again in the experiments, where only RDFS is considered. I understand that this is not central to the objectives of the paper, but ideally this switching of expressiveness could be cleaned up somehow.
* Some of the discussion could perhaps be made more concise. Examples 9 and 10 are a particular case: they are, respectively, the second and third examples of abstraction. I am not sure what they add beyond Example 8, which already illustrates all of the different types of abstraction. I would suggest one of two options. Either clarify at the start what each example additionally contributes, so the reader knows what to look out for. Or, if they are just further instances of the same idea, remove it/them.
* Regarding Definition 5 and the merge operators, since RDF is used as the serialisation format, and since reification steps are considered, I think it would be worthwhile to mention something about blank nodes, either to simply say that the framework does not consider them (e.g., considers that they have been skolemised to constants beforehand), or to add the RDF merge operator [https://www.w3.org/TR/rdf11-mt/#shared-blank-nodes-unions-and-merges] as a merge option.
* The appendix has some useful material (e.g., to get an idea of the queries), but it is not included at the end of the paper, and it took me some time to find it. I understand that it is very long (admittedly I only skimmed it for this reason). Maybe part of it could be included at the end of the paper, with the rest posted online? Admittedly, I am not sure I have a good suggestion for how to handle this.
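The following is a minimal Python sketch illustrating the blank-node point raised above for Definition 5. It contrasts a plain set union of two graphs with an RDF merge in the sense of RDF 1.1 Semantics, which standardises blank nodes apart before taking the union. For illustration only, it assumes that triples are tuples of strings and that blank nodes are labels beginning with `_:`. The function names and the `ex:` terms are hypothetical and are not taken from the paper.

```python
from itertools import count
from typing import Dict, Set, Tuple

# Hypothetical representation: a triple is (subject, predicate, object) as strings,
# and any term starting with "_:" is treated as a blank node label.
Triple = Tuple[str, str, str]
Graph = Set[Triple]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def union(*graphs: Graph) -> Graph:
    """Plain set union: blank nodes sharing a label are conflated across graphs."""
    result: Graph = set()
    for g in graphs:
        result |= g
    return result


def merge(*graphs: Graph) -> Graph:
    """RDF merge: rename blank nodes apart per input graph, then take the union."""
    fresh = count()
    result: Graph = set()
    for g in graphs:
        mapping: Dict[str, str] = {}  # blank node label in this graph -> fresh label

        def rename(term: str) -> str:
            if not is_bnode(term):
                return term
            if term not in mapping:
                mapping[term] = f"_:b{next(fresh)}"
            return mapping[term]

        for s, p, o in g:
            # Predicates cannot be blank nodes in RDF, so only s and o are renamed.
            result.add((rename(s), p, rename(o)))
    return result


g1 = {("_:n", "rdf:type", "ex:Notam"), ("_:n", "ex:affects", "ex:RunwayA")}
g2 = {("_:n", "ex:affects", "ex:RunwayB")}

print(sorted(union(g1, g2)))  # a single _:n now "affects" both runways
print(sorted(merge(g1, g2)))  # two distinct blank nodes, _:b0 and _:b1
```

With the plain union, the `_:n` of the two cells collapses into a single node. With the merge, the two nodes remain distinct. Presumably a merge/roll-up over cells would need the latter behaviour, unless blank nodes are skolemised beforehand.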
In summary, I like the paper a lot. I think the above comments can be addressed within a minor revision. Please also review the following (more) minor remarks.
MINOR COMMENTS:
* "A knowledge graph (KG) serves organizations to represent real-world entities ..." Slightly awkward phrasing.
* "The Resource Description Framework (RDF) is the standard representation format for KGs ..." I personally think this should say "a standard representation format". Even if there is no competing standard, the definite article gives the impression that RDF is the one and only representation format.
* "The roll-up operation that sums up ..." Awkward and difficult to read. Rephrase.
* "in [the] form of messages"
* "using the cases of ATM" -> "following the use-cases of ATM"
* I am a bit confused as to why D (dimensions) is defined to be a set of atomic *roles*. I would have imagined a set of dimension names, like "Importance", "Location", etc. I thought that perhaps the dimension role represents the dimensional ordering, but this is actually defined separately. Perhaps this could be clarified/explained better.
* "s.t. for j with ..." Awkward phrasing.
* "called [the] object language"
* "we assume that [the] meta and object level[s]"
* "is then [a] DL language"
* "come in various fashions" -> "come in various forms" (The current phrasing is slightly off and a bit distracting. One could perhaps also say "Note that various fashions of slice-and-dice operations ...".)
* "serve as [a] grouping property"
* "RDFpro allows for the specification of queries across different graphs, a feature needed for the reasoning ..." SPARQL also supports this (through FROM, FROM NAMED and GRAPH).
* "causing a vast number of lower-level cells to be merged" What does "vast" mean? Please be specific.
* "regardless [of] contextualization"
* "different serializations and publication formats of traditional OLAP cubes rather than KGs" In other words, a traditional OLAP cube represented as a graph is not a KG (it does not "represent real-world entities and their relationships with each other")? I think the argument here could be rephrased to avoid this unnecessary implicit claim (e.g., end the sentence after "cubes").
* I appreciate that the link in footnote 5 provides material to reproduce the experiments, but the link should be made more prominent. In the text where the footnote is referenced, it is not at all clear what the link contains.