Review Comment:
(0) Summary
============
All well-established W3C standards for the Semantic Web are, more or less, concerned with static RDF repositories. Recent efforts in the RDF Stream Processing (RSP) community, in which the author of the paper under review has been one of the main contributors, have led to various proposals regarding query languages over RDF streams, their semantics, requirements on RDF stream engines, stream benchmarks, etc. The main motivation for the present paper is that, according to the author, there has been less research on a global architecture of the Web that takes streams seriously. To fill this gap, the paper proposes a general architecture based on the actor paradigm and Linked Data Notifications (LDN). The paper describes the basics of RSP, the actor paradigm, and its adaptation using LDN as a front-end. Furthermore, experiments on an implementation of the model are discussed, which are meant to illustrate the scalability (feasibility) of the architecture.
I think the author has a good point in raising again the question of how to make the Web a better Web from the stream perspective. The main interesting aspect is that of considering streams as first-class citizens that should be made available on the Web. For this, the author proposes to exploit the actor paradigm, relying on the general benefits that the actor model has. In this adapted actor model, streams are mainly administered in what the author calls the Stream Receiver. The Stream Senders provide stream element inputs, and the Consumer can ask the Receiver to output stream elements from a stream specified by its IRI.
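To fix my reading of these roles, the interaction can be sketched roughly as follows. This is my own reconstruction in Python with hypothetical names, not the author's actual interface:

```python
# Sketch of my reading of the Sender/Receiver/Consumer roles
# (hypothetical names and methods, not the author's interface).

class StreamReceiver:
    """Administers streams, keyed by IRI; accepts pushes from Senders
    and serves retrieval requests from Consumers."""
    def __init__(self):
        self.streams = {}  # stream IRI -> list of stream elements

    def push(self, stream_iri, element):
        # Called by a Stream Sender to provide a stream element input.
        self.streams.setdefault(stream_iri, []).append(element)

    def retrieve_stream(self, stream_iri):
        # Called by a Consumer, which identifies the stream by its IRI.
        return list(self.streams.get(stream_iri, []))

receiver = StreamReceiver()
receiver.push("http://example.org/stream/1", {"temp": 21})   # Sender side
elems = receiver.retrieve_stream("http://example.org/stream/1")  # Consumer side
```

Under this reading, the Receiver is purely reactive: it never initiates communication, which is what makes the mediator/channel terminology discussed below seem more fitting to me.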
Though I absolutely agree with the main motivation for this paper, I have difficulty accepting it in its present form, mainly due to its hybrid status as a semi-research paper and semi-report on tools and systems; my strong suggestion is to resubmit it as a research paper.
(1) Quality, importance, and impact of the described tool or system
====================================================================
As the paper is submitted as a “Report on Tools and Systems”, one faces the following difficulty in reviewing it: there are actually two software-related artifacts in a wide sense, namely a general interface framework for a Web architecture enabling the integration of RDF stream engines, and a concrete prototype implementation based on a set of RDF stream engines. The interface framework (consisting of interfaces described in a concrete programming language) is not software in the more genuine sense of the word; rather, it is an abstract model. Its quality, importance, and impact can also be judged. And, indeed, the paper at hand argues for the quality of the abstract model: it describes experiments with a prototype implementation (the second artifact), which are meant to show the feasibility (or perhaps also the scalability) of the model. In his prototype implementation, the author runs different instances of the CQELS engine and takes query answering (with the queries and the data taken from SRBench) as the main service to be used. The experiments seem to suggest that the model is indeed feasible/scalable, and so one gains some evidence for the quality of the model w.r.t. feasibility/scalability.
As I understand it, the author really wants to stress the evaluation of this abstract model, which may not be completely new but nonetheless has not been described in this form elsewhere in the literature. The fact that he needs more than half of the paper to describe the model also hints at the fact that the model (and not the prototype implementation) is the focus of the paper.
But then one would also have to consider different aspects of the quality, importance, and impact of the model. The experiments alone do not show, e.g., how the Web would benefit from the actor-based model; they can only be meant to show that the proposed actor model would do no harm to the Web, at least regarding the feasibility/scalability aspect measured by relative throughput. Of course, as the model is relatively new, one cannot say much on its importance and impact (i.e., its use by other people). Moreover, as there is no other architecture yet, it is difficult to compare it to other models.
But there are other criteria one should discuss for this model with sufficient detail and persuasive illustrations. For example, taking query answering as the main service, the question is what we gain with the actor model in answering queries. Can we answer queries that we could not answer before, due to the fact that we can now refer to streams that are "produced" elsewhere? Do we gain better precision or recall? Can queries be answered faster? As far as I can see, the quantitative experiments described in this paper show “only” that the paradigm does no harm to query answering. Moreover, in evaluating the model, I would have expected some general discussion about the lurking danger of redundant streams, about federation and provenance aspects, and about the discovery process, which seems to be of utmost importance in order to get an overview of the available streams that one can use for a specific query task. The metadata regarding the streams seems to be important in order to solve a specific query task. How does the proposed model facilitate the use of this metadata in order to solve a query task? Etc.
The model rests on the well-established actor model, but nonetheless its applicability and benefit for the Web must be discussed in more detail. This also concerns some technicalities related to streams. For example, the author stresses the point that streams differ from static data in that they fade out. In the interface there is also a method for retrieving a specific stream element. I would have expected this to be the last, most recent element in the stream. If this is the case, I see no problem. Otherwise it would mean that the receiver must have some appropriate “window size” to fit the needs of different potential consumers. But what would be a good average window size? How is it specified? By heuristics?
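To make the window-size concern concrete: if element retrieval only ever returns the most recent element, a buffer of size one suffices; if arbitrary past elements can be addressed, the receiver's window must fit the slowest consumer. A small sketch of my own (hypothetical names, not the author's code):

```python
from collections import deque

class WindowedReceiver:
    """A receiver keeping only the last `window_size` elements of a
    stream, so that older elements 'fade out' (hypothetical sketch)."""
    def __init__(self, window_size):
        # deque with maxlen drops the oldest element when full.
        self.buffer = deque(maxlen=window_size)

    def push(self, element):
        self.buffer.append(element)

    def most_recent(self):
        # Always well-defined (if anything has arrived), regardless
        # of window size -- the unproblematic reading.
        return self.buffer[-1] if self.buffer else None

    def item_at(self, index):
        # A consumer asking for an element that has already faded out
        # of the window simply gets nothing -- the problematic reading.
        try:
            return self.buffer[index]
        except IndexError:
            return None

r = WindowedReceiver(window_size=2)
for e in ["e1", "e2", "e3"]:
    r.push(e)
# "e1" has faded out; only the two most recent elements remain.
```

The sketch shows why the second reading forces the question of how the window size is chosen: any fixed size silently loses elements for some consumer.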
Another point is that there is a deviation from (or refinement of) the actor model in that there is a distinction between two types of inboxes, one as an input, the other as an output. The author notes that a stream can have both roles, that of input stream and output stream. Looking at the “definitions”, this can be the case only when the stream has these roles in different actors. Nonetheless, couldn't there be circular arrangements of actors leading to undesired recursions in a stream?
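The circularity worry can be made concrete: if an actor's output stream is (transitively) wired back as its own input, every element triggers the production of another one, forever. A toy sketch of my own (hypothetical, with an artificial hop cap just to make the loop observable):

```python
class Actor:
    """Toy actor forwarding every received element to its outputs
    (hypothetical sketch, not the author's model)."""
    def __init__(self, name):
        self.name = name
        self.outputs = []  # downstream actors

    def connect(self, other):
        self.outputs.append(other)

    def receive(self, element, hops=0, max_hops=10):
        # Without a guard, an element in a cycle would circulate
        # indefinitely; the cap only serves to terminate the demo.
        if hops >= max_hops:
            return hops
        results = [a.receive(element, hops + 1, max_hops)
                   for a in self.outputs]
        return max(results, default=hops)

a, b = Actor("A"), Actor("B")
a.connect(b)
b.connect(a)  # circular arrangement: A's output is B's input and vice versa
hops = a.receive("x")  # the element bounces between A and B until the cap
```

Some mechanism (cycle detection, provenance tracking, or hop limits) would seem necessary in the model to rule such arrangements out, and I would like to see this discussed.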
Summing up, considering the quality, importance, and impact of the first software artifact, which is actually an interface framework and hence an abstract model, it seems rather that the paper at hand is a “system-for-paper” type of paper and not a “paper-for-system” type of paper (see the comments section at http://www.semantic-web-journal.net/content/special-call-semantic-web-to...). Hence, the paper needs a major revision, addressing the points above, to become an acceptable research paper.
Maybe I have misunderstood the emphasis of the author, and the focus of the paper is really on the second software artifact, namely the prototype implementation. Of course one can discuss the concrete implementation available under https://github.com/jpcik/ldn-streams, and I am pretty sure that the author did a good job in implementing his model. But it is rather awkward to see this system as the system/tool to be judged for quality, importance, and impact. As an analogy, consider, say, a description logic (DL) reasoner: here one may develop the general principles of the reasoner (such as the type of inference rules, e.g. tableau rules) in a research paper, and then, in a report on systems and tools, one may “sell” a concrete DL reasoner, showing how the inference rules were implemented (efficiently), describing possible optimization strategies, and evaluating it with benchmarks or by comparison with other DL reasoners. In the case of the paper at hand, the author does not want to “sell” his concrete implementation but the underlying RSP model.
(2) Clarity, illustration, and readability
===========================================
All in all, the paper is quite readable. However, I did not gain much from any of the figures outside the experiments section: the “contents” of the figures do not add much to what has been said in the text. So I would suggest deleting them. Instead, the author should spend some effort introducing his vision with a running example, for which, maybe, figures could also be given. A second point regarding presentation concerns the sections on software-related material, be it the interface material in Section 3 or the concrete implementations described in Section 5. Surely, this paper is intended as a software-description paper, and so one should expect some software-related descriptions and listings, but, in my honest opinion, a description of an algorithm as in Algorithm 1 is out of place. A third point regarding presentation is the terminology used by the author. I was confused by the use of “stream receiver”. The role of the stream receiver is rather that of a mediator or a channel via which “streams are exchanged”. So maybe the author can give a thought to changing the terminology to stream sender, stream channel, and stream receiver (= the author’s stream consumer).
A fourth point is the presentation of the experiments. Here the author should provide more details on the experimental configuration: for which queries do the experiments measure the relative throughput? Is it one specific query, or the average over all of them?
A fifth point regarding presentation is the conclusion: it is not a conclusion but rather a repetition of (some of) the contents of the introduction.
In the related work, I would have expected at least a short hint at the whole area of Web services.
(3) Suggestion
==============
Due to the points above I suggest a major revision where the paper is submitted as a research paper and where the following points are addressed:
1) A detailed critical discussion of the pros and cons of using the actor model w.r.t. query answering, illustrating these with a concrete example.
2) Clarifying details regarding the configuration of the experiments
3) Improving the presentation w.r.t. the points mentioned above
4) Correction of typos (see below)
(4) Typos and minor suggestions for improvement
===============================================
p. 1, abstract:opens-sourced => open-sourced
p. 1, l. col.: events processing => event processing
p. 1, l. col.: to analyze => in analyzing
p. 1, r. col.: Move “as a result” to the beginning of the sentence starting with “Processing”
p. 2, l. col., pa. 1: Is LARS really about RDF?
p. 2, l. col., pa. 1: Add spaces before citations “[11, 16]”, “[17]” and “[18]”
p. 2, l. col., pa. 2: Reformulate “and through Web standards”
p. 2, r. col., pa. 1: “as depicted in Figure 1” refers to the vision outlined in [20]. But does this vision really concern only the concrete engines given in the figure?
p. 2, r. col., pa. 2: implementation , => implementation,
p. 2, r. col., pa. 3: follows: we => follows: We
p. 2, r. col., pa. 3: in details => in detail
p. 2, r. col., pa. 3: related works =?=> related work
p. 3, l. col., pa. 2: syntax and semantics =?=> syntactical and semantical
p. 3, l. col., pa. 2: As a concrete example of these languages =?=> For illustration purposes
p. 3, l. col., pa. 3: Here and elsewhere prevent breaks in listings.
p. 3, r. col., pa. 1: How is “ontology” defined?
p. 3, r. col., pa. 1: consecutive => consecutive instances
p. 3, r. col., pa. 1: form => from
p. 3, r. col., pa. 1: “ABox” and “TBox” are not introduced/defined
p. 3, r. col., pa. 1: Paragraph break after “reasoners.”
p. 3, r. col., pa. 1: much considerations => much consideration
p. 4, l. col., pa. 1: et al. => and colleagues
p. 4, l. col., pa. 1: (See Figure 3) => (see Figure 3)
p. 4, l. col., pa. 1: Here and elsewhere, be consistent in (not) using abbreviations for “Figure”.
p. 4, r. col., pa. 2: The notifications are by themselves not “Web resources that can be identified” but are references to Web resources
p. 4, r. col., pa. 4: Maybe instead of “data exchange” use “exchanging data”, as data exchange reminds one of the area dealt with in database theory.
p. 4, r. col., pa. 1: and scalability => , and scalability
p. 5, r. col., pa. 1: and RDF => an RDF
p. 5, r. col., pa. 2: relative => related
p. 5, r. col., pa. 2: To do, so => To do so,
p. 5, r. col., pa. 3: stored data => stored data (requirement 4)
p. 5, r. col., pa. 3: allow => allows
p. 6, l. col., pa. 2: expectations =?=> constraints
p. 6, l. col., pa. 2: Links or references to SSN, PROV
p. 6, l. col., pa. 3: actor model, the it => actor model, it
p. 6, r. col., pa. 1: case => cases
p. 6, r. col., pa. 1: write => write on(to)
p. 7, l. col., pa. 1, in item “RetrieveStreamItem”: is it really the case that a specific stream element is requested, or rather that the most recent element in a stream identified by the IRI is returned?
p. 7, l. col., pa. 1, in item “RetrieveStreamItem”: and element => an element
p. 7, l. col., pa. 4: exiting => existing
p. 7, l. col., pa. 6: “are essentially meant” is vague: do you mean the stricter “are allowed”?
p. 7, r. col., pa. 1, in case RetrieveStream: send(getStream(msg.uri) => send(getStream(msg.uri))
p. 7, r. col., pa. 1, in case RetrieveStreamItem: What does the “r.” stand for?
p. 7, r. col., pa. 3: such stream => such a stream
p. 7, r. col., pa. 3: “RDF streams can also be available as both input and output streams”. Due to the intended use, this can only be the case across different actors, not within the stream receiver, right?
p. 7, r. col., pa. 5: metadata , => metadata,
p. 7, r. col., pa. 5: send and RDF => send an RDF
p. 8, l. col., pa. 3: can derived =?=> can deviate
p. 8, l. col., pa. 4: of certain stream => of a certain stream
p. 8, r. col., pa. 2: This IRI is =?=> This IRI refers to
p. 8, r. col., pa. 8: of the results => for the results
p. 9, l. col., pa. 4: and RDF => an RDF
p. 9, l. col., pa. 4: times-tamped => time-stamped
p. 10, l. col., pa. 1, first listing: Missing timestamps?
p. 10, l. col., pa.2: time-annotated => timestamped
p. 10, l. col., pa. 4: in WebSocket => in WebSocket)
p. 10, r. col., pa. 2: CQELS => following CQELS
p. 11, l. col., pa. 2: align “LDnStreamReceiver”
p. 11, r. col., pa. 6: align “registerSelect”
p. 12, r. col., pa. 4: How is the “maximum ideal number” given?
p. 13, l. col., pa. 2: number => numbers
p. 13, l. col., pa. 5: excepting => except
p. 13, r. col., pa. 1: number => numbers
p. 14, l. col., in Fig. 14 caption: N. => Number
p. 14, r. col., pa. 2: WEb => Web
p. 14, r. col., pa. 4: RDF dataset => RDF datasets
p. 15, l. col., pa. 2: RDF, into => RDF into
p. 15, l. col., pa. 2: continuoation => continuation
p. 15, l. col., pa. 4: takes => take
p. 16, l. col., pa. 3: citation for “SLD Revolution”
p. 16, l. col., pa. 4: provides => provide
p. 16, l. col., pa. 4: synchronous => asynchronous
p. 16, l. col., pa. 4: patterns => pattern
p. 16, l. col., pa. 4, in reference [5]: W3c => W3C
p. 17, in reference [29]: delete ISSN and doi