One Size Doesn't Fit All – Fostering Diversity on the Semantic Web

Tracking #: 749-1959

Authors: 
Wouter Beek
Ruben Verborgh
Christophe Guéret
Miel Vander Sande

Responsible editor: 
Guest Editors EKAW 2014 Schlobach Janowicz

Submission type: 
Conference Style
Abstract: 
The current Semantic Web landscape is shaped by a collection of standards and practices, the philosophy of which is strongly rooted in its ancestor domains of databases and artificial intelligence. All too often, we seem to assume that the default solutions are sufficient for the majority of use cases, but are they really? While practice shows that, for instance, installing a SPARQL endpoint over a public dataset is not the definitive solution to offer reliable data access, the community still seems to assume issues will disappear by improving existing solutions instead of developing new ones. With nothing but the standards to fall back to, we need to think about alternative solutions—and how to start building these. This paper discusses use cases in which the current one-size-fits-all approach fails, suggesting possible alternative directions. Embracing the diversity of the Web, rather than trying to retrofit things to a perhaps idealistic model, could be the more elegant way forward.
Tags: 
Reviewed

Decision/Status: 
[EKAW] reject

Solicited Reviews:
Review #1
Anonymous submitted on 23/Aug/2014
Suggestion:
[EKAW] reject
Review Comment:

Overall evaluation
Select your choice from the options below and write its number below.

== 3 strong accept
== 2 accept
== 1 weak accept
== 0 borderline paper
== -1 weak reject
== -2 reject
== -3 strong reject

-1

Reviewer's confidence
Select your choice from the options below and write its number below.

== 5 (expert)
== 4 (high)
== 3 (medium)
== 2 (low)
== 1 (none)

4

Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.

== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

4

Novelty
Select your choice from the options below and write its number below.

== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

3

Technical quality
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

3

Evaluation
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 not present

1

Clarity and presentation
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

3

Review

The paper discusses diversity on the Semantic Web, which the authors consider limited by the current usage of technologies and approaches. The authors discuss which factors, in terms of standardization, research focus and deployment practice, restrict diversity on the Semantic Web. They then propose alternative solutions that go beyond the idealistic model of the Semantic Web; instead of improving on existing solutions, they propose new ones.

PROS

The paper is clearly on topic for this track and tackles very important issues for the Semantic Web/Linked Data community. The paper is reasonably well written in most parts. The limitations of current approaches and the three main alternative solutions proposed in this paper (Linked Data Fragments, eRDF, and DataHives) are all very interesting and solve real problems in the current semantic Web technology ecosystem. These solutions provide very nice contributions to the state of the art in three important domains.

CONS

I think that the paper suffers from an overall lack of focus and analytical depth.

*Lack of focus*

The paper discusses several problems (too many?), but only publication methodologies (with the proposal of Linked Data Fragments), engines (with the proposal of eRDF) and reasoning (with the proposal of DataHives) are discussed at a reasonable level of detail.

What is the main focus of the paper? To discuss the lack of *diversity in semantic Web technologies/paradigms*? To discuss the lack of *diversity in semantic Web research* (but [16] is a research paper published in ISWC 2014)? To discuss the need for applications that deal with *diverse information* (the above-mentioned approaches propose solutions to deal with the diversity of information available in the semantic Web)? Surely these three issues are connected, but the paper does not explain how.

*Lack of depth*

The arguments and evidence used to support the main claims of the paper (one major problem in the Semantic Web compared to other technologies is the lack of diversity, which is caused by standardization, research stubbornness, and current deployment practices) are not sufficiently well developed, which makes the thesis of the paper not very convincing.

How and why are the (well) discussed limitations of current semantic Web technologies/paradigms connected to a problem of diversity? Why are the three factors mentioned at the beginning of Section 2 the causes of these limitations?

Some claims are difficult to evaluate because they are not supported by any quantitative analysis. For example, is restrictiveness a problem perceived by users or developers? Could you provide some figures on the cost of deploying a query engine? Even though I intuitively agree that deploying a query engine to make RDF data accessible is more costly than building a website, the two applications are not comparable, in terms of both complexity and technology maturity (one might argue that the limitations you point to are due to the immaturity of semantic Web technology).

Some other claims are difficult to evaluate because the argument refers to research work that is either not accessible to the reviewer (e.g., [16] will be published in October) or not available in peer-reviewed publications (e.g., [8] is a technical report and many editorial details are missing in [9]). For example, I could have better appreciated the contrast between [16] and mainstream research if I could have read the full paper on Linked Data Fragments, but the short summary in the paper did not help enough. (Isn’t Linked Data Fragments a solution that improves the interfaces? Why is this to be compared against SPARQL endpoints?)

In conclusion, I think that there are interesting messages in this paper and I encourage the authors to go through the paper again and try to sharpen the writing with the help of a more in-depth analysis (e.g., using figures and more examples). However, in its current shape, the paper lacks strong analytical arguments to support its main claims (the lack of diversity and the causes of this lack of diversity). The parts of the paper that are developed in more depth, and which are more convincing, are/will be published in dedicated papers.

Minor comments:
- p3: “users is stores”
- p5: “have to date aimed”
- although it may be clear to some readers, you need to define terms such as “entity data”
- p6: check the sentence “One of the new interfaces defined with Linked Data Fragments are triple pattern fragments”
- avoid the usage of “this” in the paper
- can not -> cannot

Review #2
Anonymous submitted on 25/Aug/2014
Suggestion:
[EKAW] conference only accept
Review Comment:

Overall evaluation
Select your choice from the options below and write its number below.

== 3 strong accept
== 2 accept
* 1 weak accept
== 0 borderline paper
== -1 weak reject
== -2 reject
== -3 strong reject

Reviewer's confidence
Select your choice from the options below and write its number below.

== 5 (expert)
== 4 (high)
* 3 (medium)
== 2 (low)
== 1 (none)

Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.

* 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

Novelty
Select your choice from the options below and write its number below.

== 5 excellent
== 4 good
== 3 fair
* 2 poor
== 1 very poor

Technical quality
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
* 3 fair
== 2 poor
== 1 very poor

Evaluation
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
* 1 not present

Clarity and presentation
Select your choice from the options below and write its number below.
* 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

Review

This is a well-written, clear and very interesting paper that analyzes and criticizes the current state of play and way of conceptualizing the Semantic Web.
It particularly criticizes that the current SW standards and paradigms are "killing diversity". The authors argue in particular that setting up an RDF triple store represents a substantial effort and overhead for application developers, especially those that cannot afford to invest in a larger infrastructure to host the triple stores. This has led to the emergence of a few data providers and hosting parties that can afford setting up such an infrastructure, running counter to the powerful idea that the SW is supposed to be a distributed system in which everyone can "say anything about everything".

In general, I agree with the point of view and position put forth by the authors and I endorse some of their arguments. The main problem is that the paper has few alternatives to offer. It sketches some potential solutions to increase decentralization by adopting principles from the Entity Registry System. The authors propose to rely on triple pattern fragments as a low-cost method for publishing and accessing data without SPARQL and a triple store. Regarding reasoning, they propose to rely on evolutionary or nature-inspired algorithms to support reasoning over RDF data.

While I generally agree with the points raised by the authors and agree that more diversity is needed to progress, there are two issues that I find questionable:

1) First, the authors highlight that inconsistency is an issue in the SW, as inconsistent ontologies allow everything to be deduced. They argue that this is exacerbated by the trend towards centralization, as the likelihood of inconsistencies increases with data coming from more and more sources. This sort of inconsistency, however, does not pose a problem for RDF reasoning but for OWL reasoning only. As not many applications really use OWL, this might be less of an issue from an application perspective.

2) The authors propose evolutionary / nature-inspired computing methods as an alternative to complete RDF reasoning. In fact, it can be expected that nature-inspired computing approaches to RDF reasoning essentially trade off completeness for anytime behaviour. While this might be acceptable for some applications, others might indeed require completeness. Further, it is really hard to imagine that nature-inspired computing methods can outperform scalable triple stores based on adequate indexing structures such as Hexastore.

Overall, this paper is clearly more of a position paper that brings up important issues in calling for more diversity in SW research. The discussion of alternatives, however, is quite superficial and not really backed up by a discussion of requirements or by empirical analyses. It could be accepted for the conference if there is room, but it is definitely too premature for a journal publication unless further substance is added.

Review #3
Anonymous submitted on 02/Sep/2014
Suggestion:
[EKAW] reject
Review Comment:

Overall evaluation
Select your choice from the options below and write its number below.
-1
== 3 strong accept
== 2 accept
== 1 weak accept
== 0 borderline paper
== -1 weak reject
== -2 reject
== -3 strong reject

Reviewer's confidence
Select your choice from the options below and write its number below.
3
== 5 (expert)
== 4 (high)
== 3 (medium)
== 2 (low)
== 1 (none)

Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.
3
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

Novelty
Select your choice from the options below and write its number below.
3
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

Technical quality
Select your choice from the options below and write its number below.
2
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

Evaluation
Select your choice from the options below and write its number below.
1
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 not present

Clarity and presentation
Select your choice from the options below and write its number below.
2
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

Review
This paper discusses the limitations of current Semantic Web standards, such as RDF and SPARQL, and proposes a number of alternative technologies to enhance the development of the Semantic Web. I appreciate the efforts of the authors, who think outside the box and try to find innovative solutions to the challenges of the Semantic Web. However, I found the authors' arguments not exactly convincing. I have listed my questions below:

1. In the introduction section, the authors seem to be suggesting that HTTP and HTML provide more flexibility, while RDF and SPARQL do not. However, this contradicts my understanding of these technologies. RDF is a general data model which only asks data to be organized as subjects, predicates, and objects. SPARQL only specifies the general structure and syntax for queries. It is the data providers who decide which content should be organized or should be queried. This process is similar to HTML, which specifies a set of tags necessary to annotate a document, while it is the data publishers who decide which content should be put into an HTML document.

2. The authors argue that SPARQL introduces a lot of cost on the server side. While I agree that graph matching requires quite some computational resources, this does not sound like a serious problem of SPARQL. A lot of currently popular and advanced methods require a large amount of computational resources on the server side. Machine learning approaches, complex statistical models, or data indexing often require days, weeks, or even months to run on the server side. Yet, these expensive approaches are favored by many companies, not only big companies, but also small startups. A major reason is that these methods are useful. Thus, the question should be whether SPARQL is useful or not. If it is useful, the cost issue can be solved by other approaches, such as high performance computing.

3. The authors state "Use of the SPARQL standard in practice leads to low availability." This is a very hasty conclusion. I agree that the number of available linked datasets is not huge, but this may be due to many reasons, including the limited number of years of development since the LOD initiative began. A simple conclusion like this is just unsupported.

4. The authors argue that SPARQL endpoints restrict the types of queries a user can ask. I agree that the types of queries that a SPARQL endpoint can support are limited by the ontologies used by the endpoint. However, traditional static HTML pages only allow a single way to query the data (i.e., simply present the HTML document to the end user). Although a SPARQL endpoint does not support infinite types of queries, it does allow the users to query data in many ways.

5. The authors state that "a single data store provides a single point of failure in the case of unanticipated circumstances." This is not an issue of the semantic web standards, but an issue of infrastructure construction. A data provider can simply avoid the failure of a single data store by providing multiple, duplicated data stores on different machines, and by using load-balancing strategies to evenly distribute requests.

7. The authors suggest that "This means that, in practice, individual data providers are handing over their potentially private data to a few big disseminating entities, thereby handing over (part of) their agency and ownership of the data as well." Again, I think this is not an issue of the current Semantic Web standards, but about data licenses and policies.

8. "However, putting together such multi-perspective data into one spot makes it difficult to tell these different views apart." It's difficult to agree with the authors on this opinion. When examining DBpedia entities with two different rdfs:label strings, it's not difficult to tell that such entities have incorporated more than one perspective.

9. The authors propose ERS as an alternative for data publication. However, the description of ERS is sketchy, without stating how it is going to improve the integration of multiple views, and what better features it can provide compared with the existing Semantic Web standards.

10. The alternatives, such as triple pattern fragments, eRDF, and DataHives, look very interesting. I appreciate that the authors propose them as new possible approaches.