A More Decentralized Vision for Linked Data

Tracking #: 2193-3406

Axel Polleres
Maulik Rajendra Kamdar
Javier D. Fernandez
Tania Tudorache
Mark A. Musen

Responsible editor: 
Guest Editor 10-years SWJ

Submission type: 
In this deliberately provocative position paper, we claim that more than ten years into Linked Data there are still (too?) many unresolved challenges towards arriving at a truly machine-readable \emph{and} decentralized Web of data. We take a deeper look at key challenges in usage and adoption of Linked Data from the ever-present ``LOD cloud'' diagram. Herein, we try to highlight and exemplify both key technical and non-technical challenges to the success of LOD, and we outline potential solution strategies. We hope that this paper will serve as a discussion basis for a fresh start towards more actionable, truly decentralized Linked Data, and as a call to the community to join forces.

Minor Revision

Solicited Reviews:
Review #1
By Anna Lisa Gentile submitted on 06/Aug/2019
Review Comment:

This is a position paper that presents an overview of the unresolved challenges towards achieving a truly machine-readable and decentralized Web of data.
I particularly like this paper as it is concise, direct, and informative, and while exposing the "pain points" of Linked Data, it also summarizes possible solutions, which are already available but not widely adopted.

I believe this paper is truly in the spirit of this special issue, as it reflects on the rights and wrongs of Linked Data over the last decades.

Consider discussing the pros and cons of a recent work on overcoming SPARQL endpoint restrictions and avoiding the issues with pagination [1].

Minor typos (or odd-sounding sentences):
- widely bisible
- ofen
- we believe to need
- different different
- alongside with an in the meantime
- to be established best practices

[1] Thomas Minier, Hala Skaf-Molli, and Pascal Molli. 2019. SaGe: Web Preemption for Public SPARQL Query Services. In The World Wide Web Conference (WWW '19), Ling Liu and Ryen White (Eds.). ACM, New York, NY, USA, 1268-1278. DOI: https://doi.org/10.1145/3308558.3313652 https://dl.acm.org/citation.cfm?id=3313652

Review #2
By Jens Lehmann submitted on 31/Aug/2019
Major Revision
Review Comment:

The position paper aims to present a new decentralized vision for Linked Data. It contains several valid discussion points, in particular related to metadata. This aspect seems to have even higher importance in the article than the decentralization aspects.

Generally, the article points out issues in each subsection, followed by solution paths. The authors have long-standing experience in the field and the mentioned issues are relevant. However, it is not clear to me how feasible the solution paths are. In many cases, it is almost obvious what *should* be done; the main issue is the effort and expertise required to actually do it versus the incentives (expected reward).

Section 4 makes a particularly relevant point in my opinion: "In fact, we would argue that more principled Linked Data publishing could allow to auto-generate LOD clouds from a set of such HDT dumps, which to demonstrate is on our agenda for future work." => I would not limit this statement to HDT, but rather think that the maintenance of metadata outside of the actual knowledge source, as done in some dataset catalogues, is always likely to lead to synchronization problems (and completely unfeasible for frequently changing data).

Overall, I believe that the article could have a clearer structure. I see only indirectly how the solution paths lead to a "more decentralized" vision of Linked Data as claimed in the title. The authors may consider structuring the article fully around LOD cloud access and metadata provisioning, which seem to make up a large part of the article.

Moreover, I believe the visionary parts should be strengthened. In particular, the authors could discuss how the suggestions can actually be realized at web scale.

Since the above two weaknesses (in the structure and the solution paths) from my point of view require potentially more significant changes, I opt for a major revision of the article before it can be accepted.

Further specific comments follow below:

- Myths => decentralized network of ontologies: The criticism here does not seem that convincing. Gruber himself was talking about ontologies modeling domains of discourse - one could say that those are almost by definition "insular efforts" as they just relate to one domain. I would also like to see a source backing up the claim that "vocabulary reuse is still extremely limited".

- Myths => knowledge graphs not decentralized: In some sense true, but some of them are linked to various other knowledge graphs. An argument about why these single knowledge graphs should even be decentralized in those specific cases could be added to the article.

- Section 2 is not that well organized in my opinion: Whereas Section 1 seems mostly about general observations, which are valid for the Semantic Web as a whole, Section 2 seems very specific and mostly focused on LOD cloud metadata; it is not completely clear how this fits into the storyline.

- The solution path in Section 3.1.1 seems to focus on observing the status of the LOD cloud rather than providing actual solutions (of course, being able to observe the status can be seen as part of the solution).

- "In addition to that, it is mostly impossible to indeed retrieve all triples from a SPARQL endpoint," => It is not clear that this is indeed a weakness of the Semantic Web, since this restriction is by design, to keep endpoints alive. Non-Semantic Web sources often cannot easily be fully retrieved via queries either.


- "authoes" => "authors"
- "wherefor"
- "sizes of triples" ?
- from unfulfilled 50 expectations) => replace bracket by dot