Review Comment:
=== General aspects of the paper ===
The paper "Quantifiable Integrity for Linked Data on the Web." describes in concise and clear terms how to establish trustable Linked Data (LD) networks by introducing the concept of LD integrity and presenting a decentralized technical solution for tamper-evident resources with traceable authorship.
Further, the authors discuss the notion of trust scores and strategies to incentivize agents to contribute to the integrity of an interlinked network.
Their technical solution combines Web technologies and the Solid Protocol to construct self-verifying resource representations.
The work is sufficiently motivated and fits the journal's research track as it addresses several relevant issues related to the Semantic Web.
The authors highlight current and relevant issues of Linked Data integrity and trustworthiness.
=== Specific aspects ===
(1) Clarity and completeness of the descriptions.
Concerning the paper's writing, the overall language is very sound and comprehensible. The preliminaries section is very detailed and supports the reader in refreshing or acquiring the necessary theoretical and technical principles. In chapter 4, the authors introduce a running example, presenting a theoretical architecture and scenario.
However, while the intent to visualize the addressed problem for the reader is beneficial, why is it called a "running example" if it never reappears at this level of detail in the rest of the work?
Chapter 3 is excellent and covers an adequate amount of related work, showing current approaches and problems.
(2) Approach
The paper's foundation is supported by previous works (refs. 5, 11, 46); however, the contributions in this paper focus on new aspects or extend earlier work. The authors separate their contributions into five sections, where Sections 5-8 describe theories, concepts, and proposed approaches, while Section 9 presents an experimental evaluation of the proposed solutions.
In chapter 5, "Self-Verifying Resource Representation" (SVRR), the authors propose their idea of self-verifying resource representation by introducing their idea of Singed URIs in combination with LDS.
Although the text describes the idea sufficiently, a final listing that includes the usage of Singend URIs as the subject or object of the wrapped tiples would be helpful. Furthermore, the listings should relate to the running example presented in Section 4.
Chapter 6, "A Web of Self-Verifying Resource Representations", investigates the idea of having SVRRs build a document graph. It describes how deletions must delegate to maintain the integrity of the graph. It also shows that integrity violations are detectable based on links set by other documents.
An unaddressed aspect remains concerning versioning when creating or updating documents. Although the authors mention that they use RDF-star to avoid the issue of accessing the state of triples directly, it would still be interesting to discuss the combination of versioning strategies with the idea of SVRRs. Versioning is a common requirement for data (e.g., Wikipedia article revisions), and referencing an older resource version could be an alternative to updating all transitively associated documents in order to preserve their integrity.
Chapter 7 is very sound. It utilizes the ideas proposed earlier to define integrity preservation, integrity contribution, and different types of trust scores, providing a foundation for quantifiable Web resource integrity. Additionally, the authors present the idea of using Hub- and Authority-based trust scores.
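For context, the hub/authority computation that such trust scores appear to build on can be sketched with the standard HITS algorithm; the following Python snippet is my own illustration on a hypothetical document graph, not the paper's exact trust-score definition.

```python
import networkx as nx

# Hypothetical document graph: nodes are documents, directed edges point
# from the linking document to the linked document.
G = nx.DiGraph()
G.add_edges_from([
    ("docA", "docB"), ("docA", "docC"),
    ("docB", "docC"), ("docD", "docA"),
])

# Standard HITS: hub scores reward documents linking to good authorities,
# authority scores reward documents linked by good hubs.
hubs, authorities = nx.hits(G, max_iter=1000, normalized=True)

for doc in G.nodes:
    print(f"{doc}: hub={hubs[doc]:.3f}, authority={authorities[doc]:.3f}")
```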
Chapter 8, "Score Optimisation", discusses the theoretical aspects of score optimization of newly created resources (documents) to increase their initial trust scores. (mentioned because of (3) Evaluation)
(3) Evaluation
A significant part of the work focuses on evaluating the proposed ideas for trust scores, score optimization, and possible integrity recovery in attacked or intentionally disconnected graphs (deletion of documents).
In 9.1, the authors present a formal analysis of three trust-score algorithms. They discuss their fitness as trust scores with respect to the incentives for agents to link to other agents' documents, using a game-theoretic analysis.
In Appendix A, the authors provide a proof of the insufficiency of the PageRank-based trust score, showing that it punishes additional linkage.
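The effect can also be probed numerically with off-the-shelf PageRank. The following toy example is my own illustration (hypothetical documents, standard networkx PageRank), not the authors' proof: adding an extra outgoing link diverts score mass out of docA's cycle and lowers docA's own score.

```python
import networkx as nx

# Two separate 2-cycles: docA <-> docB and docC <-> docD.
G = nx.DiGraph([("docA", "docB"), ("docB", "docA"),
                ("docC", "docD"), ("docD", "docC")])
before = nx.pagerank(G, alpha=0.85)["docA"]

# docA additionally links to docC: half of docA's score mass leaks into
# the other cycle and no longer flows back via docB.
G.add_edge("docA", "docC")
after = nx.pagerank(G, alpha=0.85)["docA"]

print(f"PageRank of docA: before={before:.3f}, after extra link={after:.3f}")
```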
In section 9.2, the authors evaluate their proposed heuristic to find the "next best link" (ARS) by comparing its correctness with other heuristics and an optimal link set.
Is there an explanation for the behavior shown in Figures 6 and 9 regarding the edge probabilities 0.1 and 0.05?
Additionally, the paper could mention the computational complexity of the compared heuristics and the optimal algorithm.
The last evaluation section, "Evaluation Cases Overview", shows an experimental evaluation of the recovery of document authority scores in a corrupted graph (the graph is disconnected by deleting documents with a high betweenness centrality).
The evaluation runs a simulation in four phases and uses the Watts-Strogatz model for graph generation and the Monte Carlo method for randomization.
Although the setup description is largely sufficient, some potentially interesting parameters are not reported.
For example, the authors state, "The deletion of documents based on betweenness centrality is dependent on the created graph.", which leaves it unclear how many documents were deleted on average or what the threshold was.
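To illustrate the kind of parameters I mean, a procedure along the following lines could be reported with its threshold and resulting deletion count; this Python sketch uses assumed parameter values and is not the authors' actual setup.

```python
import networkx as nx

# Assumed parameters for illustration only.
NUM_DOCS, NEIGHBOURS, REWIRE_P = 100, 4, 0.1
CENTRALITY_THRESHOLD = 0.05

# Watts-Strogatz small-world graph as a stand-in for the document graph.
G = nx.watts_strogatz_graph(NUM_DOCS, NEIGHBOURS, REWIRE_P, seed=42)

# Delete documents (nodes) whose betweenness centrality exceeds the threshold.
centrality = nx.betweenness_centrality(G)
to_delete = [n for n, c in centrality.items() if c > CENTRALITY_THRESHOLD]
G.remove_nodes_from(to_delete)

print(f"Deleted {len(to_delete)} of {NUM_DOCS} documents "
      f"(threshold={CENTRALITY_THRESHOLD}); the remaining graph has "
      f"{nx.number_connected_components(G)} connected components.")
```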
Both experimental setups of 9.2 and 9.3 are missing a link to their implementation's source code, making it impossible to replicate their outcome.
In conclusion, each of Chapters 5-8 is sound, but when continuing with Section 9, it felt as if concepts introduced or specified only there would fit better in earlier sections. Reflecting on the structure and order of sections, the following formal suggestions arise: 1) explain the degree-based, PageRank-based, and HuA-based approaches in Section 7 (move); 2) explain ARS and the other heuristics in Section 8 (move).
=== Strengths ===
* a concise and clear writing style
* a comprehensive description of the current issues
* a sound methodology for the design of the technical solutions
* sufficient and broad coverage of related work
* an overall good scientific evaluation concerning the proposed approaches
* extensive supplements in Appendices A and B that provide proofs of the made assertions
=== Weaknesses ===
* [minor] mismatch between figure 1, listing 1, and text (see "nonce" and "verification method")
* [minor] mention of Emily in figure 3 but not in the text
* [minor] parts of chapter 4 as a running example are unconducive with regard to the rest of the paper, as they are never mentioned again
* [minor] missing discussion of versioning strategies in chapter 6 (it may not be necessary to require delegation for each change)
* [minor] missing mention of computational complexity (or performance) of heuristics compared to the optimal algorithm.
* [minor] missing link or reference to experimental source code
* [minor] a few, in my opinion, interesting numbers are missing from the evaluation (e.g., the deletion threshold in 9.3)
=== Further formal comments ===
* Some formal issues that, if addressed, would benefit the paper's appearance and readability:
* Rename Chapter 4 to "Exemplary Architecture and Terminology".
* An example of a final resource based on Listing 2 and the usage of Signed URIs would benefit clarity.
* I suggest using example triples related to the running example.
* In 9.1.3, highlight the terms Degree-based, PageRank-based, and Hub- and Authority-based differently from "Formal Analysis" and "Conclusion" (maybe in bold).
* Some parts of the paper (paragraphs in sections 5-8 and 9) are unintuitively structured concerning the specification of the two proposed algorithms (ARS and HITS) and other available methods.
In conclusion, the paper should be accepted, but minor revisions are required based on the comments.