The Community Solid Server: Supporting Research & Development in an Evolving Ecosystem

Tracking #: 3665-4879

Authors: 
Joachim Van Herwegen
Ruben Verborgh

Responsible editor: 
Katja Hose

Submission type: 
Tool/System Report
Abstract: 
The Solid project aims to empower people with control over their own data through the separation of data, identity, and applications. The goal is an environment with clear interoperability between all servers and clients that adhere to the specification. Solid is a standards-driven way to extend the Linked Data vision from public to private data, and everything in between. Multiple implementations of the Solid Protocol exist, but due to the evolving nature of the ecosystem, there is a strong need for an implementation that enables qualitative and quantitative research into new features and allows developers to quickly set up varying development environments. To meet these demands, we created the Community Solid Server, a modular server that can be configured to suit the needs of researchers and developers. In this article, we provide an overview of the server architecture and how it is positioned within the Solid ecosystem. The server supports many orthogonal feature combinations on axes such as authorization, authentication, and data storage. The Community Solid Server comes with several predefined configurations that allow researchers and developers to quickly set up servers with different content and backends, and can easily be modified to change many of its features. The server will help evolve the specification, and support further research into Solid and its possibilities.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 24/Apr/2024
Suggestion:
Minor Revision
Review Comment:

The Community Solid Server (CSS) described in this contribution is the only up-to-date open source implementation of the Solid specification and hence of crucial importance for researchers and developers, and for Solid uptake at large.

The github repository associated with this tool is well organized and contains an appropriate readme file. The article refers to several additional documentation resources and tutorials.

General comments:

Section 1, line 9: 'decentralisation' is correct when considered from the company perspective; from a user's point of view, Solid centralises the data (into a pod). Clarify this stance to avoid confusion between central and decentral.

Section 6: it could be stated explicitly that the CSS is written in TypeScript (if it is) and what motivated this choice.

References: refs 34 and 35 are not cited in the text, hence should be removed

The NSS has a GUI whereas the CSS has not. For many use cases (section 3) and requirements (section 4), it would seem useful to have a GUI. I suggest adding reflections or arguments pro/con somewhere in the article, and motivating this choice for CSS.

The article states several times that the CSS is aimed at researchers and developers. I suggest adding a discussion of what would need to be changed or added to either use the CSS in a production environment, or in which respect a production grade server would differ from the CSS - if it would.

Minor textual comments
mixed use of spelling decentraliz* and decentralis*
p.2 line 26: comma needed after "used"
p.4 line 20: phrase "use cases" used twice. Suggest "... we give an overview of several use cases ...".
p.4 line 27: "would need at least two Solid server", server -> servers
p.4 line 36: "for now users", now -> new
p.6 line 49: "at a single at a small", keep one of both
p.7 line 42: "so cover it in detail below", so -> we
p.8 line 35: spell out CRUD in full
p.11 line 36: "exposed that exposing" suggest "showed that exposing"

Review #2
Anonymous submitted on 09/May/2024
Suggestion:
Major Revision
Review Comment:

Before I start with the paper, firstly note that I was only able to run this CSS on a specific version of Node JS. I had to purge the Ubuntu default version of Node JS by running the following:

sudo apt purge nodejs npm
curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.33.8/install.sh | bash
nvm install node

Such steps do not appear to be documented anywhere, but they are important as a first point of entry. The server could be tested on the default distributions of popular systems to ease this process, and appropriate steps could be provided.

This paper mainly describes the Solid project in Sections 1-3. In Section 4 it describes the tool CSS, where the following requirements are laid out:

1) Testing. The claim is that CSS has been tested to comply with the specification (2 sentences). This is mentioned again in Section 7.1, where it is stated: "Besides the unit tests there are also extensive integration tests, setting up complete instances of the server and verifying these instances conform to all the necessary specifications."
By searching I found the following description in repo.
https://github.com/solid-contrib/conformance-test-harness/blob/main/USAG...
This provides a long document explaining many parameters to set in order to run tests. This doesn't give a reasonable point of entry for running the test suite that the authors intend. It is more directed at people who wish to create tests themselves. I suggest the documentation filters by these different audiences, and provides new content for evaluation by external reviewers in the context of the current process, and also in the context of external people who are measuring the robustness of CSS before it is used in a commercial product, for example.

This is already a major problem since I'm expecting that either the paper tells the reader how to evaluate and reproduce results, or, at least, there is (1) a clear link to how that evaluation can be performed and (2) the link contains clear instruction for how to execute the evaluation.

2) Evolving specs. The authors make the point that the modularity allows the server and the specs to evolve without interfering with each other. This is indeed a plus of CSS. However, I do hear some developers in the community complaining about the modularity. While I agree with the authors on their design decisions here, perhaps that counter viewpoint, which results in some people falling back on NSS, should be acknowledged and weighed in a balanced manner. How can CSS be presented in such a way that the modularity is not an obstacle to entry for some developers?

3) Multiple roles. Solid consists of IDPs, authorisation servers, resource servers, apps, etc. These could be independent or combined, but CSS can play the role of most of these (except an app, I believe; incidentally, could a framework for apps also be supported?). This is related to later comments about multiple configurations, e.g., it’s good to explain how to set up a server so that it efficiently serves m out of n of these different roles. Therefore, this requirement could be streamlined along with the others.

4) Related to (2) above, the authors emphasise modularity. Modularity is a plus that could be given a clearer entry point for those who have struggled with the architecture (not the reviewer), as already mentioned above.

A separate point made by the authors when discussing modularity is that CSS gives the ability to choose between WAC and ACP. The authors draw attention to the performance of WAC vs. ACP, but I doubt that performance is at stake in that choice. What is at stake there is security. As explained in related work [1], the existing spec for WAC does not permit secure policies to be expressed, exposing critical attack vectors. In fact, it's not clear why WAC should be permitted at all, or why WAC is used in the default configuration when CSS is first installed from the repository (incidentally, default configurations are problematic: can CSS out-of-the-box allow the owner to select a good secure configuration?). Given that we know WAC (at least the current WAC spec) leads to insecure pods, it would be good to explain why WAC is part of the CSS ecosystem, or at least to accompany it with warnings about its usage. There is a compatibility argument, but in fact that is weak, since pods should surely be able to govern their own policy model. Is there a way to reference guidelines for making secure choices when selecting from modular components?

5) Allow new features. Is this not evolving with the specs / modularity again? Can the requirements be gathered in a more streamlined manner?

6) Quick setup. The argument that quick setup is possible is again mixed in with modularity arguments, and hence could be streamlined and sharpened. Quick setup and configuration are indeed a potentially good selling point for CSS, if this can be confirmed (from independent tests with students, this is currently questionable). This point on quick setup is revisited in Section 6 (line 43). The authors chose to list a specific configuration file. I find this listing unhelpful. Instead, it would be good to list the configurations that are provided. That could be accompanied by some pointers for selecting between the configurations, and for evaluating them. It would strongly support the claim for quick setup if we can confirm in this review process that the information is indeed there for someone to select, quickly deploy, and test a number of different setups. This paper can be revised to facilitate such an evaluation, which would be useful beyond just the review process.

7) Errors and logs. Logging is perhaps one of the most critical parts of CSS; not least because primary use cases for CSS handle personal data and there are legal reasons for retaining logs. The authors don't mention this or refer to related work on the need for logging in Solid. Surprisingly "logging" is not mentioned again in this paper, so we are not directed towards a way to evaluate the scope of the logging and whether it is appropriate for various purposes e.g., debugging v.s. cyber security audits v.s. resolving legal disputes. Does what is logged change if the parties that a deployment of CSS represent in the Solid ecosystem change, and why?

Beyond the requirements listed:
The evaluation in Section 7 is rather subjective and difficult to independently verify. It's OK to have this discussion and it is good that the discussion tries to ground itself in points raised earlier in the paper. However, it would be good to tell the reader how you expect the reader to evaluate the server. What can we reproduce and see working with our own eyes when we run CSS on our own machines? The paper could in general be much more directed. A priori, I want to accept this paper due to the role CSS plays in the development of Solid. However, I have the above concerns about the accompanying paper that I propose should be considered in a major revision.

The related work could be more detailed and provide concrete and specific details about how the implementations differ, in order to situate CSS in this landscape. Nextcloud Solid from PDS Interop must also appear, alongside the other implementations that are correctly covered.
https://github.com/pdsinterop/solid-nextcloud

A minor point. Since there are many contributors to CSS, do they consent to not being named explicitly in this tool description? The authors could possibly indicate whether they have raised this point in the community. If the community is fine with no more than two people being named (not as authors, by the way), then the reviewer would accept any community decision here. This is mainly to mitigate side effects, and it is down to the judgment of the authors whether or not to follow up on this suggestion.

[1] https://www.mdpi.com/2078-2489/14/7/411

Review #3
By Christoph Braun submitted on 13/May/2024
Suggestion:
Major Revision
Review Comment:

Paper summary

The paper presents the "Community Solid Server (CSS)" as a modular implementation of the Solid Protocol.
First, in the introduction, the authors explain the problem of siloed personal data, and how Solid [sic] is aiming to provide a solution to that problem. CSS is mentioned towards the end of the introduction as a tool to support research and development of Solid specifications; to be presented in the paper at hand.
Next, in the related work section, the authors provide an overview of preliminaries on access controlled HTTP requests and the specification of the Solid Protocol. Existing other implementations of the Solid Protocol are mentioned.
Then, use cases for CSS are briefly outlined.
Following the use cases, requirements for the CSS are described.
For the CSS architecture, the authors proceed to present the main abstract components as well as a walkthrough of an HTTP request sequence in the CSS.
Next, configuration of the CSS is briefly explained as relying on the dependency injection framework Components.js, where code components are represented as RDF classes, then interlinked and thereby extensible and exchangeable.
Then, the authors address sustainability, usage and impact of the CSS:
For usage, the authors provide details from the CSS Github repository.
Claiming (community) impact, the authors provide a short list of Github repositories that use CSS in their work.
For (use case) impact, the authors describe how the modularity of the CSS configuration and architecture provide a solution for the outlined use cases.
For sustainability, the authors explain the employed test strategy and feedback they provided to the community, claiming community impact.
The authors provide pointers to available documentation, tutorials and supporting tools to help new users of the CSS.
Finally the authors conclude, summarizing the contents of the paper.

---

Strong points
- The presented tool seems very useful (and indeed is by my experience).
- The presented tool seems well documented and thus usable by new users (also according to my observation).
- The presented tool had a positive community impact (also according to my observation).

Weak points
- The paper does not clearly define terminology that is used throughout the paper.
- The paper does not clearly establish connections between the contents of single sections (e.g. sections 3 and 4, or sections 5 and 6).
- The paper does not clearly delineate what is "just" implementation of the Solid Protocol and what are (software) architectural design choices to provide the tool's main value.

---

Overall impression and general comments

The presented tool seems to be useful to and well regarded in the community.
The submitted paper therefore seems valuable to the community.
Considering the current state of the paper, however, I need to recommend a major revision for the paper to match the high quality of the presented tool.
Nonetheless, I highly encourage the authors to further work on this valuable paper!

The paper could be improved by focusing on what value the CSS provides that other implementations of the Solid Protocol do not provide.
While the paper's title indicates the unique selling point of the CSS, vast parts of the CSS functionality/capability description are defined by the Solid Protocol.
Therefore, a reader unfamiliar with the Solid Protocol is hardly able to distinguish what the CSS provides and what is simply implementation of specification.

To offer some guidance on what I would have expected:
The CSS is (A) an implementation of the Solid Protocol but most importantly (B) implemented in a dauntingly modular fashion.
For a tools report presenting CSS, (A) can be covered in a Preliminaries section, and checked off the list. (There are other implementations of the Solid Protocol, CSS is not special here.) (B), however, is the unique selling point of the CSS.
The tool's architecture description may center around explaining what design decisions, abstractions and other means make the CSS so modular.
So, rather than explaining that the CSS indeed implements the Solid Protocol, I encourage the authors to explain how the CSS is made so modular to support research and development as this is the premise of the paper.

Before commenting on the paper's structure, I would like to ask the authors to consider defining and delineating the terms "Solid", "Solid ecosystem", "Solid Protocol", "Solid Project" in the context of the paper. Which ones of those are interchangeable?
Similarly, I would encourage the authors to clearly define terminology used throughout the paper.
For example, what exactly is a "server", "resource", "Solid Pod" (never been introduced), "Solid server" and so on. While some terms are roughly explained, I am unsure if this is enough for a reader unfamiliar with the Solid Protocol to understand this paper.

For the paper structure, I would like to ask the authors to consider re-structuring the beginning of the paper:

1. The introduction does not seem to introduce the tool to be presented but instead presents preliminaries on the Solid ecosystem.
As a reader, I would have expected to learn about
- the problem which the tool addresses
- why is it interesting/important to solve that problem?
- what makes the problem non-trivial
- if there are previous solutions, what is wrong with the others / what is the tool offering over the others?
- what components is the tool composed of / what is the design approach?
- are there any limitations?
Information on what the Solid Project tries to achieve could be placed in a Preliminaries section.

2. The authors may want to consider moving sections 2.1 and 2.2 into a Preliminaries section. Sections 2.1 and 2.2 are not only related, they are fundamentals, as they are directly implemented in the CSS.
I would then suggest first describing the specifications (identification, authentication, authorization, data operation), and then explaining a UML sequence diagram of an HTTP request. To this end, the authors might want to consider combining 2.1 and 5.2 (extracting common Solid Protocol elements to that section).
This would give section 5.2 more space to further explain the modularity in implementing the already presented details. Some of these details are already present, e.g. ResourceStore, DataAccessor and the details on content-negotiation.

For related work, are there other tools/systems reports that are similar to this work?
Maybe using other technologies, but also for the purpose of easing R&D efforts?

3. The use cases read like artificial user stories. The authors could improve this section by shortly describing how these use cases were created, collected and evaluated.

4. The text indicates that the requirements follow from the use cases. The section could be improved by providing a mapping between the presented use cases and the presented requirements. Otherwise, both read detached from each other and seem standalone.

5. The authors could improve the architecture section by explaining how their software design choices result in the claimed modularity.
Instead of focusing on a request sequence, which is well defined by the Solid Protocol, the authors could present a UML component diagram of the server, explaining interfaces that allow for the claimed modularity. In addition, the authors could explain particular design choices in their implementation that are not just trivially implementing the standard.
For example, I recall that CSS relies on concepts such as "waterfall handler" or "parallel initialiser", mentioned in the architecture documentation (https://communitysolidserver.github.io/CommunitySolidServer/latest/archi...).

6. I see two parts in this configuration section: first, an introduction to Components.js, which could also be a short section in a preliminaries chapter; and second, a description of what the choice of Components.js entails for the modularity / flexibility of the CSS, which I think was sufficiently well explained.
As the components of the CSS are listed in Figure 1, I would have expected Section 5 to present a UML diagram of these components and how they relate to each other in the architecture.

7. Sustainability, Usage & Impact. The authors could improve this section by shortly explaining how sustainability is defined in the context of this paper.
While the authors outline current usage of the CSS within the community, do the authors know of any projects or institutions using the CSS for R&D purposes as outlined?

Overall, I appreciate the paper for the presentation of the CSS, a valuable tool for the Solid Community. I highly encourage the authors to continue improving the presentation in the paper to match the high quality of the software!

---

Detailed comments (expanding on the general comments from above)

0. Abstract

Summarizing the abstract:
Solid Protocol
- implementations exist
- ecosystem is evolving (comment: why ecosystem? why not protocol? specification?)
- therefore, need for an implementation that enables research into features and quickly setting up varying development environments
CSS
- modular server
- with pre-defined configurations
- supports custom configuration
Goal of the server: evolve specification, support research into Solid specifications

As a reader, I would like to know as quickly as possible what the tool tries to accomplish.
Suggestion: Put the goal of the server towards the beginning of the abstract.

1. Introduction

Questions
1.)
- Despite this being a tools report, I would have expected some references for the claims stated in the beginning of the introduction.
- "a considerable challenge for newcomers trying to establish themselves in a particular sector" - how do the authors address this challenge with the tool? Why is this a problem the authors consider?
- recent legislative changes - which ones do the authors refer to?
- "tangible benefits from data sharing" - such as?
the idea described here is that users would share their data with services of high (enough) quality
but if users are required to share their data to use the service in the first place, what is the tangible benefit?
How does the tool relate to or have an impact on this vision?
- Description of the "Solid ecosystem" seems very detached from the first part of the introduction.
- Description of the "Solid ecosystem" mingles a description of what "open standards" mean with what the core idea of the Solid Protocol (decoupling identity, data, and application) entails, e.g. clients should be seen as views and controls over data. This is also just an opinion and not defined as such in the Solid Protocol.

2. Related Work

2.1)

This section could benefit from a clear definition of terms used throughout the paper.
After having read this section, I am unsure what the definition of "server" is (just an HTTP server, a Solid Pod, a Solid Pod server, an OpenID Provider, ...). The term "server" is used throughout the paper.

While I see the benefit of a high-level description of the client-server interaction, I wonder if the client-server interaction following the Solid Protocol is that much different from other client-server interaction that includes authentication, authorization and data access.
Perhaps the description of the protocol flow could be placed after the description of the Solid Protocol, allowing to reference the specifications from the flow description?

Questions
2.1)
- "client–server contract" The term contract seems ambiguous in this context. What kind of contract are the authors referring to? legal contract, smart contract, ...

2.1.1)
- "One suggested solution ..." suggested where?
- "WebIDs [4], which are Linked Data resources that uniquely identify ..."
should be: WebID, an HTTP URI that identifies an agent
- the roles involved in the protocol described in 2.1.2 are mentioned but not defined / clearly described in advance
- what is an identity provider?
- what is a Solid server?
- what is a Solid pod?
- what is a server?

2.1.2)
- The wording is a bit ambiguous in this section. In particular, it reads as "for each request to a Solid server". This would then entail that each time a client requests data from a Pod, the user needs to authenticate at an IDP.

2.2.1)
- The second paragraph of this section is what I would have expected to read in the introduction.

2.2.2)
- "it defines specific semantics for [...] interacting with containers of resources. Containers are resources that group other resources together by providing RDF descriptions with containment triples" how is this different from an LDP Basic Container?

2.2.3)
- "The Solid-OIDC specification [7] defines everything related to authenticating with Solid." is an incorrect statement. The Solid Protocol defines everything that may or may not be used for authenticating according to the Solid Protocol. While Solid-OIDC is the current normative specification for authentication in the Solid Protocol, WebID-TLS is an additional non-normative specification that defines an additional authentication method. Moreover, as the authors dove into the "history" of the Solid Protocol in 2.2.1, WebID-TLS once was the normative specification to implement authentication for the Solid Protocol.

2.3)
- "What about ..."
- GoLD? (https://github.com/linkeddata/gold)
- the Nextcloud plugin? (https://github.com/pdsinterop/solid-nextcloud)
- the PHP one? (https://github.com/pdsinterop/php-solid-server)
- Manas? (https://github.com/manomayam/manas)

3. Use Cases

- "In this section we will cover several use cases that give an overview of several use cases we wanted to support by creating a new server." there seems to be a doubling in wording?

3.1)
- "to inform future spec changes" - while the term "spec" is very common in the community, it is an informal abbreviation of the term specification. I would suggest using the formal wording in the paper.

3.4)
- The description lacks clarity regarding what the server should offer to the researchers to satisfy the use case.

4. Requirements

4.1) - How is this requirement distilled from the use cases?

4.2) - The authors could improve this paragraph by delineating it from Requirement 4.4. Is this requirement actually about versioning of different states of implementation and specification?

4.3)
- Clarity of the overall paper could be improved by providing precise descriptions of the terms and roles involved in the Solid Protocol in a dedicated section (as mentioned earlier):
"Multiple servers are involved in a Solid-based interaction: the pod server handles the core Solid protocol, and the OpenID Provider provides OIDC authentication."
- The authors could improve this paragraph by elaborating how the requirement of "supporting multiple server roles" would "allow[s] any part of the set of Solid specifications to be investigated and experimented with, not just a subset of it". This sounds to me more like an "implement everything" requirement.

5. Architecture

5.1.2) "The full of the request is reconstructed" somehow, there seems to be a link at the 'o' of the 'of'.

5.2)
- In which way is this special to the CSS? How is this different from a usual implementation of the Solid Protocol?

5.2.4)
- "The first store uses ..." What about the other stores?
- "Content negotiation" - this is the style I was expecting: Clearly explained design choices particular to the presented implementation.
In contrast to: "POST only works when targeting a container." which is clearly defined in the Solid Protocol.

6. Configuration
- Maybe the authors could add a section explaining the nature of linking that the configurations do, give an example on what that means?

7. Sustainability, Usage & Impact

7.1) Sustainability
- The authors could improve this section by defining the term sustainability in the context of this work.
- The authors could improve this section by explaining how their test strategy contributes to the sustainability of the tool. In addition,

7.2) Use case impact

- The authors may want to consider renaming this section. I am unsure if impact is a suitable word here.
- In 7.2.3: "We want to support" Do the authors mean "A researcher wants to ..."?