Improving Linked Data development experience with LDkit

Tracking #: 3816-5030

Authors: 
Karel Klima
Ruben Taelman
Martin Necasky

Responsible editor: 
Aidan Hogan

Submission type: 
Tool/System Report
Abstract: 
The adoption of Semantic Web and Linked Data technologies in web application development has been hampered by a less-than-optimal developer experience. Front-end web application development is inherently challenging, as developers must master a multitude of technologies and frameworks even to build simple applications. Adding Linked Data technologies to the mix further complicates this challenge by increasing the number of technologies that need to be learned. The Semantic Web community has historically struggled to provide front-end developers with quality tools and libraries for working with Linked Data; consequently, developers often prefer traditional solutions based on relational or document databases that offer a far superior developer experience. To address this issue, we developed LDkit, an innovative Object Graph Mapping (OGM) framework for TypeScript. The framework works as the data access layer, providing a model-based abstraction for querying and retrieving RDF data. LDkit transforms the data between its RDF representation and TypeScript primitives according to user-defined data schemas, simplifying the use of the data and ensuring end-to-end data type safety. This paper introduces LDkit and describes its design and implementation fundamentals, with a focus on its developer interface and its integration with other related technologies. Building on community feedback and experience from using LDkit, we introduce major enhancements that simplify data querying and updating for common and uncommon web application scenarios, further improving developer experience. Finally, we demonstrate the impact of LDkit by examining the usage of the framework in real-world projects. Through the provision of an efficient and intuitive toolkit, LDkit aims to enhance the web ecosystem by making Linked Data more accessible and integrated into mainstream web technologies.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Daniel Schraudner submitted on 16/Apr/2025
Suggestion:
Major Revision
Review Comment:

In their Tool/System Report "Improving Linked Data development experience with LDkit", the authors present LDkit, their Object Graph Mapper for TypeScript, as well as its usage. Their work is heavily based on the previous paper "LDkit: Linked Data Object Graph Mapping Toolkit for Web Applications" by the same authors.

The authors claim as contributions:
1. The presentation of LDkit itself. In my opinion this cannot be a valid contribution, as LDkit has already been presented in the previous paper.
2. Recent enhancements of LDkit. It is, however, not exactly clear what those improvements (especially in relation to the previous paper) are; only on page 16 is there a short paragraph about "LDkit 2.0 improvements".
3. An empirical analysis of LDkit's usage. This is in my opinion the only proper contribution of this article. The analysis itself, however, needs improvement (see below).

In the Introduction the authors motivate the need for a tool like LDkit well, but they also make several claims for such a tool to which they never return in the rest of the paper and which they never evaluate (e.g. reducing complexity while maintaining expressivity, being developer-friendly and intuitive, being efficient).

The Related Work section is well-structured, but I do not understand the need for the third subsection. An overview of different runtime environments does not give the reader any insights into the presented tool or its usage. I would instead have expected a comparison of the related work with the tool proposed by the authors here; this comparison can be found in the Developer Experience section instead.

In the LDkit section I was irritated by the use of Lens as a data abstraction layer without any mention of lenses from functional programming [1], which are also used as a data abstraction layer. Are the lenses of LDkit in any sense related to functional lenses, or is the name just a coincidence?
In this section I again wondered whether the query interface of LDkit together with its Schemas is really more "intuitive" and "developer-friendly"; evidence for this claim is missing.
The singularization mechanism was explained well; however, I missed an explanation of how the singular value is chosen from the potentially large list of values.
The code examples in this section greatly contributed to my understanding!
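For reference, a functional lens in the sense of [1] is simply a paired getter and setter that focuses on one part of an immutable structure. The following minimal TypeScript sketch is illustrative only; it is not LDkit's API, and the `Person`/`nameLens` names are made up for the example.

```typescript
// A minimal functional lens: a get/set pair focusing on one field of an
// immutable record (the "get" and "put" of Foster et al.).
type Lens<S, A> = {
  get: (s: S) => A;
  set: (s: S, a: A) => S;
};

type Person = { name: string; age: number };

// Lens focusing on the `name` field of a Person.
const nameLens: Lens<Person, string> = {
  get: (p) => p.name,
  set: (p, name) => ({ ...p, name }),
};

const alice: Person = { name: "Alice", age: 30 };
const renamed = nameLens.set(alice, "Bob");
// `alice` is left unchanged; `renamed` is a new record with the updated field.
```

Whether LDkit's lenses satisfy the lens laws (get-put, put-get) or merely share the name is the question the review raises.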

The Usage section is supposed to be the main contribution of this article but also its biggest weakness.
In the first part, the authors assess the adoption of LDkit by analyzing different usage metrics. All of those metrics, however, are easily publicly available and of low significance; more meaningful metrics would be good here.
In the second part the authors conduct a qualitative in-use analysis. They, however, picked only three projects and tools for this, all of them with direct involvement of one of the authors, and describe the usage of LDkit in each project or tool only briefly. The fact that the authors used their tool in three of their own projects is of low scientific value and neither yields valuable insights about the usage of LDkit nor really contributes to demonstrating its impact.

The GitHub repository of the tool is of very high quality. It has an understandable and extensive README as well as additional documentation. It is well-organized and seems complete.

All in all, I can see the high importance of a tool like LDkit. The quality of the article is overall good, but too low for the part that is the actual contribution of this article (especially in relation to the previous paper). This also makes it hard to assess the actual impact of the tool. I would suggest that the authors improve their evaluation of the usage of LDkit, which would strengthen their contribution as well as demonstrate its actual impact (which is potentially high).

[1] Foster, J.N., Greenwald, M.B., Moore, J.T., Pierce, B.C., Schmitt, A.: Combinators for bidirectional tree transformations: a linguistic approach to the view-update problem. ACM Trans. Program. Lang. Syst. 29(3), 17 (2007)

Review #2
By Jose Emilio Labra Gayo submitted on 29/Apr/2025
Suggestion:
Major Revision
Review Comment:

The paper presents LDkit, a library for working with RDF data in TypeScript. The tool seems interesting, and the library has several desirable features: good documentation, an open-source repository on GitHub, availability as an npm package, tests, etc.

The paper presents some related work in three different aspects: web frameworks in general, TypeScript/JavaScript RDF libraries, and JavaScript runtimes.

I think the authors could include two other aspects: other RDF libraries and mappings from OO languages to RDF, as well as other web possibilities like WebAssembly. In the case of bindings between OO languages and RDF, there are some historical proposals like Trioo: https://trioo.wikier.org/ (disclaimer: I was one of the coauthors; I am not requesting that the authors cite it, but just suggesting that they could at least consider discussing some of those approaches that intended to provide a type-safe mapping between OO languages and RDF). Another approach that could be interesting to mention in the related work is rudof (https://labra.weso.es/publication/2024_rudof_demo/), which can be compiled to WebAssembly and could provide an alternative way to work with RDF in the browser (I am again one of the coauthors).

The caption of Listing 1 says that it is the formal specification of the LDkit schema, but I think it is an informal specification, and the paper does not describe it properly. For example, the paper does not mention the fields "optional" or "multilang", and the vocabulary employed in the specification is also not clear to me. I noticed that the online documentation of LDkit contains more properties, like multilangarray, which are not described in the paper. I would suggest that the authors describe the different fields of the Schema and their semantics in more detail, as the Schema seems to be an important and novel part of this proposal.

Some questions arise when one looks at the Schema specification: is it allowed to have recursion in Schemas, for example, a Schema that refers to itself in a nested definition?
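To make the points about "optional", "multilang", and nesting concrete, here is an illustrative sketch of an LDkit-style schema. The field names follow the review and the online documentation, but the exact keywords and nesting syntax are assumptions, not LDkit's verified specification; plain string IRIs stand in for LDkit's namespace helpers so the snippet is self-contained.

```typescript
// Illustrative (assumed) LDkit-style schema with optional, multilingual,
// and nested properties.
const PersonSchema = {
  "@type": "http://xmlns.com/foaf/0.1/Person",
  name: {
    "@id": "http://xmlns.com/foaf/0.1/name",
    multilang: true, // values keyed by language tag
  },
  nickname: {
    "@id": "http://xmlns.com/foaf/0.1/nick",
    optional: true, // the property may be absent in the data
  },
  knows: {
    "@id": "http://xmlns.com/foaf/0.1/knows",
    // A nested schema for the property's object. Whether such a nested
    // definition may refer back to PersonSchema itself (true recursion)
    // is exactly the open question raised above.
    schema: {
      "@type": "http://xmlns.com/foaf/0.1/Person",
      name: { "@id": "http://xmlns.com/foaf/0.1/name" },
    },
  },
} as const;
```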

One feature that I think the authors of LDkit should consider or address is the relationship between those schemas and ShEx schemas or even SHACL shapes. I think the library could be very helpful if it could import subsets of ShEx/SHACL schemas and generate LDkit schemas from them, making the library compatible with those definitions.

Another possibility to explore or mention would be working with VoID descriptions from SPARQL endpoints, in a way similar to the recent SPARQL editor from SIB (https://github.com/sib-swiss/sparql-editor).

What datatypes are available in LDkit? Looking at the vocabulary, it seems that the available ones are Boolean, Number (Integer), or Date. Is it possible to have decimals?

I am also wondering whether the type system of the Schema can enforce, for example, that a value is a multilingual string rather than, say, an integer. It seems that it cannot, but I am not sure.
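As an illustration of the datatype question, one would expect to be able to declare explicit XSD datatypes on schema properties, along these lines. The shape is assumed; whether xsd:decimal is actually supported and how it maps to a TypeScript number is precisely what the review asks, and the property names here are hypothetical.

```typescript
// Illustrative property definitions with explicit XSD datatype IRIs
// (assumed shape; not LDkit's confirmed behavior).
const ProductSchema = {
  "@type": "http://schema.org/Product",
  price: {
    "@id": "http://schema.org/price",
    // Hypothetical: a decimal rather than an integer datatype.
    "@type": "http://www.w3.org/2001/XMLSchema#decimal",
  },
  released: {
    "@id": "http://schema.org/releaseDate",
    "@type": "http://www.w3.org/2001/XMLSchema#date",
  },
} as const;
```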

Some minor comments:
- Page 3, line 47, “an of a driver…”
- I would suggest that the authors review some of the language in the paper to avoid statements which are not very academic and read more like a sales document, for example, the statement on page 4 that Comunica is a highly modular and flexible query engine. I would try to avoid such adjectives, which are not easy to prove.
- Page 5, line 19, “that is facilitated by NPM package registry”
- Page 5, line 21, “by the JIT transform (I think it should be transformer?) with Node.js”
- Page 5 line 27 “test deploy applications…”
- Page 5, line 40, “The examples in the rest of paper”
- Page 5, line 41 “to to other runtime environments”
- Page 5, line 49, “client, server, or edge” the paper doesn’t introduce “edge” before, I wonder if it may be interesting to mention what the authors consider “edge” before…
- Page 6, line 5, “P1 Embraces Linked data heterogenity” (I think it should be “heterogeneity”)
- Page 9, line 20, “browse data, more advanced interface is required…
- Page 9, line 21, “solutions provide to retrieve all…”
- Page 10, line 24: why do you use “take” and “skip” instead of “limit” and “offset”, which would be more familiar to SPARQL users?
- Page 11, “LDkit assumes that the data present in the data source correspond to the defined data schema…”: I think it would be a great feature if LDkit could read ShEx schemas or SHACL shapes directly, so it could work with already-validated RDF data without the need for that assumption.
- Page 15, Table 1: the Type safety entry may need some explanation, because in the case of LDkit it says that the types are automatically inferred, but I think they are not inferred, they are declared in the Schema; whereas for LDO, by comparison, they are obtained from ShEx.
- In the same table, the Environment compatibility entry appears to have the same values for all tools; maybe remove that entry, as it does not show any difference?
- Page 15, line 49, “LDkit provides unmatched tooling support” seems to me not very academic.
- Page 16, line 12, “a strong emphasis on adherence to W3C standards and recommendations…” is again not very academic, especially since it seems to ignore ShEx or SHACL.
- Page 16, line 15 “... SPARQL
- Page 16, line 21, “LDkit provides a consistent and efficient way to handle RDF data” is again not very scientific.
- Page 16, line 43, “Since the complexity of such queries is directly proportional to the complexity of the data schemas, the more properties developers add to schema, the less performant LDkit becomes”: this sentence raised a lot of questions for me. First, it seems that the queries always take all the properties in the Schema; is that the reason why they are less performant? Maybe those queries are more specific and less general, which could make them more efficient? How do you measure the performance? Maybe including a section about performance in the paper would be a good idea?
- Page 19, line 48 replace graphql by GraphQL

The section about usage shows some usage metrics which are not very high, but I nevertheless think it is a good idea, and I encourage the authors to continue working on this kind of software. It also mentions three scenarios which look fine, although in the first two the authors of this paper are involved. The third scenario looks similar to the work of Vincent Emonet on the SPARQL editor, and it seems it could make sense to explore the use of LDkit to explore SPARQL endpoints and to obtain the schema declarations from the VoID descriptions.

In general, I think the paper and the tool described are interesting, although the paper needs a major revision to improve its quality: a better description of the Schema that does not require reading the documentation, as well as descriptions of other features which are not properly covered (for example, the encoders and decoders are described neither in the documentation nor in the paper). I also think the paper could give more details about the performance of the library, as well as improve the related work section a little.

Review #3
Anonymous submitted on 06/May/2025
Suggestion:
Minor Revision
Review Comment:

This paper describes LDkit, a framework that assists with querying RDF in JavaScript/TypeScript applications.
The authors propose having developers describe their desired data schemas, avoiding having to build SPARQL queries and deal with the messy JSON that SPARQL endpoints usually return.
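The "messy JSON" is the W3C SPARQL 1.1 Query Results JSON Format. The sketch below contrasts an endpoint's raw response with the flat object an OGM hands to the developer; the data values are made up, and the hand-written unwrapping is what a tool like LDkit is meant to replace.

```typescript
// What a SPARQL endpoint returns (W3C SPARQL 1.1 Query Results JSON
// Format): variables wrapped in binding objects with type and datatype.
const sparqlResults = {
  head: { vars: ["name", "birthDate"] },
  results: {
    bindings: [
      {
        name: { type: "literal", value: "Ada Lovelace" },
        birthDate: {
          type: "literal",
          value: "1815-12-10",
          datatype: "http://www.w3.org/2001/XMLSchema#date",
        },
      },
    ],
  },
};

// The manual unwrapping every developer otherwise repeats by hand:
const binding = sparqlResults.results.bindings[0];
const person = {
  name: binding.name.value,
  birthDate: new Date(binding.birthDate.value),
};
```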

The paper is well written and easy to follow. I think it targets some of the challenges faced by developers, even if there is no quantitative or qualitative evaluation to demonstrate so. I think the framework is quite useful, and I managed to set up a project relatively quickly that uses LDkit to query resources from my own SPARQL endpoint.
In short, I look forward to using this resource in the future in my Linked Data based applications.

I think the paper should be accepted. I provide minor comments that would be great to clarify in the final version:

- What are the differences between the previous version of LDkit and this version? LDkit was already presented in [15], according to the authors. I think the differences from the previous version should be clearly stated.

- Compliance with the FAIR principles: I recommend adding a citation file (CFF) to state how to cite LDkit. This will help track which scientific applications use the toolkit. Similarly, I recommend establishing the GitHub-Zenodo bridge to get automated DOIs for the releases of the toolkit.

- Some claims should be relaxed a little: since there is no quantitative evaluation with developers, I think the authors cannot claim that there is an improvement in the development experience granted by LDkit (i.e., the title of the paper). I sympathize with the proposed approach and agree that the proposal seems to make things easier, but there needs to be a study to demonstrate this claim (I am not requiring a user study, only that the authors relax their claims). For example, this approach still requires the developer to know about the target ontologies being used, how the data is represented in the SPARQL endpoint, etc., in order to create the correct schemas. This is often not straightforward: an ontology like DBpedia alone has thousands of classes and properties. From the point of view of a developer, it may still be easier to rely on a REST API that provides the results already in JSON format. True, this is less flexible, but it requires less effort. This point is not discussed in the paper, and it may be another of the main reasons why it is hard to consume RDF data. That said, a similar issue happens with GraphQL.

- On this note, I am surprised that the paper does not discuss efforts like grlc (https://grlc.io), which take the route of having someone easily design a REST API on top of KGs. It is a different paradigm with the same goal: easing developer handling of RDF data. It also transforms results into JSON format, but the developer cannot specify the schema.

- How does LDkit handle complex types, for example, those where the range of a property is the union or the intersection of two classes? I have not seen examples of how to accomplish this, and such cases are quite common; e.g., in schema.org, the value of author may be a Person or an Organization. Including this in the article would help.
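One conceivable workaround for the schema.org author case, sketched below, is to define one schema per class, query each separately, and merge the results behind a TypeScript union in application code. This is a hypothetical pattern, not a documented LDkit feature, and all names here are made up for the example.

```typescript
// One schema per possible class of the property's range (plain string
// IRIs keep the sketch self-contained).
const PersonAuthorSchema = {
  "@type": "http://schema.org/Person",
  name: "http://schema.org/name",
} as const;

const OrganizationAuthorSchema = {
  "@type": "http://schema.org/Organization",
  name: "http://schema.org/name",
} as const;

// A TypeScript union can then model "Person or Organization" on the
// application side, even if the schema language itself cannot.
type Author =
  | { kind: "person"; name: string }
  | { kind: "organization"; name: string };

const describeAuthor = (a: Author): string =>
  a.kind === "person" ? `person ${a.name}` : `organization ${a.name}`;
```

The cost of this workaround is two round trips instead of one, which is part of why first-class union support would be worth documenting.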

- When I was setting up my own namespace, I found it repetitive to have to add the terms both to the namespace and to my own schema. Wouldn't it be easier to be able to consume JSON-LD contexts from ontologies directly? (I did not see whether this is supported or not.)
- Also, please clarify the minimum version of Node.js required to run LDkit, as I had to figure this out by myself.

Minor issues:
- On the sentence: "Should developers encounter performance issues, we recommend refactoring the data schemas by removing unnecessary properties, splitting the schemas into multiple simpler ones, or reducing schema nesting. Often, executing several simpler SPARQL queries can be much faster than running a single complex one." Is there some sort of guidance or a sweet spot as to the general number of properties that are supported before degradation? Some guidance would help.

- Typo in Section 4.3: "ans SPARQL update" -> "and SPARQL update".