A comparison of object-triple mapping frameworks

Tracking #: 1910-3123

Authors: 
Martin Ledvinka
Petr Křemen

Responsible editor: 
Philippe Cudre-Mauroux

Submission type: 
Survey Article
Abstract: 
Domain-independent information systems like ontology editors provide only limited usability for non-experts when domain-specific linked data need to be created. In contrast, domain-specific applications require an adequate architecture for data authoring and validation, typically using the object-oriented paradigm. So far, several frameworks mapping the RDF model (representing linked data) to the object model have been introduced in the literature. In this paper, we develop a novel framework for the comparison of object-triple mapping solutions in terms of features and performance. For the feature comparison, we designed a set of qualitative criteria reflecting object-oriented application developers' needs. For the performance comparison, we introduce a benchmark based on a real-world information system that we implemented using one of the compared OTM solutions -- JOPA. We present a detailed evaluation of a selected set of object-triple mapping libraries and show that they differ significantly both in terms of features and in time and memory performance.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 30/Jul/2018
Suggestion:
Accept
Review Comment:

# Review: A comparison of object-triple mapping frameworks

## Guidelines

1. Overall Impression (0-100) ... higher is better

2. Decision: Accept, Minor Revision, Major Revision, Reject

3. This manuscript was submitted as 'Survey Article' and should be reviewed
along the following dimensions:

- (1) Suitability as introductory text, targeted at researchers, PhD students,
or practitioners, to get started on the covered topic.
- (2) How comprehensive and how balanced is the presentation and coverage.
- (3) Readability and clarity of the presentation.
- (4) Importance of the covered material to the broader Semantic Web community.

## Review

1. Overall Impression: 85
2. Decision: Accept
3. Detailed Review:

In their work, the authors present a comprehensive survey of existing object-triple mapping frameworks and compare them based on both a set of qualitative criteria and a performance benchmark. The article is motivated by the necessity of helping non-expert developers work with Linked Data by allowing them to work according to the well-known object-oriented paradigm. Object-triple mapping (OTM) solutions allow for interacting with triple-based data in an object-oriented fashion. The authors aim to help developers choose the most suitable OTM solution and hence propose a comparison framework for such solutions.

After providing background information and related work, the authors detail the aspects to be considered in the comparison of such OTM libraries. To this end, they present and describe the qualitative criteria used to assess the libraries as well as the criteria of a performance benchmark.

In the presentation of the libraries, the authors provide a brief introduction to the 12 selected libraries. The selection of these libraries is clearly motivated, and it is also mentioned why some libraries are omitted from the comparison. The main arguments for omitting a library are (1) long inactivity (5 years), (2) non-open-source code, or (3) bugs. All 12 libraries are compared according to the set of qualitative criteria, which are categorized into general, ontology-specific and mapping criteria.

The second comparison measure is a performance benchmark developed by the authors. In the benchmark, 5 of the 12 selected libraries are compared. The selection of these libraries is motivated by the authors' requirements for comparability in the performance benchmark: the libraries must be written in Java, since the comparison framework is based on Java, and they must implement the RDF4J API for data storage access, which provides a level playing field. The performance is evaluated for 6 different types of operations: create, batch create, retrieve, retrieve all, update and delete. In the analysis, the execution times of these operations are compared. For these operations, a certain number of instances is defined (e.g., a total of 600 instances in the create operation). However, it is unclear to me how these numbers were determined. Furthermore, even though all operations use the same number of instances for all libraries, I would recommend an illustration of the throughput (instances/millisecond) or its inverse (milliseconds/instance) instead of the overall execution time. I believe this would provide a more intuitive understanding of the figures.
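The suggested throughput metric is a simple derivation from the quantities the benchmark already reports. A minimal sketch in Java (the benchmark's language), using hypothetical numbers rather than the paper's actual measurements:

```java
// Sketch: deriving throughput and its inverse from a benchmark's total
// execution time. The numbers below are hypothetical examples, not
// measurements from the paper.
public class Throughput {

    // instances processed per millisecond
    static double instancesPerMs(int instances, double totalMs) {
        return instances / totalMs;
    }

    // milliseconds spent per instance (the inverse metric)
    static double msPerInstance(int instances, double totalMs) {
        return totalMs / instances;
    }

    public static void main(String[] args) {
        int instances = 600;     // e.g., the create operation's instance count
        double totalMs = 1200.0; // hypothetical total execution time
        System.out.println(instancesPerMs(instances, totalMs) + " instances/ms");
        System.out.println(msPerInstance(instances, totalMs) + " ms/instance");
    }
}
```

Plotting either quantity per library would make runs with different instance counts directly comparable.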

Finally, in the conclusion, the authors summarize the comparison framework presented in their work, as well as the results of the performance benchmark. In my opinion, it would also be helpful to provide a summary of the qualitative criteria (e.g., how strongly do current solutions differ and which are the main differences). Moreover, it would be helpful if the authors provided a judgment on the importance of the performance benchmark in real-world applications. For instance, are the tested memory sizes an actual restriction in a real application, and how important are the execution times compared to the qualitative criteria? This would improve the guidance for selecting a library.

Overall, the authors clearly outline their approach and the contributions of the article. The readability and presentation of the results are very clear and allow the reader to easily follow the authors' line of argument and train of thought. In my opinion, the article is very relevant to the Semantic Web community and especially helpful for practitioners. The work provides a helpful overview of available OTM libraries and a basis for an informed decision on the selection of one library for a given use case.

Additional minor comments:

- I suggest making it clear in the abstract that the performance benchmark merely considers a selection of Java-based OTM frameworks with support for the RDF4J API. These restrictions of the performance benchmark should also be mentioned in the contributions.

- I recommend including a criterion on usability among the qualitative criteria, since usability is a key factor in the motivation of the work. As a developer and non-expert in Linked Data, how does the usability of these libraries compare to those the developer is used to (e.g., popular object-relational mapping libraries)?

- In the title the authors talk about comparing "frameworks", whereas in section 5 they compare "libraries". I would recommend the authors use a single term consistently or provide a clear separation between these terms.

- Since the comparison framework, especially the performance benchmark, will frequently need to be updated when new versions of the libraries or new libraries become available, it would help to have more information on the benchmark in the source code repository. This should include requirements and an installation guide for the benchmark as well as information on how the benchmark can be extended with new libraries.

Overall, I recommend accepting this submission.

Review #2
Anonymous submitted on 25/Oct/2018
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions:

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.

The paper describes a motivating scenario in regard to a fictitious safety management system for the field of aviation. However, it is not really clear why the development of a domain-specific user interface for this system demands an OTM solution. For example, the system could also be realized using a mixture of HTML, RDFa and JavaScript.
Maybe it would be better to describe the gap between an object-oriented approach in widely used programming languages and the logic-based RDF data model with properties as first-class citizens. One advantage of using an object-oriented domain model instead of directly accessing/manipulating the data with a generic RDF framework is, for example, the support for type checks at compile time, refactoring operations and IDE features like class or property lookups. By mapping between the object-oriented model and the RDF model, an OTM solution reduces boilerplate code (as stated in the paper) but also potential mistakes when converting the data from one representation into another.
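The compile-time advantage described above can be illustrated with a minimal Java sketch. The @RdfType annotation here is a hypothetical stand-in, loosely in the spirit of OTM libraries such as JOPA, and not the actual API of any framework:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical mapping annotation; real OTM libraries provide their own,
// this is only an illustrative stand-in.
@Retention(RetentionPolicy.RUNTIME)
@interface RdfType { String value(); }

// With OTM, the domain model is an ordinary typed class, so the compiler
// and IDE catch misspelled properties, wrong types and renames.
@RdfType("http://example.org/ontology#OccurrenceReport")
class OccurrenceReport {
    String uri;
    String summary; // would be mapped to an RDF data property by the mapper
}

public class OtmSketch {
    public static void main(String[] args) {
        OccurrenceReport r = new OccurrenceReport();
        r.summary = "Runway incursion"; // type-checked at compile time
        // With a generic RDF API, the same edit would be a string-keyed
        // triple checked only at run time, e.g.:
        // graph.add(r.uri, "http://example.org/ontology#summary", r.summary);
        System.out.println(r.summary);
    }
}
```

Refactoring the `summary` field propagates through all typed usages, whereas the string-keyed triple would fail silently.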

The paper also gives a short but sufficient introduction to RDF, its ontology languages and the SPARQL query language.

The paper gives enough contextual information to introduce the topic to the reader. An improved motivating scenario would further increase the value of the paper.

(2) How comprehensive and how balanced is the presentation and coverage.

The paper introduces a comparison framework for OTM solutions with general (GC), ontology-specific (OC) and mapping (MC) criteria. Most criteria are well chosen and reflect common requirements in regard to the selection of an OTM technology.

Note: The criterion OC1 (Explicit inference treatment) requires a closed-world view onto the ontologies within the mapped knowledge base/RDF graph to decide whether a property value is or can be inferred. Hence it is questionable whether such a feature is required for an OTM framework. Some frameworks also support read-only properties, which may be sufficient to satisfy OC1.
Furthermore, the criterion OC1 needs a precise specification in the paper to be helpful.

Note: The criterion OC3 requires a bit more explanation. For example, KOMMA provides support for aspect-oriented programming and dependency injection with Google Guice. Both were used to implement access control for RDF models in the eniLINK platform (https://github.com/enilink/enilink/blob/master/bundles/core/net.enilink....). Similar concepts are also possible with the AliBaba framework and can be used, for example, to create provenance information based on PROV-O.

Note: The criterion MC1 is hard to understand and seems to be a bit artificial. If an unmapped class C extends two mapped classes A and B and c is an instance of C then it is understandable that persisting c may add the statement ‘c rdf:type A’ and ‘c rdf:type B’ to the repository (this would be some kind of lightweight inference). But what should happen in the opposite direction? Should the class C be used for all instances which have the RDF types A and B (hence C is the intersection of A and B)? Wouldn’t it be better if each entity class also requires a mapping to a concrete RDF type to prevent possible ambiguities?

The presented work evaluates the described criteria against twelve existing OTM frameworks for different programming languages. The results are compiled into an overview table and accompanied by short textual descriptions.

As a further contribution the paper presents a benchmark consisting of a data model (ontology) and six benchmark operations. This benchmark was run against the Java based OTM frameworks and the results are reported within the paper along with a discussion for each of the tested frameworks.

In summary the presentation and coverage of the frameworks and the comparison is comprehensive and well-balanced. Some of the criteria need to be revisited and/or need to be described more precisely.

(3) Readability and clarity of the presentation.

Apart from some of the criteria for feature comparison, the readability of the textual descriptions (spelling, sentence structure, grammar) as well as the clarity of the presentation is very good.

(4) Importance of the covered material to the broader Semantic Web community.

The work gives an overview on current OTM frameworks for different programming languages. The criteria system and the benchmark provide a good base to compare existing and new solutions in this field. The benchmark is comparable to existing SPARQL benchmarks (LUBM and others) although its informative value largely depends on the benchmark’s implementation for a specific OTM framework due to the missing standardisation of OTM APIs (in comparison to JPA for example).

Since OTM frameworks are important technologies for implementing RDF-based software systems with object-oriented programming languages the presented work is highly relevant for the broader Semantic Web community.


Comments

We are the authors of the KOMMA framework.

Thank you for this interesting work which we are sure will help to further improve the performance of OTM solutions.

Regarding KOMMA we have the following remarks:

Features:

Performance:

KOMMA's data change tracker (required for correct undo/redo behaviour) needs to be disabled to achieve good performance in batch operations. Please use this code to disable it. (We can provide a better API to disable it in the future.) Otherwise any insertion or removal of a statement is validated with a query against the database.

  • OP1 − Create and OP2 − Create-Batch: Performance is largely improved by disabling the data change tracker.
  • OP5 − Update: The benchmark code needs to be modified according to this commit to correctly run the update within a transaction. In its current version the updates are run outside of the transaction. The use of "merge" is only required to move data between different entity managers.
  • OP3 − Retrieve and OP4 − Retrieve all: We are in the process of investigating the benchmark code. An issue is that a bean is invalidated if its properties are changed. Hence the benchmark loads the beans from the database AND reloads the created beans for comparison. Furthermore, KOMMA is able to load a whole object graph (in this case of a single OccurrenceReport) with:
    em.createQuery("construct { ?r a . ?s ?p ?o } where { ?r (!<:>|<:>)* ?s . ?s ?p ?o }")
    .setParameter("r", uri).getSingleResult(OccurrenceReport.class)

Additionally, we are not able to reproduce the performance figures of AliBaba. Maybe this is due to a difference in the repository configuration for GraphDB (we created the benchmark repository with the default settings from the workbench).

Dear reader of this comment thread,

We have been in contact with the KOMMA developers, so we post a reply here mainly for the information of anyone reading Ken's original comment.

First of all, we thank the KOMMA developers very much for their insight into KOMMA. In order to make our benchmark repeatable for the paper's readers, we based the benchmark setup of each tool only on information available in its documentation and a limited investigation of its source code (about 5 h). Unfortunately, none of the remarks posted by Ken were part of the KOMMA documentation.

We kindly ask the KOMMA developers to include the remarks in the documentation. Then - since we have already implemented the changes based on our communication with them - we will be able to rerun the updated benchmark, include the results on the benchmark website and update the paper in its next revision.

As far as the AliBaba performance figures are concerned, we can reproduce them on the benchmark setup described in the paper without any special GraphDB configuration.

Any additional results on different machines are welcome. In case the results differ, we would be happy to further investigate them.