Review Comment:
# Review: A comparison of object-triple mapping frameworks
## Guidelines
1. Overall Impression (0-100) ... higher is better
2. Decision: Accept, Minor Revision, Major Revision, Reject
3. This manuscript was submitted as 'Survey Article' and should be reviewed
along the following dimensions:
- (1) Suitability as introductory text, targeted at researchers, PhD students,
or practitioners, to get started on the covered topic.
- (2) How comprehensive and how balanced is the presentation and coverage.
- (3) Readability and clarity of the presentation.
- (4) Importance of the covered material to the broader Semantic Web
community
## Review
1. Overall Impression: 85
2. Decision: Accept
3. Detailed Review:
In their work, the authors present a comprehensive survey of existing object-triple mapping frameworks and compare them based on both a set of qualitative criteria and a performance benchmark. The article is motivated by the need to help non-expert developers work with Linked Data by allowing them to follow the well-known object-oriented paradigm. Object-triple mapping (OTM) solutions allow interacting with triple-based data in an object-oriented fashion. The authors aim to help developers choose the most suitable OTM solution and, hence, propose a comparison framework for such solutions.
After providing further background information and related work, the authors detail the aspects to be considered when comparing such OTM libraries. To this end, they present and describe the qualitative criteria used to assess the libraries as well as the criteria of a performance benchmark.
In the presentation of the libraries, the authors give a brief introduction to the 12 selected libraries. The selection of these libraries is clearly motivated, and it is also mentioned why some libraries are omitted from the comparison. The main arguments for omitting a library are (1) long inactivity (5 years), (2) non-open-source code, or (3) bugs. All 12 libraries are compared according to the set of qualitative criteria, which are categorized into general, ontology-specific, and mapping criteria.
The second comparison measure is a performance benchmark developed by the authors. In the benchmark, 5 of the 12 selected libraries are compared. This selection is motivated by the authors' requirements for comparability in the performance benchmark: the libraries must be written in Java, since the comparison framework is based on Java, and, to provide a level playing field, they must implement the RDF4J API for data storage access. The performance is evaluated for 6 different types of operations: create, batch create, retrieve, retrieve all, update, and delete. In the analysis, the execution times of these operations are compared. For these operations, a certain number of instances is defined (e.g., a total of 600 instances in the create operation). However, it is unclear to me how these numbers were determined. Furthermore, even though all operations use the same number of instances for all libraries, I would recommend illustrating the throughput (instances/millisecond) or its inverse (milliseconds/instance) instead of the overall execution time. I believe this would provide a more intuitive understanding of the figures.
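To make the suggestion concrete, here is a minimal sketch of the conversion I have in mind. The class and method names are mine, and the figures (600 instances, 1500 ms) are made-up examples, not numbers taken from the paper:

```java
// Sketch: converting a measured total execution time into throughput
// (instances/ms) and its inverse (ms/instance). Illustrative values only.
public class BenchmarkThroughput {

    /** Instances processed per millisecond. */
    public static double instancesPerMs(int instances, double totalMs) {
        return instances / totalMs;
    }

    /** Average milliseconds spent per instance (inverse of throughput). */
    public static double msPerInstance(int instances, double totalMs) {
        return totalMs / instances;
    }

    public static void main(String[] args) {
        // e.g., a create operation: 600 instances in 1500 ms total
        System.out.printf("%.2f instances/ms%n", instancesPerMs(600, 1500.0));
        System.out.printf("%.2f ms/instance%n", msPerInstance(600, 1500.0));
    }
}
```

Plotting either of these per-instance measures would let readers compare libraries directly, independent of the (somewhat arbitrary) total instance counts.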
Finally, in the conclusion, the authors summarize the comparison framework presented in their work as well as the results of the performance benchmark. In my opinion, it would also be helpful to provide a summary of the qualitative criteria (e.g., how strongly current solutions differ and what the main differences are). Moreover, it would be helpful if the authors provided a judgment on the importance of the performance benchmark in real-world applications. For instance, are the tested memory sizes an actual restriction in a real application, and how important are the execution times compared to the qualitative criteria? This would improve the guidance for selecting a library.
Overall, the authors clearly outline their approach and the contributions of the article. The readability and presentation of the results are very clear and allow the reader to easily follow the authors' line of argument. In my opinion, the article is very relevant to the Semantic Web community and especially helpful for practitioners. The work provides a helpful overview of available OTM libraries and a basis for an informed decision when selecting a library for a given use case.
Additional minor comments:
- I suggest making it clear in the abstract that the performance benchmark only considers a selection of Java-based OTM frameworks with support for the RDF4J API. These restrictions of the performance benchmark should also be mentioned in the contributions.
- I recommend including a criterion on usability among the qualitative criteria, since usability is a key factor in the motivation of the work. For a developer who is a non-expert in Linked Data, how does the usability of the libraries compare to those the developer is used to (e.g., popular object-relational mapping libraries)?
- In the title, the authors talk about comparing "frameworks"; however, in Section 5 they compare "libraries". I would recommend that the authors use a single term consistently or provide a clear separation between these terms.
- Since the comparison framework, and especially the performance benchmark, will need frequent updates when new versions of the libraries or new libraries become available, it would help to have more information on the benchmark in the source code repository. This should include requirements and an installation guide for the benchmark as well as information on how the benchmark can be extended with new libraries.
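For the extensibility point, documenting even a small adapter contract would go a long way. The following sketch is purely hypothetical: the interface and method names are my invention for illustration, not the benchmark's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a benchmark extension point for new OTM libraries.
// All names here are invented for illustration, not the benchmark's real API.
public class BenchmarkExtensionSketch {

    /** Contract a newly added library would implement to join the benchmark. */
    interface OtmBenchmarkAdapter {
        String libraryName();
        void setUp();                           // open storage, initialize the mapper
        void persist(String iri, Object bean);  // "create" operation
        Object find(String iri);                // "retrieve" operation
        void tearDown();                        // release resources
    }

    /** Trivial in-memory adapter, only to demonstrate the contract. */
    static class InMemoryAdapter implements OtmBenchmarkAdapter {
        private Map<String, Object> store;
        public String libraryName() { return "in-memory"; }
        public void setUp() { store = new HashMap<>(); }
        public void persist(String iri, Object bean) { store.put(iri, bean); }
        public Object find(String iri) { return store.get(iri); }
        public void tearDown() { store = null; }
    }

    public static void main(String[] args) {
        OtmBenchmarkAdapter adapter = new InMemoryAdapter();
        adapter.setUp();
        adapter.persist("http://example.org/report/1", "report-1");
        System.out.println(adapter.find("http://example.org/report/1"));
        adapter.tearDown();
    }
}
```

With such a contract documented in the repository, contributors could plug in a new library by implementing one class, without reverse-engineering the benchmark driver.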
Overall, I recommend accepting this submission.
## Comments
### KOMMA - features and performance
We are the authors of the KOMMA framework.
Thank you for this interesting work which we are sure will help to further improve the performance of OTM solutions.
Regarding KOMMA we have the following remarks:
Features:
Performance:
KOMMA's data change tracker (required for correct undo/redo behaviour) needs to be disabled to achieve good performance in batch operations; otherwise, any insertion or removal of a statement is validated with a query against the database. Please use this code to disable it (we can provide a better API to disable it in the future):
```java
em.createQuery("construct { ?r a . ?s ?p ?o } where { ?r (!<:>|<:>)* ?s . ?s ?p ?o }")
  .setParameter("r", uri).getSingleResult(OccurrenceReport.class)
```
Additionally, we are not able to reproduce the performance figures of AliBaba. Maybe this is due to a difference in the repository configuration for GraphDB (we created the benchmark repository with the default settings from the workbench).
### RE: KOMMA - features and performance
Dear reader of this comment thread,
We have been in contact with the KOMMA developers, so we post a reply here mainly for the information of anyone reading Ken's original comment.
First of all, we thank the KOMMA developers very much for their insight into KOMMA. In order to make our benchmark repeatable for the paper's readers, we based the benchmark setup of each tool only on information available in its documentation and a limited investigation of its source code (about 5 hours). Unfortunately, none of the remarks posted by Ken were part of the KOMMA documentation.
We kindly ask the KOMMA developers to include these remarks in the documentation. Then, since we have already implemented the changes based on our communication with them, we will be able to rerun the updated benchmark, include the results on the benchmark website, and update the paper in its next revision.
### AliBaba performance
As far as the AliBaba performance figures are concerned, we can reproduce them with the benchmark setup described in the paper, without any special GraphDB configuration.
Any additional results on different machines are welcome. Should the results differ, we would be happy to investigate them further.