SISSVoc: A Linked Data API for SKOS vocabularies

Tracking #: 658-1868

Authors: 
Simon Cox
Jonathan Yu
Terry Rankine

Responsible editor: 
Krzysztof Janowicz

Submission type: 
Tool/System Report
Abstract: 
The Spatial Information Services Stack Vocabulary Service (SISSVoc) is a Linked Data API for publishing vocabularies. SISSVoc provides a RESTful interface via a set of URI patterns that are aligned with SKOS. This provides a standard web interface to any controlled vocabulary structured using, or decorated with, SKOS classes and properties. It can be consumed via web clients as human-readable resources (such as HTML) and by client applications through machine-readable resources (such as RDF, JSON, and XML). SISSVoc is implemented using a Linked Data API façade over a SPARQL endpoint. The use of the Linked Data Approaches streamlines the configuration of content negotiation, styling, query construction and dispatching. SISSVoc is being used in a number of projects, mainly in the environmental sciences, where controlled vocabularies are used to support cross-domain and interdisciplinary interoperability. The SISSVoc standard interface makes it possible for the development of common client applications such as search clients and validation clients.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Alejandro Llaves submitted on 14/May/2014
Suggestion:
Minor Revision
Review Comment:

SUMMARY
This report describes version 3.0 of SISSVoc, a Linked Data API for searching and retrieving RDF datasets based on SKOS. SKOS is commonly used to formalize vocabularies with a hierarchical structure. The methods of the SISSVoc API allow requesting SKOS resources and filtering them by resource label and by the broader/narrower SKOS properties.

QUALITY, IMPORTANCE, AND IMPACT OF THE TOOL
The presented API provides abstraction in searches over RDF datasets organized by SKOS, which is a W3C recommendation. The authors claim that SISSVoc users do not need to worry about RDF, SKOS properties, or SPARQL. Therefore, this is a step forward in bringing semantic technologies closer to non-expert users. Currently, the API is being used within the AuScope Portal in various environmental projects. Moreover, a validation service built on top of SISSVoc is presented in the report, which speaks to its reusability.

The provided source code requires Windows OS. It would be great to have implementations for other operating systems; nevertheless, there is no doubt that SISSVoc may have an impact on all those organizations using SKOS and on the potential users of their data.

CLARITY, ILLUSTRATION, AND READABILITY
The report is well written and includes many examples. It also contains several references to related research (not only in section 3.4, but also in the introduction). I really liked how the URI patterns are explained in tables 1 to 5. Figures have a nice and simple design and are easy to understand. The authors describe the capabilities of SISSVoc in section 2 and comment on some limitations related to REST behaviour in section 3.2.

STRUCTURE
1. Introduction
2. SISSVoc design
2.1 SISSVoc API
2.2 Implementing SISSVoc
2.3 Deploying SISSVoc to meet use cases
2.4 SISSVoc deployments
3. Evaluation
3.1 URI Patterns
3.2 HTTP operations and REST behaviour
3.3 Applications
3.3.1 Water Data Transfer Format validation service
3.3.2 SISSVoc Search
3.4 Related Work
3.4.1 ONKI
3.4.2 Normalised Ontology Repository (NOR)
4. Conclusion
5. Acknowledgements
6. References

The structure of the report needs some changes. First, sections 2.3 and 2.4 should be merged. Second, the content and organization of section 3 (Evaluation) look odd to me. I could not find a real evaluation of the tool. Section 3.1 talks about the SISSVoc pattern, which is based on Cool URI patterns, and some examples of Cool URI patterns (background/related work). Section 3.2 describes some limitations of SISSVoc, challenges of managing vocabularies, and some suggestions for this purpose. Then, sections 3.3 and 3.4 deal with applications built on top of SISSVoc and related work, respectively; thus, I do not see the reason for including them in an evaluation section. My suggestion is to move sections 3.3 and 3.4 out of section 3. Additionally, I would either rename section 3 or include a proper evaluation of the tool, e.g. a user-based evaluation or a use case evaluation for the different API requests.

MINOR REMARKS
Page 2:
- "But primarily SISSVoc also for machine-to-machine use,..." Anything missing here?
- "The SISSVoc API is (normally?) implemented using..."

Page 3:
- In the paragraph describing broader and narrower properties, all the properties have "/" at the beginning except broader. If you want to separate the properties using "/", remove the space after each property, i.e. broader/broaderTransitive/narrower/narrowerTransitive. If you want to list the properties like /[PROPERTY], add "/" to broader and commas between the elements in the list, i.e. /broader, /broaderTransitive, /narrower, /narrowerTransitive. My suggestion is to remove the "/" and add commas: broader, broaderTransitive, narrower, and narrowerTransitive.
- Table 2: The /collection pattern has half of its SPARQL query on page 2 and the other half on page 3. If possible, try to rearrange the table so that the whole query is on one page.

Page 4: Move the caption and header of table 5 to page 5 for the sake of readability.

Page 6:
- Move URLs in first paragraph to footnotes.
- What does it mean that "the interfaces are distinct from the user's point of view."?
- "However, each interface uses the next one down for its configuration of real-time operation." I do not get the concept of "next one down". Please, rephrase.
- Section 2.4 should be merged into section 2.3.

Page 7: The reference to figure 3 appears in the text before the reference to table 6, so figure 3 should appear above table 6.

Page 8:
- Add a short section summary between heading 3. and 3.1.
- Use the same size and format for URI patterns throughout the paper. They are not underlined on page 3.
- There are more examples of unformatted URI patterns in the comparison between Cool URIs, DBpedia, and SISSVoc.
- Footnotes on this page are in a different format than on previous pages.

Page 9:
- "The standard interface provided by SISSVoc supports a wide (?) range of applications."
- Add a reference/footnote for Schematron.
- Redistribute text to fill the gap below figure 4.

Page 10: Add some content between 3.4 and 3.4.1.

Page 11: The NOR URI patterns should be formatted like the rest of URI patterns in the paper, and add some spaces above and below the patterns. The whole section 3.4.2 has an odd paragraph organization.

Acknowledgements and References are not regular sections and do not need to be numbered.

Revise the references, many of them look incomplete, e.g. [36] to [42]. In case of online resources, please add the URL.

Review #2
By Antoine Isaac submitted on 25/May/2014
Suggestion:
Reject
Review Comment:

It is very good to read practitioners' standpoint, with concrete cases, on such issues. The API proposed has some shortcomings (see below) but with the right story around the API, the paper could have been fine. The problem is that in the current state it is rather weak, especially for a journal such as SWJ. So I would invite the authors to re-submit a richer version.

First, the report of the cases is disappointingly small with regard to the initial promise in the abstract, and to the other shortcomings of the paper (see below). The introduction of section 2 mentions that SISS is applied in 'other environmental information projects', and I believe section 2.4 is intended to do this. But it is very short! And Section 3.3 is also not very rich: for example, what does the vocabulary checking service look like, and how does it compare with existing services (PoolParty)? I would urge the authors to re-submit the paper with more details.

Then, the API design could be explained and motivated in more detail: does the API assume that the underlying data has been completed according to SKOS inference rules? It seems that sometimes this isn't the case, as Table 2 presents a SPARQL query that looks for instances of skos:Collection and skos:OrderedCollection (while the latter is a sub-class of the former). But at other times it seems that SKOS reasoning is assumed to have been performed: the queries for broaderTransitive and narrowerTransitive may otherwise not return many results, as triples with these properties are expected to be inferred rather than asserted by the original data publishers (according to the SKOS docs).
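The inference point above can be illustrated with a minimal sketch. It is an editorial illustration in Python, not part of SISSVoc or of the reviewed paper: it assumes a store that holds only asserted skos:broader pairs and shows that the ancestors a broaderTransitive query is expected to return have to be derived by computing a transitive closure. The function name and the `ex:` concepts are hypothetical.

```python
from collections import defaultdict


def broader_transitive(asserted_broader):
    """Derive skos:broaderTransitive from asserted skos:broader pairs.

    asserted_broader: iterable of (narrower_concept, broader_concept) pairs.
    Returns a dict mapping each concept that has a broader concept to the
    set of all its direct and indirect ancestors.
    """
    direct = defaultdict(set)
    for child, parent in asserted_broader:
        direct[child].add(parent)

    closure = {}
    for concept in list(direct):
        seen = set()
        stack = list(direct[concept])
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # already visited; also guards against cycles
            seen.add(node)
            stack.extend(direct.get(node, ()))
        closure[concept] = seen
    return closure


# Hypothetical concepts, for illustration only.
asserted = [
    ("ex:granite", "ex:igneous_rock"),
    ("ex:igneous_rock", "ex:rock"),
]
ancestors = broader_transitive(asserted)
```

With only the two asserted pairs, a lookup for the ancestors of `ex:granite` finds `ex:rock` only because the closure was computed; a query over the raw triples for an asserted broaderTransitive link would find nothing.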

Later some calls/URIs are explained really quickly, such as http://sissvoc.ereefs.info/vocab/ereefs/wq : it is said to be a 'document', but what is a document in such a context? Is it a part of the API? Fig 3 actually doesn't connect the layer of RDF documents to the other ones, as the text in section 2.3 promises ("each interface uses the next one down for its configuration or real-time operation"). And the URI doesn't work at the time of review, which doesn't help.

The API also seems really weak on labeling and the multilingual aspects, which are key to SKOS: it is surprising to find that skos:hiddenLabel is not employed for the anylabel query. Further on, Pattern #11 presents a set of properties in the filter that is different from the one in pattern #5, while one would expect the same (both patterns correspond to an 'anylabel' call). Many URI patterns also present a filter that selects only English labels, which is really strange, if not a straight mistake.
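As a hedged illustration of the labeling concern, the sketch below shows what an 'anylabel' match looks like when it covers skos:prefLabel, skos:altLabel and skos:hiddenLabel and treats the language filter as optional rather than fixed to English. It works on an in-memory dictionary instead of a SPARQL endpoint; the function name, the data layout and the `ex:` concepts are assumptions made for illustration and are not taken from SISSVoc.

```python
import re

# The three SKOS lexical labeling properties.
LABEL_PROPERTIES = ("skos:prefLabel", "skos:altLabel", "skos:hiddenLabel")


def match_any_label(concepts, text, lang=None):
    """Return concept URIs having any SKOS label that contains `text`.

    concepts: dict mapping a concept URI to a list of
              (property, value, language_tag) tuples.
    lang:     optional language tag; None means labels in every language.
    """
    pattern = re.compile(re.escape(text), re.IGNORECASE)
    hits = []
    for uri, labels in concepts.items():
        for prop, value, tag in labels:
            if prop not in LABEL_PROPERTIES:
                continue
            if lang is not None and tag != lang:
                continue
            if pattern.search(value):
                hits.append(uri)
                break  # one matching label is enough for this concept
    return hits


# Hypothetical vocabulary content, for illustration only.
sample = {
    "ex:wq": [
        ("skos:prefLabel", "water quality", "en"),
        ("skos:prefLabel", "qualité de l'eau", "fr"),
        ("skos:hiddenLabel", "watr quality", "en"),
    ],
    "ex:salinity": [("skos:prefLabel", "salinity", "en")],
}
french_hits = match_any_label(sample, "qualité")
english_only_hits = match_any_label(sample, "qualité", lang="en")
misspelling_hits = match_any_label(sample, "watr")
```

In this sketch the French label is found only when no language restriction is applied, and the misspelt search term is found only because hidden labels are included, which are the two behaviours the comment above asks about.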

Other decisions may also be surprising: why would a call for getting all concept schemes (resp. collections) use 'conceptscheme' (resp. 'collection') in the singular? It seems to be derived from what the LD API recommends. But such practice is different from a number of other (SKOS) APIs. Reading more from the authors on this would make the paper more interesting. As a matter of fact it would have been useful to explain in more detail how the Linked Data API has been adopted for the SISSVoc one. What is being re-used, what is specific?

There are many such opportunities to strengthen the paper from a practitioner perspective, which should have been seized. For example, "requiring users to have bespoke means of querying for vocabulary resources." in the introduction. This is right, but to some extent this may be read as a failure of the entire linked data approach, where a shared data model (RDF) and vocabularies (SKOS, etc) were supposed to solve such issues. Also, I don't understand why there is no explanation of how the SISSVoc API evolved over time (this is now v3). I guess this evolution results from continuous experimentation and lessons learnt. These must be reported in the paper!

The study of the state-of-the-art is rather weak for a journal paper. One misses references, and a comparison with a number of APIs / Web services would really help the reader judge the relevance of the SISSVoc API. The SKOS API used by SKOSEd (http://skosapi.sourceforge.net/), the terminology web service for the STW thesaurus (http://zbw.eu/beta/stw-ws) and several others propose functions that are similar to the ones presented in this paper. A number of tools are mentioned on the W3C SKOS wiki: http://www.w3.org/2001/sw/wiki/SKOS . And for many, the paper mentions links without any form of analysis, especially the SKOS API. And the more detailed studies are not complete. ONKI was further developed with a REST API that is much more similar to SISSVoc than the SOAP Web service mentioned in 3.4.1 (https://code.google.com/p/onki-light/wiki/RESTv1).

Besides the above-mentioned lack of reference to relevant efforts, the bibliography section is also formally weak, with hyperlinks missing when they are crucial (24, 28, etc) and some editorial problems (year repeated twice in a number of cases). Some references seem to be left to the reader to find out (STAR).

Minor comments:
- what is the difference between references 11 and 17?
- ref 16 and 24 are also quite redundant
- the introduction says that v3 has been introduced in 15 but this paper also introduces v3?
- it is confusing to have the NERC SPARQL endpoint called a vocabulary service, the same way as the CSIRO Service. Why not simply call it a SPARQL endpoint with vocabulary data?

Review #3
Anonymous submitted on 28/May/2014
Suggestion:
Major Revision
Review Comment:

This paper describes the development of an API (SISSVoc) that provides easy access to vocabularies represented using SKOS. It supports different interfaces to access vocabularies, e.g., URI patterns specified in terms of classes and properties in the SKOS data model and a SPARQL endpoint. The Linked Data API has been applied to handle URI-based requests between SISSVoc and SKOS vocabulary resources.

In support of this paper, the proposed topic is timely and is grounded in relevant literature. The motivations are clearly formulated. The API is congruent with current developments/web practices (e.g., SKOS model, Linked Data API, SPARQL, RESTful API, Cool URIs). With the proposed URI patterns, simplified queries over SKOS vocabularies can be performed without any knowledge of RDF and SPARQL. The authors not only specify the API, but also describe its applications in supporting search and validation applications.

Despite these strong aspects, in its current form the paper has several issues that should be addressed before it can be considered for publication.

SISSVoc Design
a. The sub-sections should be restructured to improve readability. The authors chose to introduce a specific vocabulary interface (URI patterns), then four possible related interfaces to access SKOS-based vocabularies (section 2.3), and finally the deployment information. I would suggest first providing an overview of the related interfaces (section 2.3) at the beginning of this section, and then detailed information regarding their design and implementation. Some texts are repetitive (e.g., descriptions of the Linked Data API at the beginning of section 2.0 and in section 2.2).
b. Several standard vocabularies of the SKOS data model are missing from Tables 1-5. Does the API support all the SKOS classes and properties? The authors should clarify the scope (capabilities and limitations) of the proposed API.
c. The descriptions of URI patterns are useful (Tables 1-5). However, as the authors claim that the API has been used in several environmental projects, I expect actual queries based on deployed vocabulary resources in tables 1-5. Alternatively, the authors may consider providing a table with a selected pattern (with corresponding example and SPARQL query) from each of the existing tables, and then using an external link to describe the remaining URI patterns.
d. The deployment details (section 2.4) are very much related to the implementation of the API (section 2.2). Why are these divided into two different sections?
e. You have listed several examples of SKOS vocabularies supported by the SISSVoc service (Table 6). It is not clear how the existing concepts are matched to concepts from an externally hosted vocabulary service.
f. Is it possible to use SISSVoc to extend query results with information from existing linked open data sources?

Section 3 - Evaluation
(1) I would argue that sections 3.1-3.2 do not belong in the “evaluation” section as they represent methodological information. For example, section 3.1 is related to the URI pattern specified in Table 1.
(2) “An RDF representation of the 2013 version of the geologic timescale delivered by a SISSVoc service in the form of a graph that mixes SKOS predicates with predicates from an ontology designed for the geological time-scale” – Is it possible to query vocabularies with external predicates using the service?
(3) The applications (section 3.3) are interesting, but they are still very general regarding their implementation and results. In particular, examples of queries supporting the vocabulary validation process should be specified.
(4) Section 3.4 addressing related approaches should be made into a new main section (section 4). Related work should be extended - to what extent the API provides better support in retrieving SKOS-based vocabularies as compared to the PoolParty SKOS API?

Minor remarks:
• “SISS is currently being applied more broadly in other environmental information projects” (page 3) – add references specifying the projects here.
• Section 2.3 – The query example (broader) does not match its descriptions.

Review #4
By Werner Kuhn submitted on 08/Jun/2014
Suggestion:
Minor Revision
Review Comment:

The paper describes a very useful contribution to the Semantic Web, in the form of a Linked Data API for SKOS-based vocabularies. As such, it should clearly be published, though minor revisions would help readability. Anything the authors can do to lift the level of discussion from the (sadly typical) acronym-heavy jargon to a description that a domain modeller finds interesting, and can understand and benefit from, will be much appreciated. This opportunity begins with the title and abstract and runs through most sections.

A claim in the abstract and in section 2 seems either wrong or ambiguous, if I understand the contribution correctly: it seems to me that the API does *not* allow publishing vocabularies, only accessing those published by other means. This is fine, but needs to be adequately stated.

As I am not knowledgeable about Linked Data API, I have no criticism or suggestions regarding the technical solution itself, which looks good and useful to me. The idea that vocabularies are more likely to be adopted and shared if they are simply made available more easily (as opposed to being constantly refined and extended) is not articulated in this paper, but could be.

Add pointers to the services after the sentence "A number of SKOS vocabulary services are available...".

The first sentence in 2.1 is too hard to understand and needs some explanation.

The first URI given in 2.3 does not work and needs to be explained anyway. The second paragraph following it needs to be turned into a complete sentence.

Explain what you mean by "each interface uses the next one down"

The links in figure 3 should become clickable. The figure itself needs a bit more explanation in its caption.

Section 3, labelled "Evaluation" does not really do that. It is more of a (useful) discussion of what has been done and how it can be used. The downside of such papers (as useful as they are) is that their contributions cannot really be evaluated. One should either acknowledge that (or at least not claim the contrary) or then find a way to actually do some form of an evaluation and state why it achieves that.

The web-based search interface for vocabularies is very nice. It would be even better if it said more clearly what one can search for, or possibly generated itself some lists for faceted browsing.

The discussion of related work in section 3.4 had me wondering why these approaches appear to be developed in parallel, without much evidence of one building and improving on the others.

Language issues:
Abstract
- the 3rd sentence may be grammatically correct, but is too hard to parse
- why "Linked Data Approaches"? (both, capitalized and pluralized)
- the last sentence needs to be fixed

Introduction
- "each implementation use" should be ".. uses"
- the repetition of "such as" in the same sentence is clumsy
- fix the sentence "But primarily SISSVoc also..."

Section 2
- constructions with "a set of" require a singular, not plural form verb (provides, obtains)
- in the footnote, "for tool" should be "for a tool"

Section 3
- the "its" in "This limits its flexibility" has no clear referent.

References 13 and 42 need to be fixed.