Exposing the Institute for Development Studies' Data using the API2LOD Linked Data Wrapper

Tracking #: 436-1598

Authors: 
Christophe Guéret
Victor de Boer
Duncan Edwards
Timothy G. Davies

Responsible editor: 
Pascal Hitzler

Submission type: 
Dataset Description
Abstract: 
This short paper provides a description of the wrapper (API2LOD) used to expose the data from the Institute for Development Studies' Knowledge Services as Linked Data. The wrapper provides Linked Data access to 35,000 research documents on development research as well as its metadata. Links are added from this metadata to a number of external sources: DBpedia, GeoNames and Lexvo. We expect that the IDS data will play a central role in the larger web of Linked Data for global development.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Philippe Cudre-Mauroux submitted on 20/Mar/2013
Suggestion:
Minor Revision
Review Comment:

This short paper describes a wrapper used to convert and interlink metadata from the Institute of Development Studies (IDS) into Linked Data. The architecture of the wrapper is compelling: it dynamically converts identifiers to Linked Data, calling the IDS REST API and creating links to further Linked Data resources on the fly, taking advantage of a Java Restlet package deployed on Google's App Engine. The authors also give a nice overview of the state of the art in international development APIs and data dumps.

A first version of this paper (which I personally found "interesting and well-written" at the time) was submitted to the Semantic Web Journal in mid-2012. This new version enhances the description of the context in which the wrapper was built and deployed, as well as the overall presentation of the paper. The authors do not give much detail on why they do not provide an RDF dump or SPARQL endpoint; however, they do include in this new submission a short section on "known limitations" (i.e., lack of data export and warm-up time). The description of the wrapper has not changed much; however, the authors include in this new version a short but interesting performance evaluation of each linker (except for the IATI/OIPA linker, which is now mentioned as future work), showing good results overall (e.g., 100% precision and recall for the Lexvo linker, 100%/77% precision/recall for the GeoNames linker, but only 72% precision for the DBpedia linker). The authors discuss the effectiveness of the linkers; however, they do not discuss the efficiency of their approach beyond the few words given in Section 3.4.2. It would, for example, be interesting to have more detail on the size of the cache, or some information on the latency with and without the cache. Despite those few limitations, I liked the paper and feel that it makes an interesting contribution in the context of Linked Data for international development.
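
For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how precision and recall of a linker could be computed against a hand-checked gold standard. The function name and the example links are hypothetical and purely illustrative; they are not taken from the paper's evaluation.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of predicted links against gold links."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # links that are both proposed and right
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Made-up (source, target) link pairs, for illustration only.
gold_links = {("ids:A", "geo:1"), ("ids:B", "geo:2"),
              ("ids:C", "geo:3"), ("ids:D", "geo:4")}
found_links = {("ids:A", "geo:1"), ("ids:B", "geo:2"), ("ids:C", "geo:3")}

p, r = precision_recall(found_links, gold_links)
print(f"precision={p:.2f} recall={r:.2f}")
```

A linker that only proposes correct links but misses some, as in this toy example, has perfect precision and lower recall, which is the pattern reported for the GeoNames linker.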

Review #2
By Norman Heino submitted on 03/Apr/2013
Suggestion:
Minor Revision
Review Comment:

This paper describes an RDF dataset as well as the software used to produce it from its original data API source.
The original data is published by the Institute of Development Studies (IDS) and is about documents, organizations, categories, countries and regions of research focus.
The RDF dataset is created on-the-fly and enriched with links to Lexvo, DBpedia, IATI, and GeoNames using the API2LOD tool.

Data quality of the described set seems to be quite good overall.
Minor improvements could be made by using literal datatypes and language tags.
What I particularly like is that some properties have been replaced by or linked to more common ones.
Replacement has been done for rdf:type and dcterms:language, while others like ids:date_created or ids:cat_parent have been linked via rdfs:subPropertyOf relations to Dublin Core, FOAF, or SKOS vocabularies.
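
The two strategies described above (replacing a predicate outright versus linking it with rdfs:subPropertyOf) can be sketched as follows. This is only an illustration under stated assumptions: the IDS namespace URI is a placeholder, and the specific target properties (dcterms:created, skos:broader) are guesses at plausible mappings, not the ones used by API2LOD.

```python
# Placeholder namespace for the IDS vocabulary (assumption, not the real URI).
IDS = "http://example.org/ids/vocabulary#"
DCTERMS = "http://purl.org/dc/terms/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

# JSON keys whose predicate is replaced by a common one (assumed mapping).
REPLACED = {"language": DCTERMS + "language"}

# JSON keys kept in the IDS namespace but linked to a common property
# (assumed targets, chosen for illustration).
LINKED = {
    "date_created": DCTERMS + "created",
    "cat_parent": SKOS + "broader",
}


def predicate_for(key):
    """Map a JSON key to the predicate URI used in the RDF output."""
    return REPLACED.get(key, IDS + key)


def schema_triples():
    """Build the rdfs:subPropertyOf statements for the linked properties."""
    return [(IDS + key, RDFS_SUBPROPERTY_OF, target)
            for key, target in sorted(LINKED.items())]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(schema_triples()))
```

With linking, consumers that only understand Dublin Core or SKOS can still interpret the data through RDFS inference, while the original IDS property names remain visible.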

The authors claim the developed software could be used to expose other APIs as RDF as well.
It remains unclear how much effort this would require, though.

The paper is well written and provides a usage example as well as the bigger picture on supporting development practitioners.
Other than a few typographical errors and minor issues, no editorial revision is needed.

Typos
-----
* the mapping between JSON keys to predicates => the mapping of JSON keys to predicates (page 2)
* rdf:range => rdfs:range (page 2)

Review #3
By Axel Polleres submitted on 14/Jun/2013
Suggestion:
Major Revision
Review Comment:

Overall, since the first iteration the authors made a considerable effort to
improve the Webpage and also fixed presentation issues in the paper. Still,
some points need further clarification and my original concerns about
the preliminary state of the project aren't fully eliminated as of yet.

Remarks:
========

1)
When looking at the original ids api page at http://api.ids.ac.uk/
I find:
" 32343 abstracts or summaries of development research documents
8255 development organisations
28 different development themes
research on 225 countries and territories
"

So why do your numbers differ, i.e., where does the additional data come from?
This seems to indicate that either the IDS page or your page does not report
the current state of the data; remarkably, the API page reports lower
numbers than the paper.

Do you really keep in sync with http://api.ids.ac.uk/ ?
Since you indicate that the wrapper works "on demand", it seems that you
access the API dynamically. So how does that work, and how can those numbers
then differ, or how can you end up with additional resources?

Also, the API described on http://api.ids.ac.uk/ requires
an API key, whose usage I assume could, at least in the future, be
limited: you may want to think about providing a wrapper that
can be called with the user's own API key?

2)
The DBpedia linker seems to need manual curation: you mention
non-optimal precision and links added/found by hand. Do you keep
this manual curation data in your linkage cache?
What is the plan to keep this sustainably up to date in the future?
This is also connected to question 1) ... Linked Datasets that become
stale do IMO more harm than good for the promotion of Linked Data.

3)
As for known limitations, it seems that a periodic crawl/dump should
be possible to make the dataset available for querying. Otherwise,
since there is also no SPARQL endpoint, probably for the same reasons,
I do not see much use for the data in RDF.

4) The examples of usage still seem preliminary and not yet
outlining a clearly defined practical use case or application, but
rather some first ideas towards that.

5) Section 6, although it has "Sustainability" in the title,
does not fully answer my question. It is more a summary of ongoing
work and future plans. What I am missing there is a statement on how
long into the future this project is guaranteed to persist, basically,
whether and for how long the URIs will be stable. While I acknowledge that
this perhaps cannot be answered entirely, a frank assessment would be
enough, just to be clear on what time horizon you are operating on.

Overall, I think the project is interesting and would accept
the paper for a *workshop* on linked datasets with bells and whistles.
As for the special issue of a journal - the editors may correct me
if I am wrong - I had rather understood the call to be for descriptions of
established and sustainable linked datasets, to enable usage and illustrate
current use cases on these, whereas I still have the impression that this
dataset is "in the making". I encourage the authors to continue the effort!

Minor/editorial remarks:
=========================

p.2 footnote 2... put the footnotemark after the '.'

Remark on the web page:

- The webpage, http://api2lod.appspot.com/ doesn't render nicely in IE (maybe you want to fix that)