Linked Web APIs Dataset: Web APIs meet Linked Data

Tracking #: 1420-2632

Milan Dojchinovski
Tomas Vitvar

Responsible editor: 
Rinke Hoekstra

Submission type: 
Dataset Description
Web APIs have enjoyed a significant increase in popularity and usage over the last decade. They have become the core technology for exposing functionalities and data. Nevertheless, due to the lack of semantic Web API descriptions, their discovery, sharing, integration, and the assessment of their quality and consumption are limited. In this paper, we present the Linked Web APIs dataset, an RDF dataset with semantic descriptions of Web APIs. It provides semantic descriptions for 11,339 Web APIs, 7,415 mashups and 7,717 developer profiles, which makes it the largest available dataset from the Web APIs domain. The dataset captures the provenance, temporal, technical, functional, and non-functional aspects. In addition, we describe the Linked Web APIs Ontology, a minimal model which builds on top of several well-known ontologies. The dataset has been interlinked and published according to the Linked Data principles. Finally, we describe several possible usage scenarios for the dataset and show its potential.

Solicited Reviews:
Review #1
By Tobias Kuhn submitted on 06/Jul/2016
Review Comment:

This looks good now, except that I would suggest using the DOI link instead of a shortened URL for the survey results:

Review #2
By Enrico Daga submitted on 18/Jul/2016
Review Comment:

This version addresses all my remarks.

Review #3
By Christoph Lange submitted on 09/Aug/2016
Review Comment:

This paper presents a dataset about Web APIs, which has been generated from the ProgrammableWeb directory website by screen-scraping, and furthermore interlinked with a few existing linked datasets. Like its previous revisions, the paper …

* clearly motivates the need for such a dataset,
* explains the data source reasonably well,
* explains the ontology, which has been designed for this purpose, very well,
* explains the URI naming scheme and some statistics about the dataset,
* covers the interlinking, and
* presents as many as five (5) use cases, whose practical relevance is pointed out clearly.

The latest revision features the following main enhancements w.r.t. the criteria for dataset papers:

* Regarding the quality and stability of the dataset, it now provides additional information about the process of generating the dataset (Section 4.2) and is more explicit w.r.t. the quality criteria assessed (Section 6.1).

* Regarding the usefulness of the dataset, the observations from the user survey are now explained in slightly more detail (Section 7.2).

* Clarity and completeness of the descriptions: this has generally improved.

I recommend acceptance; however, the authors should take care to update all figures on interlinking. Not only is the number of out-links likely to grow during the subsequent maintenance of the dataset, but on top of the in-links received from DBpedia in October 2015, I would also expect more in-links to be created.

The two _minor_ concerns from my previous review were actually not fully addressed in this revision. Let me re-state them, trusting that you will address them in the final version.

* Section 7.1 "use cases": I wonder whether the queries that use prov:generatedAtTime make sense. If ProgrammableWeb does not record the history of versions of an API/mashup, then this property probably effectively has the semantics of "last updated on ". Also, your ontology does not cover version histories. I would appreciate a brief discussion of these aspects (2–3 sentences).

* Section 7.2 "survey": My question about whether your survey participants had used ProgrammableWeb is now answered, but the following issue has not yet been addressed: some more information on the background of the users would be helpful, i.e. being more specific than "all of the participants […] have searched or used an API, while 19 […] also provide an API". E.g. in what _ways_ are they using APIs, and in what situations of their work do they consider your dataset helpful? (The distinction between the perspectives of consumer vs. provider is already a good step in this direction!)
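
To make the first concern concrete, here is a minimal sketch in Python. It assumes a hypothetical scraper that keeps a single prov:generatedAtTime value per resource and overwrites it on every crawl. The resource names, dates, and helper functions are invented for illustration and are not taken from the dataset or from ProgrammableWeb.

```python
from datetime import date

# Hypothetical store: one prov:generatedAtTime value per resource.
# Assumption: the source keeps no version history, so each crawl
# overwrites the previously stored value.
generated_at = {}


def record_crawl(resource, timestamp):
    """Store the timestamp seen in the latest crawl, replacing any earlier one."""
    generated_at[resource] = timestamp


def generated_since(cutoff):
    """Mimic a query that filters on prov:generatedAtTime >= cutoff."""
    return sorted(r for r, t in generated_at.items() if t >= cutoff)


# An API first published in 2010 and revised in 2015 (invented values).
record_crawl("api/example-maps", date(2010, 3, 1))
record_crawl("api/example-maps", date(2015, 6, 1))
# An API published in 2012 and never revised (invented values).
record_crawl("api/example-sms", date(2012, 1, 15))

# Intended reading: "APIs created since 2014". Without a version history,
# the 2010 API is returned because only its last update is stored.
recent = generated_since(date(2014, 1, 1))
print(recent)
```

Under these assumptions the query returns the API that was first published in 2010, which is why the filter reads as "last updated since" rather than "created since".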

There are also a few places in which the grammar still needs fixing; e.g. in Section 7.2. "<*> majority of the participants" (missing article; see as a guide on whether to choose "a" or "the"). Another example of a sentence with poor _style_ is in Section 8.2: "A currently ongoing effort (up to this point rephrasing will help) is on integration of the (up to here, rephrase once more) […]".

In the references, there is a UTF-8 encoding problem in [10], and the metadata for [11] is not up to date; use the following:

author = {Zaveri, Amrapali and Rula, Anisa and Maurino, Andrea and Pietrobon, Ricardo and Lehmann, Jens and Auer, S{\"o}ren},
journal = {Semantic Web Journal},
number = {1},
title = {Quality Assessment for Linked Data},
url = {},
volume = 7,
pages = {63--93},
year = {2016},