Review Comment:
This paper presents a dataset about Web APIs, which has been generated from the directory website ProgrammableWeb.com by screen-scraping, and furthermore interlinked with a few existing linked datasets. The paper …
* clearly motivates the need for such a dataset,
* explains the data source reasonably well,
* explains the ontology, which has been designed for this purpose, very well,
* explains the URI naming scheme and some statistics about the dataset,
* covers the interlinking, and
* presents as many as five (5) use cases, whose practical relevance is pointed out clearly.
It is thus a feature-complete dataset paper reflecting solid work and should therefore be accepted – with minor revisions.
The most frequent type of mistake (spelling and grammar) will be easy to fix, probably with help from a native speaker. The slight lack of detail in some places will also be easy to remedy by elaborating on the following aspects:
* section 2 "data source": please comment on your screen-scraping approach. How sustainable is it as a means of keeping your dataset up to date? In other words, how frequently does the structure of the HTML source change?
* section 3 "ontology":
  * regarding provenance, in addition to who created an API or mashup, isn't it also relevant who created the ProgrammableWeb entry for a dataset? Is such information available from ProgrammableWeb?
  * for some properties, whose ranges are not self-evident, a few comments on their range would be appreciated. E.g., how exactly do you represent usage fees? I suppose that in many cases this information will have quite a complex structure.
* section 4 "dataset":
  * you explain how you created your own dataset, with all information harvested from ProgrammableWeb in a _central_ place. However, now that your ontology is available, it will enable Web API and mashup maintainers to make their APIs and mashups self-describing, by publishing _decentralised_ RDF records at the same domain from which the API/mashup is available. Could you provide some recommendations on how they should do this?
  * while you do explain your inspiration for the URI format with appended type information such as "..._api" (roughly following the naming scheme of Wikipedia's disambiguation pages), I could imagine that URIs containing the type information as a path component (e.g. ".../api/...") would also be appropriate. Could you discuss this potential alternative?
* section 5 "interlinking": Why did you not use rule-based interlinking tools, such as Silk or LIMES? Wouldn't this have made the job easier?
* section 6 "use cases":
  * what exactly do you mean by "personalised recommendation of Web APIs"? Even though this is covered in your other publications, please briefly explain the setting in which you provide such recommendations, and who the target audience of these recommendations is.
  * I wonder whether the queries that use prov:generatedAtTime make sense. If ProgrammableWeb does not record the version history of an API/mashup, then this property probably has the effective semantics of "last updated on <date>". Also, your ontology does not cover version histories.
* section 7 "future work": are you planning to consider other standards as well, such as WSDL or WADL?
Please find an annotated PDF with detailed comments at http://www.iai.uni-bonn.de/~langec/exchange/swj1144.pdf.