Review Comment:
- Name, URL, versioning, licensing, availability
- Topic coverage, source for the data
- Purpose of the Linked Dataset, e.g. demonstrated by relevant queries or inferences over it
- Applications using the dataset and other metrics of use
- Creation, maintenance and update mechanisms as well as policies to ensure sustainability and stability
- Quality, quantity and purpose of links to other datasets
- Domain modeling and use of established vocabularies
- Examples and critical discussion of typical knowledge modeling patterns used
- Known shortcomings of the dataset
This paper describes the Urbanopoly dataset, which consists of the results of the Urbanopoly GWAP (game with a purpose). The dataset models the outcome of the GWAP, including provenance. I think the paper is well written and understandable, the Urbanopoly game is in general well described, and it seems to be a good way of enriching geographical data. The data model is in my view relatively simple, but it fits the purpose quite well. The human computation ontology is a very nice schema, and especially the mapping to PROV is useful. Example queries show how the data can be queried to answer questions about the outcomes of the GWAP experiment.
- One question I have about the human computation ontology concerns its reusability. How reusable is this model? Can the authors provide use cases or examples of such reuse outside of the Urbanopoly game?
However, with respect to the specific call for this special issue, I have some reservations about the paper. The special issue is concerned with concise descriptions of useful and reusable datasets. The dataset as it is described is essentially the result of a crowdsourcing experiment (who added what information), rather than the triples that enrich the geographical data (venue-feature-value). I can see how the latter would be useful and reusable, but for the human computation dataset as it is currently available, I fail to see the reuse, and the authors do not provide convincing examples other than that it could be reused to evaluate different aggregation algorithms. The fact that the reusable triples (venue-feature-value) are now 'hidden' as reified triples in hc:ConsolidatedInformation instances makes the dataset harder to reuse from a Linked Data perspective. If I want an application to reuse your information to plot it on a map, I now need to be aware of the details of the human computation model, rather than just asking for all triples with subject venueX, which I think would be more reusable.
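To make the point concrete, here is a hedged sketch of the two query patterns. The class hc:ConsolidatedInformation is taken from the paper; the use of the standard rdf:subject/rdf:predicate/rdf:object reification properties and the ex:venueX resource are my assumptions for illustration, not necessarily the actual Urbanopoly vocabulary:

```sparql
# With plain venue-feature-value triples, a map application
# could simply ask for everything known about a venue:
SELECT ?feature ?value
WHERE { ex:venueX ?feature ?value }

# With the reified model, the same consumer must know the
# human computation schema (property names assumed here):
SELECT ?feature ?value
WHERE {
  ?info a hc:ConsolidatedInformation ;
        rdf:subject   ex:venueX ;
        rdf:predicate ?feature ;
        rdf:object    ?value .
}
```

The second pattern is not unworkable, but it couples every consumer to the human computation model, which is exactly the reuse barrier described above.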
In short, as it is presented now, I am not convinced that this paper describes a dataset that is useful outside of its context. The useful information (the geo-enrichments) is hidden in the data and is not presented as the main outcome of this paper. I think this can be solved in two ways: either the authors make more explicit how the human computation dataset as it is now qualifies as a useful and reusable dataset, or the authors describe the geographical data (venue-feature-value) in more detail (how many features/properties are used, where these properties come from, how many venues are enriched, the average number of enrichments per venue, etc.).
- One issue that remains regardless of this choice is the lack of statistics about the dataset. The paper fails to provide clear figures on a number of defining features of the dataset (number of venues, players, properties, relation instances, average number of features per venue, the distribution of information over venue or feature types, etc.). These would give the reader a much clearer view of the usefulness of the data.
- There seems to be a lack of links to other datasets, other than the reuse of the source data (OpenStreetMap and LinkedGeoData). Are all rdf:object values unmapped literals, or could these literals be used to link to other data sources (e.g. DBpedia for restaurant types)?
Some other issues:
p2. - "Specific minigames result in specific information" -> Please elaborate on how many (and which) minigames were used and which triples they produce.
p2. - "The evaluation on the data curation results [6] of Urbanopoly is very good in terms of both precision/accuracy – around 92%" -> I understand that these results are reported in more detail in [6]; however, it would be good to also repeat here how they were obtained, to give a clear picture of the quality of the resulting dataset. Is this an evaluation of a sample? Who evaluated it? Is the quality evenly distributed over the properties, or is some data more reliable than other data?
Comments
Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-call-2nd-s...