Review Comment:
This paper describes the Listening Experience Dataset (LED), a recently published dataset on the OU's Linked Data platform, providing information about listening experiences gathered using a controlled-crowdsourcing approach (i.e., volunteers contribute listening experiences, which are then checked and approved by moderators).
The work described has several interesting aspects (strong points). Firstly, from a content perspective, it provides a rather unique dataset: as far as I know, no similar type of data is yet available on the linked data cloud. Secondly, the data creation process relying on volunteer contributions is interesting and differs from the more frequent approach of generating linked data from databases. Thirdly, the technical realisation of the data representation and publishing approach is of high quality, thus fulfilling the “quality of the dataset” evaluation criterion of the Call.
On the less positive side, there are also some concerns. Most importantly, as the authors themselves point out, the dataset is very new and as such has not yet been used by third parties. It is therefore rather difficult to judge the usefulness of the dataset (this being one of the criteria in the Call). Given the specialised focus of the data content, broad adoption of the dataset is probably not realistic to expect. However, this dataset might enable specialised, niche applications. It would be useful if the authors made this point clear in the paper.
The crowdsourced data creation is one of the particularities of the dataset; it would therefore be important to briefly survey related work at the intersection of crowdsourcing and linked data. I am aware that short papers are not required to have an extensive related work section; however, given that the crowdsourcing aspect is a core differentiator of this work, the authors should position themselves in the landscape of other works in this spirit. At a minimum, a short definition of crowdsourcing and an overview of its main genres should be included. The authors should also mention other similar works (e.g., [1, 2, 3]), which use different genres (GWAPs, paid-for crowdsourcing) and pursue different aims, mostly the verification of existing data.

Another important aspect to cover would be an example data instance, for instance along the lines of the sketch below. Section 5.2 about URI schemes would fit better in Section 4, where dataset design issues are described, than in Section 5, which focuses on dataset usage. Finally, I would suggest reconsidering the title and providing a shorter, more concise one.
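To make the suggestion about an example data instance concrete, something along the lines of the following minimal Turtle sketch would suffice. Note that the namespace, instance URIs, class, and property names below are purely illustrative assumptions on my part, not the actual LED vocabulary:

    @prefix dc:  <http://purl.org/dc/terms/> .
    @prefix led: <http://example.org/led/terms/> .        # hypothetical namespace

    <http://example.org/led/experience/42>                # hypothetical instance URI
        a led:ListeningExperience ;                       # hypothetical class
        led:hasListener <http://example.org/led/person/pepys> ;     # hypothetical property
        led:hasSource <http://example.org/led/source/pepys-diary> ; # hypothetical property
        dc:date "1660-02-27" ;
        dc:description "Heard a lute played at the tavern in the evening." .

Even a small instance like this would let readers see at a glance how experiences, listeners, and sources are interlinked in the dataset.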
Smaller comments and typos:
• p2: link against => link to
• p3: coming across => encountering
• p3: to relate them at the time. => to relate them at the time of creation.
• p3: remove “Basically”
• p6: EDTF should probably not be shown in boldface
• p7: This is in true => This is true
• p8: and one in towards => and one towards
[1] I. Celino. Geospatial dataset curation through a location-based game. Semantic Web Journal, IOS Press, DOI: 10.3233/SW-130129.
[2] J. Waitelonis, N. Ludwig, M. Knuth, and H. Sack. WhoKnows? Evaluating Linked Data Heuristics with a Quiz that Cleans Up DBpedia. Interact. Techn. Smart Edu., 8(4):236–248, 2011.
[3] L. Wolf, M. Knuth, J. Osterhoff, and H. Sack. RISQ! Renowned Individuals Semantic Quiz - a Jeopardy-like Quiz Game for Ranking Facts. In Proc. of the 7th Int. Conf. on Semantic Systems, I-Semantics ’11, pages 71–78, 2011.
Comments
Data licensing update
We had to temporarily re-license the dataset described in the paper under CC BY-NC-SA (http://creativecommons.org/licenses/by-nc-sa/4.0/). This will be reflected in future manuscript updates unless the licence is reverted to the original one by then.