Review Comment:
The paper describes the LAK dataset, consisting of Linked Open Data (LOD) on bibliographic data and papers in the field of learning analytics and educational data mining from the past 5 years, taken from various venues and sources. The paper is easily readable and describes the motivation, creation, usage, and uptake clearly. The dataset was created semi-automatically, i.e., with some manual intervention, and given that it is used for various activities and software, it may seem to be of sufficient quality for its purposes. I have some reservations about this, primarily due to the 'schema' used (see below); also, some links/resources were not working.
Concerning usefulness, the dataset is rather focussed and limited, and its construction addresses just this case rather than approaching it in a broader context. To make the effort put into creating the resource useful beyond this particular dataset (one may like to create a similar dataset on, say, conceptual data modeling research, or some other research field), more detailed guidelines on how to construct it would have been useful. Moreover, one would want to be able to reuse the annotation model, which is not possible at the moment (at least, not based on the information provided in the paper or the website). How is the mishmash of annotation resources useful (if at all) for anyone wishing to create their own extended dataset of bibliographic and paper data, building upon the authors' efforts?
There is indeed a section on "schema/ontology", but I could not find the actual schema used; the paper only gives the impression that it is a patchwork, mixing a bit of bibo, swrc, foaf, etc. The data provided in Table 4 does not inspire confidence either: foaf:maker has foaf:Agent as its range in the FOAF file, but the table says foaf:Person; bibo:content is deprecated in BIBO (and hence ought not to be used, but is, according to Table 4); swrc:affiliation does have domain and range restrictions in the ontology (swrc:Person and swrc:Organization), yet according to Table 4 foaf:Person is used, and no alignment between the two is mentioned. p3, first column, states there are some "implicit mappings" between those ontologies, but a single example is meagre and, given the content of Table 4, questionable. And why leave them "implicit"? Section 4 also mentions "e.g. by frequently adding new alignments with emerging vocabularies", but I could not find those alignments; in fact, http://lak.linkededucation.org/ does not seem to host the schema file. It would be useful to have, and its URI could be included in Table 1.
Another shortcoming is that, although various resources have been used, the paper does not discuss similar endeavors, how the authors' dataset differs from them, and what (if anything) could have been reused from them; e.g., OCLC [1], some other domain going the LOD way [2], or BibBase [3], to name just a few that were easily found by a simple online search, and other datasets (see, e.g., [4], which includes SwetoDblp [5]). As it stands, the authors merely assert that theirs is the first dataset of its kind (combining bibliographic data with full text), which is not convincing. Note: this criticism does not mean I expect the authors to add exactly these references, just at least some related references. Even if the authors created the dataset in isolation and from the ground up without looking at other efforts, the dataset does not exist in isolation, and it is evidently not the only way of creating a bibliographic dataset.
[1] http://oclc.org/developer/develop/linked-data.en.html
[2] http://www.semantic-web-journal.net/content/migrating-bibliographic-data...
[3] http://www.semantic-web-journal.net/content/publishing-bibliographic-dat...
[4] http://datahub.io/dataset?q=Bibliography
[5] http://knoesis.wright.edu/library/ontologies/swetodblp/
I did try out a few links from Table 1, with mixed results. Some of this may be just coincidence and bad timing (a Friday during work hours), but it does not give a good impression of the dataset's availability.
http://data.linkededucation.org/resource/lak/conference/lak2013/paper/93 returned HTTP status 500.
http://data.linkededucation.org/request/lakconference/sparql and http://data.linkededucation.org/request/lak-conference/sparql (it is not clear from the table which one is meant): the first one returned a 404, the second one an 'unable to connect' (from the l3s host it was redirected to).
Navigating to the dataset via http://lak.linkededucation.org/ ('spiralling to the core') and then clicking around does allow browsing access. Clicking the 'blue canary' option gave a connection reset. DEKDIV works.
On datahub.io/dataset/lak-dataset, the pointers to the example and SPARQL endpoint do not work, and clicking the 'source' link under 'additional info' [http://www.solaresearch.org/resources/lak-dataset/] gives a 404 page not found. Further, the last activity on the dataset was over a year ago, which suggests that the dataset is not kept as up to date as Section 4 (p5, bottom) of the paper claims.
The R-dump link works, which is the one I actually did not expect to work, since it is hosted on a personal homepage at an institution, and people tend to change affiliations, so it is quite prone to link rot over time (though, admittedly, so are EU project URLs).
Other infelicities
Several footnote numbers in the text are separated from the preceding word by a space, which should not be there.
Table 1 and Table 4 go outside of the text area.
Table 3 is spread over two columns, and Listing 1 over two pages, which should be avoided.
"references and full text is missing" -> are.
"particularly about," -> incomplete sentence.
citations rendered as "[8][8]", which probably should be refs 8 and 9; there are several of these.
last section "and beyond," -> "and beyond."
footnotes 22 and 23 are redundant; alternatively, put them in the references cited as "[8][8]".