Review Comment:
The paper presents a classifier of vocabularies in LOV based on the main categories of Wikipedia, using word embedding models and Deep Learning techniques. The authors show that a hybrid model combining DNN, RNN, and CNN gives an accuracy of 93.57%. The idea is very interesting and promising, but the authors need to (1) explain some choices in the methodology better, (2) compare their classification with the LOV categories, and (3) adhere to FAIR principles by providing a link to the materials of their experiments.
More detailed comments about the paper follow below:
== Originality ==
The paper seems to be original in its use of Deep Learning techniques to classify vocabularies. The use of Random Multimodel Deep Learning (RMDL) seems promising for the classification task. However, some aspects of the methodology need to be explained better when applied to vocabularies, which are a special type of data, as they encode "semantics" for datasets.
Data vocabularies: The vocabulary dump used is almost 5 months old, and 72 vocabularies were discarded. Could you elaborate on this? Normally, each vocabulary contains many annotations (not only classes and properties). What about rdfs:comment, dcterms:abstract|title, or any other annotation contained in the vocabulary? Additionally, LOV is a multilingual dataset with labels in English, French, Spanish, etc. It is not clear to me how the multilingual aspect is taken into account in your methodology (pre-processing of the data vocabularies and the Wikipedia corpus). The SPARQL queries on page 5 show only abstracts in English. Could you give more details?
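To make the concern concrete, a query sketch that would also retrieve non-English abstracts (the dbo:abstract property and the language set are assumptions on my part, not taken from the paper):

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

# Sketch: retrieve abstracts in several languages, not only English.
SELECT ?entity ?abstract (LANG(?abstract) AS ?lang)
WHERE {
  ?entity dbo:abstract ?abstract .
  FILTER (LANG(?abstract) IN ("en", "fr", "es"))
}
```

A language filter of this kind (or a per-language pre-processing step) would clarify how non-English labels in LOV are handled.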
Choice of the categories: Some categories are somewhat difficult to apply in LOV, either because they are very generic or because they are hard to consider as a domain category (e.g., "Philosophy" and "Technology").
Embeddings: Vocabularies are also knowledge graphs. Using only word embeddings may fail to capture the underlying graph structure. Have you checked whether graph embedding approaches (RDF2Vec, Graph2Vec, Node2Vec) could be suitable for your task?
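For illustration, the core idea behind RDF2Vec-style graph embeddings is to turn the vocabulary graph into random-walk "sentences" that a word-embedding model can then consume. A minimal sketch of that walk step, on a toy graph (the triples are invented, not taken from the paper):

```python
import random

# Toy RDF-like graph: subject -> list of (predicate, object) edges.
# These triples are illustrative only.
GRAPH = {
    "foaf:Person": [("rdfs:subClassOf", "foaf:Agent")],
    "foaf:Agent": [("rdfs:subClassOf", "owl:Thing")],
    "foaf:knows": [("rdfs:domain", "foaf:Person"),
                   ("rdfs:range", "foaf:Person")],
}

def random_walk(graph, start, depth, rng):
    """Walk up to `depth` hops, recording an entity-predicate-entity
    sequence; such sequences are fed to a word-embedding model in
    RDF2Vec-style approaches."""
    walk = [start]
    node = start
    for _ in range(depth):
        edges = graph.get(node)
        if not edges:
            break
        predicate, obj = rng.choice(edges)
        walk.extend([predicate, obj])
        node = obj
    return walk

rng = random.Random(42)
walks = [random_walk(GRAPH, node, 2, rng) for node in GRAPH]
```

Embedding such walks, instead of (or in addition to) plain label text, would let the model see the structural context of each class and property.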
== Significance of the results ==
The authors acknowledge in section 4.2 some limitations of their approach, and mention the use of “very general domains for the classification”.
The evaluation of the model lacks recall and F1-measure in section 3.2.3. I miss an evaluation of the approach against the manually created tags in LOV. How significant is your classification compared to the classification made by LOV curators? Currently, there are 28 vocabularies for “People” in LOV (https://lov.linkeddata.es/dataset/lov/vocabs?tag=People), and you classified 38. Which are those 38 vocabularies?
The authors should make this qualitative assessment for better acceptance of the proposed methodology.
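Adding the missing metrics is inexpensive; a minimal sketch of per-class precision, recall, and F1 (category labels below are invented for illustration):

```python
# Sketch: per-class precision, recall and F1, the metrics missing
# from section 3.2.3. Gold and predicted labels are invented.
def prf1(gold, predicted, label):
    tp = sum(1 for g, p in zip(gold, predicted) if g == p == label)
    fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = ["People", "People", "Science", "People", "Science"]
pred = ["People", "Science", "Science", "People", "People"]
p, r, f = prf1(gold, pred, "People")
```

Reporting these per category, with the LOV curator tags as gold labels, would make the comparison with the manual classification explicit.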
== Quality of writing ==
The paper is well organized: it is easy to understand what the goal of the paper was and how the experiment was conducted. What is missing is a link to a notebook to assess the pipeline, and the list of vocabularies classified under each category (FAIR principles in action).
Figure 2 is not easy to read; please update it.
What does "vocabularies used are ontologies" mean?