Review Comment:
reviewer #4
#overview
This work proposes a solution for creating Knowledge Graphs from tables based on data profiling techniques. The data profiles comprise domain profiles and table profiles, which are provided as feature vectors and represented as semantic data. Domain profiles are patterns of ontology relations (only datatype relations in this work) together with their statistical characteristics, such as the value distributions of the data in a sample of the domain KG. Table profiles comprise the columns of a table and the statistical characteristics associated with each column. The table interpretation approach, named Tab2KG, maps table columns to ontology relations and transforms the table into a data graph.
#Originality and contribution
I think that the work is original and very interesting. However, I have a big concern about the lightweight domain KG. In the evaluation section, the authors mention "DBpedia as a crossdomain knowledge graph", which seems to contradict what is stated in the introduction with respect to the state-of-the-art approaches: "In the context of DAW, the input data typically represents new instances (e.g., sensor observations, current road traffic events, . . . ), and substantial overlap between the tabular data values and entities within existing knowledge graphs cannot be expected". What if a domain KG is not available? What does a sample mean? How do we measure whether the data in the domain KG are representative?
#presentation of the work
After reading the introduction, I had a slightly different understanding of the work from what is explained in the other sections. Remove redundant information and keep it short. Section 1 already explains what this work does; Section 2 explains it a second time through the running example; the problem statement then gives a detailed and formal description; and Section 4 adds the details on profiles. I would suggest keeping a concise description in the introduction and perhaps merging the problem statement and the running example into one section. This would also reduce the number of sections.
#other comments
*Definition 6: data type profiles -> which are the statistical characteristics, i.e., the features associated with the literal relations? From the definition and the examples in the paper, this is not clear. I was expecting to see some numbers.
*Section 5.4. I can understand that the mapping is normalized in the range [0,1], but I don't understand how this function measures the similarity: "Given a column profile and a data type relation profile, the mapping function returns a similarity score in the range". Can you provide a formula showing how this is actually measured?
*The knowledge graphs set was split into a training set (90%) and a test set (10%). Is this used for all the other datasets? What happens if we keep 80% for training and 20% for testing?
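To make the requests on Definition 6 and Section 5.4 concrete, here is a minimal sketch of the level of detail I was expecting. It is my own illustration and not the authors' method: the functions `column_profile` and `similarity`, the choice of features, and the distance-based score are all assumptions, and the sample values are made up.

```python
import math
import statistics


def column_profile(values):
    """Hypothetical numeric profile: a small feature vector for one column.

    The feature choice (min, quartiles, max, mean, standard deviation) is an
    assumption made for illustration only.
    """
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return [
        ordered[0],
        q1,
        median,
        q3,
        ordered[-1],
        statistics.mean(ordered),
        statistics.pstdev(ordered),
    ]


def similarity(profile_a, profile_b):
    """Illustrative score in (0, 1]: 1 / (1 + Euclidean distance).

    Identical profiles score 1.0; the score decreases as profiles diverge.
    """
    return 1.0 / (1.0 + math.dist(profile_a, profile_b))


# Made-up sample values, e.g. a temperature column and a rank column.
temperatures = [18.5, 19.0, 21.2, 22.8, 20.1]
ranks = [1, 2, 3, 4, 5]

print(column_profile(temperatures))
print(similarity(column_profile(temperatures), column_profile(ranks)))
```

Showing a few such feature vectors next to Definition 6, and an explicit expression for the score in Section 5.4 (whatever function is actually used), would answer both comments.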
#Minors
*missing verb: In the case of a data table profile, these attributes the columns.
*check the correctness of the verb "assign" + "to" or "with"; seas:rank 2 (check spaces)
*check spaces in triples, e.g., rdf:type