Review Comment:
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.
This is an interesting, well written paper providing novel insights into the combination of LLMs and Semantic Web Technologies. The proposed approach is original and fairly well evaluated.
All relevant material is provided in a GitHub repository. This repo is well organized and well documented. To ensure reproducibility of the paper's results, I suggest freezing a version of the repo with, e.g., a Zenodo DOI.
A few more detailed comments that should be addressed prior to publication:
general: the running title is missing
p2, l21: "RDF, TTL, or OWL" should be rephrased; these are not parallel concepts, as TTL is a serialisation format for RDF (and thus also for OWL ontologies)
p2, l24 ff: I suggest also mentioning Graph-RAG here
p3, l1: validation and verification are two different things; this should be phrased more precisely
p3, l29: what is specific about industrial semantic artifacts - or what makes the approach suited to those only?
p4, l45ff: the description of OPC UA lacks references
p5, l44ff: U and L are not defined
p6, l5: s is used to denote both the subject and the SPARQL query
p6, l20: is it correct that the new graph has to contain the old one? So no updates or changes are allowed?
p6, l8: it might be good to mention somewhere that, as the result of an NLQ, you expect something structured (like a set of queries, not a natural-language answer)
p6, l42: SPARQL has been used before; a bit late to introduce the acronym
p8, Fig 3: the figure sort of suggests that the embedding of the text chunks happens at query time (and thus again for each query). I assume that is not the case...
p10, Fig 5: how is the subgraph provided to the LLM? As a graph or in some form of embedding?
p11, l10: what happens if there is more than one possible URI?
p12, l50ff: something is wrong with this sentence
p14, l32: where is the inclusion of parent classes reflected in the algorithm?
p20, l4: shouldn't that be 4/5 (see line 14)?
p20, l30: why is 3/5 deemed correct?
p20, l38/39: the numbers of the per-execution ratio do not seem to match (47/60 does not equal 0.7, nor does 42/60 equal 0.72)
p24, l33ff: I didn't understand the remark about the size reduction; this needs more elaboration
p24, l40: this is the first time the approach is called Graph-RAG; before, it was called "Our RAG". It should be named consistently, and "Graph-RAG" may be too close to GraphRAG not to cause confusion
p25, l10: linebreak