Review Comment:
In this manuscript the Authors present the Materials Design Ontology (MDO), a domain ontology to support interoperability of databases for solid-state calculations.
Next, they address the extension of MDO, applying a method (and a tool) that extracts candidates for additional concepts from a corpus of journal abstracts and enables domain experts to evaluate them. Finally, they present a proof-of-concept system that uses MDO to query multiple databases from the OPTIMADE initiative and evaluate its performance in comparison with two other systems.
The manuscript builds on and extends previous work by the Authors; both the ontology and the information to set up the server are publicly available (https://w3id.org/mdo/full/1.0/ and https://github.com/LiUSemWeb/OBG-gen).
The topic is timely, and the manuscript is well written and rich in details that support reproducibility; I therefore recommend it for publication in the Semantic Web journal.
Below I give some questions and comments I would like the Authors to consider in their revised version before publication.
-----------------------------------------------------------------------------------------------
Questions/Comments
1) Some ontologies for materials are listed in the manuscript (Table 1): how have these ontologies been found/selected? It would be useful to add a brief comment about this in the text, especially if they were found systematically.
2) The "xsd:string" type is used also for cases that could be further specified, for example as URI (e.g., DOI and URL properties of ReferenceAgent in the Provenance module). Please add a comment about this choice.
3) In MDO, "Physical property" and "calculated property" are disjoint; it seems "Physical" stands for "measured" (experimentally). Note that other authors take the view that certain properties can be "computed/calculated" or "measured", but in both cases they are "physical" quantities, as opposed, for example, to numerical parameters that do not affect the physics of the system. A note could be added to point out what is meant by "physical" in MDO.
4) On competency questions and restrictions.
4a) CQ11 vs CQ13? Does CQ13 refer to software that is compatible with the one used to obtain the results? Please clarify.
4b) AR2 vs AR6? Does this exclude the possibility of using a combination of computational methods for materials? Please clarify.
5) Table 3: It is not clear how the "Original TopMine" and "TopMine without stemming" methods (columns two and three) differ from each other. Please explain it briefly in the text and/or point to a reference for details.
6) The concept of "High-quality frequent phrases" could be clarified better and sooner. It is explained with an example, but I think a concise sentence could be added beforehand. For example, saying that a phrase that is also part of (contained in) a frequent one will not be counted.
7) Is the ontology extension tool (snapshots in Figs 2,3,4,5) available, or is there a plan to make it available? If so, please add a comment in the manuscript.
8) Current and future scope and connections
8a) As clarified by UC1, "materials" and "materials calculations" in MDO are intended for solids, in the context of solid-state physics and condensed-matter theory. A note on this could be added in the introduction and conclusions, to stress the focus of MDO also there. If, on the contrary, there is a plan for MDO to be extended to address also other phases and areas of physics/modelling, this should be said.
8b) For future work, in line with the mentioned OntoCommons demonstrator where connections to other ontologies will be explored, I would suggest, for the "Calculation" module, looking into recent and ongoing work within VIMMP Ontologies, in particular VISO-electronic (for definition of methods, parameters etc) [https://gitlab.com/vimmp-semantics/vimmp-ontologies/-/blob/master/viso/v.... Similarly, another (work-in-progress) relevant ontology is EMMO-CIF [https://github.com/emmo-repo/CIF-ontology].
8c) Sec 3.4: The current connection to EMMO and CHEBI is only via two individual concepts. A comment could be added on the reason for these choices (e.g., since EMMO is a TLO, more concepts can probably be reused, but I am aware that EMMO is still being actively developed and it could make sense to wait for it to be stable before drawing more connections).
Minor points:
9) "Topic model" approach and "Latent Dirichlet Allocation": please add references.
10) Please add a sentence (and a reference) about Yago. Add a reference for MatML.
11) Unless they are universally known, please expand acronyms in the main text when they first appear (e.g., OQMD = Open Quantum Materials Database, NOMAD = Novel Materials Discovery, etc).
12) Introduction: EMMO name, please update "Elemental" -> "Elementary"
13) Page 5, line 51: Additional Restrictions -> please add "(AR)", since it is used afterwards
14) "ODGSG" is used in Fig 19, 20 and not defined. Obviously it refers to the OBG-gen server, but the acronym should be introduced in the text.
15) Section 5: "The 37 concepts of MDO were used as search phrases ...". Was the search run on titles AND abstracts? I guess so, but please clarify for reproducibility.
16) Typo: "has exact one composition" -> has exactly ...
17) Patterns: In Sec. 3 it is first said "we did not use existing ontology design patterns (scenario 7), as the only one we are aware of in the materials science field is about materials transformation [42] that is not covered by MDO." Later in the same section it is said "We identified a pattern related to provenance information in the repository of Ontology Design Patterns (ODPs) that could be reused or re-engineered for MDO." Please clarify.
18) "TopMine generates frequent phrases": I understand it is meant to point out the "output" of the code, but "identifies" seems more appropriate.
19) Page 16: "generated topics": Please add a brief sentence to say what a "topic" is in this context (e.g., a cluster of phrases).
-----------------------------------------------------------------------------------------------
Note: In Long-term Stable Link to Resources, the Authors provided this link: https://github.com/LiUSemWeb/Materials-Design-Ontology
This other link should probably be given as well: https://github.com/LiUSemWeb/OBG-gen