The Materials Design Ontology

Tracking #: 3037-4251

Authors: 
Patrick Lambrix
Rickard Armiento
Huanyu Li
Olaf Hartig
Mina Abd Nikooie Pour
Ying Li

Responsible editor: 
Guest Editors SW for Industrial Engineering 2022

Submission type: 
Full Paper
Abstract: 
In the materials design domain, much of the data from materials calculations is stored in different heterogeneous databases with different data and access models. Therefore, accessing and integrating data from different sources is challenging. As ontology-based access and integration alleviates these issues, in this paper we address data access and interoperability for computational materials databases by developing the Materials Design Ontology. This ontology is inspired by and guided by the OPTIMADE effort that aims to make materials databases interoperable and includes many of the data providers in computational materials science. In this paper, first, we describe the development and the content of the Materials Design Ontology. Then, we use a topic model-based approach to propose additional candidate concepts for the ontology. Finally, we show the use of the Materials Design Ontology by a proof-of-concept implementation of a data access and integration system for materials databases based on the ontology.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Gerhard Goldbeck submitted on 07/Apr/2022
Suggestion:
Minor Revision
Review Comment:

(1) originality: The paper provides an important and original contribution to the application of semantic web technologies in the field of materials science. It is very timely: databases and resources are developing rapidly and could be better harnessed and harvested with ontologies, and there is strong interest in the field in using ontologies for such purposes, yet semantic resources and ontologies are still lacking. The approach employed is very methodical and will also serve as a showcase for ontology development to others in the field. The ontology is well described in the paper, provided on GitHub, and all entities in the OWL file are 'defined' via comments, as is appropriate. As mentioned above, the long-term stable URL includes the complete (modular) ontologies together with a README, etc.

Extension of the ontology based on analysing a corpus consisting of journal databases is presented and discussed in excellent detail. This will be a key step in overcoming the current scope limitations of ontologies in the field.
The application to materials databases, enabling integration of heterogeneous resources, is also presented with detailed results, and includes the required GraphQL server setup details as well as queries for reproducibility.
Overall, the results are highly significant and exemplary for the field in many aspects, as discussed. The paper is well written and clear.
One minor criticism regards the discussion and use of top level ontologies. In section 2.1 it states that:
EMMO is "a standard representational ontology framework based on knowledge of materials modeling and characterization." That is not correct. As outlined in https://github.com/emmo-repo/EMMO, EMMO is based on physics and analytical philosophy, in particular mereotopology and semiotics.
Given that the DOLCE top-level ontology is then referred to in line 6 on page 3, it should also be discussed together with BFO and EMMO in the previous paragraph. It cannot be said, as seems to be implied now, that EMMO and BFO are somehow particularly 'materials'-related top-level ontologies.
Given the discussion of ontologies developed within TLO contexts and those independent of a TLO context, it is not clear why MDO was developed outside of a TLO. Furthermore, it is not clear what the purpose of using a single concept (Material) from EMMO is in the context of MDO, in particular since 'atom', i.e. a type of 'material', is aligned with CHEBI (why not also with EMMO?). Strictly, this creates the issue that Material is a 4D (space-time) entity (due to EMMO being 4D) and atom is a 3D (space) entity. It would be better to keep the concepts without reference to other ontologies at this stage and leave any alignments to the future work mentioned in the conclusions.

Review #2
Anonymous submitted on 20/May/2022
Suggestion:
Minor Revision
Review Comment:

In this manuscript the Authors present the Materials Design Ontology (MDO), a domain ontology to support interoperability of databases for solid-state calculations.

Next, they address the MDO extension, applying a method (and a tool) that extracts candidates for additional concepts from a corpus of journal abstracts and enables domain experts to evaluate them. Finally, they present a proof-of-concept system using MDO to query multiple databases from the OPTIMADE initiative and evaluate its performance in comparison with two other systems.

The manuscript builds on previous work from the Authors, that is extended here; both the ontology and the information to set up the server are publicly available (https://w3id.org/mdo/full/1.0/ and https://github.com/LiUSemWeb/OBG-gen).

The topic is timely, and the manuscript is well written and rich in details that support reproducibility; therefore, I recommend it for publication in the Semantic Web journal.

Below I give some questions and comments I would like the Authors to consider in their revised version before publication.

-----------------------------------------------------------------------------------------------

Questions/Comments

1) Some ontologies for materials are listed in the manuscript (Table 1): how have these ontologies been found/selected? It would be useful to add a brief comment about this in the text, especially if they were found systematically.

2) The "xsd:string" type is used also for cases that could be further specified, for example as URI (e.g., DOI and URL properties of ReferenceAgent in the Provenance module). Please add a comment about this choice.

3) In MDO, "Physical property" and "calculated property" are disjoint; it seems "Physical" stands for "measured" (experimentally). Note that other authors take the view that certain properties can be "computed/calculated" or "measured", but in both cases they are "physical" quantities, as opposed, for example, to numerical parameters that do not affect the physics of the system. A note could be added to point out what is meant by "physical" in MDO.

4) On competency questions and restrictions.
4a) CQ11 vs CQ13? Does CQ13 refer to software that is compatible with the one used to obtain the results? Please clarify.
4b) AR2 vs AR6? Does this exclude the possibility of using a combination of computational methods for materials? Please clarify.

5) Table 3: It is not clear how the "Original TopMine" and "TopMine without stemming" methods (columns two and three) differ from each other. Please explain it briefly in the text and/or point to a reference for details.

6) The concept of "High-quality frequent phrases" could be clarified better and sooner. It is explained with an example, but I think a concise sentence could be added beforehand. For example, saying that a phrase that is also part of (contained in) a frequent one will not be counted.

7) Is the ontology extension tool (snapshots in Figs 2, 3, 4, 5) available, or is there a plan to make it available? If so, please add a comment in the manuscript.

8) Current and future scope and connections
8a) As clarified by UC1, "materials" and "materials calculations" in MDO are intended for solids, in the context of solid-state physics and condensed-matter theory. A note on this could be added in the introduction and conclusions, to stress the focus of MDO also there. If, on the contrary, there is a plan for MDO to be extended to address also other phases and areas of physics/modelling, this should be said.

8b) For future work, in line with the mentioned OntoCommons demonstrator where connections to other ontologies will be explored, I would suggest, for the "Calculation" module, looking into recent and ongoing work within VIMMP Ontologies, in particular VISO-electronic (for definition of methods, parameters etc) [https://gitlab.com/vimmp-semantics/vimmp-ontologies/-/blob/master/viso/v.... Similarly, another (work-in-progress) relevant ontology is EMMO-CIF [https://github.com/emmo-repo/CIF-ontology].

8c) Sec 3.4: The current connection to EMMO and CHEBI is only via two individual concepts. A comment could be added on the reason for these choices (e.g., since EMMO is a TLO, more concepts can probably be reused, but I am aware that EMMO is still being actively developed and it could make sense to wait for it to be stable before drawing more connections).

Minor points:

9) "Topic model" approach and "Latent Dirichlet Allocation": please add references.

10) Please add a sentence (and a reference) about Yago. Add a reference for MatML.

11) Unless they are universally known, please expand acronyms in the main text when they first appear (e.g., OQMD = Open Quantum Materials Database, NOMAD=Novel Materials Discovery, etc).

12) Introduction: EMMO name, please update "Elemental" -> "Elementary"

13) Page 5, line 51: Additional Restrictions -> please add "(AR)", since it is used afterwards

14) "ODGSG" is used in Fig 19, 20 and not defined. Obviously it refers to the OBG-gen server, but the acronym should be introduced in the text.

15) Section 5: "The 37 concepts of MDO were used as search phrases ...". Was the search run on titles AND abstracts? I guess so, but please clarify for reproducibility.

16) Typo: "has exact one composition" -> has exactly ...

17) Patterns: In Sec. 3 it is first said "we did not use existing ontology design patterns (scenario 7), as the only one we are aware of in the materials science field is about materials transformation [42] that is not covered by MDO." Then, later on in the same section, it is said "We identified a pattern related to provenance information in the repository of Ontology Design Patterns (ODPs) that could be reused or re-engineered for MDO." Please clarify.

18) "TopMine generates frequent phrases": I understand it is meant to point out the "output" of the code, but "identifies" seems more appropriate.

19) Page 16: "generated topics": Please add a brief sentence to say what a "topic" is in this context (e.g., a cluster of phrases).

-----------------------------------------------------------------------------------------------

Note: In Long-term Stable Link to Resources, the Authors provided this link: https://github.com/LiUSemWeb/Materials-Design-Ontology
This other link should probably also be given: https://github.com/LiUSemWeb/OBG-gen

Review #3
Anonymous submitted on 17/Jun/2022
Suggestion:
Accept
Review Comment:

The paper provides an original and credible approach to the definition and population of an ontology for materials. The results of this approach would be of great benefit to the materials community, since it would prepare data semantically, facilitating further computational investigations. The implementation details in sections 5 (extension) and 6 (database access) clearly describe the power and limits of the approach.

In general, the paper is well written and able to effectively deliver the details of the approach. The suggestion is to accept the paper for publication.

Here are a couple of minor polishing suggestions and comments; it is left to the authors to decide whether to use them to make minor text changes/additions:

- page 1, lines 41-43: provide a better explanation of the role of materials science in environmentally friendly energy technologies (a couple of sentences should be enough)

- page 5: really appreciated the usage of OWL2 DL to enable a full exploitation of semantic constraints provided by OWL, that would enable the usage of expressivity at ontology level (while usually such approaches are kept at taxonomical or thesaurus level), enabling the exploitation of existing Top Level Ontologies (TLO) (strongly promoted by the OntoCommons project, mentioned in the paper). However... (see next point)

- page 9-11: as it is now, the connection to the TLO is quite useless, since its main concepts are not really used to implement the MDO modules. Unless there is a stronger commitment to the TLO classes and relations in the MDO core ontology, the connection is only a conceptual orientation rather than an architectural choice. However, the approach used here is already a relevant step forward and does not prevent future alignments that will provide a stronger semantic extension of the MDO thanks to a stronger commitment to the EMMO TLO.

Data are easily accessed and well organized, and primarily based on a GitHub repository.