MAMBO: a lightweight ontology for multiscale materials and applications

Tracking #: 3389-4603

Fabio Le Piane
Matteo Baldoni
Mauro Gaspari
Francesco Mercuri

Responsible editor: 
Aldo Gangemi

Submission type: 
Full Paper
Advancements of both computational and experimental tools have recently led to significant progress in the development of new advanced and functional materials, paralleled by a quick growth of the overall amount of data and information on materials. However, an effective unfolding of the potential of advanced and data-intensive methodologies requires systematic and efficient methods for the organization of knowledge in the context of materials research and development. Semantic technologies can support the structured and formal organization of knowledge, providing a platform for the integration and interoperability of data. In this work, we introduce the Materials and Molecules Basic Ontology (MAMBO), which aims at organizing knowledge in the field of computational and experimental workflows on molecular materials and related systems (nanomaterials, supramolecular systems, molecular aggregates, etc.). Linking recent efforts on ontologies for materials sciences in neighboring domains, MAMBO aims at filling gaps in current state-of-the-art knowledge modelling approaches for materials development and design targeting the intersection between the molecular scale and higher scale domains. With a focus on operational processes, a lightweight design, and modularity, MAMBO enables extensions to broader knowledge domains and integration of methodologies and workflows related to both computational and experimental tools. MAMBO is expected to advance the application of data-driven technologies to molecular materials, including predictive machine learning frameworks for materials design and discovery and automated platforms.

Solicited Reviews:
Review #1
Anonymous submitted on 08/Apr/2023
Review Comment:


The paper presents the MAMBO ontology, which is focused on molecular materials. The purpose of the ontology is the integration and interoperability of data about computational and experimental workflows on molecular materials and related entities (aggregates, nanomaterials, etc.).

MAMBO is developed with a modular approach. Ideally, this makes it possible to integrate/expand the ontology (by class alignment) with categories or modules from other ontologies, like MBO and CHMO. MAMBO also aims to make results obtained on materials at different scales interoperable.

MAMBO is compatible with EMMO (of which it aims to become a domain ontology, thus a module), which is a recent effort in foundational ontology. EMMO takes (a version of) quantum physics as its starting point.

Formally, the MAMBO ontology is a logical theory expressed in OWL and the ontology domain is that of concepts and relationships of common use among experts.

The core of MAMBO includes the following classes: Material, Property, Structure, Simulation and Experiment. These are only lightly identified and no special information is given on how to use them.
This can be problematic since I find the comments in the ontology puzzling. For instance, the ontology says that MolecularAggregate, a subclass of Structure as well as of StructuralEntity, is: rdfs:comment “A material composed of aggregates of molecules of the same kind or of different kind”. If this were true, MolecularAggregate should be a subclass of Material as well (but it is not). Also, subclasses of Experiment are BFO concepts, which is neither motivated nor explained.
The comment on Experiment talks of “experimental operation/procedure” and the difference with ExperimentalMethod is not addressed. Also, for Simulation there is no SimulationMethod. This is also puzzling. (There is another ontology in the GitHub but it does not have the core categories and has no annotation. I assume it is not the one referred to in the paper.)
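
The mismatch between the MolecularAggregate comment and its place in the hierarchy can be checked mechanically. The following is a minimal sketch in plain Python, not MAMBO's actual tooling: the dictionary only mirrors the two subclass links and the comment quoted above, and the function names are hypothetical.

```python
# Toy subclass hierarchy. Only the MolecularAggregate links and its comment
# are taken from the review text; the data layout is an assumption.
SUBCLASS_OF = {
    "MolecularAggregate": {"Structure", "StructuralEntity"},
    "Structure": set(),
    "StructuralEntity": set(),
    "Material": set(),
}

COMMENTS = {
    "MolecularAggregate": (
        "A material composed of aggregates of molecules "
        "of the same kind or of different kind"
    ),
}


def superclasses(cls, hierarchy):
    """Return all direct and indirect superclasses of cls."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in hierarchy.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def comment_hierarchy_mismatches(hierarchy, comments, expected="Material"):
    """List classes whose comment begins 'A material ...' (for the default
    expected class) but which are not subclasses of that class."""
    prefix = "a " + expected.lower()
    return sorted(
        cls
        for cls, text in comments.items()
        if text.lower().startswith(prefix)
        and expected not in superclasses(cls, hierarchy)
    )


print(comment_hierarchy_mismatches(SUBCLASS_OF, COMMENTS))
```

With the hierarchy as described in the review, the check flags MolecularAggregate; adding Material to its superclasses, or rewording the comment, would clear it.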

I find the rest of the paper confusing. It is unclear to me whether MAMBO aims to be a reference ontology or a technical portal (in the sense of a system where you just collect data with a minimal organisation).
It fails to be a reference ontology because it does not clarify the meaning of the core categories (Material, Property, Structure, Simulation and Experiment) nor how they are organised to make sense of the different ways application ontologies understand, e.g., properties and their granularity.
If it is understood as a technical portal, then it cannot be used as an integration ontology.

The authors are definitely concerned about covering the standard classes in the ontology but, in my view, do not pay enough attention to characterising them (concepts, properties, executions and models) to the point where the ontology can be correctly used.

From section 3 (pg. 6) we know that tasks and sub-tasks, methods and pre/post-conditions are needed for the development of MAMBO. Tasks need goals to be modelled, how are they introduced in MAMBO? And pre/post-conditions?

Are properties of materials at different spatial scales somehow integrated, or are they just juxtaposed in the ontology? There is neither an explanation nor an example in the paper. The same holds for properties of processes.

The CQs at pg. 8 mix ontological questions and database queries. CQs should not be confused with the DB queries one might want to make. CQs should be about the structure and relationships that the ontology must have to be able to cover the domain of interest. In particular, they are not about data. There is an important difference between “what is an author of a simulation” and “who is/are the authors of a simulation”.
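
The difference between the two kinds of question can be made concrete with a toy triple store. This is a minimal sketch in plain Python under stated assumptions: the property `has_author`, the class `Person`, and the individuals `sim_001` and `alice` are hypothetical and are not taken from MAMBO.

```python
# A tiny in-memory triple set. All names below are illustrative assumptions.
TRIPLES = {
    # Schema level: what the ontology says about simulations and authors.
    ("Simulation", "rdf:type", "owl:Class"),
    ("Person", "rdf:type", "owl:Class"),
    ("has_author", "rdfs:domain", "Simulation"),
    ("has_author", "rdfs:range", "Person"),
    # Data level: facts about one particular simulation.
    ("sim_001", "rdf:type", "Simulation"),
    ("alice", "rdf:type", "Person"),
    ("sim_001", "has_author", "alice"),
}


def match(triples, s=None, p=None, o=None):
    """Return sorted triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def author_relations(triples):
    """Ontological question: how is a Simulation related to its authors?"""
    results = []
    for prop, _, _ in match(triples, p="rdfs:domain", o="Simulation"):
        for _, _, rng in match(triples, s=prop, p="rdfs:range"):
            results.append((prop, rng))
    return results


def authors_of(triples, simulation):
    """Data query: who are the authors of this particular simulation?"""
    return [o for _, _, o in match(triples, s=simulation, p="has_author")]


print(author_relations(TRIPLES))
print(authors_of(TRIPLES, "sim_001"))
```

The first function can be answered from the schema alone, even when no simulation has been recorded; the second needs instance data and returns nothing on an empty dataset.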

Fig. 1 is unclear and the text is not helping.
The issue is that a property in an experiment is a property of a physical object, the property of a simulation is a property of an information entity. These properties are quite different but this point is not addressed (see Sect. 5, bullet Structure).
Perhaps relations has_experiamental_input and has_computation_input are enough to deal with this difference but I cannot tell from the paper.
A similar question arises on Structure.

It remains unclear in Sect. 5 whether both basic and complex properties (the latter obtained from combinations of basic ones) are modelled. If so, the relationships between complex and basic properties in the ontology should be explained.

What are the members of ComputationalMethod and ExperimentalMethod? Procedure descriptions? A flat list of names without semantics? Are partial deviations from the ideal method modelled in the ontology?

Pg. 13 “tune accordingly”, explain how this can be done.

Figs. 4 and 5 need to be clarified. From the text it seems that there is no distinction between the files (documents in general) and the content of those files. The two levels should be distinguished. As of now, the paper talks of transformation in general terms: it can refer to the transformation of the files (adding structure, producing new files) or of the content, which makes it unclear what is being discussed.

In Fig. 4, the input files seem to be the instances of the MolecularSystem, not their content.

“Information on the tools for the manipulation of data structure…”: an example would help.

“In analogy with the input data,…” Ok if the members of the classes are documents, not clear if they are the actual structures since a virtual structure is a piece of information (and unique for all objects in a class), while a material structure is its realisation in a physical object (and each object has its own realisation).

After reading the paper, I still do not understand how (or in which sense) multiscale modeling is possible in the ontology.

Minor points:
Pg.2 “Many academic disciplines uses” (use)
Moreover, add references to this and the following sentence.

Pg. 6 “To deal with… [20, 56]” I find this hard to read unless one is already familiar with PSM

Pg. 11
“The class features general…” (I would not use the verb “features” in an ontology paper)
(bottom). “The MolecularAggregate and the Crystal classes, respectively.” (Respectively to what?)

Review #2
Anonymous submitted on 01/May/2023
Major Revision
Review Comment:

The manuscript is well written overall and the ontology is well motivated. The multiscale materials modeling concern of MAMBO is original and innovative. Since the ontology targets both materials computation and materials experiment, it provides a possible way to represent a workflow that integrates both computations and simulations. In addition, the development of this ontology involves a group of domain experts. These are positive aspects of this manuscript.

However, I think the current manuscript has several weaknesses that stand in the way of publication at SWJ as a “Full Paper” submission. These weaknesses are mainly related to the following aspects:
- Evaluation of the ontology based on CQs is missing (What would be the SPARQL queries for CQs?). The current evaluation of the ontology only provides a simulation example, and as such, the level of detail for ontology evaluation is limited. (How can the ontology be used to annotate an experiment workflow or a combined workflow of simulations and experiments?)
- Some statements are not supported by enough evidence, references, or examples.
- Some description of the ontology in the manuscript is not consistent with the actual implementation of the ontology. (e.g., modules of the ontology, Experiment versus Measurement, Computation versus Simulation)
- Detailed documentation of the ontology is missing (tools such as WIDOCO can generate the documentation); the GitHub repository of the ontology lacks introductory documentation (what are the CWL and tomls folders about?).

Therefore, my recommendation for this manuscript is Major Revision. My detailed comments are as follows:

For the Abstract:
- In several places in the paper, the authors state that MAMBO focuses on modularity. However, after reading the whole paper, I find that the authors do not explain what the modules are. In addition, after checking the ontology files, MAMBO does not seem to be organized in a way that supports modules.
- The authors present “[…] targeting the intersection between the molecular scale and higher scale domains.”. What are these higher scale domains, and what is the intersection? I think it’s important to let readers know what these domains are and why just the intersection of them is targeted.

For Section 1:
- The authors present “the vast amount of data typically produced by both computational and experimental approaches is often unstructured and uncorrelated […]”. I think this statement is not supported. For example, data provided by Materials Project, OQMD and NOMAD and many other databases in the OPTIMADE consortium is structured. In terms of “uncorrelated”, do you mean the data from computations and experiments respectively is not correlated?
- For the sentence “Many academic disciplines uses ontology to deal with complexity and to organize […]”. What is the complexity about?
- The authors state that “ontologies have a long history in supporting […] of experiments and activities, including complex workflows, processes and procedures”. This sentence needs more details, such as example domains and example ontologies. Also, the sentence is unsupported and lacks references.
- For “[…] including molecular materials, nanomaterials, […] and other related systems.”, what are these related systems? It is important to make MAMBO's scope precise so that people can understand when to reuse MAMBO.

For Section 2:
- The authors state that they borrow general design criteria and approaches from EMMO. However, there is no further description of what these borrowed design criteria and approaches are.
- For the sentence “MDO is structured upon Competency Questions [48], answered by specialists in materials science […]”. This seems to be an incorrect understanding of CQs or MDO. CQs are used to define the ontology's scope, such as what questions it needs to answer. They can be proposed by domain experts (specialists) and should be answered by the ontology when evaluating the ontology after development. Further in the description of MDO, the authors mention linking MAMBO to MDO. But after reading the whole paper and looking at the ontology files, the links are not presented or shown explicitly.
- The authors present that “DEB […] from the point of view of foundations and approach”. What are the foundations and approach? And why are they linked to MAMBO? The following sentence “Moreover, the complexity and the very heterogeneous nature of the literature […] similar to those addressed by MAMBO.”, needs to be elaborated. Does MAMBO intend to address problems related to the complexity and heterogeneous nature of literature?

For Section 3:
- The structure of Section 3 could be better organized. For instance, the long description before subsection 3.1 could be moved into a subsection of its own.
- In Section 3, the MAMBO approach and the MAMBO development approach are presented. What are these approaches?
- In Section 3.1, the authors present that “Ontologies can supporting the design and implementation of complex workflows, […] This step enables the realization of high-throughput […]”. But after reading the following sections and looking at the ontology files, I don’t see how a computation sequence can be represented. In Figure 4, there is a link from one ComputationalMethod to another ComputationalMethod while it is unclear what this link means.

For Section 4:
- For the 14 CQs presented in Section 4.2, after looking at the ontology structure presented in Section 5 and the ontology files, some questions seem unclear or cannot be answered by the ontology. For instance, CQ2 (what is a blend?); CQ3 (how is “homogeneous” defined in the ontology?); CQ6 (how can an interface be represented in the ontology?); CQ12 (what is materials characterization?); CQ13 and CQ14 (how are authors represented in the ontology?).
- In the mambo_core.owl file, there is the Measurement concept instead of Experiment. However, in the mambo_topology_base.owl file, there is the Experiment concept.

For Section 5:
- The authors state that the CartesianCoordinates class is imported from MDO. However, in the ontology file, it has a local MAMBO IRI, which means it is not actually imported from MDO. If a class or property is reused from another ontology, this should be done by either 1) reusing the IRI of that class or property in your ontology; or 2) using your own IRI but providing some “alignment information” in the annotations.
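
The two reuse options can be told apart with a simple check. This is a minimal sketch in plain Python: both namespace IRIs are placeholders (not the real MDO or MAMBO IRIs), and the choice of alignment predicates is an assumption about what would count as “alignment information”.

```python
# Placeholder namespaces for illustration only; they are NOT the real IRIs.
MDO = "https://example.org/mdo#"
MAMBO = "https://example.org/mambo#"

# Predicates assumed here to count as alignment information.
ALIGNMENT_PREDICATES = {"owl:equivalentClass", "rdfs:subClassOf", "skos:exactMatch"}


def reuse_status(class_iri, triples, external_ns):
    """Classify how a class relates to an external ontology namespace."""
    # Option 1: the external IRI itself is used.
    if class_iri.startswith(external_ns):
        return "reused IRI"
    # Option 2: a local IRI that is explicitly aligned to an external term.
    for s, p, o in triples:
        if s == class_iri and p in ALIGNMENT_PREDICATES and o.startswith(external_ns):
            return "aligned"
    # Neither: a local class with no formal link to the external ontology.
    return "unlinked"


local_class = MAMBO + "CartesianCoordinates"
external_class = MDO + "CartesianCoordinates"

no_links = set()
with_alignment = {(local_class, "owl:equivalentClass", external_class)}

print(reuse_status(external_class, no_links, MDO))
print(reuse_status(local_class, with_alignment, MDO))
print(reuse_status(local_class, no_links, MDO))
```

The situation described in this comment corresponds to the last case: a local IRI with no alignment triple pointing to MDO.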

For Section 6:
- The application of MAMBO presented in Section 6 seems to stay at the level of using MAMBO to annotate a simulation workflow informally, without populating MAMBO with the actual data from the workflow.

Minor comments/typos:
- Page 2: “Many academic disciplines uses […]” -> “Many academic disciplines use […]”
- Page 4: “we mutuated from EMMO […]” -> “we mutated from EMMO […]”
- Page 7: “Ontologies can supporting […]” -> “Ontologies can support […]”
- Indentation of some paragraphs

Review #3
Anonymous submitted on 10/May/2023
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The paper is well written and organised, the figures are clear, and the authors published their work on GitHub, which is a plus point. An ontology is a conceptualisation of a field. The vocabulary presented does cover essential, albeit well-known, parts of a simulation, and this is expected given the authors' reliance on domain experts (which is also a plus point). The conceptualisation of the relevant ideas is lacking, or in fact missing, and the logical basis for the categorisation is not based on any upper or even mid-level ontology; hence here too elementary components of an ontology are, by definition, missing. There is no common basis for the presented semantic network that would permit it to be significant in the sense needed by anyone except the specific application targeted by the authors. For this reason, it is recommended that the authors reconsider this work as part of a specific application, rather than as a standalone ontology.