Review Comment:
# SWJ 2023 - A systematic mapping study on combining conceptual modeling with semantic web
**Title:** A systematic mapping study on combining conceptual modeling with semantic web
**Authors:** Cordula Eggerth, Syed Juned Ali, and Dominik Bork
**Submission Type:** Survey Article
**Venue:** Semantic Web Journal
## Overview
The paper presents a systematic mapping study investigating publications on the intersection between conceptual modeling and semantic web. The paper is clear and well-structured, covering a substantial number of publications selected under sound inclusion and exclusion criteria. The classification of each publication considers several relevant dimensions (referred to as taxonomies), and this information is made available through a dedicated portal that presents all collected metadata. Overall, I judge this to be a relevant submission to this venue, and I congratulate the authors on their effort.
Still, I see several points that should be addressed before this submission can be accepted. All these points are listed below under [Major Comments](#major-comments), where I call special attention to the reproducibility of publication database queries, the consistency of presented data, the alignment between data and claims, and the availability of data through the portal. Note that some of the issues I list may indeed be oddities in the data rather than mistakes, but I would like the authors to check these points.
I look forward to the revised version of this submission.
The specific comments below are divided into major and minor, depending on whether they influenced my overall evaluation of the submission.
## Major Comments
- I did not understand the use of reference [4].
- Reference [7] seems to be a course whose contents I could not find online. If that's the case, do not use it as a reference. Refer instead to the seminal works you have used in the classroom.
- Please consider this reference in your related work: On the Philosophical Foundations of Conceptual Models; doi:10.3233/FAIA200002.
- The conclusion section is weak. It offers no meaningful reflection on the work done and no pointers to future work.
- I tried reproducing the query of Figure 1 as presented below. It returns over 28 000 hits, almost 7 000 of them in computer science. How did you arrive at 50? Please present queries in text or code format to ease reproducibility.
```
(survey OR systematic mapping study OR sms OR mapping study OR systematic mapping) AND (semantic web OR semantic systems OR knowledge graph OR linked data OR linked open data OR ontology OR rdf) OR (survey OR systematic mapping study OR sms OR mapping study OR systematic mapping) AND (conceptual model OR modeling language OR modelling language)
```
- Table 2 seems to be completely wrong, with duplicated titles and incorrect author/title matches. Examples:
  - Alloghani and Gacitua as the first authors of "The XML and Semantic Web"
  - Sabou as the author of "Semantic Web Services Testing" instead of "Semantic Web and Human Computation: The Status of an Emerging Field"
- Section 3.1 could benefit from an accompanying text highlighting the reasoning at least for the most relevant research questions. RQ5, for instance, reads in a confusing way.
- I have the impression that in Section 3.2 you want to describe the phases of your SMS, and then explain how you execute them, or what they mean in the context of your research. As currently written, it rephrases things you said earlier. Some ideas here could also have been used to better explain Section 3.1.
- "...considering title and abstract...": according to Figure 2, you also consider keywords.
- I believe that it would be relevant to include the reasoning behind the keywords chosen for the search query. Also, when I tried to reproduce the query of Figure 2 in Scopus, it returned ~1500 publications rather than ~2100. I don't see this as a serious issue because the breadth of the query is still clear. Nonetheless, I would like these queries to be easier to reproduce, especially outside Scopus. See below how I reproduced the query of Figure 2:
```
TITLE-ABS-KEY(
(
({conceptual modeling} OR {conceptual modelling} OR {metamodel} OR {meta-model} OR {metamodels} OR {meta-models} OR {domain specific language} OR {domain-specific language} OR {modeling formalism} OR {modelling formalism} OR {modelingformalisms} OR {modelling formalisms} OR {modeling tool} OR {modelling tool} OR {modeling tools} OR {modelling tools} OR {modeling language} OR {modelling language} OR {modeling languages} OR {modelling languages} OR {modeling method} OR {modellingmethod} OR {modeling methods} OR {modelling methods} OR {modeldriven} OR {model-driven} OR {mde})
AND
({knowledge graph} OR {knowledge graphs} OR {linked data} OR {linked-data} OR {semanticweb} OR {ontolog} OR {RDF} OR {OWL} OR {SPARQL} OR {SHACL} OR {semantic systems} OR {semantic system} OR {semantic technologies} OR {semantic technology} OR {RDFS} OR {protege} OR {SKOS} OR {simple knowledge organisation system} OR {JSON-LD} OR {rule interchange format} OR {semantic modeling} OR {semantic modelling} OR {linked open data} OR {vocabularies})
)
AND
(
LIMIT-TO (SUBJAREA, "COMP")
)
)
```
- The ACM Digital Library, for instance, encodes the search query in a URL (example below), so you could use a hyperlink to make your queries easily reproducible for those accessing your paper's PDF. I would not ask to have these huge URLs spelled out in the text, though.
```
https://dl.acm.org/action/doSearch?fillQuickSearch=false&target=advanced...
```
- There seem to be some important errors on https://me.big.tuwien.ac.at/cmsw
  - When searching for authors like "walter" and "buchmann", the interface returns an error:
```
AttributeError at /cmsw/search
'NoneType' object has no attribute 'startswith'
Request Method: POST
Request URL: http://me.big.tuwien.ac.at/cmsw/search
Django Version: 3.2.18
Exception Type: AttributeError
Exception Value:
'NoneType' object has no attribute 'startswith'
Exception Location: /apps/app/searchutils.py, line 146, in
Python Executable: /usr/local/bin/python
Python Version: 3.7.16
Python Path:
['/',
'/usr/local/lib/python37.zip',
'/usr/local/lib/python3.7',
'/usr/local/lib/python3.7/lib-dynload',
'/usr/local/lib/python3.7/site-packages']
Server time: Mon, 17 Apr 2023 14:07:01 +0000
```
  - Other authors return a seemingly low number of publications, like "guizzardi" returning 10.
  - The summary tables on the homepage, like "Detailed Analysis of Institute By Papers", do not seem to agree with the paper's numbers. For example, the University of Vienna has 21 publications according to the paper, and only 7 according to the website.
- I would like to have access to the working Web Knowledge Base to better understand the taxonomies listed in step 4 of Section 3.2. Most searches for publications I tried running returned errors, even "uml", which is used in the example of Figure 21.
- I believe that some sentences are problematic in terms of causation vs correlation. This excerpt is an example of what I mean: "This confirms that the field was growing but was not maturing until then. Since 2019, the number of conference papers published has come down to a level similar to the number of journal articles, which indicates that the research in the field is starting to mature in recent years (see Fig. 5)."
  - First, I believe that more rigorous analysis is called for to support conclusions about the maturity of the research in the field.
  - Second, I find it difficult to dissociate the drop in conference papers in 2019 from the global pandemic. The dataset doesn't consider how conferences have been organized in those years, nor how many submissions they received overall.
- The information in Tables 3 and 4 (they are actually three tables) seems strange. If these tables are indeed correct and there is no incomplete information in the spreadsheets, there are very long tails in the distributions of publications per institution and venue. Is it possible for the authors to share their raw spreadsheets? Consider the following:
  - Total publications analyzed: 484
  - Top 10 institutions summed up (including overlaps, I assume): 23+21+14+10+8+8+7+7+6+6=110
  - Minimum number of other institutions, given that an unlisted institution can have at most 6 publications: (484-110)/6≈62.3, i.e., at least 63
  - Top 10 conferences summed up: 17+6+5+5+5+4+4+3+3+3+3+3=61
  - Top 10 journals summed up: 6+5+5+4+4+4+4+3+3+3+3=44
  - Minimum number of other venues (conferences and journals), given that an unlisted venue can have at most 3 publications: (484-61-44)/3≈126.3, i.e., at least 127
  - These numbers mean that there are at least 63 other institutions and 127 other venues (likely many more) beyond those listed in the tables.
  - Bear in mind that several papers have co-authors from different institutions, which increases these numbers. Also, the same institution being listed under the names of its research groups or under different spellings may be affecting the data.
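For convenience, the lower-bound estimate above can be reproduced with the short sketch below. It uses only the counts quoted in this comment and makes the same simplifying assumption I made, namely that each publication is attributed to a single institution and a single venue; the function name `min_unlisted` is mine.

```python
import math

TOTAL_PUBLICATIONS = 484

# Counts copied from the bullets above (Tables 3 and 4 of the submission).
top_institutions = [23, 21, 14, 10, 8, 8, 7, 7, 6, 6]
top_conferences = [17, 6, 5, 5, 5, 4, 4, 3, 3, 3, 3, 3]
top_journals = [6, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3]


def min_unlisted(remaining: int, max_per_entry: int) -> int:
    """Fewest unlisted entries needed to absorb `remaining` publications
    when no unlisted entry can hold more than `max_per_entry` of them."""
    return math.ceil(remaining / max_per_entry)


# Assumption: an unlisted entry cannot exceed the smallest listed count.
institutions_bound = min_unlisted(
    TOTAL_PUBLICATIONS - sum(top_institutions), min(top_institutions)
)
venues_bound = min_unlisted(
    TOTAL_PUBLICATIONS - sum(top_conferences) - sum(top_journals),
    min(min(top_conferences), min(top_journals)),
)

print(institutions_bound, venues_bound)  # 63 127
```

If the raw spreadsheets are shared, the same check can be run against the actual per-institution and per-venue counts instead of these bounds.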
- Regarding plots like the one in Fig. 6, it is hard to draw any conclusions about the trends in the data. If you don't compare the rate of growth of each line with the overall rate of growth in the field, the former could simply be following the latter.
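One possible way to make this comparison, sketched below, is to plot each category's share of the yearly total instead of its absolute count. The numbers are hypothetical and chosen only for illustration; they are not taken from the submission, and `yearly_shares` is a name I made up.

```python
from typing import Dict, List


def yearly_shares(
    counts: Dict[str, List[int]], totals: List[int]
) -> Dict[str, List[float]]:
    """Divide each category's yearly count by the overall yearly count."""
    return {
        category: [c / t if t else 0.0 for c, t in zip(series, totals)]
        for category, series in counts.items()
    }


# Hypothetical data: overall publications per year and two categories.
totals = [10, 20, 40]
counts = {
    "Category A": [2, 4, 8],   # quadruples in absolute terms, like the field
    "Category B": [1, 4, 16],  # grows faster than the field
}

shares = yearly_shares(counts, totals)
print(shares["Category A"])  # [0.2, 0.2, 0.2]
print(shares["Category B"])  # [0.1, 0.2, 0.4]
```

In this toy example both categories grow in absolute numbers, but only Category B grows relative to the field, which is the distinction the current plots do not show.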
- Take this excerpt now: "In the mid-2000s, all categories started from a low level, while the number of publications on Linked Data and Vocabularies increased considerably after 2011, the number of publications on Inference and Queries achieved merely a slightly higher level in this time period." I think this is an interesting finding in the SMS deserving of a deeper reflection. Is it the case that queries and inferences are not seen as relevant subjects in the intersection between CM and SW by the community? Or is it the case that authors simply view queries and inferences as a byproduct of expressing their models in the SW world? The information in the taxonomy of Fig. 10 can help in this analysis.
- Again, in Fig. 14 it seems hard to disassociate the change in numbers from the overall growth in the number of publications. Also, some combinations simply have a sample size too small to support any meaningful conclusions.
- Take the following excerpt: "Overtime notably the combination of methods with representation have grown considerably, as well as in general all of the largest combinations mentioned above. However, the combinations of taxonomy elements in the lower left corner exhibited a significant decrease over time." Fig. 15 does not support this analysis over time.
- Please consider carefully how significant the differences between bubble sizes are when drawing your conclusions. In Fig. 16 I cannot see a significant difference between BPMN and OCL, ER, or AML. I am not saying that there isn't one, but the plot doesn't make this clear.
- Is it the case that some combinations are especially interesting, or do they simply involve more popular elements? For instance, in Fig. 17, it seems that any combination with UML will be heavily influenced by the overall number of publications involving UML.
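One way to separate "interesting" from "merely popular" combinations, offered only as a suggestion, is to compare the observed co-occurrence count with the count expected if the two elements were independent (often called lift). The sketch below uses hypothetical counts that are not taken from the submission.

```python
def lift(n_both: int, n_a: int, n_b: int, n_total: int) -> float:
    """Observed co-occurrences divided by the number expected
    if elements A and B appeared independently of each other."""
    expected = n_a * n_b / n_total
    return n_both / expected


# Hypothetical counts out of 200 publications.
# A popular pair: large bubble, but exactly what popularity alone predicts.
popular_pair = lift(n_both=30, n_a=100, n_b=60, n_total=200)
# A rarer pair: small bubble, but four times more frequent than expected.
rare_pair = lift(n_both=10, n_a=20, n_b=25, n_total=200)

print(popular_pair, rare_pair)  # 1.0 4.0
```

A value close to 1 suggests that the combination is explained by the popularity of its elements, while values well above 1 point to combinations that may deserve discussion.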
- Could you please check whether the affiliations of the most prominent authors (e.g., D. Gasevic) are listed correctly in Table 3? Something seems strange. I am also surprised that there is no cluster around Guizzardi in Fig. 19 given his involvement with two top institutions of Table 3 and the number of publications involving OntoUML.
- As of April 19, some features of the website are back online (e.g., searching for "uml" works now). However, the tables remain inconsistent with the search itself. For example, there are 2 publications associated with "guizzardi" according to the list of authors, but the search returns 10. Still, please consider this return of features when reading the comments above.
- I really appreciate the authors' effort to provide an interface for users to crawl their data. However, I would invite them to go one step further and embrace the semantic web approach by publishing the metadata as linked data. Doing so while adopting the FAIR principles would be even better.
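As a rough illustration of what I have in mind, the sketch below serializes one publication record as JSON-LD using schema.org terms (`ScholarlyArticle`, `author`, `datePublished`, `keywords`). The record is hypothetical, the helper `to_jsonld` is my own, and the choice of vocabulary is an assumption; the authors may well prefer a bibliographic ontology better aligned with their taxonomies.

```python
import json
from typing import Dict, List


def to_jsonld(title: str, authors: List[str], year: int, keywords: List[str]) -> Dict:
    """Build a JSON-LD description of one publication using schema.org terms."""
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "name": title,
        "author": [{"@type": "Person", "name": name} for name in authors],
        "datePublished": str(year),
        "keywords": keywords,
    }


# Hypothetical record, for illustration only.
record = to_jsonld(
    title="An example publication title",
    authors=["A. Author", "B. Author"],
    year=2020,
    keywords=["UML", "OWL"],
)
print(json.dumps(record, indent=2))
```

Publishing such records with persistent identifiers and an explicit license would go a long way towards the FAIR principles.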
- In Table 3, the University of Valencia is not in China.
- In Table 4, it is curious that there are as many publications in the SW/CM intersection in the IEEE Aerospace Conference as in the International Semantic Web Conference. I am not saying that it is wrong, but I wonder if CM authors avoid/cannot publish at ISWC.
- Regarding the arguments about the field's maturity, I wonder how much of it relates to how the community is organized as well, i.e., how prevalent conference papers are. Also, when compared to other fields, conference papers in the CM and SW fields are subject to quite a rigorous peer review that evaluates the submission in its entirety, rather than just an abstract, for example.
## Minor Comments
- From the beginning, you tie conceptual models to information system design but leave the "semantic web" free of that constraint. I invite you to think of both in the same manner, as this idea may justify the unintended presence of design concerns in conceptual models.
- "was was"
- "When Sandkuhl et al. (2018) conceived...": I guess this reference is not compatible with the brackets style. Shouldn't it be "Sandkuhl et al. [13]"? This seems correct later in the conclusion; "... from Petersen [23] and Kitchenham [24]."
- I would have liked to see more about the research goals in the introduction. The intro does a good job on the motivation side, but there is very little on "what we are going to do" or "what we want to achieve" at that point.
- There is a different header in Table 1, "Topic".
- If you want to adopt the abbreviations SW and CM, it would be best to go all-in instead of using them alongside the full terms.
- In the text, spell logical operators using capital letters (e.g., "AND") the same way you spell them in your queries.
- In Figure 2, there are a few entries that could have been misspelled (e.g., missing spaces between words). This is not a major issue, but still a consistency issue.
- "One publication was assigned to only one Semantic Web activity area." Do you mean "each publication was assigned"? Also, I would replace the earlier reference to "W3C activity area" with "Semantic Web activity area" to avoid confusion.
- Please eliminate the names with vertical alignment in the x-axis, as they are unnecessary and very space-demanding (e.g., Fig. 10).
- Consider re-ordering the taxonomies presented in Section 3 according to the order they are used in Section 4.
- Please increase the resolution of all figures, with special attention to Fig. 14, which is not readable even on a computer.
- Please avoid cropped plots. Fig. 16 is a major example of this.
- I noticed a few grammatical mistakes in the paper, but these can be corrected with the support of automated tooling.