Review Comment:
This manuscript was submitted as 'Data Description' and should be evaluated along the following dimensions: (1) Quality and stability of the dataset - evidence must be provided. (2) Usefulness of the dataset, which should be shown by corresponding third-party uses - evidence must be provided. (3) Clarity and completeness of the descriptions.
This paper presents the WarSampo knowledge graph, a shared semantic infrastructure, and its Linked Open Data service, which aims to publish data about the Second World War, with a special focus on Finnish military history. As is widely acknowledged, cultural heritage is a complex domain because of the heterogeneity of its contents and, for the same reason, it is one of the domains where Semantic Web technologies and Linked Data can bring significant benefits to Digital Humanities research.
This paper is largely built upon the authors' previous work in the domain. Even though it presents no significant new research results, I think this is a good use of existing work, since it provides, for the first time, a comprehensive and detailed overview of the WarSampo knowledge graph and its Linked Open Data service. The work describes an extensive dataset of 14 million triples, on which the WarSampo portal is based. The general approach seems appropriate, and the work fits well in the Linked Data Descriptions category of the journal.
The authors provide a clear description of the source datasets (in particular, I appreciated the organization and the level of detail of the information presented in Table 1), the event-based data model used for harmonizing the data, and the five-step data transformation process used to populate the model.
***Quality and stability of the dataset***
The quality of the data is certainly high, since most of the sources considered have been provided by national archives, institutions and associations. The stability of the datasets is also not in question: URIs seem to be stable and reliable, and the version history of the datasets is presented in Table 3. The dataset is released under the Creative Commons Attribution 4.0 (CC BY 4.0) license.
***Usefulness (or potential usefulness) of the dataset***
The WarSampo knowledge graph represents interconnected data about events that occurred during the Second World War, as well as information about the lives of the actors (e.g., soldiers) who participated in them. Digital humanities research can exploit the potential of this data for humanistic inquiry, but the general public may also be interested in finding information about, e.g., battles and soldiers' lives. The authors report that more than 500 000 end-users have accessed the datasets through the WarSampo portal. I would be curious to know how many of those end-users are domain experts, such as historians, who access the data for research purposes. In other words, it would be interesting to know in more detail how the data is used by interested parties.
***Clarity and completeness of the descriptions***
The whole paper is well written and easy to follow. The authors clearly describe the source datasets, the vocabularies and ontologies used in the data model, and the data transformation process for populating the data model. The main classes of the data model are described well, and two useful diagrams (Figures 1 and 2) show how these classes interact. However, an example with real individuals and properties would have been helpful.
A few minor comments and suggestions for improving the work are as follows:
- In the Abstract, it is reported that the WarSampo portal has had over 400 000 end-users since 2015, while in Section 1 this number is 550 000.
- I would suggest restructuring the content as follows: Section 1 should be the "Introduction" of the paper, while Section 2 should present the WarSampo initiative, including the few related works mentioned. The outline of the following sections should be moved to the end of the Introduction.
- Footnote 9: the linked web page could not be found.
- In Section 4, an explanation of the difference between domain ontologies and meta-datasets should be added, as you did in one of your previous works (reference [20]).
- The KG web page on the LDF platform is well organized and contains all the important information, but I found only a couple of SPARQL query examples. I suggest that the authors add a larger selection, since such examples help users become familiar with the dataset schemas and give a clearer idea of what can be achieved.
- In a previous version of the datasets, the FOAF vocabulary was used in your schema for modelling, e.g., family names and first names. It is not clear to me why you decided to remove it from the current version and use your own ontology instead.
- In Figure 2, the WarSampo core classes are presented. Why is crm:E52_Time-Span not included as a core class? Time should be a core class in an event-based model, since events are mainly characterized by time.