Review Comment:
This paper introduces a novel framework and semantic portal named PARLIAMENT SAMPO, designed for the analysis of parliamentary speeches, culture, language, and activities in Finland. The authors employ Natural Language Processing (NLP) techniques for the automatic semantic annotation and thematic classification of speeches from the Parliament of Finland (PoF) dating back to 1907. The data and services have been used in Digital Humanities (DH) research projects and for creating applications, in particular the operational semantic portal PARLIAMENT SAMPO. The infrastructure and portal were launched online on February 14, 2023, under the open CC BY 4.0 license. PARLIAMENT SAMPO is the latest addition to the "Sampo" series, which comprises over 20 interconnected LOD services and semantic portals in Finland, rooted in a national Semantic Web infrastructure.
(1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided)
The tool is well motivated, uses a high-quality data source, and, I believe, offers a valuable use case for the deployment and application of Semantic Web technologies. The data are based on the minutes digitized by the Parliament of Finland. However, using these digital records is challenging for several reasons: they were created independently for various periods, are stored in disparate data formats, vary in quality, and lack descriptive metadata. PARLIAMENT SAMPO comprises two primary datasets, or knowledge graphs (KGs), that span the entire history of the Parliament of Finland (PoF) since its inception in 1907: the Prosopographical Knowledge Graph (P-KG) and the Speeches of Plenary Sessions. These are fairly well described.
Overall, I believe the work is novel as a system/tool and is detailed enough to be useful to other countries or groups looking to do something similar to advance digital humanities.
(2) Clarity, illustration, and readability of the describing paper
The paper is fairly readable, although it can be dense at times. However, I personally like the technical substance of the material presented. Including a figure that shows example records/data in the introduction might be useful in conveying how the motivating questions (enumerated in the introduction) can be answered. I encourage the authors to do so to maximize the impact of the work.
(3) Assessment of the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and whether the provided data artifacts are complete.
I examined the resource and could not find any problems with it. The repository seems well organized and meets the criteria of this track in the journal. I do suggest that the authors include a brief README file describing the purpose of the different files. Otherwise, it is well documented. The authors may describe the organization of the files elsewhere, but having a README file as an explicit part of the repo is useful.
I am recommending a minor revision to address the two comments above: adding an illustrative figure in the introduction to increase impact and provide better intuition of the tool, and adding an explicit README file to the repo.