Publishing and Using Parliamentary Linked Data on the Semantic Web: ParliamentSampo System for Parliament of Finland

Tracking #: 3605-4819

Authors: 
Eero Hyvonen
Laura Sinikallio
Petri Leskinen
Senka Drobac
Rafael Leal
Matti La Mela
Jouni Tuominen
Henna Poikkimäki
Heikki Rantala

Responsible editor: 
Guest Editors KG Gen from Text 2023

Submission type: 
Tool/System Report
Abstract: 
This paper presents a new infrastructure and semantic portal called ParliamentSampo for studying parliamentary speeches, culture, language, and activities in Finland. For the first time, the entire time series of some million plenary speeches of the Parliament of Finland (PoF) since 1907 has been converted from text into knowledge graphs and data services in unified formats, including CSV, Parla-CLARIN, ParlaMint, and RDF Linked Open Data (LOD). The speech data have been interlinked with a semi-automatically created ontology and a knowledge graph about the activities of over 2800 Members of Parliament (MP) and other speakers in the plenary sessions of the PoF. The data was enriched by data linking to external data sources and by reasoning into a broader LOD service. Knowledge extraction techniques based on Natural Language Processing (NLP) were used for automatic semantic annotations and topical classification of the speeches. The data and data services have been used in Digital Humanities (DH) research projects and for application development, especially for developing the in-use semantic portal ParliamentSampo. The infrastructure and the portal were published on February 14th 2023 on the Web using the open CC BY 4.0 license, and quickly gathered thousands of users, including citizens, media, politicians, and researchers of politics. ParliamentSampo is a new member in the "Sampo" series of over 20 interlinked LOD services and semantic portals in Finland, based on a national Semantic Web infrastructure.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 14/Jan/2024
Suggestion:
Minor Revision
Review Comment:

The paper describes a mature project for studying the parliamentary culture, language, and activities of politicians in Finland by publishing entire time series of some million plenary speeches of the Parliament of Finland (PoF) since 1907 as Linked Open Data and data services. After a good motivation already pointing out some examples for analysis with the published data, it describes the whole process of data preparation and publishing in detail. It provides insights in this process that might be useful for other projects as well. The authors took much effort to show how to use the results of their project such as exporting the data for external use, querying the endpoint and studying results, data-analysis by scripting and using the ParliamentSampo portal supporting faceted search based on ontologies and analysis with the help of seamlessly integrated visualization and data analysis tools.

The only disadvantage seems to be that, while the work is likely to be most interesting to Finnish citizens, it may be of less interest in an international context. A worthwhile extension, perhaps as future work, would be to cover speeches in international parliaments and institutions.

Large parts of the paper (in the introduction, but also in other sections) are the same as in the TEXT2KG version of the paper (https://ceur-ws.org/Vol-3447/Text2KG_Paper_1.pdf). The authors should revise the paper to reduce the overlap between the two versions.

Comments regarding the review dimensions:

(1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided).

According to the authors, the data, system, resources and portal have already been used by thousands of users, so the work clearly has a large impact, at least in Finland. It also demonstrates the research possibilities and use cases for parliamentary linked data, and is hence a pioneer that other countries can follow, drawing on the experience gained in Finland to publish their own political debates.

(2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess
(A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data,
(B) whether the provided resources appear to be complete for replication of experiments, and if not, why,
(C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability.

The data is published on Zenodo, with its advantages such as long-term availability and access via DOI. The published data also follows the LOD principles, which is another plus. The project page of the Semantic Computing Research Group (SeCo) is in Finnish, and I did not see a way to access an English version of the project webpage. This excludes international users, which is a pity, and I recommend setting up an English version of the project webpage as well.

The available data and how to import and use it seem to be clear, as files are provided in CSV and XML formats (although I did not check completely because of the volume of data).

(4) whether the provided data artifacts are complete.

Seems to be, but I did not check completely.

The data is cut off in 2022. An update to include current speeches (from 2023) and continuous updating of the data are recommended.

Review #2
Anonymous submitted on 09/Feb/2024
Suggestion:
Accept
Review Comment:

PARLIAMENTSAMPO demonstrates good quality by transforming parliamentary speeches into accessible knowledge graphs and data services. The evidence of its utility and applicability is reflected in its use by several groups of users, including citizens, media, politicians, and political researchers. The significance of this work lies in its ability to facilitate access to and analysis of historical parliamentary speeches, thereby enriching the field of Digital Humanities and promoting greater democratic transparency.

This paper exemplifies good-quality research and development work in the field of Digital Humanities by effectively leveraging parliamentary speeches, which are traditionally textual and static data sources, and transforming them into dynamic, accessible knowledge graphs and data services. This transformation not only enhances the accessibility of historical parliamentary data but also facilitates complex analyses and visualizations that were previously challenging or impossible to conduct. The utility and applicability of PARLIAMENTSAMPO are further underscored by its widespread adoption across a diverse user base.

The paper excels in articulating the functionalities, potential uses, and inherent constraints of PARLIAMENTSAMPO. The resources provided by PARLIAMENTSAMPO are comprehensive, reflecting a deliberate effort to ensure that they can support the replication of experiments and facilitate rigorous academic inquiry.

The paper not only presents PARLIAMENTSAMPO as a pioneering tool in the transformation and utilization of parliamentary data but also as a catalyst for new research methodologies in the Digital Humanities and related fields. The system's development, characterized by innovative data processing techniques and user-centric design, sets a benchmark for similar initiatives aiming to unlock the potential of historical and political data. Through this work, the authors contribute significantly to the advancement of knowledge discovery and dissemination in the digital age, paving the way for future explorations and applications in the rich domain of parliamentary records.

PARLIAMENTSAMPO has been utilized in Digital Humanities research projects and application development, and its resources are comprehensive and structured in a way that supports replication of experiments.

Review #3
Anonymous submitted on 02/Mar/2024
Suggestion:
Minor Revision
Review Comment:

This paper introduces a novel framework and semantic portal named PARLIAMENT SAMPO, designed for the analysis of parliamentary speeches, culture, language, and activities in Finland. Natural Language Processing (NLP) techniques were employed by the authors for the automatic semantic annotation and thematic classification of the speeches from the Parliament of Finland (PoF) dating back to 1907. The data and services have been utilized in Digital Humanities (DH) research projects and for creating applications, particularly for the operational semantic portal, PARLIAMENT SAMPO. This infrastructure and portal were launched on February 14, 2023, on the internet under the open CC BY 4.0 license. PARLIAMENT SAMPO is the latest addition to the "Sampo" series, comprising over 20 interconnected LOD services and semantic portals in Finland, rooted in a national Semantic Web infrastructure.

(1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided)

The tool is well motivated, uses a high-quality data source, and I believe offers a valuable use-case in the deployment and application of semantic web technologies. The data itself is based on the Parliament of Finland digitizing its minutes. However, utilizing these digital records is challenging due to several issues: they have been created independently for various periods, stored in disparate data formats, vary in terms of quality, and are missing descriptive metadata. PARLIAMENT SAMPO comprises two primary datasets, or knowledge graphs (KG), that span the entire history of the Parliament of Finland (PoF) since its inception in 1907 (Prosopographical Knowledge Graph or P-KG, and Speeches of Plenary Sessions). These are fairly well described.

Overall, I believe the work is novel as a system/tool and is detailed enough to be useful to other countries or groups looking to do something similar to advance digital humanities.

(2) Clarity, illustration, and readability of the describing paper

The paper is fairly readable, although it can be dense at times. However, I personally like the technical substantiveness of the material that has been presented. Illustrating an example of some records/data with a figure in the introduction itself might be useful in conveying how the motivating questions (enumerated in the introduction) can be answered. I encourage the authors to do so to maximize the impact of the work.

(3) Assessment of the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and whether the provided data artifacts are complete.

I went to the resource and could not find any problems with it. The repository seems well organized and meets the criteria of this track in the journal. I do suggest that the authors include a brief README file describing the purpose of the different files. Otherwise, it is well documented. The authors may describe the organization of the files somewhere, but having a README file as an explicit part of the repository is useful.

I am recommending a minor revision to address the two comments above (adding some kind of figurative illustration in the introduction to increase impact and provide better intuition of the tool, and to add an explicit README file in the repo).