Publishing and Using Parliamentary Linked Data on the Semantic Web: ParliamentSampo System for Parliament of Finland

Tracking #: 3683-4897

Authors: 
Eero Hyvönen
Laura Sinikallio
Petri Leskinen
Senka Drobac
Rafael Leal
Matti La Mela
Jouni Tuominen
Henna Poikkimäki
Heikki Rantala

Responsible editor: 
Guest Editors KG Gen from Text 2023

Submission type: 
Tool/System Report
Abstract: 
This paper presents a new infrastructure and semantic portal called ParliamentSampo for studying parliamentary speeches, culture, language, and activities in Finland. For the first time, the entire time series of some million plenary speeches of the Parliament of Finland (PoF) since 1907 have been converted from text into knowledge graphs and data services in unified formats, including CSV, Parla-CLARIN, ParlaMint, and RDF Linked Open Data (LOD). The speech data have been interlinked with a semi-automatically created ontology and a knowledge graph about the activities of over 2800 Members of Parliament (MP) and other speakers in the plenary sessions of the PoF. The data were enriched by data linking to external data sources and by reasoning into a broader LOD service. Knowledge extraction techniques based on Natural Language Processing (NLP) were used for automatic semantic annotations and topical classification of the speeches. The data and data services have been used in Digital Humanities (DH) research projects and for application development, especially for developing the in-use semantic portal ParliamentSampo. The infrastructure and the portal were published on February 14th 2023 on the Web using the open CC BY 4.0 license, and quickly gathered thousands of users, including citizens, media, politicians, and researchers of politics. ParliamentSampo is a new member in the "Sampo" series of over 20 interlinked LOD services and semantic portals in Finland, based on a national Semantic Web infrastructure. Although the paper uses Finnish parliamentary data as a case study, the approach, methods, and tools presented can also be adapted to other parliamentary datasets in other countries.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
Anonymous submitted on 24/Jun/2024
Suggestion:
Accept
Review Comment:

(1) Quality, importance, and impact
This paper presents the ParliamentSampo System and discusses the process of transforming the national parliamentary corpora of the Parliament of Finland (PoF) into a "machine-understandable" data corpus.
The results are very important, as the dataset covers almost the whole history of the PoF since 1907. Data on nearly a million speeches and over 2800 parliamentarians have been created and published openly as harmonized, enriched open data with data services.

The established PARLIAMENTSAMPO portal (built upon the data and services) demonstrates how the data can be used for application development. This will have an impact both on research in the Linked Data field and on parliamentary research studies, as the approach is generic and can be reused in other countries.

(2) Clarity, illustration, and readability

The process of building the PARLIAMENTSAMPO Dataset is clear. Additionally, for more information, the authors have cited their previous work and provided links to open resources in Finland.

(3) “Long-term stable URL for resources”:
PARLIAMENTSAMPO enhances the national Semantic Web infrastructure (see https://www.ldf.fi/). The link to the ParliamentSampo Knowledge Graph is public (https://zenodo.org/records/7636420).

However, I had difficulty opening https://doi.org/10.5281/zenodo.7636419 (I received a 504 Gateway Time-out error). Please check the link or remove it from the journal paper.

Review #2
Anonymous submitted on 16/Aug/2024
Suggestion:
Accept
Review Comment:

My comments have been dealt with in this revision.

Review #3
Anonymous submitted on 02/Sep/2024
Suggestion:
Accept
Review Comment:

The article is well written and easy to read. It addresses a real and interesting problem: PARLIAMENTSAMPO is a tool that transforms parliamentary speeches into accessible knowledge graphs and data services. Some improvements have also been introduced with respect to the previous version.