Using Natural Language Generation to Bootstrap Missing Wikipedia Articles: A Human-centric Perspective

Tracking #: 2586-3800

Authors: 
Lucie-Aimée Kaffee
Pavlos Vougiouklis
Elena Simperl

Responsible editor: 
Philipp Cimiano

Submission type: 
Full Paper
Abstract: 
Nowadays, natural language generation (NLG) is used in everything from news reporting and chatbots to social media management. Recent advances in machine learning have made it possible to train NLG systems that seek to achieve human-level performance in text writing and summarisation. In this paper, we propose such a system in the context of Wikipedia and evaluate it with Wikipedia readers and editors. Our solution builds upon the ArticlePlaceholder, a tool used in 14 under-resourced Wikipedia language versions, which displays structured data from the Wikidata knowledge base on empty Wikipedia pages. We train a neural network to generate an introductory sentence from the Wikidata triples shown by the ArticlePlaceholder, and explore how Wikipedia users engage with it. The evaluation, which includes an automatic, a judgement-based, and a task-based component, shows that the summary sentences score well in terms of perceived fluency and appropriateness for Wikipedia, and can help editors bootstrap new articles. It also hints at several potential implications of using NLG solutions in Wikipedia at large, including content quality, trust in technology, and algorithmic transparency.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By John Bateman submitted on 23/Dec/2020
Suggestion:
Minor Revision
Review Comment:

This revision has done a very good job of meeting the original review comments and is now a very strong paper. Perhaps its main contribution, apart from the very practical one of moving towards a more linguistically and culturally balanced Wikipedia, is the very extensive and well explained evaluation stages. For the actual range of material generated automatically, this degree of extensive evaluation might seem a little like overkill, but its strength is that this procedure can then naturally extend when the material generated itself becomes more substantial, and indeed to related generation tasks in this context. The paper is well written and it is now very clear what was happening at each stage and why each design decision was taken. I have just a few extremely minor corrections/typos and suggestions for final improvements, which I list below.

p. 1, l. 21: something odd here, "a 'introductory" --> "an introductory"

p. 2 left:
p. 2, l. 23: "produce one summary sentence" --> "produce a single summary sentence"
p. 2, l. 29: "as it is the case" --> "as is the case"
p. 2, l. 40: "to generate a short Wikipedia-style summary" --> "to generate a Wikipedia-style summary sentence"

p. 2 right:
p. 2, l. 26: "the text is fluent" --> "the sentence is fluent"
p. 2, l. 36: "the summaries generated" --> "the summary sentences generated"

p. 4:

The point here of using the generated versions as a way of keeping a more 'local' expression that is not just a translation of the English is important and interesting. This then adds a further potential consideration for the review of previous relevant text generation work, particularly as given in §2.2. In addition to the work on producing text from triples and similar, there is also the earlier work on multilingual natural language generation, which was sometimes argued for as an alternative to translation precisely because one would then be able to generate text fully within a target culture's norms and not simply as a translation. Examples of such work and discussion include:

Kruijff, G.-J.; Teich, E.; Bateman, J. A.; Kruijff-Korbayová, I.; Skoumalová, H.; Sharoff, S.; Sokolova, L.; Hartley, T.; Staykova, K. & Hana, J. A multilingual system for text generation in three Slavic languages. Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), 2000, 474-480.

Bateman, J. A.; Matthiessen, C. M. I. M. & Zeng, L. Multilingual natural language generation for multilingual software: a functional linguistic approach. Applied Artificial Intelligence, 1999, 13, 607-639.

Hartley, A. & Paris, C. Multilingual document production: from support for translating to support for authoring. Machine Translation, 1997, 12, 109-129.

and going all the way back to:

Kittredge, R.; Polguère, A. & Goldberg, E. Synthesizing weather reports from formatted data. Proceedings of the 11th International Conference on Computational Linguistics, International Committee on Computational Linguistics, 1986, 563-565.

Some of the issues here may become more relevant still when generation moves beyond the single-sentence stage.

p. 21, right, l. 39: "the generated summary" --> "the generated summary sentence"

Review #2
By Leo Wanner submitted on 26/Dec/2020
Suggestion:
Minor Revision
Review Comment:

The authors incorporated most of the comments into the revised version of the submission; the paper now reads much better and also better reflects the authors' acquaintance with the state of the art in NLG, with one exception: the authors should still justify why they do not adapt a state-of-the-art neural RDF-to-text model as one of the baselines.

Review #3
By Denny Vrandecic submitted on 17/Jan/2021
Suggestion:
Accept
Review Comment:

I refer to the review of the previous version of this paper. The major issues and many minor ones have been either considerably or entirely resolved. Based on that, I suggest accepting the paper for publication.