Reducing the Underrepresentation of Transnational Writers through Biographical Event Extraction

Tracking #: 3556-4770

Authors: 
Marco Stranisci
Viviana Patti
Rossana Damiano

Responsible editor: 
Guest Editors Wikidata 2022

Submission type: 
Full Paper
Abstract: 
Wikidata represents an important source of literary knowledge, collaboratively created and curated by a large community of users. In this archive, it is possible to find hundreds of thousands of pages about writers and their works. However, Wikidata is affected by the underrepresentation of Transnational authors, as recently demonstrated in several studies. In this paper we present an approach for the augmentation of structured knowledge about Transnational writers by automatically extracting biographical information from Wikipedia. The approach is based on four distinct modules: Coreference Resolution, Event Detection, Named Entity Recognition, and Entity Linking. The modules are combined through Lexico-Semantic Patterns, which map the extracted knowledge onto the Wikidata semantic model. Results show that our approach dramatically increases the number of biographical triples on Wikidata for Transnational writers. Such enhanced knowledge fosters the discovery of these writers both by the general public, who can discover lesser-known works through fairer Recommendation Systems, and by researchers, who gain access to more complete sources of structured information about them.
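As an illustration of how the four modules fit together, here is a minimal sketch of the pipeline described in the abstract. Every component below is a toy stand-in (the paper's actual modules are neural models; the function names, the writer QID, and the pattern rule are hypothetical), so the sketch shows only the overall flow from raw Wikipedia text to Wikidata-style triples.

```python
# Minimal sketch of the four-module pipeline named in the abstract.
# All components are toy stand-ins (the actual models are neural);
# function names, the writer QID, and the pattern rule are hypothetical.

from dataclasses import dataclass

@dataclass
class Triple:
    subject: str    # the writer's Wikidata QID
    predicate: str  # a Wikidata property, e.g. "P69" (educated at)
    obj: str        # QID of the linked entity

def resolve_coreference(text: str) -> str:
    # Placeholder: a real module would replace "She"/"He" with the writer's name.
    return text

def detect_events(text: str) -> list[str]:
    # Toy event detection: keep sentences containing a biographical trigger.
    return [s for s in text.split(".") if "graduated" in s]

def recognize_entities(sentence: str) -> list[str]:
    # Toy NER: treat capitalised tokens as candidate entity mentions.
    return [tok for tok in sentence.split() if tok.istitle()]

def link_entity(mention: str) -> str | None:
    # Toy entity linking via a gazetteer; "Q13371" is Harvard University.
    return {"Harvard": "Q13371"}.get(mention)

def apply_pattern(sentence: str) -> str | None:
    # Toy lexico-semantic pattern: a graduation event maps to P69.
    return "P69" if "graduated" in sentence else None

def extract_triples(text: str, writer_qid: str) -> list[Triple]:
    triples = []
    for sentence in detect_events(resolve_coreference(text)):
        prop = apply_pattern(sentence)
        for mention in recognize_entities(sentence):
            qid = link_entity(mention)
            if qid and prop:
                triples.append(Triple(writer_qid, prop, qid))
    return triples

# Hypothetical writer QID, for illustration only.
print(extract_triples("She graduated from Harvard University", "Q_WRITER"))
```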
Tags: 
Reviewed

Decision/Status: 
Reject (Two Strikes)

Solicited Reviews:
Review #1
Anonymous submitted on 04/Jan/2024
Suggestion:
Major Revision
Review Comment:

The paper presents a pipeline methodology to extract a number of specific biographical events related to literary authors from Wikipedia.
* While the main aim of the paper seems to be reducing the underrepresentation of “transnational” authors on Wikidata, the proposed method does not have any inherent characteristics that make it suitable for this specific task. As such, it could be applied to unstructured data about any type of author (“transnational” or otherwise).
* To my mind, the framing of the proposed approach as an underrepresentation mitigation work is problematic. The paper describes a biographical event extraction pipeline, tested on a number of author-related triples. The fact that running this system yields new triples for authors whose Wikidata entries are incomplete is an expected outcome of the process, not an underrepresentation mitigation methodology.
* While the new version includes some more details on the problem definition, most of my initial comments about the definition of the task still stand. Specifically,
* while the definition of "transnational" was made a bit clearer in this version, I think that it is ill-defined, both theoretically (what is the purpose of this distinction and who makes the cut of “transnationalism”?) and technically (there is no list of the "countries of birth" and "ethnic minorities in a Western country").
* while the motivation comment on discovery by tools and scholars stands, it is a general comment about incomplete knowledge graphs. Underrepresentation (i.e. not having a Wikipedia page or many raw text sources easily available/in English) is not tackled by an automatic event extraction system.
* the authors do not discuss their positionality and how the definition of "transnational" they came up with emerges from their own understandings.
* underrepresentation is not discussed at length: the fact that a number of authors might not have had formal education, or might never have received (or been nominated for) any well-known and well-represented awards, means that extracting such events actually perpetuates the issue that the proposed method sets out to address.
* Moreover, the proposed approach includes a number of experimental, technical, and methodological shortcomings. Specifically:
* section 4.1.1
* the difference between this step (Entity Detection) and the one described in section 4.2 (NER) is not clear.
* it would be interesting to see the performance of the same models without the extra training instances, since a training set of size 5 is likely too small to meaningfully change the performance of a large model. Combined with the fact that fine-tuning is run for 30 epochs, the model may well be overfitted to the small number of training examples. It would also be worth testing the model with more examples to gauge its generalisation ability.
* section 4.1.2
* is there a reason that you chose to create even splits for training/eval/testing?
* is there a reason why you did not combine all the training slices in one training set and train a general model?
* information about the event detection model is missing: type of model, experimental setup, values of hyperparameters.
* it would be great to include more details about data selection in Section 4.2, so that the proposed approach is reproducible. Specifically:
* total size and distribution of the training set: did you select some sentences from OntoNotes and MultiNERD? If so, how did you select them?
* NER model information is missing: type of model, experimental setup, values of hyperparameters.
* section 4.3
* what is the cosine similarity calculated on? Embeddings of the mention and the Wikipedia title? If so, what types of embeddings are used, and why?
* since all scores are surface scores (string and/or embedding similarity between the mention and the Wikipedia page title), how do you deal with aliases (e.g. “the king” for King Charles III), partial mentions (e.g. “king charles” for “Charles III”), and disambiguation pages? (See the sketch after this list.) There are a number of EL systems that take the whole Wikipedia page into account in an effort to mitigate some of those problems.
* section 5.1
* error analysis is very useful, especially for pipeline systems. However, in the analysis there seems to be a conflation between NER and entity linking: e.g. wrongly linking “Senate” to the “Senate of the United States of America” is a linking error, not a recognition error.
* Technical details are a crucial part of the description of a model, as they enable the reader to reproduce the described work. Mentioning “entity coreference resolution and event detection, based on finetuning a DistilBert-based Language Model (LM) on different combinations of documents from datasets that are reusable for this task” is not a sufficient technical description of a scientific experiment.
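To make the alias and partial-mention concern under section 4.3 concrete, here is a minimal sketch of the two surface scores in question, computed between a mention string and a candidate page title. The embedding model is an assumption for illustration (the paper does not name one); the point is that similarity against the title alone, whether string-based or embedding-based, scores aliases poorly, which is why context-aware EL systems encode the candidate's page content or description instead.

```python
# Minimal sketch of surface-level linking scores, for illustration only.
# The embedding model is an assumption; the paper does not name one.

from difflib import SequenceMatcher
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def surface_scores(mention: str, candidate_title: str) -> tuple[float, float]:
    """String similarity and embedding cosine similarity between a
    mention and a Wikipedia page title (no page content, no context)."""
    string_sim = SequenceMatcher(None, mention.lower(), candidate_title.lower()).ratio()
    emb = model.encode([mention, candidate_title], convert_to_tensor=True)
    cosine_sim = util.cos_sim(emb[0], emb[1]).item()
    return string_sim, cosine_sim

# An alias scores poorly against the canonical title on both measures,
# even though the referent is the same entity:
print(surface_scores("the king", "Charles III"))      # low on both scores
print(surface_scores("king charles", "Charles III"))  # partial match only
```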

Review #2
By Daniele Metilli submitted on 04/Feb/2024
Suggestion:
Major Revision
Review Comment:

Compared to the previous version of the paper, the authors have made significant improvements, addressing several issues that I had identified with regard to the methodology and evaluation. They have also addressed some technical shortcomings by implementing a much-needed entity linking step. They have redone the evaluation and entirely rewritten the corresponding section. Finally, they have also graciously accepted some of my suggestions as future work. I appreciate all these improvements and the significant effort that the authors have put into this new version of the paper, which sufficiently addresses most of the concerns about technical issues that I had listed in my previous review.

However, I still have serious concerns about the motivation and framing of the paper. The authors have built a system for generating Wikidata triple statements from Wikipedia text that is general and applicable to any biographical entity, yet they present its purpose in a misleading way, i.e., as a way to augment only a specific subset of under-represented biographical entities. The previous version of the paper clearly showed that if the tool is applied uniformly to all biographical entities, it does not address under-representation, because the whole set of biographical entities is improved, and the ones that are already over-represented are likely to be improved even more.

When the reviewers pointed out this issue, the authors’ solution was to hide the issue from the readers. They completely removed the over-represented entities and focused only on the under-represented ones, measuring how much they can be augmented. This choice seems highly questionable, effectively bending the results to preserve a flawed framing. This makes the paper worse, not better.

The other important issue that has not been adequately addressed is the definition of "Transnational", a novel invention by the authors which both Reviewer 1 and I had criticised. The authors attempted to improve the definition by looking at the ethnicities of all writers in the corpus and then keeping only those who originate from certain former colonies or are part of ethnic minorities in Western countries. It is true that, compared to the previous version of the paper, the definition has now been made clearer. However, the term “Transnational” is still confusing and not supported by the literature, or even by its dictionary definition ("extending or operating across national boundaries").

I find this approach very problematic and question whether it is at all useful or needed to classify people in this way, especially considering that the system developed by the authors can clearly be applied to all writers without “othering” certain groups of people. What is the purpose of this strange classification system? The authors claim that they want to “avoid a colonial view”, but then specifically select those authors who come from former colonies or who belong to an ethnic minority in a Western (i.e., non-colony) country. In my view, any classification that divides people based on whether they ethnically originate from a former colony or not is inherently colonial.

In their efforts to exclude "false positives", i.e., anyone who is not "Transnational" enough according to their particular definition, the authors are effectively redefining whiteness / non-whiteness through proxy terminology. They seem to admit to this when they state that a white person from South Africa should not be considered "Transnational" regardless of where they were born or which countries they lived in. If "Transnational" means non-white, then I need to ask: Is it acceptable for a team composed exclusively of white European researchers to judge who is or isn’t white, who is or isn’t Western, who is or isn’t colonised, who does or does not belong to a minority?

My answer is no. As a fellow Italian — a descendant of colonisers who repeatedly invaded multiple regions of North Africa, mercilessly murdering tens of thousands of people, oppressing indigenous populations and stealing their wealth — I believe that any such attempt is harmful, even when the people who do it have the best intentions (as they do in this case). It is never up to the colonisers to define the identity of the colonised.

The more I read the paper, the more it seems to me that it cannot be fixed without completely reframing it and removing the focus on addressing under-representation, an objective that has arguably not been achieved by the authors. Moreover, compared to the original paper, the flawed dichotomy Western/Transnational is still there. The engagement with anti-colonial scholarship is still limited to a few references. The positionality statement requested by Reviewer 1 — which should be a requirement when discussing these topics — is one bare sentence in a footnote. This paper desperately needs to be reviewed — or even better, co-authored — by an expert in post-colonial studies.

As things stand, I unfortunately need to recommend Major Revision again. I am well aware that this evaluation may result in rejection due to the journal's two-strike policy. Given the issues above, however, it might be preferable for the authors to rework the paper and resubmit it in a different form: the technical work has merit, but it is unfortunately dragged down by the highly misleading framing.