Using Wikidata Lexemes and Items to Generate Text from Abstract Representations

Tracking #: 3367-4581

Mahir Morshed

Responsible editor: Guest Editors Wikidata 2022

Submission type: Tool/System Report
Ninai/Udiron, a living function-based natural language generation system, uses knowledge in Wikidata lexemes and items to transform abstract representations of factual statements into human-readable text. The combined system first produces syntax trees based on those abstract representations (Ninai) and then yields sentences from those syntax trees (Udiron). The system relies on information about individual lexical units and links to the concepts those units represent, as well as rules encoded in various types of functions to which users may contribute, to make decisions about words, phrases, and other morphemes to use and how to arrange them. Various system design choices work toward using the information in Wikidata lexemes and items efficiently and effectively, making different components individually contributable and extensible, and making the overall resultant outputs from the system expectable and analyzable. These targets accompany the intentions for Ninai/Udiron to ultimately power the Abstract Wikipedia project as well as be hosted on the Wikifunctions project.

Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 01/May/2023
Minor Revision
Review Comment:

# outline
This paper outlines Ninai/Udiron, a natural language generation system designed to generate text from abstract representations, based on Abstract Wikipedia.
Specifically, the proposed system consists of two subsystems:
* Ninai, which combines abstract concepts, creates a consolidated abstract representation, and processes it into a syntax tree.
* Udiron, which converts the syntax trees to natural language text. Udiron is in principle capable of generating text in multiple languages, in accordance with the overarching goal of Abstract Wikipedia.
The author describes a number of conceptual, linguistic, and technical choices that should be taken into account by a system like Ninai/Udiron.

# strengths
* The Abstract Wikipedia project is very interesting, and the proposed systems present material steps towards realising the potential of the idea behind Abstract Wikipedia.
* The submission includes a URL to the implementation of Ninai/Udiron, open-sourced and accessible to the community.

# weaknesses
* While I understand that this paper is submitted as a "Tools and Systems report", a more elaborate background section would be immensely helpful to the interested reader. Specifically, it is not entirely clear why the proposed abstract representation was chosen, what its advantages and disadvantages are, and how it compares to other abstract representations, such as an Abstract Meaning Representation grounded in Wikidata.
* The description of the systems is superficial, and at times, can be confusing. Specifically,
  * it is not clear what parts of the description are conceptual abstractions and what parts are technical/implementation abstractions. While both can be important for readers to understand the inner workings of Ninai/Udiron, the latter are less important, since there can be potentially multiple implementations of the same concepts.
  * the state of the project is not very clear. While the author provides a URL to a (current) implementation of Ninai/Udiron, based on the paper text, it is not very easy for readers to identify which parts of this project are implemented, tested, stable, and ready to be used. I understand that parts of the implementation might depend on external factors, but a clear description of the current state of the project would be really useful to the interested reader.
  * to my mind, some examples would make the description of Ninai/Udiron much stronger than it currently is. Concrete examples would make subsections of Section 3 much clearer. The same holds for the example of Figure 4; not all details of its relationship with Table 1 are clear. On a related note, the linked implementation seems to have some examples; it would be great if some of those could be used in the paper.
  * a clearer version of Figure 3 might help make some of the modules and the relationship between each other more apparent to the reader.

Review #2
Anonymous submitted on 25/Jul/2023
Minor Revision
Review Comment:

The author proposes a natural language generation system that aims to transform abstract representations of factual statements into human-readable text. The system makes use of Wikidata lexemes and items and is function-based, in accordance with Wikifunctions. It is made of two components. The first is Ninai, which generates syntax trees based on abstract representations; the second is Udiron, which produces human-friendly sentences from the syntax trees generated by Ninai. Although the system is not yet general enough, there are ongoing efforts to make it so.
The system can make a great contribution to Abstract Wikipedia by allowing new knowledge to be generated in multilingual settings. Ninai/Udiron uses the lexicographical data of Wikidata in addition to its encyclopedic aspect in text generation, which makes it one of the pioneering tools in this respect.

The paper is overall easy to follow, well structured, and well organized. However, some sentences have awkward structure, meaning, and/or word choice and need to be revised. Please proofread accordingly.

(A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data: yes
(B) whether the provided resources appear to be complete for replication of experiments, and if not, why: repository seems to be complete
(C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability: yes, on GitLab
(D) whether the provided data artifacts are complete: yes