Question Answering with Deep Neural Networks for Semi-Structured Heterogeneous Genealogical Knowledge Graphs

Tracking #: 2710-3924

This paper is currently under review
Omri Suissa
Maayan Zhitomirsky-Geffet
Avshalom Elmalech

Responsible editor: Special Issue Cultural Heritage 2021

Submission type: Full Paper
With the rising popularity of user-generated genealogical family trees, new genealogical information systems have been developed. State-of-the-art natural language question answering algorithms use deep neural network (DNN) architectures based on self-attention networks. However, some of these models take sequence-based inputs and are not suited to graph-based structures, while graph-based DNN models rely on highly comprehensive knowledge graphs, which do not exist in the genealogical domain. Moreover, these supervised DNN models require training datasets that are absent in the genealogical domain. This study proposes an end-to-end approach for question answering over genealogical family trees by: 1) representing genealogical data as knowledge graphs, 2) converting them to texts, 3) combining them with unstructured texts, and 4) training a transformer-based question answering model. To evaluate the need for a dedicated approach, the fine-tuned model (Uncle-BERT), trained on the auto-generated genealogical dataset, was compared with state-of-the-art question answering models. The findings indicate that there are significant differences between answering genealogical questions and answering open-domain questions. Moreover, the proposed methodology reduces complexity while increasing accuracy, and it may have practical implications for genealogical research and real-world projects by making genealogical data accessible to experts as well as the general public.
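Steps 1 to 3 of the approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the triple format, the relation names, the sentence and question templates, the function names (`verbalize`, `build_examples`), and the invented family data are all assumptions made for illustration. The sketch turns a toy genealogical graph into a text passage, appends optional free text, and emits SQuAD-style question, answer, and offset records of the kind an extractive transformer-based question answering model is usually trained on.

```python
from typing import Dict, List, Tuple

# Hypothetical toy family tree as (subject, relation, object) triples.
# All names and facts are invented for illustration.
Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("Anna Cohen", "born_in", "Vilnius"),
    ("Anna Cohen", "birth_year", "1901"),
    ("Anna Cohen", "spouse", "David Cohen"),
    ("Anna Cohen", "child", "Ruth Cohen"),
]

# Assumed templates: each has {s} before {o}, and {o} appears exactly once.
SENTENCES: Dict[str, str] = {
    "born_in": "{s} was born in {o}.",
    "birth_year": "{s} was born in the year {o}.",
    "spouse": "{s} was married to {o}.",
    "child": "{s} was a parent of {o}.",
}

QUESTIONS: Dict[str, str] = {
    "born_in": "Where was {s} born?",
    "birth_year": "In what year was {s} born?",
    "spouse": "Who was {s} married to?",
    "child": "Who was a child of {s}?",
}


def verbalize(triples: List[Triple]) -> Tuple[str, List[int]]:
    """Convert triples to one passage; return it with each answer's start offset."""
    parts: List[str] = []
    starts: List[int] = []
    offset = 0
    for s, rel, o in triples:
        template = SENTENCES[rel]
        # Text that precedes the object inside this sentence.
        prefix = template.split("{o}")[0].format(s=s)
        starts.append(offset + len(prefix))
        sentence = template.format(s=s, o=o)
        parts.append(sentence)
        offset += len(sentence) + 1  # +1 for the joining space
    return " ".join(parts), starts


def build_examples(triples: List[Triple], free_text: str = "") -> List[dict]:
    """Build SQuAD-style records over graph-derived text plus optional free text."""
    passage, starts = verbalize(triples)
    # Free text is appended after the graph-derived sentences,
    # so the answer offsets computed above stay valid.
    context = f"{passage} {free_text}".strip()
    return [
        {
            "context": context,
            "question": QUESTIONS[rel].format(s=s),
            "answer": o,
            "answer_start": start,
        }
        for (s, rel, o), start in zip(triples, starts)
    ]


if __name__ == "__main__":
    for ex in build_examples(TRIPLES, free_text="She emigrated in 1925."):
        print(ex["question"], "->", ex["answer"], "@", ex["answer_start"])
```

Computing the answer offset from the template prefix, rather than searching the passage for the answer string, avoids pointing at the wrong span when an answer also occurs elsewhere in the context (for example, a surname shared by several relatives).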