Orbis: Explainable Benchmarking of Information Extraction Tasks

Tracking #: 2877-4091

This paper is currently under review
Adrian M.P. Brasoveanu
Albert Weichselbraun
Roger Waldvogel
Fabian Odoni
Lyndon Nixon

Responsible editor: 
Anna Lisa Gentile

Submission type: 
Full Paper
Abstract: 
Competitive benchmarking of information extraction methods has considerably advanced the state of the art in this field. Nevertheless, methodological support for explainable benchmarking, which provides researchers with feedback on the strengths and weaknesses of their methods and guidance for their development efforts, is very limited. Although aggregated metrics such as F1 and accuracy support the comparison of annotators, they do not help explain annotator performance. This work addresses the need for explainability by presenting Orbis, a powerful and extensible explainable evaluation framework that supports drill-down analysis, multiple annotation tasks and resource versioning, and therefore actively aids developers in better understanding evaluation results and identifying shortcomings in their systems. Orbis currently supports four information extraction tasks: content extraction, named entity recognition, named entity linking and slot filling. This article introduces a unified formal framework for evaluating these tasks, presents Orbis’ architecture, and illustrates how Orbis (i) creates simple, concise visualizations that enable visual benchmarking, (ii) supports different visual classification schemas for evaluation results, (iii) aids error analysis, and (iv) enhances the interpretability, reproducibility and explainability of evaluations by adhering to the FAIR principles and by using lenses that make explicit the implicit factors impacting evaluation results, such as the task, entity classes, annotation rules and the target knowledge graph.