A New Open Information Extraction System Using Sentence Difficulty Estimation

Tracking #: 1728-2940

This paper is currently under review
Vahideh Reshadat
Maryam Hoorali
Heshaam Faili

Responsible editor: 
Philipp Cimiano

Submission type: 
Full Paper
Open Information Extraction (OIE) is a relation-independent extraction paradigm designed to extract assertions directly from massive and heterogeneous corpora. Because OIE targets applications that rely on large-scale relation extraction, a main requirement for Open Relation Extraction (ORE) systems is low computational cost. A large number of ORE methods have been proposed recently, covering a wide range of NLP tools, from “shallow” (e.g., part-of-speech tagging) to “deep” (e.g., semantic role labeling). There is a tradeoff between the depth of the NLP tools and the efficiency (computational cost) of ORE systems: the deeper the tools, the higher the computational cost. This paper describes a novel approach, called Sentence Difficulty Estimator for Open Information Extraction (SDE-OIE), that automatically estimates the difficulty of relation extraction using a set of difficulty classifiers. We train classifiers that apply a diverse set of features and assign each input sentence to the appropriate OIE extractor based on its difficulty score. The system thereby combines the advantages of shallow and deep OIE extractors while attempting to avoid the limitations of each. Our evaluations show that intelligent selection of the appropriate depth of ORE system has a significant effect on the effectiveness and scalability of SDE-OIE. It avoids wasting resources and achieves the same performance as its constituent deep extractor in less time.
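The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the features or classifiers used, so the sketch substitutes a hand-set heuristic score for the trained difficulty classifiers. The names `difficulty_score`, `route`, `CLAUSE_MARKERS`, the three surface features, and the threshold of 0.3 are all assumptions made for illustration, and the shallow and deep extractors are passed in as arbitrary callables.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]
Extractor = Callable[[str], List[Triple]]

# Hypothetical surface cues for clause nesting; the paper's actual
# feature set is not given in the abstract.
CLAUSE_MARKERS = {"which", "that", "who", "because", "although", "while", "whereas"}


def difficulty_score(sentence: str) -> float:
    """Return a heuristic difficulty in [0, 1] (a stand-in for a trained classifier)."""
    tokens = sentence.lower().replace(",", " , ").split()
    if not tokens:
        return 0.0
    # Each term is capped at 1.0; the caps (40 tokens, 4 commas, 3 markers)
    # are illustrative assumptions, not values from the paper.
    length_term = min(len(tokens) / 40.0, 1.0)
    comma_term = min(tokens.count(",") / 4.0, 1.0)
    marker_term = min(sum(t in CLAUSE_MARKERS for t in tokens) / 3.0, 1.0)
    return (length_term + comma_term + marker_term) / 3.0


def route(
    sentence: str,
    shallow: Extractor,
    deep: Extractor,
    threshold: float = 0.3,  # assumed cut-off; would be tuned on held-out data
) -> Tuple[str, List[Triple]]:
    """Send easy sentences to the cheap extractor and hard ones to the deep one."""
    if difficulty_score(sentence) < threshold:
        return "shallow", shallow(sentence)
    return "deep", deep(sentence)
```

Under these assumptions, a short sentence with no commas or clause markers goes to the shallow extractor, and the more expensive deep extractor runs only on sentences that score above the threshold, which is where the cost saving described in the abstract would come from.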