Difficulty-level Modeling of Ontology-based Factual Questions

Tracking #: 1898-3111

This paper is currently under review
Vinu E. V
P Sreenivasa Kumar

Responsible editor: Michel Dumontier

Submission type: Full Paper
Semantics-based knowledge representations such as ontologies have been found to be very useful for automatically generating meaningful factual questions. Determining the difficulty-level of these system-generated questions helps in using them effectively in various educational and professional applications. The existing approach for predicting the difficulty-level of factual questions uses only a few naive features, and its accuracy (F-measure) is only about 50% on our benchmark set of 185 questions. In this paper, we propose a new methodology for this problem that identifies new features and incorporates an educational theory related to the difficulty-level of a question, called Item Response Theory (IRT). IRT takes the knowledge proficiency of end users (learners) into account when assigning difficulty-levels, based on the assumption that a given question is perceived differently by learners of different proficiency levels. We have carried out a detailed study of the features/factors of a question statement that could determine its difficulty-level for three learner categories (experts, intermediates, and beginners), and we formulate ontology-based metrics for these features. We then train three logistic regression models to predict the difficulty-level corresponding to the three learner categories. The outputs of these models are interpreted using IRT to find a question's overall difficulty-level. The accuracy of the three models, measured by cross-validation, is in a satisfactory range (67-84%). The proposed model (comprising the three classifiers) outperforms the existing model by more than 20% in precision, recall and F1-score.
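The abstract does not spell out the features, the training setup, or exactly how the outputs of the three classifiers are combined under IRT. Purely as an illustration, the following is a minimal Python sketch under explicit assumptions. It uses a hand-rolled logistic regression (not scikit-learn) trained on invented toy data with one hypothetical, normalized ontology-based feature. It substitutes a Rasch (one-parameter logistic) IRT reading, P(correct | θ) = σ(θ − b), for the authors' interpretation, which may differ. The ability values in `ABILITY`, the helper names `estimate_item_difficulty` and `overall_difficulty`, and the ±0.5 cut-offs are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class LogisticRegression:
    """Binary logistic regression fitted by batch gradient descent on the log-loss."""

    def __init__(self, lr: float = 1.0, epochs: int = 5000) -> None:
        self.lr = lr
        self.epochs = epochs
        self.w: List[float] = []
        self.b = 0.0

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "LogisticRegression":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi  # d(log-loss)/d(logit) for one sample
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            self.w = [wj - self.lr * g / n for wj, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, x: Sequence[float]) -> float:
        """Probability that the question is 'difficult' for this learner category."""
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)


# Hypothetical abilities (theta) of the three learner categories on the IRT scale.
ABILITY: Dict[str, float] = {"beginner": -1.0, "intermediate": 0.0, "expert": 1.0}


def estimate_item_difficulty(models: Dict[str, LogisticRegression], x: Sequence[float]) -> float:
    """Rasch (1PL) reading: P(correct | theta) = sigmoid(theta - b), hence
    b = theta - logit(P(correct)) = theta + logit(P(difficult)); averaged over categories."""
    estimates = []
    for category, theta in ABILITY.items():
        p = min(max(models[category].predict_proba(x), 0.01), 0.99)  # clip so logit stays finite
        estimates.append(theta + math.log(p / (1.0 - p)))
    return sum(estimates) / len(estimates)


def overall_difficulty(models: Dict[str, LogisticRegression], x: Sequence[float]) -> str:
    """Bin the estimated item difficulty b using hypothetical cut-offs at -0.5 and +0.5."""
    b = estimate_item_difficulty(models, x)
    if b < -0.5:
        return "low"
    if b < 0.5:
        return "medium"
    return "high"


# Invented toy data: one normalized ontology-based feature per question;
# label 1 means the question is "difficult" for that learner category.
X = [[0.0], [0.1], [0.35], [0.45], [0.65], [0.75], [0.9], [1.0]]
LABELS = {
    "beginner": [0, 0, 1, 1, 1, 1, 1, 1],
    "intermediate": [0, 0, 0, 0, 1, 1, 1, 1],
    "expert": [0, 0, 0, 0, 0, 0, 1, 1],
}
models = {c: LogisticRegression().fit(X, y) for c, y in LABELS.items()}

if __name__ == "__main__":
    for q in ([0.0], [0.5], [1.0]):
        print(q, round(estimate_item_difficulty(models, q), 2), overall_difficulty(models, q))
```

Averaging the three per-category estimates of b is just one simple way to let every classifier contribute to the final label. A full IRT calibration would instead fit item and ability parameters jointly from observed learner responses.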