Abstract:
Successful exams require a balance of easy, medium, and difficult questions. Question difficulty is generally either estimated by an expert or determined after an exam is taken. The latter provides no utility for the generation of new questions, and the former is expensive in terms of both time and cost. Additionally, it is not known whether expert prediction is indeed a good proxy for estimating question difficulty. In this paper, we analyse and compare two ontology-based measures for difficulty prediction of multiple choice questions, and we compare each measure with expert prediction (by 15 experts) against the exam performance of 12 residents over a corpus of 231 medical case-based questions in multiple choice format. We find one ontology-based measure (relation strength indicativeness) to be of comparable performance (accuracy = 47%) to expert prediction (average accuracy = 49%).