Gender differs in how to say things. Age does in what to say.

Tracking #: 1633-2845

This paper is currently under review
Seifeddine Mechti
Rim Faiz
Maher Jaoua
Francisco Rangel
Lamia Hadrich Belguith

Responsible editor: Guest Editors Benchmarking Linked Data 2017

Submission type: Full Paper
In this study, we present an original method for profiling the author of an anonymous English text. The aim of the proposed method is to determine the author's age and gender, particularly for authors of user-generated content in social media. Previous works relied on machine learning methods to obtain the best classification. However, these approaches overlooked two important points: (1) in most cases, authors are classified according to their discourse and the expressions they use, but this classification does not reveal which type of feature is useful for each dimension (age or gender). Our study is based on the hypothesis that gender depends on writing style, while age depends on text content. (2) Methods using Bayesian networks did not yield the best results. To overcome these problems, we propose a method that relies on advanced Bayesian networks for age prediction based on content features, and on decision trees for gender detection based on stylistic features. Our experiments showed that gender differs in how things are said, whereas age differs in what is said. Our method achieved high accuracy, obtaining one of the best results at the PAN@CLEF 2013 shared task: it ranked second for gender prediction and third for joint (age and gender) identification.
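The two-branch design described in the abstract (stylistic features feeding a decision tree for gender, content features feeding a Bayesian model for age) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and several assumptions apply. The feature functions `stylistic_features` and `content_features` are hypothetical choices. A multinomial Naive Bayes classifier (the simplest Bayesian network) stands in for the "advanced Bayesian networks," whose structure the abstract does not specify. A small Gini-based tree stands in for the decision trees.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stylistic features ("how things are said"), not the paper's set:
# punctuation per token, uppercase ratio among letters, average token length.
def stylistic_features(text):
    tokens = text.split()
    n = max(len(tokens), 1)
    punct = sum(1 for ch in text if ch in ".,;:!?")
    upper = sum(1 for ch in text if ch.isupper())
    letters = max(sum(1 for ch in text if ch.isalpha()), 1)
    avg_len = sum(len(t) for t in tokens) / n
    return [punct / n, upper / letters, avg_len]

# Hypothetical content features ("what is said"): lowercase word counts.
def content_features(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Depth-limited binary tree; internal nodes are (feature, threshold, left, right),
# leaves are plain labels (assumes labels are not themselves tuples).
def build_tree(X, y, depth=0, max_depth=3):
    if len(set(y)) == 1 or depth == max_depth:
        return Counter(y).most_common(1)[0][0]
    best = None
    for f in range(len(X[0])):
        for thr in sorted(set(row[f] for row in X)):
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            if not left or not right:
                continue
            # Weighted Gini impurity of the split.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:  # no feature separates the samples
        return Counter(y).most_common(1)[0][0]
    _, f, thr, left, right = best
    return (f, thr,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

class MultinomialNB:
    """Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.priors = Counter(labels)
        self.counts = defaultdict(Counter)
        for doc, lab in zip(docs, labels):
            self.counts[lab].update(doc)
        self.vocab = set(w for c in self.counts.values() for w in c)
        self.n = len(labels)
        return self

    def predict(self, doc):
        V = len(self.vocab)

        def log_posterior(lab):
            total = sum(self.counts[lab].values())
            lp = math.log(self.priors[lab] / self.n)
            for w, k in doc.items():
                if w in self.vocab:  # words unseen in training are ignored
                    lp += k * math.log((self.counts[lab][w] + 1) / (total + V))
            return lp

        return max(self.priors, key=log_posterior)

# Gender from style, age from content: each branch sees only its own features.
def train_profiler(texts, genders, ages):
    tree = build_tree([stylistic_features(t) for t in texts], genders)
    nb = MultinomialNB().fit([content_features(t) for t in texts], ages)

    def profile(text):
        return (predict_tree(tree, stylistic_features(text)),
                nb.predict(content_features(text)))

    return profile
```

Calling `train_profiler(texts, genders, ages)` returns a function that maps a new text to a `(gender, age)` pair. Keeping the two feature sets disjoint mirrors the hypothesis that style carries the gender signal and content carries the age signal. In practice, a library implementation (for example, scikit-learn's tree and Naive Bayes classifiers) and far richer features would replace these toy versions.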