Multi-label categorizations and question answering for imbalanced raw web-based data

Tracking #: 2025-3238

Julien Lacombe
Rémy Chaput
Julien Perier-Camby
Feras Al Kassar
Marc Bertin
Frédéric Armetta

Responsible editor: 
Guest Editors Semantic Deep Learning 2018

Submission type: 
Full Paper
The endless vastness of the Internet makes extraction and classification tools essential for the design and use of tomorrow's applications. One challenging task is to predict labels for large amounts of raw web-based data. Another is the general question answering problem, that is, how to freely question a system. Machine learning techniques and neural networks can address these problems individually; we introduce an extension of memory networks to tackle them together. We show that our proposal is a pragmatic way to increase the set of questions the network can answer with multi-label predictions, and that it has numerous applications. The proposed approach appears to be competitive when compared with the top-ten classifiers dedicated to multi-label categorization. Experimental results show that the efficiency of predictions remains constant when applying multi-question tasks on the same network. Web datasets are usually imbalanced and costly to acquire, which can deteriorate the quality of predictions. Results and parameters are discussed in relation to the rarity of data and how the internal representation can be used to improve the efficiency of multi-label categorization.

Reject (Two Strikes)

Solicited Reviews:
Review #1
By Thierry Declerck submitted on 03/Dec/2018
Minor Revision
Review Comment:

General comment. The new version is very interesting to read and the authors propose relevant evaluation studies. What I am missing is a deeper consideration of the Semantic Web aspects. For example, the last sentence of section 6 could be elaborated.
It would also be good to consider typical Semantic Web data sets (DBpedia, Wikidata, etc.) as the input for the experiment. The selected data set can be considered semi-structured, but seems to be far from the formalization offered by Semantic Web data sets.
In order to fit into the topics of this special issue, I would really recommend giving more details on how the Semantic Web could benefit from the (pretty good) work presented in this submission.

1) At the beginning of section 2, the authors write "SPARQL Protocol and RDF Query Language are commonly used to reach documents from the Web but have to be fed with tags that can’t be easily provided in a huge and dynamic environment such as the Web.". I guess that the first occurrence of "Web" should be "Semantic Web".

2) I didn't find a link to the Reuters data set used in the submission.

Review #2
Anonymous submitted on 05/Dec/2018
Review Comment:

The revised version of the paper “Multi-label categorizations and question answering for imbalanced raw web-based data” was greatly improved by the authors.

The major points raised by reviewers in the first round were satisfactorily addressed. The content, the presentation, and the English now meet the standards for publication, which I therefore recommend. The manuscript just contains some typos that will be corrected in the proofs.

Review #3
Anonymous submitted on 21/Jan/2019
Major Revision
Review Comment:

This is a brief solicited review.

I cannot recommend the paper in its current form. It suffers from a scope that is too broad (predicting ‘labels’ for ‘raw web-based data’ – what are labels and what is web data?) and is so poorly written that it is difficult to understand the intent of the paper and the description of its technical contribution. While the application of an end-to-end memory network to web document labeling is an interesting idea, neither this nor the comprehensive and well thought-out experimental evaluation makes up for the poor presentation.

Review #4
Anonymous submitted on 24/Jan/2019
Major Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.