A Machine Learning Approach for Product Matching and Categorization

Tracking #: 1571-2783

Authors: 
Petar Ristoski
Petar Petrovski
Peter Mika
Heiko Paulheim

Responsible editor: 
Claudia d'Amato

Submission type: 
Full Paper
Abstract: 
Consumers today have the option to purchase products from thousands of e-shops. However, the completeness of product specifications and the taxonomies used for organizing the products differ across e-shops. To improve the consumer experience, e.g., by allowing offers from different vendors to be compared easily, approaches for product integration on the Web are needed. In this paper, we present an approach that leverages neural language models and deep learning techniques in combination with standard classification approaches for product matching and categorization. In our approach, we use structured product data as supervision for training feature extraction models able to extract attribute-value pairs from textual product descriptions. To minimize the need for large amounts of labeled data, we use neural language models to produce word embeddings from large quantities of publicly available product data marked up with Microdata, which boost the performance of the feature extraction model and thus lead to better product matching and categorization performance. Furthermore, we use a deep Convolutional Neural Network to produce image embeddings from product images, which further improve the results on both tasks.
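As an illustration of the image-embedding step described in the abstract, the following is a minimal sketch using an off-the-shelf pretrained torchvision CNN; the backbone, weights, and embedding dimensionality here are assumptions for illustration, not the authors' exact architecture:

    # Minimal sketch: product-image embeddings from a pretrained CNN.
    # The paper's own deep CNN may differ; this uses a torchvision ResNet.
    import torch
    from torchvision import models, transforms
    from PIL import Image

    # Pretrained backbone with the classification head removed, so the
    # forward pass yields a 512-dimensional embedding per image.
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()
    backbone.eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    def image_embedding(path: str) -> torch.Tensor:
        """Embed one product image; 'path' is an illustrative placeholder."""
        img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            return backbone(img).squeeze(0)  # shape: (512,)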
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Kristian Kersting submitted on 26/Mar/2017
Suggestion:
Major Revision
Review Comment:

This is a resubmission. Therefore, I will focus only on the issues I raised in my previous reviews:

(1) Some aspects of the experimental protocol are unclear: which significance test was used for evaluation, and was cross-validation used across all experiments?
(2) Missing related work, in particular on existing deep learning approaches for the task at hand.
(3) Justification of the deep architecture used.
(4) Reducing the number of times the word “deep” is used.

The authors have added a general reference to deep learning and have included several additional references, going from originally 20 up to 39. In particular, they have related the present paper to several of the references provided in my first review. Thanks! So, (2) has been nicely addressed. The authors also now justify the CNN architecture used by describing it in more detail and providing a reference. As the present paper is less about the particular architecture, this is more than fine. So (3) has been addressed. Thanks! Also, the authors switch from “deep” to “neural” on several occasions, which I agree is the better alternative. Thanks! So, (4) has been addressed.

Regarding (1), it is not clear to me why a McNemar test was used and not some t-test. The McNemar test is for nominal values (as far as I know), and given that we are interested here in accuracy etc., a t-test would have been more natural, in my opinion. I guess the authors considered the win-loss ratio, which is fine. Overall, I would accept the test, but it would be nice to have an explanation here. I leave this (whether it is important, and if so, the check of the revision) to the editor.
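For concreteness, McNemar's test compares two classifiers on their paired correct/incorrect decisions over the same test items, which is why it applies to nominal outcomes rather than to averaged accuracy scores. A minimal sketch (labels, predictions, and variable names are illustrative, not taken from the paper):

    # Minimal sketch: McNemar's test over paired classifier decisions.
    # All data and names here are illustrative toy values.
    import numpy as np
    from statsmodels.stats.contingency_tables import mcnemar

    y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])  # toy gold labels
    pred_a = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 1])  # classifier A
    pred_b = np.array([1, 0, 0, 0, 0, 1, 0, 0, 1, 0])  # classifier B

    a_ok = pred_a == y_true
    b_ok = pred_b == y_true

    # 2x2 table of paired correctness; the test looks only at the
    # discordant cells (A right/B wrong and A wrong/B right).
    table = np.array([[np.sum(a_ok & b_ok),  np.sum(a_ok & ~b_ok)],
                      [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)]])

    # exact=True uses the binomial form, suited to small discordant counts.
    result = mcnemar(table, exact=True)
    print(f"statistic={result.statistic}, p-value={result.pvalue:.4f}")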

The only remaining downside, as far as I can see, is the still unclear protocol for the first experiment, the evaluation of the CRF features. As the authors do not touch upon the cross-validation setting, I read this part again. I noticed that the neural embeddings are learned on the complete dataset. This is unfair, as the embCRF then has more knowledge of the dataset than the standard one. In turn, the results in Table 3 have to be better justified. Are they due to using the full training set for embCRF? This has to be clarified before publication. As the same features have been used in all other experiments, the other experiments should be checked, too. Or, as asked for in my previous review (sorry if this was not clear), the authors should justify the experimental setup. Generally, a significance test should be run everywhere, but I leave this decision to the editor. So, contrary to what the authors argue, all classifiers have (at least potentially) not been trained on the same data.
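To illustrate the protocol being asked for: the embedding model must be fit inside each cross-validation fold, on the training portion only, so that the embedding-based model and the standard one see exactly the same data. A minimal sketch under that assumption, using gensim and scikit-learn with toy data and a stand-in classifier rather than the paper's CRF:

    # Minimal sketch of a leakage-free protocol: the embedding model is
    # fit only on the training fold, never on the held-out fold.
    # Corpus, labels, and the classifier are illustrative stand-ins.
    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.model_selection import KFold
    from sklearn.linear_model import LogisticRegression

    docs = [["red", "shoe"], ["blue", "shoe"], ["red", "phone"],
            ["blue", "phone"], ["green", "shoe"], ["green", "phone"]]
    labels = np.array([0, 0, 1, 1, 0, 1])

    def doc_vector(model, tokens):
        """Average the embeddings of in-vocabulary tokens (zeros if none)."""
        vecs = [model.wv[t] for t in tokens if t in model.wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(model.wv.vector_size)

    for train_idx, test_idx in KFold(n_splits=3, shuffle=True,
                                     random_state=0).split(docs):
        # Embeddings see *only* the training fold -- no test-fold leakage.
        w2v = Word2Vec([docs[i] for i in train_idx],
                       vector_size=16, min_count=1, seed=0)
        X_train = np.array([doc_vector(w2v, docs[i]) for i in train_idx])
        X_test = np.array([doc_vector(w2v, docs[i]) for i in test_idx])
        clf = LogisticRegression().fit(X_train, labels[train_idx])
        print("fold accuracy:", clf.score(X_test, labels[test_idx]))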

To summarise, (2-4) have been addressed well. Thanks! (1) has been addressed only partly and has raised a new, more refined issue.

Review #2
Anonymous submitted on 20/Apr/2017
Suggestion:
Accept
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

All the requests and issues I had raised have been addressed.

Review #3
By Bettina Berendt submitted on 25/Apr/2017
Suggestion:
Minor Revision
Review Comment:

The revision is clearer than the first version, and the authors have addressed most of my concerns. However, the request to add an error discussion and a limitations section was only addressed in a very superficial way, and these are key components of good scientific work.

The error discussion is buried somewhere in the paper; it is very short and sloppily written. Clearly, the authors regard the achievements of their method as much more noteworthy and deserving of detailed description than the errors. This is a misunderstanding of the role of errors: they guide you towards a better understanding of your method and, ultimately, towards scientific progress. Error discussions are also important to readers, as they allow them to strive for scientific progress as well.

There is also some handwaving in the contents of the error discussion. For example, is the fact that different products are described together really a "data error"? Maybe it is just the way some websites organise their contents, and they do this for a reason?

A limitations section is missing, but needed. (All methods have limitations. Again, one can learn from them.)

The future work section therefore remains quite shallow, consisting mainly of claims that the method can be applied for all kinds of other purposes.

Please provide these missing elements.

Minor issues: Please proofread carefully. Especially towards the end, there are a number of grammatical mistakes and stray words.