(Partial) User Preference Similarity as Classification-Based Model Similarity
Review 1 by Vojtech Svatek
The paper presents a method, and an extensive experiment, for using machine learning in user-similarity-based recommendation.
My main problem with the current version of the paper is (following the experience of a previous reviewer) the cumbersome way in which the authors' approach is explained. I had to read most of the text 2-3 times, and finally I would say that it can be summarized as a straightforward sequence of activities from the ML point of view (a minimal sketch in Python follows the list):
1) For each user, a classifier is learned from labeled examples.
2) Unlabeled examples are labeled by this classifier.
3) Based on the classified examples (hypothetical ratings), the similarity of users is computed.
4) When recommending items to a certain user, the ratings of similar users are taken into account.
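To underline how straightforward this is: the whole pipeline fits in a few lines of Python. This is my own sketch of my reading of the method; the data layout, the function names, and the choice of a decision tree are my assumptions, not taken from the paper.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def learn_user_classifier(user_ratings, item_features):
    # 1) Learn a per-user classifier from the user's labeled examples.
    items = list(user_ratings)
    X = np.array([item_features[i] for i in items])
    y = np.array([user_ratings[i] for i in items])
    return DecisionTreeClassifier().fit(X, y)

def hypothesize_ratings(clf, unrated_items, item_features):
    # 2) Label the unrated items with the learned classifier.
    X = np.array([item_features[i] for i in unrated_items])
    return dict(zip(unrated_items, clf.predict(X)))

def user_similarity(ratings_a, ratings_b, items):
    # 3) Similarity = fraction of items on which the (hypothetical) ratings agree.
    return sum(ratings_a[i] == ratings_b[i] for i in items) / len(items)

def predict_rating(neighbours, similarities, all_ratings, item):
    # 4) Recommend: similarity-weighted average of the similar users' ratings.
    num = sum(similarities[u] * all_ratings[u][item] for u in neighbours)
    return num / sum(similarities[u] for u in neighbours)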
I don't see any point in the lengthy formal treatment of the problem, with 24 numbered formulae, where most statements, in both symbols and text, are inherently trivial but explained in complex terms. On the other hand, I still miss a well-worked-out running example, which was demanded by the previous reviewers.
The formal part itself has quite a few flaws, for example:
- In (4), c is introduced as if it were invariant w.r.t. g, which is definitely not the case.
- In the item list between (5) and (6), the set of items Gi is certainly not part of the experience of a user by itself.
- (7) is trivial: any probability in the world must be less than or equal to 1!
- The hypothesis space H is only mentioned once; what does it mean?
- The left side of equation (20) doesn't contain s; why?
I don't see much novelty in either the probabilistic or the correlation-based similarity. Furthermore, the claim that the unified rated item set approach in the former is a good approximation is not entirely convincing. It may give too much weight to the relative size of the common rated item set. For example, if the two users rated 55 objects each, 10 of them are shared, and all of these are rated the same, the value of the metric is only 0.1. In contrast, if the two users rated 15 objects each, 10 of them are shared, and 5 (i.e., only half) of these are rated the same, the value is 0.4.
Even simple notions such as "interval rating" and "ratio rating" are explained in an absolutely unclear way.
An interesting observation is the directionality of partial similarity, meaning that, e.g., the similarity of user A and user B is measured w.r.t. a category C considered by user A. However, its impact is not studied further.
Another problem that remains is the limited relevance to Semantic Web topics. The paper now mentions the YouLike ontology and shows an RDF fragment associated with it; however, the ontology is not explained, and there is not even a link to its code (was it only published as part of the first author's thesis?).
Despite the previous review rounds, the quality of English is quite low in most of the text. There are too many typos and grammatical errors for it to make sense to list them in this review. Some sentences in the text are repeated in the same or a very similar form.
Fig. 4 doesn't contain any plots! This must be an error.
I appreciate the size of the experiment. However, given the opaque presentation of the approach itself, it is hard to judge how well the outcomes of the experiment justify it.
All in all, in my opinion the paper could only be published after a decent revision of the formal/explanatory part, the addition of a running example, and language correction.
Review 2 by anonymous reviewer:
Review of paper "(Partial) User Preference Similarity as Classification-Based Model Similarity"
(revised version) by Amancio Bouza and Abraham Bernstein.
(Not a full review, but an update of my previous review wrt. the modifications made
by the authors.)
The authors have sufficiently addressed the issues pointed out in my review.
The direct connection to Semantic Web research is still somewhat poor (no, classification
alone is not yet "semantic"...); however, I believe the paper gives enough hints now in
order to make the proposed approach usable by and thus relevant for other Semantic Web and
Linked Data researchers.
So, I believe the revised paper is acceptable for the special issue of SWJ.
However, I would ask the authors to address the following remaining (new or newly discovered) minor issues:
- Sometimes you abbreviate "Figure" as "Fig.", "Table" as "Tab.", etc.; sometimes you don't.
- It is unusual (and ugly) to have a line in a formula starting with a comma (e.g., (20), (21)). Better to write "with ... satisfying ..." (without the comma).
- You should add a little explanation of what the "YOULIKE ontology" is. You cannot assume people have read (or will read) your thesis.
- On page 15, you seem to use a different line spacing in each column (but probably IOS will take care of that).
The reviews below refer to the previous version of the manuscript.
Review 1 by Alexandra Moraru:
Recommendation: accept
(Partial) User Preference Similarity as Classification-Based Model Similarity - Review
Summary:
The paper tackles the problem of collaborative filtering, introducing new metrics for calculating (partial) user similarities in order to provide personalized recommendations. The paper nicely presents the related work and formally describes the problem based on hypothesized user preferences. The new metrics are evaluated against state-of-the-art approaches, with positive results. Overall, the paper is technically sound, easily understandable, and detailed.
Novelty:
While I'm not an expert on collaborative filtering and related areas, the paper's novelty seems to consist of three similarity metrics (probabilistic classification similarity, correlation-based similarity, and partial similarity), which are then applied to a hypothesized user preference model in order to provide recommendations.
Overall Recommendation:
A good paper overall, accept.
Additional Comments:
The paragraph in the introduction presenting the paper's contributions is a little ambiguous, and therefore I would recommend reformulating it more clearly.
- Is it the formal framework, the new classification-based similarity metrics, or both?
- Also, the preprocessing methods are mentioned only in the introduction and not directly referred to in the rest of the paper: "In addition, we introduce a preprocessing step for methods that allows the comparison of partial preferences. The new methods with and without preprocessing ..."
More details on the pre-processing steps could be added.
Review 2 by anonymous reviewer:
Paper: (Partial) User Preference Similarity as Classification-Based Model Similarity
Recommendation: Major revision
Review:
This paper presents an approach to recommender systems which makes use
of machine learning in order to tackle the cold-start problem
of traditional recommender systems. For the case where users provide
only a small number of ratings, hypothetical (learned) user preferences
are used with various similarity metrics in order to identify like-minded people.
I found this paper very interesting and the presented approach mostly convincing.
The problem of how to obtain reliable and rich recommendations if there is only
a small number of ratings is still an important topic in the area of recommender systems,
and the presented solution appears to be quite original (but see below for an
issue) and significant. The significance of the approach is underpinned by a
very extensive experimental evaluation. Technical quality is all in all fine.
A shortcoming of the paper is that it is only remotely related to the Semantic Web.
On the other hand, the presented approach could certainly contribute to the Semantic Web.
Nevertheless, the authors should indicate how concretely their approach could be
implemented using, and/or contribute to, Semantic Web technology, e.g., for the
formal representation of user ratings. At the moment, all data, concepts, etc. are
represented proprietarily and essentially indexically. Listing 1 (which is not
well-formed XML, btw.) is certainly not sufficient.
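For illustration: since the YOULIKE vocabulary is not published, here is a rough sketch of how a single user rating could be represented with standard RDF tooling (the namespace and all property names below are invented for this example, not taken from the paper):

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

YL = Namespace("http://example.org/youlike#")  # invented namespace

g = Graph()
g.bind("yl", YL)
rating = URIRef("http://example.org/ratings/r1")
g.add((rating, RDF.type, YL.Rating))                                   # a rating event
g.add((rating, YL.ratedBy, URIRef("http://example.org/users/u42")))    # who rated
g.add((rating, YL.ratedItem, URIRef("http://example.org/movies/m1")))  # what was rated
g.add((rating, YL.ratingValue, Literal(4, datatype=XSD.integer)))      # the rating itself
print(g.serialize(format="turtle"))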
Presentation quality is all in all good, but here and there it is not sufficient yet.
Some ideas could be explained with greater clarity (see below).
- The beginning of Section 5.2 should be rewritten. At the moment,
it is unclear what "partial user preferences" means in this context
and why they are required.
Furthermore, you should in this context discuss the following missing reference:
"Predicting User Preferences via Similarity-Based Clustering"
by Mian Qin, Scott Buffett, and Michael W. Fleming, Springer, 2008.
Their approach also deals with similarities among partial preferences
(also in the context of recommender systems and MovieLens). It uses
the notion of preference relations, but I guess these would map
easily to rating vectors.
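For instance, a crude mapping (my own sketch, not the cited paper's construction) could turn pairwise preference relations into a rating-like score vector by simple Borda-style win counting:

from collections import Counter

def preferences_to_scores(pairs):
    # pairs: iterable of (preferred_item, other_item) preference relations.
    # Win counts serve as a rating-like score per item.
    return dict(Counter(better for better, _ in pairs))

print(preferences_to_scores([("A", "B"), ("A", "C"), ("B", "C")]))
# -> {'A': 2, 'B': 1}; 'C' never wins and gets an implied score of 0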
So the claim that "Current preference similarity metrics do not account
for partially similar preferences among people" might not be accurate
(otherwise please clarify).
- It does not become clear early enough in the paper why learned (and thus inaccurate)
preferences can be used to compensate for missing data (user ratings).
It becomes clearer later, but you should explain the idea behind
this effect more clearly already at the beginning of the paper.
Further issues, mostly minor:
- p. 2, "that explicit choices"
- p. 3: are you sure the complexity is just O(n^2)?
- p. 3 (and other places): write "i.e.," instead of "i.e."
- Various places: superfluous commas, e.g., "It is shown, that"
- p. 8: what does the dot within the subindices mean (e.g., R^{true}_{i.})?
- It should be mentioned that the approach only works if the
user preferences are stationary and truthful.
- p.7 "preferences ... is"
- p. 12: "performs better then"
- Entire paper: It is not clear what your system for numbering or
non-numbering formulas is (is there any?)
- in References: ", and G. Editors."
Review 3 by anonymous reviewer:
Summary:
The paper proposes a recommendation approach that is based on a partial
similarity measure between users: a measure that looks at similarity
constrained by features of the rated items, from which rating functions
are learned using classifier-learning methods. These features may be
content-related, but it appears they *need* not be content-related. The
measures are introduced, and an evaluation on the standard Movielens
dataset is presented.
Relevance:
While this is an interesting approach to personalization, I doubt that the
Journal of Web Semantics is the right place for this paper. Semantics does not appear to play a role in the proposed approach (except as a contrast in the Related Work section). Having said this, given that the paper was accepted at IRMLES, it is possible that the paper is relevant in the context of an IRMLES Special Issue of the Journal of Web Semantics.
Novelty:
The paper goes into detail about several other approaches to recommender
systems, but does not cover enough literature. In particular, the
multitude of more recent proposals is not covered well, and the empirical
part does not compare with advanced baselines (Google Scholar, for example,
shows that there is a large number of new algorithms applied to the
Movielens data set). As just one example that is somewhat related to your
proposal and appears to have better MAE/RMSE results, consider:
Ming-Sheng Shang, Linyuan Lü, Wei Zeng, Yi-Cheng Zhang, and Tao Zhou,
"Relevance is more significant than correlation: Information filtering
on sparse data", EPL 88 (2009) 68008, doi: 10.1209/0295-5075/88/68008,
www.epljournal.org
This literature would need to be covered in order for the paper to be
considered for journal publication.
In general, the literature overview needs to be revised and extended with relevant publications more recent than 2008.
Technical soundness / presentation:
Unfortunately, I have to judge these together. I read and reread the paper
several times and needed a long time to understand what you
propose. This is a consequence of presentation, but the presentation also
obscures your technical contribution. This is regrettable, since the basic
idea of your paper is nice, and I am sure that with enough revision, this
can become an interesting paper.
Specifically:
The introduction in particular does
not give the reader any concrete image or intuition of what you are
trying to do (repeating that you want to judge "partial similarity" does
not help). Then later on, a large formal apparatus is introduced quickly,
with some symbols not defined, and again no help for understanding.
Concrete examples that would help the reader grasp what you mean are only
introduced in passing (Listing 1, first sentence on p. 1). The evaluation
contains a lot of details, but does not give the reader an integrated
understanding of ideas, hypotheses, or results.
Concrete recommendations, questions, and comments:
* Use a well-worked-out running example that you introduce early on.
* Describe and evaluate ideas and findings not only at the detail level,
but also at a higher level.
* Carefully check your text for the difference between
- assumptions
- observations
- facts
... it appears to me that often, you describe your assumptions as if they
were facts.
* Use the services of a proofreader! The language not only severely
impedes understanding, it is also not fit for a journal publication.
Specific remarks about content and writing, in order of appearance:
* throughout the paper: countable things like options require "number of"; only uncountable things can take "amount of".
* throughout the paper: distinguish between the comparative "than" and the temporal "then".
* check the formulation of para. 2. Amazon introduced its personalization options at the end of the 1990s, hardly a recent development ...
* throughout the paper: please use commas or other (correct) punctuation between clauses; this will make your text much more readable. Remove misleading/incorrect commas (e.g. p. 2, para. 4, sentence 1; commas after "that").
* throughout the paper: please check for singular/plural mismatches (e.g. p.2, para. 3; many more occurrences)
* p.2, para. 3: that -> than
* Section 2.1, para. 2: classify item -> classify an item
* Section 2.1, para. 2: "... to hypothesized a users preferences": sentence unclear.
* Section 2.2, para. 2: "making collaborative filtering": ditto.
* Section 2.3, sentence 1: don't start a sentence with reference numbers.
* Section 2.4, para. 1: too few -> not enough; are -> is
* P.4, para. 2: face -> counter
* P.4: "It is ... shown that like-minded people with respect to a domain are likely to be like-minded in another domain.": If this is the case, the article presents a counter-argument to the very intuition that motivates your proposal. Please also supply counter-examples and/or at least discuss this inconsistency.
* P.4: likes not -> does not like
* P.4: but two times -> but not that it is two times
* P.4: The definitions of interval and ratio scales do not appear to be correct. The normalization is not the point; the permitted operations are.
* P.4: which user i -> that user i
* P.5, last sentence before Sec. 4: suggests that you are able to fill the whole vector. Is that intended? (I understood that this is not necessarily the case.)
* P.5: "Privacy ... which relates the user with certain items in a dubious manner": I do not know what you want to express here, but it looks a lot like the (incorrect and somewhat defamatory) belief that privacy is for people who have something to hide. Please reconsider.
* P.5: "according to [22], the learning problem ... needs to be well-defined": what is the purpose of this sentence? How would you be able to do any machine learning without a well-defined problem?
* P.5: If you choose one particular performance measure, please motivate that choice.
* Throughout the paper: hypotheses space -> hypothesis space
* P.5: predicts best -> best predicts
* P.6, left column: this column is full of assumptions that are described as if they were facts. Please make this clearer by rephrasing.
* P.6: "... Eq. 3, the accuracy ...": this sentence is unclear, and it makes a big statement. This needs more explanation.
* P.6: "both types": which ones?
* P.6, last-but-one para: classify item -> classifying item
* P.7, left column: 1. What is the consequence of not being representative? 2. How do you know the union is representative? 3. If you know the function h anyway, why do you extend to this set and not to all items?
* P.7, left column, bottom: Do you not need additional assumptions on the error term to infer this definition?
* P.7: What are "self-containing individual preferences"?
* P.7: Explain the cases better.
* P.7: The paragraph under the u(g) formula is unclear. No comma after "Note"; "Disjunct" -> "disjoint"; "by by" -> "by".
* P.7: D_s^a has not been introduced.
* P.7: The paragraph under equation 7 is unclear. Why do you now introduce a choice function?
* In general in Section 5: The premises D are key, but they are not really explained.
* P.8: user provides -> user provided. Please check the whole section for uniformity of tense. Past tense preferred.
* P.8: The sentence "The genre information ..." is unclear.
* P.8: Note, the -> Note that the
* P.8: "In particular, users and item which ...": sentence unclear.
* P.9: The two paras below the RMSE formula need rewriting; they are unclear and contain grammatical mistakes.
* P.9: Precision and recall based on a sequence of Bernoulli experiments: is that a general observation (in that case, please give a reference) or an assumption (in that case, please say so)?
* P.10: "hypothesizes ... significantly best" is ungrammatical.
* P.10f., general question: The extensive empirical evaluation is nice. But it is somewhat unstructured; it would need a description of the structural differences between these approaches, your expectations regarding them, and an integrating interpretation. All of these are missing; the reader is left with an impression of lots of unrelated findings. For example, you compare content-based and collaborative-filtering approaches. However, in the introduction, you reject content-based approaches. Also, shouldn't this juxtaposition be discussed first at a more general level?
* P.13: please revise the RHS; it has many unclear and/or ungrammatical formulations.
* P.13: "The number of ratings a user ...": sentence incomplete, and what is this supposed to tell the reader? What is a "learning effect", and what is a "high learning effect"?
* P.14: the formatting changed.
* P.14: what is a "totally rated item"? "boosting" -> ", which would boost"?!