Hate Speech Detection: A Solved Problem? The Challenging Case of Long Tail on Twitter

Tracking #: 1938-3151

Ziqi Zhang
Lei Luo

Responsible editor: 
Guest Editors Semantic Deep Learning 2018

Submission type: 
Full Paper
In recent years, the increasing propagation of hate speech on social media and the urgent need for effective counter-measures have drawn significant investment from governments, companies, and researchers. A large number of methods have been developed for automated hate speech detection online. This aims to classify textual content into non-hate or hate speech, in which case the method may also identify the targeting characteristics (i.e., types of hate, such as race and religion) in the hate speech. However, we notice a significant difference between the performance of the two (i.e., non-hate vs. hate). In this work, we argue for a focus on the latter problem for practical reasons. We show that it is a much more challenging task: our analysis of the language in the typical datasets shows that hate speech lacks unique, discriminative features and is therefore found in the 'long tail' of a dataset, where it is difficult to discover. We then propose Deep Neural Network structures serving as feature extractors that are particularly effective for capturing the semantics of hate speech. Our methods are evaluated on the largest collection of hate speech datasets based on Twitter, and are shown to outperform the state of the art by up to 6 percentage points in macro-average F1, or 9 percentage points in the more challenging case of identifying hateful content.

Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 09/Aug/2018
Review Comment:

This paper proposes a deep learning architecture for the task of hate speech detection. The authors also conducted an insightful study of current datasets for hate detection. In addition, a new dataset for hate detection, based on available datasets, is introduced. The experiments conducted on those datasets show that the proposed method achieves promising results compared to state-of-the-art methods.

I am satisfied with the authors' response, especially regarding the intuition behind the proposed approach, the contribution of the paper, and the experimental comparison. I would like to accept the revision in its current form.

Review #2
By Armand Vilalta submitted on 10/Aug/2018
Review Comment:

The paper studies the problem of hate speech detection on Twitter. This work extends the previous conference paper in several ways. It examines how hate speech detection is evaluated, arguing for the importance of focusing on the results for the hate class. It identifies as the main challenges the imbalance between hate and non-hate classes in current publicly available datasets and the lack of unique features in the data. The study computes some ad hoc statistics on the publicly available datasets to characterize the data. A significant contribution extending previous work is the use of a skipped CNN instead of a GRU for the last part of the DNN architecture, which improves on the previous results.

General impression
There is an important improvement in the overall impression of the paper compared to the previous version. The authors effectively have narrowed the scope, better motivated the work, and better highlighted the specific novelty and contributions to the research area. The results obtained are now easily interpretable.
The work is an original integration work where different existing DNN architectures are combined to approach the hate speech detection problem. The experiments are done on different datasets and the evaluation metrics are well reasoned. The results obtained are superior by a small margin to the baselines reproduced and to the results in the conference paper.

Writing and presentation
The paper is, in general, well written. The main problems found previously have been removed and clarity has improved substantially. A final revision according to the publication style guide still needs to be done. A few comments are included in the per-section review.

Per-section review
1. Introduction
The authors have clarified the contributions of the paper in a way that is consistent with the content of the work.

2. Related work
The section has been considerably reduced and better focused.
The added section 2.3 includes the reasoning on the importance of focusing on the hate class results. This is, in my opinion, an important study as it shows a flaw in the evaluation method used by most of the related work providing support for better evaluation practices in the future. It also helps to put in context the improvements made by the proposed skipped CNN architecture.
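The point about evaluation can be illustrated with a small sketch. It assumes that the flaw in question is that an aggregate score dominated by the majority class can hide weak performance on the hate class; the labels and counts below are hypothetical toy data, not taken from the paper or its datasets.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for a single class, computed from its TP/FP/FN counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def micro_f1(y_true, y_pred):
    """Micro-average F1; for single-label classification this equals accuracy."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, l) for l in labels) / len(labels)


# Hypothetical toy data: 90 non-hate and 10 hate examples.
y_true = ["non-hate"] * 90 + ["hate"] * 10
# Hypothetical classifier: 88 of 90 non-hate correct, 3 of 10 hate correct.
y_pred = ["non-hate"] * 88 + ["hate"] * 2 + ["hate"] * 3 + ["non-hate"] * 7

print("micro F1:", round(micro_f1(y_true, y_pred), 3))            # 0.91
print("macro F1:", round(macro_f1(y_true, y_pred), 3))            # 0.676
print("hate F1: ", round(per_class_f1(y_true, y_pred, "hate"), 3))  # 0.4
```

In this toy case the aggregate score looks strong while the hate class is detected poorly, which is why reporting the hate class separately is informative.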

3. Dataset Analysis
The detailed explanation of the creation of the RM dataset has been removed, since it has already been published in another work. In my opinion, this is a good choice, but I would appreciate a reference to this work and a link to the files.
The explanation of the metrics used has been reduced and the explanation of the figures has been improved. The whole section is clearer, including the explanation of the findings. Although it remains quite general and superficial, it is aligned with the rest of the paper.

4. Methodology
The introduction specifies more clearly the contributions of the paper and the baselines it compares to.
In section 4.1, in the explanation of the architecture (3rd paragraph), the strides of the convolutional and pooling layers are not specified.
From the text, it seems that the 1D max pooling is applied to the joined output of the different convolution sizes, so a single pooling operation may contain inputs from different convolution sizes. Is this correct, or are they actually processed separately?
In the skipped-CNN section, it would be fair to include some references to other works (beyond Mikolov) that used the same idea in other contexts, e.g., atrous convolutions in image processing or word-level skip-grams for sentiment analysis.
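For readers unfamiliar with the related idea mentioned above, the following is a minimal sketch of a generic dilated (atrous) 1D convolution. It is not the authors' skipped CNN, whose exact gap patterns are not described in this review; the function names, the toy sequence, and the kernel are illustrative assumptions.

```python
def window_indices(start, kernel_size, dilation):
    """Input positions covered by one kernel application (gaps when dilation > 1)."""
    return [start + i * dilation for i in range(kernel_size)]


def dilated_conv1d(seq, kernel, dilation=1):
    """Valid (unpadded) 1D convolution with stride 1 and a dilated kernel."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one window
    out = []
    for start in range(len(seq) - span + 1):
        idx = window_indices(start, len(kernel), dilation)
        out.append(sum(w * seq[i] for w, i in zip(kernel, idx)))
    return out


seq = [1, 2, 3, 4, 5, 6]   # hypothetical 1-dimensional input
kernel = [1, 1, 1]         # hypothetical kernel that sums its inputs

print(dilated_conv1d(seq, kernel, dilation=1))  # [6, 9, 12, 15]
print(dilated_conv1d(seq, kernel, dilation=2))  # [9, 12]
print(window_indices(0, 3, 2))                  # [0, 2, 4]
```

With a dilation of 2, the same three-weight kernel covers five input positions while skipping every other one, which is the sense in which it resembles a gapped window over tokens.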

5. Experiments
The part on word embeddings has been reduced to its most important findings. I consider this reduction highly beneficial, as it simplifies the presentation of the results.
Replacing figures 6-8 with tables also makes the results easier to interpret.

Minor comments:
- Implementation: Plural of epoch is epochs.
- 5.1 TF-IDF: Write the full name in the first occurrence.
- According to the style guide, table captions should be placed above the table. It is preferable to include the leading 0 in the results: .81 → 0.81 (or to remove the “.” indicating the units in the caption).

Review #3
By Thierry Declerck submitted on 26/Aug/2018
Minor Revision
Review Comment:

An interesting submission that describes both a DNN approach to dealing with hate speech in Twitter posts and the development of a very relevant data set, discussing in detail the problems of existing data sets.
I have some comments that, in my opinion, should be addressed in an additional editorial round.
1) It seems that only one author is named in the headers of the submission pages ("Lei Luo"). Should both authors not be named in the headers?
2) I am a bit puzzled that the authors seem to consider F1 as the same type of measure as "accuracy" (page 2, 3rd paragraph). Please check and reformulate if necessary.
3) What is meant by "largest collection of English Twitter datasets"? This notion is first introduced in section 3.1.
4) I wonder why there is a need for an input feature representation (page 4, beginning). Can a neural network not learn to extract feature representations from raw data (and certainly also from input data that already contains feature representations)?
5) There are some incorrect English forms, such as "methods belong to this category include". The English should be harmonized across the whole paper.
6) On page 4, please introduce GRU (as you did for RNN, CNN etc.)
7) I would write "tweets" rather than "Tweets" when the word is not at the beginning of a sentence, everywhere in the submission.
8) There is an issue that needs to be clarified: were the tweets in the different data sets collected randomly or on the basis of key words filtering the streaming data?
9) On page 5, I would rather use the expression "surface forms", instead of "surface words" (all words are in a sense "surface words"), if you want to distinguish from lemma forms. If not: "words" would be enough.
10) Pages 5, 6: It would be interesting to investigate how much the normalisation process applied to tweets influences the potential distinction between "hate" and "non-hate" tweets.
11) "combines traditional a CNN" => "combines a traditional CNN"?
12) Section 4.1: It would be good to know how well the segmentation of hashtags works. Are there hashtags that are wrongly segmented? One can imagine something like #YouTube being segmented into "you" and "tube" if the authors have no way to avoid this.
13) The authors write "Further, our pre-processing already reduces the noise in the language to some extent." (section 4.1). I think one needs some quantitative information here (even if only an estimate).
14) It would be very helpful to suggest, at least in a very tentative way, what kind of non-linguistic features could help in the classification task.
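
The concern in comment 12 about hashtag segmentation can be made concrete with a small sketch. It assumes a simple dictionary-based segmenter that prefers the split with the fewest words; the authors' actual segmentation method is not described in this review, and the function name and toy vocabulary are hypothetical.

```python
def segment(hashtag, vocab):
    """Split a hashtag into known words, preferring the fewest words.

    Returns the lowercased hashtag as a single token if no full split exists.
    """
    text = hashtag.lstrip("#").lower()
    n = len(text)
    # best[i] holds the shortest known-word segmentation of text[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in vocab:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n] if best[n] is not None else [text]


# Hypothetical toy vocabulary that lacks the proper noun "youtube".
vocab = {"you", "tube", "stop", "hate", "speech"}

print(segment("#StopHateSpeech", vocab))          # ['stop', 'hate', 'speech']
print(segment("#YouTube", vocab))                 # ['you', 'tube']
print(segment("#YouTube", vocab | {"youtube"}))   # ['youtube']
```

In this sketch the wrong split disappears only once the proper noun is added to the vocabulary, which suggests that a list of protected names is one way such errors could be reduced.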