A Hybrid Feature Learning Approach for Aspect-based Sentiment Analysis in Drug Reviews

Tracking #: 2661-3875

This paper is currently under review
Asmaa Sweidan
Nashwa Elbendary
Haytham Al-feel

Responsible editor: 
Mehwish Alam

Submission type: 
Full Paper
This paper aims to develop a novel hybrid feature learning approach for aspect-based sentiment analysis that detects and classifies unlabeled data drawn from widely available social media. The proposed approach combines a sentiment lexicon with pre-trained BERT (Bidirectional Encoder Representations from Transformers) embeddings, together with Ontology-based and Latent Dirichlet Allocation (LDA) feature extraction for topic modeling. The Ontology, which relies on fuzzy reasoning, describes the semantic knowledge about the topics and the relations among them. BERT combined with LDA predicts the context words to learn the sentence vector and the document vector. The document vector is decomposed into a document weight vector (the weight of each topic) and topic vectors, each of which represents one topic and stores the words related to it. A Bi-directional Long Short-Term Memory (Bi-LSTM) network then classifies the extracted sentiment. Various experiments are conducted on drug-review datasets collected from social media, used as a case study, to evaluate the effectiveness of the proposed approach at the aspect level, and several performance evaluation metrics are used to measure its performance. The experimental results show that the proposed hybrid feature learning approach outperforms the other state-of-the-art feature learning approaches tested, and that it improves both feature and topic extraction from unstructured social media text and sentiment classification. The results also indicate that performance increases when Ontology is used with BERT embeddings and LDA topic modeling for feature extraction and LSTM as the classifier, compared with using word2vec or BERT individually. The proposed approach achieved an average accuracy of 98.4%, an AUC score of 97.5%, and an F-score of 0.98 on the datasets used.
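The three reported metrics are all ratios on a 0 to 1 scale, so an F-score of 0.98 is equivalent to 98%. The following is a minimal sketch of how accuracy, F-score, and AUC are computed for a binary sentiment task. It is not the authors' evaluation code: it assumes binary labels with 1 as the positive class, and the labels, predictions, and classifier scores are hypothetical toy values chosen only for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """AUC as the probability that a random positive example receives a higher
    score than a random negative example (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: gold labels, hard predictions, and positive-class scores.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.7, 0.6, 0.95]

print(f"accuracy={accuracy(y_true, y_pred):.3f}, "
      f"F1={f1_score(y_true, y_pred):.3f}, "
      f"AUC={roc_auc(y_true, scores):.3f}")
# prints: accuracy=0.750, F1=0.800, AUC=0.933
```

AUC is computed here from continuous scores rather than hard labels, which is why it can differ from accuracy on the same predictions. Multiplying any of these ratios by 100 gives the percentage form used for accuracy and AUC in the abstract.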
Full PDF Version: 
Under Review