|

An ensemble deep learning classifier for sentiment analysis on code-mix Hindi-English data.

Researchers

Journal

Modalities

Models

Abstract

Code-mixing on social media is a trend in many countries where people speak multiple languages, such as India, where Hindi and English are major communication languages. Sentiment analysis is beneficial in understanding users’ opinions and thoughts on social, economic, and political issues. It eliminates the manual monitoring of each and every review, which is a cumbersome task. However, performing sentiment analysis on code-mix data is challenging, as it involves various out of vocabulary terms and numerous issues, making it a new field in natural language processing. This work includes dealing with such text and ensembling a classifier to detect sentiment polarity. Our classifier ensembles a multilingual variant of RoBERTa and a sentence-level embedding from Universal Sentence Encoder to identify the sentiments of these code-mixed tweets with higher accuracy. This ensemble optimises the classifier’s performance by using the strength of both for transfer learning. Experiments were conducted on real-life benchmark datasets and revealed their sentiment. The performance of the proposed classifier framework is compared with other baselines and deep learning models on five datasets to show the superiority of our results. Results showed improved and increased performance in the proposed classifier’s accuracy, precision, and recall. The accuracy achieved by our classifier on code-mix datasets is 66% on Joshi et al. 2016, 60% on SAIL 2017, and 67% on SemEval 2020 Task-9 dataset, which is on average around 3% as compared to contemporary baselines.© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *