Problem Statement

Sentiment analysis is the process of computationally determining the emotional tone or opinion expressed in a piece of text (e.g., a tweet or a product review), classifying it as positive, negative, or neutral.

Abstract

A text can reveal a great deal about the sentiment its author intends to convey. Consequently, sentiment analysis of textual input is a well-established problem in machine learning and natural language processing (NLP). In this project, we addressed it with a conventional machine learning approach applied to a dataset of over 30,000 tweets. The objective was to implement and compare different classifiers on the Sentiment Analysis Dataset, using dimensionality-reduction techniques (LDA and PCA) and two text representations (BERTTokenizer and TF-IDF). The project evaluated the following classifiers: Decision Trees, Random Forests, SVMs, Naive Bayes, Perceptron, and Logistic Regression, tuning the hyperparameters of each to maximize accuracy. Finally, ensemble learning was used to combine the models and improve on their individual performance. TF-IDF gave the highest accuracy, reaching as high as 71.5%.
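
The workflow described in the abstract (TF-IDF features, several classifiers, then an ensemble) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the project's actual code: the toy `texts` and `labels`, the choice of three classifiers, the hard-voting scheme, and all hyperparameter values are assumptions made for the example.

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in data (hypothetical; not the project's tweet dataset).
texts = [
    "I love this phone, it works great",
    "what a wonderful and happy day",
    "this is the worst service ever",
    "I hate waiting, terrible experience",
    "the package arrived on tuesday",
    "the meeting is scheduled for noon",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# Combine three of the classifier families named in the abstract.
# Hard voting (majority vote over predicted labels) is an assumed choice.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", MultinomialNB()),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="hard",
)

# TF-IDF turns each text into a sparse weighted term vector,
# which is then passed to the ensemble.
model = make_pipeline(TfidfVectorizer(), ensemble)
model.fit(texts, labels)

predictions = model.predict(["what a great day", "this is awful"])
print(list(predictions))
```

On a real dataset, the model would be fitted on a training split and scored on held-out data; hyperparameters such as `n_estimators` or the regularization strength of `LogisticRegression` could then be tuned with a cross-validated search.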