Pattern Recognition & Machine Learning

Course Project-CSL2050

Sentiment Analysis of Twitter Tweets Dataset

(Text Sentiment Analysis)

Problem Statement:

Sentiment analysis is the process of computationally determining the emotional tone or opinion expressed in a piece of text (e.g., a tweet or product review), classifying it as positive, negative, or neutral.

Video demo

Abstract

A text can reveal a great deal about the sentiment of its author. Sentiment analysis of textual input is therefore a well-established problem in machine learning and natural language processing (NLP). In this project, we addressed it with a conventional machine learning approach, using a dataset of over 30,000 tweets from Twitter. The objective was to implement and compare different classifiers on the Sentiment Analysis Dataset, using various preprocessing techniques (LDA and PCA) and two text representations (the BERT tokenizer and TF-IDF). We analyzed the performance of six classifiers (Decision Trees, Random Forests, SVMs, Naive Bayes, Perceptron, and Logistic Regression), tuning the hyperparameters of each to maximize accuracy. Finally, ensemble learning was used to improve the overall performance of the model. TF-IDF features gave the highest accuracy, reaching up to 71.5%.
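The pipeline described in the abstract (TF-IDF features fed to several classifiers, then combined in an ensemble) can be sketched roughly as follows. This is a minimal illustration, assuming scikit-learn; the toy tweets, the choice of three base classifiers, the hard-voting scheme, and every hyperparameter are assumptions made for the example and are not the project's actual configuration.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the 30,000+ tweet dataset.
texts = [
    "I love this phone, it is amazing",
    "what a wonderful and happy day",
    "this is the worst service ever",
    "I hate waiting in long queues",
    "the meeting is scheduled for monday",
    "the package arrives on tuesday",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]


def build_model():
    """TF-IDF features followed by a hard-voting ensemble (illustrative choices)."""
    ensemble = VotingClassifier(
        estimators=[
            ("logreg", LogisticRegression(max_iter=1000)),
            ("nb", MultinomialNB()),
            ("svm", LinearSVC()),
        ],
        voting="hard",  # each classifier casts one vote per sample
    )
    # Unigrams and bigrams are an assumption, not a setting from the project.
    return make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), ensemble)


model = build_model()
model.fit(texts, labels)
predictions = model.predict(["what a great day", "this is awful"])
print(list(predictions))
```

On real data, the same pipeline object can be passed to `sklearn.model_selection.GridSearchCV` to tune the vectorizer and classifier settings together, which is one way to carry out the hyperparameter tuning the abstract mentions.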

Results

Performance Overview:

Group members

Mukund Gupta

B22CS086

Harshit Goyal

B22CS024

Gouri Patidar

B22AI020

Krishna Balaji Patil

B22CS078

Aarohi Dharmadhikari

B22AI001

Saumitr Agrawal

B22AI054
