ChatGPT Sentiment Analysis #431

Merged
merged 17 commits into from
Jan 17, 2024

Conversation

JIGYASAKARAKOTI
Contributor

Pull Request for DL-Simplified 💡

Issue Title : ChatGPT Sentiment Analysis

  • Info about the related issue (Aim of the project) : The aim of this project is to analyze the sentiments of the tweets made on/against ChatGPT.
  • Name: Jigyasa Karakoti
  • GitHub ID: JIGYASAKARAKOTI
  • Email ID: jigyasakarakoti@gmail.com
  • Identify yourself: SWOCS4

Closes: #411

Describe the add-ons or changes you've made 📃

The code implements sentiment analysis on Twitter data using various machine learning models, including Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Logistic Regression, Naive Bayes, and Random Forest. The dataset is loaded, explored, cleaned, and preprocessed before being used to train and evaluate the models.

Key points:

-> Data Exploration and Cleaning:
The initial exploration revealed the distribution of sentiment labels in the Twitter dataset.
Data cleaning involved removing duplicates, links, special characters, and stopwords.
The data was then balanced to address class imbalance.
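
A minimal sketch of the cleaning and balancing steps described above, using only the standard library. The stopword list, the function names, and the choice of random undersampling are assumptions made for illustration; the description does not say which stopword list or balancing method the notebook uses.

```python
import random
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption); a real run would use a fuller list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "it", "this"}


def clean_tweet(text):
    """Lowercase a tweet and strip links, special characters, and stopwords."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text.lower())  # remove links
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace
    return " ".join(word for word in text.split() if word not in STOPWORDS)


def drop_duplicates(rows):
    """Keep the first occurrence of each (text, label) pair, preserving order."""
    return list(dict.fromkeys(rows))


def undersample(rows, seed=0):
    """Balance classes by randomly sampling every class down to the smallest one."""
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    return [row for group in by_label.values() for row in rng.sample(group, smallest)]
```

Undersampling discards rows from the larger classes; oversampling the smaller classes is an alternative when the dataset is small.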

-> Text Data Preprocessing:
Tokenization and lemmatization were applied to convert text data into a suitable format for machine learning models.
Word clouds were used to visualize the most frequent words before and after cleaning.
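
As a rough illustration of the tokenization step and of the word frequencies that a word cloud visualizes, here is a standard-library sketch. The function names and the regular expression are assumptions. Lemmatization is left out because it needs an external lexicon; NLTK's `WordNetLemmatizer` is one common choice, though whether the notebook uses it is not stated here.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(texts, k=10):
    """Return the k most frequent tokens across all texts as (word, count) pairs."""
    counts = Counter(token for text in texts for token in tokenize(text))
    return counts.most_common(k)
```

Comparing `top_words` on the raw and the cleaned tweets gives the same before/after view as the two word clouds, in numeric form.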

-> Model Training and Evaluation:
The code implemented models such as CNN, LSTM, Logistic Regression, Naive Bayes, and Random Forest for sentiment analysis.
Training and testing sets were created, and models were trained on the training set and evaluated on the testing set.
Various metrics, including accuracy, recall, precision, and F1 score, were calculated to assess model performance.
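
For reference, the four metrics can be computed from true and predicted labels as in the sketch below. Macro averaging across classes is an assumption; the notebook may use a different averaging mode or a library such as scikit-learn. The label names in the usage note are illustrative only.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For example, `classification_metrics(["positive", "positive", "negative", "negative"], ["positive", "negative", "negative", "negative"])` gives an accuracy of 0.75.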

-> Visualization:
Visualizations, such as bar charts, pie charts, word clouds, and ROC curves, were used to gain insights into data distribution, model performance, and the impact of data preprocessing.
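
The points behind a ROC curve can be derived from predicted scores as sketched below. This assumes binary labels (1 for the class of interest, 0 otherwise), so a multi-class sentiment task would need a one-vs-rest treatment per class; the function names are illustrative and not taken from the notebook.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points for 0/1 labels."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example of a tied score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

The returned points can be passed directly to a plotting library, with the false positive rate on the x-axis and the true positive rate on the y-axis.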

-> User Interaction:
The code includes an interactive function allowing users to input sentences for sentiment prediction, demonstrating the practical use of the trained models.
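
The shape of such an interactive function might look like the sketch below. `toy_classifier` is a hypothetical keyword-based stand-in for a trained model, and `run_prompt_loop` is an illustrative name; neither is taken from the notebook.

```python
POSITIVE = {"good", "great", "love", "helpful"}
NEGATIVE = {"bad", "hate", "terrible", "useless"}


def toy_classifier(sentence):
    """Hypothetical stand-in for a trained model: counts keyword hits."""
    words = sentence.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def run_prompt_loop(classify, read=input, write=print):
    """Ask for sentences until a blank line is entered, printing each prediction."""
    while True:
        sentence = read("Enter a sentence (blank to quit): ").strip()
        if not sentence:
            break
        write(f"Predicted sentiment: {classify(sentence)}")
```

Calling `run_prompt_loop(toy_classifier)` starts the prompt. Passing `read` and `write` as parameters keeps the loop testable without a terminal, and a trained model's prediction function can replace `toy_classifier`.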

-> Model Comparison:
By training and evaluating multiple models, the code provides a comparative analysis of their performance on sentiment analysis tasks.

Type of change ☑️

What sort of change have you made:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Code style update (formatting, local variables)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested? ⚙️

Describe how it has been tested
Describe how have you verified the changes made

Checklist: ☑️

  • My code follows the guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly wherever it was hard to understand.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have added things that prove my fix is effective or that my feature works.
  • Any dependent changes have been merged and published in downstream modules.


Our team will soon review your PR. Thanks @JIGYASAKARAKOTI :)

Owner

abhisheks008 left a comment

  1. CNN and LSTM look good to me, but replace the other machine learning models with deep learning models such as MobileNet, VGG and so on.
  2. In the Dataset folder, rename the file from Dataset Link to README.md and put the dataset link there.
  3. Add the EDA results/outputs to the README.md file along with the accuracy metrics generated from the project.

Contributor Author

JIGYASAKARAKOTI commented Jan 12, 2024

Can I use the VADER model and Twitter-roBERTa-base?

@abhisheks008
Owner

Can I use the VADER model and Twitter-roBERTa-base?

Yes, you can.

Owner

abhisheks008 left a comment

Approved.
@JIGYASAKARAKOTI

abhisheks008 added the labels Status: Approved (Approved PR by the PA.), Level: MEDIUM, SWOC S4 (Issues under Social Winter of Code, 2025), and Points Updated, and removed the label Status: Requested Changes (Changes requested.) on Jan 17, 2024
abhisheks008 merged commit 2b67a1d into abhisheks008:main on Jan 17, 2024
Development

Successfully merging this pull request may close these issues.

ChatGPT Sentiment Analysis
2 participants