This repository comprises:
- Pfizer Tweet Data Understanding
- Sentiment Analysis through VADER & TextBlob
- Topic Modelling through LDA
- Dominant Topic Analysis
- Topic Cluster Visualization
Source of Data: Kaggle
Conclusions from the data:
- 'Vaccine', 'covid', 'vaccination', 'people' & 'first' are the most commonly used words in the Tweets
- Top hashtags are 'PfizerBioNTech', 'COVID19' & 'CovidVaccine'
- 9.9% of the Tweets are from verified Twitter users
- TextBlob classified 51.2% of Tweets as Neutral sentiment, while VADER classified 39.8% of Tweets as Neutral sentiment
- TextBlob classified 39.8% of Tweets as Positive sentiment, while VADER classified 51.2% of Tweets as Positive sentiment
- Both TextBlob & VADER classified 9.0% Tweets as Negative sentiment
- According to TextBlob, Neutral sentiment Tweets receive the most Favorites, while according to VADER, Positive sentiment Tweets receive the most Favorites
- Both TextBlob & VADER show that Neutral sentiment Tweets receive the most Retweets
- Analysing the Topic Clusters:
  - Cluster-1 has a positive outlook with terms such as 'grateful', 'good', 'thanks' & 'received'
  - Cluster-2 has a concerned outlook with terms such as 'emergency', 'injection', 'sore' & 'health'
  - Cluster-3 has a negative outlook with terms such as 'ban', 'red', 'death', 'mutation', 'protect' & 'please'
  - By Tweet count, the clusters rank Cluster-1, Cluster-2, Cluster-3 (descending order)
  - The percentage of Positive sentiment Tweets is highest in Cluster-1, consistent with the positive outlook of its topic terms
  - In Cluster-2, Neutral sentiment Tweets are the most numerous
  - Cluster-3 has the highest count of Negative sentiment Tweets among the three Topic Clusters, with 207 Tweets
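
The sentiment shares and per-cluster counts above depend on how raw scores are turned into labels and how each Tweet is assigned a cluster. The sketch below is one plausible way to do both, not necessarily what the notebooks in this repository do. It assumes the commonly used VADER cut-offs of ±0.05 on the compound score, treats a TextBlob polarity of exactly 0 as Neutral, and takes the dominant topic to be the highest-probability topic in a Tweet's LDA distribution. The function names and the sample rows are hypothetical, made up purely for illustration.

```python
from collections import Counter


def vader_label(compound):
    """Label a VADER compound score; the +/-0.05 cut-offs are an assumption."""
    if compound >= 0.05:
        return "Positive"
    if compound <= -0.05:
        return "Negative"
    return "Neutral"


def textblob_label(polarity):
    """Label a TextBlob polarity score; treating exactly 0 as Neutral is an assumption."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


def dominant_topic(topic_probs):
    """Return the 1-based index of the highest-probability topic."""
    best = max(range(len(topic_probs)), key=lambda i: topic_probs[i])
    return best + 1


# Hypothetical rows: (VADER compound, TextBlob polarity, LDA topic distribution).
rows = [
    (0.62, 0.40, [0.7, 0.2, 0.1]),
    (0.03, 0.00, [0.2, 0.6, 0.2]),
    (-0.48, -0.25, [0.1, 0.2, 0.7]),
    (0.30, 0.00, [0.5, 0.3, 0.2]),
]

# Count Tweets per (tool, cluster, sentiment label).
counts = Counter()
for compound, polarity, topic_probs in rows:
    cluster = dominant_topic(topic_probs)
    counts[("VADER", cluster, vader_label(compound))] += 1
    counts[("TextBlob", cluster, textblob_label(polarity))] += 1

for key in sorted(counts):
    print(key, counts[key])
```

In practice the compound score would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` package and the polarity from `TextBlob(text).sentiment.polarity`. Because the two tools score text differently and use different Neutral bands, it is not surprising for them to disagree on the Positive and Neutral shares.
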
Topic Clusters Visualization