- This task was performed as an academic assignment on NLP.
- Performed sentiment analysis on a Twitter dataset.
- The data was extracted from the Twitter API using the tweepy module.
- The tweets are divided into three classes (positive, negative, and neutral) based on their sentiment analyzer scores.
- These scores were calculated using the TextBlob module.
- Performed EDA, visualization, and analysis on the tweets.
- Imported the necessary libraries.
- Used the Twitter API to extract tweets.
- Defined functions that clean the tweets and assign a sentiment to each tweet.
- The user is asked to enter a search keyword and the number of tweets to fetch.
- Stored the tweets in a tweet list.
- Stored the positive, negative, and neutral tweets in separate lists.
- Next, I converted these lists into separate dataframes.
- Plotted a pie chart of the tweets by sentiment.
- The tweets dataframe now contains the tweet and its sentiment as separate columns.
- Added the positive, neutral, negative, and compound scores of each tweet as separate columns in the dataframe.
- Created a function for word clouds and displayed word clouds for all tweets, positive tweets, negative tweets, and neutral tweets.
- Added the polarity and subjectivity of each tweet as separate columns in the dataframe.
- Added the word count and tweet length of each tweet as columns.
- Next, I removed punctuation and stopwords from the tweets, tokenized and stemmed them, and stored the result of each step as a separate column in the dataframe.
- Next, I cleaned the tweets and applied CountVectorizer to the cleaned tweets.
- Made a separate dataframe for the count-vectorized tweets and displayed the most used words.
- Made a function to convert the tweets to n-grams.
- Displayed the bigrams and trigrams of the tweets.
- Next, I split the data into training and testing sets, trained a logistic regression classifier to predict tweet sentiment, and displayed its classification report.
- To get data from the Twitter API, you have to create a Twitter account and then fill out a form describing how you will use the data.
- You then get access to the consumer key, consumer secret, access token, and access token secret.
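
The cleaning, labelling, and n-gram steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function names (`clean_tweet`, `label_from_polarity`, `ngrams`) are hypothetical, the exact cleaning rules are assumed, and splitting the classes at a polarity of zero is an assumption about how the sentiment scores map to labels.

```python
import re
import string


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, @mentions, '#' signs, and punctuation."""
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = text.replace("#", " ")              # keep the hashtag word, drop '#'
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.lower().split())      # collapse repeated whitespace


def label_from_polarity(polarity):
    """Map a polarity score to a class label (assumed threshold: zero)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def ngrams(text, n):
    """Return the list of word n-grams in a whitespace-tokenized text."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    raw = "Loving the new #Python release! https://t.co/abc @guido"
    cleaned = clean_tweet(raw)
    print(cleaned)              # loving the new python release
    print(ngrams(cleaned, 2))   # bigrams
    print(ngrams(cleaned, 3))   # trigrams
    print(label_from_polarity(0.5))
```

In the project itself the polarity value would come from TextBlob, for example `TextBlob(cleaned).sentiment.polarity`, which returns a float between -1.0 and 1.0 that can be passed to `label_from_polarity`.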