In this project, I built a sentiment analysis model on IMDB movie reviews using PyTorch. The dataset consisted of 50,000 movie reviews, with 25,000 for training and 25,000 for testing. The reviews were labeled as either positive or negative, and the task was to predict the sentiment of a given review.
To run this project, you will need to install the following packages:
- Python
- PyTorch
- NumPy
- Pandas
- NLTK
The following text preprocessing techniques were applied to the data:
- Removing HTML tags and punctuation
- Converting all reviews to lowercase
- Removing stopwords
- Lemmatizing the reviews
- Tokenizing the reviews
- Padding or truncating the reviews to a fixed length
- Converting the tokens to integers
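The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the tiny `STOPWORDS` set, and the reserved `PAD`/`UNK` ids are all assumptions made for the example. The real pipeline presumably relies on NLTK for stopwords and lemmatization; here the lemmatizer is an optional callable so the sketch runs without downloading NLTK data, and it is applied after tokenizing for simplicity.

```python
import re
import string

# Hypothetical, deliberately tiny stopword set for illustration only;
# the project presumably uses a fuller list such as NLTK's.
STOPWORDS = {"a", "an", "the", "is", "was", "this", "it", "and", "of", "to"}
PAD, UNK = 0, 1  # assumed reserved ids for padding and unknown tokens

# Translation table that turns every punctuation character into a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_and_tokenize(review, lemmatize=None):
    """Strip HTML and punctuation, lowercase, drop stopwords, optionally lemmatize."""
    text = re.sub(r"<[^>]+>", " ", review)   # remove HTML tags
    text = text.lower().translate(_PUNCT_TABLE)
    tokens = [t for t in text.split() if t not in STOPWORDS]
    if lemmatize is not None:                 # e.g. an NLTK lemmatizer's method
        tokens = [lemmatize(t) for t in tokens]
    return tokens


def build_vocab(token_lists):
    """Map each distinct token to an integer id, starting after PAD and UNK."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, then truncate or right-pad to max_len."""
    ids = [vocab.get(t, UNK) for t in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))
```

In this sketch the vocabulary would be built from the training reviews only, so that words seen only at test time map to `UNK`.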
The model is a multi-layer LSTM network trained on the IMDB reviews to predict the sentiment of a given review. During training, the model was also evaluated on a validation set to check that it was not overfitting. The final evaluation was done on the test set, using accuracy as the metric. The model achieved an accuracy of approximately 79%.
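For readers unfamiliar with this kind of architecture, a multi-layer LSTM classifier in PyTorch might look like the sketch below. The class name `SentimentLSTM`, the helper `accuracy`, and every hyperparameter (embedding size, hidden size, two layers, dropout) are assumptions chosen for illustration; the actual values used in this project are not stated here.

```python
import torch
import torch.nn as nn


class SentimentLSTM(nn.Module):
    """Embedding -> stacked LSTM -> linear layer producing one logit per review."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128,
                 num_layers=2, dropout=0.3):
        super().__init__()
        # padding_idx=0 assumes id 0 is the padding token
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, dropout=dropout)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) tensor of integer token ids
        embedded = self.embedding(token_ids)
        _, (hidden, _) = self.lstm(embedded)
        # hidden[-1] is the final hidden state of the top LSTM layer
        return self.fc(hidden[-1]).squeeze(1)


def accuracy(logits, labels):
    """Fraction of reviews where the predicted class (logit > 0) matches the label."""
    preds = (logits > 0).long()
    return (preds == labels).float().mean().item()
```

A model like this outputs raw logits, so it would typically be trained with `nn.BCEWithLogitsLoss`, and a positive logit would be read as a POSITIVE prediction.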
Here are some predictions made by the model on sample sentences:
- This is a bad movie. NEGATIVE
- I am very happy today. POSITIVE
- I don't like this product it has a bad quality. NEGATIVE