All the libraries needed to run the code are already included in the Anaconda distribution of Python, except:
- WordCloud, which can be installed from a notebook cell with !pip install wordcloud (or with pip install wordcloud from a terminal).
This script was written using Python version 3.*.
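A quick way to confirm the environment is ready before running the notebook is to check whether the package can be imported. The snippet below is a minimal sketch that uses only the standard library; the helper name is_installed is an illustrative assumption and is not part of the notebook.

```python
import importlib.util
import sys


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found in the current environment."""
    # find_spec returns None when no importable module with that name exists.
    return importlib.util.find_spec(package_name) is not None


# "wordcloud" is the import name of the WordCloud package mentioned above.
if not is_installed("wordcloud"):
    # Suggest installing with the same interpreter that runs this code.
    print(f"wordcloud is missing; try: {sys.executable} -m pip install wordcloud")
```

Running pip through sys.executable helps ensure the package lands in the same environment that the notebook kernel uses.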
Since we all receive spam messages in our email accounts or on our cell phones, I wanted to explore text messages to understand why spam filters are still not perfect, even with the advances in technology. Are there significant differences between spam and not spam messages? Can we use what we learn while identifying these differences to improve the models?
- Notebook - Jupyter Notebook with the script developed for classifying spam text messages.
- CSV file - a copy of the dataset containing text messages previously classified as spam or not spam.
As a result, several models were built using different approaches in order to improve performance in classifying spam messages, and some new features were created based on the differences found between spam and not spam messages.
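To give a sense of what a hand-crafted feature can look like, here is a minimal sketch. The specific features (message length, digit count, non-word character count) and the function name extract_features are assumptions chosen for illustration; they are not necessarily the features used in the notebook.

```python
import re


def extract_features(message: str) -> dict:
    """Compute a few simple, hypothetical features from one text message."""
    return {
        # Total number of characters in the message.
        "length": len(message),
        # Number of digit characters (phone numbers and prices contain many).
        "digit_count": len(re.findall(r"\d", message)),
        # Number of characters that are not letters, digits, or underscores;
        # note that this count includes whitespace.
        "non_word_count": len(re.findall(r"\W", message)),
    }


# Made-up example messages, not taken from the dataset.
print(extract_features("Call 0800 now!"))
print(extract_features("See you at lunch"))
```

Features like these could be appended to a bag-of-words representation before training a classifier, which is one possible way to combine them with the text itself.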
Credit goes to the University of Michigan for making the data available through its Applied Text Mining in Python course on the Coursera platform.