This project is a multi-label text classifier built with a Random Forest Classifier wrapped in MultiOutputClassifier from Scikit-learn.
The dataset consists of disaster messages labeled with 36 different classes. Because this is a multi-label problem, the model assigns an input message to one or more of these classes.
A Web Application was developed, allowing you to analyze the dataset and write your own message to be classified.
The dataset is highly imbalanced, with a different distribution for each class. To mitigate this, a class-weighted approach was used: the classifier is made aware of the imbalance by incorporating class weights into the cost function.
In the Random Forest model, the parameter class_weight was set to 'balanced', which uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, computed as n_samples / (n_classes * np.bincount(y)).
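The class-weighted setup described above can be sketched as follows. This is a minimal illustration and not the code in train_classifier.py: the toy messages, the two label names, the TF-IDF feature step, and the hyperparameters are all assumptions made for the example. Only RandomForestClassifier with class_weight='balanced' inside MultiOutputClassifier comes from the description above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical toy data standing in for the disaster messages
# (2 labels here instead of the 36 in the real dataset).
messages = [
    "we need water and food",
    "water supply is cut off",
    "the road is blocked by debris",
    "please send food to the shelter",
    "bridge collapsed near the village",
    "thank you for the update",
]
# Columns: [aid_related, infrastructure_related] (hypothetical label names).
Y = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [1, 0],
    [0, 1],
    [0, 0],
])

# What 'balanced' computes for one label column:
# weight_c = n_samples / (n_classes * count_c)
second_label = Y[:, 1]  # four 0s and two 1s
weights = compute_class_weight(
    "balanced", classes=np.array([0, 1]), y=second_label
)
print(dict(zip([0, 1], weights)))  # the rarer class 1 gets the larger weight

# One Random Forest per label, each with balanced class weights.
# The TF-IDF step is an assumed feature extractor for this sketch.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(
        RandomForestClassifier(
            n_estimators=50, class_weight="balanced", random_state=0
        )
    )),
])
model.fit(messages, Y)

# Predictions have one column per label.
pred = model.predict(["we need water"])
print(pred.shape)
```

MultiOutputClassifier fits an independent estimator per label column, so the 'balanced' weights are computed separately for each of the labels from that label's own class frequencies.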
- data/ : ETL folder. Data preparation. To load the data from scratch:
python process_data.py disaster_messages.csv disaster_categories.csv DisasterResponse.db
- models/ : Machine Learning models. To train the model:
python train_classifier.py ../data/DisasterResponse.db classifier.pkl
- app/ : Contains the scripts for the web application. To run the application, go into the app/ folder and run the command:
python run.py
.
├── LICENSE
├── README.md
├── app
│   ├── run.py                       # Flask file that runs app
│   └── templates
│       ├── go.html                  # classification result page of web app
│       └── master.html              # main page of web app
├── data
│   ├── DisasterResponse.db          # database to save clean data to
│   ├── disaster_categories.csv      # data to process
│   ├── disaster_messages.csv        # data to process
│   └── process_data.py
├── models
│   ├── classifier.pkl               # saved model
│   └── train_classifier.py
└── requirements.txt
To install the required packages:
pip install -r requirements.txt
Other model architectures were also explored. A solution to the same problem using an RNN with Keras is available in this other GitHub repo: Multi-Label Text classification problem with Keras
Special thanks to Figure Eight for the dataset.