DanielDaCosta/disaster-webapp

Multi-Label Text Classifier Project

This project is a Multi-Label Text Classifier built with a Random Forest Classifier wrapped in MultiOutputClassifier from Scikit-learn.

The dataset consists of disaster messages that are classified into 36 different classes. The model aims to classify an input message into these different classes.

A Web Application was also developed, allowing you to explore the dataset and submit your own message to be classified.

Dataset

The dataset consists of disaster messages classified into 36 different classes. It is highly imbalanced, and the class distribution differs from one label to the next. To mitigate this, a class-weighted approach was used: the classifier is made aware of the imbalance by incorporating class weights into the cost function, so that errors on rare classes count for more during training.
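
To make the weighting concrete, here is a minimal sketch of the heuristic that Scikit-learn documents for class_weight='balanced', namely n_samples / (n_classes * count of each class). The helper name balanced_class_weights and the toy label column are illustrative assumptions, not code from this repository.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)} for a 1-D label list."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical label column for one category: 8 negatives, 2 positives.
y = [0] * 8 + [1] * 2
weights = balanced_class_weights(y)
print(weights)  # the rare class (1) receives the larger weight
```

With these toy labels the positive class is four times rarer than the negative class, so it receives four times the weight.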

In the Random Forest model, the class_weight parameter was set to 'balanced', which uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data.
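
The setup described above could be sketched roughly as follows. This is not the repository's train_classifier.py: the TF-IDF features, the toy messages, the two label columns, and the hyperparameters are all assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy messages with two hypothetical label columns (the real dataset has 36).
messages = [
    "we need water and food",
    "water supply is running out",
    "the bridge collapsed after the storm",
    "storm destroyed several houses",
    "please send food to the shelter",
    "roads are blocked by fallen trees",
]
# Columns (illustrative names): [needs_supplies, infrastructure_damage]
labels = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [1, 0],
    [0, 1],
])

# MultiOutputClassifier fits one Random Forest per label column;
# class_weight="balanced" reweights classes within each column.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),  # assumed feature extraction step
    ("clf", MultiOutputClassifier(
        RandomForestClassifier(
            n_estimators=50, class_weight="balanced", random_state=0
        )
    )),
])
model.fit(messages, labels)

prediction = model.predict(["we need food"])
print(prediction.shape)  # one row per message, one column per label
```

Because each label column gets its own forest, the balanced weights are computed per label, which suits a dataset where every class has a different distribution.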

Web Application

Message Classifier

Usage

  • data/ : ETL folder. Data preparation. To load the data from scratch:

python process_data.py disaster_messages.csv disaster_categories.csv DisasterResponse.db

  • models/ : Machine Learning models. To train the model:

python train_classifier.py ../data/DisasterResponse.db classifier.pkl

  • app/ : Contains the scripts for the web application. To run the application, go into the app/ folder and run the command:

python run.py
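
For readers unfamiliar with the data preparation step, the general idea of merging the two CSV inputs and saving the result to a SQLite database can be sketched with the standard library alone. This is an illustration only, not the contents of process_data.py: the column layout, the table name messages, and the in-memory database are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins for the two CSV inputs; the real column layout may differ.
messages_csv = "id,message\n1,we need water\n2,bridge collapsed\n"
categories_csv = "id,water,infrastructure\n1,1,0\n2,0,1\n"


def read_rows(text):
    """Parse CSV text into a dict of rows keyed by the id column."""
    return {row["id"]: row for row in csv.DictReader(io.StringIO(text))}


messages = read_rows(messages_csv)
categories = read_rows(categories_csv)

# Join the two sources on id, keeping only ids present in both.
merged = [
    (
        int(i),
        messages[i]["message"],
        int(categories[i]["water"]),
        int(categories[i]["infrastructure"]),
    )
    for i in messages
    if i in categories
]

# ":memory:" keeps the sketch self-contained; a file path would persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, message TEXT, water INTEGER, infrastructure INTEGER)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", merged)
conn.commit()

# A training script could then read the cleaned table back.
rows = conn.execute(
    "SELECT message, water, infrastructure FROM messages ORDER BY id"
).fetchall()
print(rows)
```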

File Structure

.
├── LICENSE
├── README.md
├── app
│   ├── run.py # Flask file that runs app
│   └── templates
│       ├── go.html # classification result page of web app
│       └── master.html # main page of web app
├── data
│   ├── DisasterResponse.db # database to save clean data to
│   ├── disaster_categories.csv # data to process
│   ├── disaster_messages.csv # data to process
│   └── process_data.py
├── models
│   ├── classifier.pkl # saved model 
│   └── train_classifier.py
└── requirements.txt

Installation

pip install -r requirements.txt

Development

Other model architectures were also explored. For a solution to the same problem using an RNN with Keras, see this other GitHub repo: Multi-Label Text classification problem with Keras

Acknowledgments and References

Special thanks to Figure Eight for the dataset.
