The main objective of this project is the classification of patient-provider communication sequences using deep learning and probabilistic models. There are three subprojects under the main repository:
- deeplearning
- hmm-python
- mchain-python

The following models are implemented:

- Markov Chain
  - 1st order
  - 2nd order
- Hidden Markov Model (HMM)
- Recurrent Neural Network (RNN)
  - Long Short-Term Memory (LSTM)
    - Vanilla LSTM
    - LSTM with minimized loss at each step
  - Gated Recurrent Unit (GRU)
    - Vanilla GRU
    - GRU with minimized loss at each step
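To make the probabilistic baseline concrete, here is a minimal sketch of a first-order Markov chain classifier for coded communication sequences: one chain is trained per outcome class, and a sequence is assigned to the class whose chain gives it the highest likelihood. All names (`FirstOrderMarkovChain`, `classify`, the `"^"` start symbol) are illustrative assumptions, not identifiers from this repository.

```python
# Illustrative first-order Markov chain classifier (not the repo's code).
from collections import defaultdict
import math


class FirstOrderMarkovChain:
    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        # counts[prev][code] = number of observed prev -> code transitions
        self.counts = defaultdict(lambda: defaultdict(float))
        self.states = set()

    def fit(self, sequences):
        # Count transitions; "^" marks the start of a sequence.
        for seq in sequences:
            prev = "^"
            for code in seq:
                self.counts[prev][code] += 1
                self.states.add(code)
                prev = code

    def log_likelihood(self, seq):
        # Additive (Laplace) smoothing over the observed state vocabulary.
        v = max(len(self.states), 1)
        ll, prev = 0.0, "^"
        for code in seq:
            row = self.counts[prev]
            total = sum(row.values())
            prob = (row[code] + self.smoothing) / (total + self.smoothing * v)
            ll += math.log(prob)
            prev = code
        return ll


def classify(seq, models):
    # Pick the class whose chain assigns the sequence the highest likelihood.
    return max(models, key=lambda label: models[label].log_likelihood(seq))
```

A second-order chain works the same way but conditions each code on the previous two codes instead of one.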
- When undersampling is used for data balancing (number of hidden units = 15, embedding dimension = 44, batch size = 8, learning rate = 0.00005, minimum change = 0.0010):
Method | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
Markov Chain 1st Order | 0.7044 | 0.7060 | 0.7044 | 0.7038 |
Markov Chain 2nd Order | 0.6385 | 0.6395 | 0.6385 | 0.6379 |
HMM | 0.6143 | 0.6244 | 0.6143 | 0.6067 |
LSTM | 0.8626 | 0.8672 | 0.8626 | 0.8622 |
LSTM-TR | 0.8681 | 0.8733 | 0.8681 | 0.8677 |
GRU | 0.8648 | 0.8674 | 0.8648 | 0.8646 |
GRU-TR | 0.8676 | 0.8705 | 0.8676 | 0.8673 |
- When oversampling is used for data balancing (number of hidden units = 15, embedding dimension = 44, batch size = 55, learning rate = 0.00010, minimum change = 0.0005):
Method | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
Markov Chain 1st Order | 0.7799 | 0.7932 | 0.7799 | 0.7775 |
Markov Chain 2nd Order | 0.7029 | 0.7111 | 0.7029 | 0.7000 |
HMM | 0.7567 | 0.7775 | 0.7567 | 0.7520 |
LSTM | 0.8372 | 0.8411 | 0.8372 | 0.8368 |
LSTM-TR | 0.8385 | 0.8424 | 0.8385 | 0.8381 |
GRU | 0.8342 | 0.8379 | 0.8342 | 0.8337 |
GRU-TR | 0.8377 | 0.8412 | 0.8377 | 0.8373 |
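In the tables above, Recall equals Accuracy in every row, which is characteristic of weighted averaging over classes. Assuming scikit-learn-style weighted metrics (an assumption; the tables do not state the averaging scheme), the columns can be reproduced as in this sketch with made-up labels:

```python
# Illustrative computation of weighted metrics with scikit-learn.
# y_true / y_pred are toy labels, not the project's data.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]

acc = accuracy_score(y_true, y_pred)
# average="weighted" weights each class's metric by its support,
# which makes weighted recall coincide with accuracy.
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
```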
Download or clone the project from our GitHub repository. Several Python packages are required to run our models; see the Prerequisites section for details, and see the Running the tests section for notes on how to run the project on your local system.
The following Python packages are required to run our program:
- keras - The Python deep learning library. Keras uses the following dependency:
  - TensorFlow - used as the Keras backend.
- hmmlearn - Simple algorithms and models for learning Hidden Markov Models (HMMs) in Python.
- imblearn - API for over- and under-sampling.
- scikit-learn - A set of Python modules for machine learning and data mining.
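The packages above can typically be installed with pip. The exact versions we used are not pinned here, so treat this as a best-effort install line (note that imblearn is published on PyPI as imbalanced-learn):

```shell
# Best-effort install of the prerequisites; pin versions for reproducibility.
pip install tensorflow keras hmmlearn imbalanced-learn scikit-learn
```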
To run the test program, run the Python file whose name begins with the prefix obesity; you will find that file in the folder of each corresponding subproject directory. Several command-line parameters can be passed to the program; however, we set a default value for each of them.
For example, to run the HMM with oversampling and 10-fold cross-validation, go to the hmm-python directory and run the following command:
python obesity-hmm.py --sampling 'over' --folds 10
Similarly, for the deep learning methods:
python obesity-gru-lstm.py --sampling 'over' --folds 10 --min_change 0.0015
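The parameters shown in the commands above can be handled with a standard argparse setup. This is a minimal sketch; the default values below are assumptions for illustration, not the defaults set in the repository:

```python
# Hypothetical parser for the command-line flags used above
# (--sampling, --folds, --min_change); defaults are illustrative only.
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Run a sequence classifier.")
    parser.add_argument("--sampling", choices=["over", "under"], default="over",
                        help="data-balancing strategy")
    parser.add_argument("--folds", type=int, default=10,
                        help="number of cross-validation folds")
    parser.add_argument("--min_change", type=float, default=0.0005,
                        help="minimum loss change before training stops")
    return parser
```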
N.B.: the data folder is only given in the split-data folder. You need to create your own data sets (over- and under-sampled) for the execution of each subproject.
- Mehedi Hasan - Department of Computer Science at Wayne State University
- Alexander Kotov - Department of Computer Science at Wayne State University
Please cite the following paper if you use our code or ideas in your work:
Mehedi Hasan, Alexander Kotov, April Idalski Carcone, Ming Dong, Sylvie Naar, "Predicting the Outcome of Patient-Provider Communication Sequences using Recurrent Neural Networks and Probabilistic Models", In Proceedings of the 2018 Informatics Summit of the American Medical Informatics Association (AMIA Summit 2018).
- We would like to thank the student assistants in the department of Family Medicine and Public Health Sciences at Wayne State University School of Medicine for their help with transcribing the recordings of motivational interviews.