A project to detect and classify abnormal video clips (fighting and falling down) using various neural network architectures (Long-term Recurrent Convolutional Network, 3-Dimensional Convolutional Network).
Data were collected from multiple sources:
- The falling dataset consists of footage shot from multiple angles in a lab by E. Auvinet et al. from Université de Montréal, as well as other falling-down footage found on various online sources.
- The fighting dataset is a combination of Assault and Fighting videos from the UCF-Crime Dataset by W. Sultani et al. from the University of Central Florida.
Each video clip is split into multiple subclips of 30 frames spanning 5 seconds, and the 5-second clips were annotated manually.
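For illustration, splitting a video into 30-frame subclips over 5-second windows could be done roughly as below (a minimal sketch with OpenCV; the function name and the even-sampling strategy are assumptions, not necessarily the exact preprocessing used here):

```python
import cv2
import numpy as np

def split_into_subclips(video_path, frames_per_clip=30, clip_seconds=5):
    """Yield subclips of `frames_per_clip` frames sampled evenly over
    consecutive `clip_seconds` windows of the video (illustrative only)."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30         # fall back to 30 fps if unknown
    window = int(round(fps * clip_seconds))       # frames in one 5-second window
    step = max(window // frames_per_clip, 1)      # keep every `step`-th frame

    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        if len(frames) == window:
            kept = frames[::step][:frames_per_clip]   # ~30 evenly spaced frames
            yield np.stack(kept)
            frames = []
    cap.release()
```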
- To be updated
For each frame in the video, 6 different feature representations were generated:
Note *: Extracted using the OpenPose application.
Note **: Background subtraction with the past 5 frames as history, using OpenCV's Gaussian Mixture-based Background/Foreground Segmentation algorithm.
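As a rough illustration of the second note, OpenCV's MOG2 background subtractor can be run with a 5-frame history along these lines (a sketch; the input file name and any parameters beyond `history=5` are assumptions):

```python
import cv2

# Gaussian Mixture-based background/foreground segmentation (MOG2),
# using only the past 5 frames as history, per the note above.
subtractor = cv2.createBackgroundSubtractorMOG2(history=5, detectShadows=False)

cap = cv2.VideoCapture("example_clip.mp4")  # hypothetical input clip
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)  # 0 = background, 255 = foreground
    # fg_mask can then be combined with the raw frame or keypoints
    # to build the other feature representations.
cap.release()
```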
- OS : Windows 10
- GPU : NVIDIA RTX 2070
- RAM : 16 GB
- Package Manager : Anaconda 4.8.0
- Processed Data with 6 different representations
- Annotations
- Extracted Features (optionally, you can run `extract_features.py` to obtain the same set of features)
- Pretrained Weights for C3D
- The scripts and yaml from this repository
Using the Anaconda prompt, change directory to your `project_folder` from the previous step.
Create the environment (change the `prefix` in the yml file to point to your Anaconda directory, if needed):
conda env create -f tf_gpu_115.yml
Activate the environment:
conda activate tf_gpu_115
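Before proceeding, you may want to confirm that TensorFlow can see the GPU inside the activated environment (the environment name suggests TensorFlow 1.15; adjust the check if your setup differs):

```python
import tensorflow as tf

print(tf.__version__)              # expected to be 1.15.x for tf_gpu_115
print(tf.test.is_gpu_available())  # True if the RTX 2070 is visible to TensorFlow
```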
Extract features from the `assault-fall-data` folder.
Run `python extract_features.py`. New folders `c3d`, `mobilenet`, and `resnet50v2` will be created, containing all the extracted features.
Note: Only run this if you have not downloaded the Extracted Features from the previous section.
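For intuition, extracting per-frame CNN features with a pretrained backbone could look roughly like the sketch below (assumptions: a Keras applications backbone with ImageNet weights, average pooling, 224×224 frames, and the output path shown; the actual `extract_features.py` may differ):

```python
import numpy as np
from tensorflow.keras.applications import MobileNet
from tensorflow.keras.applications.mobilenet import preprocess_input

# Pretrained MobileNet used as a fixed per-frame feature extractor.
# The resnet50v2 features would follow the same pattern with a ResNet50V2
# backbone, while the c3d features come from the separate pretrained C3D weights.
backbone = MobileNet(weights="imagenet", include_top=False, pooling="avg")

def extract_clip_features(frames):
    """frames: (30, 224, 224, 3) uint8 array for one 5-second subclip."""
    x = preprocess_input(frames.astype("float32"))
    return backbone.predict(x)   # (30, 1024) feature vectors, one per frame

# Example usage (paths are hypothetical):
# features = extract_clip_features(subclip)
# np.save("mobilenet/clip_0001.npy", features)
```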
Train the selected model using the selected features (image type), with N-fold cross validation repeated M times.
Run the script `model_training.py` with these flags (available options in `{}`):
--model {mobilenet,resnet50v2,c3d,all}
Select the model you want to build.
--folds {1,2,3,4,5,6,7,8,9,10}
Specify N, the number of folds for cross validation.
--runs {1,2,3,4,5,6,7,8,9,10}
Specify M, the number of runs. During each run, N-fold CV is done.
--imgtype {raw,hm,kp,hhb,rhb,hkb,all}
Specify the feature representation.
For example, to train all models with all image types using 5-fold cross validation repeated 10 times, run:
python model_training.py --model all --folds 5 --runs 10 --imgtype all
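Conceptually, the runs/folds combination corresponds to repeating an N-fold split M times, roughly as sketched below (the helper names and the use of scikit-learn's `KFold` are assumptions; the actual script may organize this differently):

```python
from sklearn.model_selection import KFold

def cross_validate(X, y, build_model, folds=5, runs=10):
    """Repeat N-fold cross validation `runs` times; one model per fold per run."""
    results = []
    for run in range(1, runs + 1):
        kf = KFold(n_splits=folds, shuffle=True, random_state=run)
        for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
            model = build_model()                   # fresh model for this fold
            model.fit(X[train_idx], y[train_idx])
            preds = model.predict(X[test_idx])      # test predictions per fold
            results.append((run, fold, test_idx, preds))
    return results
```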
A training results folder will be created for each model, with subfolders for each image type. The training metrics for each model (one model is created for each fold during each run) are saved as an image in the respective subfolder, which looks like this:
A prediction csv file is also generated, which consists of the test predictions of every model trained.
is_fight | is_fall | raw_run1_fold1_fight | raw_run1_fold1_fall | raw_run1_fold2_fight | raw_run1_fold2_fall |
---|---|---|---|---|---|
0 | 1 | 0.1323 | 0.6754 | 0.2351 | 0.1231 |
1 | 0 | 0.3245 | 0.1234 | 0.7231 | 0.3275 |
... | ... | ... | ... | ... | ... |
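As a usage example, the prediction csv can be loaded and scored against the ground-truth columns along these lines (the file name `predictions.csv` is an assumption; the column names follow the table above):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.read_csv("predictions.csv")  # hypothetical file name

# Average the per-model fight scores across all run/fold columns, then score
# them against the ground-truth is_fight labels.
fight_cols = [c for c in df.columns if c.endswith("_fight") and c != "is_fight"]
print("fight AUC:", roc_auc_score(df["is_fight"], df[fight_cols].mean(axis=1)))
```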
See the analysis notebook here.