This is a re-implementation of the following paper:
- Kao Zhang, Zhenzhong Chen, Shan Liu. A Spatial-Temporal Recurrent Neural Network for Video Saliency Prediction. IEEE Transactions on Image Processing (TIP), vol. 30, pp. 572-587, 2021.
GitHub: https://github.com/zhangkao/IIP_STRNN_Saliency
Related Project
- Kao Zhang, Zhenzhong Chen. Video Saliency Prediction Based on Spatial-Temporal Two-Stream Network. IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 29, no. 12, pp. 3544-3557, 2019.
GitHub: https://github.com/zhangkao/IIP_TwoS_Saliency
The code was developed with Python 3.6+, PyTorch 1.1.0+, and CUDA 10.0+; other software versions may cause problems.
- Windows10/11 or Ubuntu20.04
- Anaconda (latest) with Python
- CUDA, CUDNN, and CUPY
You can create a new environment in Anaconda as follows.
* The environment used in our experiments (GTX 1080 and TITAN Xp):
conda create -n strnn python=3.6
conda activate strnn
conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=10.0 cudnn
conda install cupy==7.8.0
pip install opencv-python torchsummary hdf5storage h5py scipy scikit-image matplotlib
* For GeForce RTX 30-series GPUs, such as the RTX 3060, 3080, etc.:
conda create -n strnn python=3.8
conda activate strnn
conda install pytorch torchvision torchaudio cudatoolkit=11.3 cudnn
conda install cupy
pip install opencv-python torchsummary hdf5storage h5py scipy scikit-image matplotlib
* For this environment (cudatoolkit >= 11.0), please change the decorator in "correlation.py":
"@cupy.util.memoize(for_each_device=True)" --> "@cupy.memoize(for_each_device=True)"
Download the pre-trained models and put them into the "weights" folder.
- PWC-model OneDrive; Google Drive (36M)
- STRNN-model OneDrive; Google Drive (361M)
The parameters:
- Please change the working directory "dataDir" to your own path in the "Demo_Test.py" and "Demo_Train_DIEM.py" files (see the sketch after this list), for example:
  dataDir = 'E:/DataSet'
- More parameters are in the "train" and "test" functions.
- Run the demos "Demo_Test.py" and "Demo_Train_DIEM.py" to test or train the model.
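For reference, the "dataDir" change is a one-line edit near the top of the demo scripts; the sketch below only illustrates the idea (the environment-variable fallback STRNN_DATA_DIR is a hypothetical convenience, not part of the original scripts):

```python
import os

# Point dataDir at your local dataset root before running the demos.
# STRNN_DATA_DIR is a hypothetical fallback; the original scripts use a plain assignment.
dataDir = os.environ.get('STRNN_DATA_DIR', 'E:/DataSet')
assert os.path.isdir(dataDir), f'dataDir does not exist: {dataDir}'
```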
The full training process:
- Our model is trained on SALICON and part of the DIEM dataset, and then tested on the DIEM20, CITIUS-R, LEDOV41, SFU, and DHF1K benchmarks. In the SF-Net module, we retrain the St-Net on the SALICON dataset and fine-tune the PWC-Net of the OF-Net on the DIEM training set. Then we train the whole network on the DIEM training set while keeping the parameters of the trained PWC-Net fixed (see the sketch below).
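Keeping the fine-tuned PWC-Net fixed during end-to-end training corresponds to the standard PyTorch pattern of disabling gradients for that submodule. The snippet below is a minimal sketch only; the attribute path model.of_net.pwcnet is an assumption, not the exact code of this repository:

```python
import torch

def freeze_pwcnet(model):
    # Keep the fine-tuned PWC-Net weights fixed while the rest of STRNN is trained.
    for p in model.of_net.pwcnet.parameters():   # hypothetical attribute path
        p.requires_grad = False
    model.of_net.pwcnet.eval()                   # also freeze BatchNorm statistics, if any

# Only the remaining trainable parameters are handed to the optimizer, e.g.:
# optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)
```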
The training and testing datasets:
The training and test data examples:
It is easy to change the output format in our code.
- The results of the video task are saved in ".mat" (uint8) format.
- You can obtain color visualization results with the "Visualization Tools".
- You can evaluate the performance with the "EvalScores Tools".
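If you want to inspect a result file directly, a saliency map stored as a uint8 array in a ".mat" file can be loaded and color-coded as sketched below. The file name and the variable key 'salmap' are assumptions; check the actual files or the "Visualization Tools" for the exact layout.

```python
import cv2
import hdf5storage

# Load one result file (the key 'salmap' is an assumption; inspect mat.keys() for the real one).
mat = hdf5storage.loadmat('result_video001.mat')
salmaps = mat['salmap']                               # e.g. uint8 array, one map per frame

frame0 = salmaps[..., 0] if salmaps.ndim == 3 else salmaps   # assumes (H, W, T) layout
color = cv2.applyColorMap(frame0, cv2.COLORMAP_JET)          # simple color visualization
cv2.imwrite('frame0_color.png', color)
```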
Results: ALL (6.2G): DIEM20 (261M), CITIUS-R (54M), LEDOV41 (839M), SFU (39M)
Results for DHF1K:
We use the first 300 frames of each video from the DHF1K training set to retrain the model and generate the new results.
- strnn_res_dhf1k_test: OneDrive; Google Drive (3.94G)
- strnn_res_dhf1k_val: OneDrive (1.09G)
If you use the STRNN video saliency model, please cite the following paper:
@article{zhang2020spatial,
title={A spatial-temporal recurrent neural network for video saliency prediction},
author={Zhang, Kao and Chen, Zhenzhong and Liu, Shan},
journal={IEEE Transactions on Image Processing},
volume={30},
pages={572--587},
year={2021}
}
Kao ZHANG
Laboratory of Intelligent Information Processing (LabIIP)
Wuhan University, Wuhan, China.
Email: zhangkao@whu.edu.cn
Zhenzhong CHEN (Professor and Director)
Laboratory of Intelligent Information Processing (LabIIP)
Wuhan University, Wuhan, China.
Email: zzchen@whu.edu.cn
Web: http://iip.whu.edu.cn/~zzchen/