Official PyTorch implementation of the following paper:
Audio classification with Dilated Convolution with Learnable Spacings.
by Ismail Khalfaoui-Hassani, Timothée Masquelier and Thomas Pellegrini.
- AudioSet-2M dataset
- AudioSet-2M Training Code
- Pre-trained models on AudioSet-2M
Model @ 128x1001 | Kernel Size / Count | Method | # Parameters | mAP | Throughput (samples/s) | Checkpoint |
---|---|---|---|---|---|---|
ConvFormer-S18† | 7x7 / 49 | Depth. Conv. | 26.8M | 43.14 ± 0.03 | 513.3 | model |
ConvFormer-S18† | 23x23 / 26 | DCLS-Gauss | 26.8M | 43.68 ± 0.02 | 396.8 | model |
FastViT-SA24‡ | 7x7 / 49 | Depth. Conv. | 21.5M | 43.82 ± 0.05 | 633.6 | model |
FastViT-SA24‡ | 23x23 / 26 | DCLS-Gauss | 21.5M | 44.4 ± 0.07 | 551.7 | model |
ConvNeXt-T | 7x7 / 49 | Depth. Conv. | 28.6M | 44.83 ± 0.14 | 591.4 | model |
ConvNeXt-T | 23x23 / 26 | DCLS-Gauss | 28.6M | 45.52 ± 0.05 | 509.4 | model |
† Trained with the LAMB optimizer. ‡ No ImageNet pretraining.
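The "Kernel Size / Count" column contrasts a dense 7x7 depthwise kernel (49 weights) with a DCLS-Gauss kernel that places 26 weights inside a 23x23 grid. As a rough illustration of how such a kernel can be assembled, the NumPy sketch below places each weight at a continuous position, spreads it with a normalized 2-D Gaussian, and sums the contributions into a dense kernel. This is a simplified forward-pass sketch, not this repository's implementation. The helper name `dcls_gauss_kernel_2d`, the use of absolute grid coordinates for positions, per-element standard deviations, and the normalization epsilon are all assumptions made for illustration.

```python
import numpy as np

def dcls_gauss_kernel_2d(weights, pos_y, pos_x, sigma_y, sigma_x, size=23, eps=1e-7):
    """Hypothetical helper: build a dense size x size kernel from m weights.

    Assumption: positions are absolute grid coordinates (0 = top-left cell),
    and each element has its own standard deviation along each axis.
    All inputs are 1-D arrays of length m (the kernel count).
    """
    grid = np.arange(size, dtype=np.float64)
    # 1-D Gaussian profile of every element along each axis: shape (m, size)
    gy = np.exp(-0.5 * ((grid[None, :] - pos_y[:, None]) / sigma_y[:, None]) ** 2)
    gx = np.exp(-0.5 * ((grid[None, :] - pos_x[:, None]) / sigma_x[:, None]) ** 2)
    # Separable 2-D Gaussian per element: shape (m, size, size)
    g = gy[:, :, None] * gx[:, None, :]
    # Normalize each Gaussian so it sums to ~1 over the grid
    g = g / (g.sum(axis=(1, 2), keepdims=True) + eps)
    # Weighted sum of the m interpolated elements gives the dense kernel
    return np.tensordot(weights, g, axes=1)

# Usage: 26 random weights at random continuous positions in a 23x23 grid
rng = np.random.default_rng(0)
m, size = 26, 23
weights = rng.standard_normal(m)
pos_y = rng.uniform(0, size - 1, m)
pos_x = rng.uniform(0, size - 1, m)
sigma = np.full(m, 0.5)  # assumed initial standard deviation
kernel = dcls_gauss_kernel_2d(weights, pos_y, pos_x, sigma, sigma, size)
print(kernel.shape)  # (23, 23)
```

Because each Gaussian is normalized, the entries of the dense kernel sum to approximately the sum of the 26 weights, and as the standard deviations shrink each weight concentrates on its nearest grid cell. The resulting 23x23 array has the shape of an ordinary depthwise kernel, which is why a DCLS layer can keep the same weight budget per channel while covering a much larger receptive field.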
Please check INSTALL.md for installation instructions.
Below are example evaluation commands for a FastViT-DCLS-AUDIO-SA24 model pre-trained on AudioSet-2M:
Single-GPU:
```bash
python main.py --model fastvit_dcls_audio_sa24 --eval true \
--resume https://zenodo.org/record/8370979/files/fastvit_dcls_audio_sa24.pth \
--drop_path 0.4 \
--data_path /path/to/AudioSet-2M/hdf5s/waveforms/
```
Multi-GPU:
```bash
python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model fastvit_dcls_audio_sa24 --eval true \
--resume https://zenodo.org/record/8370979/files/fastvit_dcls_audio_sa24.pth \
--drop_path 0.4 \
--data_path /path/to/AudioSet-2M/hdf5s/waveforms/
```
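The commands point `--data_path` at raw waveforms, while the results table lists models at a 128x1001 input. The arithmetic below shows one way a 10-second AudioSet clip could map to 1001 spectrogram frames. It assumes 32 kHz audio, a hop length of 320 samples, centered framing (the `torch.stft` convention with `center=True`), and that 128 is the number of mel bands; these values are illustrative assumptions, not settings read from this repository, and `num_stft_frames` is a hypothetical helper.

```python
def num_stft_frames(num_samples: int, hop_length: int, center: bool = True, n_fft: int = 1024) -> int:
    """Hypothetical helper: number of frames produced by a short-time Fourier transform.

    With center=True the signal is padded by n_fft // 2 on each side, which for an
    even n_fft gives 1 + num_samples // hop_length frames (torch.stft convention).
    With center=False only full windows of n_fft samples are counted.
    """
    if center:
        return 1 + num_samples // hop_length
    return 1 + (num_samples - n_fft) // hop_length

# Assumed preprocessing (illustrative, not read from this repository)
SAMPLE_RATE = 32_000  # Hz
CLIP_SECONDS = 10     # AudioSet segments are 10-second excerpts
HOP_LENGTH = 320      # samples, i.e. 10 ms at 32 kHz
N_MELS = 128          # assumed number of mel bands

frames = num_stft_frames(SAMPLE_RATE * CLIP_SECONDS, HOP_LENGTH)
print(f"input shape: {N_MELS}x{frames}")  # input shape: 128x1001
```

The extra frame comes from the centered padding: 320,000 samples divided by a 320-sample hop gives 1000 steps, plus one for the initial frame.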
See TRAINING.md for training and fine-tuning instructions.
This repository was inspired by ConvNeXt.
This project is released under the MIT license. Please see the LICENSE file for more information.
If you find this repository helpful, please consider citing:
```bibtex
@article{khalfaoui2023audio,
  title={Audio classification with Dilated Convolution with Learnable Spacings},
  author={Khalfaoui-Hassani, Ismail and Masquelier, Timoth{\'e}e and Pellegrini, Thomas},
  journal={arXiv preprint arXiv:2309.13972},
  year={2023}
}
```