
K-H-Ismail/DCLS-Audio



Official PyTorch implementation from the following paper:

Audio classification with Dilated Convolution with Learnable Spacings.
by Ismail Khalfaoui Hassani, Timothée Masquelier and Thomas Pellegrini.
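The "Kernel Size / Count" column in the results table compares DCLS-Gauss kernels (23x23 / 26) with dense depthwise kernels (7x7 / 49). The sketch below is a rough illustration of that column. It builds a dense 23x23 kernel from 26 weights whose positions are continuous, and spreads each weight over nearby grid cells with a normalized Gaussian, in the spirit of the Gaussian-interpolation variant of DCLS. It is a simplified pure-Python sketch, not the repository's PyTorch code. The function name `dcls_gauss_kernel`, the center-offset convention for positions and the fixed sigmas are all assumptions made for illustration.

```python
import math
import random

def dcls_gauss_kernel(weights, positions, sigmas, size):
    """Hypothetical sketch: scatter sparse weights into a dense size x size kernel.

    weights   -- N scalar weights (learned in DCLS)
    positions -- N (row, col) float offsets from the kernel center (learned in DCLS);
                 the center-offset convention is an assumption of this sketch
    sigmas    -- N (sigma_row, sigma_col) Gaussian widths (fixed here for simplicity)
    size      -- side length of the dense kernel, e.g. 23
    """
    center = (size - 1) / 2
    kernel = [[0.0] * size for _ in range(size)]
    for w, (pr, pc), (sr, sc) in zip(weights, positions, sigmas):
        # Unnormalized Gaussian bump centered on the element's continuous position.
        bump = [[math.exp(-((i - center - pr) ** 2) / (2 * sr ** 2)
                          - ((j - center - pc) ** 2) / (2 * sc ** 2))
                 for j in range(size)]
                for i in range(size)]
        # Normalize over the grid so each element contributes (almost exactly) w in total.
        total = sum(map(sum, bump)) + 1e-7
        for i in range(size):
            for j in range(size):
                kernel[i][j] += w * bump[i][j] / total
    return kernel

# 26 elements inside a 23x23 kernel, mirroring the "23x23 / 26" table entries.
random.seed(0)
n_elems, size = 26, 23
weights = [random.gauss(0.0, 1.0) for _ in range(n_elems)]
positions = [(random.uniform(-8, 8), random.uniform(-8, 8)) for _ in range(n_elems)]
sigmas = [(0.5, 0.5)] * n_elems
kernel = dcls_gauss_kernel(weights, positions, sigmas, size)
print(len(kernel), len(kernel[0]))  # 23 23
```

The receptive field grows to 23x23 while only 26 weights (plus their positions) are involved. Because the Gaussian interpolation is differentiable in the positions, a tensor-based implementation can learn where each element sits. This plain-list version only shows the forward construction.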


Catalog

  • AudioSet-2M dataset
  • AudioSet-2M Training Code
  • Pre-trained models on AudioSet-2M

Results and Pre-trained Models

AudioSet-2M trained models

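| Model @ 128x1001 | Kernel Size / Count | Method          | # Parameters | mAP          | Throughput (samples/s) | Download |
|------------------|---------------------|-----------------|--------------|--------------|------------------------|----------|
| ConvFormer-S18†  | 7x7 / 49            | Depthwise Conv. | 26.8M        | 43.14 ± 0.03 | 513.3                  | model    |
| ConvFormer-S18†  | 23x23 / 26          | DCLS-Gauss      | 26.8M        | 43.68 ± 0.02 | 396.8                  | model    |
| FastVIT-SA24‡    | 7x7 / 49            | Depthwise Conv. | 21.5M        | 43.82 ± 0.05 | 633.6                  | model    |
| FastVIT-SA24‡    | 23x23 / 26          | DCLS-Gauss      | 21.5M        | 44.4 ± 0.07  | 551.7                  | model    |
| ConvNeXt-T       | 7x7 / 49            | Depthwise Conv. | 28.6M        | 44.83 ± 0.14 | 591.4                  | model    |
| ConvNeXt-T       | 23x23 / 26          | DCLS-Gauss      | 28.6M        | 45.52 ± 0.05 | 509.4                  | model    |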

† Trained with the LAMB optimizer. ‡ No ImageNet pretraining.
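The short script below makes the accuracy/speed trade-off in the table concrete. For each backbone, it computes the mAP gain of DCLS-Gauss over the depthwise-convolution baseline and the relative drop in throughput. It uses only the mean values copied from the table and ignores the standard deviations. The helper name `tradeoff` is hypothetical.

```python
# (model, baseline mAP, DCLS-Gauss mAP, baseline throughput, DCLS-Gauss throughput),
# mean values copied from the results table above.
RESULTS = [
    ("ConvFormer-S18", 43.14, 43.68, 513.3, 396.8),
    ("FastVIT-SA24", 43.82, 44.40, 633.6, 551.7),
    ("ConvNeXt-T", 44.83, 45.52, 591.4, 509.4),
]

def tradeoff(base_map, dcls_map, base_tp, dcls_tp):
    """Return (mAP gain in points, relative throughput loss in percent)."""
    gain = dcls_map - base_map
    slowdown = 100.0 * (1.0 - dcls_tp / base_tp)
    return gain, slowdown

for name, bm, dm, bt, dt in RESULTS:
    gain, slowdown = tradeoff(bm, dm, bt, dt)
    print(f"{name}: +{gain:.2f} mAP, {slowdown:.1f}% lower throughput")
# ConvFormer-S18: +0.54 mAP, 22.7% lower throughput
# FastVIT-SA24: +0.58 mAP, 12.9% lower throughput
# ConvNeXt-T: +0.69 mAP, 13.9% lower throughput
```

On these three backbones, DCLS-Gauss gains roughly 0.5 to 0.7 mAP points at the cost of about 13% to 23% lower throughput, with the same parameter count.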

Installation

Please check INSTALL.md for installation instructions.

Evaluation

Example evaluation commands for the AudioSet-2M pre-trained FastVIT-SA24 model with DCLS (fastvit_dcls_audio_sa24):

Single-GPU

python main.py --model fastvit_dcls_audio_sa24 --eval true \
--resume https://zenodo.org/record/8370979/files/fastvit_dcls_audio_sa24.pth \
--drop_path 0.4 \
--data_path /path/to/AudioSet-2M/hdf5s/waveforms/ 

Multi-GPU

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model fastvit_dcls_audio_sa24 --eval true \
--resume https://zenodo.org/record/8370979/files/fastvit_dcls_audio_sa24.pth \
--drop_path 0.4 \
--data_path /path/to/AudioSet-2M/hdf5s/waveforms/
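The results table reports mAP. For multi-label tagging benchmarks such as AudioSet, this metric is conventionally the mean over classes of the per-class average precision (AP), reported as a percentage. The pure-Python sketch below assumes that definition. It is not taken from main.py, and the function names are hypothetical. When there are no tied scores, the per-class AP it computes (the mean of precision at the rank of each positive) matches the step-wise definition used by scikit-learn's `average_precision_score`.

```python
def average_precision(scores, labels):
    """AP for one class: mean of precision@k over the ranks k of the positive samples.

    Assumes no tied scores (ties would simply be broken by sort order here).
    Returns None if the class has no positive sample.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, precisions = 0, []
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else None

def mean_average_precision(score_matrix, label_matrix):
    """Macro mAP: average per-class AP over classes with at least one positive.

    score_matrix / label_matrix: one row per clip, one column per class.
    """
    n_classes = len(score_matrix[0])
    aps = []
    for c in range(n_classes):
        ap = average_precision([row[c] for row in score_matrix],
                               [row[c] for row in label_matrix])
        if ap is not None:
            aps.append(ap)
    return sum(aps) / len(aps)

# Toy example: 4 clips, 2 classes (AudioSet itself has 527 classes).
scores = [[0.9, 0.1], [0.8, 0.7], [0.3, 0.6], [0.2, 0.2]]
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
print(f"{mean_average_precision(scores, labels):.4f}")  # 0.9167
```

Skipping classes without positives is a design choice of this sketch. AP is undefined for such classes, and evaluation code may handle them differently.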

Training

See TRAINING.md for training and fine-tuning instructions.

Acknowledgement

This repository was inspired by ConvNeXt.

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Citation

If you find this repository helpful, please consider citing:

@article{khalfaoui2023audio,
  title={Audio classification with Dilated Convolution with Learnable Spacings},
  author={Khalfaoui-Hassani, Ismail and Masquelier, Timoth{\'e}e and Pellegrini, Thomas},
  journal={arXiv preprint arXiv:2309.13972},
  year={2023}
}
