Audio classification with Dilated Convolution with Learnable Spacings

Official PyTorch implementation from the following paper:

Audio classification with Dilated Convolution with Learnable Spacings,
by Ismail Khalfaoui Hassani, Timothée Masquelier and Thomas Pellegrini.

Catalog

AudioSet-2M dataset
AudioSet-2M Training Code
Pre-trained models on AudioSet-2M

Results and Pre-trained Models

AudioSet-2M trained models

Model @ 128x1001	Kernel Size / Count	Method	# Parameters	mAP	Throughput (samples/s)	Checkpoint
ConvFormer-S18†	7x7 / 49	Depth. Conv.	26.8M	43.14 ± 0.03	513.3	model
ConvFormer-S18†	23x23 / 26	DCLS-Gauss	26.8M	43.68 ± 0.02	396.8	model
FastVIT-SA24‡	7x7 / 49	Depth. Conv.	21.5M	43.82 ± 0.05	633.6	model
FastVIT-SA24‡	23x23 / 26	DCLS-Gauss	21.5M	44.4 ± 0.07	551.7	model
ConvNeXt-T	7x7 / 49	Depth. Conv.	28.6M	44.83 ± 0.14	591.4	model
ConvNeXt-T	23x23 / 26	DCLS-Gauss	28.6M	45.52 ± 0.05	509.4	model

† Trained using LAMB, ‡ No ImageNet pretraining.
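
To make the trade-off in the table easier to read, the following minimal sketch computes, for each backbone, the mAP gain of DCLS-Gauss over the depthwise-convolution baseline and the fraction of baseline throughput that is retained. The numbers are copied from the table; the `RESULTS` dictionary and the `compare` function are illustrative names and are not part of this repository.

```python
# Figures copied from the results table: (mAP, throughput in samples/s).
# RESULTS and compare() are hypothetical names used only for this sketch.
RESULTS = {
    "ConvFormer-S18": {"baseline": (43.14, 513.3), "dcls": (43.68, 396.8)},
    "FastVIT-SA24": {"baseline": (43.82, 633.6), "dcls": (44.40, 551.7)},
    "ConvNeXt-T": {"baseline": (44.83, 591.4), "dcls": (45.52, 509.4)},
}


def compare(results):
    """Return {model: (mAP gain in points, DCLS throughput / baseline throughput)}."""
    summary = {}
    for model, rows in results.items():
        base_map, base_tp = rows["baseline"]
        dcls_map, dcls_tp = rows["dcls"]
        gain = round(dcls_map - base_map, 2)      # absolute mAP difference
        ratio = round(dcls_tp / base_tp, 3)       # share of baseline throughput kept
        summary[model] = (gain, ratio)
    return summary


if __name__ == "__main__":
    for model, (gain, ratio) in compare(RESULTS).items():
        print(f"{model}: +{gain:.2f} mAP, {ratio:.1%} of baseline throughput")
```

On these figures, DCLS-Gauss gains between 0.54 and 0.69 mAP points while keeping roughly 77% to 87% of the baseline throughput, with the parameter count unchanged.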

Installation

Please check INSTALL.md for installation instructions.

Evaluation

The commands below evaluate the AudioSet-2M pre-trained FastVit-DCLS-AUDIO-SA24 model:

Single-GPU

python main.py --model fastvit_dcls_audio_sa24 --eval true \
--resume https://zenodo.org/record/8370979/files/fastvit_dcls_audio_sa24.pth \
--drop_path 0.4 \
--data_path /path/to/AudioSet-2M/hdf5s/waveforms/

Multi-GPU

python -m torch.distributed.launch --nproc_per_node=8 main.py \
--model fastvit_dcls_audio_sa24 --eval true \
--resume https://zenodo.org/record/8370979/files/fastvit_dcls_audio_sa24.pth \
--drop_path 0.4 \
--data_path /path/to/AudioSet-2M/hdf5s/waveforms/
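
When evaluating several checkpoints, it can help to assemble these commands programmatically so that the model name and the checkpoint are always passed together. The sketch below is one possible way to do that, assuming only the flags shown in the commands above; `build_eval_command` is a hypothetical helper and is not provided by this repository.

```python
import shlex


def build_eval_command(model, checkpoint, data_path, drop_path=0.4, num_gpus=1):
    """Assemble the evaluation command as an argument list.

    Hypothetical helper: it only reuses the flags that appear in the
    single-GPU and multi-GPU commands of this README.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        launcher = ["python", "main.py"]
    else:
        # Same launcher as in the multi-GPU command of this README.
        launcher = [
            "python", "-m", "torch.distributed.launch",
            f"--nproc_per_node={num_gpus}", "main.py",
        ]
    return launcher + [
        "--model", model,
        "--eval", "true",
        "--resume", checkpoint,
        "--drop_path", str(drop_path),
        "--data_path", data_path,
    ]


if __name__ == "__main__":
    cmd = build_eval_command(
        model="fastvit_dcls_audio_sa24",
        checkpoint="https://zenodo.org/record/8370979/files/fastvit_dcls_audio_sa24.pth",
        data_path="/path/to/AudioSet-2M/hdf5s/waveforms/",
        num_gpus=8,
    )
    # Print a shell-ready string; the list itself can be passed to subprocess.run.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids quoting problems if the data path contains spaces.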

Training

See TRAINING.md for training and fine-tuning instructions.

Acknowledgement

This repository was inspired by ConvNeXt.

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Citation

If you find this repository helpful, please consider citing:

@article{khalfaoui2023audio,
  title={Audio classification with Dilated Convolution with Learnable Spacings},
  author={Khalfaoui-Hassani, Ismail and Masquelier, Timoth{\'e}e and Pellegrini, Thomas},
  journal={arXiv preprint arXiv:2309.13972},
  year={2023}
}
