# ENH: Suggestions for supplementing the configuration file to achieve the best model. #787

Closed · milaXT opened this issue Mar 14, 2025 · 2 comments
Labels: DOC: documentation · ENH: enhancement

@milaXT (Contributor) commented Mar 14, 2025

While training the model, David @NickleDave helped me discover some flexible adjustments that could be made in the config file, which ultimately led to better model performance. We believe these adjustable parameters should be added to the config file to help others achieve better model performance:

  1. `transform_type` and `freq_cutoffs` in the `[vak.prep.spect_params]` section.
  2. Setting `window_size` to a higher value in the `[vak.train.dataset.params]` section may improve the model.
milaXT added the ENH: enhancement label on Mar 14, 2025
NickleDave self-assigned this on Apr 2, 2025
NickleDave added the DOC: documentation label on Apr 2, 2025
@NickleDave (Collaborator) commented:

Thank you @milaXT for raising this issue.

Very much agree; after working with you, I think we can do a better job of documenting which config options give better models.
The trade-off is that we want the first tutorial to be quick to run; it's more about showing the steps of train/eval/predict than it is about training a model with good performance.

I think a way we can achieve both is to add a callout box to the tutorial that says something like: "look at these config files (link to download) if you want to train better models, but know that these configs will take longer to run".

Thanks also for documenting the key things we need to highlight for better performance.

> 1. `transform_type` and `freq_cutoffs` in the `[vak.prep.spect_params]` section.
> 2. Setting `window_size` to a higher value in the `[vak.train.dataset.params]` section may improve the model.

Right, so we should have:

```toml
# SPECT_PARAMS: parameters for computing spectrograms
[vak.prep.spect_params]
# fft_size: size of window used for Fast Fourier Transform, in number of samples
fft_size = 1024
# step_size: size of step to take when computing spectra with FFT for spectrogram
# also known as hop size
step_size = 64
# transform_type: transform applied to spectrogram; "log_spect" takes the log of the spectrogram
transform_type = "log_spect"
# freq_cutoffs: keep only frequencies between these two values (in Hz) in the spectrogram
freq_cutoffs = [500, 8000]
```

And then:

```toml
[vak.train.dataset.params]
# window_size: number of spectrogram time bins shown to the network at once.
# Bigger windows generally give better performance.
# For frame classification models, prefer smaller batch sizes with bigger windows.
window_size = 2000
```
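For reference, here is a minimal sketch of the matching table in an eval or predict config, assuming those configs take the same `dataset.params` table as in vak's example configs, so that the network sees windows the same size it was trained on:

```toml
# Sketch: the corresponding table in an eval config.
# Key names are assumed to mirror the train config; not taken from this thread.
[vak.eval.dataset.params]
# use the same window_size the model was trained with
window_size = 2000
```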

I'm going to paste in the full TOML we ended up using below, for reference.

```toml
# PREP: options for preparing dataset
[vak.prep]
# dataset_type: corresponds to the model family, such as "frame classification" or "parametric umap"
dataset_type = "frame classification"
# input_type: input to model, either audio ("audio") or spectrogram ("spect")
input_type = "spect"
# data_dir: directory with data to use when preparing dataset
data_dir = "./fortrain2"
# output_dir: directory where dataset will be created (as a sub-directory within output_dir)
output_dir = "./prep"
# audio_format: format of audio, either wav or cbin
audio_format = "wav"
# annot_format: format of annotations
annot_format = "simple-seq"
# labelset: string or array with unique set of labels used in annotations
labelset = "abcde"
# train_dur: duration of training split in dataset, in seconds
train_dur = 1200
# val_dur: duration of validation split in dataset, in seconds
val_dur = 170
# test_dur: duration of test split in dataset, in seconds
test_dur = 350

# SPECT_PARAMS: parameters for computing spectrograms
[vak.prep.spect_params]
# fft_size: size of window used for Fast Fourier Transform, in number of samples
fft_size = 1024
# step_size: size of step to take when computing spectra with FFT for spectrogram
# also known as hop size
step_size = 64
# transform_type: transform applied to spectrogram; "log_spect" takes the log of the spectrogram
transform_type = "log_spect"

# TRAIN: options for training model
[vak.train]
# root_results_dir: directory where results should be saved; each run is saved in a sub-directory within `root_results_dir`
root_results_dir = "./result"
# batch_size: number of samples from dataset per batch fed into network
batch_size = 16
# num_epochs: number of training epochs, where an epoch is one iteration through all samples in training split
num_epochs = 10
# standardize_frames: if true, standardize (normalize) frames (input to neural network) per frequency bin, so mean of each is 0.0 and std is 1.0
# across the entire training split
standardize_frames = true
# val_step: step number on which to compute metrics with validation set, every time step % val_step == 0
# (a step is one batch fed through the network)
# saves a checkpoint if the monitored evaluation metric improves (which is model specific)
val_step = 1000
# ckpt_step: step number on which to save a checkpoint (as a backup, regardless of validation metrics)
ckpt_step = 500
# patience: number of validation steps to wait before stopping training early
# if the monitored evaluation metric does not improve after `patience` validation steps,
# then we stop training
patience = 6
# num_workers: number of workers to use when loading data with multiprocessing
num_workers = 4
# device: name of device to run model on, one of "cuda", "cpu"
# (in this config, the device is set via the [vak.train.trainer] table below instead)

# dataset_path: path to dataset created by prep. This will be added when you run `vak prep`; you don't have to add it yourself.

# dataset.params: parameters used for datasets
# for a frame classification model, we use dataset classes with a specific `window_size`

[vak.train.dataset]
path = "prep/fortrain2-vak-frame-classification-dataset-generated-250305_124445"

[vak.train.dataset.params]
window_size = 2000

# To indicate the model to train, we use a "dotted key" with `model` followed by the string name of the model.
# This name must be a name within `vak.models` or added e.g. with `vak.model.decorators.model`
# We use another dotted key to indicate options for configuring the model, e.g. `TweetyNet.optimizer`
[vak.train.model.TweetyNet.optimizer]
# vak.train.model.TweetyNet.optimizer: we specify options for the model's optimizer in this table
# lr: the learning rate
lr = 0.001

# TweetyNet.network: we specify options for the model's network in this table
[vak.train.model.TweetyNet.network]
# hidden_size: the number of elements in the hidden state in the recurrent layer of the network
hidden_size = 256

# this sub-table configures the `lightning.pytorch.Trainer`
[vak.train.trainer]
# setting accelerator to "gpu" means "train models on GPU (not CPU)"
accelerator = "gpu"
# use the first GPU (numbering starts from 0)
devices = [0]
```
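To put `window_size = 2000` in concrete terms, some rough arithmetic as TOML comments (the 32 kHz sampling rate is an illustrative assumption, not a value from this thread; substitute your audio's actual rate):

```toml
# Rough arithmetic relating window_size (in spectrogram time bins) to audio duration.
# Assumes 32 kHz audio, for illustration only:
#   time bins per second = sampling_rate / step_size = 32000 / 64 = 500
#   window duration = window_size / bins_per_second = 2000 / 500 = 4 seconds
# So each training window covers roughly 4 seconds of spectrogram.
```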

@NickleDave (Collaborator) commented:

@milaXT would you be willing to make a pull request adding this callout to the autoannotate.md page?

The steps you'd follow are in the contributor's guide: https://vak.readthedocs.io/en/latest/development/contributors.html
(You don't need to do all the testing, etc., for this relatively simple docs contribution.)

You'd want to add the callout right at this line, after the set-up steps:

It should say something like this:

```md
:::{hint}
The config files in this tutorial have options that make the tutorial run fast, so you can quickly learn the steps to using vak; they will not necessarily give you the best-performing models. Click the following link to download a train config file with additional options that will improve performance. See the comments in the file for more information.

{download}`gy6or6_train.toml <../toml/gy6or6_train_good.toml>`
:::
```

And then you'd add that file to the ./doc/toml folder, with the additional options.
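To make that concrete, here is a sketch of what the additional options in that file might look like, collecting the settings discussed above (the values are the ones from this thread, not values tuned specifically for the gy6or6 data):

```toml
# Sketch of the additional options for gy6or6_train_good.toml,
# collecting settings discussed in this thread; values are from the
# config above, not tuned for the gy6or6 data.
[vak.prep.spect_params]
transform_type = "log_spect"
freq_cutoffs = [500, 8000]

[vak.train.dataset.params]
# bigger windows generally give better performance;
# pair with a smaller batch_size if memory is limited
window_size = 2000
```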
