Fine tuning on novel dataset #14
Hi, thank you! The important methods are in the dataset file. If you want to use the framework to fine-tune the model on a new dataset, I'd recommend:
Thanks for this information! At the moment I'm trying to get this running in Google Colab, and it looks like it will take some time to resolve the conda/pip/mamba environment and to remove all the MongoDB requirements, which I imagine will be quite difficult to disentangle. I'll report back if I manage to get it working, and on how I go with the fine-tuning. Cheers!
Hello. I was hoping to use your model to train on the Good-Sounds dataset (musical instrument recordings stored in .wav files). At the beginning of epoch 2 (epoch 1 trains successfully, which is more confusing) the program crashes with a CUDA error.
The stack trace is entirely within PyTorch Lightning. I can provide it if needed, but I imagine the problem lies elsewhere. I was wondering whether it is some sort of tensor shape error? It also gives me an image-size warning, and I don't know where those dimensions came from.
What should the shapes returned by `dataset.__getitem__` be? My understanding is that `waveform.reshape(1, -1)` essentially flattens the data, `row.filename` is a string, and `target` is a single number (long). Any help or insight would be greatly appreciated!
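For concreteness, here is a minimal sketch of the kind of `__getitem__` described above, returning a flattened mono waveform, the filename, and a long target. The dataframe column names (`filename`, `target`) and the 32 kHz resampling rate are assumptions for illustration, not the repository's actual code.

```python
import pandas as pd
import torch
import torchaudio


class WavDataset(torch.utils.data.Dataset):
    """Sketch: each item is (waveform, filename, target)."""

    def __init__(self, df: pd.DataFrame, sample_rate: int = 32000):
        self.df = df                      # assumed columns: filename, target
        self.sample_rate = sample_rate

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        waveform, sr = torchaudio.load(row.filename)          # (channels, samples)
        if sr != self.sample_rate:                            # resample if needed
            waveform = torchaudio.functional.resample(waveform, sr, self.sample_rate)
        waveform = waveform.mean(dim=0).reshape(1, -1)        # mono, shape (1, N)
        target = torch.tensor(row.target, dtype=torch.long)   # single class index
        return waveform, row.filename, target
```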
Hi, I'm not sure what could cause a CUDA error. Could it be that you are feeding longer audio clips (longer than 10 seconds) to the model? The 128x500 is the input spectrogram resolution: the pretrained model expects 128 mel-frequency bins by ~1000 time frames (corresponding to 10 seconds). But it shouldn't be a problem to fine-tune the model on shorter audio clips.
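One simple way to keep every clip at the expected length is to pad or crop the raw waveform to a fixed 10 seconds before it reaches the model. The sketch below assumes a 32 kHz sample rate (so 10 s ≈ 1000 spectrogram frames); that rate is an inference from the numbers in the reply above, so check the repository's preprocessing config for the actual value.

```python
import torch

SAMPLE_RATE = 32000          # assumed; see the repo's preprocessing config
CLIP_SECONDS = 10
CLIP_SAMPLES = SAMPLE_RATE * CLIP_SECONDS


def pad_or_truncate(waveform: torch.Tensor) -> torch.Tensor:
    """waveform: (1, N) mono audio; returns (1, CLIP_SAMPLES)."""
    n = waveform.shape[-1]
    if n < CLIP_SAMPLES:
        # zero-pad short clips on the right
        pad = torch.zeros(1, CLIP_SAMPLES - n, dtype=waveform.dtype)
        return torch.cat([waveform, pad], dim=-1)
    # crop long clips to the first 10 seconds (a random crop would add augmentation)
    return waveform[:, :CLIP_SAMPLES]
```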
Hello. Firstly, thank you for this great work! I've already had very promising results using the "scene" embeddings from these models, and I'm looking to fine-tune a model on a new dataset, similar to ESC50 and others. (As a side note, using scene embeddings and a logistic regression I'm getting acceptably good results, but I'm convinced true fine-tuning would be significantly better.)
I'm having a bit of trouble interpreting the example scripts. Are you able to give a simple explanation of what is required for fine-tuning (e.g. the data format, directories vs. a JSON file, the format of the labels CSV, etc.)? It's quite hard to reverse-engineer this from the code. I have a directory of files and known labels, and simply want to fine-tune a model on them. And once the data is in place, which functions/CLI scripts should be invoked?
Many thanks, and apologies if I'm missing something obvious. I know the AudioSet page has a few more details, but it's still not crystal clear how to proceed. Cheers!
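For reference, the embedding-plus-logistic-regression baseline mentioned in the comment above can be sketched as follows. The `.npy` filenames are placeholders: it assumes the scene embeddings and integer labels have already been extracted with the pretrained model and saved to disk, which is not the repository's own workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder inputs: precomputed scene embeddings and their integer labels.
embeddings = np.load("scene_embeddings.npy")   # shape (num_clips, embed_dim)
labels = np.load("labels.npy")                 # shape (num_clips,)

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, stratify=labels, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```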