RuntimeError: stft requires the return_complex parameter be given for real inputs #38

Open
loukasilias opened this issue Dec 15, 2023 · 3 comments

@loukasilias

Hello!
I am using the following code:

from hear21passt.base import get_basic_model,get_model_passt
import torch
# get the PaSST model wrapper, includes Melspectrogram and the default pre-trained transformer
model = get_basic_model(mode="logits")
print(model.mel) # Extracts mel spectrogram from raw waveforms.
print(model.net) # the transformer network.

# example inference
model.eval()
model = model.cuda()
with torch.no_grad():
    # audio_wave has the shape of [batch, seconds*32000] sampling rate is 32k
    # example audio_wave of batch=3 and 10 seconds
    audio = torch.ones((3, 32000 * 10)) * 0.5
    audio_wave = audio.cuda()
    logits = model(audio_wave)

I am getting the following error:

RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.

How can I solve this issue please?
Thank you!

@kkoutini
Owner

Hi!
Are you using the latest release?
This issue should have been fixed in kkoutini/passt_hear21@dce8318

Try uninstalling your current passt_hear21 package and reinstalling:

pip install -e 'git+https://github.com/kkoutini/passt_hear21@0.0.25#egg=hear21passt' 
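For context, the error message is raised by torch.stft when it receives a real-valued input without an explicit return_complex argument. The following is a minimal sketch of a call that satisfies that requirement, assuming a recent PyTorch version. It is not the change made in the referenced commit, and the n_fft and hop_length values and the helper name stft_real_imag are illustrative assumptions, not PaSST's actual settings.

```python
import torch


def stft_real_imag(wave, n_fft=1024, hop_length=320):
    """Run torch.stft on a real waveform and return real/imag parts.

    n_fft and hop_length are illustrative values, not taken from PaSST.
    """
    # An explicit window avoids relying on the implicit rectangular window.
    window = torch.hann_window(n_fft, device=wave.device)
    # return_complex=True is what the error message asks for on real inputs.
    spec = torch.stft(
        wave,
        n_fft=n_fft,
        hop_length=hop_length,
        window=window,
        return_complex=True,
    )
    # view_as_real appends a trailing dimension of size 2 (real, imaginary),
    # which matches the layout older code expected from torch.stft.
    return torch.view_as_real(spec)


# One second of audio at 32 kHz, batch of 3, as in the example above.
wave = torch.ones((3, 32000)) * 0.5
out = stft_real_imag(wave)
print(out.shape)
```

If the error persists after reinstalling, it usually means an older copy of the package is still the one being imported.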

@loukasilias
Author

Thank you!
It has been solved.
I have some additional questions.

I have several audio files of variable duration (40 secs, 1 min, etc.). Is it possible to use your library?
I am using the following:

import librosa

path = '/kaggle/input/audio-files/audio_files/S004.wav'
x, sr = librosa.load(path, sr=32000)

What can I do next?
Thank you again for your help!

@kkoutini
Owner

Hi, the model is compatible with the HEAR API.
Here is an example of the base model:

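import librosa
import torch
from hear21passt.base import load_model, get_scene_embeddings, get_timestamp_embeddings

model = load_model().cuda()
path = '/kaggle/input/audio-files/audio_files/S004.wav'
# librosa returns a 1-D NumPy array; sr=32000 resamples to the 32k rate used above
audio, sr = librosa.load(path, sr=32000)
# assumption: the embedding functions take a torch tensor shaped [n_sounds, n_samples]
audio = torch.from_numpy(audio).unsqueeze(0).cuda()

# one embedding per timestamp
embed, time_stamps = get_timestamp_embeddings(audio, model)
print(embed.shape)
# one embedding for the whole clip
embed = get_scene_embeddings(audio, model)
print(embed.shape)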

If you need more control, take a look here where these methods are implemented.
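For the "more control" case with files of variable duration, one common approach is to cut each waveform into fixed-length windows and process them one at a time. The sketch below is only an illustration of that idea: the helper split_into_windows is hypothetical and not part of hear21passt, the 10-second window is borrowed from the example in the first post, and the library's own handling of long inputs may differ.

```python
def split_into_windows(samples, sample_rate=32000, window_seconds=10):
    """Yield consecutive fixed-length slices of a 1-D sequence of samples.

    Works on anything that supports len() and slicing (list, NumPy array,
    1-D tensor). The last slice may be shorter than the others.
    """
    window = sample_rate * window_seconds
    for start in range(0, len(samples), window):
        yield samples[start:start + window]


# A 25-second clip at 32 kHz, represented here as a plain list of zeros.
clip = [0.0] * (32000 * 25)
chunks = list(split_into_windows(clip))
print([len(c) for c in chunks])  # [320000, 320000, 160000]
```

Each chunk can then be converted to a tensor and passed to the model separately; how to combine the per-chunk outputs (for example by averaging) depends on the task.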
