I am building a necklace that has a local language model running with Whisper transcription. Whisper does provide the time frame of each sentence, but it doesn't tell you who is saying what, so eventually you get a very long paragraph that is pushed into the language model and confuses it.
I need a bit of help building off of what you built: detecting the different voices in a recording, and determining whether a speaker is someone the user has talked to before, and if so, who.
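To sketch the "have I talked to this person before" part: a common approach is to extract a fixed-size voice embedding per speaker (e.g. with a speaker-verification model such as SpeechBrain's ECAPA-TDNN or pyannote.audio; those would be the source of the vectors here and are an assumption, not something from this repo) and compare it against enrolled embeddings with cosine similarity. A minimal, pure-Python sketch of the matching step, with hand-made stand-in vectors and a hypothetical threshold:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def identify_speaker(embedding, enrolled, threshold=0.75):
    """Return the best-matching enrolled speaker's name, or None
    if nobody scores above the threshold (i.e. an unknown voice).

    enrolled: dict mapping name -> reference embedding.
    The 0.75 threshold is illustrative; a real value should be
    tuned on the actual embedding model's score distribution.
    """
    best_name, best_score = None, threshold
    for name, ref in enrolled.items():
        score = cosine_similarity(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

An unknown voice (no match above threshold) would then trigger enrollment: store its embedding under a new identity so the person can be recognized next time.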
To be more detailed, I need to identify the different people in the recording, crop the audio to each person separately, and then train the model on each speaker.
I also need to be able to export the time frame in which each individual was speaking, so I can reference it within the transcription.
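For exporting who spoke when and tying it back to the transcript: assuming a diarizer (e.g. pyannote.audio; an assumption, not part of this repo) yields `(start, end, speaker)` turns, each Whisper segment can be labeled with the speaker whose turns overlap it the most. A minimal sketch of that merge, using dicts shaped like the `start`/`end`/`text` entries in Whisper's segment output:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length (in seconds) of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_segments(whisper_segments, diarization_turns):
    """Attach a speaker label to each transcript segment.

    whisper_segments: list of dicts with 'start', 'end', 'text' keys
    diarization_turns: list of (start, end, speaker) tuples
    Each segment is assigned the speaker with the greatest total
    overlap, or 'unknown' if no turn overlaps it at all.
    """
    labeled = []
    for seg in whisper_segments:
        totals = {}
        for start, end, speaker in diarization_turns:
            dur = overlap(seg["start"], seg["end"], start, end)
            if dur > 0:
                totals[speaker] = totals.get(speaker, 0.0) + dur
        speaker = max(totals, key=totals.get) if totals else "unknown"
        labeled.append({**seg, "speaker": speaker})
    return labeled
```

The labeled segments can then be serialized (e.g. as JSON) so the language model receives speaker-tagged lines instead of one undifferentiated paragraph.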
PS: the system is written in Python and runs on Ubuntu 24.