I am building a necklace that has a local language model running with Whisper transcription. Whisper does provide the time frame of each sentence, but it doesn't tell you who is saying what, so eventually you get a very long paragraph that is pushed into the language model and confuses it.
I need a bit of help building off of what you built: detecting the different voices in a recording, and determining whether a speaker is someone the user has talked to before, and if so, who.
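To sketch the "have I talked to this person before" part: a common approach is to extract a fixed-size voice embedding per speaker (e.g. with a speaker-verification model such as SpeechBrain's ECAPA-TDNN or pyannote.audio; those would be the source of the vectors here and are an assumption, not something from this repo) and compare it against enrolled embeddings with cosine similarity. A minimal, pure-Python sketch of the matching step, with hand-made stand-in vectors and a hypothetical threshold:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def identify_speaker(embedding, enrolled, threshold=0.75):
    """Return the best-matching enrolled speaker's name, or None
    if nobody scores above the threshold (i.e. an unknown voice).

    enrolled: dict mapping name -> reference embedding.
    The 0.75 threshold is illustrative; a real value should be
    tuned on the actual embedding model's score distribution.
    """
    best_name, best_score = None, threshold
    for name, ref in enrolled.items():
        score = cosine_similarity(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

An unknown voice (no match above threshold) would then trigger enrollment: store its embedding under a new identity so the person can be recognized next time.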
To be more detailed, I need to identify the different people in the recording, crop the audio to each person separately, and then train the model on each speaker.
I also need to be able to export the time frame in which each individual was speaking, so I can reference it within the transcription.
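For exporting who spoke when and tying it back to the transcript: assuming a diarizer (e.g. pyannote.audio; an assumption, not part of this repo) yields `(start, end, speaker)` turns, each Whisper segment can be labeled with the speaker whose turns overlap it the most. A minimal sketch of that merge, using dicts shaped like the `start`/`end`/`text` entries in Whisper's segment output:

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length (in seconds) of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_segments(whisper_segments, diarization_turns):
    """Attach a speaker label to each transcript segment.

    whisper_segments: list of dicts with 'start', 'end', 'text' keys
    diarization_turns: list of (start, end, speaker) tuples
    Each segment is assigned the speaker with the greatest total
    overlap, or 'unknown' if no turn overlaps it at all.
    """
    labeled = []
    for seg in whisper_segments:
        totals = {}
        for start, end, speaker in diarization_turns:
            dur = overlap(seg["start"], seg["end"], start, end)
            if dur > 0:
                totals[speaker] = totals.get(speaker, 0.0) + dur
        speaker = max(totals, key=totals.get) if totals else "unknown"
        labeled.append({**seg, "speaker": speaker})
    return labeled
```

The labeled segments can then be serialized (e.g. as JSON) so the language model receives speaker-tagged lines instead of one undifferentiated paragraph.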
PS: the system is written in Python and runs on Ubuntu 24.