
hi #5

Open
Abdulrahman392011 opened this issue Oct 29, 2024 · 0 comments


I am building a necklace that has a local language model running with Whisper transcription. Whisper provides the time frame of each sentence, but it doesn't tell you who is saying what. Eventually you end up with a very long paragraph that is pushed into the language model and confuses it.

I need a bit of help building on what you built: detecting the different voices in a recording, and determining whether a speaker is someone the user has talked to before and, if so, who.
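For the "have I heard this person before?" part, one common approach is to compare a voice embedding of the unknown speaker against embeddings stored for previously enrolled speakers. The sketch below only shows the matching step; the embeddings themselves would come from a speaker-embedding model (e.g. pyannote.audio or resemblyzer), and the names and the 0.75 threshold are illustrative assumptions, not values from this project.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (plain lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(embedding, enrolled, threshold=0.75):
    """Return the best-matching enrolled speaker name, or None if nobody is close.

    embedding: embedding vector for the unknown voice
    enrolled:  dict mapping speaker name -> stored embedding vector
    threshold: minimum similarity to accept a match (tune on real recordings)
    """
    best_name, best_sim = None, threshold
    for name, ref in enrolled.items():
        sim = cosine_similarity(embedding, ref)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name  # None means "new speaker": enroll their embedding
```

When `identify` returns None, the system would treat the voice as a new person and store its embedding for future recordings; enrolling an average of several embeddings per person tends to be more robust than a single sample.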

To be more detailed: I need to identify the different people in the recording, crop the audio for each person separately, and then train the model on each speaker.

I also need to be able to export the time frames in which each individual was speaking, so I can reference them within the transcription.
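Once a diarization step has produced per-speaker time frames, referencing them within the transcription comes down to matching each Whisper segment to the speaker turn it overlaps most. This is a minimal sketch of that merge; the segment and turn structures are hypothetical stand-ins for whatever your diarization output actually looks like.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    start: float   # seconds
    end: float
    speaker: str   # e.g. "SPEAKER_00", or a recognized contact's name

def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def label_segments(whisper_segments, turns):
    """Attach the most-overlapping speaker to each Whisper segment.

    whisper_segments: list of dicts with "start", "end", "text"
                      (the shape Whisper's segment output uses)
    turns:            list of Turn objects from the diarization step
    Returns new segment dicts with an added "speaker" key
    ("unknown" when no turn overlaps the segment at all).
    """
    labeled = []
    for seg in whisper_segments:
        best, best_ov = "unknown", 0.0
        for t in turns:
            ov = overlap(seg["start"], seg["end"], t.start, t.end)
            if ov > best_ov:
                best, best_ov = t.speaker, ov
        labeled.append({**seg, "speaker": best})
    return labeled
```

The labeled segments can then be rendered as a speaker-tagged transcript (`[alice 0.5-4.0] hi ...`) before being fed to the language model, which avoids the one-long-paragraph problem described above.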

PS: the system is written in Python and runs on Ubuntu 24.
