Replies: 1 comment
I know this issue; it was one we also had to solve when combining pyannote with Whisper, as the segment times from the two models don't usually line up perfectly and some synchronization has to be done between the two sets of segments. pyannote is usually more accurate in time than Whisper, and the mismatch can lead to text being attributed to the wrong speaker. We ended up integrating this into a one-click-install interface that does this segment math in the backend and lets you specify the maximum number of speakers for pyannote. I'll take a look at your code to see if I can help out further!
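To give a concrete idea of the segment math I mean, here's a minimal sketch (not the exact code from our interface; it assumes you already have Whisper's result["segments"] and the pyannote Annotation in memory):

```python
def assign_speakers(whisper_segments, diarization):
    """Label each Whisper segment with the diarization speaker that
    overlaps it the most in time.

    whisper_segments: list of dicts with "start", "end", "text" keys,
                      i.e. result["segments"] from whisper's transcribe().
    diarization:      the pyannote.core.Annotation returned by the
                      diarization pipeline.
    """
    labeled = []
    for seg in whisper_segments:
        overlap_per_speaker = {}  # speaker label -> total overlap (seconds)
        for turn, _, speaker in diarization.itertracks(yield_label=True):
            # Length of the intersection between the Whisper segment and
            # this speaker turn; zero or negative means no overlap.
            overlap = min(seg["end"], turn.end) - max(seg["start"], turn.start)
            if overlap > 0:
                overlap_per_speaker[speaker] = (
                    overlap_per_speaker.get(speaker, 0.0) + overlap
                )
        # If no diarization turn touches this segment (silence, music, ...),
        # fall back to a placeholder label instead of guessing.
        best = (
            max(overlap_per_speaker, key=overlap_per_speaker.get)
            if overlap_per_speaker
            else "UNKNOWN"
        )
        labeled.append({**seg, "speaker": best})
    return labeled
```

Summing the overlap per speaker, rather than taking the first turn that matches, matters when Whisper draws a segment boundary across a speaker change: the speaker who talks for most of the segment wins, which is usually the right call for short telesales turns.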
  
Hi everyone,
I’m not from the IT field (this is actually my very first programming project), but I’ve been trying to put together a simple pipeline using Whisper and speaker diarization, with some help from AI tools.
The goal is pretty basic: I just want to transcribe telesales calls in Portuguese and separate the speakers automatically. I managed to combine Whisper with pyannote’s diarization model, but I’m hitting some issues:
- Sometimes the diarization only detects one speaker, even though there are clearly two (see the sketch after this list).
- I’m not sure if I’m using the best approach to align the diarization segments with the Whisper segments.
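For the first issue, from what I can tell the pyannote pipeline accepts speaker-count hints; here is a minimal sketch of that call (the model name, token, and file name below are placeholders):

```python
from pyannote.audio import Pipeline

# Placeholder model name and token: pyannote models are gated on
# Hugging Face, so an access token with accepted terms is required.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_HF_TOKEN",
)

# A telesales call is a two-party conversation, so pin the speaker
# count instead of letting the pipeline guess (the guess is what
# sometimes merges both voices into a single speaker).
diarization = pipeline("call.wav", num_speakers=2)

# If a third voice is possible (an IVR menu, a transferred agent),
# a range can be given instead:
# diarization = pipeline("call.wav", min_speakers=2, max_speakers=3)
```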
Since I’m a beginner, I might be missing something obvious. Could anyone help me with these issues?
Thanks a lot for your patience and help 🙏
If anyone needs it, my code is located here: https://github.com/pfcout/whisper_transcription