Replies: 1 comment
I know this issue; it was one we also had to solve when combining pyannote with Whisper, as the segment times from the two models don't usually line up perfectly and some synchronization has to be done between the two sets of segments. pyannote is usually more accurate in time than Whisper, and the mismatch can lead to text being attributed to the wrong speaker. We ended up integrating this into a one-click-install interface that does this segment math in the backend and lets you specify the maximum number of speakers for pyannote. I'll take a look at your code to see if I can help out further!
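To give a concrete idea of the segment math I mean, here's a minimal sketch (not the exact code from our interface; it assumes you already have Whisper's result["segments"] and the pyannote Annotation in memory):

```python
def assign_speakers(whisper_segments, diarization):
    """Label each Whisper segment with the diarization speaker that
    overlaps it the most in time.

    whisper_segments: list of dicts with "start", "end", "text" keys,
                      i.e. result["segments"] from whisper's transcribe().
    diarization:      the pyannote.core.Annotation returned by the
                      diarization pipeline.
    """
    labeled = []
    for seg in whisper_segments:
        overlap_per_speaker = {}  # speaker label -> total overlap (seconds)
        for turn, _, speaker in diarization.itertracks(yield_label=True):
            # Length of the intersection between the Whisper segment and
            # this speaker turn; zero or negative means no overlap.
            overlap = min(seg["end"], turn.end) - max(seg["start"], turn.start)
            if overlap > 0:
                overlap_per_speaker[speaker] = (
                    overlap_per_speaker.get(speaker, 0.0) + overlap
                )
        # If no diarization turn touches this segment (silence, music, ...),
        # fall back to a placeholder label instead of guessing.
        best = (
            max(overlap_per_speaker, key=overlap_per_speaker.get)
            if overlap_per_speaker
            else "UNKNOWN"
        )
        labeled.append({**seg, "speaker": best})
    return labeled
```

Summing the overlap per speaker, rather than taking the first turn that matches, matters when Whisper draws a segment boundary across a speaker change: the speaker who talks for most of the segment wins, which is usually the right call for short telesales turns.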
  
Hi everyone,
I’m not from the IT field (this is actually my very first programming project), but I’ve been trying to put together a simple pipeline using Whisper and speaker diarization, with some help from AI tools.
The goal is pretty basic: I just want to transcribe telesales calls in Portuguese and separate the speakers automatically. I managed to combine Whisper with pyannote’s diarization model, but I’m hitting some issues:
- Sometimes the diarization only detects one speaker, even though there are clearly two (see the sketch after this list).
- I’m not sure if I’m using the best approach to align the diarization segments with the Whisper segments.
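For the first issue, from what I can tell the pyannote pipeline accepts speaker-count hints; here is a minimal sketch of that call (the model name, token, and file name below are placeholders):

```python
from pyannote.audio import Pipeline

# Placeholder model name and token: pyannote models are gated on
# Hugging Face, so an access token with accepted terms is required.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_HF_TOKEN",
)

# A telesales call is a two-party conversation, so pin the speaker
# count instead of letting the pipeline guess (the guess is what
# sometimes merges both voices into a single speaker).
diarization = pipeline("call.wav", num_speakers=2)

# If a third voice is possible (an IVR menu, a transferred agent),
# a range can be given instead:
# diarization = pipeline("call.wav", min_speakers=2, max_speakers=3)
```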
Since I’m a beginner, I might be missing something obvious. Could anyone help me with these issues?
Thanks a lot for your patience and help 🙏
If anyone needs it, my code is located here: https://github.com/pfcout/whisper_transcription