Loading article…
Self-hosted pyannote vs built-in diarization: when each makes sense
5 min readUpdated May 27, 2026
Questions
Related reading
Speaker diarization explained: how it works and where it fails
Diarization assigns speaker labels to audio segments without knowing who the speakers are. Here is how voice prints work, why similar vocal profiles cause problems, and how merge thresholds control the output.
Speech-to-text providers compared: Whisper, Deepgram, AssemblyAI, Google STT
An honest comparison of the four speech-to-text providers most documentary editors actually use, scored on accent handling, diarization, latency, and cost per hour of audio.
Transcription accuracy: what WER measures and what it misses
Word error rate is the standard accuracy metric, but it understates the problems that matter most for documentary audio: proper nouns, accents, crosstalk, and technical terms. Here is what WER measures and what it does not.