Try it now
A working sandbox. No sign-up, no project. Sample data only.
- Voice fingerprinting runs on our servers. No raw audio is sent to a third-party service.
- Cross-file matching: name a speaker on Day 04 and Day 17 picks up the label too.
- Merge or split identities project-wide. Renames apply everywhere instantly.
- Manual override on any turn re-runs the match and reconsiders nearby appearances.
How it works
Three steps from raw material to result.
Each speaker turn is analysed on our servers to create a unique voice fingerprint. No raw audio is sent to a third-party speaker service.
Fingerprints are compared against existing project identities. Strong matches auto-group; ambiguous ones surface for review.
Rename a speaker on any file and the change ripples to every appearance, including files you uploaded weeks ago.
Frequently asked questions
How does cross-file matching work?
Each speaker turn is analysed to create a unique voice fingerprint, then compared against the identities already in the project. Matches above a tuned confidence threshold are grouped automatically; ambiguous ones are flagged for manual review.
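As a rough illustration of the thresholding described above, the sketch below compares one fingerprint against known identities using cosine similarity. Everything in it is an assumption made for illustration: the fingerprints are plain lists of numbers, the names `AUTO_GROUP`, `REVIEW` and `match_turn` are hypothetical, the threshold values are invented, and the third "new" outcome for very weak matches is a guess. It is not the product's actual matching code.

```python
import math

# Hypothetical thresholds; the real tuned values are not published.
AUTO_GROUP = 0.80   # at or above this, group automatically
REVIEW = 0.60       # between REVIEW and AUTO_GROUP, flag for manual review


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_turn(fingerprint, identities):
    """Return (decision, identity_name, score) for one speaker turn.

    identities maps a name to a reference fingerprint.
    """
    best_name, best_score = None, -1.0
    for name, reference in identities.items():
        score = cosine(fingerprint, reference)
        if score > best_score:
            best_name, best_score = name, score

    if best_score >= AUTO_GROUP:
        return ("auto", best_name, best_score)
    if best_score >= REVIEW:
        return ("review", best_name, best_score)
    # Assumption: weak matches are treated as a new, unnamed identity.
    return ("new", None, best_score)


identities = {"Alice": [1.0, 0.0, 0.0], "Bob": [0.0, 1.0, 0.0]}
print(match_turn([0.9, 0.1, 0.0], identities)[:2])  # strong match
print(match_turn([0.7, 0.7, 0.0], identities)[:2])  # ambiguous match
```

Raising `AUTO_GROUP` in a sketch like this trades fewer wrong automatic groupings for more turns sent to review.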
Can I merge two speakers I named separately?
Yes. Merge from the speaker panel and all appearances of the absorbed identity are renamed across every file. Splitting one identity into two is also supported.
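One way a merge or rename could apply to every file at once is for each turn to store an identity ID rather than a name, so the name lives in a single place. The sketch below assumes that design; the `Project` class, its fields, and the file names are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    # identity_id -> display name (hypothetical structure)
    names: dict = field(default_factory=dict)
    # each turn is (file_name, start_seconds, identity_id)
    turns: list = field(default_factory=list)

    def rename(self, identity_id, new_name):
        """Change one record; every turn pointing at it shows the new name."""
        self.names[identity_id] = new_name

    def merge(self, keep_id, absorbed_id):
        """Repoint all turns of the absorbed identity, then drop it."""
        self.turns = [
            (f, t, keep_id if i == absorbed_id else i) for f, t, i in self.turns
        ]
        self.names.pop(absorbed_id, None)

    def label(self, turn_index):
        """Display name currently shown for a turn."""
        return self.names[self.turns[turn_index][2]]


project = Project(
    names={1: "Speaker A", 2: "Speaker B"},
    turns=[("day04.wav", 0.0, 1), ("day17.wav", 3.5, 2)],
)
project.merge(keep_id=1, absorbed_id=2)
project.rename(1, "Maria")
print(project.label(0), project.label(1))
```

Splitting would be the reverse operation: create a new identity ID and repoint a chosen subset of turns to it.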
What if the system gets a speaker wrong?
Correct the label on any turn and the system re-analyses and reconsiders nearby matches automatically. A project-wide cleanup pass can be triggered manually after large edits.
Does it work for two speakers who sound alike?
Voice fingerprinting separates most adult voices reliably even when timbre is close. Identical-twin-level similarity occasionally needs manual review, especially in short turns under two seconds.
Is speaker identification run locally?
No. It runs on our servers, not in your browser. No raw audio is sent to a third-party speaker service.
Further reading
Background guides and comparisons.
Diarization assigns speaker labels to audio segments without knowing who the speakers are. Here is how voice prints work, why similar vocal profiles cause problems, and how merge thresholds control the output.
Running your own pyannote diarization stack makes sense for custom fine-tuning, offline requirements, or research. For production documentary work, the real cost is not the model but the annotation and tuning loop.