Question 1

How does cross-file matching work?

Accepted Answer

Each speaker turn is embedded with SpeechBrain's ECAPA-TDNN model, which produces a 192-dimensional voice print, and that embedding is compared against the identities already in the project. Matches that score above a tuned similarity threshold are grouped automatically; ambiguous ones are surfaced for review.
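As a rough illustration of the threshold logic described above, here is a minimal sketch in plain Python. It assumes cosine similarity as the comparison metric and one reference vector per identity; the answer above does not state either. The threshold values, `match_turn`, and the toy 3-dimensional vectors are hypothetical stand-ins (real voice prints would be 192-dimensional), so treat this as a sketch rather than the product's actual implementation.

```python
import math

# Hypothetical thresholds for illustration only; the real values are tuned.
AUTO_MATCH = 0.75    # at or above this, group automatically
REVIEW_FLOOR = 0.55  # between the two values, surface for review


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_turn(embedding, identities):
    """Compare one turn's embedding against known identities.

    identities maps an identity name to a reference embedding.
    Returns (decision, name, score) where decision is
    "auto", "review", or "new".
    """
    best_name, best_score = None, -1.0
    for name, reference in identities.items():
        score = cosine(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score

    if best_name is not None and best_score >= AUTO_MATCH:
        return ("auto", best_name, best_score)
    if best_name is not None and best_score >= REVIEW_FLOOR:
        return ("review", best_name, best_score)
    return ("new", None, best_score)


# Toy identities; real embeddings would have 192 dimensions.
identities = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
clear = match_turn([0.9, 0.1, 0.0], identities)
ambiguous = match_turn([1.0, 0.9, 0.0], identities)
unknown = match_turn([0.0, 0.0, 1.0], identities)
```

In this sketch the first turn is close enough to "alice" to be grouped automatically, the second falls between the two thresholds and would be queued for review, and the third matches nothing and would start a new identity.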

Question 2

Can I merge two speakers I named separately?

Accepted Answer

Yes. Merge them from the speaker panel, and every appearance of the absorbed identity is renamed across all files in the project. Splitting one identity into two is also supported.
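Conceptually, a merge is a project-wide relabel. The sketch below shows the idea under simple assumptions: turns are held in memory as records with a file name and a speaker label. The `Turn` class and `merge_identities` function are hypothetical names for illustration, not the product's API.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    file: str      # file the turn belongs to
    start: float   # start time in seconds
    speaker: str   # identity label currently assigned


def merge_identities(turns, absorbed, kept):
    """Relabel every turn of `absorbed` as `kept`, across all files.

    Returns the number of turns that were renamed.
    """
    renamed = 0
    for turn in turns:
        if turn.speaker == absorbed:
            turn.speaker = kept
            renamed += 1
    return renamed


turns = [
    Turn("interview_01.wav", 0.0, "Dr. Lee"),
    Turn("interview_01.wav", 4.2, "Host"),
    Turn("interview_02.wav", 1.5, "Doctor Lee"),
    Turn("interview_03.wav", 9.8, "Doctor Lee"),
]
renamed = merge_identities(turns, absorbed="Doctor Lee", kept="Dr. Lee")
speakers = sorted({t.speaker for t in turns})
```

After the call, the two "Doctor Lee" turns in the second and third files carry the "Dr. Lee" label, and only two distinct identities remain in the project.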

Question 3

What if the system gets a speaker wrong?

Accepted Answer

Correct the label on any turn and the system re-embeds and reconsiders nearby matches automatically. A project-wide cleanup pass can be triggered manually after large edits.
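One way to picture the reconsideration step is sketched below. It is only a guess at the general shape: it assumes each turn already has a stored embedding (so nothing is re-embedded here), that an identity is summarized by the mean of its members' embeddings, and that a hypothetical threshold of 0.8 decides which other turns to revisit. `apply_correction` and the dictionary keys are illustrative names, not part of the product.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def apply_correction(turns, index, new_label, threshold=0.8):
    """Apply a manual label fix, then list turns worth reconsidering.

    turns is a list of dicts with keys "label", "embedding", "confirmed".
    Returns the indices of unconfirmed turns under other labels whose
    embedding is close to the corrected identity's mean embedding.
    """
    turns[index]["label"] = new_label
    turns[index]["confirmed"] = True

    members = [t["embedding"] for t in turns if t["label"] == new_label]
    centroid = mean_vector(members)

    candidates = []
    for i, turn in enumerate(turns):
        if turn["confirmed"] or turn["label"] == new_label:
            continue
        if cosine(turn["embedding"], centroid) >= threshold:
            candidates.append(i)
    return candidates


turns = [
    {"label": "A", "embedding": [1.0, 0.0], "confirmed": False},
    {"label": "A", "embedding": [0.95, 0.1], "confirmed": False},
    {"label": "A", "embedding": [0.0, 1.0], "confirmed": False},
]
to_review = apply_correction(turns, index=0, new_label="B")
```

Here the user moves the first turn to identity "B". The second turn sounds almost the same, so it is flagged for reconsideration, while the third is left alone.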

Question 4

Does it work for two speakers who sound alike?

Accepted Answer

Voice prints separate most adult voices reliably even when timbre is close. Identical-twin-level similarity occasionally needs manual review, especially in short turns under two seconds.

Question 5

Is the speaker model run locally?

Accepted Answer

No, not on your device. Speaker embedding runs in a Python sidecar process on our servers rather than in the browser, and no raw audio is sent to a third-party speaker-ID API.

Speaker identification that works across every file in your project
