Try it now
A working sandbox. No sign-up, no project. Sample data only.
- WAV, MP3, M4A, MP4, MOV, MKV and most other common formats accepted.
- Speaker turns separated automatically; rename one speaker and the change applies project-wide.
- Click any transcript line to seek the audio player; edits save instantly.
- Your choice of transcription engine per file. Swap any time to suit the source language.
How it works
Three steps from raw material to result.
Audio or video, up to a few hours per file. The audio is extracted automatically and routed to the transcription engine you picked.
Take me back to that night. Did you have any idea what was going to happen?
It wasn't planned. None of it was. We were sitting there… and the lights went out.
Speaker turns are detected and a voice fingerprint is captured for every voice in the file. Each line is clickable and seeks the waveform.
Correct typos inline, name the speakers once, and the renames propagate across every interview where their voice appears.
Frequently asked questions
What audio formats can I upload?
WAV, MP3, M4A, AAC, OGG, FLAC, and most common video containers (MP4, MOV, MKV, WebM). The audio is extracted server-side automatically before transcription runs.
How accurate is the transcript?
Accuracy depends on audio quality and accent. On clean studio interviews in major languages, word accuracy lands in the high 90s (percent); on noisy field recordings with strong regional accents, it drops into the 80s. The source audio is always aligned to the text, so you can review and correct quickly.
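For readers who want to check accuracy on their own material, here is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the machine output, divided by the number of reference words. The function name and the simple lowercase, whitespace-based tokenization are assumptions for illustration, not a description of how any particular engine is scored.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase and split on whitespace; real scoring tools
    # usually also normalize punctuation and numerals.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # reference word dropped
                curr[j - 1] + 1,                  # extra word inserted
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word in a five-word line is a WER of 0.2 (80% word accuracy).
wer = word_error_rate("and the lights went out", "and the light went out")
```

Note that every word counts the same here: a misheard name costs exactly as much as a misheard "the", which is one reason a single percentage can look better than the transcript reads.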
Can I edit the transcript after it lands?
Yes. The transcript opens in a three-panel editor next to the audio waveform. Click any line to seek the player, edit text inline, rename speakers, or merge identities. Changes save instantly.
Does the transcript include speakers?
Speaker turns are detected automatically. Identities are voice-fingerprinted and matched across every interview in the same project, so once you name a speaker, the change applies everywhere they appear.
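One way to picture why a single rename applies everywhere: if each transcript line stores a stable speaker ID rather than a name, the display name lives in one place per project. The sketch below is a hypothetical data model for illustration only; the class, field, and file names are assumptions, and it does not cover the voice matching that assigns the IDs.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    # Hypothetical model: speaker_id -> display name, stored once per project.
    speaker_names: dict[str, str] = field(default_factory=dict)
    # Each line is (file name, speaker_id, text); lines never store a name.
    lines: list[tuple[str, str, str]] = field(default_factory=list)

    def rename(self, speaker_id: str, name: str) -> None:
        """Rename a speaker once; every line with that ID picks it up."""
        self.speaker_names[speaker_id] = name

    def render(self, file_name: str) -> list[str]:
        """Return the lines of one file with current display names."""
        return [
            f"{self.speaker_names.get(speaker_id, speaker_id)}: {text}"
            for name, speaker_id, text in self.lines
            if name == file_name
        ]


project = Project()
project.lines.append(("reel_01.wav", "spk_1", "It wasn't planned."))
project.lines.append(("reel_02.wav", "spk_1", "The lights went out."))
project.rename("spk_1", "Interviewee")
```

After the single `rename` call, rendering either file shows the new name, because both lines point at the same `spk_1` entry.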
How long can a file be?
On the free tier, each upload is capped at a few hours of audio. Longer interviews are usually split into reels during recording; you can upload the reels as a folder and the project handles them as one session.
Further reading
Background guides and comparisons.
Word error rate is the standard accuracy metric, but it understates the problems that matter most for documentary audio: proper nouns, accents, crosstalk, and technical terms. Here is what WER measures and what it does not.
Otter.ai is built for meeting notes. PaperCuts is built for multi-speaker documentary interviews, speaker identity across files, and a post-production assembly layer. The gap is not about accuracy.