Try it now
A working sandbox. No sign-up, no project. Sample data only.
- Sub-second latency. Most words appear on screen within a second of being spoken.
- The browser records locally in parallel so connection drops don't lose audio.
- Live-share link for up to 25 viewers, each picking their own translation language.
- 20+ source languages across four providers; pick the right one per session.
- Producer chat lets remote editors and stakeholders message the on-set producer without the interviewee ever seeing it.
Click start to see interim captions stream in.
How it works
Three steps from raw material to result.
When did you first realise the bridge wouldn't hold?
We knew by the second week. The numbers [typing…]
Microphone audio streams from the browser to a transcription engine in real time. Interim captions appear within a few hundred milliseconds.
We knew by the second week.
The numbers stopped making sense.
Lo supimos a la segunda semana.
Los números dejaron de tener sentido.
A parallel translation column updates as words land. A shareable link lets up to 25 viewers join, each picking their own target language.
The browser keeps recording a local copy the whole time. If the network drops, any gap is filled automatically by re-transcribing the local audio when you reconnect.
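As a rough illustration of the gap-filling step, the sketch below works out which stretches of a session have no server-side transcript and would need to be re-transcribed from the local recording. It is a minimal sketch, not the product's implementation: the function name `find_gaps` and the idea of representing coverage as (start, end) intervals in seconds are assumptions made for the example.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds), hypothetical representation


def find_gaps(covered: List[Interval], session_start: float, session_end: float) -> List[Interval]:
    """Return the parts of [session_start, session_end] that no covered interval spans."""
    gaps: List[Interval] = []
    cursor = session_start  # everything before the cursor is already accounted for
    for start, end in sorted(covered):
        if start > cursor:
            # Nothing was transcribed between the cursor and this interval.
            gaps.append((cursor, min(start, session_end)))
        cursor = max(cursor, end)
        if cursor >= session_end:
            break
    if cursor < session_end:
        # The transcript stops before the session does.
        gaps.append((cursor, session_end))
    return gaps
```

Each returned interval marks a slice of the local audio to send for transcription once the connection is back.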
Speaker recognition across every voice
Each speaker turn is fingerprinted and matched automatically. Rename once and every appearance updates, including files you uploaded weeks ago.
Voice prints matched automatically. Rename once and the change applies to every file.
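One way to picture why a single rename can reach every file: transcript segments point at a stable speaker ID, and the display name lives in one place. The sketch below assumes that design; `Segment`, `Library`, and the field names are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    file_id: str
    speaker_id: str  # stable ID for a matched voice, not a display name
    text: str


@dataclass
class Library:
    speaker_names: Dict[str, str] = field(default_factory=dict)
    segments: List[Segment] = field(default_factory=list)

    def rename(self, speaker_id: str, name: str) -> None:
        # One write; every segment that references this ID picks it up.
        self.speaker_names[speaker_id] = name

    def label(self, segment: Segment) -> str:
        return self.speaker_names.get(segment.speaker_id, "Unknown speaker")
```

Because names are resolved at display time, older files need no rewriting when a speaker is renamed.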
Select, highlight, send. Without leaving the transcript.
Highlight any span, add a colour marker, send the quote plus its timecode and speaker label directly to a paper cut, or copy it for use elsewhere.
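For a sense of what a copied quote might carry, here is a minimal sketch that formats a selection with its timecode and speaker label. The output layout and the function names are assumptions for illustration; the actual copy format may differ.

```python
def format_timecode(seconds: float) -> str:
    """Render a position in seconds as HH:MM:SS (fractions of a second are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_quote(speaker: str, start_seconds: float, text: str) -> str:
    """Combine timecode, speaker label, and quoted text into one line."""
    return f'[{format_timecode(start_seconds)}] {speaker}: "{text}"'
```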
Remote oversight without disrupting the room
Editors at HQ, executive producers, legal stakeholders, and news editorial teams can follow the session and message the on-set producer in real time. The interviewee never sees the chat.
Walk me through the sequence of events.
It was complicated. We'd been watching the situation for months.
From the beginning: when did you first see the report?
The first time I actually read it
Frequently asked questions
What's the latency like?
Most words appear within 300–800 ms of being spoken, depending on the engine. The fastest engines deliver results in under half a second.
Can I switch providers mid-session?
No, but you can pick the best provider for your language before you start. The session locks to that engine for stability; the next session can pick a different one.
Does it work if my connection drops?
Yes. The browser keeps a local recording the whole time. When you come back online, any gap in the server transcript is filled by re-transcribing the locally saved audio.
How many people can watch the live share?
Up to 25 viewers per live session, each picking their own translation language independently.
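The viewer limit and per-viewer language choice can be pictured with a small sketch. Only the limit of 25 comes from the text above; the `LiveShare` class, its method names, and the behaviour of letting an existing viewer change language are assumptions made for the example.

```python
from typing import Dict, Set

MAX_VIEWERS = 25  # limit stated for a live session


class LiveShare:
    def __init__(self, max_viewers: int = MAX_VIEWERS) -> None:
        self.max_viewers = max_viewers
        self.viewer_languages: Dict[str, str] = {}  # viewer ID -> chosen target language

    def join(self, viewer_id: str, language: str) -> bool:
        """Admit a viewer, or update their language if already present. False when full."""
        is_new = viewer_id not in self.viewer_languages
        if is_new and len(self.viewer_languages) >= self.max_viewers:
            return False
        self.viewer_languages[viewer_id] = language
        return True

    def languages_needed(self) -> Set[str]:
        """Distinct target languages currently requested across all viewers."""
        return set(self.viewer_languages.values())
```

Tracking the distinct languages separately from the viewers means each language only needs translating once, however many viewers picked it.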
What languages does live transcription support?
20+ source languages across the four providers. The dropdown shows which languages each provider supports before you start.
Who can send messages in the producer chat, and who sees them?
Anyone with the live-share link can send chat messages: a remote editor at HQ, an executive producer, a legal observer, a news director. The messages go only to the session host (the on-set producer). The interviewee or subject never sees the chat, so sensitive direction, editorial pushback, or legal guidance stays private to the production team.
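The routing rule in that answer can be sketched in a few lines: a message from a live-share viewer is delivered to the session host and to no one else. The role names and the `recipients` function are hypothetical, and dropping messages from unknown senders or non-viewers is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    participant_id: str
    role: str  # "host", "viewer", or "interviewee" (hypothetical role names)


@dataclass
class ChatMessage:
    sender_id: str
    body: str


def recipients(message: ChatMessage, participants: List[Participant]) -> List[str]:
    """Return the IDs that should receive the message: the host only."""
    sender = next((p for p in participants if p.participant_id == message.sender_id), None)
    if sender is None or sender.role != "viewer":
        return []  # assumption: only known live-share viewers can send
    return [p.participant_id for p in participants if p.role == "host"]
```

The interviewee never appears in the returned list, whoever sends the message.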
Related capabilities
Further reading
Background guides and comparisons.
How to choose between Deepgram, Google STT V2, AssemblyAI, and OpenAI realtime for live interview transcription. The key axes are latency, diarization quality, language coverage, and cost per minute.
Trint and PaperCuts both produce searchable multi-speaker transcripts and export in multiple formats. The divergence is in what comes after: Trint has no beat sheet or paper cut assembly; PaperCuts has no NLE-style video scrubbing.