How transcription works
The short version: you drop a file, it uploads, VoiceQC transcribes it in the background, and the row in your Session items table fills in with text the moment the result is ready. No polling. No refresh button.
This page unpacks the why. It’s background reading, not a how-to. If you’re trying to actually upload audio, see Upload audio files.
The flow, end to end
When you drop a WAV onto the upload zone, three things happen, mostly in parallel:
- The browser uploads your audio. Your audio goes directly to VoiceQC’s storage; it doesn’t pass through the marketing site or any intermediate service. That keeps uploads fast and means a large file isn’t held up by other things VoiceQC is doing.
- A new Session item is created. Right away, before transcription starts, a row appears in the Session items table with the file’s name. The transcription cell stays empty for now.
- Transcription runs. VoiceQC picks up the new audio and runs it through automatic speech recognition. This is the slow part: typically seconds to a minute per file, depending on length. Multiple files transcribe in parallel.
When transcription completes, the row in your Session items table fills in with the text. The browser knows this happened because it holds a live connection back to VoiceQC; you don’t need to refresh, navigate away and back, or click anything.
Why this design
A few choices fall out of audio QC’s particular shape.
Direct-to-storage upload. Audio files are heavy. Routing them through the web servers that render the UI would make uploads slower and would tie up resources that should be answering “draw the next page” requests. The browser uploads straight to storage; the web tier never sees the audio bytes.
Async transcription. A 5-minute file takes longer than a 5-second file. If the upload request had to wait for transcription to finish, browsers would time out. Separating “upload finished” from “transcription finished” means the UI can show you “your file is in” within seconds, and you can keep working while transcription runs.
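The separation between “upload finished” and “transcription finished” can be sketched in a few lines. This is a minimal illustration, not VoiceQC’s implementation: `session_items`, `handle_upload`, and `transcribe` are hypothetical names, and the sleep stands in for real speech recognition.

```python
import asyncio
from typing import Optional

# Hypothetical in-memory stand-in for the Session items table:
# filename -> transcription text, or None while transcription is pending.
session_items: dict[str, Optional[str]] = {}


async def transcribe(filename: str, audio_seconds: float) -> None:
    """Pretend ASR. Assumption: run time grows with audio length."""
    await asyncio.sleep(audio_seconds / 10000)  # scaled down so the demo is fast
    session_items[filename] = f"<transcript of {filename}>"


async def handle_upload(filename: str, audio_seconds: float) -> asyncio.Task:
    """Create the row and return at once; transcription runs in the background."""
    session_items[filename] = None  # the row exists, its transcription is empty
    return asyncio.create_task(transcribe(filename, audio_seconds))


async def demo() -> tuple[Optional[str], Optional[str]]:
    job = await handle_upload("call.wav", audio_seconds=300)
    before = session_items["call.wav"]  # upload acknowledged, no text yet
    await job                           # some time later, transcription finishes
    after = session_items["call.wav"]
    return before, after


before, after = asyncio.run(demo())
```

The upload handler never waits on `transcribe`, so how long it takes to acknowledge a file does not depend on how long the audio is.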
Push, not poll. The browser could ask “is it done yet?” every few seconds, but that scales badly when one team has hundreds of files in flight. Instead the browser opens a single live connection at sign-in and VoiceQC pushes updates as items complete. A team uploading 300 files at once gets a steady drip of “this one’s done” notifications instead of 300 timers running concurrently.
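As a rough sketch of the push model, the example below uses one queue to stand in for the browser’s single live connection. The names `LiveConnection` and `transcribe_and_push` are hypothetical, and the real transport VoiceQC uses is not specified here.

```python
import asyncio


class LiveConnection:
    """Hypothetical stand-in for the one live connection a browser holds."""

    def __init__(self) -> None:
        self.inbox: asyncio.Queue = asyncio.Queue()


async def transcribe_and_push(conn: LiveConnection, filename: str, delay: float) -> None:
    await asyncio.sleep(delay)      # stand-in for transcription work
    await conn.inbox.put(filename)  # server side pushes "this one's done"


async def demo(n_files: int) -> list:
    conn = LiveConnection()
    jobs = [
        asyncio.create_task(transcribe_and_push(conn, f"file{i}.wav", 0.001 * (i % 5)))
        for i in range(n_files)
    ]
    received = []
    for _ in range(n_files):
        # The browser side simply waits on the connection: no timers, no polling.
        received.append(await conn.inbox.get())
    await asyncio.gather(*jobs)
    return received


received = asyncio.run(demo(300))
```

In this sketch, 300 files in flight produce exactly 300 messages on one connection. A polling design would instead issue a request per file every few seconds, most of them answered with “not yet”.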
What this means in practice
A few practical consequences of the design:
You can close the tab. Transcription doesn’t depend on your browser being open. Come back later and the rows are filled in. The live update only matters when you’re actively watching.
Uploads are safe to retry. If your network drops mid-upload, drop the file again. VoiceQC dedupes on the filename within a Session, so you won’t end up with two rows for the same file from your retry.
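Filename-based deduplication can be pictured as a lookup keyed on the filename. The sketch below is illustrative only; the `Session` class and `add_upload` method are hypothetical and do not describe VoiceQC’s internals.

```python
from typing import Optional


class Session:
    """Hypothetical Session that keeps at most one row per filename."""

    def __init__(self) -> None:
        # filename -> transcription text, or None while pending
        self.items: dict[str, Optional[str]] = {}

    def add_upload(self, filename: str) -> bool:
        """Return True if a new row was created, False if the filename already has one."""
        if filename in self.items:
            return False  # retry of a known file: keep the existing row
        self.items[filename] = None
        return True


session = Session()
first_try = session.add_upload("a.wav")   # creates the row
retry = session.add_upload("a.wav")       # retry after a dropped connection
row_count = len(session.items)
```

Here the retry is a no-op: `row_count` stays at 1 however many times the same filename is dropped into the same Session.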
Order isn’t guaranteed. If you upload a.wav, b.wav, c.wav together, they might finish in any order. Shorter files usually finish first, because each file is transcribed independently rather than in sequence.
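A small simulation shows why completion order can differ from upload order. It assumes processing time is proportional to audio length, which is a simplification; the durations and the `transcribe` helper are made up for illustration.

```python
import asyncio


async def transcribe(filename: str, audio_seconds: float, finished: list) -> None:
    # Assumption: processing time grows with audio length (scaled down here).
    await asyncio.sleep(audio_seconds / 100)
    finished.append(filename)


async def demo() -> list:
    finished: list = []
    # Uploaded together, in this order, with different lengths.
    uploads = [("a.wav", 30.0), ("b.wav", 5.0), ("c.wav", 15.0)]
    await asyncio.gather(*(transcribe(name, secs, finished) for name, secs in uploads))
    return finished


completion_order = asyncio.run(demo())
```

With these durations the files complete as b.wav, c.wav, a.wav, even though a.wav was submitted first. Anything that relies on rows filling in top to bottom should sort explicitly instead.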
Re-evaluating an Auto-rule is safe. The Mismatch and Non-Verbal rules can be re-run any number of times. Each evaluation looks at the current Property assignments and reconciles — adding what should be there, removing what shouldn’t. No risk of duplicate Properties or stuck state.
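The “look at the current state and reconcile” pattern is what makes re-running safe. A minimal sketch, assuming a rule tracks only the Properties it assigned itself; the `reconcile` function and the Property names are hypothetical:

```python
def reconcile(current: set, desired: set) -> tuple:
    """Bring `current` in line with `desired`.

    Returns (new_state, added, removed). Running it again on its own
    output changes nothing, which is what makes re-evaluation idempotent.
    """
    added = desired - current     # should be there, but isn't yet
    removed = current - desired   # is there, but shouldn't be
    new_state = (current | added) - removed
    return new_state, added, removed


# First evaluation: one stale Property to drop, one missing Property to add.
state, added, removed = reconcile({"Non-Verbal"}, {"Mismatch"})

# Second evaluation against the same desired outcome: nothing left to do.
state_again, added_again, removed_again = reconcile(state, {"Mismatch"})
```

Because each run computes a difference against the current assignments rather than appending blindly, repeated runs cannot stack up duplicates.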
What you can’t see or control
A few things are deliberately abstracted from the user:
- Which transcription model is in use. VoiceQC may swap or upgrade the underlying speech recognition over time. You’ll see quality improve; you won’t see a “pick a model” option.
- Where in the queue your file is. Transcription is roughly first-come-first-served per account, but VoiceQC doesn’t surface a position number. The toast that says “Evaluating session items…” is the only progress indicator.
- Per-second confidence scores. The transcription is delivered as final text. The Non-Verbal rule uses internal confidence signals, but they aren’t surfaced as raw numbers in the UI.
If any of these matter for your workflow, the Feedback form is the place to say so — it’s how the product team learns what to expose next.
See also
- Mismatch vs Non-Verbal: what gets tagged after transcription finishes, and what neither rule catches.
- Upload audio files: the how-to.
- Glossary: Transcription