Mismatch vs Non-Verbal — choosing the right rule
VoiceQC ships with two Auto-rules: Mismatch and Non-Verbal. They look superficially similar — both flag Session items and apply a Property automatically — but they answer different questions. This page is about when each is the right tool, and (just as importantly) what neither catches.
The two rules at a glance
| | Mismatch rule | Non-Verbal rule |
|---|---|---|
| Question it answers | “Did the actor say the right words?” | “Did the actor say words at all?” |
| Runs on | The transcript (after speech recognition) | The audio (before speech recognition) |
| Needs a Script | Yes — without one, nothing to compare against | No |
| Typical Property applied | Status / Mismatch | Status / Non-verbal or Quality / No take |
| Main knob | Normalization toggles + Synonyms | Speech threshold (0.00 to 1.00) |
The order matters. Non-Verbal runs first and removes the takes that aren’t really speech. Mismatch runs second and QCs the takes that are speech against what was supposed to be said.
When to use Mismatch
Use the Mismatch rule when:
- You have a script — every line was written before it was recorded.
- You care whether the take matches the script — wrong word, dropped phrase, paraphrase, line ad-libbed.
- The transcription quality is good enough that “the words differ” is a real signal, not noise from the speech recognizer.
Mismatch is most useful in scripted dialogue workflows: video game voice-over, audiobook narration, animated film looping. It’s less useful for unscripted material (interviews, livestreams) where there’s no canonical text to compare against.
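To make the comparison concrete, here is a minimal sketch of a script-versus-transcript check with normalization toggles and a synonym map. It is not VoiceQC's implementation: `normalize`, `is_mismatch`, the toggle names `ignore_case` and `ignore_punctuation`, and the shape of the `synonyms` dictionary are all assumptions made for illustration.

```python
import string


def normalize(text, ignore_case=True, ignore_punctuation=True):
    """Split text into words after applying the (hypothetical) toggles."""
    if ignore_case:
        text = text.lower()
    if ignore_punctuation:
        # Removes ASCII punctuation only; curly quotes are left untouched.
        text = text.translate(str.maketrans("", "", string.punctuation))
    # split() with no arguments also collapses runs of whitespace.
    return text.split()


def is_mismatch(script_line, transcript, synonyms=None, **toggles):
    """Return True when the transcript differs from the script line.

    `synonyms` maps an already-normalized word to the form it should be
    compared as, for example {"ok": "okay"}.
    """
    synonyms = synonyms or {}

    def canonical(words):
        return [synonyms.get(word, word) for word in words]

    script_words = canonical(normalize(script_line, **toggles))
    spoken_words = canonical(normalize(transcript, **toggles))
    return script_words != spoken_words


# Same line once case, punctuation, and the "ok"/"okay" synonym are ignored.
same_line = is_mismatch("Okay, let's go.", "ok lets go", synonyms={"ok": "okay"})
# A genuinely different word still counts as a mismatch.
wrong_word = is_mismatch("Hold the line.", "Hold the door")
```

Turning a toggle off makes the check stricter: with `ignore_case=False`, "Hold the line" and "hold the line" no longer compare equal.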
When to use Non-Verbal
Use the Non-Verbal rule when:
- Your batches include takes that shouldn’t be transcribed at all — slate ins, room tone, fake takes, breath samples, an actor warming up.
- You want to filter those out automatically rather than scrolling past them in the items table.
- You don’t necessarily have a script.
Non-Verbal is the trash filter. It catches the stuff that would otherwise pollute your transcription pile. It’s especially valuable on raw production batches that haven’t been hand-curated by an editor first.
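The Speech threshold can be pictured as a cut-off on a per-take speech score. The sketch below assumes, purely for illustration, that each take carries a `speech_score` between 0.00 (no speech detected) and 1.00 (clearly speech) and that takes scoring below the threshold are flagged. The `Take` class, the `speech_score` field, and the direction of the comparison are hypothetical, not VoiceQC's documented behavior.

```python
from dataclasses import dataclass


@dataclass
class Take:
    name: str
    speech_score: float  # hypothetical: 0.00 = no speech, 1.00 = clearly speech


def is_non_verbal(take, speech_threshold=0.5):
    """Flag a take whose speech score falls below the threshold (assumed rule)."""
    if not 0.0 <= speech_threshold <= 1.0:
        raise ValueError("speech_threshold must be between 0.00 and 1.00")
    return take.speech_score < speech_threshold


room_tone = Take("room_tone_01", speech_score=0.05)
quiet_line = Take("whispered_line_07", speech_score=0.62)

flagged_default = [t.name for t in (room_tone, quiet_line) if is_non_verbal(t)]
flagged_strict = [t.name for t in (room_tone, quiet_line) if is_non_verbal(t, 0.8)]
```

Under these assumptions, raising the threshold flags more takes (here the whispered line as well), so a higher value trades missed junk for more false positives on quiet but legitimate deliveries.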
Use both together
The rules compose well:
- Non-Verbal removes the noise floor — takes the actor never really delivered.
- Mismatch flags the takes where the actor delivered something, but not what was scripted.
What’s left after both run is a clean column of items that are real speech and match the script. The flagged items are the ones a human should listen to.
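The ordering described above (Non-Verbal first, Mismatch second) can be sketched as a two-stage pass. This is an illustration, not VoiceQC's code: `run_auto_rules`, the item dictionaries, and the two stand-in predicates are hypothetical. Only the Property names come from the table at the top of this page.

```python
def run_auto_rules(items, is_non_verbal, is_mismatch):
    """Map each item id to the Property applied, or None if the item is clean.

    Non-Verbal is checked first, so a take with no real speech is never
    compared against the script.
    """
    results = {}
    for item in items:
        if is_non_verbal(item):
            results[item["id"]] = "Status / Non-verbal"
        elif is_mismatch(item):
            results[item["id"]] = "Status / Mismatch"
        else:
            results[item["id"]] = None
    return results


# Hypothetical Session items; the field names are assumptions.
items = [
    {"id": "take_001", "speech_score": 0.04, "script": "Hold the line.", "transcript": ""},
    {"id": "take_002", "speech_score": 0.93, "script": "Hold the line.", "transcript": "hold the door"},
    {"id": "take_003", "speech_score": 0.97, "script": "Hold the line.", "transcript": "hold the line"},
]


def below_threshold(item):
    # Stand-in for the Non-Verbal rule with a threshold of 0.5.
    return item["speech_score"] < 0.5


def words_differ(item):
    # Stand-in for the Mismatch rule: ignore case and a trailing period.
    def clean(text):
        return text.lower().strip(".").split()

    return clean(item["script"]) != clean(item["transcript"])


flags = run_auto_rules(items, below_threshold, words_differ)
needs_review = [item_id for item_id, prop in flags.items() if prop is not None]
```

In this sketch `take_001` is flagged as non-verbal before its empty transcript can register as a mismatch, which is the practical reason for running the rules in this order.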
What neither rule catches
This is the part most people get wrong on first read, so it’s worth being blunt:
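VoiceQC works at the transcript layer: it checks whether a take contains speech and whether that speech matches the script. It does not analyze waveforms for signal quality.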
| Audio problem | VoiceQC catches it? |
|---|---|
| Wrong words / dropped lines | Yes — Mismatch |
| Empty or non-speech takes | Yes — Non-Verbal |
| Clipping (peaks above 0 dBFS) | No |
| Background noise / room tone under the dialogue | No |
| Mouth clicks, plosives, sibilance | No |
| Loudness / RMS / LUFS issues | No |
| Mix balance / panning / dialogue ducking | No |
| Phoneme-level mispronunciation | No (only catches the words your speech recognizer chose, not how they were pronounced) |
For DSP-level audio analysis, use a dedicated tool — iZotope RX, an offline LUFS meter, your DAW’s analyzer plugins. VoiceQC sits next to those tools, not on top of them.
Why VoiceQC chose this layer
The “no waveform analysis” choice is deliberate. Two reasons:
- The transcript layer is where the team disagreement is. Catching clipping tells you the audio engineer didn’t gain-stage well. Catching a mismatch tells you the take is unusable because the actor said the wrong thing — a much higher-stakes finding for a voice-over pipeline.
- Audio analysis is a crowded category. Building a worse version of iZotope RX wouldn’t help anyone. Building the only automated “did they say the script?” check has clear value.
If audio-level QC matters to your workflow, the practical pattern is: run your DSP tool first (catch the clipping, loudness, etc.), then upload the cleaned takes to VoiceQC for content QC.
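As a rough illustration of that ordering, the sketch below gates takes on a peak-level check before they are handed to content QC. It is a stand-in for a real DSP tool, not a replacement for one: `peak_dbfs`, `ready_for_content_qc`, the -1.0 dBFS ceiling, and the assumption that samples are floats in the range -1.0 to 1.0 are all hypothetical choices made for this example.

```python
import math


def peak_dbfs(samples):
    """Peak level in dBFS for float samples nominally in [-1.0, 1.0]."""
    peak = max((abs(s) for s in samples), default=0.0)
    # Digital silence has no finite dBFS value.
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak)


def ready_for_content_qc(takes, ceiling_dbfs=-1.0):
    """Return the names of takes whose peak stays below the ceiling.

    `takes` maps a take name to its samples. Takes at or above the ceiling
    are held back for the audio engineer instead of being uploaded.
    """
    return [
        name
        for name, samples in takes.items()
        if peak_dbfs(samples) < ceiling_dbfs
    ]


takes = {
    "take_010": [0.10, -0.20, 0.15],  # comfortable headroom
    "take_011": [0.30, 1.00, -0.40],  # hits full scale
}
upload_list = ready_for_content_qc(takes)
```

Only the takes in `upload_list` would go on to VoiceQC; the rest go back through the audio tooling first.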
See also
- Configure a Mismatch rule: the how-to.
- Configure a Non-Verbal rule: the how-to.
- Auto-rule configuration: the side-by-side reference.