Mismatch vs Non-Verbal — choosing the right rule

VoiceQC ships with two Auto-rules: Mismatch and Non-Verbal. They look superficially similar — both flag Session items and apply a Property automatically — but they answer different questions. This page is about when each is the right tool, and (just as importantly) what neither catches.

The two rules at a glance

	Mismatch rule	Non-Verbal rule
Question it answers	“Did the actor say the right words?”	“Did the actor say words at all?”
Runs on	The transcript (after speech recognition)	The audio (before speech recognition)
Needs a Script	Yes — without one, nothing to compare against	No
Typical Property applied	`Status / Mismatch`	`Status / Non-verbal` or `Quality / No take`
Main knob	Normalization toggles + Synonyms	Speech threshold (0.00 to 1.00)

The order matters. Non-Verbal runs first, on the audio, and flags the takes that aren’t really speech so they stay out of the transcription pile. Mismatch runs second and checks the takes that are speech against what was supposed to be said.

When to use Mismatch

Use the Mismatch rule when:

You have a script — every line was written before it was recorded.
You care whether the take matches the script — wrong word, dropped phrase, paraphrase, line ad-libbed.
The transcription quality is good enough that “the words differ” is a real signal, not noise from the speech recognizer.

Mismatch is most useful in scripted dialogue workflows: video game voice-over, audiobook narration, animated film looping. It’s less useful for unscripted material (interviews, livestreams) where there’s no canonical text to compare against.
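To make the "normalization toggles + Synonyms" idea concrete, here is a minimal sketch of a script-versus-transcript comparison. It is not VoiceQC's implementation: the function names, the toggle names (`ignore_case`, `ignore_punctuation`), and the synonym map are all hypothetical, chosen only to illustrate how such a check could behave.

```python
import re

# Hypothetical synonym map: each key is treated as equivalent to its value.
SYNONYMS = {"ok": "okay", "gonna": "going to"}


def normalize(text, ignore_case=True, ignore_punctuation=True, synonyms=None):
    """Reduce a line to a list of comparable word tokens."""
    if ignore_case:
        text = text.lower()
    if ignore_punctuation:
        # Keep word characters, whitespace, and apostrophes; drop the rest.
        text = re.sub(r"[^\w\s']", " ", text)
    words = text.split()
    if synonyms:
        expanded = []
        for word in words:
            # A synonym may expand to several words ("gonna" -> "going to").
            expanded.extend(synonyms.get(word, word).split())
        words = expanded
    return words


def is_mismatch(script_line, transcript, **options):
    """True when the normalized transcript differs from the normalized script."""
    return normalize(script_line, **options) != normalize(transcript, **options)


script = "Okay, we're going to leave!"
print(is_mismatch(script, "ok we're gonna leave", synonyms=SYNONYMS))
print(is_mismatch(script, "okay we're going to stay"))
```

With the toggles on and the synonym map supplied, the first take counts as a match; the second differs by a real word and would be flagged. Turning a toggle off makes the comparison stricter, which is why transcription quality matters: the stricter the comparison, the more recognizer noise shows up as mismatches.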

When to use Non-Verbal

Use the Non-Verbal rule when:

Your batches include takes that shouldn’t be transcribed at all — slate ins, room tone, fake takes, breath samples, an actor warming up.
You want to filter those out automatically rather than scrolling past them in the items table.
You don’t necessarily have a script.

Non-Verbal is the trash filter. It catches the stuff that would otherwise pollute your transcription pile. It’s especially valuable on raw production batches that haven’t been hand-curated by an editor first.
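As a rough illustration of how a speech threshold between 0.00 and 1.00 can act as a filter, the sketch below assumes each take already has a speech score in that range from some audio-level detector, and that takes scoring below the threshold are flagged. The score, the direction of the comparison, the function name, and the file names are assumptions made for this example, not VoiceQC internals.

```python
def non_verbal_property(speech_score, threshold):
    """Return the Property to apply, or None if the take looks like speech.

    speech_score is a hypothetical 0.0-1.0 estimate of how speech-like the
    audio is; threshold is the rule's speech threshold.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.00 and 1.00")
    if speech_score < threshold:
        return "Status / Non-verbal"
    return None


# Made-up scores for a raw production batch.
batch = {"slate_01.wav": 0.04, "room_tone.wav": 0.01, "line_017_t3.wav": 0.97}
flagged = {
    name: non_verbal_property(score, threshold=0.5)
    for name, score in batch.items()
}
print(flagged)
```

Under these assumptions, raising the threshold flags more borderline takes (quiet breaths, mumbled warm-ups) and lowering it lets more of them through to transcription.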

Use both together

The rules compose well:

Non-Verbal removes the noise floor — takes the actor never really delivered.
Mismatch flags the takes where the actor delivered something, but not what was scripted.

What’s left after both run is a clean column of items that are real speech and match the script. The flagged items are the ones a human should listen to.
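The composition can be sketched as a two-stage pass over a batch. Everything here is illustrative: the `Take` class, its fields, the default threshold, and the simplified case-insensitive comparison are hypothetical stand-ins, not VoiceQC's data model or matching logic.

```python
from dataclasses import dataclass, field


@dataclass
class Take:
    name: str
    speech_score: float  # hypothetical 0.0-1.0 score from an audio-level detector
    transcript: str
    script_line: str
    properties: list = field(default_factory=list)


def run_rules(takes, threshold=0.5):
    """Apply Non-Verbal first, then Mismatch; return the takes left unflagged."""
    clean = []
    for take in takes:
        # Stage 1: Non-Verbal. Non-speech takes are flagged and skipped,
        # so they are never compared against the script.
        if take.speech_score < threshold:
            take.properties.append("Status / Non-verbal")
            continue
        # Stage 2: Mismatch. A simplified, case-insensitive comparison.
        if take.transcript.strip().lower() != take.script_line.strip().lower():
            take.properties.append("Status / Mismatch")
            continue
        clean.append(take)
    return clean


takes = [
    Take("breath.wav", 0.03, "", "Hold the line"),
    Take("line_01_t1.wav", 0.95, "hold the door", "Hold the line"),
    Take("line_01_t2.wav", 0.98, "hold the line", "Hold the line"),
]
clean = run_rules(takes)
print([t.name for t in clean])
```

Running Non-Verbal first keeps the breath sample from being reported as a mismatch: an empty transcript would never equal the script line, but "the actor said nothing" and "the actor said the wrong thing" are different findings.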

What neither rule catches

This is the part most people get wrong on first read, so it’s worth being blunt:

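VoiceQC checks what was said, not how it sounds. Mismatch works on the transcript, and Non-Verbal only decides whether a take contains speech at all. Neither rule measures the quality of the waveform.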

Audio problem	VoiceQC catches it?
Wrong words / dropped lines	Yes — Mismatch
Empty or non-speech takes	Yes — Non-Verbal
Clipping (peaks above 0 dBFS)	No
Background noise / room tone under the dialogue	No
Mouth clicks, plosives, sibilance	No
Loudness / RMS / LUFS issues	No
Mix balance / panning / dialogue ducking	No
Phoneme-level mispronunciation	No (only catches the words your speech recognizer chose, not how they were pronounced)

For DSP-level audio analysis, use a dedicated tool — iZotope RX, an offline LUFS meter, your DAW’s analyzer plugins. VoiceQC sits next to those tools, not on top of them.

Why VoiceQC chose this layer

The “no waveform analysis” choice is deliberate. Two reasons:

The transcript layer is where the team disagreement is. Catching clipping tells you the audio engineer didn’t gain-stage well. Catching a mismatch tells you the take is unusable because the actor said the wrong thing — a much higher-stakes finding for a voice-over pipeline.
Audio analysis is a crowded category. Building a worse version of iZotope RX wouldn’t help anyone. Building the only automated “did they say the script?” check has clear value.

If audio-level QC matters to your workflow, the practical pattern is: run your DSP tool first (catch the clipping, loudness, etc.), then upload the cleaned takes to VoiceQC for content QC.
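As a stand-in for the "DSP tool first" step, the sketch below shows the simplest possible pre-upload gate: a peak check on floating-point samples where full scale is ±1.0. It is not a substitute for a dedicated analyzer (it says nothing about loudness, noise, or clicks), and the function names and the -1.0 dBFS ceiling are assumptions chosen for illustration.

```python
import math


def peak_dbfs(samples):
    """Peak level in dBFS for float samples where full scale is +/-1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return -math.inf  # digital silence has no finite peak level
    return 20.0 * math.log10(peak)


def ready_for_content_qc(samples, ceiling_dbfs=-1.0):
    """True when the take's peak stays at or below the chosen ceiling."""
    return peak_dbfs(samples) <= ceiling_dbfs


quiet_take = [0.0, 0.25, -0.5, 0.1]
hot_take = [0.2, 1.0, -1.2, 0.4]  # samples beyond full scale will clip

print(ready_for_content_qc(quiet_take))
print(ready_for_content_qc(hot_take))
```

Only takes that pass a gate like this (or, more realistically, a pass through your DSP tool of choice) would then be uploaded for content QC.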