Dialog Editing Part 4: Stems and Specs for Dialog Editors

Specs and stems mainly concern the dialog mixer, but they also affect how a dialog editor does their work.


The goal of a mix is to balance all the tracks and elements together, but we also have to output each of the sound categories individually. When you route all the tracks from a category together, the result is called a “stem.” For example, when you mix a TV show for a broadcast network (like CBS, ABC, BBC, etc.), the “deliverables” they may ask for are:

  • Mix
  • Stems (VO, Dialog, FX, Foley, Music)
  • M&E Stem (music and effects – for foreign dubs)
  • Mix Minus Stem (the mix without voiceover)
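To make the relationship between these deliverables concrete, here is a minimal sketch in Python. It assumes each stem is just a short list of made-up sample values and that nothing is processing the mix bus (in a real session, a limiter on the master could mean the stems don't add up exactly to the mix). The stem names come from the list above; everything else is invented for illustration.

```python
# Hypothetical stems: a few mono sample values each, all the same length.
stems = {
    "VO":     [0.10, 0.00, -0.10],
    "Dialog": [0.20, 0.10, 0.00],
    "FX":     [0.00, 0.05, 0.05],
    "Foley":  [0.01, 0.00, 0.02],
    "Music":  [0.10, 0.10, 0.10],
}


def sum_stems(names):
    """Add the named stems together, sample by sample."""
    return [sum(samples) for samples in zip(*(stems[n] for n in names))]


# The full mix is every stem summed together.
full_mix = sum_stems(["VO", "Dialog", "FX", "Foley", "Music"])

# M&E leaves out everything spoken, so a foreign dub can be laid on top.
m_and_e = sum_stems(["FX", "Foley", "Music"])

# Mix minus is the full mix without the voiceover.
mix_minus = sum_stems(["Dialog", "FX", "Foley", "Music"])
```

The point of the sketch is that any sound placed on the wrong stem ends up in (or missing from) the wrong deliverable, which is why track layout matters at the editing stage.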


Any time an audio mix is going to a broadcaster or a distributor, it’s important to know about the technical rules called “specs.” Specs are essentially a set of rules for each broadcaster and cover everything from audio and video to closed captioning and subtitles. For audio, they should address issues like:

  • How loud content can be (overall average and peak levels)
  • What format to deliver (tape or files – including file type and bit depth)
  • Any specific mixing or monitoring requirements (such as “no music in the center channel”)
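As a rough illustration of how the audio items above fit together, here is a small Python sketch that compares one deliverable against a spec. Every number, field name, and the `check` function itself are hypothetical and not taken from any real spec sheet; always use the values your network or distributor gives you.

```python
from dataclasses import dataclass


@dataclass
class AudioSpec:
    # All values are invented for illustration, not from a real spec sheet.
    target_loudness: float     # overall average level the spec asks for
    loudness_tolerance: float  # how far from the target is still acceptable
    max_peak: float            # highest peak level allowed
    file_type: str
    bit_depth: int


@dataclass
class Deliverable:
    name: str
    measured_loudness: float
    measured_peak: float
    file_type: str
    bit_depth: int


def check(deliverable, spec):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    if abs(deliverable.measured_loudness - spec.target_loudness) > spec.loudness_tolerance:
        problems.append("average loudness out of range")
    if deliverable.measured_peak > spec.max_peak:
        problems.append("peak too high")
    if deliverable.file_type != spec.file_type:
        problems.append("wrong file type")
    if deliverable.bit_depth != spec.bit_depth:
        problems.append("wrong bit depth")
    return problems


spec = AudioSpec(target_loudness=-24.0, loudness_tolerance=2.0,
                 max_peak=-2.0, file_type="wav", bit_depth=24)
mix = Deliverable("Mix", measured_loudness=-23.5, measured_peak=-1.0,
                  file_type="wav", bit_depth=16)

issues = check(mix, spec)
print(issues)  # ['peak too high', 'wrong bit depth']
```

Monitoring requirements like “no music in the center channel” are left out because they can't be reduced to a single number in a sketch this small.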

In the US, any sort of TV network (including Amazon and Netflix) has pretty strict guidelines, which you will find on a “spec sheet.” The spec sheet will have the specific details of what they are asking for – everything from bit depth to stems to how loud the mix can be. Some spec sheets can be found online; for others you may have to ask your client to get one for you. Here are a couple of examples to see what they look like:

Specs can affect how you edit dialog. For example, some networks want all English words on the dialog stem and non-verbal human sounds (like screams or breathing) included with sound fx. Other networks want all human sounds (whether they’re words or not) on the dialog stem. Some networks make it even more confusing and ask that dialog under interviews be sent to the sound fx tracks. This is called “incidental dialog”: we see a shot with people speaking, but there’s an interview talking over it.
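One way to picture these differences is as a lookup table: the same kind of clip lands on a different stem depending on whose spec you're following. The sketch below uses three made-up networks (`network_a`, `network_b`, `network_c`) to stand in for the three cases described above. The names, the clip categories, and the choice of the dialog stem as the default are all assumptions for illustration.

```python
# Where a clip goes unless the network's spec says otherwise (an assumption).
DEFAULT_STEM = "Dialog"

# Hypothetical networks, one per case described in the text.
# Each entry lists only the exceptions to the default.
POLICIES = {
    "network_a": {"non_verbal": "FX"},   # screams and breathing go with sound fx
    "network_b": {},                     # all human sounds stay on the dialog stem
    "network_c": {"incidental": "FX"},   # dialog under interviews goes to sound fx
}


def stem_for(clip_kind, network):
    """Return the stem a clip belongs on for the given network's policy."""
    return POLICIES[network].get(clip_kind, DEFAULT_STEM)


for network in POLICIES:
    for kind in ("words", "non_verbal", "incidental"):
        print(network, kind, "->", stem_for(kind, network))
```

A real spec sheet won't be this tidy, but writing the rules down per project in some form (even a note at the top of the session) makes it easier to stay consistent.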

Specs may also say how to handle cursing. Sometimes you need a “clean” version (with bleeps) along with a “dirty” version with cursing.

Basically, there are no norms for specs, which is why you want to know what network or distributor you’re preparing for and consult their spec sheet. Sometimes you can find it online, or you may have to ask your re-recording mixer, sound supervisor (if there is one), or client. If the spec sheet doesn’t address an issue (like cursing), it means there are no rules to adhere to.

What if there are no specs?

If the project is going to web – a site like YouTube, Vimeo, Facebook, etc. – there are no specs. If you’re mixing a short or film for a film festival or to shop around for a distributor, there are usually no specs yet.

If there are no specs, I keep all spoken English on the dialog tracks. I edit production effects (PFX) onto separate PFX tracks. If there’s anything I’m unsure about (or that might change), I’ll keep it on a separate track. Put in a marker or send an email to the mixer to let them know any details about tracks that are out of the ordinary.

Quality Control

The reason all this matters is QC (quality control). Basically, the audio mix (and the video) will be reviewed for technical errors and problems like audio dropouts, sync issues, and language on stems where it’s not supposed to be. If a dialog editor puts spoken language on the wrong tracks, it’ll get flagged and possibly kicked back to the mixer to fix.

It’s possible you work on a project long before it goes through QC. Mistakes happen, too, especially if you’re on a tight deadline or a confusing project. It’s one of those things where you do the best you can and know that from time to time something will get kicked back and need changes.
