AC3 sound / audio frame offsets and QSF or edits


New member

Just looking for some basic info if the developers can share. I tend to record TV from an HDPVR 2, capturing AC3 sound tracks. I've noticed that--and I'm sure this is due to the difference in the frame rate of audio vs the frame rate of video--there are offsets in where VRD shows the sound track starting (and ending) compared to the video frame track.

I have also noticed that if there is an offset of a few video frames before the audio starts and I run a Quick Stream Fix (QSF) on the file, the resulting file has the audio track shifted so that it starts with the video. In some cases, if the offset is many frames, it may only be shifted partway.

I have also noticed the following when making a cut, say around commercials (most of my research here used VRD 4, though I am working on checking this in VRD 6). It doesn't matter whether the cut is made on the original file or a QSF'd one: at least in VRD 4, some integral number of audio frames is dropped, and then the audio track is shifted again, forward or backward, to make up whatever the remainder gap is.

Since we are talking about something like 33 milliseconds per frame for 29.97 fps video, and 17 ms for 59.94 fps, and the AC3 audio frames I'm looking at seem to have lengths of about 33 milliseconds, cumulatively the effects here could add up. Additionally, one could have a video with a lot of gaps end up being behind on synchronization in one place and ahead in another.
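
For reference, the frame durations can be worked out exactly. This is a minimal sketch that assumes 48 kHz audio (an assumption about the capture; AC3 itself always carries 1536 samples per frame) and the NTSC rates 30000/1001 and 60000/1001 fps. Under those assumptions an AC3 frame is exactly 32 ms, close to the roughly 33 ms estimated above.

```python
from fractions import Fraction

# AC3 carries 1536 samples per frame; 48 kHz is an assumed sample rate.
AC3_SAMPLES_PER_FRAME = 1536
SAMPLE_RATE_HZ = 48_000  # assumption, not confirmed for these recordings


def frame_duration_ms(frames_per_second):
    """Exact duration of one frame in milliseconds."""
    return Fraction(1000) / Fraction(frames_per_second)


video_2997_ms = frame_duration_ms(Fraction(30000, 1001))  # 29.97 fps
video_5994_ms = frame_duration_ms(Fraction(60000, 1001))  # 59.94 fps
audio_ms = Fraction(AC3_SAMPLES_PER_FRAME * 1000, SAMPLE_RATE_HZ)

for label, value in [("29.97 fps video", video_2997_ms),
                     ("59.94 fps video", video_5994_ms),
                     ("AC3 audio", audio_ms)]:
    print(f"{label}: {float(value):.3f} ms per frame")
```

Using exact fractions rather than floats avoids rounding noise when many frame durations are added up later.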

On average, one would expect that the shifts might cancel out, and one could also take care to try to match the relative alignment of video and audio frames at the cuts to minimize shifts, but I am curious to know enough to figure out answers to the following:

1. Why is the sound shifted if I QSF a file that has no other errors, and why is this shift not reported in the report? You get the green check but no notice of any changes. As habit, I tend to run everything through, because TV often has problems, but lately I find the recordings are pretty stable. I don't want to introduce unnecessary shifts since I will end up topping & tailing this after (which seems like it could introduce a further offset).

2. Is there a reason other than that the program assumes the sound should start at the same time as the video, that QSF shifts the sound? That is, is there any synchronization data in the audio and video tracks that VRD is reading and fixing for? (Not that broadcast doesn't often have sync errors anyway, nor do I fully understand if the capture device is consistent in the synchronization of the tracks, but those are separate tests.)

3. Is there a way to set the maximum tolerated audio shift? Can VRD introduce blank AC3 frames, or blank half-frames, or just drop the audio track and restart it at the right time? I am guessing it's not possible (for licensing or other technical reasons) for VRD to re-author the AC3 stream, that is, to just move the audio frame boundaries vs the data--I don't understand how AC3 works exactly but I assume that compression would make this non-trivial. (Will VRD tell me the audio frame rate? I'm not sure where to look for this.)

4. Does VRD keep track of the cumulative offset effect after several edits in a file, or does it just do the best at each one...or another way to put this: when it decides which is the last audio frame from before the cut to keep, and which to start with after, does it use the audio frames as they will be in the new file being output, thus taking into account the cumulative shift from earlier cuts, or does it compare with the original file? (It would seem the former would have to be the way it happens but I can't tell.)

5. How does this issue relate to the option to put in an audio shift, if VRD prefers to start with the audio aligned (if that's the case)?
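
To make question 4 concrete, here is a rough sketch of the two bookkeeping strategies it distinguishes. It is purely illustrative: the function name, the round-to-nearest rule, and the cut lengths are assumptions made for the example, not anything known about VRD's internals. It also assumes 32 ms AC3 frames (48 kHz).

```python
from fractions import Fraction

AUDIO_FRAME_MS = Fraction(32)  # assumed: 48 kHz AC3, 1536 samples per frame


def drift_after_cuts(cut_lengths_ms, track_cumulative):
    """Return the audio drift in ms after each cut (positive = audio late).

    Each cut removes cut_ms of video but only a whole number of audio frames.
    If track_cumulative is True, the drift already accumulated is folded into
    the decision, so the total stays within half an audio frame.
    """
    drift = Fraction(0)
    history = []
    for cut_ms in cut_lengths_ms:
        target = cut_ms + (drift if track_cumulative else 0)
        frames_dropped = round(target / AUDIO_FRAME_MS)
        drift += cut_ms - frames_dropped * AUDIO_FRAME_MS
        history.append(drift)
    return history


cuts_ms = [44, 44, 44]  # hypothetical cut lengths, chosen only for illustration
independent = [float(d) for d in drift_after_cuts(cuts_ms, track_cumulative=False)]
cumulative = [float(d) for d in drift_after_cuts(cuts_ms, track_cumulative=True)]
print("each cut decided on its own:", independent)
print("cumulative drift tracked:   ", cumulative)
```

With these made-up cuts, deciding each cut in isolation lets the drift grow (12, 24, 36 ms), while tracking the running total keeps it bounded (12, -8, 4 ms).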

Obviously, I want to figure out the best workflow to minimize audio shifts, both a global shift (e.g. QSF) and the cumulative effects of edits, so understanding what VRD is supposed to be doing, rather than me just guessing, would help. I couldn't find anything in the documentation or the forum covering these effects.

Thanks for any info, and thanks for making what I find to be a very valuable, reliable, and useful product!


Senior Developer
Staff member
The sync is always maintained. We add silent frames to fill the gaps when we're aligning the audio at the start of the file or near cuts. We do this because some playback devices will ignore the audio offset value and start playing the audio and video simultaneously even if it creates a sync issue. By ensuring that the audio and video start off aligned we can avoid this. The math for maintaining the sync is relatively simple because each frame has a fixed duration. So if you insert a frame then you just adjust the timestamp by the duration of a frame. With some formats, like PCM, we can do this down to the sample. With other formats we have to use whole frames, unless you're recoding, so that we can avoid decoding and recoding the audio.
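
As a rough illustration of the timestamp arithmetic described above (not VRD's actual code; the function name and the round-down rule are assumptions), padding a start gap with whole silent frames and adjusting the timestamp by one frame duration per inserted frame might look like this. It assumes 32 ms AC3 frames (48 kHz).

```python
from fractions import Fraction

AUDIO_FRAME_MS = Fraction(32)  # assumed: 48 kHz AC3, 1536 samples per frame


def pad_start_with_silence(first_audio_pts_ms, frame_ms=AUDIO_FRAME_MS):
    """Fill the gap before the first audio frame with whole silent frames.

    Returns (silent_frames, new_first_pts_ms). Each inserted frame moves the
    start of the audio track earlier by exactly one frame duration; the part
    of the gap smaller than one frame is left over as a residual offset.
    """
    silent_frames = int(first_audio_pts_ms // frame_ms)
    new_first_pts_ms = first_audio_pts_ms - silent_frames * frame_ms
    return silent_frames, new_first_pts_ms


# Hypothetical gap: audio begins 3.5 video frames into a 59.94 fps file.
gap_ms = Fraction(7, 2) * Fraction(1001, 60)
frames, residual = pad_start_with_silence(gap_ms)
print(frames, "silent frame(s), residual offset", float(residual), "ms")
```

The residual is the part that whole-frame padding cannot absorb; how a real tool handles it (offset value, rounding up, or rounding down) is exactly the open question in this thread.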


New member

This is helpful. It is interesting that some playback devices ignore offset values.

Specifically in the case of working with integral-frame changes to audio, strictly speaking, it seems that the sync can't be maintained exactly: if you re-align at the beginning of the file (or around an edit), and you can't cut audio frames into fractional bits, and they are a different length than the video frames, then you will get remainders that push the audio track forward or backward against the video compared to where it was in the original file. You can see this if you compare original and QSF'd files, or a file around a commercial edit, and you look at the audio graphs. I'll try to post some pictures in a future post of exactly what I mean.

This is an acceptable compromise for not having to re-encode audio, but it does need to be managed. So the question in my mind is how much book-keeping VRD actually does around this.

For example, let's say that the audio frames in a file are just less than twice the duration of the video frames, and the audio starts just halfway through frame 4 (this is appropriate to a 59.94 fps video rate vs about 30 audio fps which is where I noticed these effects). That means there would be just less than 2 full audio frames "missing" in the original recording. Let's say you run this through QSF. In the new audio track, VRD will start the audio with the video, but it also has to put in blank audio frames to try to maintain sync. There would seem to be three possible choices:

1. Insert 2 full blank audio frames (ending just before frame 5) and delay the original soundtrack by about 1/4 of a video frame. The sound would be about 4 milliseconds late (1/4 of 17 ms) vs. the original file (globally). The sound still starts in frame 4. I think most people would have trouble noticing such a delay.

2. Remove 1 frame of video from the start, so that the time gap is now only just over 2.5 frames of video until the audio starts (instead of just over 3.5). Insert 1 blank audio frame (which eats up just under 2 of the video frames of delay) and then shift the original sound about 2/3 of a video frame early, so that it starts just before what is now the 3rd frame (originally 4th frame). The sound would be about 11 ms early (2/3 of 17 ms) compared to the original file. I can see a 10 ms error on my stereo when I look closely at the video.

3. Insert 1 full blank audio frame and shift the audio track about 1 3/4 video frames early, so that instead of starting halfway through frame 4, it now starts just before frame two. This amounts to the audio being 31 milliseconds early.
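
The three options can be checked with exact frame lengths. This is a sketch under assumed values: 59.94 fps video (1001/60 ms per frame), 48 kHz AC3 (32 ms per frame), and audio originally starting exactly 3.5 video frames in. The helper name is hypothetical. With these exact figures the shifts differ somewhat from the rounded estimates above, but the ordering of the three options is the same.

```python
from fractions import Fraction

VIDEO_FRAME_MS = Fraction(1001, 60)   # 59.94 fps
AUDIO_FRAME_MS = Fraction(32)         # assumed: 48 kHz AC3
ORIGINAL_START_MS = Fraction(7, 2) * VIDEO_FRAME_MS  # halfway through frame 4


def shift_ms(silent_audio_frames, video_frames_trimmed=0):
    """Shift of the original audio relative to the video (positive = late).

    The audio track is assumed to start with the first output video frame,
    preceded by the given number of whole silent frames.
    """
    new_start = silent_audio_frames * AUDIO_FRAME_MS
    old_start = ORIGINAL_START_MS - video_frames_trimmed * VIDEO_FRAME_MS
    return new_start - old_start


options = {
    "1 (two silent frames)": shift_ms(2),
    "2 (trim one video frame, one silent frame)": shift_ms(1, 1),
    "3 (one silent frame)": shift_ms(1),
}
for name, shift in options.items():
    print(f"option {name}: {float(shift):+.1f} ms "
          f"({float(shift / VIDEO_FRAME_MS):+.2f} video frames)")
```

A negative value means the original sound ends up early relative to the video, a positive value means late.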

In the actual files that this example is based on, VRD 4 and VRD 6 both chose option 3. Note that this introduces the largest shift in the sound. If subsequently one trimmed off the start of the new file, and due to unlucky cuts picked up another 3/4 of a video frame in the same direction, the file would be globally off by more than 2 video frames before shifts from commercial edits!

Unless I'm missing something?

I know VRD has a lot of semi-hidden settings, so again I'm wondering if there is anything that would affect how big a shift the program is willing to make, or if it preferentially shifts earlier or later, etc.

In any case, I'm glad that VRD shows the audio track so you can actually see these effects (I'll try to come up with some good images to illustrate this).