Color grading and how to do it the hard way

I hadn’t paid much attention to color grading until I was blatantly subjected to it via the first Transformers movie. Yes, I have no taste, but because I grew up in the ’80s I was compelled to watch it in the theater. A while later I ran across this page that addressed my concerns, and then I asked myself if it would be possible to make it suck less.

At the time I was using Sony Vegas Movie Studio HD Platinum 11 but couldn’t find any decent method to do color grading the way I wanted to do it until I ended up stumbling upon AAV ColorLab’s plugin. I was able to scrape away a lot of the problems, but in some cases it just wouldn’t work like I hoped it would.

I then decided to give DaVinci Resolve a try, but I found the interface to be less than intuitive to me even after watching several videos on the subject so back to AAV Color Lab it was.

Around that time I really started to get back into Photography and had a picture that needed some major color correction and it was then that I discovered Selective Color in Photoshop CS5. I looked and looked for a video equivalent to that but didn’t find anything for years until I found that somebody added the same logic for selective color that Photoshop uses to FFmpeg. FFmpeg is my bag and after screwing around with the filter I finally found a workflow that sucks but works consistently.

The things that are most important to me when I am attempting to unscrew that horrible teal and orange color grading, which is often actually cyan and orange color grading, are making sure that white remains white, grey remains grey, and black remains black. The same goes for skin color and for green trees and grass. If things are done properly, skin color returns to normal, the overbearing teal/cyan that is slathered over the screen lifts, and the original colors come back to what they were, or at least close to what they might have been.

Now don’t get me wrong. There are a lot of things done right with color grading such as the Matrix series of movies and others like “Because of Winn Dixie” which have obvious color grading but help bring the movie to life rather than perform second degree assault on a person’s retinas. What annoys me to no end is when a perfectly good movie gets “the treatment” by somebody who thought the original movie needed a bit of help when put on DVD or Blu-ray. Don’t believe me? Take a look at the differences between “The Alien Legacy” on DVD and the “Alien Anthology” on Blu-ray. The DVDs in “The Alien Legacy” release didn’t have a lot of color grading on them outside some mild low, midrange, and high adjustments, but the “Alien Anthology” was broken so hard I had to fix several sections by hand in Sony Vegas.

For example, the scene where John Hurt’s character Kane descended into the cave filled with eggs was missing most of the eggs because they crushed the blacks. After a lot of work dicking around with gamma and levels I was able to get back most of the eggs that were present in the DVD, which I used for reference. Once done I was able to do a single pass with FFmpeg to unscrew the damage that was done. It now looks very similar to, but not exactly like, the DVD version.

Now how do I find a non-horrible setting for color grading? I take a screenshot of the movie where the color grading sucks, pop it into Adobe Photoshop, and start tweaking Selective Color until it sucks less. In some instances it can all come together in ten screenshots or less; in other instances I’ll have to go across an entire series to find the right global settings. To date, every movie series on which I have attempted to decrease the sucking wound that is teal/cyan and orange color grading has responded to the same settings across the whole series, with the sole exception of the Alien series.

Below are a few examples from my FFmpeg script that may provide insight into what this looks like. Note that the filter values contain spaces, so on a real command line they need to be quoted.

Alien 1979:
-vf "selectivecolor=reds=0 -0.20 -0.20 0:yellows=0 0 -0.20 0.10:cyans=-0.66 -0.50 0.20 0.75:blues=0 0 -0.50 0.15"

-vf "selectivecolor=cyans=-0.33 0.45 0.33 -0.15"

-vf "selectivecolor=reds=0 -0.15 -0.15 0:yellows=0 0 -0.20 0.10:cyans=-0.33 0.25 0.33 -0.15"

DC Extended Universe:
-vf "selectivecolor=reds=0 -0.15 -0.15 0:yellows=0 0 -0.2 0.1:cyans=-0.33 0.33 0.33 -0.20"

Harry Potter series:
-vf "selectivecolor=cyans=-0.33 0.33 0.66 -0.2:greens=0.15 0.15 -0.15 0"

Lord of the Rings and The Hobbit:
-vf "selectivecolor=reds=0 -0.15 -0.15 0.15:yellows=0 0 -0.2 0:greens=-0.25 0.25 0 -0.15:cyans=0 0.50 0.50 -0.33"

Marvel Cinematic Universe (MCU):
-vf "selectivecolor=reds=0 -0.2 -0.2 0.1:yellows=0 0 -0.2 0.05:cyans=-0.50 0.50 0.50 -0.30"

-vf "selectivecolor=reds=0 -0.1 -0.1 0.1:yellows=0 0 -0.1 0.05:cyans=0 0.1 0.1 -0.05"
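For reference, the selectivecolor filter decides which pixels each named range touches with simple component comparisons: per FFmpeg’s documentation, “reds” are pixels whose red component is the maximum, “cyans” are pixels whose red component is the minimum, and so on. The Python sketch below only classifies pixels that way; it is my own illustration and does not reproduce the filter’s actual CMYK adjustment math.

```python
# Classify a pixel into selectivecolor's hue ranges based on which RGB
# component is the maximum or minimum, as described in FFmpeg's docs.
def hue_ranges(r, g, b):
    hi, lo = max(r, g, b), min(r, g, b)
    ranges = []
    if r == hi: ranges.append("reds")
    if g == hi: ranges.append("greens")
    if b == hi: ranges.append("blues")
    if r == lo: ranges.append("cyans")
    if g == lo: ranges.append("magentas")
    if b == lo: ranges.append("yellows")
    return ranges

# An orange highlight falls in reds/yellows and a teal shadow in
# blues/cyans, which is why my settings keep hammering those ranges.
print(hue_ranges(230, 140, 40))   # ['reds', 'yellows']
print(hue_ranges(40, 160, 180))   # ['blues', 'cyans']
```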

Using selective color in FFmpeg causes rendering on my current PC to slow down for each color that is modified as reflected by my overall CPU usage. I currently believe that this is a memory bandwidth issue on my machine but will not know until I upgrade to something with a bit more power.

How to extract and convert closed caption files the hard way

I will be using the terms “closed captions” and “subtitles” interchangeably in this post because it isn’t always possible to know whether a binary image-based SUP file contains closed captions, which include both descriptive text and dialog, or subtitles, which contain only dialog.

I’ve been watching a few TV shows on Hulu and believe that at least Cloak & Dagger as well as Stitchers had their subtitles ripped from an m2ts file, likely from Blu-ray, using either HdBr Stream Extractor v9 or MeGUI, and converted using Subtitle Edit. Why do I believe that this is the case?

More often than not I will see an italicized sentence with two or more words touching each other near the middle of the sentence, because the distance in pixels between italic letters is much smaller than between normal letters. Why it doesn’t happen as often across the entire sentence is beyond me at this time; then again, I have a massive replace list. Ten and eleven pixels work well for most Blu-ray content. Subtitle Edit likes DVD subtitles to be around 6-8 pixels apart because the letters are lower resolution. Your mileage will vary.

You can adjust Subtitle Edit to look for letters closer together or further apart based on the number of pixels you tell it are in a space, but this is a global setting for each input file and cannot be adjusted specifically for italics because everything is in an image-based format, specifically a SUP file. For example, if you modify it to look for letters/blocks closer together, you will likely get a lot of individual characters instead of words. If you modify it to allow letters/blocks further apart, you will merge a lot of words together.

Thiscanbea badthing. I t c a n a l s o b e a b a d t h in g.
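The gap logic can be sketched in a few lines. The function below is a hypothetical stand-in for whatever Subtitle Edit actually does internally; it only illustrates why a single global pixel threshold cannot serve upright and italic glyphs at the same time.

```python
# Group glyph bounding boxes (x_start, x_end), left to right, into words:
# a horizontal gap of at least space_px pixels starts a new word.
def words(boxes, space_px):
    counts = [1]                      # glyphs in the current word
    for prev, cur in zip(boxes, boxes[1:]):
        if cur[0] - prev[1] >= space_px:
            counts.append(1)          # gap wide enough: start a new word
        else:
            counts[-1] += 1           # gap too narrow: same word
    return counts                     # glyph count per detected word

upright = [(0, 8), (15, 23), (30, 38), (50, 58), (65, 73)]  # gaps 7,7,12,7
italic  = [(0, 8), (9, 17), (18, 26), (34, 42), (43, 51)]   # gaps 1,1,8,1

print(words(upright, 10))  # [3, 2]  two words, split correctly
print(words(italic, 10))   # [5]     the words merge: "Thiscanbea"
print(words(upright, 5))   # [1, 1, 1, 1, 1]  every glyph its own word
```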

Subtitle Edit has two methods for OCR:
1) Tesseract. This does a decent job but I no longer use it as it has problems with some fonts and italics.

2) Binary Image Compare. Blu-ray MPEG-TS and DVD’s MPEG-PS containers use images for playback of video closed captioning. This is what I use and what I think that Hulu also uses. It is also the recommended option for Subtitle Edit.

Binary Image Compare has to be trained to look at the many different fonts that you can come across right down to the letter, number, punctuation, and symbol level. The process to teach it what each letter looks like, typically multiple times for the same exact letter early on, is onerous and will crush your soul. When it comes across a “block” of information it asks you what it is and if it is italic or not. You can expand the block to fit quotes and the like, but you cannot shrink it from what it originally detects. Sometimes it will detect “rt” as a single block so you have to add “rt” as a letter. This is most common with italics, but depending on the font it can also affect normal letters and numbers.

When Subtitle Edit comes across a word it doesn’t know it will ask you to do one of a few things.

A) Add to names/noise list (case sensitive)
This is for things like Hogwarts or WebRTC.

B) Add to user dictionary
This is for adding words that are case insensitive and are not in its default dictionary.

C) Add pair to OCR replace list
A lower case L looks the same as an upper case “I” in most sans-serif fonts. You will need to use this to fix things like:
Iower (with uppercase i's)
lower (with lower case L's)

This can also fix words that are too close together.
This can be a

D) Google it.

Best practice is to have Subtitle Edit just rack up the words it doesn’t know so you can bang the majority of the duplicates out after the first full run. Also set the Max. error% value to 1.0 percent for higher accuracy. Run again to catch some more and then run it until there is nothing left to fix. Don’t be surprised if a few more words pop up on the second or third pass.

I’ve added a lot of characters and words to the database from multiple TV series and movies, as some of them use unique fonts and have both unique words and names in them. I use the website frequently because what Subtitle Edit provides in its interface is very limited.

In some cases Subtitle Edit will fail to show a letter or will detect it incorrectly. If you see this then you can simply click on the line of text that has the problem in the main window, navigate to the specific character, and update it accordingly. I do not recommend updating an I to be an l or an l to be an I. If you do that you will be playing whack-a-mole forever. Set it, forget it, and add it to the “Add pair to OCR replace list” on the fly as it fails onward.

Do not be surprised if your subtitles are not properly aligned. I currently use either Easy Subtitles Synchronizer or Subtitle Edit to fix this problem depending upon my mood. In a few rare cases I had to tear down the text based subtitle that Subtitle Edit created, remove the portions that didn’t line up at all, and then add them back by hand. Always make a backup before editing. Your mileage may vary.

And last but not least, don’t forget punctuation, specifically when it is in an MPEG-PS VOB file from a DVD. A buddy of mine who gives lectures on advanced sed usage helped me with this almost indecipherable, at least to me, sed filter, because the regex support in Subtitle Edit is insufficient for my needs.

# a comma wedged between two letters is usually a misread apostrophe,
# so repair it before the general space-before-comma rule below
s/\([[:alpha:]]\) ,\([[:alpha:]]\)/\1'\2/g
# remove a stray space before a comma that follows a letter
s/\([[:alpha:]]\) ,/\1,/g
# close up the gap after an apostrophe inside a word
s/\([[:alpha:]]\)' \([[:alpha:]]\)/\1'\2/g
# remove a stray space before a period that follows a letter or number
s/\([[:alnum:]]\) \./\1./g
# collapse space-padded commas
s/ , /, /g
# collapse space-padded periods (the dot must be escaped)
s/ \. /. /g

Subtitle Edit, in my experience, is not suitable for automation and requires a lot of hands-on work to get things right. If Hulu is using Subtitle Edit then I feel that they either don’t know better, assume that the subtitles they receive are perfect, or don’t give a shit. I’m not sure which one is worse.

Please don’t take my word about subtitle automation being sub-optimal. Give the following a whirl and compare it against what you created via Subtitle Edit’s GUI.

"C:\Program Files\Subtitle Edit\SubtitleEdit" /convert "inputfile.sup" SubRip


FFmpeg and how to use it wrong


I’ve been in the streaming media industry since 2008 and have seen a lot of misinformation regarding both FFmpeg and libx264. In this post I hope to help shed some light on what does and does not work.

Streaming media, at its core, requires three basic things:
1) Constant frame rate.
2) An even keyframe distance which is also known as a Group of Pictures or GOP.
3) A bitrate based encode.

Things that are nice to have are:
4) Finding a better bitrate for your content.
5) Hitting your target bitrate.
6) Audio encoding without A/V drift.
7) Proper encoding for your target audiences.

I have a basic rule when encoding content and that is to never trust the input. Are you sure that the frame rate is constant? Were you told that the content is progressive and not interlaced? Were you given information about keyframe distance or what color space the video is in? Can you trust that any of that is accurate? I can’t and you shouldn’t.

Section one – Constant frame rate

Constant frame rate is important because players like to have the PTS/DTS timestamps they are decoding generated like clockwork. If they are not in the correct order you can have playback problems like content jumping forwards or backwards, or even failures in basic playback. To achieve proper playback with FFmpeg you need to use two options.

-r is used to specify the output frame rate[1]. This must be the same as the input frame rate to eliminate judder. It is used in conjunction with the -vsync parameter set to 1, which will retime the PTS/DTS timestamps accordingly[2]. Depending upon the content you get, do not be surprised if frames are duplicated and/or dropped during encoding. If that happens, contact the content creator if possible and ask them to fix their source content. It is not uncommon for FFmpeg to duplicate the first frame.

Section two – Keyframe distance

To ensure that your keyframe distance is always the same, use the -g parameter[3]. I go a bit beyond what is required for regular desktop playback and use the no-scenecut option in conjunction with the -g parameter. x264 will, by default, create a keyframe when it detects a scene change, with a default maximum GOP of 250 and a minimum GOP of 25. The no-scenecut option turns off scene detection for that codec. Setting --scenecut -1 is not a valid option; if it is, I have found it nowhere in either x264’s or FFmpeg’s documentation.

Note that NTSC content is stupid. 23.976fps is inaccurate and should always be written as 24000/1001. 29.970fps should always be written as 30000/1001. 59.94fps should always be written as 60000/1001. The examples below are inaccurate on purpose for ease of reading.
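Keeping the rates as exact rationals makes the sloppiness visible, and also shows where a two-second GOP such as -g 48 for film-rate content or -g 60 for 30000/1001 content comes from. A quick Python check (the two-second GOP is my own example):

```python
from fractions import Fraction

# NTSC rates are rationals; the decimal shorthands only approximate them.
film  = Fraction(24000, 1001)   # "23.976" fps
video = Fraction(30000, 1001)   # "29.97" fps

# 23.976 exactly is NOT the film rate:
assert film != Fraction(23976, 1000)

# A two-second GOP, rounded to whole frames:
print(round(film * 2))    # 48
print(round(video * 2))   # 60
```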

If you inspect an output file with MediaInfo and did not use the no-scenecut option you will see scenecut=40. When done properly it will read scenecut=0. If this option is not used then keyframes will be misaligned for ABR content and segment sizes will be unpredictable.

You can also use the FFmpeg -sc_threshold 0 parameter to disable scene detection; it is video codec neutral and is equivalent to the no-scenecut option provided by libx264.

Section three – Bitrate

I have seen people attempt to create VOD content and perform live streaming using Constant Rate Factor which is also known as CRF. If you do not specify a bitrate for x264 then it will default to CRF 23. If you do not specify a preset it will default to medium. If you do not specify a profile it will default to high.

I like to make sure that my content uses all of the bells and whistles for delivering bitrate based content including -bufsize, -maxrate, and even -minrate. Note that -minrate has no effect with x264. It is in my script in case I decide to use a different codec that supports a minimum bitrate. [Note to self: update script to use -sc_threshold 0 ]

Section four – Finding a better bitrate

If you want to take the guesswork out of finding a better bitrate[4] then it is best to analyze the file to find one[5]. I now use CRF 23 to find a better global bitrate for whatever I am encoding and make sure to use the same encoding settings as my output file with the exception that I use the veryfast preset and the baseline profile. This is vital to finding a better bitrate.
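Once the CRF 23 analysis encode finishes, the bitrate it settled on falls out of the output file size and duration. The helper and the numbers below are my own illustration, not part of FFmpeg:

```python
# Average bitrate, in kilobits per second, implied by an analysis encode.
def discovered_bitrate_kbps(file_size_bytes, duration_seconds):
    return file_size_bytes * 8 / duration_seconds / 1000

# Hypothetical 90-minute CRF 23 analysis encode that came out at 1.1 GB:
print(round(discovered_bitrate_kbps(1_100_000_000, 90 * 60)))   # 1630
```

That average then becomes the -b:v target for the real bitrate based encode.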

Section five – Hitting your target bitrate

Below is a sample of my script that I use for two pass encoding[6]. Note that almost everything in the script is a variable. Those values are inserted after the media is analyzed and a better bitrate is detected as described in the bitrate detection section above. I also perform audio conversion separate from video conversion as encoding audio at the same time can slow down this process.
ffmpeg -i $inputfile $scan -pix_fmt $colorspace -vf "crop=$w1:$h1:$x1:$y1,scale=$fixedwidth:$fixedheight" -vsync 1 -sn -map $vtrack -r $fps -threads 0 -vcodec libx264 -b:v:$vtrack $averagevideobitrate -bufsize $buffer -maxrate $maximumvideobitrate -minrate $minimumvideobitrate -an -pass 1 -preset $newpreset -profile:v $defaultprofile -g $gop $tune -x264opts no-scenecut -map_metadata -1 -f mp4 -y $outputfile-video.mp4

ffmpeg -i $inputfile $scan -pix_fmt $colorspace -vf "crop=$w1:$h1:$x1:$y1,scale=$fixedwidth:$fixedheight" -vsync 1 -sn -map $vtrack -r $fps -threads 0 -vcodec libx264 -b:v:$vtrack $averagevideobitrate -bufsize $buffer -maxrate $maximumvideobitrate -minrate $minimumvideobitrate -an -pass 2 -preset $newpreset -profile:v $defaultprofile -g $gop $tune -x264opts no-scenecut -map_metadata -1 -f mp4 -y $outputfile-video.mp4

The same values are used for the second pass to ensure that target bitrate is hit. If you do not use the same parameters in both passes then you will always miss your target bitrate.

This is an example of bad two pass encoding: different values are used between the two passes, neither frame rate nor GOP is defined, and your PTS/DTS timestamps will be the same as the input. You will never hit your target bitrate using this method.
ffmpeg -y -i 1080p-input.mp4 -c:v libx264 -b:v 5000k -pass 1 -f mp4 NUL && \
ffmpeg -i 1080p-input.mp4 -c:v libx264 -b:v 5000k -maxrate 5000k -bufsize 5000k -pass 2 1080p-output.mp4


Never reuse your first pass analysis when creating Adaptive Bitrate (ABR) content. Ever.
ffmpeg -y -i 1080p-input.mp4 -c:v libx264 -preset medium -g 60 -keyint_min 60 -sc_threshold 0 -bf 3 -b_strategy 2 -b:v 3000k -c:a aac -b:a 64k -ac 1 -ar 44100 -pass 1 -f mp4 NUL && \

ffmpeg -i 1080p-input.mp4 -c:v libx264 -preset medium -g 60 -keyint_min 60 -sc_threshold 0 -bf 3 -b_strategy 2 -b:v 3000k -maxrate 3300k -bufsize 3000k -c:a aac -b:a 64k -ac 1 -ar 44100 -pass 2 1080p-output.mp4

ffmpeg -i 1080p-input.mp4 -c:v libx264 -s 1280x720 -preset medium -g 60 -keyint_min 60 -sc_threshold 0 -bf 3 -b_strategy 2 -b:v 1500k -maxrate 1650k -bufsize 1500k -c:a aac -b:a 64k -ac 1 -ar 44100 -pass 2 720p_output.mp4

ffmpeg -i 1080p-input.mp4 -c:v libx264 -s 640x360 -preset medium -g 60 -keyint_min 60 -sc_threshold 0 -bf 3 -b_strategy 2 -b:v 1000k -maxrate 1100k -bufsize 1000k -c:a aac -b:a 64k -ac 1 -ar 44100 -pass 2 360p-output.mp4

In my personal experience -b_strategy 2 did not work out so well and actually lowered the quality of my content. Your mileage may vary. Using -bf 3 allows up to three consecutive B-frames, which is the default in the medium preset. In addition, the medium preset uses three reference frames. This is easy for today’s players to decode.

This is two pass encoding done right while also converting audio to stereo AAC. I include the -pix_fmt yuv420p option because if you convert a piece of content that has an incompatible color space (see also desktop Windows Media content) or that uses the computer RGB range (0-255) rather than the broadcast range (16-235), your H.264 video may not play back as expected.
ffmpeg -i inputfile.mp4 -pix_fmt yuv420p -vsync 1 -vcodec libx264 -r 23.976 -threads 0 -b:v 1024k -bufsize 1216k -maxrate 1280k -preset medium -profile:v high -tune film -g 48 -x264opts no-scenecut -pass 1 -acodec aac -b:a 192k -ac 2 -ar 48000 -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -f mp4 -y outputfile.mp4

ffmpeg -i inputfile.mp4 -pix_fmt yuv420p -vsync 1 -vcodec libx264 -r 23.976 -threads 0 -b:v 1024k -bufsize 1216k -maxrate 1280k -preset medium -profile:v high -tune film -g 48 -x264opts no-scenecut -pass 2 -acodec aac -b:a 192k -ac 2 -ar 48000 -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -f mp4 -y outputfile.mp4

Note that I add the audio bitrate to the video bitrate to calculate the bufsize value. I also multiply the target bitrate by 1.25 for the maxrate value. Why? This provides the encoder the liberty to allocate less data to low motion scenes and more data to higher action scenes. If you were to use a 10x value for your maxrate value the network signature would look a lot like CRF but boy will your content look great. I do not recommend this.
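The bufsize and maxrate arithmetic is easy to sanity check against the 1024k video / 192k audio example above:

```python
video_kbps = 1024
audio_kbps = 192

bufsize = video_kbps + audio_kbps   # audio bitrate added to video bitrate
maxrate = int(video_kbps * 1.25)    # 25% headroom for high action scenes

print(bufsize)   # 1216 -> -bufsize 1216k
print(maxrate)   # 1280 -> -maxrate 1280k
```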

This brings up a few questions regarding quality, compression, and two pass encoding.

1) Is there a visible difference in output between CRF 23 when using the veryfast preset and the baseline profile versus the medium preset and the high profile?

You would think that since CRF 23 is being used, both output videos would be the same quality. Unfortunately this does not appear to be the case, and Moscow State University’s Video Quality Measurement Tool confirms it when analyzing the two output files via SSIM. I would use the Netflix VMAF tool, but as of this writing it segfaults when it is built into FFmpeg and analyzing content via SSIM.

On a side note, B-frames are killing me with regards to quality. While the output size of the high profile is smaller, so too is its bit per pixel density. I interpret this as both good and bad: compression, at least in this case, costs quality, but it does make the file smaller.

Veryfast preset with the baseline profile using CRF 23:

Size      == 888 MiB
"Bitrate" == 1023 kb/s
BPP       == 0.104

Medium preset with the high profile using CRF 23:

Size      == 833 MiB
"Bitrate" == 958 kb/s
BPP       == 0.098

This lowered SSIM quality to an average of 0.97495 between the two files.
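Bit per pixel density here is just average bitrate divided by pixel throughput (width × height × frame rate). The resolution of the test file is not stated above, so the 854x480 below is my assumption; it happens to reproduce the 0.104 figure:

```python
# Average bits spent per displayed pixel.
def bpp(bitrate_bps, width, height, fps):
    return bitrate_bps / (width * height * fps)

# Assumed geometry: 854x480 at 24000/1001 fps (not stated in the text).
print(round(bpp(1_023_000, 854, 480, 24000 / 1001), 3))   # 0.104
```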

2) Does using the 1080p two pass mbtree[7] file and its two pass log file for encoding other pieces of content degrade quality?

Yes. Now why did I run that test? Because a lot of people reuse the first pass files for their other outputs in their ABR stack as shown earlier in this article. I have never agreed with that so I did a direct compare with a properly encoded two pass file and then used the 1080p mbtree and log file to output a 480p file.

To bring this home let us take a look at what happens when you use the proper two pass log files versus what happens when you use the wrong ones. In this instance the source content was 1080p and was used to generate the ffmpeg2pass-0.log.mbtree file and the ffmpeg2pass-0.log file. A second two pass encode was used to create a 480p output from the same 1080p source just like you would do when creating ABR content.

The two pass log file size for the 1080p mbtree file weighed in at 1.40GB while the 480p mbtree file weighed in at 287MB.

This is the second pass of the 480p output using the proper mbtree and log files.

frame=174434 fps=104 q=-1.0 Lsize= 907956kB time=02:01:15.23 bitrate=1022.4kbits/s dup=0 drop=1 speed=4.32x
video:905937kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.222847%
[libx264 @ 000000000305b2e0] frame I:3635 Avg QP:18.51 size: 31828
[libx264 @ 000000000305b2e0] frame P:47427 Avg QP:22.94 size: 9521
[libx264 @ 000000000305b2e0] frame B:123372 Avg QP:24.57 size: 2922
[libx264 @ 000000000305b2e0] consecutive B-frames: 4.3% 1.9% 7.1% 86.7%
[libx264 @ 000000000305b2e0] mb I I16..4: 23.8% 52.0% 24.2%
[libx264 @ 000000000305b2e0] mb P I16..4: 2.1% 8.6% 2.7% P16..4: 32.3% 12.3% 8.0% 0.0% 0.0% skip:34.1%
[libx264 @ 000000000305b2e0] mb B I16..4: 0.1% 0.7% 0.2% B16..8: 35.7% 4.2% 0.9% direct: 2.0% skip:56.2% L0:44.0% L1:49.2% BI: 6.9%
[libx264 @ 000000000305b2e0] 8x8 transform intra:60.9% inter:72.3%
[libx264 @ 000000000305b2e0] coded y,uvDC,uvAC intra: 65.0% 60.8% 30.4% inter: 16.0% 14.5% 0.5%
[libx264 @ 000000000305b2e0] i16 v,h,dc,p: 44% 30% 10% 16%
[libx264 @ 000000000305b2e0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 17% 20% 18% 6% 7% 8% 8% 7% 8%
[libx264 @ 000000000305b2e0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 21% 25% 15% 5% 7% 7% 7% 5% 6%
[libx264 @ 000000000305b2e0] i8c dc,h,v,p: 58% 21% 16% 5%
[libx264 @ 000000000305b2e0] Weighted P-Frames: Y:13.2% UV:3.7%
[libx264 @ 000000000305b2e0] ref P L0: 52.5% 16.0% 20.8% 9.4% 1.4%
[libx264 @ 000000000305b2e0] ref B L0: 84.5% 12.2% 3.3%
[libx264 @ 000000000305b2e0] ref B L1: 94.6% 5.4%
[libx264 @ 000000000305b2e0] kb/s:1020.08

This is the second pass of the 480p output using incorrect 1080p mbtree and log files.

frame=174434 fps=106 q=-1.0 Lsize= 907905kB time=02:01:15.23 bitrate=1022.3kbits/s dup=0 drop=1 speed=4.41x
video:905883kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.223274%
[libx264 @ 00000000026b00a0] frame I:3635 Avg QP:20.20 size: 26166
[libx264 @ 00000000026b00a0] frame P:46748 Avg QP:22.62 size: 9715
[libx264 @ 00000000026b00a0] frame B:124051 Avg QP:24.55 size: 3050
[libx264 @ 00000000026b00a0] consecutive B-frames: 4.0% 1.4% 6.8% 87.8%
[libx264 @ 00000000026b00a0] mb I I16..4: 22.9% 52.5% 24.5%
[libx264 @ 00000000026b00a0] mb P I16..4: 2.3% 9.0% 3.2% P16..4: 31.0% 11.7% 7.4% 0.0% 0.0% skip:35.5%
[libx264 @ 00000000026b00a0] mb B I16..4: 0.1% 0.7% 0.2% B16..8: 33.7% 4.3% 0.9% direct: 2.4% skip:57.7% L0:41.7% L1:50.1% BI: 8.2%
[libx264 @ 00000000026b00a0] 8x8 transform intra:60.0% inter:69.7%
[libx264 @ 00000000026b00a0] coded y,uvDC,uvAC intra: 62.2% 59.8% 30.0% inter: 15.6% 14.1% 0.8%
[libx264 @ 00000000026b00a0] i16 v,h,dc,p: 41% 28% 11% 20%
[libx264 @ 00000000026b00a0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 18% 21% 17% 6% 7% 8% 8% 7% 8%
[libx264 @ 00000000026b00a0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 20% 28% 16% 5% 7% 7% 7% 5% 5%
[libx264 @ 00000000026b00a0] i8c dc,h,v,p: 59% 21% 16% 5%
[libx264 @ 00000000026b00a0] Weighted P-Frames: Y:8.8% UV:2.1%
[libx264 @ 00000000026b00a0] ref P L0: 52.2% 16.4% 21.2% 9.4% 0.9%
[libx264 @ 00000000026b00a0] ref B L0: 84.3% 12.3% 3.3%
[libx264 @ 00000000026b00a0] ref B L1: 94.6% 5.4%
[libx264 @ 00000000026b00a0] kb/s:1020.02

Take note that a lot of data was pulled out of the I frames and allocated to P and B frames when the 1080p ffmpeg2pass-0.log.mbtree file and the ffmpeg2pass-0.log file were used instead of the ones that were generated for the output 480p content. This lowered SSIM quality to an average of 0.97553 when using the wrong two pass files. Viewing some of the frame differences in MSU VQMT made my eyes hurt.

Did you notice that the bit rate of the 480p file that was encoded using the veryfast preset, the baseline profile, and CRF 23 is very close (within the bounds of CAVLC and CABAC entropy encoding) to the bitrate of the two pass encode that used the medium preset and the high profile? Two pass encoding puts the bits back.

Section six – Audio

In the audio portion of the two lines above you will see a few filters[8].

-af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" helps to keep your audio lined up with the beginning of your video. It is common for a container to have the beginning of the video and the beginning of the audio start at different points. By using this your container should have little to no audio drift or offset as it will pad the audio with silence or trim audio with negative PTS timestamps if the audio does not actually start at the beginning of the video.

Section seven – Proper encoding for your target audiences

During my nine years in the streaming media industry I have seen companies like Research In Motion state emphatically that their BlackBerry phones do not support the high profile. Just because a manufacturer states that specific profiles are not supported does not mean that they won’t work. RealNetworks’ Helix Producer would create, by default, content using the high profile, and I never had a problem delivering high profile content to those phones via RTSP.

Generally speaking I limit reference frames to three for compatibility purposes. If you decide to go above that make sure that you research which device or devices support larger reference frame distances. Note that using the animation tuning option for x264 will double your reference frames unless it is set to one[9].

Let’s finish off with one final example using single pass encoding[10].

ffmpeg -i inputfile.mp4 -pix_fmt yuv420p -deinterlace -vf "scale=640:360" -vsync 1 -vcodec libx264 -r 29.970 -threads 0 -b:v 1024k -bufsize 1216k -maxrate 1280k -preset medium -profile:v main -tune film -g 60 -x264opts no-scenecut -acodec aac -b:a 192k -ac 2 -ar 44100 -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -f mp4 -y outputfile.mp4

I hope that this article helps to debunk misinformation that is rampant on the Internet regarding the usage of FFmpeg and what x264 options are valid to create compliant streaming media VOD content.


[1] Force the output frame rate using the -r parameter.

[2] Vsync parameter for FFmpeg.

[3] The -g option is described below.

[4] The articles below have several recommendations on bitrate.

[5] I have an article on how to calculate a better bitrate.

The article above is featured in an article by Jan Ozer. It is also referenced in his book Video Encoding by the Numbers in Chapter 7: Choosing Data Rate.

[6] Please note that both the first pass and the second pass are identical sans the portion that identifies which pass is being used.

[7] Detailed information about the macroblock tree file can be found in the following forum by the person who designed it.

[8] Audio references can be found below.


async, min_hard_comp, and first_pts.

[9] x264 tuning values.

[10] Encoding options not included in the article above or insufficiently detailed.

Video options:

-i is for designating the input.

-deinterlace should only be used if your content is interlaced and is announced as either interlaced or MBAFF. It is recommended to deliver only progressive content to web based players.

-vf "scale=640:360" is a video filter that will scale the output video to a different resolution.

-vcodec libx264 specifies the x264 video codec. You can substitute -c:v for -vcodec if you wish.

-b:v 1024k specifies a video bitrate of 1024kbps.

-bufsize 1216k specifies the buffer. This is a best practice for RTSP delivery and streaming media in general.

-maxrate 1280k specifies the maximum bitrate allowed.

-preset veryfast is one of several presets available for H.264 video. Those include ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow, and placebo. It is not recommended to use anything slower than the medium preset for streaming media.

-profile:v baseline is one of several profiles available for H.264 video. Those include baseline, main, high, high10, high422, and high444. Hardware devices, specifically older mobile phones, rarely state support for any of the high profile options even though they may work. You should include the :v portion at the end of the profile to specify that the profile is for video as some audio codecs also have audio profiles.

Note that x264 has eradicated the extended profile.

For additional detail on the inner workings of presets, please reference the following page.

-tune film is one of several tuning options available for H.264 video. Those include animation, grain, stillimage, psnr, ssim, fastdecode, and zerolatency. Animation should not be used with streaming media as it will double the number of reference frames defined in the preset option.

More on preset, profile and tuning can be found here.

and here.

The libx264 option ratetol=0.01 will force a very strict constant bitrate, so much so that libx264 will complain and adjust accordingly. This is optional and not shown above as constant bitrate content is dead to me.

-f mp4 defines that the format will be an MP4 container.

-y outputfile.mp4 will state that if outputfile.mp4 exists that it will be overwritten. This is required if you perform two pass encoding and do not redirect the first pass output to a null device.

Audio options

-acodec aac invokes the use of the internal AAC codec. You can substitute -c:a for -acodec if you wish. You no longer need to use the -strict experimental option with this codec.

-b:a 192k states that the total bitrate of the audio should be 192Kbps. Apple recommends a minimum bitrate of 64Kbps per channel.

-ac 2 forces the audio to be stereo. This is a best practice for streaming media so that you can reach the most players, however you can use additional channels if one or more of your target devices support it.

-ar 44100 forces the frequency to be 44.1k which is compatible with Flash players. The player may downsample audio to 44.1k, 22.05k, or 11.025k. Do not use different audio frequencies with ABR content.

To help translate options between FFmpeg and libx264 please reference the following site.

How to create ABR content with FFmpeg in one pass

I was once informed that it would be nigh impossible to create ABR content in one pass using FFmpeg.  Challenge accepted!

I remembered that statement when I wanted to calculate a bit per pixel density encoding matrix for different video resolutions.  The problem with doing that is that every source video is different, even throughout the entire video, and I did not want to encode multiple different video files multiple times to generate this matrix.  Desperation, being the mother of invention, decided to intervene on my behalf.  Once I figured out how to perform single pass ABR encoding I decided to perform some ABR encoding using CRF 21 to find an approximate bit per pixel density.  As expected this yielded a fair amount of data as each one minute section of the source video I encoded was different.  From that I validated a trend that I had been seeing for a while.

To retain a relatively sane visual quality, the lower the resolution of the video the higher the bit per pixel density should be.  I did some work and came up with some data.  If you run a one minute CRF encode against each minute in your source 1080p video you will be provided with a bit per pixel density matrix.  Using the matrix below, assuming you do not want to make one of your own, you should be able to find an approximate bitrate to use for your other resolutions. Please note that the 1080p bit per pixel density shown below is post encode and is not the source file. Best practice for a full encode is to perform a single pass encode against your 1080p source to generate a bit per pixel density that you can then use to encode your ABR content.

RES     BPP      RES     BPP      RES     BPP      RES      BPP
360p    0.277    480p    0.226    720p    0.185    1080p    0.161
360p    0.244    480p    0.207    720p    0.176    1080p    0.155
360p    0.208    480p    0.169    720p    0.139    1080p    0.120
360p    0.215    480p    0.164    720p    0.128    1080p    0.111
360p    0.194    480p    0.158    720p    0.131    1080p    0.117
360p    0.164    480p    0.136    720p    0.115    1080p    0.106
360p    0.136    480p    0.110    720p    0.091    1080p    0.082
360p    0.152    480p    0.117    720p    0.092    1080p    0.080
360p    0.160    480p    0.120    720p    0.093    1080p    0.079
360p    0.134    480p    0.108    720p    0.089    1080p    0.079
360p    0.126    480p    0.100    720p    0.081    1080p    0.071
360p    0.125    480p    0.097    720p    0.078    1080p    0.069
360p    0.118    480p    0.091    720p    0.074    1080p    0.065
360p    0.103    480p    0.084    720p    0.070    1080p    0.065
360p    0.103    480p    0.083    720p    0.068    1080p    0.063
360p    0.110    480p    0.085    720p    0.068    1080p    0.062
360p    0.105    480p    0.082    720p    0.066    1080p    0.061
360p    0.094    480p    0.074    720p    0.061    1080p    0.057
360p    0.100    480p    0.075    720p    0.059    1080p    0.054
360p    0.077    480p    0.062    720p    0.051    1080p    0.049
360p    0.078    480p    0.060    720p    0.050    1080p    0.048
360p    0.077    480p    0.061    720p    0.049    1080p    0.044
360p    0.072    480p    0.055    720p    0.044    1080p    0.041
360p    0.056    480p    0.045    720p    0.038    1080p    0.041
360p    0.063    480p    0.052    720p    0.043    1080p    0.040
360p    0.051    480p    0.042    720p    0.035    1080p    0.038
360p    0.046    480p    0.038    720p    0.033    1080p    0.037

Yes, entropy encoding is interesting.  You probably recognized in the matrix above that the numbers are not as uniform across the resolutions as we would like.  Using that matrix as a guideline I came up with some calculations that I have provided below and did some rounding of the bit per pixel density and then rounded the bitrate.

1920*1080*23.976/1024*0.070  ==  3398.598kbps
1280*720*23.976/1024*0.080   ==  1726.272kbps
854*480*23.976/1024*0.100    ==  959.78925kbps
480*360*23.976/1024*0.125    ==  505.74375kbps
426*240*23.976/1024*0.133    ==  318.38254875kbps
284*160*23.976/1024*0.150    ==  159.59025kbps

With all of that said, 1080p comes out to about 3400kbps, 720p comes out to about 1725kbps, 480p comes out to about 960kbps, 360p comes out to about 510kbps, 240p comes out to about 320kbps, and 160p comes out to about 160kbps.
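The arithmetic above is easy to wrap in a small helper if you want to plug in your own bit per pixel density. This is just a sketch of the same width * height * fps / 1024 * BPP formula; bpp_bitrate is a made-up name, not part of FFmpeg or MediaInfo.

```shell
# Sketch of the bitrate calculation above: width * height * fps / 1024 * BPP.
# bpp_bitrate is a hypothetical helper, not part of any tool.
bpp_bitrate() {
  awk -v w="$1" -v h="$2" -v fps="$3" -v bpp="$4" \
    'BEGIN { printf "%.3f", w * h * fps / 1024 * bpp }'
}

bpp_bitrate 1920 1080 23.976 0.070   # 3398.598 (kbps)
bpp_bitrate 1280 720 23.976 0.080    # 1726.272 (kbps)
```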

Now, how did I do it?  With likely the longest FFmpeg command line I have ever assembled.  I have broken out the different outputs to their own separate lines for readability.

ffmpeg.exe -i sourcefile.mp4

-pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=1920:1080" -b:v 3400k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 1080p.mp4

-pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=1280:720" -b:v 1725k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 720p.mp4

-pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=854:480" -b:v 960k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 480p.mp4

-pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=480:360" -b:v 510k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 360p.mp4

-pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=426:240" -b:v 320k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 240p.mp4

-pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=284:160" -b:v 160k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 160p.mp4

Or the long version if you prefer:

ffmpeg.exe -i sourcefile.mp4 -pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=1920:1080" -b:v 3400k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 1080p.mp4 -pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=1280:720" -b:v 1725k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 720p.mp4 -pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=854:480" -b:v 960k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 480p.mp4 -pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=480:360" -b:v 510k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 360p.mp4 -pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=426:240" -b:v 320k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 240p.mp4 -pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=284:160" -b:v 160k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 160p.mp4 -strict experimental -acodec aac -vn -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -f mp4 AudioOnly.mp4

Now go forth and streamline your production environment. If you plan on performing two pass encoding you are on your own, unless you build your own bit per pixel density encoding matrix from your unique and varied content.

How to get a live u-Law WAV stream to Cisco VOIP servers (Updated 2017-07-23)

I’m probably going to get some of the specifics on the Cisco VOIP server a bit wrong, but the following is what I remember from deconstructing descriptions given over the years by multiple customers who did not yet know how to set up their Cisco VOIP server with on hold audio. As scary as it may seem, I think that I was better at setting up on hold music using Helix Server than anyone working at Cisco or any of their customers, even though Cisco, for many years, recommended Helix Server until it was discontinued. Then again it was my job to know these sorts of things.

In a typical scenario a Cisco engineer and one of their customers would get on a phone call with me on how to configure Helix Server for streaming their on hold audio. For everything else Cisco I am currently a knuckle dragging troglodyte.

When I was working at RealNetworks supporting Helix Server we had a high volume of customers using Cisco VOIP phone systems and they all needed two things:

1) Looped on demand u-Law WAV files. Helix Server supported this using its Simulated Live Transfer Agent (SLTA).

The Cisco VOIP server lets you upload audio files, which then get converted to u-Law WAV files, for when a customer is on hold in a cost center (customer service, billing, legal, etc…) so that they can have customized music or advertisements for a product the customer might be interested in. I was never fully sure why they used SLTA for this unless they had a super cheap Cisco server that didn’t have that function, had more cost centers than the Cisco server supported, or, even worse, didn’t know that they had that functionality on their Cisco server.

Cisco did not seem to have any really good documentation on how to make the u-Law WAV files you needed for SLTA, but this forum post works. Sadly FFmpeg, my encoding tool of choice, supports looping images only. There is currently no audio equivalent that I know of.

“Loop over the input stream. Currently it works only for image streams. This option is used for automatic FFserver testing. This option is deprecated, use -loop 1.”

I have a few theoretical hacks to get looped content working, but they are both difficult to set up and are also very unstable. In other words they are not ready for deployment in an enterprise environment that demands high uptime. If I find a free solution that is stable I will post it here.

2) Live u-Law WAV stream. Helix Server did not support this. I performed extensive and exhaustive testing and it was unable to properly repacketize an incoming live u-Law stream to either unicast RTSP or multicast SDP no matter the input method. I was hoping that this would get fixed, however our group was laid off before that could happen.

Cisco used to have an audio capture card in their hardware VOIP servers that customers would use to pipe their satellite Muzak feed into (stereo or mono din input if I remember correctly), but that was discontinued because now they apparently only provide an image that goes into a VM so there is no capture card. Their customers had to settle for SLTA.

With that said I can provide option number two for companies that have a Cisco VOIP server that they use. For a proper live u-Law WAV delivery configuration you need to know a few things:

A) A little bit about FFmpeg or at least a willingness to learn. You can download already compiled versions for Windows over at Zeranoe’s website. If you are on Linux you can either compile FFmpeg yourself or head to the FFmpeg website itself for some static Linux builds.

B) How to create a u-Law file for testing a pseudo live feed.

C) How to create a multicast SDP file using FFmpeg using the u-Law file created above.

D) How to modify the multicast SDP file to work with VLC as a player. This is the first test to see if you have things right for live streaming.

E) How to connect via DirectShow on Windows or ALSA on Linux to an audio source. This is the final step in testing that your device works. If you start here then you may never really know if your device is working, if the SDP file is working, or if you even created the output correctly.

You will learn all of the above in this article.

I just finished converting an MP3 file to u-Law using the following command line:

ffmpeg -i in.mp3 -acodec pcm_mulaw -b:a 64k -ac 1 -ar 8000 -f wav -y out.wav
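A quick sanity check on that bitrate: u-Law stores 8 bits per companded sample, so an 8kHz mono stream always works out to 64Kbps regardless of what you pass to the audio bitrate option.

```shell
# u-Law bitrate = bits per sample * sample rate * channels.
bits_per_sample=8
sample_rate=8000
channels=1
kbps=$((bits_per_sample * sample_rate * channels / 1000))
echo "$kbps"   # 64
```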

You can deliver a pseudo live non looping feed of that file:

ffmpeg -re -i out.wav -f rtp rtp://

You can also use the source file if you want:

ffmpeg -re -i in.mp3 -acodec pcm_mulaw -b:a 64k -ac 1 -ar 8000 -f rtp rtp://

FFmpeg is nice in that it dumps the SDP information for the RTP stream to the command prompt even though no SDP file is created:

o=- 0 0 IN IP4
s=Your File Metadata
c=IN IP4
t=0 0
a=tool:libavformat 57.23.100
m=audio 9008 RTP/AVP 0

Sadly connecting to that SDP output occasionally stutters then cuts out when listening to the stream with either VLC, QuickTime for the PC, or RealPlayer. If you read through all of the RFCs you might get an idea of the complexity of the RTP/SDP specifications.

Or not. I don’t know about you but reading those RFCs puts me right to sleep. A slightly easier to digest article on SDP structure can be found here. The article addresses why your technically 100 percent compliant SDP file doesn’t work with Helix Server. To use your SDP file with that server you are required to add optional flags.

Sadly the mostly working SDP file that FFmpeg creates is missing one important item:

a=rtpmap:0 PCMU/8000/1

The "rtpmap" attribute is used to connect or map the audio that is defined in the "m" or "media" section to the network RTP output as well as define the codec (payload type), the clock rate, and the number of audio channels in use if it is an audio stream. This is sort of important for devices, players, or receivers to know what to listen for and how to decode it, especially when there may be two or more streams described in the SDP file. Note that the clock rate for PCMU is always 8000, regardless of the port number in the "m" line.

Playing that modified SDP file fixes everything, at least for VLC and QuickTime:

o=- 0 0 IN IP4
s=Your File Metadata
c=IN IP4
t=0 0
a=tool:libavformat 55.0.100
m=audio 21414 RTP/AVP 0
a=rtpmap:0 PCMU/8000/1

Please note that if you have multiple live streams running, you need to have each SDP file and each encoder configured to use a different port number for the audio. I make sure to increase the port number by two in each SDP. For example 21414, 21416, 21418, etc…
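If you have many hold queues, that bookkeeping is worth scripting. The sketch below writes one SDP file per stream and bumps the port by two each time; the multicast address 239.0.0.1 and the hold* stream names are placeholder assumptions, not anything Cisco or FFmpeg requires.

```shell
# Hypothetical sketch: one SDP file per stream, audio ports two apart.
# 239.0.0.1 and the hold* names are placeholder values.
port=21414
for name in hold1 hold2 hold3 ; do
  cat > "$name.sdp" <<EOF
v=0
o=- 0 0 IN IP4 239.0.0.1
s=$name
c=IN IP4 239.0.0.1
t=0 0
m=audio $port RTP/AVP 0
a=rtpmap:0 PCMU/8000/1
EOF
  port=$((port + 2))
done
```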

Now that you have something that works with a file, let us try with a live source. On Windows you will need to have FFmpeg connect via DirectShow. To find the list of DirectShow devices on your computer, the command line shown below will help:

ffmpeg -list_devices true -f dshow -i dummy

Now feel free to try it with your audio device.

ffmpeg -f dshow -i audio="Microphone (HD Pro Webcam C920)" -acodec pcm_mulaw -b:a 64k -ac 1 -ar 8000 -f rtp rtp://

The line above works well for me, especially as FFmpeg now supports crossbar devices.

If you are on Linux you may want to use ALSA to connect to your live feed, but again you need to find the device you want to use first. This will show you the ALSA devices your system has:

$ arecord -L

$ ffmpeg -f alsa -i default:CARD=U0x46d0x809 -acodec pcm_mulaw -b:a 64k -ac 1 -ar 8000 -f rtp rtp://

On a side note, you will probably want to host your SDP file or files on a robust web server, or perhaps even behind a load balancer. From the logs that I have parsed over the years, the Cisco VOIP server retrieves the SDP file every time a person is put back into the queue. The highest number of connections I recall seeing was around 3,000 per second, so people who have to support a high volume call center or a large corporation should prepare for this behavior by putting up a web server dedicated to delivering their SDP files, or several web servers behind a load balancer.

The only way this DDoS effect could be either mitigated or resolved is if the Cisco VOIP server was modified to grab the multicast information in the SDP file, retain it for use among the clients, and then check the multicast SDP file every minute or so in case the structure of the audio feed changed or was updated along with the associated SDP file. Frankly I just don’t see that happening.

And for those few who are interested in what a Scalable Multicast u-Law SDP file that is generated from Helix Server looks like, or for some reason the SDP file format I describe above doesn’t work for you, then look no further than the output below:

o=- 275648743 275648743 IN IP4
s=War Pigs+Luke’s Wall
i=Black Sabbath
c=IN IP4
t=0 0
a=ASMRuleBook:string;"#($Bandwidth >= 0),Stream0Bandwidth = 64000;"
m=audio 21414 RTP/AVP 0
c=IN IP4
a=rtpmap:0 PCMU/8000/1
a=ASMRuleBook:string;"marker=0, AverageBandwidth=64000, Priority=9, timestampdelivery=true;"

2017-07-23 Update:
If you are wanting to deliver your live stream through a streaming server and have your Cisco server pick up an RTSP feed from that streaming server instead then please take a look at the following command. This method is both easier and more reliable than the direct SDP method shown above.

$ ffmpeg -f dshow -i audio="Microphone (HD Pro Webcam C920)" -acodec pcm_mulaw -b:a 64k -ac 1 -ar 8000 -f rtsp rtsp://username:password@[server_address]:[port]/live/audiostream

Dear Netflix

As long as you are busy re-encoding your content, can you please fix Star Trek: Voyager? It makes my eyes bleed.

The method that I use when converting content is to never trust what you have been told by the content provider, but to instead analyze every piece of content that is to be converted even if it is in the same series from the same publisher using the same media type.

I use the command line version of MediaInfo and some output from FFmpeg to get things done. I prefer Bash shell scripting as it is what I am most familiar with.

Get your information from MediaInfo:
mediainfo $inputfile > info.tmp

Capture the frames per second from the video:
fps=$(cat info.tmp | grep Frame | grep [Rr]ate | grep -v [Mm]ode | cut -d ":" -f2 | tr -d " fps" | head -1)

If the FPS reports as either empty or Variable then force a framerate that works. If I know that the content came from Europe I force it to 25fps whereas if it came from the US I force it to 23.976fps. You may need to review your content post encode to make sure you did not introduce telecine judder.
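A sketch of that fallback in Bash, assuming a region variable that you set yourself based on where the content came from:

```shell
# Force a sane frame rate when MediaInfo reports nothing or "Variable".
# $region is an assumed variable, not MediaInfo output.
region="US"
fps="Variable"
if [ -z "$fps" ] || [ "$fps" = "Variable" ] ; then
  if [ "$region" = "EU" ] ; then
    fps="25"
  else
    fps="23.976"
  fi
fi
echo "$fps"   # 23.976
```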

Check to see if your content is interlaced, progressive, or uses the MBAFF method of interlacing:
scan=$(cat info.tmp | grep "\(Interlaced\|Progressive\|MBAFF\)" | head -1 | cut -d ":" -f2 | tr -d " ")

If the content is in an MPEG Program Stream container, it reports as being 29.970fps, and does not announce if it is interlaced, progressive, or MBAFF then the content is actually 23.976fps using soft telecine.
if [ "$fps" == "29.970" ] && [ "$scan" == "" ] && [ "$mpegps" == "MPEG-PS" ] ; then
fps="23.976"
scan="Progressive"
fi

The odds are high that your media group received content from your provider in an MPEG-PS VOB container and did not look for interlaced content.

Detecting everything mentioned above ensures that fewer frames are encoded, eliminates telecine judder, removes the worry of encoding interlacing artifacts, allows for a more optimized bit per pixel density, and helps provide higher video quality for the customer.

In addition, order of operations can be important when encoding content. I always deinterlace content if necessary before I force the detected or overridden FPS, crop the content, resize or scale the content, and then rotate the content. An example from my script is as follows:
ffmpeg -fpsprobesize $gop -i $inputfile -pix_fmt yuv420p $totaltime -vsync 1 -sn -vcodec libx264 -map $vtrack $scan -r $fps -vf "crop=$w1:$h1:$x1:$y1,scale=$fixedwidth:$fixedheight$fixrotation" -threads 0 -b:v:$vtrack $averagevideobitrate -bufsize $buffer -maxrate $maximumvideobitrate -minrate $minimumvideobitrate -strict experimental -acodec aac -map $audio -b:a:$audio $audiobitrate -ac 2 -ar $audiofrequency -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -pass 1 -preset $newpreset -profile:v $defaultprofile -qmin 0 -qmax 63 -keyint_min $minkeyframe -g $gop $newtune -x264opts no-scenecut -map_metadata -1 -f mp4 -y $outputfile

Now go forth and encode.

Intelligent video encoding

I have been saying this for a few years now. Netflix has finally gotten on the bandwagon.

I worked at RealNetworks for over six years and became their onsite encoding expert for creating H.264 video with AAC audio in an MP4 container using FFmpeg after just three years. Our group was laid off when their Helix Streaming Media Server, which I supported, was discontinued.

I have converted most of my Blu-ray and DVD content, including one HD-DVD, to MP4 files and have found, just as the article says, that not all video is created equal. Why? Movement is expensive. In addition, grain is movement. Please do not get me started on encoding artifacts in the source media. NeatVideo, if you know how to use it, can help with both grain and encoding artifacts without having to resort to sharpening. The use of sharpening is, in my opinion, the refuge of the inept unless the source is so low quality that it looks like a blur. Even then use sparingly only if it is absolutely needed. If you want a challenge run NeatVideo against the movie Fight Club.

As an example, encode for yourself both a high action video and some low action video using x264 using a CRF value of 21 with the veryfast preset and the baseline profile. When you are finished use MediaInfo to look at the bit per pixel density (BPP) of the output video. The action video will have a much higher bitrate and BPP density than the low action video. As such you should target what the video requires.
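Going the other direction, you can derive a BPP density from the bitrate MediaInfo reports by inverting the width * height * fps / 1024 * BPP formula used earlier; bitrate_to_bpp is a made-up helper name, not part of MediaInfo.

```shell
# Inverse of the earlier calculation: BPP = bitrate_kbps * 1024 / (w * h * fps).
# bitrate_to_bpp is a hypothetical helper, not part of any tool.
bitrate_to_bpp() {
  awk -v br="$1" -v w="$2" -v h="$3" -v fps="$4" \
    'BEGIN { printf "%.3f", br * 1024 / (w * h * fps) }'
}

bitrate_to_bpp 3398.598 1920 1080 23.976   # 0.070
```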

My procedure for finding a decent bitrate is as follows:

1) Encode the video using the veryfast preset and the baseline profile to grab what the bit per pixel density is at CRF 21.

2) Perform a two pass encode with the medium preset and the high444 profile using the BPP value found in the video. You will see that both the initial CRF encoded video and the two pass video are about the same size and have, obviously, the same BPP density. The output “CRF” value, as reported by FFmpeg, will be about 19.4 due to compression. I have covered this before. Don’t take my word for it, use the Moscow University Video Quality Measurement Tool.

The reason for the medium preset is that mobile devices and other hardware decoders (Roku, Apple TV, etc…) all have limitations on playing H.264 video content that has more than three reference frames. To date I have found no device that cannot handle the high444 profile, which prioritizes the luma (Y’) channel over chrominance (Cb Cr) even though manufacturers state that they only support the main profile with CABAC. The only devices that I have not tested were the old school Blackberry phones.

On a side note, use the information that MediaInfo puts out as well as what FFmpeg puts out to find out what the width, height, and FPS of the source is as well as what the source audio frequency and bitrate are. If you know what you are doing you can detect telecine content in MPEG-PS containers (VOB) so that you do not duplicate frames when encoding. In addition, forcing the frame rate to what the source media says it is will keep the framerate solid. Advanced class is performing automatic crop detection (beware “The Right Stuff” and “Tron Legacy”), and audio normalization if your hearing is poor like mine is.

How will this affect your production workflow? If you decide to implement this, not much. All that you need to do is perform a test encode to find the BPP density and then have your MBR content encoded to the same BPP density. If you are converting a series, do a test conversion of a few episodes and find the right bitrate for you.

Extreme encoding settings, quality, and size

I’ve been meaning to do some output quality testing and have finally gotten around to it. Because I like to have my content able to be streamed via RTSP, RTMP, and HTTP (HLS or DASH) I encode to bitrate, as RTSP can be sensitive to bitrate fluctuation. I do my testing using CRF 21 for consistency of output and speed. For this testing I used the MSU Video Quality Measurement Tool, which will put out bad frames, a spreadsheet, and even a video showing you the differences between one video and another.

My typical encode is done using the medium preset. It uses a distance of three reference frames, which is compatible with hardware decoders.[1] I will also encode using the high444 profile which, while technically unsupported by mobile phones, does in fact work. To date I have had zero problems with those settings when I tested multiple handsets from multiple manufacturers during my time at RealNetworks supporting their former product Helix Server.

When I am going to encode to bitrate, I first do a CRF pass using the veryfast preset and the baseline profile to get a better idea of what the bit per pixel density is. When I perform my two pass encoding, I encode to the bit per pixel density that the CRF file reported via MediaInfo. If you look at the first pass of a two pass encode, it will be smaller than the second pass, as the second pass puts back the bits lost to the compression used on the first pass. This behavior got me thinking.

The tests that I just ran were:
1) Encode using the veryfast preset and the baseline profile using CRF 21.

2) Encode using the medium preset, the high444 profile using CRF 21 and the following options:
-x264opts b-adapt=2:direct=auto:me=tesa:subme=11:aq-mode=2:aq-strength=1.0:fast_pskip=0:rc_lookahead=72:partitions=p8x8:trellis=2:weightp=2:merange=64:bframes=8

I took the files and then remuxed them into AVI as MSUVQMT was having issues with the MP4 container.

ffmpeg -i input.mp4 -vcodec copy -an input.avi

Note that the input file framerate was 23.976fps and the output framerate became 47.952fps. Did this invalidate my test?[2] Possibly, but MediaInfo only looks at a small part of the video stream. If your video mixes 29.970fps interlaced content with 23.976fps content then it will know nothing of the 23.976fps content later in the video stream. Yes, I have seen this issue happen with several MPEG-PS files.

After remuxing the files and running them through MSUVQMT I was not surprised to see that there were no quality differences between the baseline file and the high444 profile. The SSIM reported in the spreadsheet from MSUVQMT was “AVG: 0.97723”, which I feel is in line with entropy encoding, and the only other difference was the size of the video stream.

The baseline file, as reported by MediaInfo, is as follows:
ID                                       : 0
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : Baseline@L3.0
Format settings, CABAC                   : No
Format settings, ReFrames                : 1 frame
Codec ID                                 : avc1
Duration                                 : 1mn 0s
Bit rate                                 : 1 459 Kbps
Width                                    : 854 pixels
Height                                   : 322 pixels
Display aspect ratio                     : 2.35:1
Frame rate mode                          : Variable
Frame rate                               : 47.952 fps
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.111
Stream size                              : 10.4 MiB (99%)
Writing library                          : x264 core 142 r2479 dd79a61
Encoding settings                        : cabac=0 / ref=1 / deblock=1:-1:-1 / analyse=0x1:0x111 / me=hex / subme=2 / psy=1 / psy_rd=1.00:0.15 / mixed_ref=0 / me_range=16 / chroma_me=1 / trellis=0 / 8x8dct=0 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=0 / threads=8 / lookahead_threads=2 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=0 / weightp=0 / keyint=120 / keyint_min=12 / scenecut=40 / intra_refresh=0 / rc_lookahead=10 / rc=crf / mbtree=1 / crf=21.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / ip_ratio=1.40 / aq=1:1.00

The high444 profile with the extra x264 options looks like this:

ID                                       : 0
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L3.0
Format settings, CABAC                   : Yes
Format settings, ReFrames                : 4 frames
Codec ID                                 : avc1
Duration                                 : 59s 997ms
Bit rate                                 : 1 364 Kbps
Width                                    : 854 pixels
Height                                   : 322 pixels
Display aspect ratio                     : 2.35:1
Frame rate mode                          : Variable
Frame rate                               : 47.952 fps
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.103
Stream size                              : 9.76 MiB (99%)
Writing library                          : x264 core 142 r2479 dd79a61
Encoding settings                        : cabac=1 / ref=3 / deblock=1:-1:-1 / analyse=0x3:0x10 / me=tesa / subme=11 / psy=1 / psy_rd=1.00:0.15 / mixed_ref=1 / me_range=64 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=0 / chroma_qp_offset=-3 / threads=8 / lookahead_threads=1 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=8 / b_pyramid=2 / b_adapt=2 / b_bias=0 / direct=3 / weightb=1 / open_gop=0 / weightp=2 / keyint=120 / keyint_min=12 / scenecut=40 / intra_refresh=0 / rc_lookahead=72 / rc=crf / mbtree=1 / crf=21.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / ip_ratio=1.40 / aq=2:1.00

Note the Bit Per Pixel density is lower on the more compressed version. This is expected because the video stream is smaller due to higher compression. As noted above the bits are put back and your Bit Per Pixel density is returned to what is expected when using two pass encoding.

What did I learn here? Video quality is directly affected by bitrate while compression merely makes the video stream smaller with no visible increase in quality. With two pass encoding to the target Bit Per Pixel density the quality will be higher at the same bitrate but may have some differences. For example I converted the fight scene from They Live many years ago using two similar bitrate based methods and they did not come out the same. You can see that video on YouTube here.

The question we are left with is: how much time do I really want to spend making the file just a bit smaller at exactly the same quality? Me? Not that much.

1) I will always remember that three reference frames are the maximum distance by remembering a scene in Monty Python and the Holy Grail.

…And Saint Attila raised the hand grenade up on high, saying, “O LORD, bless this Thy hand grenade that with it Thou mayest blow Thine enemies to tiny bits, in Thy mercy.” And the LORD did grin and the people did feast upon the lambs and sloths and carp and anchovies and orangutans and breakfast cereals, and fruit bats and large chu… [At this point, the friar is urged by Brother Maynard to “skip a bit, brother”]… And the LORD spake, saying, “First shalt thou take out the Holy Pin, then shalt thou count to three, no more, no less. Three shall be the number thou shalt count, and the number of the counting shall be three. Four shalt thou not count, neither count thou two, excepting that thou then proceed to three. Five is right out. Once the number three, being the third number, be reached, then lobbest thou thy Holy Hand Grenade of Antioch towards thy foe, who being naughty in My sight, shall snuff it.”

2) 23.976 * 2 == 47.952
ffprobe.exe sw4-gout-test-crf-baseline.avi
ffprobe version N-67742-g3f07dd6 Copyright (c) 2007-2014 the FFmpeg developers
built on Nov 16 2014 22:10:05 with gcc 4.9.2 (GCC)
configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-zlib
libavutil      54. 13.100 / 54. 13.100
libavcodec     56. 12.101 / 56. 12.101
libavformat    56. 13.100 / 56. 13.100
libavdevice    56.  3.100 / 56.  3.100
libavfilter     5.  2.103 /  5.  2.103
libswscale      3.  1.101 /  3.  1.101
libswresample   1.  1.100 /  1.  1.100
libpostproc    53.  3.100 / 53.  3.100
Input #0, avi, from 'sw4-gout-test-crf-baseline.avi':
encoder         : Lavf56.13.100
Duration: 00:01:00.02, start: 0.000000, bitrate: 1469 kb/s
Stream #0:0: Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 854x322 [SAR 920:1037 DAR 40:17], 1459 kb/s, 47.95 fps, 23.98 tbr, 47.95 tbn, 47.95 tbc

Star Wars Episode 4

I have three versions of Star Wars Episode 4 and four images in the screenshots below. This should provide an overview of the challenges involved in performing color correction. Clockwise from the top left:

1) RAW VOB file from the Star Wars Ep 4 “GOUT” edition.

2) GOUT modified to MP4 in Sony Vegas with no filters. Vegas hates MPEG-PS audio tracks and that VOB reports time incorrectly.

3) Despecialized version 2.5 by Harmy.

4) Editdroid’s version from the 1993 Laserdisc in VOB format.

You will note that the GOUT version in Vegas looks washed out. This is a levels issue: the video carries studio-range luma (16-235) while Vegas is displaying it as full-range RGB (0-255), so black floats up to dark grey and white drops to light grey. I can change the levels in Vegas with one of the built-in presets so it looks exactly the same as it does outside of Vegas.
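The math behind that washed-out look is a simple unapplied range expansion: map 16 to 0 and 235 to 255. A minimal sketch, with clipping as a player would apply it:

```python
def tv_to_full(y: int) -> int:
    """Expand a studio-range (16-235) luma value to full range (0-255).
    Values outside 16-235 are clipped, as a player would do."""
    y = min(max(y, 16), 235)
    return round((y - 16) * 255 / 219)

print(tv_to_full(16))   # 0   -- studio black becomes true black
print(tv_to_full(235))  # 255 -- studio white becomes true white
print(tv_to_full(126))  # 128 -- mid-grey stays mid-grey
```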

The color palette used in GOUT is the same as the Laserdisc because they both came from the same master. I would love to get my hands on the 1985 Laserdisc release, but that thing is beyond rare.

The Despecialized edition suffers from the f'ing Hollywood look, with teal and orange slathered all over it as well as oversaturated colors. To fix issues like that I have to skew cyan towards blue. Desaturating yellow and red helps to fix the New Jersey fake-tan look in most movies. Green is occasionally oversaturated. Couple all of that with lightness adjustments for cyan, yellow, magenta, red, green, and blue, and things get complicated quickly. And the levels are sometimes off as well, so add that to the mix.
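FFmpeg's selectivecolor filter (the Photoshop-style selective color logic mentioned earlier) expresses exactly these moves: each color range takes four adjustment values (cyan, magenta, yellow, black). A sketch that assembles that kind of filter graph; the adjustment numbers are made-up starting points for illustration, not a tested grade:

```python
# Illustrative only: build an FFmpeg selectivecolor filter string.
# Each range takes four values (cyan, magenta, yellow, black) in -1..1.
adjustments = {
    "cyans": "0 .2 0 0",     # add magenta to cyans -> skew them toward blue
    "yellows": "0 0 -.2 0",  # pull yellow out of yellows
    "reds": "0 -.1 -.1 0",   # tone down the fake-tan reds
}
vf = "selectivecolor=" + ":".join(f"{k}={v}" for k, v in adjustments.items())
cmd = ["ffmpeg", "-i", "in.mkv", "-vf", vf, "-c:a", "copy", "out.mkv"]
print(" ".join(cmd))
```

In practice I iterate on the numbers one range at a time while watching the whites, greys, and skin tones described above.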

I will be using the Editdroid version as my source, and I am hoping to alter the color palette to be more in line with GOUT. Preliminary results at this point do not look promising at all. It looks like Harmy used the AAV ColorLab plugin, which I have, to modify colors. Sadly that plugin seems to have the side effect, at least on my machine, of screwing up some shades of orange, like traffic cones and the orange Pinto in The Blues Brothers. My monitor is color balanced using a Spyder3Pro.

From the research I have done, there is no longer any such thing as a "correct" version of Star Wars that a mere mortal like me can get their hands on. My hope is that Disney will fix the color issues in any reissues it may put out, but that is a pipe dream at best.