Intelligent video encoding

I have been saying this for a few years now. Netflix has finally gotten on the bandwagon.

I worked at RealNetworks for over six years and became their onsite encoding expert for creating H.264 video with AAC audio in an MP4 container using FFmpeg after just three years. Our group was laid off when their Helix Streaming Media Server, which I supported, was discontinued.

I have converted most of my Blu-ray and DVD content, including one HD-DVD, to MP4 files and have found, just as the article says, that not all video is created equal. Why? Movement is expensive. In addition, grain is movement. Please do not get me started on encoding artifacts in the source media. NeatVideo, if you know how to use it, can help with both grain and encoding artifacts without having to resort to sharpening. The use of sharpening is, in my opinion, the refuge of the inept unless the source is so low quality that it looks like a blur. Even then use sparingly only if it is absolutely needed. If you want a challenge run NeatVideo against the movie Fight Club.

As an example, encode for yourself both a high action video and some low action video using x264 using a CRF value of 21 with the veryfast preset and the baseline profile. When you are finished use MediaInfo to look at the bit per pixel density (BPP) of the output video. The action video will have a much higher bitrate and BPP density than the low action video. As such you should target what the video requires.

My procedure for finding a decent bitrate is as follows:

1) Encode the video using the veryfast preset and the baseline profile to grab what the bit per pixel density is at CRF 21.

2) Perform a two pass encode with the medium preset and the high444 profile using the BPP value found in the video. You will see that both the initial CRF encoded video and the two pass video are about the same size and have, obviously, the same BPP density. The output “CRF” value, as reported by FFmpeg, will be about 19.4 due to compression. I have covered this before. Don’t take my word for it, use the Moscow University Video Quality Measurement Tool.

The reason for the medium preset is that mobile devices and other hardware decoders (Roku, Apple TV, etc…) all have limitations on playing H.264 video content that has more than three reference frames. To date I have found no device that cannot handle the high444 profile, which prioritizes the luma (Y’) channel over chrominance (Cb Cr) even though manufacturers state that they only support the main profile with CABAC. The only devices that I have not tested were the old school Blackberry phones.

On a side note, use the information that MediaInfo puts out as well as what FFmpeg puts out to find out what the width, height, and FPS of the source is as well as what the source audio frequency and bitrate are. If you know what you are doing you can detect telecine content in MPEG-PS containers (VOB) so that you do not duplicate frames when encoding. In addition, forcing the frame rate to what the source media says it is will keep the framerate solid. Advanced class is performing automatic crop detection (beware “The Right Stuff” and “Tron Legacy”), and audio normalization if your hearing is poor like mine is.

How will this affect your production workflow? If you decide to implement then not much. All that you need to do is perform a test encode to find the BPP density and then have your MBR content encoded to the same BPP density. If you are converting a series do a test convert of a few episodes and find the right bitrate for you.


7 thoughts on “Intelligent video encoding

  1. Interesting!

    From step 2, how does one encode “using the BPP value found in the video”? Is there an ffmpeg flag which allows this?


    • Oscar,

      Bit per pixel density is not a flag in FFmpeg. You need to find the BPP density using MediaInfo and then do some math to get an actual bitrate. I use the command line version of MediaInfo to get this number.

      Output width of the video * Output height of the video * Output frames per second of the video * Bit Per Pixel Density = Bitrate of video

      For example:

      1920 x 1080 * 23.976 * 0.100 == 4971663.36 bits per second
      1920 x 1080 * 23.976 * 0.100 / 1024 == 4855.14 Kbps

      Jan Ozer has a visual example of using MediaInfo for bit per pixel calculations that can be found here:

      I hope that helps.


  2. Your comment about using High444 profile caught my attention. What is your reasoning for using High444 profile with YUV 4:2:0 sources?
    I checked out the MediaInfo log in your older blog post and noticed that even in the (allegedly) High444-encoded video MediaInfo reports only HP with 4:2:0 subsampling. That would imply that High444 profile didn’t actually get used. Are you certain you are really using High444?


    • As I am using the yuv420p colorspace all output will be constrained to that color space so you are one hundred percent correct that my encoding method does not use the yuv444p color space. I encode using the high444 profile as it has a higher perceptive quality to me at the same bitrate than the regular high profile. I have found no documentation with regards to libx264 not supporting the high444 profile within the yuv420 colorspace. FFmpeg throws no errors and I take all warnings and errors seriously. The documentation for libav states to use -pix_fmt yuv420p for compatibility with older players.

      Encoding to lossless uses the yuv444p colorspace.

      I effectively grew up in the optometrist office and learned at the early age of seven that the eye is more sensitive to light and dark than it is to color. As such using the high444 profile focuses on making sure that the luma channel has no chroma subsampling at the expense of the color channels. The yuv colorspace contains the luma channel and the two color channels. All my encoding is doing is ensuring that the luma channel looks better to my eyes. Your eyes may see things differently.

      Go forth and encode a video using the yuv444p colorspace and play it back on a hardware device like a Roku or the hardware based viewing device of your choice. My testing with a Roku 3 had abysmal results. The video showed nothing more than a green screen with odd corruption at the top of the video and on the left hand side of the video but the audio played back with no issue. Chrome browser can watch the video via a direct HTTP URL whereas Firefox cannot. IE just wants to download the file.

      I have not tested output using bt.709.

      The command line below is what I used for validating the issue that my Roku 3 has decoding the yuv444p colorspace.

      ffmpeg.exe -fpsprobesize 120 -i input.vob -pix_fmt yuv444p -vsync 1 -sn -vcodec libx264 -map 0:1 -r 23.976 -vf crop=720:480:0:0,scale=720:480: -threads 8 -crf 21 -strict experimental -acodec aac -map 0:2 -b:a:0:2 360000 -ac 2 -ar 48000 -af aresample=async=1:min_hard_comp=0.100000:first_pts=0 -preset veryfast -profile:v high444 -keyint_min 12 -g 120 -x264opts no-scenecut -tune film -map_metadata -1 -f mp4 -y output.mp4

      MediaInfo information below regarding the output file.

      Complete name : output.mp4
      Format : MPEG-4
      Format profile : Base Media
      Codec ID : isom (isom/iso2/avc1/mp41)
      File size : 1 007 MiB
      Duration : 2h 1mn
      Overall bit rate mode : Variable
      Overall bit rate : 1 162 Kbps
      Writing application : Lavf57.23.100

      ID : 1
      Format : AVC
      Format/Info : Advanced Video Codec
      Format profile : High 4:4:4 Predictive@L3
      Format settings, CABAC : Yes
      Format settings, ReFrames : 4 frames
      Codec ID : avc1
      Codec ID/Info : Advanced Video Coding
      Duration : 2h 1mn
      Bit rate : 867 Kbps
      Width : 720 pixels
      Height : 480 pixels
      Display aspect ratio : 16:9
      Original display aspect ratio : 16:9
      Frame rate mode : Constant
      Frame rate : 23.976 (23976/1000) fps
      Color space : YUV
      Chroma subsampling : 4:4:4
      Bit depth : 8 bits
      Scan type : Progressive
      Bits/(Pixel*Frame) : 0.105
      Stream size : 751 MiB (75%)
      Writing library : x264 core 148 r2638 7599210
      Encoding settings : cabac=1 / ref=1 / deblock=1:-1:-1 / analyse=0x3:0x113 / me=hex / subme=2 / psy=1 / psy_rd=1.00:0.15 / mixed_ref=0 / me_range=16 / chroma_me=1 / trellis=0 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=6 / threads=8 / lookahead_threads=2 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=3 / b_pyramid=2 / b_adapt=1 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=1 / keyint=120 / keyint_min=12 / scenecut=0 / intra_refresh=0 / rc_lookahead=10 / rc=crf / mbtree=1 / crf=21.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / ip_ratio=1.40 / aq=1:1.00

      ID : 2
      Format : AAC
      Format/Info : Advanced Audio Codec
      Format profile : LC
      Codec ID : 40
      Duration : 2h 1mn
      Bit rate mode : Variable
      Bit rate : 288 Kbps
      Maximum bit rate : 360 Kbps
      Channel(s) : 2 channels
      Channel positions : Front: L R
      Sampling rate : 48.0 KHz
      Frame rate : 46.875 fps (1024 spf)
      Compression mode : Lossy
      Stream size : 251 MiB (25%)
      Default : Yes
      Alternate group : 1


  3. Hmmm… I think you may be experiencing a placebo effect here. I don’t think you’re actually getting the effect that you think you’re getting.

    H.264 High 4:4:4 Predictive Profile allows the codec to internally use the YUV 4:4:4 colorspace (no subsampling of any plane), but also supports 4:2:2 and 4:2:0 chroma subsampling. When you specify High444 profile with -pix_fmt yuv420p, you are forcing the internal H.264 colorspace to stay YUV 4:2:0 – same as your source – which negates any advantage of using High444 profile. It effectively becomes the same as High profile. In fact, x264 may even override your High444 parameter and just use High profile instead when it detects 4:2:0 input.

    On the other hand, when you state -pix_fmt yuv444p you force FFmpeg/x264 to upscale the chroma planes from quarter res (4:2:0) to full res (4:4:4), but you’re not actually adding any visual detail. This doesn’t preserve luma or chroma fidelity any more than it would if you just kept your source as 4:2:0 – in fact, it makes the codec’s job more difficult since it now has to spend precious bits on coding (unnecessarily) full resolution chroma planes. If your eyes are seeing a positive improvement, I suspect it’s just placebo effect. If you actually encode the same video with the same CRF value (e.g. 26) but different profile/colorspace (High/4:2:0 vs High444/4:4:4), you will find that the High444 video usually comes out larger in size. This is an indicator that the video doesn’t compress as well (due to additional chroma resolution), which in ratecontrol mode translates to lower quality at the same BPP. As further proof, encoding the video in CBR mode with SSIM analysis enabled will likely show a consistently lower SSIM score for High444 profile.

    My recommendation would be not to use High444 profile.


    • I’ll check to see if the CRF file size changes and run it back through MSVQMT for a quick metric test.

      Thank you for your input. I always appreciate the opportunity to learn.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s