FFmpeg and how to use it wrong


I’ve been in the streaming media industry since 2008 and have seen a lot of misinformation regarding both FFmpeg and libx264. In this post I hope to help shed some light on what does and does not work.

Streaming media, at its core, requires three basic things:
1) Constant frame rate.
2) An even keyframe distance, also known as the Group of Pictures (GOP) length.
3) A bitrate-based encode.

Things that are nice to have:
4) Finding a better bitrate for your content.
5) Hitting your target bitrate.
6) Audio encoding without A/V drift.
7) Proper encoding for your target audiences.

I have a basic rule when encoding content and that is to never trust the input. Are you sure that the frame rate is constant? Were you told that the content is progressive and not interlaced? Were you given information about keyframe distance or what color space the video is in? Can you trust that any of that is accurate? I can’t and you shouldn’t.

Section one – Constant frame rate

Constant frame rate is important because players expect the PTS/DTS timestamps of the frames they decode to arrive like clockwork. If the timestamps are irregular or out of order you can see content jump forwards or backwards, or even fail to play back at all. To achieve proper playback with FFmpeg you need to use two options.

-r specifies the output frame rate[1]. It must match the input frame rate to eliminate judder. Use it in conjunction with -vsync 1 (constant frame rate), which retimes the PTS/DTS timestamps accordingly[2]; in FFmpeg 5.1 and later -vsync is deprecated in favor of the equivalent -fps_mode cfr. Depending upon the content you receive, do not be surprised if frames are duplicated and/or dropped during encoding. If that happens, contact the content creator if possible and ask them to fix their source content. It is not uncommon for FFmpeg to duplicate the first frame.
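In the spirit of never trusting the input, you can ask ffprobe what the source claims before you encode. The following is a minimal POSIX shell sketch; same_rate and check_cfr are hypothetical helper names, and treating a mismatch between ffprobe's r_frame_rate and avg_frame_rate as a sign of variable frame rate is a heuristic rather than proof.

```shell
# same_rate A B: succeed if two ffprobe-style rationals (num/den) are equal.
# Cross-multiplying avoids floating point. A zero denominator (ffprobe prints
# 0/0 when it cannot work out a rate) never counts as a match.
same_rate() {
  n1=${1%/*} d1=${1#*/}
  n2=${2%/*} d2=${2#*/}
  [ "$d1" -ne 0 ] && [ "$d2" -ne 0 ] || return 1
  [ $((n1 * d2)) -eq $((n2 * d1)) ]
}

# check_cfr FILE: compare the declared and average rates of the first video
# stream. Assumes ffprobe prints both rates as "num/den,num/den".
check_cfr() {
  rates=$(ffprobe -v error -select_streams v:0 \
    -show_entries stream=r_frame_rate,avg_frame_rate -of csv=p=0 "$1")
  r=${rates%%,*} a=${rates#*,}
  if same_rate "$r" "$a"; then
    printf 'constant frame rate likely: %s\n' "$r"
  else
    printf 'possible variable frame rate: r_frame_rate=%s avg_frame_rate=%s\n' "$r" "$a"
  fi
}
```

Run it as check_cfr inputfile.mp4. A mismatch is a cue to ask for a fixed source, or at least to confirm that -r and -vsync 1 are on the command line.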

Section two – Keyframe distance

To keep your keyframe distance constant, use the -g parameter[3]. I go a bit beyond what is required for regular desktop playback and use the no-scenecut option in conjunction with -g. By default, x264 creates a keyframe when it detects a scene change, and it sets the maximum GOP length to 250 frames and the minimum to 25. The no-scenecut option turns off scene change detection in x264. I have not found --scenecut -1 documented as a valid setting in either x264’s or FFmpeg’s documentation.

Note that NTSC content is stupid. 23.976 fps is inaccurate and should always be written as 24000/1001, 29.97 fps as 30000/1001, and 59.94 fps as 60000/1001. The examples below use the rounded values on purpose for ease of reading.
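A small lookup keeps the exact rational rates in your scripts even when you think in rounded numbers, and lets the GOP length follow from the frame rate. This is a sketch assuming a GOP measured in whole seconds; exact_rate and gop_frames are hypothetical helper names, and rounding to the nearest whole frame is my own choice.

```shell
# exact_rate RATE: map rounded NTSC rates to their exact rationals.
# Anything not listed is passed through unchanged.
exact_rate() {
  case $1 in
    23.976|23.98) printf '%s\n' 24000/1001 ;;
    29.97|29.970) printf '%s\n' 30000/1001 ;;
    59.94|59.940) printf '%s\n' 60000/1001 ;;
    *)            printf '%s\n' "$1" ;;
  esac
}

# gop_frames SECONDS RATE: GOP length in frames, rounded to the nearest frame.
# SECONDS must be an integer; RATE is an integer (25) or a rational (24000/1001).
gop_frames() {
  secs=$1
  case $2 in
    */*) num=${2%/*} den=${2#*/} ;;
    *)   num=$2 den=1 ;;
  esac
  printf '%s\n' $(( (secs * num + den / 2) / den ))
}
```

For example, gop_frames 2 "$(exact_rate 23.976)" gives 48 and gop_frames 2 "$(exact_rate 29.97)" gives 60, the same -g values used in the examples in this article.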

If you inspect an output file with MediaInfo and did not use the no-scenecut option, its encoding settings will show scenecut=40. When done properly they will show scenecut=0. Without this option, keyframes will be misaligned across ABR renditions and segment sizes will be unpredictable.

You can also disable scene detection with FFmpeg’s -sc_threshold 0 parameter, which is video codec neutral. It is equivalent to the no-scenecut option provided by libx264.
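To confirm that keyframes actually landed on an even cadence, you can scan ffprobe's per-frame picture types, much like the bash one-liners in the comments below. A minimal sketch, assuming ffprobe prints one frame,<type> line per frame in presentation order; check_cadence is a hypothetical name. To my knowledge ffprobe's pict_type reports I for IDR and non-IDR I-frames alike, which is fine here: with scene detection off, every I-frame should sit on a GOP boundary.

```shell
# check_cadence GOP < ffprobe-csv
# Prints every I-frame that is off the GOP grid and every grid position that
# is not an I-frame. Exits 1 if anything was printed, 0 otherwise.
check_cadence() {
  awk -F, -v gop="$1" '
    $1 == "frame" {
      on_grid = (n % gop == 0)
      if ($2 == "I" && !on_grid) { print "unexpected I-frame at frame " n; bad = 1 }
      if ($2 != "I" && on_grid)  { print "missing I-frame at frame " n; bad = 1 }
      n++
    }
    END { exit bad }'
}
```

Usage: ffprobe -v error -select_streams v -show_frames -show_entries frame=pict_type -of csv outputfile.mp4 | check_cadence 48 prints nothing and exits 0 when every 48th frame, and only every 48th frame, is an I-frame.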

Section three – Bitrate

I have seen people attempt to create VOD content and perform live streaming using Constant Rate Factor, also known as CRF. CRF targets a quality level rather than a bitrate, so it does not give you the bitrate-based encode listed above. If you do not specify a bitrate for x264, it defaults to CRF 23. If you do not specify a preset, it defaults to medium. If you do not specify a profile, it defaults to high.

I like to make sure that my content uses all of the bells and whistles for delivering bitrate-based content, including -bufsize, -maxrate, and even -minrate. Note that -minrate has no effect with x264; it is in my script in case I decide to use a different codec that supports a minimum bitrate. [Note to self: update script to use -sc_threshold 0.]

Section four – Finding a better bitrate

If you want to take the guesswork out of finding a better bitrate[4], it is best to analyze the file to find one[5]. I now run a CRF 23 encode to find a better global bitrate for whatever I am encoding, using the same encoding settings as my output file except for the veryfast preset and the baseline profile; the average bitrate of that encode becomes my target. Keeping everything else identical is vital to finding a better bitrate.
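Reading the average bitrate back out of the CRF probe encode is simple arithmetic: bits divided by seconds. The sketch below assumes a video-only probe file; kbps and probe_bitrate are hypothetical names, and probe_bitrate takes the size from wc -c and the duration from ffprobe's format section.

```shell
# kbps BYTES SECONDS: average bitrate in kb/s (1 kb = 1000 bits), one decimal.
kbps() {
  awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b * 8 / s / 1000 }'
}

# probe_bitrate FILE: average bitrate of a finished probe encode.
# Uses the whole file size, so audio would be counted if the file had any.
probe_bitrate() {
  bytes=$(wc -c < "$1")
  secs=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1")
  kbps "$bytes" "$secs"
}
```

As a sanity check, kbps 929746944 7275.23 prints 1022.4, which agrees with the Lsize (907956 kB, where FFmpeg's kB is 1024 bytes), time, and bitrate reported in the 480p encode log further down.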

Section five – Hitting your target bitrate

Below is a sample of the script that I use for two pass encoding[6]. Note that almost everything in the script is a variable. Those values are filled in after the media is analyzed and a better bitrate is found, as described in section four above. I also convert audio separately from video, as encoding audio at the same time can slow down this process.
—————————————-
ffmpeg -i $inputfile $scan -pix_fmt $colorspace -vf "crop=$w1:$h1:$x1:$y1,scale=$fixedwidth:$fixedheight" -vsync 1 -sn -map $vtrack -r $fps -threads 0 -vcodec libx264 -b:v:$vtrack $averagevideobitrate -bufsize $buffer -maxrate $maximumvideobitrate -minrate $minimumvideobitrate -an -pass 1 -preset $newpreset -profile:v $defaultprofile -g $gop $tune -x264opts no-scenecut -map_metadata -1 -f mp4 -y $outputfile-video.mp4

ffmpeg -i $inputfile $scan -pix_fmt $colorspace -vf "crop=$w1:$h1:$x1:$y1,scale=$fixedwidth:$fixedheight" -vsync 1 -sn -map $vtrack -r $fps -threads 0 -vcodec libx264 -b:v:$vtrack $averagevideobitrate -bufsize $buffer -maxrate $maximumvideobitrate -minrate $minimumvideobitrate -an -pass 2 -preset $newpreset -profile:v $defaultprofile -g $gop $tune -x264opts no-scenecut -map_metadata -1 -f mp4 -y $outputfile-video.mp4
—————————————-

The same values are used for the second pass to ensure that the target bitrate is hit. If you do not use the same parameters in both passes, you will always miss your target bitrate.
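One way to make sure both passes really do use the same parameters is to build them from a single function, so there is only one copy of the option list. This is a sketch, not my production script: encode_pass, two_pass, and the upper-case variables are hypothetical, and -sc_threshold 0 stands in for -x264opts no-scenecut. Setting FFMPEG=echo prints the two commands instead of running them.

```shell
FFMPEG=${FFMPEG:-ffmpeg}

# encode_pass PASS
# Every option lives here exactly once, so pass 1 and pass 2 cannot drift
# apart; only the -pass value changes. INPUT, OUTPUT, FPS, GOP, VBR, MAXRATE
# and BUFSIZE are hypothetical variables your script fills in beforehand.
encode_pass() {
  "$FFMPEG" -i "$INPUT" -pix_fmt yuv420p -vsync 1 -r "$FPS" \
    -c:v libx264 -b:v "$VBR" -maxrate "$MAXRATE" -bufsize "$BUFSIZE" \
    -preset medium -profile:v high -g "$GOP" -sc_threshold 0 \
    -an -pass "$1" -f mp4 -y "$OUTPUT"
}

# two_pass: run pass 1 and, only if it succeeded, pass 2.
two_pass() {
  encode_pass 1 && encode_pass 2
}
```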

This is an example of bad two pass encoding: different values are used between the two passes, neither the frame rate nor the GOP is defined, and your PTS/DTS timestamps will be the same as the input's. You will never hit your target bitrate using this method.
—————————————-
ffmpeg -y -i 1080p-input.mp4 -c:v libx264 -b:v 5000k -pass 1 -f mp4 NUL && \
ffmpeg -i 1080p-input.mp4 -c:v libx264 -b:v 5000k -maxrate 5000k -bufsize 5000k -pass 2 1080p-output.mp4

—————————————-

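Never reuse your first pass analysis when creating Adaptive Bitrate (ABR) content. Ever. The commands below do exactly that: a single 1080p first pass feeds the second pass of every rendition.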
—————————————-
ffmpeg -y -i 1080p-input.mp4 -c:v libx264 -preset medium -g 60 -keyint_min 60 -sc_threshold 0 -bf 3 -b_strategy 2 -b:v 3000k -c:a aac -b:a 64k -ac 1 -ar 44100 -pass 1 -f mp4 NUL && \

ffmpeg -i 1080p-input.mp4 -c:v libx264 -preset medium -g 60 -keyint_min 60 -sc_threshold 0 -bf 3 -b_strategy 2 -b:v 3000k -maxrate 3300k -bufsize 3000k -c:a aac -b:a 64k -ac 1 -ar 44100 -pass 2 1080p-output.mp4

ffmpeg -i 1080p-input.mp4 -c:v libx264 -s 1280x720 -preset medium -g 60 -keyint_min 60 -sc_threshold 0 -bf 3 -b_strategy 2 -b:v 1500k -maxrate 1650k -bufsize 1500k -c:a aac -b:a 64k -ac 1 -ar 44100 -pass 2 720p_output.mp4

ffmpeg -i 1080p-input.mp4 -c:v libx264 -s 640x360 -preset medium -g 60 -keyint_min 60 -sc_threshold 0 -bf 3 -b_strategy 2 -b:v 1000k -maxrate 1100k -bufsize 1000k -c:a aac -b:a 64k -ac 1 -ar 44100 -pass 2 360p-output.mp4
—————————————-

My personal experience using -b_strategy 2 did not work out so well; it actually lowered the quality of my content. Your mileage may vary. -bf 3 allows up to three consecutive B-frames, which is the default in the medium preset. The medium preset also uses three reference frames, which is easy for today’s players to decode.
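The fix is to give every rendition its own first pass. FFmpeg's -passlogfile option sets the prefix of the two pass log files (the default prefix, ffmpeg2pass, is where ffmpeg2pass-0.log comes from), so a per-rendition prefix keeps each rendition's stats and mbtree files apart. A minimal sketch built from the ladder above, minus -b_strategy 2; encode_rendition and INPUT are hypothetical names.

```shell
FFMPEG=${FFMPEG:-ffmpeg}

# encode_rendition NAME SIZE VIDEO_BITRATE MAXRATE BUFSIZE (reads $INPUT)
# Runs pass 1 and pass 2 for one rendition. -passlogfile "$NAME" means this
# rendition writes and reads only NAME-0.log and NAME-0.log.mbtree.
# Set FFMPEG=echo to print the commands instead of running them.
encode_rendition() {
  name=$1 size=$2 vbr=$3 max=$4 buf=$5
  for pass in 1 2; do
    "$FFMPEG" -i "$INPUT" -c:v libx264 -s "$size" -preset medium \
      -g 60 -keyint_min 60 -sc_threshold 0 -bf 3 \
      -b:v "$vbr" -maxrate "$max" -bufsize "$buf" \
      -c:a aac -b:a 64k -ac 1 -ar 44100 \
      -pass "$pass" -passlogfile "$name" -f mp4 -y "$name-output.mp4" || return 1
  done
}
```

With INPUT=1080p-input.mp4, call encode_rendition 1080p 1920x1080 3000k 3300k 3000k, then encode_rendition 720p 1280x720 1500k 1650k 1500k, then encode_rendition 360p 640x360 1000k 1100k 1000k. Each rendition pays for its own first pass, which is the point.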

This is two pass encoding done right, while also converting audio to stereo AAC. I include -pix_fmt yuv420p because if you convert content that has an incompatible pixel format (desktop Windows Media content, for example) or that uses the computer RGB range (0-255) rather than broadcast RGB (16-235), your H.264 video may not play back as expected.
—————————————-
ffmpeg -i inputfile.mp4 -pix_fmt yuv420p -vsync 1 -vcodec libx264 -r 23.976 -threads 0 -b:v 1024k -bufsize 1216k -maxrate 1280k -preset medium -profile:v high -tune film -g 48 -x264opts no-scenecut -pass 1 -acodec aac -b:a 192k -ac 2 -ar 48000 -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -f mp4 -y outputfile.mp4

ffmpeg -i inputfile.mp4 -pix_fmt yuv420p -vsync 1 -vcodec libx264 -r 23.976 -threads 0 -b:v 1024k -bufsize 1216k -maxrate 1280k -preset medium -profile:v high -tune film -g 48 -x264opts no-scenecut -pass 2 -acodec aac -b:a 192k -ac 2 -ar 48000 -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -f mp4 -y outputfile.mp4
—————————————-

Note that I add the audio bitrate to the video bitrate to calculate the bufsize value (1024k + 192k = 1216k above), and I multiply the target bitrate by 1.25 for the maxrate value (1024k × 1.25 = 1280k). Why? This gives the encoder the liberty to allocate less data to low motion scenes and more data to high action scenes. If you were to set maxrate to 10x the target, the network signature would look a lot like CRF, but boy would your content look great. I do not recommend this.
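The bufsize and maxrate rules above are easy to script. A minimal sketch; rate_control is a hypothetical name, and it works in whole kb/s with integer division, so maxrate rounds down when the target is not a multiple of four.

```shell
# rate_control VIDEO_KBPS AUDIO_KBPS
# bufsize = video + audio; maxrate = video * 1.25 (integer kb/s, rounded down)
rate_control() {
  v=$1 a=$2
  printf '%s\n' "-b:v ${v}k -maxrate $((v * 5 / 4))k -bufsize $((v + a))k"
}
```

rate_control 1024 192 prints -b:v 1024k -maxrate 1280k -bufsize 1216k, the values used in the two pass commands above.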

This brings up a few questions regarding quality, compression, and two pass encoding.

1) Is there a visible difference in output between CRF 23 when using the veryfast preset and the baseline profile versus the medium preset and the high profile?

You would think that, as CRF 23 is used for both, the two output videos would be the same quality. Unfortunately this does not appear to be the case, and Moscow State University’s Video Quality Measurement Tool (MSU VQMT) confirms this when comparing the two output files via SSIM. I would use Netflix’s VMAF tool, but as of this writing it segfaults when it is built into FFmpeg and used to analyze content via SSIM.
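If you do not have MSU VQMT handy, FFmpeg's own ssim filter can produce a comparable average. A sketch, assuming both files share the same resolution and frame count; compare_ssim and ssim_all are hypothetical helpers, and ssim_all assumes the filter's summary line contains an All:<value> field. Its numbers are not guaranteed to match VQMT's exactly.

```shell
# ssim_all: read FFmpeg log text on stdin and print the value after "All:"
# on the ssim filter's summary line (assumed to look like "SSIM Y:... All:<v>").
ssim_all() {
  sed -n 's/.*SSIM .*All:\([0-9.]*\).*/\1/p'
}

# compare_ssim DISTORTED REFERENCE: both inputs must have the same resolution
# and frame count. The filter reports on stderr, hence 2>&1.
compare_ssim() {
  ffmpeg -hide_banner -i "$1" -i "$2" -lavfi ssim -f null - 2>&1 | ssim_all
}
```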

On a side note, B-frames are killing me with regard to quality. While the high profile output is smaller, so too is its bits-per-pixel density. I interpret this as both good and bad: compression, at least in this case, costs quality, but it does make the file smaller.

Veryfast preset with the baseline profile using CRF 23:

Size      == 888 MiB
"Bitrate" == 1023 kb/s
BPP       == 0.104

Medium preset with the high profile using CRF 23:

Size      == 833 MiB
"Bitrate" == 958 kb/s
BPP       == 0.098

The average SSIM between the two files was 0.97495.
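BPP here is bits per pixel per frame: the bitrate divided by width × height × frame rate. A sketch; bpp is a hypothetical helper, and since this article does not state the width of its 480p output, the usage below assumes 854x480.

```shell
# bpp KBPS WIDTH HEIGHT RATE: bits per pixel per frame, three decimals.
# RATE may be an integer (25) or a rational (24000/1001).
bpp() {
  awk -v k="$1" -v w="$2" -v h="$3" -v r="$4" 'BEGIN {
    n = split(r, p, "/")
    fps = (n == 2) ? p[1] / p[2] : r
    printf "%.3f\n", k * 1000 / (w * h * fps)
  }'
}
```

bpp 1023 854 480 24000/1001 prints 0.104, in line with the veryfast/baseline figures above.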

2) Does using the 1080p two pass mbtree[7] file and its two pass log file to encode other renditions degrade quality?

Yes. Why did I run that test? Because a lot of people reuse the first pass files for the other outputs in their ABR stack, as shown earlier in this article. I have never agreed with that, so I did a direct comparison between a properly encoded two pass 480p file and a 480p file whose second pass used the 1080p mbtree and log files.

To bring this home let us take a look at what happens when you use the proper two pass log files versus what happens when you use the wrong ones. In this instance the source content was 1080p and was used to generate the ffmpeg2pass-0.log.mbtree file and the ffmpeg2pass-0.log file. A second two pass encode was used to create a 480p output from the same 1080p source just like you would do when creating ABR content.

The 1080p mbtree file weighed in at 1.40 GB, while the 480p mbtree file weighed in at 287 MB.

This is the second pass of the 480p output using the proper mbtree and log files.

frame=174434 fps=104 q=-1.0 Lsize= 907956kB time=02:01:15.23 bitrate=1022.4kbits/s dup=0 drop=1 speed=4.32x
video:905937kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.222847%
[libx264 @ 000000000305b2e0] frame I:3635 Avg QP:18.51 size: 31828
[libx264 @ 000000000305b2e0] frame P:47427 Avg QP:22.94 size: 9521
[libx264 @ 000000000305b2e0] frame B:123372 Avg QP:24.57 size: 2922
[libx264 @ 000000000305b2e0] consecutive B-frames: 4.3% 1.9% 7.1% 86.7%
[libx264 @ 000000000305b2e0] mb I I16..4: 23.8% 52.0% 24.2%
[libx264 @ 000000000305b2e0] mb P I16..4: 2.1% 8.6% 2.7% P16..4: 32.3% 12.3% 8.0% 0.0% 0.0% skip:34.1%
[libx264 @ 000000000305b2e0] mb B I16..4: 0.1% 0.7% 0.2% B16..8: 35.7% 4.2% 0.9% direct: 2.0% skip:56.2% L0:44.0% L1:49.2% BI: 6.9%
[libx264 @ 000000000305b2e0] 8x8 transform intra:60.9% inter:72.3%
[libx264 @ 000000000305b2e0] coded y,uvDC,uvAC intra: 65.0% 60.8% 30.4% inter: 16.0% 14.5% 0.5%
[libx264 @ 000000000305b2e0] i16 v,h,dc,p: 44% 30% 10% 16%
[libx264 @ 000000000305b2e0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 17% 20% 18% 6% 7% 8% 8% 7% 8%
[libx264 @ 000000000305b2e0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 21% 25% 15% 5% 7% 7% 7% 5% 6%
[libx264 @ 000000000305b2e0] i8c dc,h,v,p: 58% 21% 16% 5%
[libx264 @ 000000000305b2e0] Weighted P-Frames: Y:13.2% UV:3.7%
[libx264 @ 000000000305b2e0] ref P L0: 52.5% 16.0% 20.8% 9.4% 1.4%
[libx264 @ 000000000305b2e0] ref B L0: 84.5% 12.2% 3.3%
[libx264 @ 000000000305b2e0] ref B L1: 94.6% 5.4%
[libx264 @ 000000000305b2e0] kb/s:1020.08

This is the second pass of the 480p output using incorrect 1080p mbtree and log files.

frame=174434 fps=106 q=-1.0 Lsize= 907905kB time=02:01:15.23 bitrate=1022.3kbits/s dup=0 drop=1 speed=4.41x
video:905883kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.223274%
[libx264 @ 00000000026b00a0] frame I:3635 Avg QP:20.20 size: 26166
[libx264 @ 00000000026b00a0] frame P:46748 Avg QP:22.62 size: 9715
[libx264 @ 00000000026b00a0] frame B:124051 Avg QP:24.55 size: 3050
[libx264 @ 00000000026b00a0] consecutive B-frames: 4.0% 1.4% 6.8% 87.8%
[libx264 @ 00000000026b00a0] mb I I16..4: 22.9% 52.5% 24.5%
[libx264 @ 00000000026b00a0] mb P I16..4: 2.3% 9.0% 3.2% P16..4: 31.0% 11.7% 7.4% 0.0% 0.0% skip:35.5%
[libx264 @ 00000000026b00a0] mb B I16..4: 0.1% 0.7% 0.2% B16..8: 33.7% 4.3% 0.9% direct: 2.4% skip:57.7% L0:41.7% L1:50.1% BI: 8.2%
[libx264 @ 00000000026b00a0] 8x8 transform intra:60.0% inter:69.7%
[libx264 @ 00000000026b00a0] coded y,uvDC,uvAC intra: 62.2% 59.8% 30.0% inter: 15.6% 14.1% 0.8%
[libx264 @ 00000000026b00a0] i16 v,h,dc,p: 41% 28% 11% 20%
[libx264 @ 00000000026b00a0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 18% 21% 17% 6% 7% 8% 8% 7% 8%
[libx264 @ 00000000026b00a0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 20% 28% 16% 5% 7% 7% 7% 5% 5%
[libx264 @ 00000000026b00a0] i8c dc,h,v,p: 59% 21% 16% 5%
[libx264 @ 00000000026b00a0] Weighted P-Frames: Y:8.8% UV:2.1%
[libx264 @ 00000000026b00a0] ref P L0: 52.2% 16.4% 21.2% 9.4% 0.9%
[libx264 @ 00000000026b00a0] ref B L0: 84.3% 12.3% 3.3%
[libx264 @ 00000000026b00a0] ref B L1: 94.6% 5.4%
[libx264 @ 00000000026b00a0] kb/s:1020.02

Take note that a lot of data was pulled out of the I-frames and allocated to P- and B-frames when the 1080p ffmpeg2pass-0.log.mbtree and ffmpeg2pass-0.log files were used instead of the ones generated for the 480p output: the average I-frame size dropped from 31828 to 26166 bytes. Using the wrong two pass files lowered the average SSIM to 0.97553. Viewing some of the frame differences in MSU VQMT made my eyes hurt.

Did you notice that the bitrate of the 480p file encoded with the veryfast preset, the baseline profile, and CRF 23 (1023 kb/s) is very close to the bitrate of the two pass encode that used the medium preset and the high profile (1022.4 kbits/s), within the bounds of the difference between CAVLC and CABAC entropy coding? Two pass encoding puts the bits back.

Section six – Audio

In the audio portion of the two pass commands above you will see the aresample filter with a few options[8].

-af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" helps to keep your audio lined up with the beginning of your video. It is common for a container to have the video and the audio start at different points. With this filter your output should have little to no audio drift or offset: async=1 enables filling and trimming to match the audio timestamps, min_hard_comp=0.1 sets the gap (in seconds) between timestamps and audio data that triggers that padding or trimming, and first_pts=0 pads the start with silence, or trims audio with negative PTS timestamps, so that the audio begins at the beginning of the video.

Section seven – Proper encoding for your target audiences

During my nine years in the streaming media industry I have seen companies like Research in Motion state emphatically that their BlackBerry phones do not support the high profile. Just because a manufacturer states that specific profiles are not supported does not mean that they won’t work. RealNetworks’ Helix Producer created content using the high profile by default, and I never had a problem delivering high profile content to those phones via RTSP.

Generally speaking I limit reference frames to three for compatibility purposes. If you decide to go above that, make sure that you research which device or devices support more reference frames. Note that the animation tuning option for x264 will double your reference frames unless the count is one[9].

Let’s finish off with one final example using single pass encoding[10].

ffmpeg -i inputfile.mp4 -pix_fmt yuv420p -deinterlace -vf "scale=640:360" -vsync 1 -vcodec libx264 -r 29.970 -threads 0 -b:v 1024k -bufsize 1216k -maxrate 1280k -preset medium -profile:v main -tune film -g 60 -x264opts no-scenecut -acodec aac -b:a 192k -ac 2 -ar 44100 -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -f mp4 -y outputfile.mp4

I hope that this article helps to debunk misinformation that is rampant on the Internet regarding the usage of FFmpeg and what x264 options are valid to create compliant streaming media VOD content.

—————————————-

[1] Force the output frame rate using the -r parameter.
https://ffmpeg.org/ffmpeg.html#Description


[2] Vsync parameter for FFmpeg.
https://ffmpeg.org/ffmpeg.html#Advanced-options


[3] The -g option is described below.
https://sites.google.com/site/linuxencoding/x264-ffmpeg-mapping


[4] The articles below have several recommendations on bitrate.
https://support.google.com/youtube/answer/1722171?hl=en
https://www.wowza.com/docs/how-to-encode-source-video-for-wowza-streaming-cloud#vbitrate


[5] I have an article on how to calculate a better bitrate.
https://videoblerg.wordpress.com/2015/12/16/intelligent-video-encoding/

The article above is featured in an article by Jan Ozer. It is also referenced in his book Video Encoding by the Numbers in Chapter 7: Choosing Data Rate.
https://streaminglearningcenter.com/blogs/per-title-encoding-its-everywhere.html


[6] Please note that both the first pass and the second pass are identical sans the portion that identifies which pass is being used.
https://trac.ffmpeg.org/wiki/Encode/H.264


[7] Detailed information about the macroblock tree file can be found in the following forum thread by the person who designed it.
https://forum.doom9.org/showthread.php?t=148686


[8] Audio references can be found below.

aresample.
https://ffmpeg.org/ffmpeg-filters.html#aresample-1

async, min_hard_comp, and first_pts.
https://ffmpeg.org/ffmpeg-resampler.html#Resampler-Options


[9] x264 tuning values.
https://superuser.com/questions/564402/explanation-of-x264-tune


[10] Encoding options not included in the article above or insufficiently detailed.

Video options:

-i is for designating the input.

-deinterlace should only be used if your content is interlaced and flagged as either interlaced or MBAFF. It is recommended to deliver only progressive content to web based players.

-vf "scale=640:360" is a video filter that will scale the output video to a different resolution.
https://ffmpeg.org/ffmpeg-filters.html#Filtering-Introduction

-vcodec libx264 specifies the x264 video codec. You can substitute -c:v for -vcodec if you wish.

-b:v 1024k specifies a video bitrate of 1024 kbps.

-bufsize 1216k specifies the rate control (VBV) buffer size. This is a best practice for RTSP delivery and streaming media in general.

-maxrate 1280k specifies the maximum bitrate allowed.

-preset veryfast is one of several presets available for H.264 video: ultrafast, superfast, veryfast, faster, fast, medium, slow, slower, veryslow, and placebo. It is not recommended to use a preset slower than medium for streaming media.

-profile:v baseline is one of several profiles available for H.264 video. Those include baseline, main, high, high10, high422, and high444. Hardware devices, specifically older mobile phones, rarely state support for any of the high profile options even though they may work. Include the :v stream specifier (-profile:v) so that the profile applies to video, as some audio codecs, such as AAC, also have profiles.
https://en.wikipedia.org/wiki/H.264/MPEG-4_AVC#Profiles

Note that x264 does not support the extended profile.
https://www.ffmpeg.org/ffmpeg-codecs.html#Audio-Encoders

For additional detail on the inner workings of presets, see the following page.
http://dev.beandog.org/x264_preset_reference.html

-tune film is one of several tuning options available for H.264 video. Others include animation, grain, stillimage, psnr, ssim, fastdecode, and zerolatency. Animation should not be used with streaming media, as it will double the number of reference frames defined in the preset.

More on preset, profile and tuning can be found here.
https://wiki.libav.org/Encoding/h264

and here.
http://www.chaneru.com/Roku/HLS/X264_Settings.htm#preset

The libx264 option ratetol=0.01 will force a very strict constant bitrate, so much so that libx264 will complain and adjust accordingly. This is optional and not shown above as constant bitrate content is dead to me.

-f mp4 defines that the format will be an MP4 container.

-y outputfile.mp4 overwrites outputfile.mp4 without asking if it already exists. This is required if you perform two pass encoding and do not send the first pass output to a null device.

Audio options

-acodec aac invokes the use of the internal AAC codec. You can substitute -c:a for -acodec if you wish. You no longer need to use the -strict experimental option with this codec.

-b:a 192k sets the total audio bitrate to 192 kbps. Apple recommends a minimum bitrate of 64 kbps per channel.

-ac 2 forces the audio to be stereo. This is a best practice for streaming media so that you can reach the most players, however you can use additional channels if one or more of your target devices support it.

-ar 44100 forces the sample rate to 44.1 kHz, which is compatible with Flash players. The player may downsample audio to 44.1 kHz, 22.05 kHz, or 11.025 kHz. Do not use different audio sample rates across the renditions of ABR content.

To help translate options between FFmpeg and libx264 please reference the following site.
http://www.chaneru.com/Roku/HLS/X264_Settings.htm

30 thoughts on “FFmpeg and how to use it wrong”

  1. This is just wonderful and so helpful! This is by far the most comprehensible piece on encoding I’ve found on the web. A lot of stuff you’ll find around seems to have been posted with the primary intent of boosting the author’s ego, and is usually of very little help to people who are eager to learn. But your piece is just gold. Everything is so well detailed, and having all these varied examples of command lines with the correct syntax is extraordinary. And your explanation couldn’t be more empowering. You’re offering a deep understanding of what each parameter does in a way that allows people to grow out of hit-and-miss mode and constant doubt. Thank you so much!

    • Thank you for sharing that. Oddly enough I ran across that post at some point along the line.

      I’ve moved to using the “-sc_threshold 0” option as that is codec neutral. In addition I do not recall seeing lower case i-frames in my output when using either no-scenecut or sc_threshold and checking my content using a bash shell script that drives ffprobe. I will also look occasionally at the first pass log file for lower case i frames. Clean as a whistle.

      ffprobe -select_streams v -show_frames -show_entries frame=pict_type -of csv $inputfile | grep -E 'I|i' | cut -d ',' -f2 > $inputfile-idrframes.log

      The FFmpeg codec neutral setting for a minimum keyframe distance is “-keyint_min”. I have yet to need that which is why I did not include it. It is also buried in the documentation.
      https://www.ffmpeg.org/ffmpeg-codecs.html#Codec-Options

  2. I do not understand your answer here. The point is that you DO want I-frames on scene changes, but not IDR-frames (keyframes). To achieve this, you will need to set min-keyint to your desired GOP length, then force keyframes at that same rate.

    If you use no-scenecut all scene changes will use P-frames which are not as efficient. Those should be I-frames.

    You can use this tool to quickly check the GOP structure of h264 MP4 files: https://lulebo.github.io

    IDR frames will be red, and non-IDR I-frames will be orange. In a correctly coded file, you will see orange I-frames on scenecuts, and red IDR-frames on GOP boundaries.

    • I want IDR frames to be at an even cadence. I do not want keyframes on scene changes. I do not recall either stating or implying that. If I did then please point it out and I will update my post accordingly.

      I’ve been in the streaming media industry since 2008. I have helped more than one company make their encoders compliant with streaming content through Akamai which requires keyframes at an even cadence. Using the “no-scenecut” option fixed their problems entirely once IDR frames were evenly spaced.

      See also the following requirements for Akamai.
      https://learn.akamai.com/en-us/webhelp/media-services-on-demand/akamai-media-services-on-demand-encoder-best-practices/GUID-5239477C-FF22-4265-88F2-C88D11A0005A.html

      I have gone so far as to parse my 2 pass logs looking for non-IDR frames. To date I have found exactly zero when I am using either “no-scenecut” or the codec neutral “-sc_threshold 0” option. I have added the codec neutral “-keyint_min” option and it had zero effect other than showing up in my encoding settings a bit differently. For example:

      This command:
      ffmpeg -i inputfile.mp4 -pix_fmt yuv420p -vsync 1 -sn -threads 0 -vcodec libx264 -r 24000/1001 -keyint_min 48 -g 48 -sc_threshold 0 -b:v 1024k -bufsize 1024k -maxrate 1024k -an -preset veryfast -profile:v baseline -tune film -map_metadata -1 -f mp4 -y outputfile.mp4

      outputs this as reported by MediaInfo for my two second GOP.
      “keyint=48 / keyint_min=25 / scenecut=0”

      As you can see the minimum keyframe distance, when set to the same distance as the GOP, does not create a minimum keyframe distance equal to the GOP. From what I can see the “min-keyint” and “-keyint_min” options are, at least in my experience, a placebo when using libx264. Don’t take my word for it and test it yourself.

      The website you provided looks to be a combination of MediaInfo as it extracts the encoding information from it and a program called “MPEG file bitrate viewer.” I am intimately familiar with both of them.

      I run the following bash shell script to validate that IDR frames are in cadence for both personal and professional use.
      ffprobe -select_streams v -show_frames -show_entries frame=pict_type -of csv $inputfile | grep -n I | cut -d ':' -f 1 > $inputfile-keyframes.log

      If you have a better way of ensuring only IDR frames show up and zero i-frames show up then please show me. To reiterate, I have seen precisely zero i-frames in my content.

  3. Pingback: RTMP & Transcoding – Johann Savalle

  4. I did not mean that you said in your post you wanted keyframes. I meant you SHOULD want I-frames. I-frames and keyframes(IDR-frames) are not the same. On a scene change, the best thing to do is to encode it as an I-frame. But for streaming, it shouldn’t be a keyframe. That is why you should keep scene change detection ON, not turn it off.

    And if you use x264 codec the keyint_min and keyint CAN NOT be set to the same value. keyint_min will be set to half your keyint value. That is why it doesn’t work as expected when not turning off scene change detection.

    The page does not use any other software. It is an MP4 and H.264 bitstream parser I wrote myself.

  5. The article states “When done properly that will be zero scenecut=0. If this option is not used then keyframes will be misaligned for ABR content and segment sizes will be unpredictable.”

    This is simply not true. Scenecut detection can be used and still keep an even keyframe distance. There are two types of I-frames. Normal I-frames, and IDR-frames (commonly known as keyframes).

    If you keep scene change detection on, and use something like this, you will get a correctly coded file for streaming with keyframes evenly spaced every 48 frames. As a bonus the scene change detection will work its magic and give you nice I-frames on scene changes without messing up the GOP length:

    -x264opts min-keyint=48 -force_key_frames "expr:eq(n,0)+eq(n, prev_forced_n+48)"

      • Well, this article and your previous answers do not convince me you were intimately familiar with the difference, since you have been arguing against my point the whole time.

        Anyway, I do not know of a codec neutral way to do it. Not all codecs have the same functionality. You will have to look up how to create a fixed GOP length for the specific codec used. My method here is specifically for the x264 codec.

      • Um. I do get the difference between lower case i frames and upper case I frames which are known as IDR (Instantaneous Decoding Refresh) frames. I understand why you, or anyone else, would like to use both forms of I frames. I hope that this response dissuades you from believing that I am unfamiliar with both types of I frames.

        In most cases I am leaning towards codec neutral options for frame rate, keyframe distance, and bitrate to name a few. While I will accommodate some codec specific options and could very easily implement your formula:
        -x264opts min-keyint=48 -force_key_frames "expr:eq(n,0)+eq(n, prev_forced_n+48)"

        I choose not to because, again, I am doing everything I can reasonably do to create a codec neutral script. I am willing to make small bitrate and quality sacrifices because of this.

        With newer codec types coming out I am going to eventually move forward with AV1 from the Alliance for Open Media, because I have a great dislike for the licensing around HEVC/H.265.

        You disagree with how I convert my content. I respect that. I do not judge, insult, or belittle the knowledge of people who create streaming media. If their content follows at least the first three items at the top of this article I am happy. If it does not, then I am happy to provide constructive criticism in a positive and empowering way.

        I am also thankful that you shared your method for including both i frames and IDR frames while maintaining IDR keyframe distance when using the x264 implementation of the H.264 specification. I may do some testing with regards to quality and bitrate with your method, but that project is currently not a high priority to me.

        Go forth and be well, Carl. I again thank you for your insights.

  6. Thank you for great post,

    I need your help with VOD content. Can we work on that?

    Please email me to set up a meeting and find the best way to work together.

  7. @navilor

    Fabulous article, very insightful.

    I have a question, if I may. You have stated:-
    Never reuse your first pass analysis when creating Adaptive Bitrate (ABR) content. Ever.

    Does that statement only relate to when EACH output representation is a DIFFERENT RESOLUTION? (e.g. 1920×1080, 1280×720, 640×360).

    Would it be OK, with your expertise on this, to use the same FIRST PASS if several representations all have the same output resolution. e.g. if creating multiple outputs at 1280×720 but at different bitrates as the only difference. And if that is OK, then would it matter which bitrate is used when creating the first pass?

    I’d appreciate your wisdom on this…

    • If any of the encoding parameters between outputs is different then you should not use the first pass analysis file from that encode. This includes, but is not limited to, different resolutions, bitrates, presets, profiles, and tuning options.

  8. This is brilliant – really helpful! I think, finally, I’m working out what’s going wrong with my animations! Essentially, I’m creating an animation in two stages: creating an mp4 of the complex background and then overlaying the simpler animation (as an image sequence). The two don’t line up. If I’ve read you right, that’s because ffmpeg and H264 mess about with dropping frames and such. Is that right?

    If so, do you know of a format that doesn’t do this and would allow me to accurately line up my animation with the video? Or is the only way to use two image sequences?

    Like

    • Format has nothing to do with timing or frames dropping. This is a problem with your source. You should use a non-linear editor to get everything done in one pass, which will force things to be lined up correctly. Make sure that they are both at the same frame rate.

      Fix your source or FFmpeg, as you have found, will fix it for you.

  9. Hi, I’m using this code:
    -c:v copy -acodec aac -ac 2

    I upload my files to OneDrive, but I noticed that some of the videos do not play. I checked both videos with MediaInfo and both are MPEG-4.

    Not working video details:
    Format : MPEG-4 at 3 180 kb/s
    Length : 546 MiB for 24 min 1 s 577 ms
    Video #0 : AVC at 3 044 kb/s
    Aspect : 1280 x 720 (1.778) at 23.976 fps

    Working video details:
    Format : MPEG-4 at 6 311 kb/s
    Length : 2.01 GiB for 45 min 33 s 110 ms
    Video #0 : AVC at 6 190 kb/s
    Aspect : 1920 x 1080 (1.778) at 25.000 fps

    Any ideas, or is it a problem with OneDrive?

    • I always do things like this on my native file system. I also playback content from my native file system.

      Break it into two files.
      ffmpeg -i inputfile.ext -vcodec copy -an -f mp4 video.mp4
      ffmpeg -i inputfile.ext -vn -acodec aac -b:a 192k -ac 2 -ar 48000 -f mp4 audio.mp4

      Both of those should play. If they do then do the following.
      ffmpeg -i video.mp4 -i audio.mp4 -codec copy -f mp4 muxed.mp4

      • Does your command need to do the work twice?

        From what I understand, first we convert the video to MP4, then the audio to MP4?

        After extracting the video and audio, we mux them into a single MP4 file. Correct me if I’m wrong.

        By the way, this is my current command for converting multiple MKV files from a single-click .bat file:
        ffmpeg -y -loglevel panic -i "input/%~1.mkv" -vcodec copy -an -f mp4 "output/%~1.mp4"

        Can you give me the full command directly, without the need to mux? Thanks.

      • There might be a problem with your file. Doing one codec at a time will find out if it is a problem with the audio stream and/or video stream.

        Like

    • No. I do not use a browser to convert content. My entire blog is based on using the command line.

      My current rate for help is $160 per hour and is non-negotiable. If you need further assistance then contact me privately.

      • The article above is designed to allow a person to play back files in browsers using the HTML5 video tag. Not all browsers will perform well with 4K content. Some older browsers require Adobe Flash Player to decode content. You will need to create Adaptive Bitrate (ABR) content to accommodate for bandwidth and players.
