How to create ABR content with FFmpeg in one pass

I was once informed that it would be nigh impossible to create ABR content in one pass using FFmpeg.  Challenge accepted!

I remembered that statement when I wanted to calculate a bit per pixel density encoding matrix for different video resolutions.  The problem with doing that is that every source video is different, even throughout the entire video, and I did not want to encode multiple different video files multiple times to generate this matrix.  Desperation, being the mother of invention, decided to intervene on my behalf.  Once I figured out how to perform single pass ABR encoding I decided to perform some ABR encoding using CRF 21 to find an approximate bit per pixel density.  As expected this yielded a fair amount of data as each one minute section of the source video I encoded was different.  From that I validated a trend that I had been seeing for a while.

To retain a relatively sane visual quality, the lower the resolution of the video the higher the bit per pixel density should be.  I did some work and came up with some data.  If you run a one minute CRF encode against each minute in your source 1080p video you will be provided with a bit per pixel density matrix.  Using the matrix below, assuming you do not want to make one of your own, you should be able to find an approximate bitrate to use for your other resolutions. Please note that the 1080p bit per pixel density shown below is post encode and is not the source file. Best practice for a full encode is to perform a single pass encode against your 1080p source to generate a bit per pixel density that you can then use to encode your ABR content.

RES     BPP      RES     BPP      RES     BPP      RES      BPP
360p    0.277    480p    0.226    720p    0.185    1080p    0.161
360p    0.244    480p    0.207    720p    0.176    1080p    0.155
360p    0.208    480p    0.169    720p    0.139    1080p    0.120
360p    0.215    480p    0.164    720p    0.128    1080p    0.111
360p    0.194    480p    0.158    720p    0.131    1080p    0.117
360p    0.164    480p    0.136    720p    0.115    1080p    0.106
360p    0.136    480p    0.110    720p    0.091    1080p    0.082
360p    0.152    480p    0.117    720p    0.092    1080p    0.080
360p    0.160    480p    0.120    720p    0.093    1080p    0.079
360p    0.134    480p    0.108    720p    0.089    1080p    0.079
360p    0.126    480p    0.100    720p    0.081    1080p    0.071
360p    0.125    480p    0.097    720p    0.078    1080p    0.069
360p    0.118    480p    0.091    720p    0.074    1080p    0.065
360p    0.103    480p    0.084    720p    0.070    1080p    0.065
360p    0.103    480p    0.083    720p    0.068    1080p    0.063
360p    0.110    480p    0.085    720p    0.068    1080p    0.062
360p    0.105    480p    0.082    720p    0.066    1080p    0.061
360p    0.094    480p    0.074    720p    0.061    1080p    0.057
360p    0.100    480p    0.075    720p    0.059    1080p    0.054
360p    0.077    480p    0.062    720p    0.051    1080p    0.049
360p    0.078    480p    0.060    720p    0.050    1080p    0.048
360p    0.077    480p    0.061    720p    0.049    1080p    0.044
360p    0.072    480p    0.055    720p    0.044    1080p    0.041
360p    0.056    480p    0.045    720p    0.038    1080p    0.041
360p    0.063    480p    0.052    720p    0.043    1080p    0.040
360p    0.051    480p    0.042    720p    0.035    1080p    0.038
360p    0.046    480p    0.038    720p    0.033    1080p    0.037
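
One way to use the matrix is to find the row whose 1080p value is closest to the bit per pixel density you measured on your own 1080p encode and then read the other resolutions from that row. The sketch below is only an illustration of that lookup. It carries a handful of rows copied from the matrix rather than all of them, and the function names are made up for this example.

```python
# A few rows copied from the matrix above, as (360p, 480p, 720p, 1080p) BPP.
# This is a subset for illustration; paste in every row for real use.
BPP_ROWS = [
    (0.277, 0.226, 0.185, 0.161),
    (0.164, 0.136, 0.115, 0.106),
    (0.126, 0.100, 0.081, 0.071),
    (0.105, 0.082, 0.066, 0.061),
    (0.072, 0.055, 0.044, 0.041),
    (0.046, 0.038, 0.033, 0.037),
]
RESOLUTIONS = ("360p", "480p", "720p", "1080p")


def closest_row(bpp_1080p, rows=BPP_ROWS):
    """Return the row whose 1080p BPP is nearest to the measured value."""
    return min(rows, key=lambda row: abs(row[3] - bpp_1080p))


def suggested_bpp(bpp_1080p):
    """Map each resolution name to the BPP found in the closest row."""
    return dict(zip(RESOLUTIONS, closest_row(bpp_1080p)))


if __name__ == "__main__":
    # 0.070 is just a sample measurement for a 1080p encode.
    print(suggested_bpp(0.070))
```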

Yes, entropy encoding is interesting.  You probably noticed in the matrix above that the numbers are not as uniform across the resolutions as we would like.  Using that matrix as a guideline I rounded the bit per pixel density for each resolution and calculated the bitrates shown below, which I then rounded as well.  Each line is width * height * frame rate / 1024 * bit per pixel density.

1920*1080*23.976/1024*0.070  ==  3398.598kbps
1280*720*23.976/1024*0.080   ==  1726.272kbps
854*480*23.976/1024*0.100    ==  959.78925kbps
480*360*23.976/1024*0.125    ==  505.74375kbps
426*240*23.976/1024*0.133    ==  318.38254875kbps
284*160*23.976/1024*0.150    ==  159.59025kbps

With all of that said, 1080p comes out to about 3400kbps, 720p comes out to about 1725kbps, 480p comes out to about 960kbps, 360p comes out to about 510kbps, 240p comes out to about 320kbps, and 160p comes out to about 160kbps.
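
If you would rather not do that arithmetic by hand, it can be sketched in a few lines of Python. This assumes the same formula and the same 1024 divisor as the calculations above. The function name and the table layout are my own and are only here for illustration.

```python
def bitrate_kbps(width, height, fps, bpp):
    """Video bitrate in kbps, using the same 1024 divisor as the lines above."""
    return width * height * fps / 1024 * bpp


# (label, width, height, rounded BPP) taken from the calculations above.
LADDER = [
    ("1080p", 1920, 1080, 0.070),
    ("720p", 1280, 720, 0.080),
    ("480p", 854, 480, 0.100),
    ("360p", 480, 360, 0.125),
    ("240p", 426, 240, 0.133),
    ("160p", 284, 160, 0.150),
]

if __name__ == "__main__":
    for label, width, height, bpp in LADDER:
        kbps = bitrate_kbps(width, height, 23.976, bpp)
        print(f"{label}: {kbps:.1f} kbps")
```

Swap in your own frame rate and bit per pixel values to get a starting point for a different source.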

Now, how did I do it?  With likely the longest FFmpeg command line I have ever assembled.  I have broken out the different outputs to their own separate lines for readability.

ffmpeg.exe -i sourcefile.mp4

-pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=1920:1080" -b:v 3400k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 1080p.mp4

-pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=1280:720" -b:v 1725k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 720p.mp4

-pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=854:480" -b:v 960k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 480p.mp4

-pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=480:360" -b:v 510k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 360p.mp4

-pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=426:240" -b:v 320k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 240p.mp4

-pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=284:160" -b:v 160k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 160p.mp4

-strict experimental -acodec aac -vn -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -f mp4 AudioOnly.mp4

Or the long version if you prefer:

ffmpeg.exe -i sourcefile.mp4 -pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=1920:1080" -b:v 3400k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 1080p.mp4 -pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=1280:720" -b:v 1725k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 720p.mp4 -pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=854:480" -b:v 960k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 480p.mp4 -pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=480:360" -b:v 510k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 360p.mp4 -pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=426:240" -b:v 320k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 240p.mp4 -pix_fmt yuv420p -r 23.976 -vcodec libx264 -vf "scale=284:160" -b:v 160k -preset veryfast -profile:v baseline -keyint_min 24 -g 48 -x264opts no-scenecut -strict experimental -acodec aac -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -map_metadata -1 -f mp4 160p.mp4 -strict experimental -acodec aac -vn -b:a 96k -af "aresample=async=1:min_hard_comp=0.100000:first_pts=0" -f mp4 AudioOnly.mp4
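
Since every video output repeats the same options with only the scale, bitrate, and file name changing, the command can also be assembled by a script. The following is a rough sketch and not how I built the original command. The names COMMON, RENDITIONS, and build_command are made up for this example, the audio only output is left out, and the options are grouped in a slightly different order, which should not matter as long as they all come before the output file name they belong to.

```python
# Options shared by every video output, copied from the command above.
COMMON = [
    "-pix_fmt", "yuv420p", "-r", "23.976", "-vcodec", "libx264",
    "-preset", "veryfast", "-profile:v", "baseline",
    "-keyint_min", "24", "-g", "48", "-x264opts", "no-scenecut",
    "-strict", "experimental", "-acodec", "aac", "-b:a", "96k",
    "-af", "aresample=async=1:min_hard_comp=0.100000:first_pts=0",
    "-map_metadata", "-1", "-f", "mp4",
]

# (output file, scale filter size, video bitrate) for each rendition.
RENDITIONS = [
    ("1080p.mp4", "1920:1080", "3400k"),
    ("720p.mp4", "1280:720", "1725k"),
    ("480p.mp4", "854:480", "960k"),
    ("360p.mp4", "480:360", "510k"),
    ("240p.mp4", "426:240", "320k"),
    ("160p.mp4", "284:160", "160k"),
]


def build_command(source, renditions=RENDITIONS):
    """Return one ffmpeg argument list that writes every rendition."""
    args = ["ffmpeg", "-i", source]
    for name, size, bitrate in renditions:
        # Each group of output options ends with the file it applies to.
        args += ["-vf", f"scale={size}", "-b:v", bitrate, *COMMON, name]
    return args


if __name__ == "__main__":
    print(" ".join(build_command("sourcefile.mp4")))
```

To actually run it, pass the list to subprocess.run(build_command("sourcefile.mp4"), check=True) on a machine that has ffmpeg on its path.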

Now go forth and streamline your production environment.  If you plan on performing two pass encoding then you are on your own, unless you build your own bit per pixel density encoding matrix from your unique and varied content.


How to get a live u-Law WAV stream to Cisco VOIP servers (Updated 2017-07-23)

I’m probably going to get some of the specifics on the Cisco VOIP server a bit wrong, but the following is what I remember when deconstructing multiple customer descriptions over the years who did not yet know how to set up their Cisco VOIP server with on hold audio. As scary as it may seem I think that I was better at setting up the on hold music using Helix Server than anyone working at Cisco was or any of their customers were even though Cisco, for many years, recommended Helix Server until it was discontinued. Then again it was my job to know these sorts of things.

In a typical scenario a Cisco engineer and one of their customers would get on a phone call with me on how to configure Helix Server for streaming their on hold audio. For everything else Cisco I am currently a knuckle dragging troglodyte.

When I was working at RealNetworks supporting Helix Server we had a high volume of customers using Cisco VOIP phone systems and they all needed two things:

1) Looped on demand u-Law WAV files. Helix Server supported this using its Simulated Live Transfer Agent (SLTA).

The Cisco VOIP server has the ability for you to upload audio files, which then get converted to u-Law WAV files, for when a customer is on hold in a cost center (customer service, billing, legal, etc…) so that they can have customized music or advertisements for a product the customer might be interested in. I was never fully sure why they used SLTA for this unless they had a super cheap Cisco server that didn’t have that function, if they had more cost centers than the Cisco server supported, or even worse if they didn’t know that they had that functionality on their Cisco server.

Cisco did not seem to have any really good documentation on how to make the u-Law WAV files you needed for SLTA, but this forum post works. Sadly FFmpeg, my encoding tool of choice, supports looping images only. There is currently no audio equivalent that I know of.

Loop over the input stream. Currently it works only for image streams. This option is used for automatic FFserver testing. This option is deprecated, use -loop 1.

I have a few theoretical hacks to get looped content working, but they are both difficult to set up and are also very unstable. In other words they are not ready for deployment in an enterprise environment that demands high uptime. If I find a free solution that is stable I will post it here.

2) Live u-Law WAV stream. Helix Server did not support this. I performed extensive and exhaustive testing and it was unable to properly repacketize an incoming live u-Law stream to either unicast RTSP or multicast SDP no matter the input method. I was hoping that this would get fixed, however our group was laid off before that could happen.

Cisco used to have an audio capture card in their hardware VOIP servers that customers would use to pipe their satellite Muzak feed into (stereo or mono DIN input if I remember correctly), but that was discontinued because now they apparently only provide an image that goes into a VM, so there is no capture card. Their customers had to settle for SLTA.

With that said I can provide option number two for companies that have a Cisco VOIP server that they use. For a proper live u-Law WAV delivery configuration you need to know a few things:

A) A little bit about FFmpeg or at least a willingness to learn. You can download  already compiled versions for Windows over at Zeranoe’s website. If you are on Linux you can either compile FFmpeg yourself or head to the FFmpeg website itself for some static Linux builds.

B) How to create a u-Law file for testing a pseudo live feed.

C) How to create a multicast SDP file using FFmpeg using the u-Law file created above.

D) How to modify the multicast SDP file to work with VLC as a player. This is the first test to see if you have things right for live streaming.

E) How to connect via DirectShow on Windows or ALSA on Linux to an audio source. This is the final step in testing that your device works. If you start here then you may never really know if your device is working, if the SDP file is working, or if you even created the output correctly.

You will learn all of the above in this article.

I just finished converting an MP3 file to u-Law using the following command line:

ffmpeg -i in.mp3 -acodec pcm_mulaw -b:a 64k -ac 1 -ar 8000 -f wav -y out.wav
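
For what it is worth, the 64 kbps figure is not really a choice. u-Law stores one 8 bit value per sample, so the sample rate and channel count fix the bitrate. A tiny sketch of that arithmetic, with a function name I made up for the example:

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bits per second for uncompressed or companded PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


if __name__ == "__main__":
    # 8000 Hz, 8 bits per u-Law sample, mono, as in the command above.
    print(pcm_bitrate(8000, 8, 1))
```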

You can deliver a pseudo live non looping feed of that file:

ffmpeg -re -i out.wav -f rtp rtp://

You can also use the source file if you want:

ffmpeg -re -i in.mp3 -acodec pcm_mulaw -b:a 64k -ac 1 -ar 8000 -f rtp rtp://

FFmpeg is nice in that it dumps the SDP information for the RTP stream to the command prompt even though no SDP file is created:

o=- 0 0 IN IP4
s=Your File Metadata
c=IN IP4
t=0 0
a=tool:libavformat 57.23.100
m=audio 9008 RTP/AVP 0

Sadly connecting to that SDP output occasionally stutters then cuts out when listening to the stream with either VLC, QuickTime for the PC, or RealPlayer. If you read through all of the RFCs you might get an idea of the complexity of the RTP/SDP specifications.

Or not. I don’t know about you but reading those RFCs puts me right to sleep. A slightly easier to digest article on SDP structure can be found here. The article explains why a technically 100 percent compliant SDP file may still not work with Helix Server. To use your SDP file with that server you are required to add flags that the specification treats as optional.

Sadly the mostly working SDP file that FFmpeg creates is missing one important item:

a=rtpmap:0 PCMU/8000/1

The “rtpmap” attribute maps the payload type number listed in the “m” or “media” line (0 in this case) to the codec name (PCMU), the clock rate (8000 Hz to match the audio), and the number of audio channels in use if it is an audio stream. This is sort of important for devices, players, or receivers to know what to listen for and how to decode it, especially when there may be two or more streams described in the SDP file.

Playing that modified SDP file fixes everything, at least for VLC and QuickTime:

o=- 0 0 IN IP4
s=Your File Metadata
c=IN IP4
t=0 0
a=tool:libavformat 55.0.100
m=audio 21414 RTP/AVP 0
a=rtpmap:0 PCMU/8000/1
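
If you have more than a couple of SDP files to patch, the edit can be scripted. The sketch below is only an illustration and the function name is my own. It assumes the standard 8000 Hz clock rate for PCMU and a single audio section, and it adds the attribute at the end of that section so that any media level "c=" line still comes before it.

```python
def add_rtpmap(sdp_text, payload_type=0, encoding="PCMU",
               clock_rate=8000, channels=1):
    """Append an rtpmap attribute to the audio section if it has none."""
    lines = sdp_text.splitlines()
    prefix = f"a=rtpmap:{payload_type} "
    if any(line.startswith(prefix) for line in lines):
        return sdp_text  # already mapped, leave the file untouched

    # Locate the audio media line.
    start = next(
        (i for i, line in enumerate(lines) if line.startswith("m=audio ")),
        None,
    )
    if start is None:
        raise ValueError("no m=audio line found in SDP")

    # The audio section runs until the next m= line or the end of the file.
    end = start + 1
    while end < len(lines) and not lines[end].startswith("m="):
        end += 1

    lines.insert(end, f"{prefix}{encoding}/{clock_rate}/{channels}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "t=0 0\nm=audio 21414 RTP/AVP 0\n"
    print(add_rtpmap(sample), end="")
```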

Please note that if you have multiple live streams running, each SDP file and each encoder needs to be configured to use a different port number for the audio. I make sure to increase the port number by two in each SDP, because RTP conventionally sits on an even port and leaves the next odd port for RTCP. For example 21414, 21416, 21418, etc…
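
A small sketch of that port numbering, in case you are generating several SDP files at once. The function name is hypothetical. It simply steps by two from an even starting port so that each stream keeps the following odd port free, which is where RTCP conventionally goes.

```python
def stream_ports(first_port, count):
    """Return `count` audio ports, stepping by two from an even start."""
    if first_port % 2:
        raise ValueError("start from an even port; the odd one is for RTCP")
    return [first_port + 2 * i for i in range(count)]


if __name__ == "__main__":
    print(stream_ports(21414, 3))
```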

Now that you have something that works with a file, let us try with a live source. On Windows you will need to have FFmpeg connect via DirectShow. To find the list of DirectShow devices on your computer, the command line shown below will help:

ffmpeg -list_devices true -f dshow -i dummy

Now feel free to try it with your audio device.

ffmpeg -f dshow -i audio="Microphone (HD Pro Webcam C920)" -acodec pcm_mulaw -b:a 64k -ac 1 -ar 8000 -f rtp rtp://

The line above works well for me, especially as FFmpeg now supports crossbar devices.

If you are on Linux you may want to use ALSA to connect to your live feed, but again you need to find the device you want to use first. This will show you the ALSA devices your system has:

$ arecord -L

$ ffmpeg -f alsa -i default:CARD=U0x46d0x809 -acodec pcm_mulaw -b:a 64k -ac 1 -ar 8000 -f rtp rtp://

On a side note you will probably want to host your SDP file or files on a robust web server or perhaps even behind a load balancer. From the logs that I have parsed over the years, the Cisco VOIP server retrieves the SDP file every time a person is put back into the queue. The highest number of connections I recall seeing was around 3,000 per second, so people who have to support a high volume call center or a large corporation should prepare for this behavior by putting up a web server dedicated to delivering their SDP files, or several web servers behind a load balancer.

The only way this DDoS effect could be either mitigated or resolved is if the Cisco VOIP server was modified to grab the multicast information in the SDP file, retain it for use among the clients, and then check the multicast SDP file every minute or so in case the structure of the audio feed changed or was updated along with the associated SDP file. Frankly I just don’t see that happening.

And for those few who are interested in what a Scalable Multicast u-Law SDP file that is generated from Helix Server looks like, or for some reason the SDP file format I describe above doesn’t work for you, then look no further than the output below:

o=- 275648743 275648743 IN IP4
s=War Pigs+Luke’s Wall
i=Black Sabbath
c=IN IP4
t=0 0
a=ASMRuleBook:string;"#($Bandwidth >= 0),Stream0Bandwidth = 64000;"
m=audio 21414 RTP/AVP 0
c=IN IP4
a=rtpmap:0 PCMU/8000/1
a=ASMRuleBook:string;"marker=0, AverageBandwidth=64000, Priority=9, timestampdelivery=true;"

2017-07-23 Update:
If you want to deliver your live stream through a streaming server and have your Cisco server pick up an RTSP feed from that streaming server instead, then please take a look at the following command. This method is both easier and more reliable than the direct SDP method shown above.

$ ffmpeg -f dshow -i audio="Microphone (HD Pro Webcam C920)" -acodec pcm_mulaw -b:a 64k -ac 1 -ar 8000 -f rtsp rtsp://username:password@[server_address]:[port]/live/audiostream

Intelligent video encoding

I have been saying this for a few years now. Netflix has finally gotten on the bandwagon.

I worked at RealNetworks for over six years and became their onsite encoding expert for creating H.264 video with AAC audio in an MP4 container using FFmpeg after just three years. Our group was laid off when their Helix Streaming Media Server, which I supported, was discontinued.

I have converted most of my Blu-ray and DVD content, including one HD-DVD, to MP4 files and have found, just as the article says, that not all video is created equal. Why? Movement is expensive. In addition, grain is movement. Please do not get me started on encoding artifacts in the source media. NeatVideo, if you know how to use it, can help with both grain and encoding artifacts without having to resort to sharpening. The use of sharpening is, in my opinion, the refuge of the inept unless the source is so low quality that it looks like a blur. Even then use sparingly only if it is absolutely needed. If you want a challenge run NeatVideo against the movie Fight Club.

As an example, encode for yourself both a high action video and a low action video with x264 using a CRF value of 21, the veryfast preset, and the baseline profile. When you are finished use MediaInfo to look at the bit per pixel density (BPP) of the output video. The action video will have a much higher bitrate and BPP than the low action video. As such you should target what the video requires.
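
If you want to check the number yourself, bit per pixel density is just the video bitrate divided by the number of pixels delivered per second. A minimal sketch, assuming the bitrate is given in plain bits per second; the function name and the sample numbers are only for illustration.

```python
def bits_per_pixel(bitrate_bps, width, height, fps):
    """Bits spent per pixel per frame for a given video bitrate."""
    return bitrate_bps / (width * height * fps)


if __name__ == "__main__":
    # Sample values only: a 3,400,000 bps 1080p encode at 23.976 fps.
    print(round(bits_per_pixel(3_400_000, 1920, 1080, 23.976), 4))
```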

My procedure for finding a decent bitrate is as follows:

1) Encode the video using the veryfast preset and the baseline profile to grab what the bit per pixel density is at CRF 21.

2) Perform a two pass encode with the medium preset and the high444 profile using the BPP value found in the video. You will see that both the initial CRF encoded video and the two pass video are about the same size and have, obviously, the same BPP density. The output “CRF” value, as reported by FFmpeg, will be about 19.4 due to compression. I have covered this before. Don’t take my word for it, use the Moscow University Video Quality Measurement Tool.
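
As a rough sketch of step two, the script below turns a BPP value into a target bitrate and builds the two FFmpeg argument lists for a two pass encode. Treat it as one possible arrangement rather than my exact commands. The function name, the AAC audio choice in the second pass, and the rounding are assumptions made for the example, and the bitrate is worked out in plain bits per second because FFmpeg reads the "k" suffix as 1000.

```python
import os


def two_pass_commands(source, output, width, height, fps, bpp):
    """Return (first_pass, second_pass) ffmpeg argument lists."""
    # BPP * pixels per second gives bits per second; ffmpeg's "k" is 1000.
    bitrate = f"{round(width * height * fps * bpp / 1000)}k"
    video = [
        "-vcodec", "libx264", "-preset", "medium",
        "-profile:v", "high444", "-b:v", bitrate,
    ]
    # Pass 1 only gathers statistics, so drop audio and discard the output.
    first = ["ffmpeg", "-y", "-i", source, *video,
             "-pass", "1", "-an", "-f", "null", os.devnull]
    # Pass 2 reuses the statistics log and writes the real file.
    second = ["ffmpeg", "-i", source, *video,
              "-pass", "2", "-acodec", "aac", output]
    return first, second


if __name__ == "__main__":
    for cmd in two_pass_commands("in.mp4", "out.mp4", 1920, 1080, 23.976, 0.0684):
        print(" ".join(cmd))
```

Run both commands from the same directory so that the second pass can find the log file written by the first.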

The reason for the medium preset is that mobile devices and other hardware decoders (Roku, Apple TV, etc…) all have limitations on playing H.264 video content that has more than three reference frames. To date I have found no device that cannot handle the high444 profile, which prioritizes the luma (Y’) channel over chrominance (Cb Cr) even though manufacturers state that they only support the main profile with CABAC. The only devices that I have not tested were the old school Blackberry phones.

On a side note, use the information that MediaInfo puts out as well as what FFmpeg puts out to find out what the width, height, and FPS of the source is as well as what the source audio frequency and bitrate are. If you know what you are doing you can detect telecine content in MPEG-PS containers (VOB) so that you do not duplicate frames when encoding. In addition, forcing the frame rate to what the source media says it is will keep the framerate solid. Advanced class is performing automatic crop detection (beware “The Right Stuff” and “Tron Legacy”), and audio normalization if your hearing is poor like mine is.

How will this affect your production workflow? If you decide to implement this, not much. All that you need to do is perform a test encode to find the BPP density and then have your MBR content encoded to the same BPP density. If you are converting a series, do a test convert of a few episodes and find the right bitrate for you.