By Grady Player, Manager, Software Dev Engineering, and Pavel Koshevoy, Software Engineer
Today, the vast majority of live and streaming content is either 720p or 1080p. Despite the widespread availability of affordable 4K screens, broadcasters are still struggling with the complex challenges of producing and distributing 4K on an infrastructure designed to handle much lower bitrate HD streams.
Many broadcast companies and production studios, who not that long ago made significant investments in HD, are carefully evaluating the ROI of delivering 4K versus the benefits achieved. Those entering the 4K arena typically only make higher quality content, such as select movies and shows, available at an extra cost. Part of the reticence may also be found in a recent Brightcove study indicating that the move to 4K is only worth the investment if at least 80% of endpoints can consume 4K content. Current estimates are that about one-fifth of viewers have 4K TVs.
For broadcasters, now is the perfect time to gain experience with new formats, compression technologies, and workflows, even as standards continue to evolve and technologies advance. As announced at the NAB Show this year, Verizon Media has begun offering 4K services to our broadcast customers. In this blog, we look at some of the technical challenges we faced around encoding and packaging 4K for live sporting events.
Testing 4K with live events
Given that few broadcast and cable networks are equipped to handle 4K, one of the best ways to gain 4K production and distribution experience is with OTT streaming of live sporting events. Live 4K streaming is still rare, but that is starting to change. Fox streamed the FIFA Women's World Cup France 2019 in 4K. Streaming a live event is challenging; all of the streaming infrastructure, including ingest, encoding, delivery, and playback, must work simultaneously and often scale to meet unpredictable audience demand. Doing this in 4K makes production even harder. But streaming live sports offers broadcasters a way to test and experiment with 4K.
With OTT streaming, major infrastructure changes are not required to offer 4K streams to viewers around the world, either on a subscription basis or with advertising support. For Verizon Media, our network's 82 Tbps capacity is equipped to handle the extra demands of 4K. And for the first time, audiences have devices and bandwidth that can handle the 15–30 Mbps required for 4K streaming.
So what's the holdup? The creation of 4K content does not appear to be the problem. Even though a great deal of content is already being captured in 4K, the reality is that only a small portion of that content makes it to viewers in 4K. The most significant challenges holding back widespread 4K content distribution include the need for new workflows, evolving encoding and colorspace standards, and bandwidth.
16x increase in data
Let's start with bandwidth. A typical HD feed has 1920 x 1080 px resolution at 30fps with 8-bit color. In contrast, a 4K UHD feed has 3840 x 2160 resolution with 10-bit color. And for live sporting events, the frame rate doubles to 60fps. From a feed processing perspective, this represents a 16x increase in data density. Each frame in the 4K feed is four times larger, and there are twice as many of them at 60fps. And then because it's 10-bit, each pixel requires twice as many bytes to represent efficiently for processing, resulting in the 16x increase. This massive increase in density has sparked a rise in popularity of the more efficient HEVC (H.265) codec. HEVC delivers the same quality as H.264 at half the bitrate but at the cost of being much more processor intensive. Ultimately, 4K at 60fps (4K60p) uses about 3–4 times more data in HEVC than 1080p at 30fps in H.264, both with 8-bit color.
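The 16x figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming 8-bit samples occupy one byte and 10-bit samples are stored in two-byte words for processing (the "twice as many bytes" mentioned above); the function name is made up for illustration, and chroma subsampling is ignored because it scales both feeds equally.

```python
def raw_bytes_per_second(width, height, fps, bytes_per_sample):
    """Uncompressed data rate for one sample plane of a video feed."""
    return width * height * fps * bytes_per_sample

# 1080p30 with 8-bit samples (1 byte each)
hd = raw_bytes_per_second(1920, 1080, 30, 1)

# 4K60p with 10-bit samples held in 16-bit words (2 bytes each, an assumption)
uhd = raw_bytes_per_second(3840, 2160, 60, 2)

# 4x the pixels, 2x the frames, 2x the bytes per sample
ratio = uhd / hd
print(ratio)
```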
Making a move to 4K means that this 16x larger stream must be processed and encoded in real time to keep pace with live events. In our case, encoding of the 4K broadcast stream is handled by our Slicer application that runs at the venue or a centralized facility. Once the Slicer receives the feed, it is responsible for HEVC encoding, colorspace mapping, and creation of the highest-bitrate ABR rendition in 4-second chunks. The Slicer then sends the audio/video data along with encoding parameters to cloud-based encoders that create lower-bitrate renditions. When players request these assets, manifest servers generate personalized manifests on the fly, including the insertion of ads during designated ad breaks. It's worth noting that the Slicer requires almost no change to a broadcaster's workflow, since the only requirement is to send the 4K stream to the Slicer application running on the broadcaster's hardware; everything beyond that is handled on our end.
For all this to work seamlessly, the encoder must be up to the task. This required a new development effort specifically to accommodate 4K60p incoming feeds. Initially, the plan was that broadcasters would send 4K feeds to the Slicer using an 8-bit color space, or 4K60p SDR. Currently, our video distribution pipeline uses 8-bit encoding, in part because the majority of playback devices can only handle 8-bit, and it's important to support the greatest number of players possible. Down the road, we're looking at a variety of options to enable 4K60p HDR, such as offering a 10-bit stream alongside the 8-bit stream.
During the development of the 4K Slicer, the fact that we're still early in the 4K game became abundantly clear. There was very little consistency in the test 4K streams we were receiving from our broadcast partners. Some were in 8-bit 4K SDR, while others were in 10-bit 4K HDR10 or 4K HLG (Hybrid Log Gamma) HDR. We even received feeds that used 1080i upscaled to 4K, which is not part of the standard. To accommodate variability in the colorspace, the Slicer needed the ability to map 4K HDR to 4K SDR as accurately and as quickly as possible, in addition to running the HEVC codec. In the case of upscaled 1080i, however, that needed to be fixed upstream. Ideally, all the color mapping would take place within the broadcast operations center, but since this isn't always the case, this processing often falls to the Slicer.
Keeping up with real time
As we're working with live streams, a significant challenge we encountered was ensuring that the Slicer's HEVC decoder and encoder could keep pace with the massive 4K60p 10-bit live feed without dropping frames or falling behind. Designed for use on Linux-based servers, the Slicer already had HEVC encoding capability based on FFmpeg, a multimedia framework used to encode and decode media files. However, the software HEVC decoder built into FFmpeg libraries required significant CPU resources to decode 10-bit 4K HEVC in real time. Even with a high-performance server with 16 physical cores (32 logical cores), we weren't always able to keep up with real time using the FFmpeg software decoder, resulting in occasional dropped frames.
Problems like encoding a large video stream involve running the same instructions over large sets of data. This type of work is well suited to the latest NVIDIA GPUs with decoding/encoding chips, which for data center applications allow large numbers of decoding sessions to run in parallel. For the Slicer, we turned to a solution based on NVIDIA GPUs that incorporate the NVDEC chip. A hardware-based decoder available on Pascal and newer NVIDIA GPUs, NVDEC provides fully-accelerated hardware-based video decoding. Now, with this platform in place, we're able to decode 10-bit 4K60p HEVC faster than real time.
The next challenge was how to convert from 10-bit 4K60p HLG or HDR10 to 8-bit 4K60p SDR, a task that is not as simple as chopping off the two least-significant bits from each pixel. With such a simplistic approach, the image rendered on the screen wouldn't look as intended. Instead, each pixel has to be transformed from the HDR color space to the SDR color space, which is computationally expensive. And all this has to happen on a live feed.
As a starting point, we looked at the filters available in the FFmpeg libraries, including colorspace and zscale. However, we quickly found out that the colorspace filter does not work with HLG, so the only viable option was zscale, which does support HDR/HLG. The zscale filter is a wrapper for the zimg library, which requires a C++11 compiler. That was a hurdle to overcome on its own because a number of our broadcast customers use older operating systems, subscribing to the notion that if it isn't broken, don't fix (or upgrade) it. For some broadcast customers, we needed to provide support for the older Linux distribution CentOS 6, which doesn't have a C++11 compiler. We were able to work around the problem by using some of the more modern development toolsets available from the CentOS Linux Software Collections, while still maintaining backward compatibility.
Faster processing with color lookup tables
Unfortunately, when processing 10-bit HLG 4K60p content, the zscale filter ran approximately 36x slower than real time in our testing. Part of the problem is that the HLG colorspace isn't linear, so you can't simply run a linear transformation matrix on it to convert it to something else. To apply the correct gamma calculations to each pixel, you have to use a log function, which is slow to compute. The process is also conditional: pixel values within a certain range are transformed one way, and anything above that range is transformed a different way. This means that a branching operation is required for every pixel. Branching involves jumping, and in computer terms jumping is slow.
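The piecewise, logarithmic shape described above is visible in the HLG transfer function itself. The sketch below is a plain restatement of the HLG OETF and its inverse using the constants published in ITU-R BT.2100; it is not the zscale code or the Slicer's code, and the function names are made up for illustration. Each call takes a branch and then evaluates a square root, a logarithm, or (in the decoding direction) an exponential, which is the per-pixel cost the text refers to.

```python
import math

# HLG constants from ITU-R BT.2100
A = 0.17883277
B = 1 - 4 * A
C = 0.5 - A * math.log(4 * A)

def hlg_oetf(e):
    """Scene-linear light (0..1) to HLG signal (0..1)."""
    if e <= 1 / 12:                      # lower range: square-root segment
        return math.sqrt(3 * e)
    return A * math.log(12 * e - B) + C  # upper range: logarithmic segment

def hlg_inverse_oetf(ep):
    """HLG signal (0..1) back to scene-linear light (0..1)."""
    if ep <= 0.5:                        # the same branch, seen from the signal side
        return ep * ep / 3
    return (math.exp((ep - C) / A) + B) / 12

print(hlg_oetf(0.05), hlg_oetf(0.5))
```

Running this per channel on every pixel of a 3840 x 2160 frame, 60 times a second, is what makes a precomputed table attractive.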
To speed things up, we devised an approach that minimizes slow operations and maximizes fast ones. First, we generate a representative sampling of the color space. This sampling is stored as a small image of 512 pixels by 512 pixels. We then take that image and put it through the zscale filter to produce an output image. We can now use the output image as a color lookup table (CLUT) to go from input pixel values to the output pixel values. We need to look up the pixel values eight times and then perform a trilinear interpolation to produce an accurate approximation of the zscale output. This process is much faster than doing the zscale math on the full image because we replace the branching logic and log function calls with a table lookup and blending operations. In our CLUT implementation, we were careful to allow blending to be implemented using bit shift rather than division operations. Testing showed that with this approach, we were able to speed up HDR to SDR conversion significantly, but we were still approximately 3.5x slower than real time.
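The lookup-and-blend step can be sketched in a few lines. This is a minimal illustration, not the Slicer's implementation: the grid size (17 points per axis, spaced 64 code values apart), the table layout, and the names `build_clut`, `lookup`, and `placeholder_transform` are all assumptions, and the placeholder curve merely stands in for the pixels that would come out of zscale. The spacing is a power of two so that the cell index, the position within the cell, and the final normalization are all shifts and masks rather than divisions.

```python
SHIFT = 6                    # grid spacing of 2**6 = 64 code values (assumption)
STEP = 1 << SHIFT
N = (1024 >> SHIFT) + 1      # 17 grid points per axis, covering 0..1024

def placeholder_transform(r, g, b):
    """Hypothetical stand-in for zscale: 10-bit in, 8-bit out, gamma-like curve."""
    def f(v):
        return round(255 * (min(v, 1023) / 1023) ** 0.5)
    return (f(r), f(g), f(b))

def build_clut(transform):
    """Run every grid point through the slow transform once, up front."""
    return [transform(r * STEP, g * STEP, b * STEP)
            for r in range(N) for g in range(N) for b in range(N)]

def lookup(clut, r, g, b):
    """Map one 10-bit RGB pixel via eight table reads and trilinear blending."""
    ri, gi, bi = r >> SHIFT, g >> SHIFT, b >> SHIFT   # which cell the pixel is in
    rf, gf, bf = r & (STEP - 1), g & (STEP - 1), b & (STEP - 1)  # offset in cell
    acc = [0, 0, 0]
    for dr, wr in ((0, STEP - rf), (1, rf)):
        for dg, wg in ((0, STEP - gf), (1, gf)):
            for db, wb in ((0, STEP - bf), (1, bf)):
                corner = clut[((ri + dr) * N + (gi + dg)) * N + (bi + db)]
                w = wr * wg * wb              # the eight weights sum to STEP**3
                for c in range(3):
                    acc[c] += w * corner[c]
    half = 1 << (3 * SHIFT - 1)               # rounding term
    return tuple((a + half) >> (3 * SHIFT) for a in acc)  # shift instead of divide

clut = build_clut(placeholder_transform)
print(lookup(clut, 100, 200, 300), placeholder_transform(100, 200, 300))
```

The table is built once, so the expensive transform runs a few thousand times instead of once per pixel; every pixel after that costs eight reads and some integer multiplies, adds, and shifts, with no branch and no transcendental function.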
The next step toward a workable solution was to port the CLUT filter implementation to NVIDIA's CUDA C API so that the pixel lookup and interpolation could take place on a GPU. CUDA allows developers to use a C-like language to develop high-performance algorithms that are accelerated by running across the thousands of parallel execution units on a GPU. With the move to NVIDIA GPUs, we were again able to process 4K60p video faster than real time.
Delivering 4K content
The move to 4K is a big one, especially in terms of the amount of data involved, and that is the main factor holding back adoption. Plenty of content is created in 4K, and the number of large 4K TVs in homes is growing rapidly as prices continue to drop. The problem is how to get the content out to consumers. The answer involves OTT streaming coupled with a service that minimizes disruption in existing workflows. By overcoming challenges involved with encoding and packaging 4K60p, we're enabling our broadcast customers to deliver live 4K streams out to viewers at scale with our entire feature set of advanced playback options, including per-viewer session management.