Harkeerat Bedi, Research Scientist, Verizon Media, and Scott Yeager, Software Engineer, Verizon Media
To keep up with the growing volume of media content, the Verizon Media Platform has invested in expanding our global cache footprint. In 2019 alone, we added more than 25 Tbps of capacity, seven global PoPs, and close to 900 last-mile connections. While effective at improving performance, raw capacity by itself is not enough, nor is it a sustainable business model for meeting the ever-growing global demand for streaming content.
To maximize the availability of our network capacity, we invest equally in technologies, processes, and tools that keep operational and infrastructure costs in check. Our research team continually pushes the boundary of caching technologies, applying and refining processes to give our network operators granular control over how, when, and where content is cached.
Modern Caching Strategies
The goal of any caching strategy is to keep the most popular content in the cache while quickly and efficiently removing the less popular content. Over the years, researchers and software developers have devised countless strategies intended to solve the caching challenge. These range from relatively simple to extremely complex. The ones discussed here are least recently used (LRU) and first in, first out (FIFO), along with more complex segmented variants of LRU.
It would certainly be convenient if there were a single caching strategy to rule all situations. However, such a solution has yet to be developed, and the effectiveness of a particular strategy can vary greatly depending on server and disk sizes, traffic patterns, and any number of other factors. In our case, based on extensive testing, we've found that LRU offers the best compromise between hit rate and disk I/O, requiring 60% fewer writes than FIFO while maintaining high hit rates. Additionally, for the disk sizes used in our CDN, LRU performs on par with more complex policies like S4LRU (Quadruply-segmented LRU). You can get more detail in this paper we published last year at the Passive and Active Measurement Conference (PAM) held in Puerto Varas, Chile.
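To make the LRU versus FIFO trade-off concrete, here is a minimal sketch that replays a toy request trace through both policies and counts hits and cache writes. The `simulate` function, the trace, and the assumption that every miss costs exactly one write are illustrative only. This is not our production code, and a trace this small says nothing about the 60% figure above.

```python
from collections import OrderedDict

def simulate(trace, capacity, policy):
    """Replay a request trace and return (hits, writes).

    Assumption: every miss results in exactly one write to the cache.
    policy is "lru" or "fifo".
    """
    cache = OrderedDict()  # oldest entry first
    hits = writes = 0
    for key in trace:
        if key in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(key)  # a hit refreshes recency under LRU only
        else:
            writes += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the entry at the front
            cache[key] = True
    return hits, writes

# One popular object ("a") interleaved with one-off objects.
trace = ["a", "b", "a", "c", "a", "d", "a", "e"]
lru_result = simulate(trace, capacity=2, policy="lru")
fifo_result = simulate(trace, capacity=2, policy="fifo")
print("LRU  (hits, writes):", lru_result)
print("FIFO (hits, writes):", fifo_result)
```

On this trace, LRU keeps the popular object resident because each hit moves it to the back of the queue, while FIFO evicts it on schedule and has to write it again.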
Evolving Caching Strategy with Hybrid LRU
Even though LRU works very well overall for our environment, we're always looking for ways to drive innovation and improve performance for our customers. This has led to a new capability we recently added to our platform called Hybrid LRU. It's called hybrid because it adds a layer of abstraction on top of LRU. If we don't use the hybrid functionality, the system continues to operate normally, so it's very easy to understand and activate or deactivate.
What we're doing with the hybrid approach is tweaking the LRU system to give us more control over specific pieces of content. By control, we mean having the ability to explicitly store some content for a longer or shorter duration based on predefined settings.
This is important due to changes occurring across the video streaming landscape and, in particular, the rapid growth in live streaming. Our network alone has hosted hundreds of thousands of live events, many of which are delivered to millions of concurrent viewers. Despite the massive popularity of such events, once a live event stream is completed, it's not likely to be re-streamed at any significant volume. With Hybrid LRU, we can specify a shorter cache period, freeing up valuable cache resources for other media and content.
We are experimenting with locking down certain content and providing a best-effort assurance that it will remain in our cache. This can be particularly useful for live video streams where content has a limited shelf life, but may still be in high demand for a few hours following a live event, after which it becomes a normal piece of video on demand content. This functionality can also be used in conditions where a content provider explicitly wants to lock some content for a specific period of time so that it does not hit their origin servers.
Hybrid LRU also allows us to store some content for a longer time. This is useful if the origin is located in a remote part of the world, for example, which can lead to poor QoE when the CDN does not have the requested content in its cache. In such cases, a new client request would trigger a cache miss that the origin will need to fill, potentially resulting in rebuffering. Aging this content more slowly keeps it in the cache longer and reduces the number of such origin fills.
Hybrid LRU Usage Parameters
Hybrid LRU provides two tunable parameters that give us the ability to either delay or speed up the eviction of specific content from our caches:
Aging rate defines the rate of increase in eviction score over time. It's basically a scaling function that operators can use to make a piece of content age faster or slower. The default value for aging rate is 10, so changing this value to 200, for example, will accelerate the aging of that content by 20 times (200/10 = 20). The value could also be changed to 5 to age a piece of content at half the default speed (5/10 = 0.5).
The Time to Live (TTL) parameter reduces the age of an item by a specified amount. It works by giving an extremely low eviction score to an item for the duration set by this variable. This forces an item to stay in the cache for a specified duration after it was last accessed. The default is 0 seconds, which means no special preference.
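The two parameters can be sketched as a single scoring function. The defaults (aging rate 10, TTL 0) come from the descriptions above. The function name `eviction_score`, the linear form of the score, and the choice of 0 as the "extremely low" score during the TTL window are assumptions made for illustration, not the actual formula used in our caches.

```python
DEFAULT_AGING_RATE = 10  # default stated in the text
DEFAULT_TTL_SECONDS = 0  # default stated in the text

def eviction_score(idle_seconds,
                   aging_rate=DEFAULT_AGING_RATE,
                   ttl_seconds=DEFAULT_TTL_SECONDS):
    """Hypothetical eviction score: higher means closer to eviction.

    Assumptions: the score grows linearly with idle time, scaled by
    aging_rate / 10, and is pinned to 0 until the TTL has elapsed.
    """
    if idle_seconds <= ttl_seconds:
        return 0.0  # inside the TTL window the item does not age
    return (idle_seconds - ttl_seconds) * aging_rate / DEFAULT_AGING_RATE

print(eviction_score(60))                    # plain LRU aging
print(eviction_score(60, aging_rate=200))    # ages 20 times faster
print(eviction_score(60, aging_rate=5))      # ages at half speed
print(eviction_score(60, ttl_seconds=300))   # still protected by the TTL
print(eviction_score(360, ttl_seconds=300))  # 60 seconds past the TTL
```

With the default settings the score is simply the idle time, which reduces to ordinary LRU ordering.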
How these tunable parameters work to adjust how long content stays in cache is shown in the charts below. It's useful to think of these parameters as knobs or dials that can be precisely adjusted to match content demands. The charts show how objects age over time on server caches while they are waiting to be accessed.
First, let's look at Aging Rate. Under traditional LRU, all objects age at the same rate over time. But as we turn up the aging rate dial, items begin to age faster. Similarly, when we turn the dial in the opposite direction, items age more slowly than they would under LRU. Turn the dial enough, and the slow-aging items never actually exceed the "eviction threshold," as shown in figure one. With this control, we can either remove items sooner to free up space or keep items on disk longer as needed to reduce origin pulls or for other reasons.
In contrast to Aging Rate, TTL lets us change the cacheability of a particular item. For the duration set using the TTL function, an item does not age while it's on the disk, so it is less likely (even very unlikely) to get evicted. After the TTL expires, the item can then begin to age either in the traditional LRU manner or with fast aging or slow aging (depending on how it has been configured by the operator). In the figure below, TTL with slow aging kept an item on disk to the point where it didn't exceed the cache eviction threshold. At the opposite end, TTL ensured that a live video stream was cached for at least the duration of the event, but after that was quickly removed from disk using fast aging.
In most cases, changing the aging rate value is the preferred method for adjusting the timing for when content is evicted from the cache because it can easily adapt to the amount of traffic on a disk. TTL, on the other hand, is more aggressive and can effectively lock out a portion of a disk until the content is released. However, as these examples illustrate, the two controls can be used together to reliably achieve the desired effect.
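The combined effect of the two controls can be sketched with a toy cache that evicts the item with the highest score when it is full. Everything here is hypothetical: the `Entry` and `HybridLRU` names, the linear score, and the linear scan for a victim are illustration choices and not a description of our implementation.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    last_access: float
    aging_rate: float = 10.0  # default aging rate from the text
    ttl: float = 0.0          # default TTL from the text, in seconds

    def score(self, now):
        """Assumed linear score: 0 inside the TTL window, then scaled idle time."""
        idle = now - self.last_access
        if idle <= self.ttl:
            return 0.0
        return (idle - self.ttl) * self.aging_rate / 10.0

class HybridLRU:
    """Toy cache that evicts the entry with the highest score when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}

    def access(self, key, now, aging_rate=10.0, ttl=0.0):
        """Record a request at time `now`; return True on a hit."""
        if key in self.entries:
            self.entries[key].last_access = now
            return True
        if len(self.entries) >= self.capacity:
            # Linear scan keeps the sketch short; a real cache would avoid it.
            victim = max(self.entries, key=lambda k: self.entries[k].score(now))
            del self.entries[victim]
        self.entries[key] = Entry(now, aging_rate, ttl)
        return False

cache = HybridLRU(capacity=2)
cache.access("live", now=0, aging_rate=200, ttl=100)  # TTL, then fast aging
cache.access("vod", now=10)                           # plain LRU defaults
cache.access("new1", now=50)   # "live" is inside its TTL, so "vod" is evicted
during_ttl = set(cache.entries)
cache.access("new2", now=120)  # TTL expired, fast aging makes "live" the victim
after_ttl = set(cache.entries)
print(during_ttl, after_ttl)
```

Plain LRU would have evicted the live segment first because it is the oldest entry. Here the TTL protects it while the event is running, and the high aging rate pushes it out shortly afterwards.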
Forward Looking Caching Strategies
A broad caching strategy such as LRU is like a big hammer, treating all content equally, regardless of type or file size. If a file doesn't get a hit within a certain amount of time, it gets deleted from the cache. Meanwhile, other files (like one-time live video streams or events) that are very unlikely to get hits in the future sit in the cache taking up space. Hybrid LRU adds a level of refinement with the goal of reducing unnecessary cache footprint and improving cache hit ratio. It's like using a small hammer or screwdriver to more accurately control which files should stay in the cache and which should be removed.
Currently, Hybrid LRU is experimental and requires an operator to adjust the eviction time frames for content. Looking forward, we're researching whether request profiles and other factors can be leveraged to make these adjustments automatically. Live events, for instance, have very different profiles from video on demand files, with thousands of requests for the same file segments coming in at the same time. We're also looking at making adjustments based on file size: do you want to keep large files on disk to minimize network traffic, or keep smaller files on hand to optimize for cache hit ratio?
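The file-size question is easier to see with numbers. The sketch below compares the share of requests served from cache with the share of bytes served from cache (commonly called byte hit ratio, a term not used above) for two ways of filling the same disk space. The `hit_ratios` function, the object names, and the sizes are made up for illustration.

```python
def hit_ratios(requests, cached):
    """Return (request hit ratio, byte hit ratio).

    requests: list of (key, size_bytes) pairs, one per request.
    cached:   set of keys assumed to be in the cache for the whole trace.
    """
    hits = sum(1 for key, _ in requests if key in cached)
    hit_bytes = sum(size for key, size in requests if key in cached)
    total_bytes = sum(size for _, size in requests)
    return hits / len(requests), hit_bytes / total_bytes

# One large object and two small ones, each requested once.
requests = [("big", 900), ("small1", 50), ("small2", 50)]

keep_large = hit_ratios(requests, cached={"big"})
keep_small = hit_ratios(requests, cached={"small1", "small2"})
print("keep large:", keep_large)
print("keep small:", keep_small)
```

Keeping the large file serves 90% of the bytes from cache but only one request in three. Keeping the two small files serves two requests in three but only 10% of the bytes.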
Even though we're confident in the performance and maturity of our caching system and strategies, optimizing finite resources remains an important and ongoing effort.