Katherine Merrick, Principal Product Manager
The Flash Crowd and Your Live Stream
As streaming services battle for a limited number of viewers and shrinking attention spans, live events, which are a proven driver of audience engagement, have become an important factor in a publisher's content strategy. However, as much as live streaming can reliably deliver audiences, reliably streaming live events at scale comes with a set of challenges. Content delivery networks (CDNs) can help provide scalability on demand; however, even the CDNs themselves must be optimized for live streaming.
Perhaps the most obvious live streaming challenge is the "flash crowd." This phenomenon occurs when many viewers join a live stream at once, hungry to catch the kickoff or overtime action. Following typical audience behavior that we've observed by streaming more than 100,000 sporting events, during game 6 of the NBA Finals, viewership grew rapidly from almost nothing at tipoff to a peak of 2.04 million viewers in the 3rd quarter. Viewership jumped from fewer than 10,000 sessions to over 1 million in the first hour and another 1.5 million after halftime, at times adding upwards of 100,000 new viewers per minute. This kind of rapid scale puts pressure on any CDN, and delivering live video is even more challenging because any disruption can lead to an interruption in playback.
In this article, we take a look at flash crowds and other challenges, and then explore how Verizon Media leverages its many years of experience in the CDN space to solve some of the common pain points for our customers, making it the right choice for delivering high-quality live events — no matter if it's sports, concerts, politics…or perhaps the first landing on Mars.
Why Live Stream Caching is Different
Delivering live events over the internet involves using one or more adaptive bitrate (ABR) streaming formats, such as MPEG-DASH, Apple HLS, Adobe HDS, and Microsoft Smooth Streaming, and typically relies on standard HTTP web servers as origins and on CDNs to distribute the content at scale.
Here at Verizon Media, our large global CDN has many PoPs and more than 88 Tbps of capacity, so we can easily scale to handle large traffic spikes and flash crowds. However, capacity and scale are only part of the equation. What is especially important in the context of live streaming is how well the CDN can interact with both the origin server and the clients while scaling for large viewing audiences that come all at once.
The live traffic profile is unique and distinctly different from anything else, even from VOD, because during live events the encoder is continually publishing new media segments (typical duration is 2–10 seconds) out to the origin server, and the CDN is always fetching that newly-released content and propagating it through the network. This process takes a non-zero amount of time; therefore some latency cannot be avoided. However, it is crucial that the CDN is extremely efficient and smart about how it fills the cache and how it handles client requests during and, even more importantly, before initiating the cache fill process. Ideally, the CDN should be able to keep the load on the origin server down to an absolute minimum, while avoiding adding too much extra latency to the entire media pipeline. This ensures that client-side end users enjoy smooth and continuous playback.
Our CDN has a wide range of features that allow us to maximize origin offload, as well as improve the end-user experience before, during, and after a cache fill process is completed.
Live Stream Cache Optimizations
As shown in the graphic below, our Media Platform employs a series of optimizations, many of which are tunable, to achieve fast and reliable delivery for live streaming. We'll explain why they are important and how they work in the following sections.
First and foremost, Origin Shield is an extra caching layer between the CDN edge servers and the origin. We create a virtual origin in one of the PoPs that, in turn, manages all the requests from all the other PoP locations. When a CDN edge server gets a request from a user and can't satisfy the request from the cache, the edge server fetches the object from the shield PoP rather than pulling from the customer origin directly. As a global CDN, we offer customers the option of assigning a single PoP as the shield or assigning a shield PoP per region (U.S., EU, Asia, etc.).
Origin Shield helps us protect the origin server in case of large traffic spikes and flash crowds. However, it may not be enough to deal with the unique traffic profile that comes with live streaming.
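To make the shielding idea concrete, here is a minimal Python sketch of a two-tier cache. The class names (Origin, CacheTier) and the segment name are hypothetical and chosen only for illustration; they are not part of the CDN's actual API. The sketch compares five edge caches that pull straight from the origin with five that pull through a single shared shield.

```python
class Origin:
    """Stand-in for the customer origin; counts how often it is hit."""

    def __init__(self):
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return f"segment-bytes:{key}"


class CacheTier:
    """A cache that falls back to its upstream tier on a miss."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.upstream.get(key)
        return self.store[key]


# Without a shield: every edge PoP misses and goes to the origin itself.
direct_origin = Origin()
direct_edges = [CacheTier(direct_origin) for _ in range(5)]
for edge in direct_edges:
    edge.get("seg_0001.ts")

# With a shield: edges miss to the shield, and only the shield hits the origin.
origin = Origin()
shield = CacheTier(origin)
edges = [CacheTier(shield) for _ in range(5)]
for edge in edges:
    edge.get("seg_0001.ts")

print("origin fetches without shield:", direct_origin.fetches)
print("origin fetches with shield:", origin.fetches)
```

In this toy model the requests arrive one after another, so the shield alone is enough to collapse them. The sections that follow deal with requests that arrive at the same moment, before anything is in cache.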
Partial Cache Sharing
In live streaming, a typical pattern is for multiple clients to request a segment of the stream that isn't in the cache yet. There are a couple of ways a CDN can go about dealing with these requests. First, it can simply send multiple, simultaneous cache fill requests through to the origin (one per new client request), which helps minimize latency and optimize the end-user experience. A second option is to send a single cache fill request that serves the first client without delay, but keep the others waiting until the full file is loaded into cache (this method aims to minimize the load on the origin).
Unfortunately, neither of these options represents a particularly great solution.
Instead, our approach strikes a balance between these two options by allowing a single cache fill that's already inflight to be shared among multiple clients requesting the same piece of content that is already partially in cache. Partial Cache Sharing allows other clients to piggyback off of a pre-existing cache fill, so the video content can be delivered to multiple clients simultaneously as soon as it starts loading into the cache. The result: faster start-up times, lower latency, and reduced origin load.
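The behavior described above can be sketched with Python's asyncio. This is an illustrative model under stated assumptions, not the CDN's implementation: InflightFill, fill_from_origin, and client are hypothetical names, the three two-byte chunks stand in for a media segment, and the sleeps simulate network delay. A second client joins while the fill is in flight, receives the bytes already cached, and then follows the rest of the fill without triggering another origin request.

```python
import asyncio


class InflightFill:
    """Bytes received so far for one object, shared by every attached client."""

    def __init__(self):
        self.chunks = []
        self.done = False
        self.changed = asyncio.Condition()

    async def append(self, chunk):
        async with self.changed:
            self.chunks.append(chunk)
            self.changed.notify_all()

    async def finish(self):
        async with self.changed:
            self.done = True
            self.changed.notify_all()

    async def stream(self):
        """Yield chunks from the start, waiting while the fill is in flight."""
        pos = 0
        while True:
            async with self.changed:
                await self.changed.wait_for(
                    lambda: pos < len(self.chunks) or self.done
                )
                pending = self.chunks[pos:]
            if not pending:  # fill finished and everything was delivered
                return
            for chunk in pending:
                yield chunk
            pos += len(pending)


origin_requests = 0


async def fill_from_origin(fill):
    """The single cache fill; chunk contents and delays are made up."""
    global origin_requests
    origin_requests += 1
    for chunk in (b"aa", b"bb", b"cc"):
        await asyncio.sleep(0.01)  # simulated origin/network delay
        await fill.append(chunk)
    await fill.finish()


async def client(fill):
    return b"".join([chunk async for chunk in fill.stream()])


async def main():
    fill = InflightFill()
    filler = asyncio.create_task(fill_from_origin(fill))
    first = asyncio.create_task(client(fill))
    await asyncio.sleep(0.015)  # a second client arrives mid-fill
    late = asyncio.create_task(client(fill))
    bodies = await asyncio.gather(first, late)
    await filler
    return bodies


results = asyncio.run(main())
print(results, "origin requests:", origin_requests)
```

Both clients end up with the full segment, and the origin sees exactly one request.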
Cache Fill Wait Time
There is an interval between the time the client requests the video file and the time it starts loading into the CDN PoP. This interval is very short (it may last only a few milliseconds), but a live streaming flash crowd makes it a significant challenge because hundreds or even thousands of requests can arrive within it. At this point, the Partial Cache Sharing feature described above has not started yet because nothing is in the cache to share. Typically, this is considered a corner case, but it's more likely to occur with live streaming due to flash crowds. It is at this critical point in time that the CDN could overwhelm the origin by passing too many requests at once.
To prevent this problem, multiple requests for the same file are pooled, and only a single request is made to the origin. Cache Fill Wait Time acts as a virtual waiting room to further improve origin offload and cope with flash crowds. When the HTTP response headers for the single request arrive and that one request starts to receive the file from the origin, the cache can then be shared with all the waiting pooled users. The actual "wait time" (the number of milliseconds) is highly configurable and can be fine-tuned based on specific origin capabilities and customer needs.
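The pooling described above resembles what is often called request coalescing, and a minimal asyncio sketch of it follows. The names CoalescingCache, wait_ms, and slow_origin are hypothetical. One detail is an assumption made only so the example is complete: the article does not say what happens when the wait time expires, so this sketch has a waiter that times out fall back to its own origin request.

```python
import asyncio


class CoalescingCache:
    """Pools concurrent misses for the same key behind one origin request."""

    def __init__(self, fetch, wait_ms):
        self.fetch = fetch            # coroutine function: key -> bytes
        self.wait_s = wait_ms / 1000  # how long pooled requests will wait
        self.cache = {}
        self.inflight = {}            # key -> task for the single origin request

    async def get(self, key):
        if key in self.cache:
            return self.cache[key]
        task = self.inflight.get(key)
        if task is None:
            # First request for this key: it alone goes to the origin.
            task = asyncio.create_task(self._fill(key))
            self.inflight[key] = task
            return await task
        try:
            # Later requests wait in the "virtual waiting room".
            return await asyncio.wait_for(asyncio.shield(task), self.wait_s)
        except asyncio.TimeoutError:
            # ASSUMPTION: on expiry the waiter fetches for itself.
            return await self.fetch(key)

    async def _fill(self, key):
        try:
            body = await self.fetch(key)
            self.cache[key] = body
            return body
        finally:
            self.inflight.pop(key, None)


calls = 0


async def slow_origin(key):
    """Simulated origin that takes 20 ms to respond."""
    global calls
    calls += 1
    await asyncio.sleep(0.02)
    return b"segment:" + key.encode()


async def flash_crowd(n):
    cache = CoalescingCache(slow_origin, wait_ms=500)
    return await asyncio.gather(*(cache.get("seg_0042.ts") for _ in range(n)))


bodies = asyncio.run(flash_crowd(1000))
print(len(bodies), "responses served with", calls, "origin request(s)")
```

A longer wait time protects a slow origin more aggressively, while a shorter one bounds the extra latency a pooled viewer can experience. That trade-off is what makes the setting worth tuning per origin.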
Spawn Sub-Request for Miss
When multiple users request the same un-cached content as discussed above, there's a risk that the first client is on a slow device, like a smartphone on a 3G connection. This hurts all the other clients because a cache normally fills at the rate at which the first client can absorb the content. To avoid this scenario, we can decouple our cache fill from the potentially slow or failed first client and fill from the origin faster (at our maximum speed). This capability is also more reliable because now the cache fill continues even if the initial client disconnects, or if something causes the connection to drop. We describe this behavior as Spawn Sub-Request for Miss. This feature also triggers a cache fill for the entire piece of content, satisfying different byte-range requests with only one trip to the origin server. Spawn Sub-Request for Miss and Cache Fill Wait Time complement each other in their use, working together to accelerate live streaming and improve metrics such as video time to start and rebuffering.
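A rough way to picture the decoupling is to run the cache fill as its own background task that clients merely observe. The asyncio sketch below is a simplified model with hypothetical names (background_fill, serve_range) and made-up sizes and delays. The first client is slow and disconnects mid-fill, yet the full object still lands in cache after a single origin trip, and a second client's byte-range request is served from it.

```python
import asyncio

cache = {}
origin_trips = 0


async def fetch_from_origin(key):
    """Simulated origin returning a 1024-byte object after 10 ms."""
    global origin_trips
    origin_trips += 1
    await asyncio.sleep(0.01)
    return bytes(range(256)) * 4


async def background_fill(key):
    """The spawned sub-request: fills the whole object, independent of clients."""
    cache[key] = await fetch_from_origin(key)
    return cache[key]


async def serve_range(fill_task, start, end, client_delay):
    # shield() means cancelling this client does not cancel the fill itself.
    body = await asyncio.shield(fill_task)
    await asyncio.sleep(client_delay)  # how slowly this client drains data
    return body[start:end + 1]


async def main():
    key = "seg_0007.ts"
    fill_task = asyncio.create_task(background_fill(key))
    slow = asyncio.create_task(serve_range(fill_task, 0, 511, client_delay=5.0))
    fast = asyncio.create_task(serve_range(fill_task, 512, 1023, client_delay=0.0))
    await asyncio.sleep(0.005)
    slow.cancel()  # the first client disconnects while the fill is in flight
    await asyncio.gather(slow, return_exceptions=True)
    tail = await fast
    return slow.cancelled(), len(cache[key]), len(tail)


slow_cancelled, cached_bytes, tail_bytes = asyncio.run(main())
print(slow_cancelled, cached_bytes, tail_bytes, origin_trips)
```

The key design choice is that the fill's lifetime is tied to the cache, not to whichever client happened to ask first.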
Other Live Streaming CDN Optimizations
As the viewership of a live stream rapidly expands, the cache servers that previously easily handled the load at 500K viewers are suddenly overwhelmed when the viewership triples or quadruples in a few minutes. Additionally, viewers may be concentrated around a specific geographic area, which is typically the case for a popular sporting or political event. For many live sports broadcasts or championships, viewer concentration is likely to be significantly higher in markets surrounding the participating teams.
When this happens, segments of the live stream need to be quickly replicated to other servers within the impacted PoPs to help spread the load.
Hot Filing refers to the process of automatically detecting and replicating extremely popular content, such as segments of a live stream, to multiple cache servers in a PoP to handle great demand. This is a common practice among content delivery networks, but it's the speed at which these propagations can happen that ultimately matters. This is an area of ongoing focus for Verizon Media. We recently reduced our replication time from 5 seconds to about 1–2 seconds. Apart from the live stream segments themselves, we can also make other content hot, such as advertisements within a live stream.
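As a simplified illustration of the detect-and-replicate idea, the Python sketch below models one PoP that normally maps each object to a single "home" server and, once an object crosses a popularity threshold, spreads its requests across every server in the PoP. Everything here is an assumption made for the example: the Pop class, the hot_threshold parameter, the hash-based home server, the round-robin spreading, and the cumulative (rather than time-windowed) hit counter. The article does not describe how detection or replication is actually implemented.

```python
import hashlib
from collections import Counter


class Pop:
    """Toy model of one PoP that spreads hot objects across all its servers."""

    def __init__(self, servers, hot_threshold):
        self.servers = servers
        self.hot_threshold = hot_threshold  # hypothetical popularity cutoff
        self.hits = Counter()               # cumulative hits per object
        self.hot = set()
        self.cursor = Counter()             # round-robin position per hot object

    def _home(self, key):
        """Default placement: one server per object, chosen by hash."""
        digest = hashlib.sha256(key.encode()).digest()
        return self.servers[int.from_bytes(digest[:4], "big") % len(self.servers)]

    def route(self, key):
        self.hits[key] += 1
        if self.hits[key] >= self.hot_threshold:
            self.hot.add(key)  # "hot file": now served from every server
        if key in self.hot:
            index = self.cursor[key] % len(self.servers)
            self.cursor[key] += 1
            return self.servers[index]
        return self._home(key)


pop = Pop(["cache-1", "cache-2", "cache-3", "cache-4"], hot_threshold=100)

# The first 99 requests all land on the object's single home server.
before = {pop.route("seg_0100.ts") for _ in range(99)}

# From request 100 onward the object is hot and the load is spread evenly.
after = Counter(pop.route("seg_0100.ts") for _ in range(400))

print(before)
print(after)
```

In a real network the interesting number is how quickly the switch from one server to many takes effect, which is the replication time discussed above.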
Capacity and bandwidth
Capacity and bandwidth refer to the extra capacity on tap to meet the unpredictable demands of live streams. Just as there is no substitute for cubic inches when it comes to muscle cars, there's no substitute for bandwidth with CDNs. Putting these and other cache optimization strategies in play requires that the network has the capacity and bandwidth to handle large-scale live streaming, while still balancing the load placed on it by other users.
Currently, more than 80% of the content on our network is video, with a good portion of that traffic devoted to live streams. We have delivered over 125,000 managed live events on our network. And as content quality continues to improve along with the growing popularity of live streams, we're on track to hit 100 Tbps of network capacity by the end of 2019. Our network features more than 140 global PoPs along with 5,000 interconnects or last-mile connections.
Everything Working Together
Live streaming's heavy demands will push your technology to the limit. Delivering a smooth stream to thousands or even millions requires special caching configurations. The combination of Origin Shield, Partial Cache Sharing, Cache Fill Wait Time, Spawn Sub-Request for Miss, and Hot Filing is a powerful set of features that can be tailored to your unique live streaming infrastructure and demands. They empower our CDN to deliver the best possible performance for live streaming events regardless of whether the object is already in cache, is only partially in cache, or is awaiting a cache fill that is still pending, and even when the request happens to be truly the first client request for a unique piece of content.
The CDN is an essential component in the live video infrastructure. Its system of distributed servers takes into account geographic location, network location, and the origin itself to deliver content to your users in the fastest, most reliable way possible. However, the techniques to optimize the CDN for live delivery differ considerably from those for other applications that also benefit from a CDN, including video on demand (VOD). With the right cache optimizations and plenty of headroom, the CDN is more than up to the task of coping with the fluctuations and variability inherent in live streaming.
Our CDN offers mature, well-proven content distribution capabilities coupled with optimizations that minimize load on the origin server while delivering live streams to viewers at scale. Our live video caching optimizations, many of which are tunable for individual customers, work together to keep viewer demand from overwhelming your video infrastructure.