All posts by Renan Dincer

Cloudflare Calls: millions of cascading trees all the way down

Post Syndicated from Renan Dincer original https://blog.cloudflare.com/cloudflare-calls-anycast-webrtc


Following its initial announcement in September 2022, Cloudflare Calls is now in open beta and available in your Cloudflare Dashboard. Cloudflare Calls lets developers build real-time audio/video apps using WebRTC, and it abstracts away the complexity by turning the Cloudflare network into a singular SFU. In this post, we dig into how we make this possible.

WebRTC growing pains

WebRTC is the only way to send UDP traffic out of a web browser – everything else uses TCP.

As a developer, you need a UDP-based transport layer for applications demanding low latency and real-time feedback, such as audio/video conferencing and interactive gaming. This is because unlike WebSocket and other TCP-based solutions, UDP is not subject to head-of-line blocking, a frequent topic on the Cloudflare Blog.

When building a new video conferencing app, you typically start with a peer-to-peer web application using WebRTC, where clients exchange data directly. This approach is efficient for small-scale demos, but scalability issues arise as the number of participants increases. This is because the amount of data each client must transmit grows roughly quadratically with the number of participants: each client needs to send its data to n-1 other clients.
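
To make that concrete, here is a back-of-the-envelope sketch of the upstream bandwidth a full-mesh call demands. The 1.5 Mbps per-stream figure is an assumption for illustration, not a measurement:

// Back-of-the-envelope math for a full-mesh call: each client uploads
// its stream to every other participant.
function meshUpstreamMbps(participants: number, streamMbps: number): number {
  return (participants - 1) * streamMbps;
}

// Assuming a modest 1.5 Mbps camera stream per client:
for (const n of [2, 4, 8, 16]) {
  console.log(`${n} participants -> ${meshUpstreamMbps(n, 1.5)} Mbps upload each`);
}
// 2 -> 1.5, 4 -> 4.5, 8 -> 10.5, 16 -> 22.5, quickly exceeding typical uplinks.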

Selective Forwarding Units (SFUs) play a pivotal role in scaling WebRTC applications. An SFU receives multiple media or data flows from participants and decides which streams should be forwarded to other participants, acting as a media stream routing hub. This mechanism significantly reduces bandwidth requirements and improves scalability by managing stream distribution based on network conditions and participant needs. Although it hasn’t always been this way since video calling on computers first became popular, SFUs today usually run in the cloud, rather than on clients’ home computers, because of the superior connectivity a data center offers.

A modern audio/video application thus quickly becomes complicated with the addition of this server side element. Since all clients connect to this central SFU server, there are numerous things to consider when you’re architecting and scaling a real-time application:

  • How close are the SFU server location(s) to end-user clients, and how is a client assigned to a server?
  • Where is the SFU hosted, and if it’s hosted in the cloud, what are the egress costs from VMs?
  • How many participants can fit in a “room”? Are all participants sending and receiving data? With cameras on? Audio only?
  • Some SFUs require the use of custom SDKs. Which platforms do these run on and are they compatible with the application you’re trying to build?
  • Monitoring/reliability/other issues that come with running infrastructure

Some of these concerns, and the complexity of WebRTC infrastructure in general, have made the community look in different directions. However, it is clear that in 2024, WebRTC is alive and well with plenty of new and old uses. AI startups build characters that converse in real time, cars leverage WebRTC to stream live footage from their cameras to smartphones, and video conferencing tools are going strong.

WebRTC has been interesting to us for a while. Cloudflare Stream implemented WHIP and WHEP WebRTC video streaming protocols in 2022, which remain the lowest latency way to broadcast video. OBS Studio implemented WHIP broadcasting support as have a variety of software and hardware vendors alongside Cloudflare. In late 2022, we launched Cloudflare Calls in closed beta. When we blogged about it back then, we were very impressed with how WebRTC fared, and spoke to many customers about their pain points as well as creative ideas the existing browser APIs can foster. We also saw other WebRTC-based apps like Clubhouse rise in popularity and Twitter Spaces play a role in popular culture. Today, we see real-time applications of a different sort. Many AI projects have impressive demos with voice/video interactions. All of these apps are built with the same WebRTC APIs and system architectures.

We are confident that Cloudflare Calls is a new kind of WebRTC infrastructure you should try. When we set out to build Cloudflare Calls, we had a few ideas that we weren’t sure would work, but were worth trying:

  • Build every WebRTC component on Anycast with a single IP address for DTLS, ICE, STUN, SRTP, SCTP, etc.
  • Don’t force an SDK – WebRTC APIs by themselves are enough, and allow the most novel uses to shine, because the best developers always find ways to hit the limits of SDKs.
  • Deploy in all 310+ cities Cloudflare operates in – use every Cloudflare server, not just a subset
  • Exchange the offer and answer over HTTP between Cloudflare and the WebRTC client, so there is only a single PeerConnection to manage (a sketch of this flow follows below).
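
Here is a minimal sketch of that last idea. The endpoint URL and JSON shape are hypothetical placeholders, not the exact Calls API, but the browser side uses only standard WebRTC and fetch APIs:

// Sketch: set up a PeerConnection and exchange the offer/answer over HTTPS.
// The URL and JSON shape are hypothetical placeholders, not the exact Calls API.
const pc = new RTCPeerConnection();
pc.addTransceiver("video", { direction: "sendonly" });

const offer = await pc.createOffer();
await pc.setLocalDescription(offer);

const response = await fetch("https://example.com/v1/sessions/new", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ sessionDescription: pc.localDescription }),
});
const { sessionDescription } = await response.json();
await pc.setRemoteDescription(sessionDescription); // one PeerConnection, no SDK needed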

Now we know this is all possible, because we made it happen, and we think it’s the best experience a developer can get with pure WebRTC.

Is Cloudflare Calls a real SFU?

Cloudflare is in the business of having computers in numerous places. Historically, our core competency was operating a caching HTTP reverse proxy, and we are very good at this. With Cloudflare Calls, we asked ourselves “how can we build a large distributed system that brings together our global network to form one giant stateful system that feels like a single machine?”

When using Calls, every PeerConnection automatically connects to the closest Cloudflare data center instead of a single server. Rather than connecting every client that needs to communicate to a single server, anycast spreads connections out as much as possible, minimizing the last mile latency between your client and Cloudflare that is introduced by your ISP.

It’s good to minimize last mile latency because after the data enters Cloudflare’s control, the underlying media can be managed carefully and routed through the Cloudflare backbone. This is crucial for WebRTC applications where millisecond delays can significantly impact user experience. To give you a sense of the latency between Cloudflare’s data centers and end users, about 95% of the Internet-connected population is within 50ms of a Cloudflare data center. As I write this, I am about 20ms away, but in the past, I have been lucky enough to be connected to a great home Wi-Fi network less than 1ms away in Manhattan. “But you are just one user!” you might be thinking, so here is a chart from Cloudflare Radar showing recent global latency measurements:

This setup allows more opportunities for lost packets to be answered with retransmissions from closer to users, and more opportunities to adjust bandwidth quickly.

Eliminating SFU region selection

A traditional challenge in WebRTC infrastructure involves the manual selection of Selective Forwarding Units (SFUs) based on geographic location to minimize latency. Some systems solve this problem by selecting a location for the SFU after the first user joins the “room”. This makes routing inefficient when the rest of the participants in the conversation are clustered elsewhere. The anycast architecture of Calls eliminates this issue. When a client initiates a connection, BGP dynamically determines the closest data center. Each server then becomes responsible only for the PeerConnections of the clients closest to it.

One might see this as actually a simpler way of managing servers, as there is no need to maintain a layer of WebRTC load balancing for traffic or CPU capacity between servers. However, anycast has its own challenges, and we couldn’t take a laissez-faire approach.

Steps to establishing a PeerConnection

One of the challenging parts of assigning a server to a client PeerConnection is supporting dual-stack networking for backwards compatibility with clients that only support the older version of the Internet Protocol, IPv4.

Cloudflare Calls uses a single IP address per protocol, and our L4 load balancer directs packets to a single server per client by hashing the 4-tuple {client IP, client port, destination IP, destination port}. This means that a dual-stack client’s ICE connectivity check packets arrive at two different servers: one for IPv4 and one for IPv6.
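
To build intuition, here is a toy model of 4-tuple hashing, deliberately simplified and not our production load balancer:

// Toy model of 4-tuple hashing at an L4 load balancer, for intuition only.
type FourTuple = { clientIp: string; clientPort: number; destIp: string; destPort: number };

function pickServer(t: FourTuple, serverCount: number): number {
  const key = `${t.clientIp}:${t.clientPort}->${t.destIp}:${t.destPort}`;
  let hash = 0;
  for (const ch of key) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple string hash
  return hash % serverCount;
}

// The same dual-stack client produces two different tuples, so its IPv4 and
// IPv6 connectivity checks can land on two different servers:
console.log(pickServer({ clientIp: "203.0.113.7", clientPort: 51000, destIp: "198.51.100.1", destPort: 3478 }, 128));
console.log(pickServer({ clientIp: "2001:db8::7", clientPort: 51000, destIp: "2001:db8:100::1", destPort: 3478 }, 128));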

ICE is not the only protocol used for WebRTC; there are also STUN and TURN for connectivity establishment. The actual media bits are carried over SRTP, encrypted with keys negotiated during a DTLS handshake, and DataChannel traffic runs over DTLS itself.

DTLS packets don’t have any identifiers in them that would indicate they belong to a specific connection (unlike QUIC’s connection ID field), so every server should be able to handle DTLS packets and get the necessary certificates to be able to decrypt them for processing. DTLS encryption is negotiated at the SDP layer using the HTTPS API.

The HTTPS API for Calls also lands on a different server than the DTLS packets and ICE connectivity checks. Since DTLS packets need information from the SDP exchanged over the HTTPS API, and ICE connectivity checks depend on the HTTPS API for the username fragment (ufrag) and password fields in the connectivity check packets, it would be very useful for all of these to be available on one server. Yet in our setup, they’re not.

Fippo and Gustavo of webrtcHacks gracefully complained about slow replies to ICE connectivity checks in their great article, written as they were digging into our WHIP implementation right around our announcement in 2022:

Looking at the Wireshark dumps we see a surprisingly large amount of time pass between the first STUN request and the first STUN response – it was 1.8 seconds in the screenshot below.

In other tests, it was shorter, but still 600ms long.

After that, the DTLS packets do not get an immediate response, requiring multiple attempts. This ultimately leads to a call setup time of almost three seconds – way above the global average of 800ms Fippo has measured previously (for the complete handshake, 200ms for the DTLS handshake). For Cloudflare with their extensive network, we expected this to be way below that average.

What Gustavo and Fippo were observing was our solution to this problem of different parts of the WebRTC negotiation landing on different servers. Since Cloudflare Calls unbundles the WebRTC protocol to make the entire network act like a single computer, at this critical moment we need to form consensus across the network. We form consensus by configuring every server to handle any incoming PeerConnection just in time: when a packet arrives at a server that doesn’t know about it, the server quickly learns the negotiated parameters from another server, such as the ufrag and the DTLS fingerprint from the SDP, and replies with the appropriate response.
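
A rough sketch of that “learn just in time” pattern, with all names hypothetical and a simple in-memory map standing in for the rest of the network:

// Sketch of "learn just in time": all names are hypothetical, and a shared
// in-memory map stands in for the rest of the network.
interface SessionParams { ufrag: string; password: string; dtlsFingerprint: string }

// Stand-in for state written by whichever server handled the HTTPS API call.
const networkState = new Map<string, SessionParams>([
  ["abcd", { ufrag: "abcd", password: "s3cret", dtlsFingerprint: "AA:BB:CC" }],
]);

const localCache = new Map<string, SessionParams>();

async function paramsFor(ufrag: string): Promise<SessionParams | undefined> {
  const cached = localCache.get(ufrag);
  if (cached) return cached;
  // First packet of a session this server has never seen: learn the
  // negotiated parameters from the network, cache them, then reply.
  const learned = networkState.get(ufrag);
  if (learned) localCache.set(ufrag, learned);
  return learned;
}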

Getting faster

Even though we’ve sped up the process of forming consensus across the Cloudflare network, any delays incurred can still have weird side effects. For example, up until a few months ago, delays of a few hundred milliseconds caused slow connections in Chrome.

A connectivity check packet delayed by a few hundred milliseconds signals to Chrome that this is a high latency network, even though every other STUN message after that is replied to in less than 5-10ms. Chrome thus delays sending a USE-CANDIDATE attribute in its connectivity checks for a few seconds, degrading the user experience.

Fortunately, Chrome also sends its DTLS ClientHello before USE-CANDIDATE (behavior we’ve seen only in Chrome), so to help speed up Chrome, Calls treats those DTLS packets as if they were STUN packets with USE-CANDIDATE attributes.

After solving this issue with Chrome, PeerConnections globally now take about 100-250ms to get connected. This includes all consensus management, STUN packets, and a complete DTLS handshake.

Sessions and Tracks are the building blocks of Cloudflare’s SFU, not rooms

Once a PeerConnection is established to Cloudflare, we call this a Session. Many media Tracks or DataChannels can be published using a single Session, and each one is assigned a unique ID. These can then be subscribed to over any other PeerConnection anywhere around the world using that unique ID. Tracks can be published or subscribed to at any time during the lifecycle of the PeerConnection.
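
A sketch of what publishing a track and getting back its unique ID could look like. The endpoint path and JSON fields are illustrative placeholders, not the exact Calls API; the browser calls are standard WebRTC:

// Sketch: publish a track and get back its unique ID. The endpoint path and
// JSON fields are illustrative placeholders, not the exact Calls API.
async function publishTrack(pc: RTCPeerConnection, track: MediaStreamTrack): Promise<string> {
  pc.addTransceiver(track, { direction: "sendonly" });
  await pc.setLocalDescription(await pc.createOffer());

  const response = await fetch("https://example.com/v1/tracks/new", {
    method: "POST",
    body: JSON.stringify({ sessionDescription: pc.localDescription }),
  });
  const { trackId, sessionDescription } = await response.json();
  await pc.setRemoteDescription(sessionDescription);
  return trackId; // subscribe to this ID from any other PeerConnection, anywhere
}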

In the background, Cloudflare takes care of scaling through a fan-out architecture with cascading trees that are unique per track. This structure works by creating a hierarchy of nodes where the root node distributes the stream to intermediate nodes, which then fan out to end-users. This significantly reduces the bandwidth required at the source and ensures scalability by distributing the load across the network. This simple but powerful architecture allows developers to build anything from 1:1 video calls to large 1:many or many:many broadcasting scenarios with Calls.

There is no “room” concept in Cloudflare Calls. Each client can add as many tracks into a PeerConnection as they’d like. The limit is the bandwidth available between Cloudflare and the client, which in practice is limited on the client side. The signaling and the concept of a “room” are left to the application developer, who can choose to pull as many tracks as they’d like from the tracks they have pushed elsewhere into a PeerConnection. This allows developers to move participants into breakout rooms, then back into a plenary room, and then into 1:1 rooms, all while keeping the same PeerConnection and MediaTracks active.

Cloudflare offers an unopinionated approach to bandwidth management, allowing for greater control in customizing logic to suit your business needs. There is no active bandwidth management or restriction on the number of tracks. The WebRTC Stats API provides a standardized way to access data on packet loss and possible congestion, enabling you to incorporate client-side logic based on this information. For instance, if poor Wi-Fi connectivity leads to degraded service, your front-end could inform the user through a notice and automatically reduce the number of video tracks for that client.
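
For example, a client could periodically compute its inbound loss ratio with the standard Stats API. This is a sketch of client-side logic you might write, not something Calls requires:

// Sketch: compute the inbound packet loss ratio with the standard Stats API.
async function inboundLossRatio(pc: RTCPeerConnection): Promise<number> {
  const stats = await pc.getStats();
  let lost = 0;
  let received = 0;
  stats.forEach((report) => {
    if (report.type === "inbound-rtp") {
      lost += report.packetsLost ?? 0;
      received += report.packetsReceived ?? 0;
    }
  });
  return received === 0 ? 0 : lost / (lost + received);
}
// e.g. poll every few seconds, then warn the user and shed video tracks above ~5% loss.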

“NACK shield” at the edge

The Internet can’t guarantee timely and orderly delivery of packets, leading to the necessity of retransmission mechanisms, particularly in protocols like TCP. This ensures data eventually reaches its destination, despite possible delays. Real-time systems, however, need special consideration of these delays. A packet that is delayed past its deadline for rendering on the screen is worthless, but a packet that is lost can be recovered if it is retransmitted within a very short period of time, on the order of milliseconds. This is where NACKs come into play.

A WebRTC client receiving data constantly checks for packet loss. When one or more packets don’t arrive at the expected time, or a sequence number discontinuity appears in the receive buffer, a special NACK packet is sent back to the source to ask for a retransmission.
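
As an illustration, here is a simplified version of the receiver-side gap detection that triggers a NACK. Real receivers also account for reordering windows and 16-bit sequence number wraparound:

// Sketch: receiver-side sequence gap detection that feeds a NACK.
function missingSequences(received: number[]): number[] {
  const sorted = [...received].sort((a, b) => a - b);
  const missing: number[] = [];
  for (let i = 1; i < sorted.length; i++) {
    for (let seq = sorted[i - 1] + 1; seq < sorted[i]; seq++) missing.push(seq);
  }
  return missing; // these sequence numbers are what a NACK asks the sender to resend
}

console.log(missingSequences([4711, 4712, 4714, 4715, 4718])); // [4713, 4716, 4717]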

In a peer-to-peer topology, the source of the data has to handle retransmissions for every participant that sends it a NACK packet. When an SFU is used, the SFU can either send NACKs back to the source, or keep a complex buffer for each client to handle retransmissions itself.

This gets more complicated with Cloudflare Calls, since both the publisher and the subscriber connect to Cloudflare, likely to different servers and also probably in different locations. In addition, there is a possibility of other Cloudflare data centers in the middle, either through Argo, or just as part of scaling to many subscribers on the same track.

It is common for SFUs to propagate NACK packets back to the source, losing valuable time to recover packets. Calls goes beyond this and can handle NACK packets in the location closest to the user, which decreases overall latency. This latency advantage gives a packet a better chance of being recovered in time, compared to a centralized SFU or no NACK handling at all.

Since there may be a number of Cloudflare data centers between clients, packet loss within the Cloudflare network is also possible. We handle this by generating NACK packets within the network: at each hop the packets take, the receiving end can generate NACKs, and lost packets are either recovered at that hop or the NACK is propagated back toward the publisher for recovery.

Cloudflare Calls does TURN over Anycast too

Separately from the SFU, Calls also offers a TURN service. A TURN relay acts as an intermediary for traffic between WebRTC clients, like the browser, and SFUs, particularly in scenarios where direct communication is obstructed by NATs or firewalls. TURN maintains an allocation of public IP addresses and ports for each session, ensuring connectivity even in restrictive network environments.

Cloudflare Calls’ TURN service supports a few ports to help with misbehaving middleboxes and firewalls:

  • TURN-over-UDP over port 3478 (standard), and also port 53
  • TURN-over-TCP over ports 3478 and 80
  • TURN-over-TLS over ports 5349 and 443

TURN works the same way as Calls: it is available over anycast, and clients always connect to the closest data center.
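
Configuring a browser to use a TURN service looks like this. The hostname and credentials below are placeholders, not Calls’ real endpoints:

// Pointing a browser at a TURN service. The hostname and credentials are
// placeholders; use whatever your TURN provider issues.
const relayed = new RTCPeerConnection({
  iceServers: [{
    urls: [
      "turn:turn.example.com:3478?transport=udp", // standard UDP port
      "turn:turn.example.com:80?transport=tcp",   // TCP fallback for strict firewalls
      "turns:turn.example.com:443?transport=tcp", // TURN over TLS, blends in with HTTPS
    ],
    username: "generated-username",
    credential: "generated-credential",
  }],
});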

Pricing and how to get started

Cloudflare Calls is now in open beta and available in your Cloudflare Dashboard. Depending on your use case, you can set up an SFU application and/or a TURN service with only a few clicks.

To kick off its open beta phase, Calls is available at no cost for a limited time. Starting May 15, 2024, customers will receive the first terabyte each month for free, with any usage beyond that charged at $0.05 per real-time gigabyte. Beta customers will be provided at least 30 days to upgrade from the free beta to a paid subscription. Additionally, there are no charges for in-bound traffic to Cloudflare. For volume pricing, talk to your account manager.

Cloudflare Calls is ideal if you are building new WebRTC apps. If you have existing SFUs or TURN infrastructure, you may still consider using Calls alongside your existing infrastructure. Building a bridge to Calls from other places is not difficult as Cloudflare Calls supports standard WebRTC APIs and acts like just another WebRTC peer.

We understand that getting started with a new platform is difficult, so we’re also open sourcing our internal video conferencing app, Orange Meets. Orange Meets supports small and large conference calls by maintaining room state in Workers Durable Objects. It has screen sharing, client-side noise-canceling, and background blur. It is written with TypeScript and React and is available on GitHub.

We’re hiring

We think the current state of Cloudflare Calls enables many use cases. Calls already supports publishing and subscribing to media tracks and DataChannels. Soon, it will support features like simulcasting.

But we’re just scratching the surface and there is so much more to build on top of this foundation.

If you are passionate about WebRTC (and other real-time protocols!!), the Media Platform team building the Calls product at Cloudflare is hiring and would love to talk to you.

Bringing the best live video experience to Cloudflare Stream with AV1

Post Syndicated from Renan Dincer original https://blog.cloudflare.com/av1-cloudflare-stream-beta/

Consumer hardware is pushing the limits of consumers’ bandwidth.

VR headsets support 5760 x 3840 resolution — 22.1 million pixels per frame of video. Nearly all new TVs and smartphones sold today now support 4K — 8.8 million pixels per frame. It’s now normal for most people on a subway to be casually streaming video on their phone, even as they pass through a tunnel. People expect all of this to just work, and get frustrated when it doesn’t.

Consumer Internet bandwidth hasn’t kept up. Even advanced mobile carriers still limit streaming video resolution to prevent network congestion. Many mobile users still have to monitor and limit their mobile data usage. Higher Internet speeds require expensive infrastructure upgrades, and 30% of Americans still say they often have problems simply connecting to the Internet at home.

We talk to developers every day who are pushing up against these limits, trying to deliver the highest quality streaming video without buffering or jitter, challenged by viewers’ expectations and bandwidth. Developers building live video experiences hit these limits the hardest — buffering doesn’t just delay video playback, it can cause the viewer to get out of sync with the live event. Buffering can cause a sports fan to miss a key moment as playback suddenly skips ahead, or find out in a text message about the outcome of the final play, before they’ve had a chance to watch.

Today we’re announcing a big step towards breaking the ceiling of these limits — support in Cloudflare Stream for the AV1 codec for live videos and their recordings, available today to all Cloudflare Stream customers in open beta. Read the docs to get started, or watch an AV1 video from Cloudflare Stream in your web browser. AV1 is an open and royalty-free video codec that uses 46% less bandwidth than H.264, the most commonly used video codec on the web today.

What is AV1, and how does it improve live video streaming?

Every piece of information that travels across the Internet, from web pages to photos, requires data to be transmitted between two computers. A single character usually takes one byte, so a two-page letter would be 3600 bytes or 3.6 kilobytes of data transferred.

One pixel in a photo takes 3 bytes, one each for red, green, and blue. A 4K photo has 8,294,400 pixels, which takes 24,883,200 bytes, or about 25 megabytes. A video is like a photo that changes 30 times a second, which would make almost 45 gigabytes per minute. That’s a lot!
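
The same arithmetic, spelled out:

// Uncompressed 4K video, spelled out.
const width = 3840, height = 2160;
const bytesPerPixel = 3; // one byte each for red, green, blue
const fps = 30;

const bytesPerFrame = width * height * bytesPerPixel; // 24,883,200 bytes, about 25 MB
const bytesPerMinute = bytesPerFrame * fps * 60;
console.log((bytesPerMinute / 1e9).toFixed(1) + " GB per minute, uncompressed"); // 44.8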

To reduce the amount of bandwidth needed to stream video, before video is sent to your device, it is compressed using a codec. When your device receives video, it decodes this into the pixels displayed on your screen. These codecs are essential to both streaming and storing video.

Video compression codecs combine multiple advanced techniques, and are able to compress video to one percent of the original size, with your eyes barely noticing a difference. This also makes video codecs computationally intensive and hard to run. Smartphones, laptops and TVs have specific media decoding hardware, separate from the main CPU, optimized to decode specific protocols quickly, using the minimum amount of battery life and power.

Every few years, as researchers invent more efficient compression techniques, standards bodies release new codecs that take advantage of these improvements. Each generation of improvements in compression technology increases the requirements for computers that run them. With higher requirements, new chips are made available with increased compute capacity. These new chips allow your device to display higher quality video while using less bandwidth.

AV1 takes advantage of recent advances in compute to deliver video with dramatically fewer bytes, even compared to other relatively recent video codecs like VP9 and HEVC.

AV1 leverages the power of new smartphone chips

One of the biggest developments of the past few years has been the rise of custom chip designs for smartphones. Much of what’s driven the development of these chips is the need for advanced on-device image and video processing, as companies compete on the basis of which smartphone has the best camera.

This means the phones we carry around have an incredible amount of compute power. One way to think about AV1 is that it shifts work from the network to the viewer’s device. AV1 is fewer bytes over the wire, but computationally harder to decode than prior formats. When AV1 was first announced in 2018, it was dismissed by some as too slow to encode and decode, but smartphone chips have become radically faster in the past four years, more quickly than many saw coming.

AV1 hardware decoding is already built into the latest Google Pixel smartphones as part of the Tensor chip design. The Samsung Exynos 2200 and MediaTek Dimensity 1000 SoC mobile chipsets both support hardware accelerated AV1 decoding. It appears that Google will require that all devices that support Android 14 support decoding AV1. And AVPlayer, the media playback API built into iOS and tvOS, now includes an option for AV1, which hints at future support. It’s clear that the industry is heading towards hardware-accelerated AV1 decoding in the most popular consumer devices.

With hardware decoding comes battery life savings — essential for both today’s smartphones and tomorrow’s VR headsets. For example, a Google Pixel 6 with AV1 hardware decoding uses only minimal battery and CPU to decode and play our test video.

AV1 encoding requires even more compute power

Just as decoding is significantly harder for end-user devices, encoding video using AV1 is also significantly harder. When AV1 was announced in 2018, many doubted whether hardware would be able to encode it efficiently enough for the codec to be adopted quickly.

To demonstrate this, we encoded the 4K rendering of Big Buck Bunny (a classic among video engineers!) into AV1, using an AMD EPYC 7642 48-Core Processor with 256 GB RAM. This CPU continues to be a workhorse of our compute fleet, as we have written about previously. We used the following command to re-encode the video, based on the example in the ffmpeg AV1 documentation:

ffmpeg -i bbb_sunflower_2160p_30fps_normal.mp4 -c:v libaom-av1 -crf 30 -b:v 0 -strict -2 av1_test.mkv

Using a single core, encoding just two seconds of video at 30fps took over 30 minutes. Even if all 48 cores were used to encode, it would take at minimum over 43 seconds to encode just two seconds of video. Live encoding using only CPUs would require over 20 servers running at full capacity.

Special-purpose AV1 software encoders like rav1e and SVT-AV1 that run on general purpose CPUs can encode somewhat faster than libaom-av1 with ffmpeg, but still consume a huge amount of compute power to encode AV1 in real-time, requiring multiple servers running at full capacity in many scenarios.

Cloudflare Stream encodes your video to AV1 in real-time

At Cloudflare, we control both the hardware and software on our network. So to solve the CPU constraint, we’ve installed dedicated AV1 hardware encoders, designed specifically to encode AV1 at blazing fast speeds. This end to end control is what lets us encode your video to AV1 in real-time. This is entirely out of reach to most public cloud customers, including the video infrastructure providers who depend on them for compute power.

Encoding in real-time means you can use AV1 for live video streaming, where saving bandwidth matters most. With a pre-recorded video, the client video player can fetch future segments of video well in advance, relying on a buffer that can be many tens of seconds long. With live video, buffering is constrained by latency — it’s not possible to build up a large buffer when viewing a live stream. There is less margin for error with live streaming, and every byte saved means that if a viewer’s connection is interrupted, it takes less time to recover before the buffer is empty.

Stream lets you support AV1 with no additional work

AV1 has a chicken-and-egg dilemma. And we’re helping solve it.

Companies with large video libraries often re-encode their entire content library to a new codec before using it. But AV1 is so computationally intensive that re-encoding whole libraries has been cost prohibitive. Companies have to choose specific videos to re-encode, and guess which content will be most viewed ahead of time. This is particularly challenging for apps with user generated content, where content can suddenly go viral, and viewer patterns are hard to anticipate.

This has slowed down the adoption of AV1 — content providers wait for more devices to support AV1, and device manufacturers wait for more content to use AV1. Which will come first?

With Cloudflare Stream there is no need to manually trigger re-encoding, re-upload video, or manage the bulk encoding of a large video library. This is a unique approach that is made possible by integrating encoding and delivery into a single product — it is not possible to encode on-demand using the old way of encoding first, and then pointing a CDN at a bucket of pre-encoded files.

We think this approach can accelerate the adoption of AV1. Consider a video app with millions of minutes of user-generated video. Most videos will never be watched again. In the old model, developers would have to spend huge sums of money to encode upfront, or pick and choose which videos to re-encode. With Stream, we can help anyone incrementally adopt AV1, without re-encoding upfront. As we work towards making AV1 Generally Available, we’ll be working to make supporting AV1 simple and painless, even for videos already uploaded to Stream, with no special configuration necessary.

Open, royalty-free, and widely supported

At Cloudflare, we are committed to open standards and fighting patent trolls. While there are multiple competing options for new video codecs, we chose to support AV1 first in part because it is open source and has royalty-free licensing.

Other encoding codecs force device manufacturers to pay royalty fees in order to adopt their standard in consumer hardware, and have been quick to file lawsuits against competing video codecs. The group behind the open and royalty-free VP8 and VP9 codecs has been pushing back against this model for more than a decade, and AV1 is the successor to these codecs, with support from all the biggest technology companies, both software and hardware. Beyond its technical accomplishments, AV1 is a clear message from the industry that the future of video encoding should be open, royalty-free, and free from patent litigation.

Try AV1 right now with your live stream or live recording

Support for AV1 is currently in open beta. You can try using AV1 on your own live video with Cloudflare Stream right now — just add the ?betaCodecSuggestion=av1 query parameter to the HLS or DASH manifest URL for any live stream or live recording created after October 1st in Cloudflare Stream. Read the docs to get started. If you don’t yet have a Cloudflare account, you can sign up here and start using Cloudflare Stream in just a few minutes.
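
For example, an HLS manifest URL with the beta parameter might look like this (the customer code and video ID are placeholders):

https://customer-<CODE>.cloudflarestream.com/<VIDEO_UID>/manifest/video.m3u8?betaCodecSuggestion=av1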

We also have a recording of a live video, encoded using AV1, that you can watch here. Note that Safari does not yet support AV1.
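
If you build your own player logic, you can feature-detect AV1 before requesting it. This sketch uses a standard browser API; the codec string is one common AV1 Main-profile example, and the manifest URL is a placeholder:

// Feature-detect AV1 before requesting the AV1 rendition.
const av1Mime = 'video/mp4; codecs="av01.0.05M.08"';

const manifestBase = "https://example.com/manifest/video.m3u8"; // placeholder URL
const manifestUrl = MediaSource.isTypeSupported(av1Mime)
  ? manifestBase + "?betaCodecSuggestion=av1"
  : manifestBase; // fall back to the default H.264 rendition (e.g. current Safari)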

We encourage you to try AV1 with your test streams, and we’d love your feedback. Join our Discord channel and tell us what you’re building, and what kinds of video you’re interested in using AV1 with. We’d love to hear from you!

Stream now supports SRT as a drop-in replacement for RTMP

Post Syndicated from Renan Dincer original https://blog.cloudflare.com/stream-now-supports-srt-as-a-drop-in-replacement-for-rtmp/

SRT is a new and modern live video transport protocol. It features many improvements over the incumbent popular video ingest protocol, RTMP, such as lower latency and better resilience against unpredictable network conditions on the public Internet. SRT supports newer video codecs and makes it easier to use accessibility features such as captions and multiple audio tracks. While RTMP development has been abandoned since at least 2012, SRT is maintained by an active community of developers.

We don’t see RTMP use going down anytime soon, but we can do something so that authors of new broadcasting software, as well as video streaming platforms, have an alternative.

Starting today, in open beta, you can use Stream Connect as a gateway to translate SRT to RTMP or RTMP to SRT with your existing applications. This way, you can get the last-mile reliability benefits of SRT and can continue to use the RTMP service of your choice. It’s priced at $1 per 1,000 minutes, regardless of video encoding parameters.

You can also use SRT to go live on Stream Live, our end-to-end live streaming service, to get HLS and DASH manifest URLs from your SRT input and simulcast to multiple platforms, whether you use SRT or RTMP.

Stream’s SRT and RTMP implementation supports adding or removing RTMP or SRT outputs without having to restart the source stream, scales to tens of thousands of concurrent video streams per customer and runs on every Cloudflare server in every Cloudflare location around the world.

Go live like it’s 2022

When we first started developing live video features on Cloudflare Stream early last year, we had to decide whether to reimplement an old and unmaintained protocol, RTMP, or focus on the future and start off fresh with a modern protocol. If we launched with RTMP, we would get instant compatibility with existing clients, but would give up features that would greatly improve performance and reliability. Reimplementing RTMP would also mean we’d have to handle the complicated state machine that powers it, demux the FLV container, parse AMF and even write a server that sends the text “Genuine Adobe Flash Media Server 001” as part of the RTMP handshake.

Even though there were a few new protocols to evaluate and choose from in this project, the dominance of RTMP was still overwhelming. We decided to implement RTMP, but we really don’t want anybody else to have to do it again.

Eliminate head of line blocking

A common weakness of TCP when it comes to low latency video transfer is head of line blocking. Imagine a camera app sending video to a live streaming server. The camera puts every frame it captures into packets and sends them over a reliable TCP connection. Regardless of the diverse set of Internet infrastructure the packets may pass through, TCP makes sure all of them get delivered in order (so that your video frames don’t jump around) and reliably (so you don’t see any parts of the frame missing). However, this type of connection comes at a cost. If a single packet is dropped or lost in the network somewhere between the two endpoints, as often happens on mobile and Wi-Fi connections, the entire TCP connection is brought to a halt while the lost packet is found and re-transmitted. This means that if one frame is suddenly missing, everything that would come after the lost video frame has to wait. This is known as head of line blocking.

RTMP experiences head of line blocking because it uses a TCP connection. Since SRT is a UDP-based protocol, it does not experience head of line blocking. SRT features packet recovery that is aware of the low-latency and high reliability requirements of video. Similar to QUIC, it achieves this by implementing its own logic for a reliable connection on top of UDP, rather than relying on TCP.

SRT solves this problem by waiting only a little bit, because it knows that losing a single frame won’t be noticeable to the end viewer in the majority of cases. The video moves on if the frame is not re-transmitted right away. SRT really shines when the broadcaster is streaming with less-than-stellar Internet connectivity. Using SRT means fewer buffering events, lower latency, and a better overall viewing experience for your viewers.

RTMP to SRT and SRT to RTMP

Comparing SRT and RTMP today may not be that useful for the most pragmatic app developers. Perhaps it’s just another protocol that does the same thing for you. It’s important to remember that even though there might not be a big improvement for you today, tomorrow there will be new video use cases that will benefit from a UDP-based protocol that avoids head of line blocking, supports forward error correction and modern codecs beyond H.264 for high-resolution video.

Switching protocols requires effort from both software that sends video and software that receives video. This is a frustrating chicken-or-the-egg problem. A video streaming service won’t implement a protocol not in use and clients won’t implement a protocol not supported by streaming services.

Starting today, you can use Stream Connect to translate between protocols for you and deprecate RTMP without having to wait for video platforms to catch up. This way, you can use your favorite live video streaming service with the protocol of your choice.

Stream is useful if you’re a live streaming platform too! You can start using SRT while maintaining compatibility with existing RTMP clients. When creating a video service, you can have Stream Connect terminate RTMP for you and send SRT on to the destination you intend instead.

SRT is already implemented in software like FFmpeg and OBS. In OBS, you can go live over SRT by entering an SRT URL as the stream server, in place of an RTMP URL.
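
With FFmpeg, a minimal sketch looks like this, assuming a build with SRT support; the endpoint, port, and stream ID are placeholders for whatever your SRT service hands you:

ffmpeg -re -i input.mp4 -c:v libx264 -c:a aac -f mpegts "srt://<your-endpoint>:<port>?streamid=<your-stream-id>"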

Get started with signing up for Cloudflare Stream and adding a live input.

Protocol-agnostic Live Streaming

We’re working on adding support for more media protocols in addition to RTMP and SRT. What would you like to see next? Let us know! If this post vibes with you, come work with the engineers building with video and more at Cloudflare!