There are other ways to proxy live video from a browser to an RTMP endpoint, but what if we wanted to interact with that stream first? And not just by writing obtuse ffmpeg filters, but with some good ol' HTML and CSS? Let's do that! We'll talk about how you can allow your streamers to go live directly from their browser using headless Chrome and ffmpeg.
Going Live from a Browser...with Another Browser
AI Generated Video Summary
This Talk discusses live chat and live broadcast using WebRTC and RTMP. It explores running WebRTC on a server via Chrome and alternative approaches like using GetUserMedia and the Chrome.tabCapture API. The use of a whole Chrome instance for WebRTC and RTMP broadcast is also discussed, highlighting the pros and cons of this approach. The Talk recommends checking out Nick's talk from All Things RTC for more information.
1. Introduction to Live Broadcast and WebRTC
Hey everybody, my name is Matthew McClure. I'm one of the cofounders of Mux, and we do online video infrastructure for developers. Today we're talking about going live from the browser via another browser. Live chat and live broadcast are different in terms of communication and technology. Live chat uses WebRTC for low latency synchronous communication between browsers, while live broadcast uses RTMP and HLS for one-to-many streaming. We can't turn WebRTC into RTMP in the browser, but we can use a server-side WebRTC implementation. However, this approach may not be the easiest or most flexible for video processing on the server side.
Hey everybody, my name is Matthew McClure. I'm one of the cofounders of Mux, and we do online video infrastructure for developers. So one of our features is an API to live broadcast, and that's where we get a ton of questions from developers on how to help their customers go live. They're in a world where they want to just build an application in the browser, let the user just log in and immediately go live without needing to download third-party software like OBS or something like that to be able to do it. Totally makes sense.
But today we're not talking about just going live from the browser, we're talking about going live from the browser via another browser. This is also probably a bad idea for most use cases, but when you need this kind of thing, this can be a really great path forward. So we covered something similar, or another path to do this at React Summit. So we're going to quickly recap some of these high-level concepts, just to get on the same page. But if you want more information, you might want to check out that talk as well. You can just find it on YouTube.
So a common misconception is that live broadcast is the same as live chat. With live chat, you have two browsers, or a few browsers, that can communicate directly with each other at sub 500 milliseconds of latency so they can talk synchronously. Live broadcast, on the other hand, is one-to-many. So you have one input stream out to many viewers, and that can be 50 to a million viewers. Latency can be 10 seconds plus, and it's fine, because there's not really an expectation to be able to communicate back to that streamer. So because of those constraints, the same technology really doesn't work very well for both of them. Live chat is typically powered by browser technologies like WebRTC or proprietary implementations that allow you to communicate directly between the streamers so that you have as low a latency as possible. Live broadcast, on the other hand, is powered by technologies like RTMP and HLS. RTMP is kind of an old Flash-era protocol that has become the de facto standard for being able to ingest live content directly into a server, which then will transcode that content and broadcast it out via HLS. We won't get into the specifics of HLS, but for our purposes, it allows you to download video via GET requests in the browser, and you can just scale it as you would any other file transfer, which is really nice.
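To make the "video via GET requests" point concrete, here is a minimal sketch of what an HLS player does with a media playlist: it reads a plain text file and resolves each segment line to a URL it can fetch with an ordinary GET. The playlist contents, the `example.invalid` host, and the `segment_urls` helper are all made up for illustration; real playlists carry more tags than this.

```python
from typing import List
from urllib.parse import urljoin

# Hypothetical live media playlist: tag lines start with "#",
# every other non-empty line is a segment URI.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:120
#EXTINF:6.0,
segment120.ts
#EXTINF:6.0,
segment121.ts
"""


def segment_urls(playlist_url: str, playlist_text: str) -> List[str]:
    """Return the absolute URL of each segment listed in a media playlist."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Skip blank lines and tag/comment lines.
        if not line or line.startswith("#"):
            continue
        # Segment URIs may be relative to the playlist's own URL.
        urls.append(urljoin(playlist_url, line))
    return urls


if __name__ == "__main__":
    for url in segment_urls("https://example.invalid/live/stream.m3u8", SAMPLE_PLAYLIST):
        print(url)
```

Because each segment is just a file behind a URL, it can sit behind a CDN like any other static asset, which is why this side of the pipeline scales so easily.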
Okay, so let's just take WebRTC and then turn that into RTMP in the browser, is probably what you're thinking. Unfortunately, no, we can't get quite low enough in the network stack in a browser to be able to do it, so even in our current modern world of WASM and all these other goodies, we just can't quite get there. But let's talk about what technologies we can access. So whatever we're talking about here, it's all involving a server in some way, but the first way is we can take WebRTC and then use a server-side WebRTC implementation. If you'd asked me a year ago, I'd have said this is crazy, but this has gotten a lot better. Projects like Pion have come a really long way. It's a pure Go implementation. So this actually isn't that crazy anymore, but it's certainly not the easiest way that you can get this done. And if you want to be able to do anything interesting with the video on the server side, via client-side technologies, this would kind of leave you in the cold a little bit.
2. Running WebRTC on a Server via Chrome
One option is to run WebRTC on a server via Chrome. An alternative approach is to use getUserMedia to capture the microphone and camera, broadcast it to a server via WebSockets, and encode it into RTMP. The headless Chrome approach involves running one Chrome per input or output stream, which can be resource-intensive. However, open source projects like Jitsi have implemented this method for one-to-many or few-to-many broadcasts. Another approach is to use the chrome.tabCapture API, which has similar internals to the MediaRecorder API. This allows for running Chrome in headless mode, providing easier multi-tenant access and browser features, but still relying on the MediaRecorder API.
So, to fix that last thing, what if we just took WebRTC and ran it on a server via Chrome? It can be done, but the problem is now you're running Chrome. Or we can take getUserMedia, which is just one of the WebRTC APIs that allows you to get at the microphone and camera. We'll broadcast that to a server via WebSockets, and then encode it into RTMP.
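As a rough sketch of the server half of that getUserMedia idea: the browser sends recorded media chunks over a WebSocket, and the server writes them into an ffmpeg process that re-encodes to RTMP. The WebSocket handling is left out here; `chunks` stands in for whatever the socket delivers, and the helper names, encoder settings, and RTMP URL are assumptions for illustration, not anything shown in the talk.

```python
import subprocess
from typing import Iterable, List


def build_relay_command(rtmp_url: str) -> List[str]:
    """Build an ffmpeg command that reads media from stdin and pushes RTMP."""
    return [
        "ffmpeg",
        "-i", "pipe:0",          # read the incoming container (e.g. WebM) from stdin
        "-c:v", "libx264",       # RTMP ingest generally expects H.264 video
        "-preset", "veryfast",
        "-tune", "zerolatency",
        "-c:a", "aac",           # ...and AAC audio
        "-f", "flv",             # RTMP carries FLV-packaged streams
        rtmp_url,
    ]


def relay_chunks(chunks: Iterable[bytes], command: List[str]) -> int:
    """Write each received chunk to the encoder's stdin; return its exit code."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    assert proc.stdin is not None
    for chunk in chunks:
        proc.stdin.write(chunk)
    proc.stdin.close()  # signal end of stream so the encoder can finish
    return proc.wait()


if __name__ == "__main__":
    # Hypothetical ingest URL; a real one includes your stream key.
    print(" ".join(build_relay_command("rtmp://example.invalid/live/STREAM_KEY")))
```

In a real service you would start one such process per connected streamer and tear it down when the socket closes.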
So, you might be thinking, how does that work? If you want more information on that one, you can go watch the other talk I mentioned. Let's go back to this headless Chrome thing. So WebRTC to a server-side WebRTC via headless Chrome. Kind of cool. You can just have a chat, one-to-one, few-to-few. Have headless Chrome join that chat and broadcast that via RTMP. Really interesting. You'll want to hide that Chrome in the clients of the other chatters, but that Chrome can then lay out the chat interface how it wants, add overlays, anything like that, right there.
So what about these downsides? You have to run one Chrome per input stream or per output stream. And so you have all the orchestration that comes with that. So if you use Chrome as your normal browser, you might notice it's resource-intensive. That also applies on the server. The bigger issue, though, is it's not the most beaten path. A lot of people are doing this. They're just not talking about it. The exception is open source projects like Jitsi, which if you're not familiar is like an open source Zoom competitor. That's how they do one-to-many broadcast, or a few-to-many broadcast.
So there are a few paths to get this done, which come with a few tradeoffs. One is to do this getUserMedia style approach and then broadcast that to a server via WebSockets. You might be thinking like, wait, why are we talking about this again? It's not actually getUserMedia. Now we're going to use chrome.tabCapture, but it uses a very similar API under the hood. It's the same internals as the MediaRecorder API, which is what we would use in that implementation. So here we take WebRTC, the same process where we have it in the browser. Chrome joins the call, calls the tabCapture API, and broadcasts that via WebSocket to a server that encodes it into RTMP, and it goes on to the rest of the workflow. Those can be on the same server, but that's kind of the high level. The pros are that you can actually run Chrome in headless mode, which means you get much easier multi-tenant access, you get all these browser features, and we can use the fancy WebSocket workflow. The downside is it still uses the MediaRecorder API, which is kind of a disaster.
3. WebRTC with Chrome Instance and RTMP Broadcast
If there's no changes in the tab, it won't send any frames which can make encoders barf. The MediaRecorder API for encoding video is inefficient and uses a lot of resources. A better way is to use WebRTC with a whole Chrome instance and broadcast via RTMP. The workflow involves starting the server, capturing the screen with FFmpeg, and streaming it. The pros are no media recorder API and more reliable streaming with FFmpeg. The con is the need for a full Docker container or instance for each stream. You have flexibility in manipulating the stream before sending it out. For more information, check out Nick's talk from All Things RTC.
If there are no changes in the tab, it won't send any frames, which can make encoders barf. And because it's the MediaRecorder API, the process for encoding video is really inefficient, so it can use a lot of resources, even more than Chrome itself. So this is probably the most common way you see this done in the wild, surprisingly. However, we think there's a better way.
What if we just took WebRTC, had a whole Chrome instance that's actually there, and then broadcast that via RTMP? So, here's a really high-level script. We can skim over this, but the workflow looks something like this. You start your server. This will, like, give you the static files that you want to serve. Give yourself a screen to work with, actually on the server, via Xvfb. Capture that screen with FFmpeg, and then start streaming it somewhere, and then open Chrome to a page with your stream in it, and then whatever you do from there is up to your imagination. You want to open Chrome in, like, full screen mode, but that's kind of it.
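The script from the slide isn't reproduced in this transcript, so here is a hedged sketch of the same four steps: static file server, virtual screen, screen capture to RTMP, then Chrome in full screen. The `google-chrome` binary name, display `:99`, port, resolution, encoder settings, and URLs are all assumptions chosen for illustration, and audio capture is left out entirely.

```python
import os
import subprocess
import sys
import time
from typing import Dict, List


def plan_broadcast(page_url: str, rtmp_url: str, display: str = ":99",
                   width: int = 1280, height: int = 720,
                   fps: int = 30, port: int = 8080) -> Dict[str, List[str]]:
    """Return the commands for each step, in the order they should start."""
    size = f"{width}x{height}"
    return {
        # 1. Serve the static page Chrome will open (current directory).
        "static_server": [sys.executable, "-m", "http.server", str(port)],
        # 2. Virtual X display so Chrome has a screen on a headless server.
        "xvfb": ["Xvfb", display, "-screen", "0", f"{size}x24"],
        # 3. Grab that display and push it out as RTMP (video only here).
        "ffmpeg": ["ffmpeg", "-f", "x11grab", "-video_size", size,
                   "-framerate", str(fps), "-i", display,
                   "-c:v", "libx264", "-preset", "veryfast",
                   "-pix_fmt", "yuv420p", "-g", str(fps * 2),
                   "-f", "flv", rtmp_url],
        # 4. Full-screen Chrome pointed at the page to broadcast.
        "chrome": ["google-chrome", "--kiosk", "--no-first-run",
                   f"--window-size={width},{height}", "--window-position=0,0",
                   "--autoplay-policy=no-user-gesture-required", page_url],
    }


def start_broadcast(plan: Dict[str, List[str]], display: str = ":99",
                    settle_seconds: float = 1.0) -> List[subprocess.Popen]:
    """Launch each step in order; a real script would wait for readiness instead of sleeping."""
    env = dict(os.environ, DISPLAY=display)  # point Chrome at the virtual display
    procs = []
    for name in ("static_server", "xvfb", "ffmpeg", "chrome"):
        procs.append(subprocess.Popen(plan[name], env=env))
        time.sleep(settle_seconds)
    return procs


if __name__ == "__main__":
    plan = plan_broadcast("http://localhost:8080/", "rtmp://example.invalid/live/STREAM_KEY")
    for name, cmd in plan.items():
        print(name, "->", " ".join(cmd))
```

Keeping the plan separate from the launch step makes it easy to print and inspect the commands before running anything, which helps when each stream lives in its own container.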
So the pros are that there's no MediaRecorder API. It's more reliable to stream because you're using FFmpeg, which is designed and built for this. It's a much better tool for this use case. The con is that you need a full Docker container or full instance for each stream. But once you have this working, you can do whatever you want with the stream before you send it out, which is amazing. Also, it handles compositing and everything. So, if you want more information on, like, the process of doing this, check out Nick's talk from All Things RTC, or the chromium broadcast demo under muxinc on GitHub if you want to see the source. Otherwise, thank you so much, everybody. I really appreciate your time. Ask me if you have any questions.