There are other ways to proxy live video from a browser to an RTMP endpoint, but what if we wanted to interact with that stream first? And not just by writing obtuse ffmpeg filters, but with some good ol' HTML and CSS? Let's do that! We'll talk about how you can allow your streamers to go live directly from their browser using headless Chrome and ffmpeg.
Going Live from a Browser...with Another Browser
Hey everybody, my name is Matthew McClure. I'm one of the co-founders of Mux and we do online video infrastructure for developers. One of our features is an API for live broadcast, and that's where we get a ton of questions from developers on how to help their customers go live. They're in a world where they want to just build an application in a browser, let the user log in and immediately go live without needing to download third-party software like OBS to be able to do it. Totally makes sense.
But today we're not talking about just going live from the browser. We're talking about going live from the browser via another browser. This is probably a bad idea for most use cases, but when you need this kind of thing, it can be a really great path forward. We covered another path to do this at React Summit, so we're going to quickly recap some of the high-level concepts just to get on the same page. If you want more information, you might want to check out that talk as well. You can find it on YouTube.
A common misconception is that live broadcast is the same as live chat. With live chat, you have two browsers, or a few browsers, that can communicate directly with each other at sub-500 milliseconds of latency so they can talk synchronously. Live broadcast, on the other hand, is one-to-many. You have one input stream out to many viewers, and that can be 50 to a million viewers. Latency can be 10 seconds plus. That's fine, because there's not really an expectation of being able to communicate back to that streamer.
Because of those constraints, the same technology really doesn't work very well for both of them. Live chat is typically powered by browser technologies like WebRTC, or proprietary implementations, that let the participants communicate directly with each other so that you have as low a latency as possible.
Live broadcast, on the other hand, is powered by technologies like RTMP and HLS. RTMP is kind of an old Flash implementation that has become the de facto standard for ingesting live content directly into a server, which then transcodes that content and broadcasts it out via HLS. We won't get into the specifics of HLS, but for our purposes, it allows you to download video via GET requests in the browser, and you can scale it as you would any other file transfer. It's really nice. "Okay, so let's just take WebRTC and turn that into RTMP in the browser" is probably what you're thinking. Unfortunately, no.
We can't get quite low enough in the network stack in a browser to be able to do it. So even in our current modern world of WASM and all these other goodies, we just can't quite get there.
But let's talk about what technologies we can access. Whatever we're talking about here, it all involves a server in some way. The first way is we can take WebRTC and then use a server-side WebRTC implementation. If you'd asked me a year ago, I would have said this is crazy. This has gotten a lot better. Projects like Pion have come a really long way. It's a pure Go implementation. So this actually isn't that crazy anymore, but it's certainly not the easiest way to get this done. And if you want to be able to do anything interesting with the video on the server side via client-side technologies, this would kind of leave you out in the cold a little bit.
So to fix that last thing, what if we just took WebRTC and ran it on a server via Chrome? It can be done, but the problem is now you're running Chrome. Or we can take getUserMedia, which is just a few of the WebRTC APIs that allow you to get the microphone and camera. We'll broadcast that to a server via WebSockets and then encode it into RTMP. So you might be thinking, how does that work? Let's go back to this headless Chrome thing. If you want more information on that one, you can go watch the other talk I mentioned. So WebRTC to a server-side WebRTC via headless Chrome. Kind of cool. You can just have a chat, one-to-one or few-to-few, have headless Chrome join that chat, and broadcast that via RTMP. Really interesting. You'd want to hide that Chrome in the clients of the other chatters, but that Chrome can then lay out the chat interface how it wants, add overlays, anything like that right there. So what about the downsides? You have to run one Chrome per input stream or per output stream, and so you have all the orchestration that comes with that.
So if you use Chrome as your normal browser, you might notice it's resource-intensive. That also applies on the server. The bigger issue, though, is it's not the most beaten path. Like, a lot of people are doing this. They're just not talking about it.
The exception is open-source projects like Jitsi, which, if you're not familiar, is like an open-source Zoom competitor. That's how they do one-to-many or few-to-many broadcast. So there are a few paths to get this done, which come with a few tradeoffs. One is to do this getUserMedia-style approach and then broadcast that to a server via WebSockets. You might be thinking, wait, why are we talking about this again? It's not actually getUserMedia. Now we're going to use chrome.tabCapture, but it uses a very similar API under the hood. It uses the same internals as the MediaRecorder API, which is what we would use in that implementation. So here we take WebRTC, same process where we have a browser that joins, call the tab capture API, and broadcast that via WebSocket to a server that encodes it into RTMP and sends it on to the rest of the workflow. Those can be on the same server, but that's kind of the high level.
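To make the server half of that WebSocket workflow concrete, here is a minimal sketch, not the implementation from the talk. It assumes the browser sends MediaRecorder output (typically WebM) as binary WebSocket messages and that the server writes each message into an ffmpeg process, which re-encodes to H.264/AAC and publishes over RTMP. The function names, the encoder settings, and the example RTMP URL are all hypothetical, and the WebSocket server itself is left out so the block only needs the standard library.

```python
import io
import subprocess
from typing import BinaryIO, Iterable, List


def build_ingest_command(rtmp_url: str) -> List[str]:
    """ffmpeg arguments that read a recorded stream on stdin and publish it over RTMP."""
    return [
        "ffmpeg",
        "-i", "-",               # read the browser's recorded chunks from stdin
        "-c:v", "libx264",       # re-encode video to H.264
        "-preset", "veryfast",
        "-tune", "zerolatency",
        "-pix_fmt", "yuv420p",
        "-c:a", "aac",           # re-encode audio to AAC
        "-ar", "44100",
        "-f", "flv",             # RTMP carries FLV-muxed data
        rtmp_url,
    ]


def relay_chunks(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write each binary WebSocket message to the sink; return total bytes written."""
    total = 0
    for chunk in chunks:
        sink.write(chunk)
        total += len(chunk)
    sink.flush()
    return total


def start_encoder(rtmp_url: str) -> subprocess.Popen:
    """Spawn ffmpeg with a pipe on stdin (assumes ffmpeg is on PATH)."""
    return subprocess.Popen(build_ingest_command(rtmp_url), stdin=subprocess.PIPE)
```

In use, a WebSocket handler would call `start_encoder(...)` once per connection and pass incoming messages to `relay_chunks(messages, proc.stdin)`. Because ffmpeg only sees what the browser sends, any gap in the recorder's output shows up directly at the encoder.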
The pros are that you can actually run Chrome in headless mode, which means you get much easier multi-tenant access. You get all these browser features. We can use the fancy WebSocket workflow. The downside is it still uses the MediaRecorder API, which is kind of a disaster. If there are no changes in the tab, it won't send any frames, which can make encoders barf. And with the MediaRecorder API, the process for encoding video is really inefficient, so it can use a lot of resources, even more than Chrome itself. So this is probably the most common way you see this done in the wild, surprisingly. However, we think there's a better way. What if we just took WebRTC, had a whole Chrome instance that's actually there, captured that, and then broadcast that via RTMP? So here's a really high-level script. You can scan over this, but the workflow looks something like this.
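The script from the slide is not part of this transcript. As a stand-in, here is a minimal sketch of the full-Chrome approach: serve the static page, start a virtual display with Xvfb, capture that display with ffmpeg's x11grab input, and open Chrome full screen on it. The display number, resolution, port, `public` directory, `google-chrome` binary name, and encoder settings are assumptions for illustration, and audio capture is left out.

```python
import os
import subprocess
import sys
import time
from typing import List

DISPLAY = ":99"                     # assumed free X display number
WIDTH, HEIGHT, FPS = 1280, 720, 30  # assumed output size and frame rate


def static_server_command(port: int = 8080) -> List[str]:
    """Serve the page Chrome will open (assumes the files live in ./public)."""
    return [sys.executable, "-m", "http.server", str(port), "--directory", "public"]


def xvfb_command() -> List[str]:
    """Virtual framebuffer that gives Chrome a screen to draw on."""
    return ["Xvfb", DISPLAY, "-screen", "0", f"{WIDTH}x{HEIGHT}x24"]


def ffmpeg_command(rtmp_url: str) -> List[str]:
    """Grab the virtual screen and publish it as H.264 over RTMP (video only)."""
    return [
        "ffmpeg",
        "-f", "x11grab",
        "-video_size", f"{WIDTH}x{HEIGHT}",
        "-framerate", str(FPS),
        "-i", f"{DISPLAY}.0",
        "-c:v", "libx264", "-preset", "veryfast", "-pix_fmt", "yuv420p",
        "-g", str(FPS * 2),          # keyframe every two seconds
        "-f", "flv", rtmp_url,
    ]


def chrome_command(page_url: str) -> List[str]:
    """Full-screen Chrome sized to match the virtual display."""
    return [
        "google-chrome",
        "--kiosk",
        f"--window-size={WIDTH},{HEIGHT}",
        "--window-position=0,0",
        "--autoplay-policy=no-user-gesture-required",
        page_url,
    ]


def launch(page_url: str, rtmp_url: str) -> List[subprocess.Popen]:
    """Start everything in order; needs Xvfb, ffmpeg and Chrome installed."""
    env = {**os.environ, "DISPLAY": DISPLAY}
    procs = [subprocess.Popen(static_server_command())]
    procs.append(subprocess.Popen(xvfb_command()))
    time.sleep(1)  # crude wait for the display to come up
    procs.append(subprocess.Popen(ffmpeg_command(rtmp_url), env=env))
    procs.append(subprocess.Popen(chrome_command(page_url), env=env))
    return procs
```

Calling `launch("http://localhost:8080/", "rtmp://example.invalid/live/KEY")` with real values would start all four processes. Since ffmpeg reads the screen at a fixed frame rate, it keeps emitting frames even when nothing on the page changes.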
You could have a Chrome that is in headless mode. Headless mode is a feature that was added to Chrome that allows you to run Chrome without a GUI, so you can run Chrome and interact with it without having to actually see it. So you can have your Chrome instance, and then you can have a WebSocket server that listens to that Chrome instance. And then you can have your WebSocket client and your WebSocket server, which is an RTMP server, which is your WebRTC server. And then you can have an RTMP client, and that could be, for example, OBS. And you can send frames from there, from OBS, to your WebRTC server, and then from your WebRTC server to wherever you want to send it, for example, YouTube.
You start your server. This will give you the static files that you want to serve. Give yourself a screen to work with, actually on the server, via Xvfb. Capture that screen with FFmpeg, and then start streaming it somewhere. And then open Chrome to a page with your stream in it. Whatever you do from there is up to your imagination. You want to open Chrome in full screen mode, but that's kind of it.
So the pros are that there's no MediaRecorder API. It's more reliable to stream because you're using FFmpeg, which is designed and built for this. It's a much better tool for this use case. The con is that you need a full Docker container, a full instance, for each stream. But once you have this working, you can do whatever you want with the stream before you send it out, which is amazing. Also it's compositing and everything.
So if you want more information on the process of doing this, check out Nick's talk from All Things RTC. muxing.com and broadcast him if you want to see the source. Otherwise, thank you so much, everybody. I really appreciate your time. Ask me if you have any questions.