Synchronizing multiple streams, particularly multiple live streams, and especially when the audio is critical too, can be incredibly difficult. How, then, could you build an experience where you have a few camera angles and you want a viewer to be able to seamlessly pick one? We'll walk through a hacky technique that lets you do just that without ever having to worry about synchronization.
Remember CSS Sprites? Let's Do That with Video!
Hey everyone, today I want to talk about something a little hacky I've been thinking about lately. The idea is that we want to use video sprites in the same way that a lot of you might have used CSS sprites back in the day, and I'll explain what I mean by that. But first, hello, I'm Matt McClure. I'm one of the co-founders of a company called Mux, and I run the developer experience org there. In a nutshell, we build awesome video infrastructure for developers, so if you're looking for a video API, check us out.
Okay, taking a step back, what is a sprite? In a nutshell, it's an image with a bunch of images in it. The client gets this big combined image and can then pick and choose which parts of it to show. If you're relatively new, you might not have seen this widely used, but it was a really common optimization technique in the early aughts. If you had a button with hover states and depressed states and active states, you would just send one button image and then use your CSS background to decide which part of that image to show. If you started out back then, you might remember this.
To show this a little more concretely, a lot of people know about this from video game sprites. All of Mario's states are in one image, and the client viewport just shows one state of Mario at a time.
So you might be wondering, what in the world does this have to do with video? The gist is that the same technique works. You can send a few videos combined into one, and then just show the one you care about in the player. And you might be wondering why in the world we would be doing this. I would say, use your imagination. There could be a bunch of examples of this: sporting events, concerts.
But the biggest example that comes to mind, and what we see the most from customers, is around feed synchronization, particularly being able to pick between different feeds in a synchronized way. Let's say you're streaming live concerts, live music. You've got a bunch of different cameras streaming a bunch of different angles: one's on the drummer, one's on the singer, one's on the audience. A producer on site is typically deciding which one of those feeds to show at any given time. They might do a nice little transition, go from the drummer to the audience, et cetera. That producer then sends a feed to an encoder service, or whatever else that looks like. I'm using Mux as the example here for obvious reasons. That service then broadcasts it to all of your end viewers.
Then those viewers start saying, actually, I just want to watch the drummer all the time, and I hate the transitions this producer is doing. They want the power to pick which feed they watch. So you think, okay, how can we go about building this out? I'll send every camera directly to that encoder or broadcast service, and then every viewer can get all the feeds, in this example three feeds. This is where things really get hairy if you start going down this path. Now you've got three different live streams that people can watch, but how do you switch between them? Do people just click another feed? You might be a few seconds off in terms of audio between them. It can be tough to synchronize that in the client, or honestly, next to impossible to do well.
So one solution would be to just send one video again, like you were doing before, but instead of that video being produced, you just combine all the feeds at that level and then send them along. In this example, all the cameras go into that one encoder box locally.
It just lays them out in four quadrants and sends that to the encoder or broadcaster service, and that goes out to all the viewers. From there, the viewers can pick which one they want. Now you're guaranteed that your feeds are synced. You only have to worry about one audio stream, which is shared between all of them for obvious reasons, and you only show the quadrant of the video that the viewer selects at any given time.
So how might this work, detail-wise? All this code's on GitHub, and I would suggest checking it out there. But at a high level, you want to grab the coordinates of the feed that you want to show. I have the feeds just named zero through three, and you want to lay those coordinates out. In this example, the array is the source X position, source Y position, source width, and source height, so what you want to chop out. Zero, zero is the top left. Half the video width, zero is the top right quadrant, and so on and so forth. And since these are quadrants, each one is half the video width and half the video height.
Then you use these when you're updating the canvas. When you draw the image to the canvas, you pass the video in, and then you pass these coordinates that we just grabbed, which says, okay, only draw the top right quadrant of the video into the canvas. And then you call requestAnimationFrame over and over and over again.
If you want to go have a play, it's just video-sprites.mux.dev, and the source is all on GitHub.com slash muxlab slash video sprites. But real quick, I wanted to show you what this looks like. This is just video-sprites.mux.dev. I just have a VOD asset in here, but this could also work with live, whatever else. Here's some smooth stream music. So it works surprisingly well.
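The quadrant cropping described in the talk can be sketched roughly as follows. This is a minimal illustration, not the code from the demo repository: the function names (`quadrantRect`, `drawFeed`, `startRenderLoop`) are hypothetical, and the feed numbering (0 top left, 1 top right, 2 bottom left, 3 bottom right) is an assumption. The only browser APIs it relies on are the nine-argument form of `drawImage` on a 2D canvas context, the `videoWidth`/`videoHeight` properties of a video element, and `requestAnimationFrame`.

```javascript
// Hypothetical helper: source rectangle [sx, sy, sw, sh] for one feed in a 2x2 grid.
// Assumed numbering: 0 = top left, 1 = top right, 2 = bottom left, 3 = bottom right.
function quadrantRect(feedIndex, videoWidth, videoHeight) {
  if (!Number.isInteger(feedIndex) || feedIndex < 0 || feedIndex > 3) {
    throw new RangeError(`feedIndex must be 0-3, got ${feedIndex}`);
  }
  const sw = videoWidth / 2;                  // each quadrant is half the width
  const sh = videoHeight / 2;                 // and half the height
  const sx = (feedIndex % 2) * sw;            // column 0 or 1
  const sy = Math.floor(feedIndex / 2) * sh;  // row 0 or 1
  return [sx, sy, sw, sh];
}

// Draw only the selected quadrant of the combined video, scaled to fill the canvas.
function drawFeed(ctx, video, feedIndex) {
  const [sx, sy, sw, sh] = quadrantRect(feedIndex, video.videoWidth, video.videoHeight);
  ctx.drawImage(video, sx, sy, sw, sh, 0, 0, ctx.canvas.width, ctx.canvas.height);
}

// Redraw on every animation frame. getFeedIndex is a callback (an assumption of
// this sketch) that returns whichever feed the viewer currently has selected.
function startRenderLoop(ctx, video, getFeedIndex) {
  function tick() {
    drawFeed(ctx, video, getFeedIndex());
    requestAnimationFrame(tick);
  }
  requestAnimationFrame(tick);
}
```

In a browser this might be wired up as `startRenderLoop(canvas.getContext("2d"), videoElement, () => selectedFeed)`, where `selectedFeed` is updated by whatever UI lets the viewer pick an angle. Because every quadrant comes from the same decoded frame and the audio track is shared, switching feeds only changes which rectangle gets copied.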
The gotcha I would call out is that you're only going to be able to broadcast a subset of whatever your top-level resolution is. If your streaming service is limited to 1080p, for example, then each feed can only be a quarter of that, which is fine for most examples. Once 4K is supported by a lot more services, you could do 1080p for each quadrant. That's the one thing to keep in mind, along with all the other gotchas with canvas.
But thanks so much for listening. If you decide to use this at all, please let me know; shoot me a note at Matt McClure on Twitter. Otherwise, if you want any video stuff, check out mux.com. All right. Thanks, everybody.