Synchronizing multiple streams, particularly multiple live streams, and especially when the audio is critical too, can be incredibly difficult. How, then, could you build an experience where you have a few camera angles and you want a viewer to be able to seamlessly pick one? We'll walk through a hack technique that allows you to do just that without ever having to worry about synchronization.
Remember CSS Sprites? Let's do that with video!
AI Generated Video Summary
Today's Talk discusses the use of video sprites, which are similar to CSS sprites and are commonly used for optimization. Video sprites can be used to synchronize and select multiple video feeds, allowing viewers to choose the desired feed. Smooth streaming and broadcasting limitations are also mentioned, highlighting the need to consider resolution constraints. Overall, the Talk provides insights into the benefits and techniques of using video sprites for feed synchronization and optimization.
1. Introduction to Video Sprites
Today we'll talk about using video sprites, similar to CSS sprites. A sprite is an image with multiple images in it, allowing the client to choose which parts to show. This technique is commonly used for optimization. For example, video game sprites combine all states of a character into one image. The same technique can be applied to videos, where multiple videos are combined into one and the desired one is shown. This technique is useful for feed synchronization, such as streaming concerts with different camera angles. Viewers can choose which feed to watch.
Hi everyone, today I want to talk about something a little hacky I've been thinking about lately, but the idea here is that we want to use video sprites in the same way that a lot of you might have used CSS sprites back in the day.
So we'll talk about what I mean by a lot of this, but hello, I'm Matt McClure. I am one of the co-founders of a company called Mux and I run the Developer Experience org there. And in a nutshell, we build awesome video infrastructure for developers. So if you're looking for a video API, we're there, so check us out.
So okay, taking a step back, what is a sprite? So in a nutshell, this is an image with a bunch of images in it. So the client gets this big combined image and then can just pick and choose which parts of it to show. So if you're relatively new, then you might not have seen this as widely used, but this is a really common optimization technique from the early aughts.
So if you had a button with hover states and depressed states and active states, then you would just send one button image and then use your CSS background to decide which part of that image to show. So this is a little more familiar if you started out back then. You might remember it.
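To make the button example concrete, here is a minimal sketch of how a page might pick one state out of a sprite sheet. The sheet layout (states stacked vertically, each the same height), the state names, and the function `backgroundPositionFor` are all assumptions for illustration, not something from the talk.

```typescript
// Hypothetical sprite sheet: one image with the button states stacked
// vertically, each state exactly `stateHeight` pixels tall.
type ButtonState = "normal" | "hover" | "active";

// Assumed top-to-bottom order of the states inside the sheet.
const STATE_ORDER: ButtonState[] = ["normal", "hover", "active"];

// Returns a CSS `background-position` value that shifts the sheet upward
// so the requested state lines up with the button's visible box.
function backgroundPositionFor(state: ButtonState, stateHeight: number): string {
  const offsetY = STATE_ORDER.indexOf(state) * stateHeight;
  return `0px ${-offsetY}px`;
}
```

In a page, the result would typically be assigned to `element.style.backgroundPosition` on hover or press, so only one image is ever downloaded.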
But to show this a little bit more concretely, a lot of people know about this from video game sprites. So all of Mario's states are in one image. And then the client viewport then just shows the state of Mario it wants. So you might be wondering what in the world does this have to do with video? The gist here is the same technique works. You can send a few videos combined into one and then just show the one you care about in the player.
So you might be wondering, like, why in the world would we be doing this? And I would say, you know, use your imagination. There could be a bunch of examples of this, like, I think, sports, sporting events, concerts. But the biggest example that comes to mind, and what we see the most from customers who want to do stuff like this, is around feed synchronization, particularly being able to pick between these different feeds in a synchronized way. So let's say you're streaming live concerts, live music. So you've got a bunch of different cameras streaming a bunch of different angles. So one's on the drummer, one's on the singer, one's on the audience. And a producer on site is typically deciding which one of those feeds to show at any given time.
So they might do a nice little transition, go from the drummer to the audience, et cetera. That producer then sends a feed to an encoder service or whatever else that looks like. I'm using Mux as an example here for obvious reasons. But that service then broadcasts that to all of your end viewers. So then those viewers start saying, like, actually, I just want to watch the drummer all the time, and I hate the transitions that this producer is doing. So they want the power to be able to pick which feed they watch. So you decide, okay, how can we go about building this out?
2. Synchronizing and Selecting Video Feeds
To synchronize multiple live video feeds, combine them into one video and allow viewers to select the desired feed. This can be achieved by sending all camera feeds to a local encoder box, which lays them out in quadrants and sends them to the encoder broadcaster service. Viewers can then choose which feed to watch, ensuring synchronized feeds and a shared audio stream. The process involves grabbing the coordinates of the desired feed, laying out the coordinates, and updating the canvas with the selected quadrant of the video. For more details and code examples, visit video-sprites.mux.dev and the GitHub repository at github.com/muxlabs/video-sprites.
So you start thinking, okay, I'll send every camera directly to that encoder or broadcast service. And then every viewer can get all the feeds. In this example, three feeds. And this is where things really get hairy if you start going down this path. So now you've got three different live streams that people can watch. But how do you switch between them? So do people just click another feed? And then you might be a few seconds off in terms of audio for all of them. So it can be tough to synchronize that in the client, or honestly, next to impossible to do that well.
So one solution would just be to send one video again. So like you were doing before, but instead of that video being produced, you just combine all the feeds at that level and then send them along. So in this example, all the cameras go into that one encoder box locally. It just lays them out in four quadrants, sends those to the encoder broadcaster service, and that goes out to all the viewers. And then from there, the viewers can then pick which one they want. So now you're guaranteed that your feeds are synced. You only have to worry about one audio stream that's shared between all of them for obvious reasons, and then you only show the quadrant of the video that the viewer selects at any given time.
So how might this work detail-wise? So all this code is on GitHub. I would suggest just checking it out there, but at a high level, you want to grab the coordinates of the feed that you want to show. So I have the feeds just named like zero through three. And you want to lay those coordinates out. The coordinates in this array are the source X, source Y position, source width and source height, so what you want to chop. So zero, zero is the top left. Half the video width, zero is the top right quadrant, and so on and so forth. And since these are quadrants, each one is just half the video height and half the video width. And then you use these when you're updating the canvas. So when you draw that image to the canvas, you pass the video image in, and then you pass these coordinates that we just grabbed, which will then say like, okay, only draw the top right quadrant of the video into the canvas. And then, you know, call requestAnimationFrame over and over and over again.
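The steps just described can be sketched roughly as follows. This is not the code from the repository, just a minimal illustration under a few assumptions: feeds 2 and 3 are taken to be the bottom-left and bottom-right quadrants (the talk only spells out 0 and 1), and the names `quadrantRect`, `startRendering`, and `getFeed` are hypothetical. The only browser APIs used are `CanvasRenderingContext2D.drawImage` in its nine-argument form and `requestAnimationFrame`.

```typescript
// [sx, sy, sWidth, sHeight]: the region to crop out of the combined video.
type SourceRect = [sx: number, sy: number, sWidth: number, sHeight: number];

// Maps a feed number (0-3) to its quadrant of the combined frame.
// Assumed order: 0 top left, 1 top right, 2 bottom left, 3 bottom right.
function quadrantRect(feed: number, videoWidth: number, videoHeight: number): SourceRect {
  if (!Number.isInteger(feed) || feed < 0 || feed > 3) {
    throw new RangeError(`feed must be an integer from 0 to 3, got ${feed}`);
  }
  const halfW = videoWidth / 2;
  const halfH = videoHeight / 2;
  const col = feed % 2;             // 0 = left column, 1 = right column
  const row = Math.floor(feed / 2); // 0 = top row, 1 = bottom row
  return [col * halfW, row * halfH, halfW, halfH];
}

// Browser-only part: repeatedly copies the selected quadrant of the video
// onto the canvas. `getFeed` returns whichever feed the viewer has picked.
function startRendering(
  video: HTMLVideoElement,
  canvas: HTMLCanvasElement,
  getFeed: () => number,
): void {
  const ctx = canvas.getContext("2d");
  if (!ctx) throw new Error("2D canvas context is not available");

  const draw = (): void => {
    // videoWidth is 0 until metadata has loaded, so skip drawing until then.
    if (video.videoWidth > 0) {
      const [sx, sy, sw, sh] = quadrantRect(getFeed(), video.videoWidth, video.videoHeight);
      // Crop the source rectangle and stretch it over the whole canvas.
      ctx.drawImage(video, sx, sy, sw, sh, 0, 0, canvas.width, canvas.height);
    }
    requestAnimationFrame(draw);
  };
  requestAnimationFrame(draw);
}
```

Because every quadrant comes from the same decoded frame, switching feeds only changes which rectangle is cropped. The audio keeps playing from the single underlying video element, which is why nothing has to be resynchronized.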
So if you want to go have a play, it's just video-sprites.mux.dev. And then the source is all on github.com/muxlabs/video-sprites. But real quick, I wanted to just show you what this looks like. This is just video-sprites.mux.dev. I just have a VOD asset in here, but this could also work with live, whatever else.
3. Smooth Streaming and Broadcasting Limitations
Here's some smooth stream music. Keep in mind that you can only broadcast a subset of your top level. If your streaming service is limited to 1080, each quadrant can only be a quarter of that. Once we get up to 4K, you could do 1080 for each quadrant. Thanks for listening. If you decide to use us, let me know at Matt McClure on Twitter. For video stuff, check out mux.com.
Here's some smooth stream music. So, it works surprisingly well. The gotchas here that I would call out are that you should keep in mind that you're only going to be able to broadcast a subset of whatever your top level is. So, if your streaming service is limited to 1080, for example, then each one can only be a quarter of that, which is fine for most examples.
Once we get to the point where a lot of services really support 4K, then you could do 1080 for each quadrant. But that's the one thing to keep in mind, along with all the other gotchas for Canvas.
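The resolution trade-off is simple arithmetic, sketched below. It assumes "1080" means a 1920x1080 frame and "4K" means 3840x2160, and that the feeds sit in a 2x2 grid. The names `Resolution` and `perFeedResolution` are made up for this example.

```typescript
interface Resolution {
  width: number;
  height: number;
}

// Each feed gets one cell of the grid, so its resolution is the combined
// frame divided by the number of columns and rows (2x2 for quadrants).
function perFeedResolution(combined: Resolution, cols = 2, rows = 2): Resolution {
  return {
    width: Math.floor(combined.width / cols),
    height: Math.floor(combined.height / rows),
  };
}

const fullHd: Resolution = { width: 1920, height: 1080 };
const uhd4k: Resolution = { width: 3840, height: 2160 };

console.log(perFeedResolution(fullHd)); // { width: 960, height: 540 }
console.log(perFeedResolution(uhd4k));  // { width: 1920, height: 1080 }
```

So "a quarter" here means a quarter of the pixels: half the width and half the height of whatever the service's top rendition is.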
But, thanks so much for listening. If you decide to use us at all, please let me know. Shoot me a note at Matt McClure on Twitter. And otherwise, if you want any video stuff, check out mux.com. All right.