Getting Weird with Video Manipulation and HTML5 Canvas


In this lightning talk we will be pushing the boundaries of HTML5 Canvas browser APIs. Join us while we do some experiments with video in the browser to see what's truly possible. DISCLAIMER: No promises of cross-browser compatibility. Not for the faint of heart. Must be this tall to ride. Attending this lightning talk may result in serious injury or death. All participants must bring a life jacket.


Hello everyone at React Summit. I'm very excited to be talking to you here today. We're going to be talking about the Canvas and HTML5 video APIs and some cool stuff we found that you can do with them. So, quick intro: I'm Dylan Jhaveri. I work at Mux. If you've not heard of Mux, Mux is video for developers. Maybe you know of Stripe, which is payments for developers, or Twilio, which is phone calls and text messages for developers. We like to be like those companies, built first with developers in mind, trying to make great, easy-to-use APIs, but we do this for video. So we make all kinds of tools and products and APIs for developers working with video. I'm not going to talk too much more about Mux today, but if you are interested, come talk to me. I'd love to chat with you. Cool, so now let's jump into some code. I have this CodeSandbox set up. CodeSandbox is a great tool, by the way; it's become one of my favorite pieces of software. I think there are some CodeSandbox folks here at this conference, so shout out to you all, I love this product. And I'll be sharing this after, so you can fork it, play with the code, and do things yourself. Let's start out with a really simple demo. This is a very straightforward React app. We have a few different routes, one for each of the five examples I'm going to show, and we're using React Router and React DOM. Let's start with the first one, the simple demo. Right here we have simple.js. This is the component that we're rendering. We have this player component, and then we have this canvas. Right now you can't see the canvas on the page, but that's what we'll be manipulating and doing some fun stuff with as we go along. So real quickly, let's take a look at this player component. This player component is really just a video element. But if you're familiar with video, how many of you all have done video on the Internet?
So video streaming, video on demand, or live streaming, anything like that. You might have used the video element before, maybe with an MP4 file. That can kind of work, but when you really want to do video streaming properly, what you need to do is use something like HLS. HLS is a technology that lets you download video in segments, at different bit rates and different quality levels, according to the user's bandwidth. That's something Mux does for you. We're not going to get too deep into that, but it's what we're using here in this video player. So this is really just an HTML5 video element, and then we're attaching some extra JavaScript to give it some HLS capabilities. And when the play event fires, which is when playback begins on the video, we're going to call this onPlayCallback. So let's jump back into the component that's rendering this page. I'll zoom in a little bit here to make sure you can see. Right here we have the player's onPlayCallback, and when that fires, see what happens: the video is playing in the video element, and it's being duplicated on this canvas element right below. So let's jump into some of this code. When onPlay is called, we grab the video element, and we create this context ref. This is sort of a handle onto the canvas element; we can call functions on that context that allow us to manipulate the canvas element and change how it's displayed. That's our hook into manipulating the actual canvas itself. On play we call requestAnimationFrame with updateCanvas, and all that's going to do is call this one-liner, drawImage. We pass the video element into it, and this tells the canvas to draw this image onto the canvas; these are the coordinates where to start, and these are the dimensions to draw. And we actually call this recursively.
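The copy loop described above can be sketched roughly like this. This is a minimal sketch with an assumed helper name (`startCanvasCopy`); the demo's actual component wires the same steps into the player's onPlay callback.

```javascript
// Hypothetical helper: once playback starts, copy each video frame
// onto the canvas on every animation frame.
function startCanvasCopy(video, canvas) {
  const ctx = canvas.getContext('2d');
  function updateCanvas() {
    // Draw the current video frame at (0, 0), scaled to the canvas size.
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    // Schedule the next copy so the canvas stays in sync with playback.
    requestAnimationFrame(updateCanvas);
  }
  updateCanvas();
}
```

In a React component this would be called from the player, for example `onPlay={() => startCanvasCopy(videoRef.current, canvasRef.current)}`, with `videoRef` and `canvasRef` as assumed ref names.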
So every time this runs, we call requestAnimationFrame again, and the callback calls updateCanvas again. You can see what's happening: we're basically copying that video element down onto the canvas right below it. So that's how that works. A quick review of what we did there: take the video element, copy each frame, and draw the frames onto the canvas. Pretty simple, right? Now let's take this one step further and go to the filter example. Click play, and okay, same kind of thing, but you can see something else is going on here. We have the same kind of callback, updateCanvas, and what we do is draw the image onto the canvas, then extract the image data off the canvas. Now we have raw image data that we can actually manipulate and work with. We iterate through that image data and mess with the color values: if we average out the red, green, and blue values, that gives us this grayscale effect. So we're manipulating the video frame by frame and then redrawing each frame back onto the canvas, and you can see it has that effect, and the canvas always stays synced with the frame of video that the video element is rendering. Okay, pretty cool, right? So let's look at the steps, where we took this a little bit further. Instead of just drawing each frame onto the canvas, after we do that, we're extracting the frame, manipulating the colors into grayscale, and then redrawing it back onto the canvas. Okay, so now we have a few more examples. Let's see what else we can do; it's going to get better and better each time. This is Layla, my coworker Phil's dog, and let's look at this example. Now in the updateCanvas function, we draw the image, and then we're just going to add this context fill.
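The grayscale step above can be sketched like this, with assumed helper names; the averaging trick is exactly as described, replacing each pixel's red, green, and blue with their mean.

```javascript
// Replace each pixel's R, G, B with their average to get grayscale.
// `data` is the flat RGBA array from ImageData (4 entries per pixel).
function toGrayscale(data) {
  for (let i = 0; i < data.length; i += 4) {
    const avg = (data[i] + data[i + 1] + data[i + 2]) / 3;
    data[i] = avg;     // red
    data[i + 1] = avg; // green
    data[i + 2] = avg; // blue
    // data[i + 3] (alpha) is left untouched
  }
  return data;
}

// Inside the requestAnimationFrame loop: draw, extract, filter, redraw.
function updateCanvasWithFilter(ctx, video, width, height) {
  ctx.drawImage(video, 0, 0, width, height);
  const frame = ctx.getImageData(0, 0, width, height);
  toGrayscale(frame.data);
  ctx.putImageData(frame, 0, 0);
}
```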
We call the fillText method on the canvas context. What we're doing there is adding text on top of the canvas: we're rendering the video image into the canvas and then drawing text on top. Now you can imagine this could get pretty useful, right? If we hid the video element, played it, and drew it onto the canvas, we could do all these cool things like adding text in real time, frame by frame, on the client side in the browser, all with these browser APIs. So that's where we're adding a name. Let's see what else we can do. Okay, now let's get into this one, called classify. What we've seen so far is that we can grab individual frames from the video in real time, draw them onto a canvas, and manipulate them before we draw them, right? So what else can we do when we have a raw frame of video? If you don't recognize this video, it's Big Buck Bunny. It's sort of the canonical Hello World video in the video streaming community. I've watched this video way too many times, and it makes a good example, so I'm going to use it for the purposes of this classify demo. Let's push play here. What's happening is that on every frame of the video, we're running some machine learning object detection on the image, and once we've detected an object, we're drawing a rectangle onto the frame. Right now it thinks this is a person. We go a little further, and now it thinks it's a bird. So we're detecting, frame by frame, what's going on with the objects in this video. Let's take a look at the code. We draw the image onto the context.
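The text-overlay version of the update loop might look like this. It's a sketch with assumed font, color, and placement values; the demo's actual styling may differ.

```javascript
// Draw the current frame, then paint a caption on top of it.
function updateCanvasWithLabel(ctx, video, width, height) {
  ctx.drawImage(video, 0, 0, width, height);
  ctx.font = '48px sans-serif';
  ctx.fillStyle = 'white';
  // fillText(text, x, y): draw the label near the bottom-left corner.
  ctx.fillText('Layla', 20, height - 20);
}
```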
We extract the image data, the same image data we were manipulating the colors of before, and we have this extra call here, model.detect, which we pass that image data into. Model comes from the TensorFlow COCO-SSD model, a TensorFlow model that does object detection on images. It's made to work with images, and when we pass in the image data we've extracted from the canvas, it runs the object detection and sends back an array of predictions, as they call it. Once we have an array of predictions, we can pass those into this outline function, which maps over the predictions. Each one has the X and Y coordinates and the width and height of a bounding box, and then we can draw those boxes with their labels directly onto the canvas element we're already using to render the video. So you can see it thinks it's a bird. Well, it still thinks it's a bird. And dog: we saw there was a dog there for a second. Here it thinks that's a sports ball. So, you know, it's not the most accurate object detection for this animated content. Now it's a sheep. It kind of looks like a sheep. But we're actually able to do some pretty cool stuff, and remember, this is happening in real time. A lot of times when you're doing object detection on a video, you would do it out of band on a server once the video is finalized. But imagine this was a live stream: we'd be able to run this on the client and detect objects in real time, and the sky's the limit there. We could do all kinds of things with the detections. Let's look at one more example of the classification. Let's pull up Layla, Phil's dog, again, and you can see that for real live video, TensorFlow sees that that's a dog.
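Assuming `model` is a loaded COCO-SSD model, whose `detect` resolves to predictions shaped like `{ bbox: [x, y, width, height], class, score }`, the per-frame loop and the box-drawing step can be sketched as follows. The helper names here are assumptions, not the demo's actual identifiers.

```javascript
// Draw a labeled bounding box for each prediction.
function outlinePredictions(ctx, predictions) {
  predictions.forEach(({ bbox: [x, y, w, h], class: label }) => {
    ctx.strokeStyle = 'red';
    ctx.strokeRect(x, y, w, h);
    ctx.fillStyle = 'red';
    // Put the label just above the box (or inside it near the top edge).
    ctx.fillText(label, x, y > 12 ? y - 4 : y + 12);
  });
}

// Per frame: draw the frame, run detection on its pixels, outline the hits.
async function updateCanvasWithDetection(ctx, video, model, width, height) {
  ctx.drawImage(video, 0, 0, width, height);
  const frame = ctx.getImageData(0, 0, width, height);
  const predictions = await model.detect(frame);
  outlinePredictions(ctx, predictions);
}
```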
It's actually pretty good at detecting real-life things. Animated things, like giant animated bunnies, maybe not so much, but a dog it can get. So, to really quickly review what we did there: the key part to pay attention to is that once we get images into a canvas, we can extract that raw image data, and in this red circle, where we're doing live object detection, you could replace that with anything. Manipulate the colors, add text overlays, and then redraw the frames back onto the canvas with all the canvas APIs that are available. That's what we did there. Now let's take a quick look at a real-world use case of this. At Mux, we actually used this on our marketing website recently. We recently did a design refresh of our marketing site, and we have this API demo in the top hero section, and you can see what's going on here. Previously on our marketing site, before this iteration, we had a similar sort of API demo, but it was all one video. So imagine if all of this here, with this device and the browser popping out, was just one video. That worked pretty well, but we wanted to make it better this time. You'll notice that as I hover over this, it pops out. If I hover over the browser, the browser pops out. I can copy text here. I can interact with it. That's what we wanted: say a developer comes here and wants to copy this text, we wanted to make it more interactive. We also have these bleeding colors in the back, and we want those to bleed outside the bounds of this element, into the top header and into the bottom. If this was just a static video, we wouldn't be able to get that effect. I have a Storybook example here of how we were able to pull this off, using the strategy I described. So let's inspect these elements.
If we inspect these elements, you can see that this right here is a canvas. Let me replay this. And then we see that this right here is another canvas. And if we look further down in the DOM, we can see that there's a video element. This is the video element that is streaming the video, and we're copying the frames of that video and rendering them to these two canvas elements in real time. Now, the benefits of that strategy: alternatively, we could pull off the same design by making this browser one video element and this device another video element. That would work okay, except for two downsides. Number one, we'd be double-streaming the same video, which doubles the bandwidth: more bandwidth for the user, more video data being downloaded. That seems unnecessary and repetitive. Number two, the two videos could get out of sync. If one video buffers because you're on a slow connection and the other one hasn't buffered yet, playback drifts out of sync. We'd probably have to write some JavaScript to keep the playheads aligned and in sync, and that seems buggy, not a great solution. So what we did is apply the strategy of taking one video element, grabbing the frames from it, and rendering them to the canvases. That way the two canvases always stay in sync, and we're only downloading the video once. It works well. Let's play this one more time. That's the solution we came to. You'll notice now I can hover over this and over this, the devices pop out, it's more interactive, and I can copy code. And this video is a happy birthday video for React Summit. It's a video I found online of kids crying when they blow out their birthday candles, and it's kind of funny. So happy birthday, React Summit. I'm excited to be here, excited to talk with you all.
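The one-video, many-canvases approach can be sketched like this. The helper name is hypothetical; the point is that the hidden video element is the single source of truth, so the canvases can never drift out of sync and the video is only downloaded once.

```javascript
// Mirror one (possibly hidden) video element onto several canvases.
function mirrorToCanvases(video, canvases) {
  const targets = canvases.map((canvas) => ({
    canvas,
    ctx: canvas.getContext('2d'),
  }));
  function update() {
    // Every canvas gets the same frame on the same tick,
    // so they all stay in sync with a single video download.
    targets.forEach(({ ctx, canvas }) =>
      ctx.drawImage(video, 0, 0, canvas.width, canvas.height)
    );
    requestAnimationFrame(update);
  }
  update();
}
```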
And if you have anything to talk about with video, I'd love to chat. If you're adding video to your product, building video, or doing cool things, please come chat with me. Thanks for having me. Find me on Twitter at dylanjha. And that is the end. Thank you.
16 min
17 Jun, 2021
