Create AR Face Filters With the Chrome Face Detection API


In the fast-paced space of social media apps, some functionality can also be used in web applications. I am going to show you how you can use the feature-flagged Face Detection API in Chrome. With a demo, we will explore the possibilities of implementing face filters in your future projects. Using a device's webcam access, we add glasses by processing a video feed, finding that sweet spot where fun and learning come together.

30 min
01 Jun, 2023

Video Summary and Transcription

The Chrome Face Detection API is part of the bigger Shape Detection API, which also includes text detection and barcode detection. Enabling the API is as simple as opening a specific URL and enabling the experimental feature. The API provides features like detecting faces and processing landmarks, rendering glasses on faces, and applying face filters. It can handle multiple faces and images in videos, but performance depends on hardware and device processing speed. The API is currently in progress, and feedback is being requested to shape potential production capabilities.


1. Introduction to Chrome Face Detection API

Short description:

Welcome. My name is Jorrik Leijnsma. I'm a creative front-end developer and I'm going to share something about the Chrome Face Detection API. Developers come with a certain image: sometimes we are a bit nerdy, but we are also problem solvers. Code can be fun, and what you create with code is often seen as fun. AI has been introduced into the scene, and users are more demanding, expecting more features. The Chrome Face Detection API is part of the bigger Shape Detection API.

Welcome. My name is Jorrik Leijnsma. As Caroline said, I'm a creative front-end developer and I'm going to share something about the Chrome Face Detection API. It's a mouthful, but I managed to say it okay.

So, I'm a front-end developer, and I've worked as a front-end developer for over six years now, the last two years at a company called Ordina. And we are developers, I think most of us here are. And developers come with a certain image. Sometimes we are a bit, like, nerdy. But we are also problem solvers.

But there's also this shift happening where code can be fun, and what you create with code is more and more often seen as fun. The code part itself, though, is still seen as boring. And that's where it gets interesting: you want not only a great end result, but also code that isn't boring. So maybe raise your hands if you have seen boring code somewhere in your life. Yeah. I think most of us have. Or have you seen code that was not boring, that was more interesting? Maybe also raise your hands. A few less, but still there. And the last question: have you seen code that was not functional at all, but was purely for fun? So, also some hands over there.

In recent years, with all kinds of developments, AI has been introduced into the scene. Users are more demanding; they expect more and more features. And you have the problem of maybe not being noticed if you don't have these amazing features in your app or service. So you need to stand out and show what you can do and what's possible. This Chrome Face Detection API is behind a feature flag, meaning it's not usable right out of the box. There is a process for this Shape Detection API, and the Chrome Face Detection API is part of that bigger Shape Detection API.

2. Enabling the Chrome Face Detection API

Short description:

The Shape Detection API also includes text detection and barcode detection. It's important to note that face detection does not mean face recognition. Enabling the Chrome feature flag is as simple as opening a specific URL and enabling the experimental feature. You can then use the face detector by constructing a new FaceDetector and setting the maxDetectedFaces and fastMode properties.

It includes text detection and also barcode detection. The barcode detection is already done; it's ready for everyone to use. But face detection and text detection are still being developed. An important note here: face detection does not mean face recognition. So it detects your face, but it is not able to recognize whose face it is, as in "this is this person" or "that's that user logging in". It only tells you there is a face in this piece of media.

So how do you enable the Chrome feature flag? There's this link; when you open it... let me try if it works. No, it needs to open in Chrome; let me copy-paste it in. So over here, now it should be visible: this is what you see if you open that URL. You're seeing this experimental features page, and you just need to enable the flag. It comes with some other experimental features as well, so keep that in mind. I recently had some trouble with selecting text on GitHub, for example, which was not rendering properly, and that also seems to be part of something buggy going on in this web platform stuff. So that's how you enable this feature. Like this. That was way easier. Now you have it enabled in your browser, and you can use it.

To use it, you start by constructing a new FaceDetector. It's as simple as that. It can take two properties: maxDetectedFaces, which you can set to, I think, any number, although at a certain point your machine is not going to be fast enough to detect all those faces; and fastMode, which can be true or false.
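
As a minimal sketch (assuming the Shape Detection API has been enabled via Chrome's chrome://flags/#enable-experimental-web-platform-features flag), constructing the detector looks like this:

```js
// Construct a detector; both options are optional.
const faceDetector = new FaceDetector({
  maxDetectedFaces: 2, // upper bound on the number of faces to report
  fastMode: false      // false trades speed for richer landmark data
});
```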

3. Detecting Faces and Processing Landmarks

Short description:

The Chrome Face Detection API provides more features if you do not use fast mode. With the detect function, you can retrieve all the detected faces. The faces object returned from detect includes landmarks such as eyes, nose, and mouth, plus a bounding box. These landmarks provide location points on a 2D plane. To prepare the image, create a new image and add the necessary functions. Check if face detection is available and if the images are present. After detecting the faces, further processing is required. The canvas is used to draw lines and needs proper styling to display correctly.

It's basically about the speed of detection. You get more features if you do not use fast mode: more data back about the face. Otherwise, you get a more narrow version back.

So now we have this face detector constructed, and now we can detect some faces. This is everything that you need: with this detect function, you get back all the faces that the API finds. This slide and the previous slide are the only things that you need from the API. All the other slides will be functions added to do something with these faces, because we need to put in an image, as you see, and we need to do something with the faces. We now have a function that gives us all the faces, but what do we do with them?
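
The call itself is a one-liner; a sketch, assuming `imageElement` is any image-like source (an img, canvas, video, or ImageBitmap):

```js
// detect() returns a promise resolving to an array of detected faces.
const faces = await faceDetector.detect(imageElement);
```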

So let's say I have this picture, and you see a face. The face has eyes, a nose, a mouth, and a bounding box around the face. And you can see it detects those landmarks, and they are returned in this type of object. That is the faces object returned from detect. You get the location of the points for a particular type of landmark, so here you see some of the points from the right eye. One of those points is detected, and it sits on the line that surrounds the eye. You get those points back on a 2D plane starting at the image origin, the top left, with values increasing to the right and downwards, and you can do stuff with them.
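
Illustratively, one entry in the returned array is shaped roughly like this (the field names come from the Shape Detection API; the coordinate values are made up):

```js
// One detected face, as returned by detect():
// {
//   boundingBox: DOMRectReadOnly { x: 226, y: 78, width: 154, height: 154 },
//   landmarks: [
//     { type: 'eye',   locations: [ { x: 255, y: 117 }, /* ... */ ] },
//     { type: 'eye',   locations: [ { x: 322, y: 116 }, /* ... */ ] },
//     { type: 'nose',  locations: [ /* ... */ ] },
//     { type: 'mouth', locations: [ /* ... */ ] }
//   ]
// }
```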

So first of all, we need to prepare the image. I created a simple image tag and imported an image. Now I need to create a bitmap from that image, and I need to add some functions to make sure it works. I'm checking whether face detection is available, because otherwise, when you don't have the feature flag enabled, for example, it will throw errors, so you want to know it's there. After that, we can check if the images are there, so we know all the data is available. And then we can just detect the faces and work with what comes back. These steps are needed to get the image ready to put into the detector, which will result in these kinds of landmarks. But as I showed, this was only getting the image into the detection part; we still need to do something with those landmarks.
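
A sketch of that preparation, under the assumption that the page contains a single img element (the `run` wrapper is my own):

```js
const img = document.querySelector('img');

async function run() {
  // Guard: the constructor only exists when the feature flag is enabled.
  if (!('FaceDetector' in window)) {
    console.warn('FaceDetector is not available in this browser.');
    return;
  }
  await img.decode(); // make sure the image data has loaded
  const bitmap = await createImageBitmap(img);
  const faceDetector = new FaceDetector({ fastMode: false });
  const faces = await faceDetector.detect(bitmap);
  // ...hand the faces off to the rendering code
}

run();
```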

So what happens next? We need the canvas, because on the canvas we are going to draw the lines. The canvas needs some styling so that everything displays properly: the canvas has to match the image size and sit on top of the image itself. And with that, we're going to initialize some things.
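
For example, a minimal overlay setup could look like this (assuming the canvas is absolutely positioned over the image in CSS):

```js
const canvas = document.querySelector('canvas');

// Size the canvas to the image so landmark coordinates line up 1:1.
canvas.width = img.naturalWidth;
canvas.height = img.naturalHeight;

// The 2D context is what we draw the lines and glasses with.
const ctx = canvas.getContext('2d');
```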

4. Rendering Glasses on the Face

Short description:

We initialize the image and do something with glasses. We define the scale and render everything onto the canvas. We have two functions to apply something with the landmarks. We calculate the average coordinates of the landmarks and filter out the eyes. We find the right distance between two points, scale everything, and put the values back into the SVG. We draw the image with rotation, and the glasses are rendered on the face. We open localhost and see an example with Albert Einstein.

So here you see some parts where we initialize the image, but also get the element and do something with glasses; we'll come to that later. With this, we also need to do some math. We need to define the scale of the image, for example, because that bitmap may have a different size, so we need to set all the properties right.

After we've done that, we are going to render everything into the canvas, where you get the 2D context back. You need that for rendering those points into the canvas. When you have that, we come to this faces part, where I have two functions that apply something with those landmarks. I created those functions, and we are going to look at them.

First of all, from this part of the code, with those functions called, I got these glasses on the picture. It is still a static picture, but it's still fun to do. But how do I get the calculations for those glasses? We'll get to that now. I did some math magic. I needed to get the average coordinates of those landmarks, because, as you saw, each landmark is this big line with a number of points, and I need the average of all those points so I know the middle point of my eye.

Then, maybe it's a bit small to see, but I need to filter the landmarks, because I needed the eyes. Sometimes they mix up the eyes, or they just name them "eyes" instead of eye left and eye right, so I need to filter that out to know which coordinates match which landmark. After that, I found some math on the internet for calculating the distance between two points and scaling everything properly, and I put all those values back into the SVG. And with that, I apply some things: I draw the image, I apply the rotation, and now I have those glasses rendered as an image inside that canvas. With that, it shows the glasses on the face. So let's see how that looks. I'm going to open the localhost and move the Chrome browser back over. So here I have an example. This is Albert Einstein.
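
One way to do that math, sketched under the assumption that `face` is one detected face, `glassesImg` is the cut-out glasses image, and `ctx` is the 2D context from the canvas setup above (those names, and the `centerOf` helper, are mine):

```js
// Average a landmark's outline points to get its centre.
function centerOf(landmark) {
  const pts = landmark.locations;
  const sum = pts.reduce((a, p) => ({ x: a.x + p.x, y: a.y + p.y }), { x: 0, y: 0 });
  return { x: sum.x / pts.length, y: sum.y / pts.length };
}

// Keep only the eye landmarks and order their centres left-to-right.
const eyes = face.landmarks.filter((l) => l.type === 'eye').map(centerOf);
const [left, right] = eyes.sort((a, b) => a.x - b.x);

// Distance and rotation between the two eye centres.
const eyeDistance = Math.hypot(right.x - left.x, right.y - left.y);
const angle = Math.atan2(right.y - left.y, right.x - left.x);

// Draw the glasses rotated around the midpoint between the eyes.
const width = eyeDistance * 2; // scale factor tuned by eye
ctx.save();
ctx.translate((left.x + right.x) / 2, (left.y + right.y) / 2);
ctx.rotate(angle);
ctx.drawImage(glassesImg, -width / 2, -width / 4, width, width / 2);
ctx.restore();
```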

5. Applying Face Filter with Glasses on Einstein

Short description:

I applied a face filter with glasses on top of the face of Einstein. I'll show you the code. I have the image and canvas, and with that, I do some stuff. I open up the image of Einstein, detect the faces, and apply landmarks. I update the bottom canvas to show specific elements. I have a list of persons and configurations to choose what to put on top of the image. The glasses are just a cutout picture.

I sometimes need to refresh the page. Here it applied a face filter with the glasses on top of the face of Einstein.

I'm going to show you the code related to it. So move back over. Get my code. Over here. Let's zoom in a bit. Not too much.

So I have this still image for now. This is the code I've been showing. So over here at the bottom, you see some styling happening. I have the image and the canvas over here. And with that, I do some stuff.

So, like I said, I got all the elements. Now I open up the image of Einstein, which is person 2 in this case. I create some configs so I know which things I would like: all the scaling and all the things happening over there. I detect the faces, and I have this function where I apply the landmarks. And I was able to update this bottom canvas, which sits on top of the image, to show these specific elements.

So with that, I have this person over here. Let's go to public/persons. I just have this list of persons, just a plain image of Einstein. Let's go back to this image thing. I created these configurations so I can easily choose which thing I would like to put on top of the image.

So now I have these glasses over here. And the glasses are, again, just a cutout picture. I think it's over here. No, it's media and then glasses. So it's over here. I had those glasses.

6. Drawing Landmarks and Using Videos

Short description:

I draw the landmarks and detect the eyes, nose, and mouth. Green lines are used to get the average and apply glasses. A prerecorded video and a live-recorded video can be used to detect landmarks on every frame.

It's just a transparent picture of those glasses, so it gets rendered on top. With that happening, let me show you this part where I draw the landmarks. So now I have this function that just draws the outline of those landmarks. It's basically creating a stroke from point to point through each landmark's locations.
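
A sketch of that drawLandmarks idea (the function signature is my own):

```js
// Stroke a closed path through each landmark's outline points.
function drawLandmarks(ctx, face) {
  ctx.strokeStyle = 'green';
  for (const landmark of face.landmarks) {
    ctx.beginPath();
    landmark.locations.forEach((p, i) =>
      i === 0 ? ctx.moveTo(p.x, p.y) : ctx.lineTo(p.x, p.y)
    );
    ctx.closePath();
    ctx.stroke();
  }
}
```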

And now I was able to see what it detects. So now when I switch back to my Chrome... oh, not to this. If I refresh it, sometimes it needs a hard refresh; it did not update very well, so I think I need to save it before I refresh. So over here, you now see it detected those landmarks. You see the eyes, the nose, and the mouth, and these are the green lines I average to place the glasses. This could be extended to other elements as well.

I also have, for example, this video snippet, which I used before. It's just a recording of myself, and I put that recording into a video element. So when I open this, it's basically the same thing, only now it's this video element with a prerecorded video. After that, I create a requestAnimationFrame loop, so I'm able to detect faces on every frame of the video and then render the results into the canvas. So when I open this, it's there. So when I now switch over... oh, this was actually the camera. That was the last one I wanted to show. But let's see this one first.
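
A sketch of that per-frame loop (`renderFace` is a stand-in for whatever per-face drawing you do):

```js
const video = document.querySelector('video');

async function loop() {
  // detect() also accepts a video element, so each call sees the current frame.
  const faces = await faceDetector.detect(video);
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  for (const face of faces) renderFace(face); // hypothetical per-face renderer
  requestAnimationFrame(loop);
}

video.addEventListener('play', () => requestAnimationFrame(loop));
```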

So after a prerecorded video, you can also do it with a live-recorded video. It's now using the webcam of my laptop. That's a different part, where you use getUserMedia. It's another web API, where you get the media from your device, so from your phone or your laptop, and I'm putting that stream right into the video element. With that, I'm able to use the video of my webcam to live-detect the landmarks of my face.
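
Wiring the webcam into the same video element is a few lines with the Media Capture API; a sketch:

```js
// Ask for camera access and pipe the stream into the video element.
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
video.srcObject = stream;
await video.play();
// The same requestAnimationFrame loop now detects landmarks live.
```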

7. Using the Chrome Face Detection API for Fun

Short description:

I turned off the landmark part for the AR filter and focused on the glasses. The API provides data on eyes, mouth, and nose, allowing for creative possibilities. While not yet useful in production, it's a fun showcase of coding and problem-solving. Passion for creating with code and having fun is the fundamental essence of this feature.

It had some trouble with the glasses. So for now, let's turn off the landmark part, because for an AR filter you don't want to see where the eyes and the mouth are. The lighting is a bit different now, but over here, I have my glasses on. It also follows my eyes. It's a bit janky because it takes the average of every single frame; you could average over multiple frames instead and easily smooth it out. It is able to handle some rotation, so it also does that. And it stops detecting at some point.
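
The smoothing mentioned here could be as simple as a moving average over the last few frames; a sketch (the `history` buffer and `smooth` helper are mine):

```js
// Keep the last few positions and return their average to damp jitter.
const history = [];

function smooth(point, windowSize = 5) {
  history.push(point);
  if (history.length > windowSize) history.shift();
  const sum = history.reduce(
    (a, p) => ({ x: a.x + p.x, y: a.y + p.y }),
    { x: 0, y: 0 }
  );
  return { x: sum.x / history.length, y: sum.y / history.length };
}
```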

So this is very fun, and you can also do this live, for when you take a picture using your device. And the last thing I wanted to show, which is basically the easier version of this, is this prerecorded video. Over here, it's also detecting the faces in real time. I do the same thing, where I have these glasses again, and I'm putting all the data onto that video element. I'm also recording it here, so I can export it with the glasses on. Let me get to the part where I apply the images. Here I've got that drawLandmarks function I created again, which I can turn off, so now it's only showing the glasses. And I can go back, stop adding the glasses, and only show the landmarks, for example. So this is the data you can go wild with. You have the eyes, the mouth, the nose, and you are free to do anything with that. It's nice that you have this very small part of the API, which you can use to get the landmarks, and from there you can do more stuff with it. This is just one showcase, I think, which is nice: you are able to do something with some lines of code and some calculations. And if you show this at a birthday, for example, and say this is what I created with some text and numbers, I think people can be amazed. They know this stuff from social media, they know what those filters look like, but being able to recreate them yourself, that's really great.

So, with that, there is this fun part of coding, which is maybe less functional. There are possibilities and use cases for it, but behind a feature flag right now, it's definitely not that useful in production. But this is where the fun part comes in. This passion for getting code to do fun things helps you understand more and more what the subject is about and how the coding works. And that passion for problem solving, for creating something with code, can go together with fun. Some things don't make sense other than being fun, being nice, and that is the fundamental thing about passion, I think. And when you let that mental part go, that's where the fun starts.

QnA

API Capabilities and Use Cases

Short description:

The API can handle a stream and not just a single image, but there may be limitations when dealing with changing faces. The face detection API can be used beyond AR, such as for cropping profile pictures with the face centered. It can also handle rotated faces to a certain extent, but you can add your own code to compensate for rotation and ensure continuous detection.

If you stop thinking about it and just have fun with what you do, then, yeah, that helps you stay excited about things. So thanks for your attention. This is some of my social media, and I'm going to invite Caroline back onto the stage.

All right. We've got the questions. So first question up: can this API handle a stream, and not just a single image? I would guess yes, because of the videos, but maybe if it's changing faces, there might be limitations? Yeah, you especially have the problem with multiple faces in the stream, which you have in a photo as well, but there it just picks one of them. In a video stream with multiple faces, it has a hard time deciding for every frame which face to pick, so it jumps around, randomly picking a face for every frame. And I could imagine it also takes a little bit of time sometimes to detect the landmarks, so that might be a limitation as well.

All right, so then, what could I use the face detection API for beyond AR? So this is an interesting question. It can also, for example, be really helpful with pictures. Most of the time the important part of a picture, for example when people upload a profile picture, is the face. So you can now, for example, crop that picture with the face properly centered. Yeah, we've all had that where it cuts you off at your eyes. Yeah, or especially when your profile picture gets turned into a circle when you upload it; it's nice when it's already centered and it's not your neck. Yeah, yeah. That's great.
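
As a sketch of that idea, cropping a square around the detected bounding box might look like this (the 1.4 padding factor is an arbitrary choice):

```js
const [face] = await new FaceDetector().detect(img);
const { x, y, width, height } = face.boundingBox;

// Build a square crop centred on the face, with some padding around it.
const size = Math.max(width, height) * 1.4;
const cx = x + width / 2;
const cy = y + height / 2;

const crop = document.createElement('canvas');
crop.width = crop.height = size;
crop.getContext('2d').drawImage(
  img,
  cx - size / 2, cy - size / 2, size, size, // source rect centred on the face
  0, 0, size, size                          // destination rect
);
```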

All right, so you kind of touched on this when you were doing the video demonstration, but how well does it handle rotated faces? Have you tried that with images? Yeah, so it has some threshold. At a certain point it stops detecting, or the rotation doesn't match because it's turning the landmarks somewhat. But you are able to add your own code on top to help detection. For example, when you turn past a certain amount, it stops detecting and then starts detecting again. You are able to compensate by, for example, turning the image a bit: if you see the face is turning, you could apply a counter-rotation to the source image or video. Especially for video, when you see the face getting turned, you could compensate for that early on, so it keeps detecting the faces. Yeah. Yeah. That makes sense.

Landmarks, Production Status, and Feedback

Short description:

Different landmarks provide a clear understanding of the face's orientation. The feature is currently behind a flag and in progress, with no estimated release date. Feedback is being requested to determine use cases and potential production capabilities.

Have you noticed, or have you tried, like, is it different for different landmarks? Like, I could imagine your eyes when they turn versus a nose, maybe...

Yeah. So, it helps that most of the main parts of the face are nicely separated. When you have both of the eyes and the mouth, you're really able to get a clear understanding of what orientation the face has, for example.

Yeah. All right. So, it is behind a feature flag, as you mentioned. Do you have any information on when people would be able to use it in production, or maybe even other browsers?

Yeah. So right now, this is in the in-progress state. They are also requesting use cases, so when you go to the blog post that Chrome wrote about it, they are asking you to give feedback and share what you want it to be able to do in production, especially because the barcode scanner is already in production. I think this will get there at some point as well. No real time estimate, though. I think the first thing I shared about this was almost a year ago, and it was still in progress at that time, so we all know how browsers deal with these changes.

Handling Multiple Faces and Images in Videos

Short description:

The Chrome Face Detection API can handle multiple faces and images in videos, but the performance depends on the hardware and device processing speed. It works well with two or three people on screen, especially in fast mode. The API detects the two points of the eyes, eliminating the need for the whole outline of the eyes.

Oh, absolutely, yes, for sure. You mentioned this a little bit in your first answer about rotation, but how well does it handle multiple faces in images and videos? And do you notice a difference between multiple faces in images versus multiple faces in videos? Yeah, so this is also where the hardware comes in, because it depends on how fast your device can process these tasks. I haven't tried it with a really big group, but two or three people on screen, in camera view, works okay, especially if you turn on fast mode, for example. Yeah, you get the two points of the eyes; you don't need the whole outline of those eyes, so that can already be very helpful.
