TensorFlow.JS 101: ML in the Browser and Beyond
AI-Generated Video Summary
2. TensorFlow.js: Pre-trained Models and Face Mesh
Now, with TensorFlow.js, you can use pre-trained models for various use cases, including object detection, body segmentation, pose estimation, face landmark detection, and natural language processing, all in the browser. Object recognition using COCO-SSD is showcased, allowing for real-time detection and classification of objects in images. Webcam integration enables live object detection and classification running client-side in the browser, ensuring privacy and cost savings. The three-megabyte face mesh model can recognize 468 facial landmarks, enabling real-time applications like AR makeup try-on without requiring physical presence in a store. See face mesh in action during the demo.
So, let's see some of these in action. Now, first up is object recognition. This is using COCO-SSD behind the scenes, which is trained on 90 object classes. You can see this in action on the right-hand side, with the dogs being highlighted with their bounding boxes. We even know that there are two dogs in this image, as both are returned to us. So, let's see this live to see how it performs in the browser.
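For anyone who wants to try this themselves, here is a minimal sketch of how the pre-made COCO-SSD model is typically used from the @tensorflow-models/coco-ssd package; the image element id is an illustrative assumption, not taken from the talk.

```js
// Load the COCO-SSD object detection model (trained on 90 object classes)
// and detect objects in an <img id="photo"> element already on the page.
import '@tensorflow/tfjs';
import * as cocoSsd from '@tensorflow-models/coco-ssd';

async function detectObjects() {
  const model = await cocoSsd.load();            // downloads the pre-trained model
  const img = document.getElementById('photo');  // hypothetical image element
  const predictions = await model.detect(img);   // returns class, score, and bounding box
  for (const p of predictions) {
    console.log(`${p.class} (${(p.score * 100).toFixed(1)}%) at`, p.bbox);
  }
}

detectObjects();
```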
So, next up, we've got face mesh. This is just three megabytes in size and can recognize 468 facial landmarks on the human face. You can see this in action on the left-hand side of the slide right now. People are starting to use this for real-world use cases, such as L'Oréal, who have created an AR makeup try-on that allows you to try on lipstick, in this case, in real time without even having to be physically present in the store. You should note that the lady on the right-hand side is not wearing any lipstick. We're using face mesh to understand where her lips are, and then we use WebGL shaders to augment the color of lipstick she wants onto her face in real time. So, this is super cool, and I'm sure we're going to see more stuff like this coming out in the future. So, let's see face mesh in action to see how it performs in the real world. Let's switch to the demo. Okay. So, now you can see me talking to you with face mesh running in real time in the web browser at the same time.
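As a rough sketch of what runs behind that demo, the face mesh model is published as the @tensorflow-models/facemesh package; the webcam setup below is an assumption added to keep the example self-contained.

```js
// Run face mesh on a live webcam feed; each detected face has 468 3D landmarks.
import '@tensorflow/tfjs';
import * as facemesh from '@tensorflow-models/facemesh';

async function runFaceMesh() {
  const video = document.querySelector('video');  // assumes a <video> element on the page
  video.srcObject = await navigator.mediaDevices.getUserMedia({ video: true });
  await video.play();

  const model = await facemesh.load();            // ~3 MB model download
  setInterval(async () => {
    const faces = await model.estimateFaces(video);
    if (faces.length > 0) {
      // scaledMesh is an array of 468 [x, y, z] points in image coordinates.
      console.log('Landmarks found:', faces[0].scaledMesh.length);
    }
  }, 100);
}

runFaceMesh();
```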
3. Machine Learning in Action and 3D Rendering
4. Body Segmentation and Superpowers
Okay. Back to the slides. So, next up is body segmentation. This allows you to distinguish 24 body areas across multiple bodies, all in real time. You can see this in action on the slide. On the right-hand side, you can see that the different colors represent different parts of each body.
Even better, we've got pose estimation going on at the same time: those light blue lines contained within each of the bodies on the right-hand side, which allow us to estimate where the human skeleton is. And that can enable really powerful demos, such as the ability to recognize when you're in a certain pose or making a certain gesture. We've already seen people in our community use this to build workout instructors, yoga instructors, and that kind of thing. So it's super cool to see the creative potential of this model, and you can see a rough sketch of the API below.
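Here is a minimal sketch of how this kind of part-level segmentation is exposed through the @tensorflow-models/body-pix package; the element ids and drawing options are illustrative assumptions.

```js
// Segment a person into body parts and render a colored part mask onto a canvas.
import '@tensorflow/tfjs';
import * as bodyPix from '@tensorflow-models/body-pix';

async function segmentParts() {
  const net = await bodyPix.load();
  const img = document.getElementById('person');               // hypothetical <img> element
  const partSegmentation = await net.segmentPersonParts(img);  // 24 part ids, -1 = background

  const coloredMask = bodyPix.toColoredPartMask(partSegmentation);
  const canvas = document.getElementById('output');            // hypothetical <canvas> element
  // Blend the colored mask over the original image (opacity and blur chosen arbitrarily here).
  bodyPix.drawMask(canvas, img, coloredMask, 0.7, 0);
}

segmentParts();
```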
On the right-hand side, I also made a clothing size estimator. Now, I don't know about you, but I'm really terrible at knowing what size of clothing I am when I go to buy clothes once a year. And for different brands, I have different sizes: in some brands I'm a small, in other brands I'm a medium. So I never know what to select at checkout. Here, in under 15 seconds, I can get an estimate of my body size for the key measurements that a particular brand cares about, and I can automatically select the correct size for me at checkout. That saves me the time and money of having to return things when they don't fit, and it solved a problem I had in my daily life.
Next up, what about giving yourself superpowers? One person from our community combined our face mesh model with WebGL shaders to create this Iron Man-like effect. Here you can see lasers coming from his eyes and mouth in a really beautiful, realistic way, which could be great for an activation with a movie company, for example, for a new movie release. Or what about combining it with other technologies? Here we see another member of the community using WebXR, WebGL, and TensorFlow.js together to extract an image of a body from a magazine and then bring that body into the real world, so they can inspect the fashion design in more detail. I've even seen this person go one step further and make the face animate and speak, which is really, really cool.
5. WebRTC and Teleportation
By adding WebRTC to this, I can even teleport myself: by segmenting my body using BodyPix in my room, transmitting that segmentation over the Internet, and reconstructing it in a real physical space using WebXR. This allows for more meaningful communication with friends and family, surpassing the limitations of rectangular video calls.
But why stop there? We can go one step further still. By adding WebRTC to this, which stands for Web Real-Time Communication, I can even teleport myself. Here, I can segment my body using BodyPix in my room, transmit that segmentation over the Internet, and then reconstruct it in a real physical space using WebXR. This allows me to speak to my friends and family, in these times when we're not able to travel as much, in a more meaningful way than a rectangular video call. In fact, maybe my future presentations will be delivered to you in this form. Who knows? But there's some very exciting stuff ahead.
6. Using Teachable Machine for Transfer Learning
The second way to use TensorFlow.js is transfer learning, which allows you to retrain existing models with your own data. Teachable Machine is a website that performs both training and inference in the web browser, making it great for prototyping. You can select an image project, define classes, and record samples using your webcam.
Now, the second way to use TensorFlow.js is transfer learning. This allows you to retrain existing models to work with your own custom data. Of course, if you're a machine learning expert, you can do this all programmatically. But today I want to show you two easier ways to get started.
Now, the first is Teachable Machine. This is a website that can do both the training and the inference completely in the web browser, which is great for prototyping things like image recognition, pose estimation, and sound detection. I think more models will be supported in the future, so watch this space.
But let's see it in action to give you a flavor of how it works. Okay. So, we head over to teachablemachine.withgoogle.com; you can follow along if you like. We can select one of three project types to choose from. Today, we're going to go for an image project to recognize a custom object. So, we click on that, and we're then presented with a screen like this. On the left, we've got a number of classes for the objects we want to recognize. If we want to recognize more than two things, we can click the add class button to do so.
But today, we're just going to recognize my face or a deck of playing cards. So, let's give them more meaningful names. I'm going to call the first one Jason, to represent me, and the second class I'm going to call cards, which represents the cards. All we need to do is allow access to our webcam, and you'll see a live preview pop up on the left-hand side for the first class. Now, I just need to record some samples of my face so we have some training data for this class. So, let's go ahead and do that. I'm going to move my head around to get some variety. There we go. And we can see that I've got, how many images have I got there? About 38 sample images. Perfect. I'm now going to do the same thing with class number two, the deck of cards, and I've got here a nice set of playing cards. So, what I'm going to do is hold to record again, but this time I'm going to get roughly the same number of images of the cards.
7. Teachable Machine and Cloud AutoML
So, I've trained a prototype model using Teachable Machine that can detect Jason and playing cards with high accuracy. If this meets your needs, you can export the model and use it on your own website. However, for larger datasets, Cloud AutoML provides a more robust solution. By uploading images to Google Cloud Storage, you can train custom vision models optimized for accuracy or speed. Once trained, you can export the model as TensorFlow.js files and easily incorporate it into your own webpage.
So, I've got 42 there. That's close enough. All I need to do now is click on train model. And now, live in the web browser, this is going to retrain the model on the training data I've just presented to it, on top of what it was previously taught. I can see there that in under 30 seconds, it's already complete. And it's currently predicting Jason as the output with 99% confidence, which is pretty good. And if I bring my deck of playing cards up, you can see that switches to cards with 100% confidence. So, Jason, cards. Jason, cards. And you can see how easy that was to make and how robust it is at actually detecting those two objects.
Of course, this is a prototype. If this were good enough for what I needed, I could click on export model here, click the download button, and then copy this code and use it on my own website if I chose to do so. So, that's Teachable Machine, and it's great for prototyping. However, if you've got gigabytes of data, you might want to use something more robust for production-quality models. So, let's go back to the slides and see how to do that.
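For reference, the code Teachable Machine exports for an image project follows roughly this shape, using the @teachablemachine/image library; the model URL below is a placeholder assumption.

```js
// Load a Teachable Machine image model exported for the web and classify webcam frames.
import * as tmImage from '@teachablemachine/image';

const MODEL_URL = 'https://example.com/my-model/';  // placeholder for your exported model URL

async function run() {
  const model = await tmImage.load(MODEL_URL + 'model.json', MODEL_URL + 'metadata.json');

  // Teachable Machine ships a small webcam helper alongside the model loader.
  const webcam = new tmImage.Webcam(200, 200, true);  // width, height, flip
  await webcam.setup();
  await webcam.play();
  document.body.appendChild(webcam.canvas);

  setInterval(async () => {
    webcam.update();
    const predictions = await model.predict(webcam.canvas);
    // predictions looks like [{ className: 'Jason', probability: 0.99 }, ...]
    console.log(predictions);
  }, 200);
}

run();
```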
So, Cloud AutoML allows us to train custom vision models in the cloud, and we can deploy to TensorFlow.js at the end, which is super useful. All you have to do is upload folders of the images you want to classify to Google Cloud Storage, as you can see here, and then click on the next button. Once you do that, you'll be asked if you want to optimize your model for higher accuracy, faster predictions, or some trade-off between the two. You can set a budget and leave it training for hours or days, depending on how much data you've uploaded, and it'll come back to you with the best results. It's going to try many different hyperparameters and many different types of computer vision models to figure out what works best with your data. Once it's ready, you can click export and choose TensorFlow.js, as shown here in the circle, which will download the model.json files you need to run in the web browser. With that, you can use it on your own webpage and add your own user experience, user interface, and so on. You might be wondering, how hard is it to actually use this production-quality trained model? It's actually super simple.
8. Code Walkthrough and API Structure
In this code walkthrough, we import the TensorFlow.js and AutoML libraries. Then, we load the image classification model using model.json. After that, we grab a reference to the image we want to classify and use the model to classify it. This allows us to perform various actions based on the predictions. TensorFlow.js also offers the flexibility to write your own code, and it provides superpowers and performance benefits when used in the browser. Now, let's explore the structure of our API.
In fact, it's so simple it fits on one slide. So let's walk through this code. First, we import the TensorFlow.js library using this script tag. Second, we import the AutoML library with the second script tag. Below this, I've created an image, which is just a daisy image I found somewhere on the internet. This is the image we want to classify. It could be something else, like a frame from the webcam, but for simplicity I've just taken a daisy.jpg.
Next, we grab a reference to the image we want to classify. In this case, we call document.getElementById('daisy'), which refers to the daisy image above, and now we've got a reference to it in memory. Having loaded the model from its model.json file, all we need to do is call await model.classify and pass it the image we want to classify. This, again, is an asynchronous operation, because it might take several milliseconds to execute, which of course in computer terms is a very long time. So we wait for that to finish, and then we'll have a JSON object assigned to this predictions constant here on the left, which we can iterate through to go over all the things it thinks it's found in the image. And with that, you can do whatever you like. You can trigger something to run, you could control a robot, you could do whatever you want, with just a few lines of code. So super cool and super functional.
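A sketch of that slide, reconstructed from the description above, might look like the following; the model.json path is a placeholder for wherever you host your exported AutoML model.

```html
<!-- Import TensorFlow.js and the AutoML edge library, then classify a daisy image. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-automl"></script>

<img id="daisy" src="daisy.jpg" crossorigin="anonymous">

<script>
  async function run() {
    // Load the exported AutoML Vision model (placeholder path to your model.json).
    const model = await tf.automl.loadImageClassification('model.json');
    const image = document.getElementById('daisy');
    const predictions = await model.classify(image);
    // predictions is an array like [{ label: 'daisy', prob: 0.97 }, ...]
    console.log(predictions);
  }
  run();
</script>
```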
Now, the third way to use TensorFlow.js is to write your own code. To go through that would be a whole different talk in itself, so today I'm going to focus on the superpowers and performance benefits of why you might want to consider using TensorFlow.js in the browser. First up, I want to give you an overview of how our API is structured.
9. APIs, Environments, and Performance
We've got two APIs: the high-level Layers API, similar to Keras in Python, and the Ops API for lower-level mathematical operations. The APIs can run in different environments, including the client side (web browser) and server side (Node.js). Server-side execution supports loading Keras models and TensorFlow SavedModels without conversion. To run a SavedModel in the browser, use the TensorFlow.js command-line converter. Performance-wise, Node.js execution is comparable to Python, and just-in-time compilation in Node.js can boost performance for pre- and post-processing.
We've got two APIs. One is the high-level Layers API, which is very similar to Keras if you're familiar with Python. In fact, if you use Keras, it's basically the same function signatures, so you should feel very much at home. And then, for those of you who want to go lower level, we have the Ops API, which is the more mathematical layer that allows you to do things like linear algebra and so on. You can see how this comes together in the following diagram.
At the top there, we've got our pre-made models, which sit upon our Layers API. That Layers API sits on top of our Ops API, which understands how to talk to different environments, such as the client side. And by client side here, we mean things like the web browser. Those environments themselves can execute on different backends. On the client, we can execute on the CPU, which is the slowest form of execution; on WebGL, to get graphics card acceleration; or on WebAssembly (WASM for short), for improved CPU performance across mobile devices. The same is true for the server side as well. We can execute using Node.js on the server, which talks to the same TensorFlow CPU and GPU bindings that Python has. So yes, that means you get the same AVX support and the same CUDA acceleration that you do in Python. And in fact, as we'll see later, the performance is pretty much exactly the same as well: we execute as fast as, and sometimes faster than, Python for certain use cases.
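To make the distinction between the two API levels concrete, here is a small illustrative sketch (not from the talk) showing the same library used at both levels.

```js
import * as tf from '@tensorflow/tfjs';

// Layers API: Keras-like model building with the same kind of function signatures.
const model = tf.sequential();
model.add(tf.layers.dense({ units: 1, inputShape: [1] }));
model.compile({ loss: 'meanSquaredError', optimizer: 'sgd' });

// Ops API: lower-level mathematical operations directly on tensors.
const a = tf.tensor2d([[1, 2], [3, 4]]);
const b = tf.tensor2d([[5, 6], [7, 8]]);
const product = tf.matMul(a, b);
product.print();  // [[19, 22], [43, 50]]
```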
10. Benefits of TensorFlow.js and Node.js
11. Benefits of TensorFlow.js and Community
It allows us to use the TensorFlow SavedModel format without any kind of conversion or performance penalty. And we can run larger models than we can on the client side; there are, of course, GPU memory limits you might run into if you try to push a gigabyte-sized model over the web to a client device.
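As a rough sketch of what that looks like in practice with the tfjs-node bindings (the model path and input shape below are placeholder assumptions):

```js
// Load a TensorFlow SavedModel directly in Node.js, with no conversion step.
const tf = require('@tensorflow/tfjs-node');  // or '@tensorflow/tfjs-node-gpu' for CUDA

async function run() {
  const model = await tf.node.loadSavedModel('./my_saved_model');  // placeholder path
  const input = tf.zeros([1, 224, 224, 3]);                        // placeholder input shape
  const output = model.predict(input);
  output.print();
}

run();
```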
And then the fifth point: performance. As we spoke about, we've got the same C bindings as the original TensorFlow in Python, which gives us parity for inference speeds, and we've got the just-in-time compiler boost for the pre- and post-processing if you choose to convert that over to Node.js. So with that, let's wrap up with some resources you can use to get started and learn more.
If there's one slide you want to bookmark, let it be this one. Here you can see all the resources you need to get started with TensorFlow.js. On our website at the top there, you can find many resources and tutorials to help you on your way. We've got our models available at tensorflow.org/js/models; I've only shown you three or four today, but there are many, many more on there which you can use out of the box to get started super fast. We are completely open source, so we're available on GitHub as well, and we encourage contributions back to the project if you're feeling ambitious. We have a Google Group for more advanced technical questions, which our team monitors, and we've even got CodePen and Glitch examples to help you get started with boilerplate code and understand how to take data from a webcam and pass it to some of our models.
And with that, I encourage you to come join our community. If you check out the #MadeWithTFJS hashtag on Twitter or LinkedIn, you'll find hundreds of projects that people are creating every single week around the world. I can't show them all in the presentation today, but here's just a glimpse of some of the great things going on elsewhere in the community. So, my last question for you is: what will you make? Here's one final piece of inspiration from a member of our community in Tokyo, Japan. He is a dancer, and he's used TensorFlow.js to make this cool-looking hip-hop video, as you can see on the slide. My point is that machine learning is now for everyone, and I'm super excited to see how everyone else in the world will start to use machine learning now that it's becoming more accessible. Artists, musicians, creatives... everyone has a chance to use machine learning now, and if you do, please make use of that #MadeWithTFJS hashtag so we can feature you in our future presentations and blog post write-ups. Thank you very much for listening, and with that, feel free to stay in touch. I'm available on Twitter and LinkedIn for further questions, and I look forward to talking with you soon.