TensorFlow.JS 101: ML in the Browser and Beyond
AI-Generated Video Summary
2. TensorFlow.js: Pre-trained Models and Face Mesh
Now, with TensorFlow.js, you can use pre-trained models for various use cases, including object detection, body segmentation, pose estimation, face landmark detection, and natural language processing, all in the browser. Object recognition using COCO-SSD is showcased, allowing for real-time detection and classification of objects in images. Webcam integration enables live object detection and classification running client-side in the browser, ensuring privacy and cost savings. The three-megabyte face mesh model can recognize 468 facial landmarks, enabling real-time applications like AR makeup try-on without requiring physical presence in a store. See face mesh in action during the demo.
So, let's see some of these in action. Now, first up is object recognition. This is using COCO-SSD behind the scenes, which is trained on 90 object classes. You can see this in action on the right-hand side, with the dogs being highlighted with their bounding boxes. We even know that there are two dogs in this image, as both are returned to us. So, let's see this live to see how it performs in the browser.
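For anyone who wants to try this themselves, here is a minimal sketch of how the pre-made COCO-SSD model is typically used from the @tensorflow-models/coco-ssd package; the image element id is an illustrative assumption, not taken from the talk.

```js
// Load the COCO-SSD object detection model (trained on 90 object classes)
// and detect objects in an <img id="photo"> element already on the page.
import '@tensorflow/tfjs';
import * as cocoSsd from '@tensorflow-models/coco-ssd';

async function detectObjects() {
  const model = await cocoSsd.load();            // downloads the pre-trained model
  const img = document.getElementById('photo');  // hypothetical image element
  const predictions = await model.detect(img);   // returns class, score, and bounding box
  for (const p of predictions) {
    console.log(`${p.class} (${(p.score * 100).toFixed(1)}%) at`, p.bbox);
  }
}

detectObjects();
```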
So, next up, we've got face mesh. This is just three megabytes in size and can recognize 468 facial landmarks on the human face. You can see this in action on the left-hand side of the slide right now. People are starting to use this for real-world use cases, such as L'Oréal, who have created an AR makeup try-on that allows you to try on lipstick, in this case, in real time without even having to be physically present in the store. You should note that the lady on the right-hand side is not wearing any lipstick. We're using face mesh to understand where her lips are, and then we use WebGL shaders to augment the color of lipstick she wants onto her face in real time. So, this is super cool, and I'm sure we're going to see more stuff like this coming out in the future. So, let's see face mesh in action to see how it performs in the real world. Let's switch to the demo. Okay. So, now you can see me talking to you with face mesh running in real time in the web browser at the same time.
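As a rough sketch of what runs behind that demo, the face mesh model is published as the @tensorflow-models/facemesh package; the webcam setup below is an assumption added to keep the example self-contained.

```js
// Run face mesh on a live webcam feed; each detected face has 468 3D landmarks.
import '@tensorflow/tfjs';
import * as facemesh from '@tensorflow-models/facemesh';

async function runFaceMesh() {
  const video = document.querySelector('video');  // assumes a <video> element on the page
  video.srcObject = await navigator.mediaDevices.getUserMedia({ video: true });
  await video.play();

  const model = await facemesh.load();            // ~3 MB model download
  setInterval(async () => {
    const faces = await model.estimateFaces(video);
    if (faces.length > 0) {
      // scaledMesh is an array of 468 [x, y, z] points in image coordinates.
      console.log('Landmarks found:', faces[0].scaledMesh.length);
    }
  }, 100);
}

runFaceMesh();
```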
3. Machine Learning in Action and 3D Rendering
4. Body Segmentation and Superpowers
Okay. Back to the slides. So, next up is body segmentation. This allows you to distinguish 24 body areas across multiple bodies, all in real time. You can see this in action on the slide. On the right-hand side, you can see that the different colors represent different parts of each body.
Even better, we've got pose estimation going on at the same time: those light blue lines contained within each of the bodies on the right-hand side, which allow us to estimate where the human skeleton is. And that can enable really powerful demos, such as the ability to recognize when you're in a certain pose or making a certain gesture. We've already seen people in our community use this to build workout instructors, yoga instructors, and that kind of thing. So it's super cool to see the creative potential of this model, and you can see a rough sketch of the API below.
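Here is a minimal sketch of how this kind of part-level segmentation is exposed through the @tensorflow-models/body-pix package; the element ids and drawing options are illustrative assumptions.

```js
// Segment a person into body parts and render a colored part mask onto a canvas.
import '@tensorflow/tfjs';
import * as bodyPix from '@tensorflow-models/body-pix';

async function segmentParts() {
  const net = await bodyPix.load();
  const img = document.getElementById('person');               // hypothetical <img> element
  const partSegmentation = await net.segmentPersonParts(img);  // 24 part ids, -1 = background

  const coloredMask = bodyPix.toColoredPartMask(partSegmentation);
  const canvas = document.getElementById('output');            // hypothetical <canvas> element
  // Blend the colored mask over the original image (opacity and blur chosen arbitrarily here).
  bodyPix.drawMask(canvas, img, coloredMask, 0.7, 0);
}

segmentParts();
```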
On the right-hand side, I also made a clothing size estimator. Now, I don't know about you, but I'm really terrible at knowing what size of clothing I am when I go to buy clothes once a year. And for different brands, I have different sizes: in some brands I'm a small, in other brands I'm a medium. So I never know what to select at checkout. Here, in under 15 seconds, I can get an estimate of my body size for the key measurements that a particular brand cares about, and I can automatically select the correct size for me at checkout. That saves me the time and money of having to return things when they don't fit, and it solved a problem I had in my daily life.
Next up, what about giving yourself superpowers? One person from our community combined our face mesh model with WebGL shaders to create this Iron Man-like effect. Here you can see lasers coming from his eyes and mouth in a really beautiful, realistic way, which could be great for an activation with a movie company, for example, for a new movie release. Or what about combining it with other technologies? Here we see another member of the community using WebXR, WebGL, and TensorFlow.js together to extract an image of a body from a magazine and then bring that body into the real world, so they can inspect the fashion design in more detail. I've even seen this person go one step further and make the face animate and speak, which is really, really cool.
5. WebRTC and Teleportation
By adding WebRTC to this, I can even teleport myself: by segmenting my body using BodyPix in my room, transmitting that segmentation over the Internet, and reconstructing it in a real physical space using WebXR. This allows for more meaningful communication with friends and family, surpassing the limitations of rectangular video calls.
But why stop there? We can go one step further still. By adding WebRTC to this, which stands for Web Real-Time Communication, I can even teleport myself. Here, I can segment my body using BodyPix in my room, transmit that segmentation over the Internet, and then reconstruct it in a real physical space using WebXR. This allows me to speak to my friends and family, in these times when we're not able to travel as much, in a more meaningful way than a rectangular video call. In fact, maybe my future presentations will be delivered to you in this form. Who knows? But there's some very exciting stuff ahead.
6. Using Teachable Machine for Transfer Learning
The second way to use TensorFlow.js is transfer learning, which allows you to retrain existing models with your own data. Teachable Machine is a website that performs both training and inference in the web browser, making it great for prototyping. You can select an image project, define classes, and record samples using your webcam.
Now, the second way to use TensorFlow.js is transfer learning. This allows you to retrain existing models to work with your own custom data. Of course, if you're a machine learning expert, you can do this all programmatically. But today I want to show you two easier ways to get started.
Now, the first is Teachable Machine. This is a website that can do both the training and the inference completely in the web browser, which is great for prototyping things like image recognition, pose estimation, and sound detection. I think more models will be supported in the future, so watch this space.
But let's see it in action to give you a flavor of how it works. Okay. So, we head over to teachablemachine.withgoogle.com; you can follow along if you like. We can select one of three project types to choose from. Today, we're going to go for an image project to recognize a custom object. So, we click on that, and we're then presented with a screen like this. On the left, we've got a number of classes for the objects we want to recognize. If we want to recognize more than two things, we can click the add class button to do so.
But today, we're just going to recognize my face or a deck of playing cards. So, let's give them more meaningful names. I'm going to call the first one Jason, to represent me, and the second class I'm going to call cards, which represents the cards. All we need to do is allow access to our webcam, and you'll see a live preview pop up on the left-hand side for the first class. Now, I just need to record some samples of my face so we have some training data for this class. So, let's go ahead and do that. I'm going to move my head around to get some variety. There we go. And we can see that I've got, how many images have I got there? About 38 sample images. Perfect. I'm now going to do the same thing with class number two, the deck of cards, and I've got here a nice set of playing cards. So, what I'm going to do is hold to record again, but this time I'm going to get roughly the same number of images of the cards.
7. Teachable Machine and Cloud AutoML
So, I've trained a prototype model using Teachable Machine that can detect Jason and playing cards with high accuracy. If this meets your needs, you can export the model and use it on your own website. However, for larger datasets, Cloud AutoML provides a more robust solution. By uploading images to Google Cloud Storage, you can train custom vision models optimized for accuracy or speed. Once trained, you can export the model as TensorFlow.js files and easily incorporate it into your own webpage.
So, I've got 42 there. That's close enough. All I need to do now is click on train model. And now, live in the web browser, this is going to retrain the model on the training data I've just presented to it, on top of what it was previously taught. I can see there that in under 30 seconds, it's already complete. And it's currently predicting Jason as the output with 99% confidence, which is pretty good. And if I bring my deck of playing cards up, you can see that switches to cards with 100% confidence. So, Jason, cards. Jason, cards. And you can see how easy that was to make and how robust it is at actually detecting those two objects.
Of course, this is a prototype. If this were good enough for what I needed, I could click on export model here, click the download button, and then copy this code and use it on my own website if I chose to do so. So, that's Teachable Machine, and it's great for prototyping. However, if you've got gigabytes of data, you might want to use something more robust for production-quality models. So, let's go back to the slides and see how to do that.
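For reference, the code Teachable Machine exports for an image project follows roughly this shape, using the @teachablemachine/image library; the model URL below is a placeholder assumption.

```js
// Load a Teachable Machine image model exported for the web and classify webcam frames.
import * as tmImage from '@teachablemachine/image';

const MODEL_URL = 'https://example.com/my-model/';  // placeholder for your exported model URL

async function run() {
  const model = await tmImage.load(MODEL_URL + 'model.json', MODEL_URL + 'metadata.json');

  // Teachable Machine ships a small webcam helper alongside the model loader.
  const webcam = new tmImage.Webcam(200, 200, true);  // width, height, flip
  await webcam.setup();
  await webcam.play();
  document.body.appendChild(webcam.canvas);

  setInterval(async () => {
    webcam.update();
    const predictions = await model.predict(webcam.canvas);
    // predictions looks like [{ className: 'Jason', probability: 0.99 }, ...]
    console.log(predictions);
  }, 200);
}

run();
```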
So, Cloud AutoML allows us to train custom vision models in the cloud, and we can deploy to TensorFlow.js at the end, which is super useful. All you have to do is upload folders of the images you want to classify to Google Cloud Storage, as you can see here, and then click on the next button. Once you do that, you'll be asked if you want to optimize your model for higher accuracy, faster predictions, or some trade-off between the two. You can set a budget and leave it training for hours or days, depending on how much data you've uploaded, and it'll come back to you with the best results. It's going to try many different hyperparameters and many different types of computer vision models to figure out what works best with your data. Once it's ready, you can click export and choose TensorFlow.js, as shown here in the circle, which will download the model.json files you need to run in the web browser. With that, you can use it on your own webpage and add your own user experience, user interface, and so on. You might be wondering, how hard is it to actually use this production-quality trained model? It's actually super simple.
8. Code Walkthrough and API Structure
In this code walkthrough, we import the TensorFlow.js and AutoML libraries. Then, we load the image classification model using model.json. After that, we grab a reference to the image we want to classify and use the model to classify it. This allows us to perform various actions based on the predictions. TensorFlow.js also offers the flexibility to write your own code, and it provides superpowers and performance benefits when used in the browser. Now, let's explore the structure of our API.
In fact, it's so simple it fits on one slide. So let's walk through this code. First, we import the TensorFlow.js library using this script tag. Second, we import the AutoML library with the second script tag. Below this, I've created an image, which is just a daisy image I found somewhere on the internet. This is the image we want to classify. It could be something else, like a frame from the webcam, but for simplicity I've just taken a daisy.jpg.
Next, we grab a reference to the image we want to classify. In this case, we call document.getElementById('daisy'), which refers to the daisy image above, and now we've got a reference to it in memory. Having loaded the model from its model.json file, all we need to do is call await model.classify and pass it the image we want to classify. This, again, is an asynchronous operation, because it might take several milliseconds to execute, which of course in computer terms is a very long time. So we wait for that to finish, and then we'll have a JSON object assigned to this predictions constant here on the left, which we can iterate through to go over all the things it thinks it's found in the image. And with that, you can do whatever you like. You can trigger something to run, you could control a robot, you could do whatever you want, with just a few lines of code. So super cool and super functional.
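A sketch of that slide, reconstructed from the description above, might look like the following; the model.json path is a placeholder for wherever you host your exported AutoML model.

```html
<!-- Import TensorFlow.js and the AutoML edge library, then classify a daisy image. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-automl"></script>

<img id="daisy" src="daisy.jpg" crossorigin="anonymous">

<script>
  async function run() {
    // Load the exported AutoML Vision model (placeholder path to your model.json).
    const model = await tf.automl.loadImageClassification('model.json');
    const image = document.getElementById('daisy');
    const predictions = await model.classify(image);
    // predictions is an array like [{ label: 'daisy', prob: 0.97 }, ...]
    console.log(predictions);
  }
  run();
</script>
```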
Now, the third way to use TensorFlow.js is to write your own code. To go through that would be a whole different talk in itself, so today I'm going to focus on the superpowers and performance benefits of why you might want to consider using TensorFlow.js in the browser. First up, I want to give you an overview of how our API is structured.
9. APIs, Environments, and Performance
We've got two APIs: the high-level Layers API, similar to Keras in Python, and the Ops API for lower-level mathematical operations. The APIs can run in different environments, including the client side (web browser) and server side (Node.js). Server-side execution supports loading Keras models and TensorFlow SavedModels without conversion. To run a SavedModel in the browser, use the TensorFlow.js command-line converter. Performance-wise, Node.js execution is comparable to Python, and just-in-time compilation in Node.js can boost performance for pre- and post-processing.
We've got two APIs. One is the high-level Layers API, which is very similar to Keras if you're familiar with Python. In fact, if you use Keras, it's basically the same function signatures, so you should feel very much at home. And then, for those of you who want to go lower level, we have the Ops API, which is the more mathematical layer that allows you to do things like linear algebra and so on. You can see how this comes together in the following diagram.
At the top there, we've got our pre-made models, which sit upon our Layers API. That Layers API sits on top of our Ops API, which understands how to talk to different environments, such as the client side. And by client side here, we mean things like the web browser. Those environments themselves can execute on different backends. On the client, we can execute on the CPU, which is the slowest form of execution; on WebGL, to get graphics card acceleration; or on WebAssembly (WASM for short), for improved CPU performance across mobile devices. The same is true for the server side as well. We can execute using Node.js on the server, which talks to the same TensorFlow CPU and GPU bindings that Python has. So yes, that means you get the same AVX support and the same CUDA acceleration that you do in Python. And in fact, as we'll see later, the performance is pretty much exactly the same as well: we execute as fast as, and sometimes faster than, Python for certain use cases.
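To make the distinction between the two API levels concrete, here is a small illustrative sketch (not from the talk) showing the same library used at both levels.

```js
import * as tf from '@tensorflow/tfjs';

// Layers API: Keras-like model building with the same kind of function signatures.
const model = tf.sequential();
model.add(tf.layers.dense({ units: 1, inputShape: [1] }));
model.compile({ loss: 'meanSquaredError', optimizer: 'sgd' });

// Ops API: lower-level mathematical operations directly on tensors.
const a = tf.tensor2d([[1, 2], [3, 4]]);
const b = tf.tensor2d([[5, 6], [7, 8]]);
const product = tf.matMul(a, b);
product.print();  // [[19, 22], [43, 50]]
```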
10. Benefits of TensorFlow.js and Node.js
11. Benefits of TensorFlow.js and Community
It allows us to use the TensorFlow SavedModel format without any kind of conversion or performance penalty. And we can run larger models than we can on the client side; there are, of course, GPU memory limits you might run into if you try to push a gigabyte-sized model over the web to a client device.
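As a rough sketch of what that looks like in practice with the tfjs-node bindings (the model path and input shape below are placeholder assumptions):

```js
// Load a TensorFlow SavedModel directly in Node.js, with no conversion step.
const tf = require('@tensorflow/tfjs-node');  // or '@tensorflow/tfjs-node-gpu' for CUDA

async function run() {
  const model = await tf.node.loadSavedModel('./my_saved_model');  // placeholder path
  const input = tf.zeros([1, 224, 224, 3]);                        // placeholder input shape
  const output = model.predict(input);
  output.print();
}

run();
```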
And then the fifth point: performance. As we spoke about, we've got the same C bindings as the original TensorFlow in Python, which gives us parity for inference speeds, and we've got the just-in-time compiler boost for the pre- and post-processing if you choose to convert that over to Node.js. So with that, let's wrap up with some resources you can use to get started and learn more.
If there's one slide you want to bookmark, let it be this one. Here you can see all the resources you need to get started with TensorFlow.js. On our website at the top there, you can find many resources and tutorials to help you on your way. We've got our models available at tensorflow.org/js/models; I've only shown you three or four today, but there are many, many more on there which you can use out of the box to get started super fast. We are completely open source, so we're available on GitHub as well, and we encourage contributions back to the project if you're feeling ambitious. We have a Google Group for more advanced technical questions, which our team monitors, and we've even got CodePen and Glitch examples to help you get started with boilerplate code and understand how to take data from a webcam and pass it to some of our models.
And with that, I encourage you to come join our community. If you check out the #MadeWithTFJS hashtag on Twitter or LinkedIn, you'll find hundreds of projects that people are creating every single week around the world. I can't show them all in the presentation today, but here's just a glimpse of some of the great things going on elsewhere in the community. So, my last question for you is: what will you make? Here's one final piece of inspiration from a member of our community in Tokyo, Japan. He is a dancer, and he's used TensorFlow.js to make this cool-looking hip-hop video, as you can see on the slide. My point is that machine learning is now for everyone, and I'm super excited to see how everyone else in the world will start to use machine learning now that it's becoming more accessible. Artists, musicians, creatives... everyone has a chance to use machine learning now, and if you do, please make use of that #MadeWithTFJS hashtag so we can feature you in our future presentations and blog post write-ups. Thank you very much for listening, and with that, feel free to stay in touch. I'm available on Twitter and LinkedIn for further questions, and I look forward to talking with you soon.