Using MediaPipe to Create Cross Platform Machine Learning Applications with React


This talk gives an introduction to MediaPipe, an open-source machine learning solution that allows running machine learning models on low-powered devices and helps integrate those models with mobile applications. It gives creative professionals a set of dynamic tools and makes machine learning easy to use, so they can create powerful and intuitive applications with little or no prior knowledge of machine learning. We will also see how MediaPipe can be integrated with React, giving easy access to machine learning use cases when building web applications with React.

20 min
05 Dec, 2022

Video Summary and Transcription

Welcome to a talk on using MediaPipe for cross-platform machine learning applications with ReactJS. MediaPipe provides ready-to-use solutions for object detection, tracking, face mesh, and more. It allows for video transformation and tensor conversion, enabling the interpretation of video footage in a human-readable form. MediaPipe utilizes graphs and calculators to handle the perception pipeline. Learn how to use MediaPipe packages in React and explore a demo showcasing the hands model for detecting landmarks. Custom logic can be written to detect open and closed landmarks, making it useful for applications like American Sign Language.


1. Introduction to MediaPipe

Short description:

Welcome to my talk at React Day Berlin 2022. I'm Shivaay, presenting on using MediaPipe for cross-platform machine learning applications with ReactJS. MediaPipe is an open source framework that allows for end-to-end machine learning inference and is especially useful for video and audio analysis. It provides acceleration using system hardware and can be used across multiple platforms.

Welcome, everyone, to my talk at React Day Berlin 2022. I'm Shivaay, presenting virtually on the topic of using MediaPipe to create cross-platform machine learning applications with the help of ReactJS. I'm a Google Code Mentor at MediaPipe and a TensorFlow.js working group lead, and you can connect with me on Twitter at how-to-develop. Without wasting any further time, let's get started.

So today we see applications of machine learning everywhere, and this is especially true for web applications with the advent of libraries like TensorFlow.js and MediaPipe. There are a lot of full-stack applications that utilize machine learning capabilities in their web apps, and we are seeing them in production at a lot of startups and even at companies like LinkedIn, which use machine learning to power multiple applications. That's because machine learning is so versatile that it can be used for a number of different applications. Here are some of the common areas where you see machine learning in use, and there is one thing that is common amongst all of these applications. On the left-hand side we have people utilizing face detection on the iPhone XR. You can see points that are able to detect your hands, some really cool effects on the web, and facial expression detection. Then you have the Nest Cam, which uses the camera to detect objects, "OK Google" or the Google Assistant, and even devices like the Raspberry Pi and Coral Edge TPUs. All of them have one thing in common: they are powered by machine learning, and in particular with the help of MediaPipe.

So what is MediaPipe? MediaPipe is an open-source, cross-platform framework for building perception pipelines, dedicated especially to video- and audio-based perception. Think of it this way: in case you want to build an end-to-end machine learning application, MediaPipe allows you not only to prepare your data, but also to go through the entire machine learning inference — that means not only getting the objects that will be used for detection, but also getting the visualizations and the outputs for a particular model that you might be running. In a typical machine learning scenario you start by taking some input data, you run a machine learning model on top of it, and then you get some inference. MediaPipe allows for an end-to-end pipeline for doing that machine learning inference, and it's especially useful for analyzing video or audio data. Today we'll see some examples where you could use it for a live audio, live video, or camera-based scenario. And of course there are a lot of different features that come out of the box. It provides end-to-end acceleration, which means MediaPipe can use your system's CPU or GPU. The underlying technology, especially if you're using it with JavaScript, is WebAssembly on the backend, and that means you can leverage your system hardware to accelerate and improve the performance of the inference of the machine learning models. One of the great things is that you just need one MediaPipe pipeline and one MediaPipe model, and it can be used in multiple places, because MediaPipe is supported across multiple frameworks, including JavaScript, Android, iOS, and other platforms, and you can even deploy it on platforms like the Raspberry Pi for IoT or edge applications. And there are ready-to-use solutions.

2. Exploring MediaPipe Solutions

Short description:

We'll explore ready-to-use MediaPipe solutions that cover object detection, tracking, face mesh, human pose tracking, and more. These solutions are being used in various applications, such as virtual exercise tracking and augmented reality-based lipstick techniques. MediaPipe provides end-to-end machine learning pipelines that can be easily integrated into your programs. Check out MediaPipe.dev for more information and examples of how MediaPipe is used in JavaScript and other platforms.

That means we'll be exploring some of these MediaPipe solutions in a bit. These are completely ready to use — you just have to import them inside of your programs.

For example, if you're using JavaScript, you just have to import the actual function and you'll be able to use it very quickly. All of these different solutions that we'll be exploring are completely open sourced, so in case you are interested in tinkering with them, you can also check out their code base and apply them to your own use case.

And here are some of the solutions that are currently there. Again, just a quick reminder: when we talk about solutions, these are essentially end-to-end machine learning pipelines. That means everything — right from being able to detect, to classifying, to getting your inference up and running — is handled with the help of these machine learning pipelines provided by MediaPipe. You have some standard models that you will also see in Python, things like object detection and object tracking, but also things like face mesh and human pose tracking. These are being used by a lot of different startups, for example ones that let you do your gym exercises and keep track of them virtually with just the help of your webcam — things like rep counts, or having an e-physiotherapist at home.

Then we have face detection, which is being used by companies like L'Oreal for augmented reality-based lipstick try-ons. So a lot of these solutions are already being used in production. And then of course you'll see even more solutions, and you can check them out on the MediaPipe website, MediaPipe.dev. You can just visit that and look at all these different solutions — things like the selfie segmentation solution, which has been used in Google Meet to put up virtual backgrounds. These are just some of the solutions that you can directly use and embed inside of your program. And here are some examples we can share. On the left-hand side you can see one very similar to the L'Oreal case, where you can try an augmented reality-based lipstick. Then you can see augmented reality-based movie trailers in YouTube, and Google Lens, which is able to place augmented reality-based objects in front of you using computer vision. So these are some examples where MediaPipe is being used, not just in JavaScript applications, but on other platforms as well. But of course, I'd also like to break down how this inference actually takes place in the first place. For that, let's take a look at a live perception example, and we'll take the example of a hand tracking algorithm. The idea is that if you hold your hand in front of a webcam, it should be able to detect what we call landmarks. So you take an image or a video of your hand, and the ML model — basically the MediaPipe pipeline — will be able to get these specific landmarks. These landmarks usually denote the different joints inside of your hand, and you'll be able to overlay the landmarks on top of your hand, so that it detects your hand, detects the exact location of the landmarks, and then superimposes them onto it. That's what we mean by localizing your hand landmarks.

3. Video Transformation and Tensor Conversion

Short description:

The idea is to transform the video image into tensors, which are n-dimensional mathematical arrays used for machine learning. These tensors undergo transformations and are converted into landmarks that are superimposed on the captured image. This process is the basis of MediaPipe solutions, allowing for the interpretation of video footage in a human-readable form.

So the idea is that first you take your video. This could be webcam footage, or it could be Nest Cam footage for that matter. Then what we do is transform this image to the size that will be used by the machine learning model. Let's say your original size is 1920 by 1080; it will go through a transformation where we resize and crop the image so that it fits the size expected by the machine learning model. Then we convert the image that we have into tensors. If you haven't heard about tensors before: tensors are like the building blocks of deep learning, or of TensorFlow — you have probably heard the word TensorFlow before. They are large numerical arrays, and what we are essentially doing is converting our image into these large n-dimensional mathematical arrays, which will again undergo some transformations to be used by our machine learning model. Once the image has been converted to tensors, we run the machine learning inference on top of these tensors, which will of course make some changes to them, and then we go ahead and convert these tensors into the landmarks that you see on the right-hand side. We then render these landmarks on top of the image that has captured your hand, and you finally end up with a video output where the landmarks have been superimposed on top of the hand. The perception pipeline will look similar for the different MediaPipe solutions that you use, but the typical idea — taking your video footage, capturing the input, converting it into mathematical tensors, running the inference on top of them, making some changes to those tensors, and finally converting the result back into something understandable in a human-interpretable form — is what MediaPipe does for you.
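To make the resize-and-convert step a bit more concrete, here is a small TypeScript sketch of the same idea using TensorFlow.js. This is not MediaPipe's internal code — that part runs in C++ and WebAssembly — it only illustrates capturing a frame, resizing it, and turning it into a tensor; the 256×256 target size and the [0, 1] normalization are assumptions for the example.

```ts
// Illustrative only: MediaPipe does this internally, but the same idea in
// TensorFlow.js looks roughly like this (the 256x256 input size is an assumption).
import * as tf from '@tensorflow/tfjs';

function frameToTensor(video: HTMLVideoElement): tf.Tensor3D {
  return tf.tidy(() => {
    // Capture the current frame as a [height, width, 3] tensor of RGB values.
    const frame = tf.browser.fromPixels(video);
    // Resize from e.g. 1920x1080 down to the size the model expects.
    const resized = tf.image.resizeBilinear(frame, [256, 256]);
    // Scale pixel values from [0, 255] to [0, 1] before running inference.
    return resized.div(255) as tf.Tensor3D;
  });
}
```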

4. Media Pipe Graphs and Calculators

Short description:

In MediaPipe solutions, there are nodes and edges that form a graph representing the perception pipeline. Each node represents a MediaPipe calculator, and nodes are connected by streams. These calculators handle the flow of packets and perform the necessary calculations. Input and output ports handle incoming and outgoing packets. For more visualizations, visit viz.mediapipe.dev.

So now let's talk a bit more about the usage of graphs and calculators. Whenever we talk about any MediaPipe solution, there are primarily two concepts to consider. The first one is a MediaPipe graph, and that denotes the entire end-to-end perception pipeline that we just spoke about and showcased with the hand detection example. Inside of this graph — if you are aware of how graphs work in data structures — there are edges and nodes.

In the case of a MediaPipe graph, each and every node depicts a unique MediaPipe calculator, and very similar to how nodes in a graph are connected by edges, in MediaPipe two nodes are connected by something called a stream. These nodes accept inputs, or packets, that contain the information about the pipeline that we are running. And all of these calculators are written in C++.

So whenever you're doing any kind of inference or running the actual perception pipeline, all of that is handled with the help of the flow of packets through these nodes, and the calculations are taken care of by the MediaPipe calculators that sit at these nodes. You'll have both input and output ports: the input ports take care of the incoming packets, and the output ports take care of the outgoing resultant packets that you will end up seeing as the output. And again, if you are interested in seeing different types of visualizations, you can take a look at viz.mediapipe.dev to see the perception pipelines for the different types of MediaPipe solutions. I'd definitely recommend checking that out in case you're interested in what's going on behind the scenes. It's not required if you're just coming from a JavaScript background — you can directly use the solution — but in case you are interested, you can definitely check that out as well.

5. Using Media Pipe Packages in React

Short description:

Learn how to use MediaPipe packages within React by importing the specific NPM packages for the desired module or solution. Additionally, install other necessary NPM packages, such as camera utils, for capturing camera footage. Utilize the webcam, refs, and canvas to run the model and perform inference on the webcam footage. Explore a demo showcasing the hands model, which detects landmarks of the hand and superimposes them. Import the hands solution from MediaPipe, then set up the landmarks and fetch the camera in the index.ts file.

Now, of course, coming to the most important part, and that's how to use these MediaPipe packages within React. There are a number of different MediaPipe solutions, alongside their NPM packages, that you can directly embed inside of your React code. Some of the most common ones are over here on the screen, including face mesh, face detection, hands, holistic, Objectron, and pose.

So, again, they can be used for a multitude of different applications, and in case you are interested, you can check out the specific examples provided by the MediaPipe team at mediapipe.dev/demo for whichever demo you want.

If you were to integrate it with your React JS code, this is how simply you would do it. Since we have the NPM packages, the first thing you do is go ahead and import the specific NPM package for the particular module or solution that you want to use. Apart from that, there are a few other NPM packages that you will have to install. One of them is camera utils — you'll find the documentation for this on the MediaPipe in JavaScript website — and these are used just for capturing your camera footage so that the frames can be run through the inference. In the lower half you can see we are using the selfie segmentation model.

So you can see that we first use the webcam, the ref, and the canvas. First we go ahead and set up our model, and it is able to locate the files where the actual machine learning model lives. Then it takes your webcam footage, captures the input, runs the inference on top of it, and renders the result onto the canvas. It's simple code, and you can see that within just 20 to 30 lines you are able to create an end-to-end machine learning solution.
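As a rough sketch of what that looks like in a React component — using the public @mediapipe/selfie_segmentation and @mediapipe/camera_utils packages — the CDN path, options, resolution, and canvas compositing below are my assumptions, not a copy of the code on the slide:

```tsx
import { useEffect, useRef } from 'react';
import { SelfieSegmentation } from '@mediapipe/selfie_segmentation';
import { Camera } from '@mediapipe/camera_utils';

export function SegmentationDemo() {
  const videoRef = useRef<HTMLVideoElement>(null);
  const canvasRef = useRef<HTMLCanvasElement>(null);

  useEffect(() => {
    const video = videoRef.current;
    const canvas = canvasRef.current;
    if (!video || !canvas) return;
    const ctx = canvas.getContext('2d')!;

    // locateFile tells MediaPipe where to fetch its WASM and model assets from.
    const selfieSegmentation = new SelfieSegmentation({
      locateFile: (file) =>
        `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`,
    });
    selfieSegmentation.setOptions({ modelSelection: 1 });

    // Draw the segmentation mask, then composite the camera frame on top of it.
    selfieSegmentation.onResults((results) => {
      ctx.save();
      ctx.clearRect(0, 0, canvas.width, canvas.height);
      ctx.drawImage(results.segmentationMask, 0, 0, canvas.width, canvas.height);
      ctx.globalCompositeOperation = 'source-in';
      ctx.drawImage(results.image, 0, 0, canvas.width, canvas.height);
      ctx.restore();
    });

    // Camera utils feed each webcam frame into the model for inference.
    const camera = new Camera(video, {
      onFrame: async () => {
        await selfieSegmentation.send({ image: video });
      },
      width: 640,
      height: 480,
    });
    camera.start();
  }, []);

  return (
    <>
      <video ref={videoRef} style={{ display: 'none' }} />
      <canvas ref={canvasRef} width={640} height={480} />
    </>
  );
}
```

Swapping in a different solution, such as hands or face mesh, mostly comes down to changing the imported class, its options, and what you draw in onResults.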

And of course now we'll go ahead and take a look at a demo that I created. The demo is pretty much the hand perception pipeline that we just described. So here, let's say I bring up my hand — you can see that it says it's open. And if I close it, you can see that the label changes to closed. So let's see how this actually works. The model we're going to be looking at is the hands model which, as we showed in the perception pipeline, is able to detect the landmarks of your hand and then superimpose them. Here I have my code running in a GitHub Codespace. The most important file we're going to look at is index.ts. This is where you'll see that we have imported the hands solution from MediaPipe, and over here I have two different objects that I'm using: one is the hands object itself, and the other is the hand connections — basically the list of the different points, or landmarks, that we have, pretty much like the example shown in the slides. We first set up some important constants, primarily setting up our landmarks and fetching our camera, the video element. And of course, since it's going to be detecting hands, the first thing we do is run this on top of our canvas.

6. Rendering Landmarks and Conclusion

Short description:

In the demo, we render landmarks on a canvas, process the footage, and draw the image on top. The hands model is loaded, the camera is initialized, and the results are rendered, connecting and superimposing the landmarks. Custom logic can be written to detect open and closed landmarks. The hands demo can be used for various applications, including American Sign Language. Visit MediaPipe in JavaScript for more information and examples.

So inside the demo we are rendering all of this on top of a canvas element. We basically go ahead and render our landmarks to see how many landmarks we actually get inside of our canvas. Then we go ahead and draw the actual image on top of the canvas, and this is where we are primarily processing the actual footage.

If you take a look at code lines 36 to 48, this is where, when you bring your hand near the webcam, it is able to fetch the landmarks and then find the coordinates for each of them. Since it's a 2D image, it fetches the X and Y coordinates of each and every landmark and keeps them stored inside an array. Then we draw a rectangle. So, as you can see, when I bring my hand up in the demonstration, it goes ahead and renders these landmarks by fetching the X coordinate and Y coordinate of every landmark it finds, and it renders them on top of the actual image that is being drawn on the canvas.

This is where we load the actual model, the hands model, and this is where we initialize our camera. So when we run the demo, this is where the camera gets initialized, and then we have a couple of functions that we run, including loading the hands model. Finally, we render the results using the async onResults function, which captures your footage and then goes ahead and renders the landmarks and connects them, ensuring that they are superimposed on top of your footage. So this is one example of just being able to run the hands demo, and the separate logic that has been put on top of this is to be able to detect when the hand is closed or open.
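As a sketch of what such an onResults handler can look like with the hands solution — the element id, CDN path, and drawing options here are assumptions rather than the demo's exact code:

```ts
import { Hands, HAND_CONNECTIONS, Results } from '@mediapipe/hands';
import { drawConnectors, drawLandmarks } from '@mediapipe/drawing_utils';

// Hypothetical canvas element id; the demo wires this up to its own markup.
const canvasElement = document.getElementById('output') as HTMLCanvasElement;
const canvasCtx = canvasElement.getContext('2d')!;

function onResults(results: Results): void {
  canvasCtx.save();
  canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
  // Draw the captured frame first, then superimpose the landmarks on top of it.
  canvasCtx.drawImage(results.image, 0, 0, canvasElement.width, canvasElement.height);
  for (const landmarks of results.multiHandLandmarks ?? []) {
    // Connect the 21 hand landmarks along the joints of each finger...
    drawConnectors(canvasCtx, landmarks, HAND_CONNECTIONS, { lineWidth: 2 });
    // ...and then mark each individual landmark point.
    drawLandmarks(canvasCtx, landmarks, { radius: 3 });
  }
  canvasCtx.restore();
}

// Wire the handler into the hands solution.
const hands = new Hands({
  locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${file}`,
});
hands.setOptions({ maxNumHands: 1, minDetectionConfidence: 0.5 });
hands.onResults(onResults);
```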

In this case, the logic I used is that when the landmarks are not overlapping with each other, it prints the label as open, and when the landmarks are overlapping with each other, it prints the label as closed. Depending on your needs, or on how you want to use this particular model, you could go ahead and write your own custom logic in JavaScript itself, because each of these landmarks has its own unique coordinates. You could do a lot more with this. For example, if you wanted to build something like an American Sign Language recognizer, you could train your model in such a way that, depending on the positions of the landmarks and the way they are oriented, you could create an entire end-to-end American Sign Language demonstration with the help of the hands model as well. That is, of course, totally up to you in terms of how you want to do it.
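The exact open/closed rule isn't shown in the talk, so here is one hypothetical heuristic: treat a finger as extended when its tip is further from the wrist than its middle (PIP) joint, and call the hand open when most fingers are extended.

```ts
import { NormalizedLandmarkList } from '@mediapipe/hands';

// Hypothetical heuristic, not the demo's exact logic: compare each fingertip's
// distance from the wrist with its PIP joint's distance from the wrist.
function isHandOpen(landmarks: NormalizedLandmarkList): boolean {
  const wrist = landmarks[0];
  // [tip index, PIP index] for the index, middle, ring and pinky fingers.
  const fingers: Array<[number, number]> = [[8, 6], [12, 10], [16, 14], [20, 18]];

  const dist = (a: { x: number; y: number }, b: { x: number; y: number }) =>
    Math.hypot(a.x - b.x, a.y - b.y);

  const extended = fingers.filter(
    ([tip, pip]) => dist(landmarks[tip], wrist) > dist(landmarks[pip], wrist)
  ).length;

  // Call the hand "open" when most fingers are extended.
  return extended >= 3;
}
```

You could then pass results.multiHandLandmarks[0] from the onResults handler into isHandOpen and print the open or closed label accordingly.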

So, going back to our screen — this was the quick demonstration that I wanted to showcase, and with that I'd like to conclude my talk. In case you have any questions about how to get started with MediaPipe in JavaScript, you can definitely reach out to me, and I recommend you check out MediaPipe in JavaScript, where you'll find a list of all the different solutions and their respective NPM modules, as well as some working examples that are already out there. With that, I'd like to conclude. Thank you so much, and I hope to see you in person next year at React Day Berlin.
