GPU Accelerating Node.js Web Services and Visualization with RAPIDS


The expansion of data size and complexity, broader adoption of ML, as well as the high expectations put on modern web apps all demand increasing compute power. Learn how the RAPIDS data science libraries can be used beyond notebooks, with GPU accelerated Node.js web services. From ETL to server side rendered streaming visualizations, the experimental Node RAPIDS project is developing a broad set of modules able to run across local desktops and multi-GPU cloud instances.



Hi, and welcome to GPU Accelerating Node.js Web Services and Visualization with RAPIDS. I'm Allan Enemark, and I lead the RAPIDS Viz team here at NVIDIA. RAPIDS is an open source, GPU-accelerated data science platform, and you can find more details online. Node RAPIDS, the project I'm going to be talking about, is an open source, modular library of RAPIDS-inclusive bindings in Node.js, as well as some complementary methods for supporting high-performance, browser-like visualizations. It's currently in technical preview, but you can find more details about it in the project's repository.
Our main goal with this framework is to create something that can accelerate data science and visualization pipelines fully in JavaScript and TypeScript, which is traditionally done mostly in, say, Python. Our second goal is bringing GPU acceleration to a wider variety of Node.js and JS utilities, since we feel the general community is not getting as much access to these high-performance tools as we'd like.
So what do you get with RAPIDS, which is traditionally Python and C++? You get data science libraries such as DataFrame operations in cuDF. You get cuML, which is a lot of GPU-accelerated machine learning algorithms, cuGraph for graph work, cuSpatial, cuSignal, and the like. More are being developed continuously, and these are continuously being improved. The caveat is that these are mainly built around Linux-based systems. If you want to use them on Windows, you can, but it has to be through WSL2.
So what kinds of libraries in the Viz ecosystem traditionally live in Python? We have our own cuxfilter, a notebook-based cross-filtering tool where you can create dashboards in a few lines of Python code and then interact with hundreds of millions of rows of data in a pretty customizable way.
We make extensive use of one of the other great Viz libraries out there, Datashader, which is great at server-side rendering hundreds of millions of points. All of this is GPU-accelerated, and it's part of a great ecosystem of Viz and analytics tools that lie on a spectrum from back-end C, C++, and Python through to front-end JS. When it comes to data science, compute, and analytics, more on the performance side, it all starts on the Python and C++ side and is then translated into JavaScript for the interface. Some tools are a bit more intermediary, but really it starts there and ends up in JavaScript, or just stays in JavaScript.
What we're proposing is the inverse. We're going to start with the JS libraries and bring them back to a higher-performing back end in Node.js, giving them access to CUDA, cuDF, cuGraph, and so on.
Our experience with this started a while ago, when we were making lots of demos for RAPIDS. In this case we were making a mortgage visualization with deck.gl and React. It was a very fast, nice interface that fit all kinds of different screens. But the back end was a mess. We had multiple languages and multiple servers, it became unsustainable, and we basically gave up and said, oh well, let's just do it in Python and notebooks. But deep down we were kind of sad, because there are all these great JS Viz libraries, and we were lacking the custom abilities you get from using them.
And it's a shame, because now you have this continental divide, right? You have your Python, C++, CUDA land and your JavaScript, TypeScript land, and there's a chasm between them that separates their capabilities.
On one side you get direct access to hardware. Most of the HPC (high-performance computing), data science, and compute libraries are in this space. It doesn't have the best usability, because there's a steep learning curve, but this is the place to go for high-performance work. On the other side, you have JavaScript and TypeScript, with a nice browser environment that is great for shareability, accessibility, and compatibility, and, in my opinion, somewhat more refined visualization and interface libraries. But you don't get that performance, because you're bounded by the browser sandbox. It's a shame, because you have data scientists, engineers, and Viz folks all siloed on their own sides, when they could mutually benefit from each other's tooling and experience.
Hence Node RAPIDS, where we're hoping to give the Node.js dev community a streamlined API to a high-performance data science platform, RAPIDS, without the need to learn a new language or environment. You can then leverage both RAPIDS and Node.js features. You can accelerate the catalog of great JS Viz libraries already out there without major refactoring. You can run locally or through cloud instances, and it's well suited to accelerated Viz apps and Node service apps. And again, you help these two communities work more closely together.
So those are the high ideals. What's the actual meat and bones of this thing? Well, here it is: Node RAPIDS. It's a very modular library, buffet style, so you pick and choose what you need for your use case. It's organized into a few main categories. The main one is memory management, which gives you access to CUDA and so to GPU memory. We also have a really nice SQL engine that we bind to, which enables us to do multi-node, multi-GPU work when needed. Then there's a whole data science wing, all in RAPIDS.
So you have your cuDF and cuGraph. And then there's the graphics column, where we take advantage of the fact that WebGL is a subset of OpenGL. What these bindings let you do is take your WebGL code, run it in OpenGL, and get the performance benefit of OpenGL. We're also doing things with GLFW bindings and Node processes. So you can now use luma.gl, deck.gl, and Sigma.js, and we're hoping to get to three.js, and basically run them in OpenGL without much effort.
Another component is that, since you're already on the GPU, you get the benefit of GPU video encoding. By taking advantage of that and using WebRTC, you can do server-side rendering, send the result to the browser as a video, and have the browser-side JS interact with it like a video tag. It's very lightweight in that sense, and it enables a lot more capabilities. Or you can just interact with all of this in a notebook.
So what do we mean by architecture? With GPU hardware, it's a little bit different. It's pretty straightforward when you have a single GPU and are just doing client-side rendering. All the compute happens on the GPU, you send the computed values over, and the client JS renders those few values. That's where GPUs excel. They excel so well that you can have multiple users accessing the same GPU data, and it's fast enough to handle that. If you have particularly large data, NVIDIA has NVLink, so you can link multiple GPUs together and get expanded memory and compute power that way. Or you can go to a more traditional cloud architecture, where you have multiple separate GPUs.
You have lots of child processes running on each GPU and a load balancer running across all of them, so there are lots of instances of people accessing it. Whatever GPU is free at that moment is the one that serves the information to that user. Pretty straightforward. You still tend to need a lot of GPUs, but it's nothing terribly unfamiliar.
Now, taking advantage of the server-side rendering and streaming component, if you have some pretty heavy workloads, you can separate those tasks out per GPU. One can do solely the compute, and one can do the rendering and encoding of the video for the client. And it does this well enough that you can have multiple people accessing the same system and the same data. The caveat is that consumer-grade GPUs are, I think, limited to three NVENC streams at once. So keep that in mind.
Something that is interesting and new, and a bit more on the wishful-thinking-but-possible side, is a massive multi-GPU, server-side rendered, multi-user system. In this case it would be something like a DGX, which has eight GPUs, all NVLinked. You can divvy up the compute and rendering between them, and you have tons of GPU memory. Basically, you have a mini supercomputer that you can access and control with off-the-shelf JavaScript, which is kind of wild. You'd normally think you need some sort of HPC-type software, with all the overhead of learning it. But really, there's nothing stopping you from leveraging a system this powerful with your run-of-the-mill JavaScript, which we think is pretty cool.
But anyway, now for the good part: examples and demos. For this first one, we're going to keep it simple. We're just going to show off some basic ETL data processing in a notebook, one that is running Node.js with JS syntax.
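Stepping back to the load-balanced layout described above, where whichever GPU is free serves the next user, the routing idea can be sketched in a few lines. This is plain JavaScript with no Node RAPIDS calls. The names GpuPool, acquire, and release are hypothetical, and the default cap of three sessions per GPU simply mirrors the encoding-stream limit mentioned for consumer cards; a real service would fork one child process per GPU and forward the session to it.

```javascript
// Hypothetical sketch: route each new session to the least-busy GPU.
// Nothing here is a Node RAPIDS API.
class GpuPool {
  constructor(gpuCount, maxStreamsPerGpu = 3) {
    this.maxStreamsPerGpu = maxStreamsPerGpu;
    // active[i] is the number of sessions currently served by GPU i.
    this.active = new Array(gpuCount).fill(0);
  }

  // Returns the index of the GPU that takes the session, or -1 if all are full.
  acquire() {
    let best = -1;
    for (let gpu = 0; gpu < this.active.length; gpu++) {
      const hasRoom = this.active[gpu] < this.maxStreamsPerGpu;
      if (hasRoom && (best === -1 || this.active[gpu] < this.active[best])) {
        best = gpu;
      }
    }
    if (best !== -1) this.active[best] += 1;
    return best;
  }

  // Frees one session slot on the given GPU.
  release(gpu) {
    if (this.active[gpu] > 0) this.active[gpu] -= 1;
  }
}

// Seven users arrive at a two-GPU machine; the seventh finds no free slot.
const pool = new GpuPool(2);
const assigned = [];
for (let user = 0; user < 7; user++) assigned.push(pool.acquire());
console.log(assigned);
```

Picking the least-busy GPU rather than the first one with room keeps the load even, which matters when each session is also holding a video encoder.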
We're basically using the notebook as a placeholder for very common Node services, such as batch services you'd need for parsing logs or serving up subsets of data. In this case, we're going to do it live. Right now, I have a Docker container running our Node RAPIDS instance and a notebook. So here I have a Jupyter notebook in JupyterLab, which, if you're not familiar, is sort of the data science IDE. You can see that all we're doing, as you would in any Node app, is requiring @rapidsai/cudf into a var.
We're going to load a 1.2 gigabyte dataset of US car accidents from Kaggle and read it as a CSV onto the GPU. And, well, it was actually extra fast today. It took under a second, which is kind of wild considering this is a 1.2 gig file. How big is it? It's 2.8 million rows with 47 columns. For us, that's not that big of a dataset, but for people used to working with data in Node.js, it's pretty good. And how fast and responsive it is, is kind of impressive.
So let's see what the headers are. It's sort of messy. We're going to go through this pretty quickly, but basically we get rid of the columns we don't need. Then we look at the columns again and say, all right, there's some temperature data in here. Let's parse it to see what the ranges are as a sanity check. And there are some really wonky values in there, as you always have with data. Let's bracket it so it's sensible. This whole operation, on 2.8 million rows, took 20 milliseconds to filter. It's kind of wild. So now this is looking better. We have basically 2.7 million rows now, and we can start doing some other operations. In this case, we're going to stringify it.
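The bracketing step can be pictured as building a boolean mask over one column and keeping only the rows where the mask is true. The sketch below does this on the CPU with plain typed arrays, purely to show the pattern. It does not use the cuDF bindings, and the column names, sample values, and the -20 to 120 range are made up for illustration.

```javascript
// CPU stand-in for the mask-and-filter pattern (not the cuDF API).
// A tiny columnar table: each column is an array of equal length.
const table = {
  temperatureF: Float32Array.from([72.5, -89.0, 31.0, 170.0, 98.6]),
  severity: Int32Array.from([2, 3, 4, 2, 1]),
};

// Build a mask that is true where the value lies inside [min, max].
function rangeMask(column, min, max) {
  return Array.from(column, (v) => v >= min && v <= max);
}

// Apply the same mask to every column so that rows stay aligned.
function filterTable(tbl, mask) {
  const out = {};
  for (const [name, column] of Object.entries(tbl)) {
    out[name] = column.filter((_, i) => mask[i]);
  }
  return out;
}

const mask = rangeMask(table.temperatureF, -20, 120);
const cleaned = filterTable(table, mask);
console.log(cleaned.temperatureF.length, "of", table.temperatureF.length, "rows kept");
```

Keeping the data in columns rather than row objects is the same layout a GPU data frame uses, which is why one mask can be applied to every column independently.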
With cuDF, you also get some regex operations. So we have a pretty complicated regex here, where we mask out commonalities between these terms. Again, on 2.7 million rows, it took 113 milliseconds, so you can see how powerful and fast it is on a GPU. At the end of this, you're seeing that, yeah, when there are clouds and rain, the accidents get more severe. This is just a first showcase of what you can do with GPUs and RAPIDS. And now you get access to it in Node.js, with JS syntax, which we think is pretty neat.
So we're going to step it up a bit. The next demo uses Sigma.js, which, if you're not familiar, is a client-side graph rendering library. It's pretty performant and has a lot of functionality. What we did is basically take the graph loading out of system memory, load it into GPU memory instead, and then serve or stream it to the client-side app. In this case, ignore the numbers on screen. It's actually 1 million nodes with 200,000 edges. You can see it zooms and pans buttery smooth, even though it's in WebGL, so you get that great interactivity. But the graph is technically loaded into GPU memory, which means you're not bottlenecked by web browser limitations. Without the GPU, you're basically limited to about 500,000 nodes, and then the tab craps out on you. As you can see here, it's using the GPU for the nodes and edges, and it's straightforward Sigma.js with very few changes. What we're hoping to do in the future is let lots of Sigma.js users get that performance optimization and memory access without needing to change much of how their code works.
For this next one, we're doing something similar: server-side compute and client-side rendering of a geospatial visualization. It uses deck.gl and an Uber dataset of 40 million rows.
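Returning briefly to the regex step in the accident notebook: masking out commonalities between weather terms amounts to collapsing free-text descriptions into a few categories and then comparing severity per category. The sketch below shows that idea in plain JavaScript on a handful of made-up rows. It is not the cuDF regex API, and the patterns, category names, and values are assumptions for illustration only.

```javascript
// Made-up sample rows; the real notebook works on millions of rows on the GPU.
const rows = [
  { weather: "Light Rain", severity: 3 },
  { weather: "Heavy Rain / Windy", severity: 4 },
  { weather: "Mostly Cloudy", severity: 3 },
  { weather: "Scattered Clouds", severity: 2 },
  { weather: "Fair", severity: 2 },
  { weather: "Clear", severity: 1 },
];

// Ordered list of [category, pattern]; the first matching pattern wins.
const categories = [
  ["rain", /rain|drizzle|shower/i],
  ["cloud", /cloud|overcast/i],
  ["clear", /clear|fair/i],
];

// Map a free-text weather description to one category name.
function categorize(text) {
  for (const [name, pattern] of categories) {
    if (pattern.test(text)) return name;
  }
  return "other";
}

// Group rows by category and compute the mean severity of each group.
function meanSeverityByCategory(data) {
  const sums = new Map();
  for (const { weather, severity } of data) {
    const key = categorize(weather);
    const entry = sums.get(key) ?? { total: 0, count: 0 };
    entry.total += severity;
    entry.count += 1;
    sums.set(key, entry);
  }
  const means = {};
  for (const [key, { total, count }] of sums) means[key] = total / count;
  return means;
}

console.log(meanSeverityByCategory(rows));
```

The order of the category list matters: a description such as "Rain and Clouds" would land in the first category that matches, so more specific patterns should come first.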
Forty million rows is not bad, but still not that big for us. Everything is computed into sources and destinations, so by clicking on each of these areas, you're computing how many of those trips went where and at what time. You can see that clicking on each of these is essentially instantaneous, and you get the results from 40 million rows. It's so fast that you can basically just start scrubbing the values and get that interaction back. What this means is that you can get away with multiple instances accessing the same GPU and the same dataset. Because this is a React app, the state is managed on the client side, so each instance gets its own unique view of the data, but they're all querying the same GPU. And because it's so fast, the values come back basically in real time. Like I said, where the GPU really excels is doing the server-side compute and streaming the values over. So you can get away with quite a lot, even on a single GPU.
The next one is a bit more complicated. You can see we're building up modules as we go. This is a server-side rendered and computed graph visualization that is streamed to the client as a video. We're using the luma.gl JavaScript library, converting it to OpenGL, encoding that with NVENC, and streaming it with WebRTC. We're using cuDF for the data frame and cuGraph for the graph layout compute. You can see we're starting to wrap this in a bit more of a GUI app. In this case, we're just using a sample dataset. We select the edges and colors, and then we render it. So we have this giant hairball, but it's 1 million edges and 1 million nodes that you're able to filter and query in real time. You can see how quickly you can query it, clear the filter, and be back to a million nodes and a million edges, and you can tooltip and hover over each of these.
Visually, it doesn't make that much sense, but again, this is more of a performance benchmark that shows you can still get at each individual node. Now we're going to run ForceAtlas2 in real time. ForceAtlas2 is a layout algorithm, so it computes the layout for every single node and every single edge in real time, with all the forces being calculated. This is again on a million edges and a million nodes, and it's all streamed as video to the browser. You can see it moving through the iterations in real time, which is pretty wild. Again, this is just a single GPU, and you're able to interact with this much data. I kind of love looking at this. It's like looking at the surface of the sun, which is pretty neat.
The last one is very similar, except we're using point cloud data instead. We don't need to do any graph layout, but it's the same idea of video streaming. In this case, it's a point cloud of a building scan, so we can pan, zoom, and interact with it. And because GPUs are pretty powerful, I'm just going to duplicate this tab and have multiple instances of the same dataset from the same GPU. I'm still able to interact with each instance independently, because of how we're handling the data. And again, it's just a video stream, which is pretty cool. So you can use a single GPU for a lot.
So that's a lot to talk about, and we have even more demos and examples. This is the location of those. We have examples showing graph rendering with GLFW. We have all the deck.gl demos, locally rendered in OpenGL. We have a really good multi-GPU SQL engine query demo, where we query all of English Wikipedia using multiple GPUs. We have examples showing the UMAP clustering algorithm and spatial quadtree algorithms. They're pretty cool, and I recommend you check them out.
Even if you don't install it, we have links to YouTube videos of the demos.
So what's next? Moving forward, we have three main guiding roadmap ideas. We're going to continue doing these demos and working on external library bindings, so again, we're looking forward to doing three.js and some more work with Sigma.js. From that, we're going to start building more specialized applications, specifically around cyber and graph, geospatial, point clouds, and so on, maybe something that makes turnkey usage a little easier. And then, hopefully, broader community adoption, specifically in non-Viz use cases. We're biased toward Viz, being the Viz team, but we know that providing these CUDA bindings opens up a lot more functionality that we're just not going to think of because we're not aware of it. So we'd love to hear from the community about novel ways to use this.
RAPIDS is a pretty incredible framework, and we want to bring this capability to more developers and more applications. We feel the JavaScript and Node.js dev communities can take advantage of some of the learnings and performance from the data science community. That's what we're trying to do: bridge the gap between the two, in this case with a lot of Viz use cases.
This is what we're working on next, mainly better developer UX. It's a bit of a bear to install right now, and if you're not familiar with Docker images, there can be a bit of a learning curve. So we're hoping to make this installable through npm. It's going to be a bit of work for us, but we know it will make it a lot more approachable for JavaScript devs. Like I said, we're going to make a couple more visualization applications that are a bit more turnkey for general analysts. And hopefully we're going to get full Windows WSL2 support.
Currently you can run Node RAPIDS on WSL2 on Windows, but there's no OpenGL support from NVIDIA yet. They're working on it. So you can do the compute side, but not the rendering side. And then maybe native Windows support. That's a bit of a long shot, but it would be pretty neat to have, so you wouldn't need WSL2 or anything.
With that, if any of this sounds interesting or has piqued your curiosity, come check it out: GPU-accelerated data science services and visualization with Node.js and RAPIDS. It's still in technical preview. We're working out the kinks, and it's still pretty new, but we'd love to hear feedback about all kinds of interesting use cases you might have for us, and we're very responsive to that. So I'm hoping to hear from you, and thanks for listening.
26 min
20 Jun, 2022
