Monitoring, Alerting, and Visualizing Your Node.js Server Infrastructure with Open Source Tools

When monitoring Node.js, you can track your applications' performance and availability by finding bottlenecks and fixing errors. You can identify issues by looking at metrics such as process memory usage, average response time, CPU usage, and more. If you also monitor the other components of your stack, you gain a comprehensive view of what could be impacting application performance. At that point, monitoring can point out the problem at the code level, allowing you to track down and fix issues before they negatively impact the end-user experience. This talk will focus on the tools available from the open source time series database InfluxDB. It will use the open source Node.js Telegraf plugin, so you can easily collect key metrics that give you that view into your application. We will be using the Node.js Monitoring Template, which is prebuilt and equipped to monitor an application's performance and availability. All code examples will be in JavaScript, and we will also go over the JavaScript client library for those who are working in other JavaScript server environments, or who want to export data to their preferred visualization tools.



Transcription


Awesome. Thank you so much for the introduction. I have to admit the video kind of startled me. I didn't expect the noise. So we're going to go ahead and get started here. I'm going to be going over monitoring, alerting, and visualizing your Node.js server infrastructure with open source tooling. So let's go ahead. We already got a pretty good introduction, but if you guys want to add me on LinkedIn, if you don't want to ask questions here, which I don't know why you wouldn't, because it's all very, what's the word? You can just ask them and not have to attach your name to them. Yeah. So it's really quite great. But otherwise, you're welcome to connect with me there. So let's go over the session agenda really quick, because we only have about 20 minutes here. So we're going to quickly go over why you want to use a time series database when it comes to your server monitoring. We're going to quickly talk about InfluxData and the platform tools that it comes built in with. So InfluxData makes InfluxDB, a fully open source time series database. We also do offer a cloud, but we're just going to be talking about the open source during this talk. We're going to briefly talk about the JavaScript client library. We're going to talk about this awesome monitoring template that we offer for Node.js server monitoring that you can easily install and get going on. I'm going to do a live demo where I kind of show off all of these pieces put together. The reason I kind of put some code inside the slides and such is because otherwise, I might forget to show them to you, and it might not make quite as much sense. But we're going to try and give some extra time to that demo. And then finally, Q&A and further resources. So I will have a final slide that will have a ton of the links that you can take a photo of with your phone, because I know people tend to take photos throughout. And so I do put them all at the end for you guys for more convenience.
So the importance of time series databases. So time series data is any data that you want to deal with over a time span. Now obviously, right now, we're talking about server infrastructure monitoring, which is very much time based. It's all about what happened at this time of the day. When did the server scale up? When did it scale down? When did it stop giving 200 responses and start giving 400 instead? These are all things that happen over the span of time. Now the other thing is, though, when we talk about time series, we also normally tend to talk about more physical things. You can think of your Fitbit or health tracker. That's a time series device that you just happen to wear. Your thermostat at home is most likely streaming time series data somewhere. Solar panels on your house. Nowadays, in this world, we have a lot more time series applications. And nowadays, also, server monitoring has become a lot more important. They say the average user will only wait about three seconds for a website to load. I swear, I only wait two. We expect everything to work perfectly, and we don't like when things are down. So nowadays, server monitoring has become a lot more important. So one thing to take note, which I already kind of explained here, but that doesn't mean that you can't use other DBs for your server monitoring data. That's not the case at all. There's plenty of databases that you can choose from. But for the most part, we see other DBs as good for other types of data. This is just a few listed examples here. As you guys know, you're probably using Elastic if you have any searching on your website. You might use Mongo for documents. And of course, we all use SQL for, I would say, about 90% of data in the worldwide web is actually stored in a SQL-type DB. And these are all DBs that offer some form of open source as well. So let's really quick go over this overview so that we kind of get an idea of some of the tools that we can start using. 
So this is what the platform ends up looking like. So we have those data sources, which I've already mentioned. For us, it's going to be things like cloud services, most likely. And then inside the InfluxDB platform, we have Telegraf, which some of you guys might actually have used before. It's just not necessarily well known that InfluxData is the caretaker of Telegraf. It's fully open source. So we don't write all the code for it. A lot of companies or individuals write the libraries themselves. But we tend to take care of it in that we don't allow malicious code uploads and such. The other thing is we have our client libraries. Those are managed by us. Today, we're going to kind of go over the JavaScript slash Node.js client library. They are combined inside of the client library, but they do have slightly different functionalities. And then finally, the actual platform and what it allows us to do, which is things like collection, transformation, which we're going to briefly touch on but not go too in-depth on, downsampling, triggering, and alerting. Now I'll kind of show some of these, but we don't have enough time to go into all of them. But there are tons of resources online. So basically, the idea with this is you can get the project going, you can get your data stored, and then you can kind of build upon it. So these are the current data acquisition methods. I've already talked pretty much about the big two. And what I'm really going to mention here is the difference between Telegraf and our client libraries. So Telegraf is basically an agent driven by a config file. We have like 300 plus input plugins, a few different output plugins, say about 20 of those. But basically, the reason you use it is because it's no-code. Basically you just change the parameters inside the config file, and then you can go ahead and start uploading. I'll show a visualization of some of these in our UI, so it'll make a little bit more sense.
But the client libraries, which you can just think of almost as a REST API, allow you to read, to write, to delete, and just in general to look at your data. And that I'll actually go into a little bit more because, at least for my project, and for most people's projects, you're going to use the client libraries to actually get your data in. You could use Telegraf. There's no reason not to. I just chose to use the client library because it was what I found the easiest for my use case. These are also some of the things that you can do. So you can go ahead and do visualizations, which I'm going to show in a dashboard format. You can take action. So this task engine up here is basically like a cron job engine. It basically allows you to say something like, every hour of the day, run this cron job. What that allows you to do is things like downsampling, and when we talk about downsampling, what we're actually referring to is the ability to take your data and aggregate it down. So that's something like, the day is over. You've got all your server monitoring data, but your servers have sent you data for like every second. So you have a ton of data points. You just have millions of them, basically. But the thing is, the day was pretty good. It was a calm day. Nothing really exciting happened. So what you're going to do now is you're going to run this cron job that says, give me the aggregate average for every hour of the day. So instead of having the CPU load at one-second resolution, it would give you 24 data points back, the hourly averages throughout the day. So you could still see your high times and your low times and such, but overall, it's a much smaller data packet to store, which is always good. Because the smaller the data packet that you store, in theory, if you were a big client and you're spending money to store it, the more money it saves you in the long run. The other thing that you can see here is the checks and notifications at the bottom.
I know visually this can be a little hard to see. I'll show it in the UI and it will look a little bit clearer. But basically what you're doing here is you're saying, I'm going to set this threshold. If the CPU load goes above this amount, go ahead and send me a Slack notification or a PagerDuty notification. You have a few different options here. We also have an HTTP endpoint. So really the world's your oyster with that one. You can send it really wherever you want. I, for example, have a plant one at home that I monitor and I send myself a text via Twilio when it needs to be watered, which I then probably ignore, of course, because why actually take advantage of it? So let's go ahead and get started on the JavaScript client library. So these are the two projects to follow. Now again, I will have these links available at the end. This is more just in case people had their laptops out and they wanted to follow. On the left side, the longer one, that is the Node.js dashboard, which I'll go over. And then the other one closer to me is the client JS library. So that's just the basic library. This one is how you would actually do the server monitoring template. So as I said before, there are two main libraries. For me, I'm specifically using the InfluxDB client one because that is what allows me to query and write my data in. The other one is for managing. So what that means is that if you wanted to, you don't have to spin up the InfluxDB UI on your localhost. You don't have to do that. You can do all of your buckets and authorizations and tasks inside of the client API, but I'm not going to use that. I'm just going to go ahead and use the UI because it's a little bit easier visually for all of us to understand. And it's a little bit more exciting than me just doing little command line snippets. It's not quite as exciting. So this is basically how you get everything installed. It's super straightforward. You can do npm. You can do yarn.
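To make the downsampling and threshold ideas described above a bit more concrete, here is a minimal sketch in plain JavaScript. It is not how the InfluxDB task engine or its checks are implemented; it only illustrates the arithmetic of turning per-second samples into hourly averages and then comparing those averages to a limit. The sample shape `{ time, value }`, the function names `downsampleHourly` and `hoursAboveThreshold`, and the synthetic CPU numbers are all assumptions made up for illustration.

```javascript
// Assumed sample shape (hypothetical): { time: Date, value: number }.
const HOUR_MS = 3_600_000;

// Group samples by the hour they fall in and return one mean per hour.
function downsampleHourly(samples) {
  const buckets = new Map(); // hour start (ms since epoch) -> { sum, count }
  for (const { time, value } of samples) {
    const hourStart = Math.floor(time.getTime() / HOUR_MS) * HOUR_MS;
    const bucket = buckets.get(hourStart) ?? { sum: 0, count: 0 };
    bucket.sum += value;
    bucket.count += 1;
    buckets.set(hourStart, bucket);
  }
  return [...buckets.entries()]
    .sort(([a], [b]) => a - b)
    .map(([hourStart, { sum, count }]) => ({
      time: new Date(hourStart),
      mean: sum / count,
    }));
}

// Threshold check: keep only the hours whose mean exceeds the limit.
function hoursAboveThreshold(hourlyMeans, limit) {
  return hourlyMeans.filter(({ mean }) => mean > limit);
}

// Synthetic data: one CPU load sample per second for a full day (86,400 samples).
// The load is 20% all day except for a busy hour starting at 14:00 UTC.
const dayStart = Date.UTC(2023, 3, 14);
const samples = [];
for (let second = 0; second < 86_400; second++) {
  const hour = Math.floor(second / 3600);
  samples.push({
    time: new Date(dayStart + second * 1000),
    value: hour === 14 ? 90 : 20,
  });
}

const hourly = downsampleHourly(samples);
console.log(`${samples.length} samples downsampled to ${hourly.length} hourly means`);

// A real check would send a notification here; this sketch only logs.
for (const { time, mean } of hoursAboveThreshold(hourly, 80)) {
  console.log(`CPU load above threshold at ${time.toISOString()}: ${mean}`);
}
```

In InfluxDB itself, this kind of aggregation would run as a scheduled task rather than in application code, and the threshold comparison would be configured as a check with a notification endpoint such as Slack, PagerDuty, or HTTP.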
Basically, once you get it installed, you're good to go. I'm not going to go too much into the install. We all know how to install packages here. So when it comes to the write API, basically what we're doing here is we are setting up our authentication. So you need a URL, a token, an organization, and a bucket, and a bucket in this case is just another word for database. It's basically what you name your DB. And I will show how this actually looks. And I'm going to be a bad person and just show you my tokens and everything else. So it's fine though, because I can easily... This is on my localhost. You can't do anything with my token anyways, but basically I'm going to show you how you set this all up. Just keep in mind in the future, please do keep these in a separate file when you put in these environment variables. As you can see here up at the top, it's got them in an environment .mjs file instead. But basically you get all of your authentication up and running, and then you can finally actually start writing. So the writing is pretty straightforward here. So as you can see, one thing to note real quick here is we are a schema-less DB, which means you don't have to tell us what you intend to put in there. You can just start writing immediately, which is a little bit different because most people have to tell their database what they expect to receive in it. That being said, that is a double-edged sword, the double-edged sword being you could send bad data and we would write it. We do have an explicit schema option, so you could upload a schema and say, I only expect this type of data to come up. Don't allow anything that doesn't match the schema. But one thing to note here is we're going ahead and we're sending up a point, a point of temperature. We have a tag for it, which in this case, the tag is called example, and it's a string called write.ts.
And then finally we have a value, which for this one is like a random math number because it's just an example of how to do this. And basically because we're not sending a timestamp with this, Influx will automatically put a timestamp on it for when it arrives. So you can put a timestamp. You could even put a timestamp that's in theory like historical, like something like from two days ago if you needed to. Obviously some people need to upload old data, things from CSVs and such. So that's when you're going to actually have to tell us a timestamp, but otherwise you can just start writing up data and we'll attach it ourselves. So this one I'm not going to go too much into because we don't need it for this project. But if you actually want to get your data back out, you can go ahead and use a Flux query to retrieve it back out. So the way Flux works is pretty straightforward. This is just a very basic query. Basically we're saying from my bucket, give me a range of the past day. You could do it, you could give it no range. You could just say from bucket, but most people tend to give it some type of reasonable range to get their data out. And then for this one, we're filtering down on the measurement of temperature. So we're saying I maybe gave you some light values. I gave you some humidity values, but right now I just want my temperature values so I can graph just those instead. And what that's doing down here is it's actually turning them into rows and those rows could be used inside of a visualization library. We also allow things like data frames and other such ways to get your data out. It's all meant to be compatible with other open source plotting libraries, things like Grafana or Plotly JS, et cetera. And then finally being able to delete data. So this one we're not again going to use in our example, but I just wanted you guys to be aware. Deleting in a time series DB is not how you're used to doing it normally. 
You're not doing it based off a value or based off an ID. It's based off a range. So that also means that if you have data that you want to keep, do keep in mind that if it's within that range, it will be deleted. So that's how you end up deleting your data. Maybe you had a bad first couple of tries. You can go ahead and say delete for the last 30 minutes and that's going to be that range, that start and stop. So now I'm going to talk about the monitoring template. So this is how you actually get it set up. And again, this is mainly just to highlight some things because when I start doing the demo, I might kind of miss these a little bit, but I'm going to try not to. But basically what you're going to do is you're going to use that write API to write points, specifically things about Node CPU usage, memory usage, and resource usage. And this is kind of an overarching function that formats that point for us. So it's basically saying, take the point measurement, which in this one, for example, was Node CPU usage, and go ahead and from that point, make a float field with a key and a usage key. When I actually put this in and show you guys, it's going to make a lot more sense in a more table type format. But basically this line of code here, which is available on GitHub, this whole project is, so you don't have to try to remember all of this, but these are the big components of how this actually is going to end up working. The other one that we do is we grab the monitoring from the Express.js server. So with this one, we're grabbing things like how well it responded basically to being called, like did everything come in as we expected? Was the status 200 or was it getting other random statuses? This can obviously help if you are experiencing problems where certain pages or files aren't quite loading as you expect. Hopefully that's not the case, but just in case, this would allow you to monitor for that.
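As a rough illustration of the kind of points described above, the sketch below reads CPU and memory usage from Node's built-in `process` object and formats each reading as InfluxDB line protocol, with one numeric field per usage key. It deliberately avoids the client library so that it runs on its own; in the real project, the library's point and write API do this work. The measurement names `node_cpu_usage` and `node_memory_usage`, the helper names, and the tag values are assumptions for illustration and are not taken from the template's source.

```javascript
// Escape characters that are special in line-protocol tag and field keys/values.
const escapeKey = (s) => String(s).replace(/([,= ])/g, "\\$1");
// Measurement names only need commas and spaces escaped.
const escapeMeasurement = (s) => String(s).replace(/([, ])/g, "\\$1");

// Build one line: measurement,tag=value,... field=number,...
// No timestamp is appended, so the server would assign one on arrival.
function toLineProtocol(measurement, tags, fields) {
  const tagPart = Object.entries(tags)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, value]) => `,${escapeKey(key)}=${escapeKey(value)}`)
    .join("");
  const fieldPart = Object.entries(fields)
    .filter(([, value]) => Number.isFinite(value))
    .map(([key, value]) => `${escapeKey(key)}=${value}`)
    .join(",");
  return `${escapeMeasurement(measurement)}${tagPart} ${fieldPart}`;
}

// Hypothetical helper: one line per usage measurement, sharing the default tags.
function processUsageLines(defaultTags) {
  return [
    // process.cpuUsage() reports user and system CPU time in microseconds.
    toLineProtocol("node_cpu_usage", defaultTags, process.cpuUsage()),
    // process.memoryUsage() reports rss, heapTotal, heapUsed, and so on, in bytes.
    toLineProtocol("node_memory_usage", defaultTags, process.memoryUsage()),
  ];
}

// Placeholder tag values; the dashboard in the talk filters on service and host.
for (const line of processUsageLines({ service: "my_service", host: "my-laptop" })) {
  console.log(line);
}
```

Each printed line has the measurement name first, then the shared `service` and `host` tags, then one field per usage key, which is roughly the table layout the dashboard cells query against.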
And this is what the dashboard ends up looking like. And again, when I get to the demo, this is going to be a little bit easier to understand. But basically here, what we're saying is we can add new charts up here with the add cell. We can filter. So for mine, I'm just doing my Node services, IoT Center. These are all kind of hard-coded for my example. And then I'll show you how to actually modify these cells. But basically it's this little widget right here. Again, I'll show it and I'll kind of zoom in so we can all kind of understand how this all works together. One quick note I just want to mention here is that if you do intend to take this project out of the open source and into the cloud, you do need to be aware that going forward in the cloud environment, we're going to be focusing on SQL, not Flux, which means you would need to use a SQL query to get your data back out. And we're going to be having a Flight SQL JavaScript integration for this change. Let's go ahead and get into the demo because that's honestly the real meat of everything and that will kind of help us understand a bit better. So please excuse my slightly messy code here. But basically this is my little fake Express server. It's sending me back some data. It's not super exciting because it's not hosting anything. But as you can see up here, I've got my token, I've got my org ID, my bucket is called IoT Center because that's how I downloaded it. And then I've got a URL because this is all running on my localhost. So as you can see in here, I've got this getWriteApi. I've given it default tags, which basically automatically sets the service to be IoT Center. And the host currently is basically my laptop. That's what it will say inside the table. You don't have to have these default tags, but the dashboard won't work without them basically. And from there, I'm using that point code that we talked about before.
The one that's got your CPU usage, your memory usage, and your resource usage, and the overarching one that creates the actual point value. So these are all kind of living here together. For my point Express, I was fighting with it too much. So I just ended up doing some hard-coded values. But it will still help get the point across, and we can kind of see how this ends up looking. And from there, it's just running on port 3000. This is a very, very basic little project. This is just to help us really get going. Just want to show what this looks like. So this is inside the localhost. So this is what it actually looks like inside InfluxDB. Let me see if I can make this a little bit bigger. This might cause some things to get a little weird, but we're going to go with it. So let's go ahead into sources. So I'm using the client library to get this uploaded, obviously. There are other client libraries that you can pick from. And if you come here to this website, and this is available, this is the open source. This is, oh dear, this is not working. That's okay. It's probably not working because we're on the open source, but it is available online. Let me really quick just find it. So it normally takes you here to the InfluxDB client JS, which has examples on how to get started as well as all of the other details. It's basically everything I already went through, but normally it would link to the other ones as well. These are some of those Telegraf plugins that I was talking about. I'm actually going to make this a little smaller so they come up. Obviously we have a lot of different ones for a lot of different monitoring and just in general technologies. They're not all here in this UI because as people add them, we don't always add them to the UI. It takes some time to get everything in. So you can go to the full list if you just type in Telegraf. And yes, this does work also with other DBs if you would prefer to use a different time series database.
Plenty of our competitors love Telegraf just as much as we do. And that's what makes it so beautiful and open source. The other thing here, so I have my bucket IoT Center and I could create other buckets here, but actually really quick. Sorry guys, I don't mean to be so scatterbrained, but one thing to take note of is this is where you actually get your community template. So this is a page called community templates. It's got a lot of variety here. As you can see, this one, for example, is for Apex Legends, which is a game. So this allows people who are playing this game to kind of get their data from it and display it in a really pretty dashboard. But the one that we're using, not quite a game. Let's see. Oh no. Oh dear. I think I just crashed my window. Great. It wouldn't be live if I didn't, you know, just destroy everything. Sorry guys. Let's go back. Luckily it will still reopen. Cool. So we're just going to go on down here to Node.js. So this is the one that we're going to be using and I'm going to zoom into this so we can actually see a little bit better. So as you can see, and I'll show this also inside the UI, I'm trying to make this as big as possible for us to kind of read it. So this is going to show the average response time, the maximum, the current heap usage. You can see some CPU usage. This dashboard was used on one of our customers' Node.js servers. So it actually has real data that doesn't look as sad as mine does, which is mainly just a straight line because it's not doing anything. But as you can see, you can really get started on just this graph here. And basically how you do the install is you come up and you grab, sorry, I've got to make this a little smaller. You grab this Node.js YAML file, which basically you throw into here. You say, look up my template, please, and go ahead and install it. This might get a little bit weird because I've already got this template installed, but we're going to run with it.
All right. So the one I modified an hour ago is probably working. Yes, this is the stuff that you can see from my very own laptop. I started running it about 15 minutes ago or so. This is probably when I first got started. It had some weird spike. But as you can see, I can go ahead and just get this going automatically because I followed the instructions. So one thing to note is that this project comes with a monitor.js file, which basically is how I got everything set up. As I said before, you can see, here we go. You can see this write process usage. You can see the setup. Like I said, this is expecting the default tags of service and host. You can change the name of these, but do be aware, this right here, this is where it's looking for it. So there's my MacBook. There's IoT Center. So if I had multiple Node servers running, I could go ahead and change the default tags and change it out as needed, basically. And this one also added an if statement. Apparently that's for an older version of Node, basically. And down here, what it's doing is it's doing the monitoring in a function that you can export out. Again, I just did mine hard-coded because this is just a very small little project. But obviously, this is very helpful. And then one thing to note, they do it properly. They make an environment.js file. This is where you should normally put your URL, token, and organization so you don't accidentally end up sending them up to GitHub, because I've never done that. I'm responsible. And also, this is where you find your URL. For me, it's localhost:8086. Your org is the ID right here, after the org. You can find it within the UI, but I just find it faster to get it from here. And then for token creation, you just come right here. You can do an all access token, which gives you all access to all of your buckets. Because I'm just on my localhost, I don't care, so I just did an all access.
But if you're using this in a more serious manner, please do make sure that you actually do a custom one and you just give it the permissions for just the buckets that it should have so you don't end up with random permissions everywhere. But yeah, so this is basically how you get this project up and going. And let me really quick pull back up my Node deck so that way then we can grab the links here. Let's scroll on down. Great. So this is where I'm going to leave this as we go into the Q&A. As I said before, these are all those resources that I've already kind of mentioned, but really quick, the Slack. If you have any further questions or you're getting started on your own project, please feel free to come to our Slack community. We're super active. The DevRels are there. Our engineers are there. You can also check out our docs as well. They're pretty well written, if I do say so myself. We also have blogs. Some of the projects that inspired some of our customers are from those blogs. And then finally, InfluxDB University is a learn at your own pace platform. So if you want to learn a little bit more, but you want to do it on your own, you're welcome to use that service. It's completely free. And then obviously the JS client library and the Node.js server template, which you can follow word for word to get this all set up. And now I will go ahead and go to the Q&A.
All right. Thanks a lot, Zoe. Awesome. Yeah, please take a seat. We are not sure how many questions we will have, but at least this one. What other options do we have for visualization?
Actually, let me, can I stay near my laptop for one sec? So one other thing that you could do is you could hook us up with, for example, Grafana. I had the tab open, but basically Grafana has their own version of a slightly more, I'm going to say advanced version of this with a Node.js monitor dashboard, very similar concept. And so you could use this instead.
We have a Telegraf output plugin directly to Grafana. So it's super, super easy to set up. And so you could do that. And then the other thing that I kind of forgot to show really quick is if you want to go ahead and edit things in here, this is that toggle that I was talking about. So you can go ahead and go to configure and, for these ones, they're all hard-coded in Flux because that's just how they are built. But if you wanted to, you could add a brand new cell here. I can go to IoT Center. Let's check out our CPU usage. Go ahead and hit submit and I can get a new graph and I'm going to go ahead and name this Zoe's new cell, because why not? And then I can just go ahead and add it. And you can move these around because obviously now this one's kind of a little bit weird in the way it looks. So I might go ahead and reorganize this a little bit better, obviously, but there we go. And you can kind of change these as you'd like. You can go ahead and obviously delete or move them if they don't meet your criteria or needs, but this can definitely be expanded upon too.
While we are here, maybe another question. How can you define the layout of the dashboard? If you can show that maybe.
With CloudWatch. Let's see. Let's go to Telegraf. Let's see what it's got and let's see if we have CloudWatch because I think that is one of the integrations that we offer. And so that would be very good to see. Oh, that's right. I think the full list is somewhere down below. Sorry, guys, I'm going to pull this back a little. Actually we'll go to the docs because they're a little bit... Man, where's the full list? I should have had this up in a tab, but then I would have deleted it anyway. So here's all of our output plugins somewhere. Our list is... You know what? I have an even better idea. Maybe I'm just going to put in Telegraf CloudWatch. That's a great solution to my problems.
Another question, is there an integration plugin between CloudWatch and the integrator?
Yeah. So it would appear that we have an integration here and it allows you to pull metric statistics for monitoring. And as you can see, we have a wide list here that you can use to pull metrics from. Most of these are AWS based. So you can check out the documentation here, which will tell you the AWS CloudWatch statistics, and really it's just going to link to the GitHub. So we'll just go there. And so with this, you can see all the authentication you're going to need, the global configuration options. And again, this is what that config file looks like. So as you can see, most of it's commented out, but you would need to obviously start uncommenting some of these. I think for example, all of these are double commented. They're meant to be comments, but you're going to need to give it things like the access key, the secret key, the token. And normally these are pretty good about telling you how to get this all set up. This one also has an example of how it would normally be set up for people. And so obviously these docs are pretty large. For example, this CloudWatch one at least is very large. Most of the docs for Telegraf are pretty in-depth, is the word I would use. So definitely go ahead and check it out. And I'm sorry, this is what I was looking for, the full plugin directory. So up here, you could go ahead and change depending on what you're looking for. You can also obviously just do a full text search like I did.
Awesome. Thanks for that. Yeah, there is another question. Do you support anything else than metrics?
So in the open source, we normally just suggest metrics. In our cloud version, we now have unlimited cardinality. So we do support metrics, traces, and events. It will eventually be coming to the open source in the next couple of months. It's just not available just yet.
All right. Good to know. And maybe to be back on that one.
How do we define the layout for the dashboards?
So by layout, I'm going to assume that they're talking about, so obviously this was a pre-built dashboard. It came pre-built with these cells and everything. So when you do the download here, you can't really do any edits. You could edit from here though, so you could start to reconfigure these or, like I did, you could add some new cells or you could start to delete them. And as for moving stuff around, it tends to be kind of a drag and drop kind of deal. It can be, I'm not going to lie, a little unruly depending on how many cells you have. And in this case, because my cell was created right at the top, it kind of shifted everything a little weird. Most dashboards I see tend to be a little bit more like, they're just a lot of the long graphs basically and a few of these up at the top. I could show some other examples in the community templates, but for the most part with this one, it's just going to be right out of the box the way it is. And then if you want to go ahead and change it, you're going to have to do that just inside. So for example, this Docker one is what I would call a little bit more on the straightforward side, where all the graphs are kind of just in one four by four block, I guess you could call it. And, for example, down here, they have two longer graphs, if that makes sense.
Absolutely. I think we will have one more question. Maybe they are raising quite a lot. So yeah, maybe the top one, will you add TypeScript support, for example, export types?
I am not sure on that question. So with that one, I would unfortunately ask it in our Slack channel, because I'm not sure what our front-end devs are planning. From what I understand, though, you should be able to use our library even if you have TypeScript installed. It should be relatively compatible, because I've used it in a few TypeScript projects with no issue.
Well, thanks a lot again. I have one last question.
What are these socks on the table?
All right. So for everybody here who's in person, I brought some really awesome socks for you guys to grab. They're sitting up here on the table. We have two different colors, but I've got these blue ones here. Look at this awesome little database on them. These are great hiking socks, and they fit on most feet, as we like to say.
Great. Thanks a lot again, Zoe.
31 min
14 Apr, 2023
