Tools for better Observability in NodeJS Serverless IoT Applications

Observability is crucial for successfully operating large IoT fleets. IoT incorporates different components, including hardware, network, on-device software, and cloud. Devices operate under unreliable conditions and constraints, and need to be monitored remotely. Cloud applications become complex and costly, as they are built to handle device activity at scale. Answering questions such as the following can be challenging:
-    Do I have a problem in my IoT application?
-    Where is the problem?
-    What is causing the problem?
-    How much of my fleet is affected?
-    Is my code expensive to run and, if yes, how can I fix that?

Logging, monitoring and tracing are fundamental observability pillars. However, they are often viewed as non-functional requirements in IoT applications, and can fall off the radar or go unstandardized during development.

This session will show how to leverage open source tools, such as AWS Lambda Powertools, in a fully functional serverless IoT application, to ease adoption of best practices for modern application development, and to integrate services such as AWS X-Ray, Amazon CloudWatch and AWS IoT Core features to achieve end-to-end observability.



Transcription


All right. So everybody who starts in the IoT space kind of thinks that this is what the IoT journey looks like. You know, everything is sunshine and butterflies and you work with devices and they're very cool, you prototype with devices, you learn new protocols and so on. So you think it looks like that.

But actually, it doesn't look like that. When you're going to production with an IoT solution, it looks more like that. So you're constantly putting out fires of one kind or another. So what's really going on?

So when you're working in IoT, you are actually part of an ecosystem. You are working on the device side or you're working with teams who work on the device side. You have backends on the cloud side. You're working with cloud teams. You're working with data teams. So it's all just really relatively crazy. And it can get even crazier really fast.

So it is a maze. Right. So, you know, you are prototyping or testing with your devices in the lab and it all works and you say everything's fine, everybody's happy. And then you go to production and suddenly like 50% of your fleet is offline from one day to another. And you try to investigate why and you don't know why. Then you've got data missing, right? You're sending data. You're using MQTT. You've done all the right things. Ideally you've used quality of service one. Ideally you've actually used local storage at the edge as well. But still data has gone missing.

So your data team is complaining. You don't know where the problem is. What about the alerts you built in? Well, you're not seeing any of them. Have you actually built them? Well, I don't know. It would be a good idea if you did.

Data is inconsistent. Your users are basically complaining that loading a dashboard for, I don't know, 50 of their devices as an aggregate, to see aggregate metrics, just takes too long. So it's all crazy.

So what do you do about all of this, right? So clearly you actually need to build observability in your application. So you need metrics, you need logging, and you need tracing. And ideally you'd need all of this in a standardized way so an operations team who is actually looking at this stuff, looking at this data, can actually understand what's going on.

So let's see how we can build observability in an IoT application. I mean, I'm going to make the assumption that the back end here is mostly serverless. So I'm imagining a situation where you've got an IoT device sending some data over MQTT and you've got an IoT rule, you know, an AWS IoT rule picking up this data and pushing it into a Lambda function. You're using this amazing cool integration that AWS IoT has with the rules engine, and you think everything's perfect, right?

So if you scan that QR code, you can actually look at the code for what I'm going to show you. I have it linked at the end as well. So I'll give like two seconds for people to look at that.

All right. So what I'm going to show you right now is what you might not be expecting about the integration between the rules engine and Lambda. So the Lambda function, let's skip this for now. So the Lambda function that I'm using looks a little bit like that, right? You have a tracing library. I will show you what the tracing library is and talk about it later. But your Lambda function just does something and throws an exception, right? And that's basically it. And so if you look at this architecture here, you'd expect to see the exception right away.
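
A handler of the kind described here might look roughly like the following. This is a minimal sketch, not the code from the linked repository: the `TelemetryEvent` shape and its field names are assumptions made for illustration, and the tracing setup is left out so the example has no dependencies.

```typescript
// Hypothetical payload forwarded by the IoT rule; the field names are assumptions.
interface TelemetryEvent {
  deviceId: string;
  temperature: number;
}

// A handler that does a little work and then fails on every invocation.
export const handler = async (event: TelemetryEvent): Promise<void> => {
  console.log(`processing message from ${event.deviceId}`);
  throw new Error('Simulated processing failure');
};
```

Because the function is `async`, the thrown error surfaces as a rejected promise, which the Lambda runtime reports as a failed invocation.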

Well, the funny thing is that the way the rules engine actually integrates with Lambda is asynchronously, right? So the rules engine sends the message to the Lambda service. The Lambda service says, great, I've got it, 202. And then your invocation is put into a queue and it executes asynchronously. And only then, when that execution is done, somewhere, ideally in some log file, you will see the result of your Lambda execution, right? So of course, when this is happening with one device, you think, yeah, I've got it under control. I can go to the log of the Lambda function and see that the Lambda function actually failed. But ideally, you would use some tracing tools, some tools that enable you to do logging and monitoring in your application, so you can see this stuff relatively easily.

And one of these tools is Lambda Powertools. Lambda Powertools is an open source library. It's available for TypeScript. It's also available for other languages like Python, for example. And what it does is it provides you with a set of utilities that you can integrate in your JavaScript application so that you can easily create structured logging, you can create metrics, you can even build your custom metrics, and you can also see the traces in a service that is called X-Ray, right? So I mean, of course, this works if you're integrating with AWS services. If you're integrating with other types of services, you might want to identify different observability tools that you can use, right?

So the way you would install Lambda Powertools is basically using either Lambda layers or npm. And you can instrument using Middy, which is quite a famous middleware library for Lambda, or you can do it with decorators, or you can do it manually. So it's actually looking quite neat if you look at the TypeScript code, right? So here I'm just using Node modules. I'm not going to go into details on that. But then you can create your tracer and logger, and then you can just use Middy to literally inject them into your Lambda function, right? So with what I have here, basically all the traces from your Lambda function invocation are going to go into X-Ray, right?

So let's see. This is actually not looking very good. So maybe I'm just going to switch and show it to you really quickly. I still have five, six minutes. That would be nice. Right. So basically what I've done here, I've sent a message from an IoT device simulator, which is using MQTT.js as a library. So this is my client application, and in this case, it's the rules engine. And this is the Lambda context from the Lambda service, and this is actually the Lambda invocation.
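
The middleware idea mentioned above, wrapping a handler so that logging is injected around it, can be illustrated without any library. The sketch below is a hand-rolled stand-in and is not the Lambda Powertools or Middy API; `withObservability`, `LogEntry` and the service name are hypothetical names chosen for this example. It only shows the general pattern of emitting structured (JSON) log lines around each invocation.

```typescript
// Hand-rolled stand-in for middleware-style instrumentation.
// NOT the Lambda Powertools or Middy API; all names here are hypothetical.

type Handler<E, R> = (event: E) => Promise<R>;

interface LogEntry {
  level: 'INFO' | 'ERROR';
  message: string;
  service: string;
  durationMs?: number;
  error?: string;
}

// Wraps a handler so each invocation emits JSON log lines, and failures are
// logged before being rethrown to the caller.
export function withObservability<E, R>(
  service: string,
  handler: Handler<E, R>,
  sink: (line: string) => void = console.log,
): Handler<E, R> {
  const log = (entry: LogEntry): void => sink(JSON.stringify(entry));
  return async (event: E): Promise<R> => {
    const startedAt = Date.now();
    log({ level: 'INFO', message: 'invocation started', service });
    try {
      const result = await handler(event);
      log({ level: 'INFO', message: 'invocation succeeded', service, durationMs: Date.now() - startedAt });
      return result;
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      log({ level: 'ERROR', message: 'invocation failed', service, durationMs: Date.now() - startedAt, error });
      throw err; // rethrow so the failure is still visible to the invoker
    }
  };
}

// Usage: the business logic stays free of logging code.
const instrumented = withObservability('iot-ingest', async (event: { deviceId: string }) => {
  if (!event.deviceId) throw new Error('missing deviceId');
  return 'ok';
});

instrumented({ deviceId: 'sensor-1' }).then((result) => console.log(result));
```

A real library adds much more (correlation IDs, Lambda context fields, trace segments), but the separation between handler logic and instrumentation is the same.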

So when you look at this on a high level, this is actually looking really good, right? You don't see an error at all. But when you look down here, you actually see that the Lambda service accepted the invocation, but then you see that it actually executed, or attempted to execute, the function twice.

So that's interesting, and that's the default configuration of AWS Lambda. It's going to retry by default if it's invoked in asynchronous mode. So when you use SAM and you create your Lambda functions, the default is always twice.
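
The accept-then-retry behavior can be modeled in a few lines. The sketch below is a toy simulation, not the AWS SDK or the Lambda service: `invokeAsync` and `InvocationRecord` are hypothetical names, and real retries are spaced out in time rather than run back to back. The default of two retries follows the Lambda documentation for asynchronous invocation and is worth verifying against your own function configuration.

```typescript
// Toy model of asynchronous invocation; not the AWS SDK or the Lambda service.

type AsyncHandler = (event: unknown) => Promise<void>;

interface InvocationRecord {
  attempts: number;
  succeeded: boolean;
  lastError?: string;
}

// Acknowledges the event immediately (like an HTTP 202) and runs the handler
// afterwards, retrying on failure. `outcome` is what the sender never sees.
export function invokeAsync(
  handler: AsyncHandler,
  event: unknown,
  maxRetries = 2, // assumed default: two retries after the first attempt
): { statusCode: 202; outcome: Promise<InvocationRecord> } {
  const run = async (): Promise<InvocationRecord> => {
    let lastError = 'unknown error';
    for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
      try {
        await handler(event);
        return { attempts: attempt, succeeded: true };
      } catch (err) {
        lastError = err instanceof Error ? err.message : String(err);
      }
    }
    return { attempts: maxRetries + 1, succeeded: false, lastError };
  };
  // Defer execution so the acknowledgement is returned before the handler runs.
  const outcome = Promise.resolve().then(run);
  return { statusCode: 202, outcome };
}

// Usage: the sender only ever observes the acknowledgement.
const { statusCode, outcome } = invokeAsync(async () => {
  throw new Error('boom');
}, { deviceId: 'sensor-1' });
console.log(statusCode);
outcome.then((record) => console.log(record));
```

The sender gets a success status even though every attempt fails, which is why the failure has to be found in logs, metrics or traces rather than at the point where the message was sent.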

So think about that. You know, asynchronous invocation, twice. Right. So this is kind of what X-Ray looks like, and here you can see the tracing and so on. We don't have more time today, but you can always have a look at the GitHub repositories that I have linked and learn a little bit more about Lambda Powertools and about X-Ray and so on. Thank you very much. This was me. Thank you. Thank you.

8 min
14 Apr, 2023
