Tools for better Observability in NodeJS Serverless IoT Applications

Observability is crucial for successfully operating large IoT fleets. IoT incorporates different components, including hardware, network, on-device software, and cloud. Devices operate under unreliable conditions and constraints, and need to be monitored remotely. Cloud applications become complex and costly, as they are built to handle device activity at scale. It can be challenging to answer questions such as:
-    Do I have a problem in my IoT application?
-    Where is the problem?
-    What is causing the problem?
-    How much of my fleet is affected?
-    Is my code expensive to run and, if so, how can I fix that?

Logging, monitoring and tracing are fundamental observability pillars. However, they are often viewed as non-functional requirements in IoT applications, and can fall off the radar or go unstandardized during development.

This session will show how to leverage open source tools, such as AWS Lambda Powertools, in a fully functional serverless IoT application, to ease adoption of best practices for modern application development, and how to integrate services such as AWS X-Ray, Amazon CloudWatch and AWS IoT Core features to achieve end-to-end observability.

8 min
14 Apr, 2023


AI Generated Video Summary

The talk discusses the challenges of IoT development in production, including fleets going offline, missing data, alerts that do not fire, inconsistent data, and slow-loading dashboards. It explores how to build observability into IoT applications using metrics, logging, and tracing. It explains the asynchronous integration between the AWS IoT rules engine and Lambda, and highlights tools such as Lambda Powertools and X-Ray for logging, monitoring, and tracing. The Lambda invocation process and the tracing capabilities of X-Ray are also covered.

1. Introduction to IoT Challenges

Short description:

Everybody starts in the IoT space thinking it's all sunshine and butterflies, but when you go to production, it becomes a maze. Prototyping and testing in the lab may go well, but then issues arise: the fleet goes offline, data goes missing, alerts do not fire, data is inconsistent, and dashboards load slowly.

All right, so everybody who starts in the IoT space kind of thinks that this is what the IoT journey looks like. You know, everything is sunshine and butterflies, and you work with devices and they're very cool. You prototype with devices. You learn new protocols and so on, so you think it looks like that.

But actually, it doesn't look like that. When you're going to production with an IoT solution, it looks more like that. So you're constantly putting out fires of one kind or another. So what's really going on? So when you're working in IoT, you are actually part of an ecosystem. You are working on the device side, or you're working with teams who work on the device side. You have backends on the cloud side. You're working with cloud teams. You're working with data teams. So it's all just really relatively crazy, and it can get even crazier really fast. So it is a maze.

You are prototyping or testing with your devices in the lab, and it all works, and everything's fine, everybody's happy. And then you go to production, and suddenly 50% of your fleet is offline from one day to the next. And you try to investigate why, and you don't know why. Then you've got data missing. You're sending data. You're using MQTT. You've done all the right things. Ideally, you've used quality of service 1. Ideally, you've actually used local storage at the edge as well, but still data's gone missing. So your data team is complaining. You don't know where the problem is. What about the alerts you built in? Well, you're not seeing any of them. Have you actually built them? Well, I don't know. It would be a good idea if you did. Data is inconsistent. Your users are basically complaining that loading a dashboard for, I don't know, 50 devices, to see aggregate metrics, just takes too long.

2. Building Observability in IoT

Short description:

To build observability in an IoT application, you need metrics, logging, and tracing in a standardized way. Let's explore how to achieve this in a serverless backend scenario, where an IoT device sends data over MQTT, picked up by an AWS IoT rule and pushed into a Lambda function.

So it's all crazy. So what do you do about all of this? So clearly, you actually need to build observability in your application. So you need metrics. You need logging. You need tracing. And, ideally, you need all of this in a standardized way, so an operations team who is actually looking at this stuff, looking at this data can actually understand what's going on. So let's see how we can build observability in an IoT application.

I'm going to make the assumption that the back end here is mostly serverless. So I'm imagining a situation where you've got an IoT device sending some data over MQTT and you've got an IoT rule, you know, an AWS IoT rule, picking up this data and pushing it into a Lambda function. You're using this amazing cool integration that AWS IoT has with the rules engine. And you think everything is perfect, right? So if you scan that QR code, you can actually look at the code for what I'm going to show you. You can do that. I have it linked at the end as well. So I'll give, like, two seconds for people to look at that.

3. Integration between Rules Engine and Lambda

Short description:

The integration between the rules engine and Lambda works asynchronously: the invocation is placed in a queue and the Lambda function executes later. To enable logging and monitoring in your applications, you can use tools like Lambda Powertools, an open source library that provides utilities for structured logging, metrics, and tracing. Lambda Powertools can be installed using Lambda layers or NPM, and the function can be instrumented using a middleware library like Middy. By injecting the tracer and logger into your Lambda function, you can send traces to X-Ray for observability. In the demo, an IoT device simulator sends a message to the rules engine, which invokes the Lambda function through the Lambda service.

All right, so what I'm going to show you right now is something you might not be expecting about the integration between the rules engine and Lambda. Let's skip this for now. So the Lambda function that I'm using looks a little bit like that, right? You have a tracing library. I will show you what the tracing library is and talk about it later. But your Lambda function just does something and throws an exception, right? And that's basically it. And so if you look at this architecture here, you'd expect to see the exception right away.
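A function of the kind described here might look like the following minimal sketch. The event shape (`TelemetryEvent`), its field names, and the error message are assumptions made for illustration and are not taken from the talk's repository; the tracing library is left out.

```typescript
// Hypothetical shape of the payload the IoT rule forwards to the function.
interface TelemetryEvent {
  deviceId: string;
  temperature: number; // assumed to be degrees Celsius
}

// The handler does a little work and then fails, as in the demo.
// When the function is invoked asynchronously, this error is not
// returned to whoever triggered the invocation.
export const handler = async (event: TelemetryEvent): Promise<void> => {
  const fahrenheit = event.temperature * 1.8 + 32;
  throw new Error(
    `Processing failed for ${event.deviceId} at ${fahrenheit.toFixed(1)} F`
  );
};
```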

Well, the funny thing is that the rules engine actually integrates with Lambda asynchronously, right? So the rules engine sends the message to the Lambda service. The Lambda service says, great, I've got it, 202. And then your event is put into a queue and your function executes asynchronously. And only then, when that execution is done, somewhere, ideally in some log file, you will see the result of your Lambda execution, right? Of course, when this is happening with one device, you know, you think, yeah, I've got it under control. I can go to the log of the Lambda function, and I can look there, and I can see that actually the Lambda function failed. But ideally, you would actually use some tracing tools and, you know, some tools that enable you to do logging and monitoring in your applications, so you can see this stuff relatively easily.
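The accept-then-queue behaviour can be imitated with a small, dependency-free toy. This is not the AWS API: `AsyncInvoker`, `invoke`, `drain`, and `maxRetryAttempts` are hypothetical names, and the retry count is a constructor argument rather than a statement about any service default.

```typescript
type Handler<E> = (event: E) => Promise<void>;

interface QueuedInvocation<E> {
  event: E;
  attempts: number;
}

// A toy stand-in for the asynchronous invocation path.
class AsyncInvoker<E> {
  private readonly queue: QueuedInvocation<E>[] = [];
  private readonly fn: Handler<E>;
  private readonly maxRetryAttempts: number;
  readonly log: string[] = [];

  constructor(fn: Handler<E>, maxRetryAttempts: number) {
    this.fn = fn;
    this.maxRetryAttempts = maxRetryAttempts;
  }

  // The caller only learns that the event was accepted (status 202).
  invoke(event: E): number {
    this.queue.push({ event, attempts: 0 });
    return 202;
  }

  // Later, queued events are processed. Failures are retried and recorded
  // in the log; they never reach the response the caller already received.
  async drain(): Promise<void> {
    let item = this.queue.shift();
    while (item !== undefined) {
      item.attempts += 1;
      try {
        await this.fn(item.event);
        this.log.push(`attempt ${item.attempts}: ok`);
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        this.log.push(`attempt ${item.attempts}: failed (${reason})`);
        if (item.attempts <= this.maxRetryAttempts) {
          this.queue.push(item); // schedule another attempt
        }
      }
      item = this.queue.shift();
    }
  }
}
```

With a handler that always throws, `invoke` still returns 202, and the failures only appear in `log` once `drain` has run, which is why the sender sees nothing wrong.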

And so one of these tools is Lambda Powertools, and Lambda Powertools is actually an open source library. It's available for TypeScript. It's also available for other languages like Python, for example. And what it does is it provides you with a set of utilities that you can integrate in your JavaScript application, so that you can easily create structured logging, you can create metrics, you can even build your custom metrics, and you can also see the traces in a service that is called X-Ray, right? So, I mean, of course, this works if you're integrating with AWS services. If you're integrating with other types of services, you might want to identify different observability tools that you can use, right?
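To make "structured logging" concrete, here is a dependency-free sketch of the idea: every log line is a JSON object with the same keys, so an operations team can filter by device or service. This is not the Powertools API; `structuredLog`, `LogContext`, and the field names are hypothetical, and metrics and tracing are not covered.

```typescript
type Level = "INFO" | "WARN" | "ERROR";

// Fields every log line carries, so that lines from different functions
// can be filtered the same way. The field names are illustrative.
interface LogContext {
  service: string;
  deviceId: string;
}

// Builds one JSON log line, prints it, and returns it.
function structuredLog(
  level: Level,
  message: string,
  context: LogContext,
  extra: Record<string, unknown> = {}
): string {
  const line = JSON.stringify({
    level,
    message,
    timestamp: new Date().toISOString(),
    ...context,
    ...extra,
  });
  console.log(line);
  return line;
}

// Example call with made-up values.
structuredLog(
  "ERROR",
  "Telemetry processing failed",
  { service: "telemetry-ingest", deviceId: "sim-1" },
  { attempt: 2 }
);
```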

So the way you would install Lambda Powertools is basically using either Lambda layers or you can use NPM. And you can instrument using Middy, which is quite a famous middleware library for Lambda, or you can do it with decorators or you can do it manually. So it's actually looking quite neat if you look at the TypeScript code, right? So here I'm just using Node modules. I'm not going to go into details on that. But then you can create your tracer and logger, and then you can just use Middy to literally inject them into your Lambda function. Right? So with what I have here, basically all the traces from your Lambda function invocation are going to go into X-Ray, right? So let's see. This is actually not looking very good. So maybe I'm just gonna switch and show it to you really quickly. I still have five, six minutes. That would be nice. Right. So basically what I've done here, I've sent a message from an IoT device simulator, which is using MQTT.js as a library. So this is my client application, and this, in this case, is the rules engine, and this is the Lambda context from the Lambda service.
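The talk's code does this with Middy and the Powertools tracer and logger, which is not reproduced here. The dependency-free sketch below only imitates the idea of a middleware that wraps a handler and injects observability into it. `withObservability`, `Injected`, and `InstrumentedHandler` are hypothetical names and belong to neither library.

```typescript
interface Logger {
  info(message: string): void;
  error(message: string): void;
}

// What the wrapped handler receives in addition to the event.
interface Injected {
  logger: Logger;
}

type InstrumentedHandler<E, R> = (event: E, injected: Injected) => Promise<R>;

// Wraps a handler so that every invocation is logged and timed, and every
// failure is recorded before the error is re-thrown to the caller.
function withObservability<E, R>(
  name: string,
  logger: Logger,
  fn: InstrumentedHandler<E, R>
): (event: E) => Promise<R> {
  return async (event: E): Promise<R> => {
    const startedAt = Date.now();
    logger.info(`${name}: invocation started`);
    try {
      const result = await fn(event, { logger });
      logger.info(`${name}: succeeded in ${Date.now() - startedAt} ms`);
      return result;
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      logger.error(`${name}: failed (${reason})`);
      throw err;
    }
  };
}
```

The business logic stays free of logging and timing code, which is the main appeal of the middleware approach over instrumenting each function by hand.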

4. Lambda Invocation and X-Ray

Short description:

This is the Lambda invocation. The Lambda service accepted the invocation but executed the function twice due to the default retry configuration in AWS Lambda. X-Ray provides tracing capabilities. Check the GitHub repositories for more information on Lambda Powertools and X-Ray.

And this is actually the Lambda invocation. So when you look at this on a high level, this is actually looking really good, right? So you don't see an error at all. But when you look down here, you actually see that the Lambda service accepted the invocation, but then you see that it actually executed or attempted to execute the function twice. So that's interesting. And that's the default configuration of AWS Lambda. It's going to retry by default if it's in asynchronous mode. So when you use SAM and you create your Lambda functions, the default is always twice. So think about that. You know, asynchronous invocation twice. Right.

So this is kind of what X-Ray looks like, and here you can see the tracing and so on. We don't have more time today, but you can always just have a look at the GitHub repositories that I have linked and learn a little bit more about Lambda Powertools and about X-Ray and so on. Thank you very much.
