Troubleshooting your Serverless Node.js doesn't have to be a Pain


AWS ushered in a new landscape for deploying JavaScript applications using Node.js hosted in AWS Lambda, and since then the management simplicity that it provides has made serverless applications and APIs grow exponentially in both popularity and use cases. However, operationally for many starting out, troubleshooting issues can be painful. I'll walk through some techniques to make this easier and provide an evolution of how we can get to a better solution with tips and tools you can use in your serverless deployments right away.

27 min
01 Jul, 2021



AI Generated Video Summary

Welcome to the DevOpsJS conference where Jeff Hopper introduces his Slack bot called LGTM ReplyGIF for posting GIFs on his behalf. He troubleshoots issues with the serverless Node code, uses CloudWatch logs and stack traces for debugging, and ships logs to Elasticsearch for analysis. Jeff explores troubleshooting options with Rollbar and discusses serverless deployment recommendations. The audience is invited to contribute to the Slack bot project, and the session concludes with thanks from Jeff.

1. Introduction to Troubleshooting Serverless Node

Short description:

Welcome to the DevOpsJS conference. My name is Jeff Hopper and I'm the technical leader for growth at Rollbar. I have a problem. I like to include some flair when I give a "looks good to me" on a PR. Unfortunately, my go-to site for these specific GIFs shut down a couple of years ago. I have another problem. I'm lazy. I came up with a solution, a Slack bot called LGTM ReplyGIF that combines with a Slack slash command to post on my behalf. Let's troubleshoot that together, shall we?

Welcome to the DevOpsJS conference, and thank you for attending my talk, Troubleshooting your Serverless Node.js doesn't have to be a Pain. My name is Jeff Hopper and I'm the technical leader for growth at Rollbar, where I started last September. You can find me on GitHub as eudaimos.

For those familiar with the Los Angeles, California area, and people who live here know this, I'm a pretty rare person who was actually born and raised in LA, Santa Monica to be specific, and I still live in LA, in a different part of the city. So I have a problem. I like to include some flair when I give a "looks good to me" on a PR. And unfortunately my go-to site for these specific GIFs shut down a couple of years ago.

I have another problem. I'm lazy. It's not just that I don't want to search for a good LGTM GIF for a PR in GitHub. I also need to respond in Slack to the teammate who asked for my review. I want to do it all at the same time. So I came up with a solution, a Slack bot called LGTM ReplyGIF that combines one of my favorite GIF sites with a Slack slash command to post on my behalf. I built it in Node.js and deployed it to AWS Lambda, where it will call the replygif API, select a GIF at random, post a comment to the GitHub PR, and respond back to the Slack channel with a message to our requester. So it sounds pretty great, right? So what went wrong? Let's troubleshoot that together, shall we?

2. Troubleshooting Serverless Node

Short description:

I'm importing some things, doing some stuff in the body, and returning a response. The handler is a function that receives an event and optionally a context. We ran a local stack and encountered a 500 error from the reply-gif API. We switched to scraping the website using Cheerio. After testing locally, we published the code to AWS and tested our SlackBot. We encountered a dispatch failed error and troubleshot it using CloudWatch logs, where we found the error message 'N is not a function' in the stack trace.

I'm importing some things, I'm doing some stuff in the body here, the logic related to my function, and then I'm returning a response. But in the end, it's a handler that is a function, it receives an event and optionally a context. So I can import it into my tests. I can include it, I can require it in the node REPL. Or I can run the local stack and write some curl commands against it.
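A minimal sketch of that shape may help. It assumes an API Gateway-style proxy event, and the response body here is purely illustrative, not the bot's actual code:

```javascript
// Lambda-style handler: a plain async function of (event, context).
// The event shape assumes an API Gateway proxy integration (an assumption);
// the bot's real handler does more work in the body.
const handler = async (event, context = {}) => {
  const body = event.body || "";
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      received: body,
      requestId: context.awsRequestId || null,
    }),
  };
};

exports.handler = handler;

// Because it is just a function, a test or the Node REPL can call it directly.
handler({ body: "text=lgtm" }, { awsRequestId: "local-test" }).then((res) => {
  console.log(res.statusCode, res.body);
});
```

Nothing in the function depends on Lambda itself, which is what makes it easy to require from a test file or the REPL.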

So let's do that first. So I'm running a local stack on the left side. And on the right, we're calling our curl command. We notice we get a 500 error, and then we are not returning any response. So it turns out that the 500s come from the reply-gif API, which doesn't actually work, isn't being maintained and returns a 500 error on any request to any of the endpoints. So knowing that the reply-gif API doesn't work, but I saw the website does, I switched to scraping the page with a node package called Cheerio that gives us a jQuery API to grab the image URLs. So we've tested locally and things seem to be working fine. We publish the code to AWS using our deploy command and then test our SlackBot. So let's give that a shot.

My friend Demo Slack sent me a request through Slack to review his PR. So I'm going to say I gave it a review, say lgtm. I get my prompts to tell me how to write the command. So, identify the repo, identify the PR, give them a little @ shout-out. And we get a dispatch failed error, which is a default error from Slack when it doesn't know what to say. So, let's take a look at how we troubleshoot this in the cloud using CloudWatch.

From the AWS web console, we come to the lambda area, we find our function, and we take a look at this monitor tab here. Now, it's going to give us a list of recent invocations, and it's going to tell us what log stream they're in. So, these recent invocations are interesting, but they're not really giving us troubleshooting information. So, we're going to go check out our CloudWatch logs directly. Now, loading here, loading in the streams, our most recent is at the top. We're going to take a look there and we see our one invocation is happening in this stream and we see this error message here. N is not a function. So, that's good. We found the error. Let's take a look at the stack trace.

3. Troubleshooting with Stack Trace and CloudWatch

Short description:

Let's take a look at the stack trace and enable source maps for troubleshooting. We can use CloudWatch to search for logs related to our invocation using the request ID. After deploying the code fix, we still encounter a dispatch failed error. Let's explore troubleshooting in aggregated logs using log aggregation tools like Elasticsearch.

Let's take a look at the stack trace. It's from our index file. It's line two, column 570,000. Okay, so, in order to troubleshoot this, we're going to need some source maps. So, let's go ahead and enable those and redeploy. And while that happens, let's poke around CloudWatch a little more to get to know it better.

So, one thing we could do is take a look at view as text, which expands all of these items. So, we see a long running history. Now, it is possible when you are troubleshooting log events in CloudWatch from Lambdas that they are interleaved when you have multiple invocations happening at the same time. So, what becomes useful for you is to take this request ID here. You see it shows up on all of the logs. Put that in the search. Let's go back and search all of our streams. Make sure to quote it, because the hyphens will work as negators for search terms if you don't. So now we see the same view, where everything is related to just our one invocation.
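The same idea can be sketched in a few lines of code. The helper names and the sample log lines below are hypothetical; the lines only imitate the general layout of Lambda log output:

```javascript
// Hypothetical helpers; sample lines imitate the layout of Lambda log lines.

// Wrap the request ID in double quotes so its hyphens are treated as part of
// the search term rather than as exclusion operators.
function toFilterPattern(requestId) {
  return `"${requestId}"`;
}

// Keep only the lines that belong to one invocation.
function linesForInvocation(lines, requestId) {
  return lines.filter((line) => line.includes(requestId));
}

// Two invocations whose log lines are interleaved in one stream.
const interleaved = [
  "START RequestId: 1111-aaaa Version: $LATEST",
  "START RequestId: 2222-bbbb Version: $LATEST",
  "2021-07-01T10:00:00.000Z\t1111-aaaa\tINFO\tfetching gifs",
  "2021-07-01T10:00:00.100Z\t2222-bbbb\tERROR\tn is not a function",
  "END RequestId: 1111-aaaa",
  "END RequestId: 2222-bbbb",
];

console.log(toFilterPattern("2222-bbbb"));
console.log(linesForInvocation(interleaved, "2222-bbbb"));
```

The quoted string is what you would paste into the CloudWatch search box; the filter function is only a local stand-in for what that search does across streams.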

Let's take a look at our deployment. It's done. So let's give another shot to our Slack bot. Now let's go find a log that can give us the information we need. Hopefully this is the most recent. Yeah, we see here, Fetch line 69, column 3. So now our source maps are working. We can go take a look at the code here and it's saying that this dollar sign is not a function. And that's because I should have called Cheerio.load instead of just Cheerio as a function. So we fix that, we redeploy. And we're gonna give it another shot. Still getting a dispatch failed. So what's going wrong here? We are scraping the ReplyGIF page for images. But now let's take a look at troubleshooting in aggregated logs. Going through each log individually is not terribly efficient, even with the filtering capabilities we saw. And if you're used to log aggregation tools like Elasticsearch and already have your other apps hooked up to it, you'll want to use that for your Lambda log aggregation as well.

4. Shipping CloudWatch Logs to Elasticsearch

Short description:

In serverless, there are no long-running processes to ship logs to Elasticsearch. To ship CloudWatch logs to Elasticsearch, create a subscription filter on your log group. Select the cluster, log format, and test the pattern. Another Lambda function reads CloudWatch logs and ships them to Elasticsearch. Elasticsearch provides columns to analyze logs. We encountered a bad credentials error from our last request, which we can investigate using Elasticsearch properties. Locally, we use a .env file.

So the difference is that in your apps running on a server you manage, or within a container, you have some process that is either reading a log file that you're writing to, or reading from stdout, and shipping that off to Elasticsearch.

In the serverless environment, you don't have any long-running processes to do this for you because our serverless functions are ephemeral by design.

So we need to ship our CloudWatch logs, which AWS already instruments for us, to our Elasticsearch cluster.

One way to do this, if you're already using Elasticsearch as a service in AWS, is to go to your log groups, select the log group that you're interested in, and create a subscription filter to Elasticsearch.

Here you select which cluster you want it to go to. You can choose a log format. We're using Lambda in this case. But you can just as easily choose whatever format you're logging in, and you can parse it. So we'll just go with Lambda.

It gives us this built-in filter pattern. You have to give it a name. And you can test it against your existing log data. So we'll go ahead and test the pattern. Yes, we see these coming through. So you start streaming. I've already enabled that. And what ends up getting created for you is another Lambda function that actually reads from your CloudWatch logs and is the one shipping it over to Elasticsearch.
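The function AWS generates is not shown in the talk, so the following is only a sketch of what such a shipping function has to deal with. A CloudWatch Logs subscription delivers its payload as base64-encoded, gzip-compressed JSON under `event.awslogs.data`; the `toDocuments` helper and the sample values are hypothetical:

```javascript
const zlib = require("zlib");

// Decode a CloudWatch Logs subscription event: the payload arrives as
// base64-encoded, gzip-compressed JSON under event.awslogs.data.
function decodeSubscriptionEvent(event) {
  const compressed = Buffer.from(event.awslogs.data, "base64");
  return JSON.parse(zlib.gunzipSync(compressed).toString("utf8"));
}

// Hypothetical forwarding step: turn each log event into a flat document
// that an indexer could send on to Elasticsearch.
function toDocuments(payload) {
  return payload.logEvents.map((logEvent) => ({
    "@timestamp": new Date(logEvent.timestamp).toISOString(),
    logGroup: payload.logGroup,
    logStream: payload.logStream,
    message: logEvent.message,
  }));
}

// Build a sample event the same way, to exercise the decoder locally.
const sample = {
  messageType: "DATA_MESSAGE",
  logGroup: "/aws/lambda/example-function",
  logStream: "2021/07/01/[$LATEST]example",
  logEvents: [{ id: "1", timestamp: 1625133600000, message: "bad credentials" }],
};
const sampleEvent = {
  awslogs: { data: zlib.gzipSync(JSON.stringify(sample)).toString("base64") },
};

console.log(toDocuments(decodeSubscriptionEvent(sampleEvent)));
```

Sending the documents to the cluster is left out on purpose, since that part depends on how your Elasticsearch domain is secured.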

So we'll take a look here. And the nice thing about having Elasticsearch as opposed to just CloudWatch is we've got these nice columns we can add and help us hone in on what's going on.

So we'll take a look here. Let's see. All right. So we have a bad credentials error showing up from our last request. We can take a look and we have the nice properties available to us that you expect in Elasticsearch. So it's taken all these for you. And we see the token, though you can't see what the token is. But I notice, that's right: when we develop locally, we're using a .env file.

5. Troubleshooting Slack Bot and Rollbar

Short description:

Just like with Heroku, we need to add environment variables to avoid including credentials in the repo. We successfully posted a reply to a GitHub PR on my behalf. We can monitor logs to catch errors and ensure the Slack bot is performing. If someone else encounters an error, we won't know unless they inform us. Let's explore Rollbar as another option for troubleshooting. We can see the most recent error, HTTP error not found, and analyze it in detail.

Just like with Heroku, we need to add that because we don't want to include those credentials in the repo. So we got to make sure we have the environment variables there. All right. Now let's give it another shot.
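One way to make a missing variable obvious at startup, instead of surfacing later as a bad credentials error, is to validate the environment up front. This is a sketch only; the variable names are hypothetical, since the talk does not show the real ones:

```javascript
// Hypothetical variable names; the talk does not show the real ones.
const REQUIRED_VARS = ["GITHUB_TOKEN", "SLACK_SIGNING_SECRET"];

// Read required settings from the environment and fail with a clear message
// if any are missing.
function loadConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    githubToken: env.GITHUB_TOKEN,
    slackSigningSecret: env.SLACK_SIGNING_SECRET,
  };
}

// Locally these values would come from a .env file; in Lambda they are set
// in the function's configuration. Here one variable is deliberately absent.
try {
  loadConfig({ GITHUB_TOKEN: "example-token" });
} catch (err) {
  console.error(err.message);
}
```

Passing the environment in as a parameter keeps the check testable without touching the real `process.env`.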

Okay. And it worked. All right. So now we have our slack message, which is saying, calling out our friend, Demo Slack, saying that we posted a reply to their PR on behalf of me, and it looks good to me. And if I click this link, this will take me to my issue comment. My comment in the PR that lets them know.

So we've got the posted comment to the GitHub PR, and we respond back to Slack with the message in the channel. Now we put our Slack bot into the wild, and how do we know if it's performing? We could monitor the logs on a regular basis and hope we catch errors in a timely fashion, and that we're able to do something about it. Let's say someone else gets an error when using the Slack bot, and because we're not the one typing the command, we can't see the error, or even that there was an error.

So let's give it another try here. Oh, yeah. So that dispatch failed. And we noticed the PR number is wrong. So what's going on? Let's take a look at another option for troubleshooting, and that's the continuous code improvement platform from Rollbar. So if we go to Rollbar, we've already instrumented the app. We see this error come up. It's the most recent. And we notice it's an HTTP error, not found. Now, if I try this again, because I'm stubborn. And one more time. Notice that our total's gone up to 5. But if we were to look at Elasticsearch, we would see just a bunch of these show up here. Just keeps pounding it. So this shows up nicely as a single line item for us. We can drill into it. And because we are able to connect with GitHub, we actually have source code in here.

6. Troubleshooting with PR and Slack

Short description:

We fixed the issue with the wrong PR and improved our solution by fetching the PR from GitHub and providing a friendly Slack response. Attend David's workshop to learn more about Rollbar's continuous code improvement platform. Find the source code for the LGTM reply-gif slackbot on GitHub. Sign up for a new account on Rollbar using the Git Nation promo code for a free full month.

And we've got our source maps uploaded. Now we see not found. We get that response when we make the GitHub call to issues 88. Well, that doesn't exist. That's the wrong PR. But we're not handling it properly. So we'll go ahead and fix that code. And before we do, we're going to resolve this. Now we go back. Try it again. Still failed. So now this time we're going to check the PR first. Make sure it's there. And if it's not, we tell the person back that it does not exist or the bot has no access. Now if we do a PR that does exist, it works.

So in our new solution, we fetch the PR from GitHub, we provide a friendly Slack response that's only visible to the caller, and we also get the author, and make sure to call them out in our comment. So they get an alert from GitHub and Slack, so they can't run away from our LGTM.

So if you want to learn more about Rollbar's continuous code improvement platform, make sure to check out David's workshop, with a deep dive and tutorial that will have you set up and running on your own. You can find the source code for the LGTM reply-gif slackbot on GitHub here. And thank you very much for attending. If you sign up for a new account on Rollbar, please use this URL with the Git Nation promo code so that you can get a free full month beyond the normal trial period.
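The "check the PR first" step described earlier in this part might look roughly like the sketch below. It is not the bot's actual code: the function name, the injectable `fetchImpl` parameter, and the placeholder owner, repo, and token values are all assumptions made for illustration. It relies on the GitHub REST endpoint for a single pull request and on Slack's `response_type: "ephemeral"` for a reply only the caller sees:

```javascript
// Sketch only: `fetchImpl` is injectable so the logic can be tried without
// network access; owner/repo/number/token values are placeholders.
async function checkPullRequest({ owner, repo, number, token }, fetchImpl = fetch) {
  const url = `https://api.github.com/repos/${owner}/${repo}/pulls/${number}`;
  const res = await fetchImpl(url, {
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
      "User-Agent": "lgtm-bot-sketch",
    },
  });

  if (res.status === 404) {
    // GitHub answers 404 both for a missing PR and for one the token cannot see.
    return {
      ok: false,
      slack: {
        response_type: "ephemeral", // visible only to the caller
        text: `PR #${number} in ${owner}/${repo} does not exist or the bot has no access.`,
      },
    };
  }
  if (!res.ok) {
    throw new Error(`GitHub responded with HTTP ${res.status}`);
  }

  const pr = await res.json();
  return { ok: true, author: pr.user.login };
}

// Try the not-found path with a fake fetch instead of a real request.
const fakeNotFound = async () => ({ status: 404, ok: false, json: async () => ({}) });
checkPullRequest({ owner: "example", repo: "demo", number: 88, token: "t" }, fakeNotFound)
  .then((result) => console.log(result.slack.text));
```

Returning the author's login on success gives the caller what it needs to mention them in the GitHub comment.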

7. Serverless Deployment and Recommendations

Short description:

Hey Jeff, thanks for joining. Hi Mitten, how are you? You asked the question, where do you deploy your application? And 65% answered to a PaaS, 25% to a Kubernetes cluster, and 19% to virtual machines. I'm surprised that the Kubernetes cluster isn't higher, but it makes sense. We now have time to go over the questions from our audience. You're working at Rollbar, so why did you choose to go completely off company? I wanted to show different ways to troubleshoot serverless, including shipping logs to an aggregated search. Rollbar makes everything easier with its SDK, Lambda integration, and dashboard. There's a workshop coming up for those interested in learning more. If I asked you, what would you say Serverless is? Serverless is handing code to a provider, without managing infrastructure. It decides where and how to run your code. For front-end developers, starting with serverless is a great approach to focus on encapsulated API calls. You can always move to a more consolidated direction later. It provides best practices and simplifies deployment.

Hey Jeff, thanks for joining. Hi Mitten, how are you? Very good, thank you. So, you asked the question, where do you deploy your application? And 65% have answered to a PaaS and only 25% to a Kubernetes cluster and 19% to virtual machines. So how do you feel? Is this something you were expecting? For the most part, I'm actually surprised that the Kubernetes cluster isn't higher, but it makes sense to me. Yeah, it seems like we have a pretty modern audience, so thank you for that.

So we now have time to go over the questions from our audience, and we're going to give our audience some time to present their questions, of course. What I was curious about is, well, you're working at Rollbar, and usually when I see people from a product company, a developer product company, they will be talking about their product, and you didn't. So why did you choose to go completely off company? Well, I didn't want to go completely off. If you notice, I progressed my way towards Rollbar, and my goal was to show the different ways to troubleshoot for people who are not familiar with serverless and want to get into it. And then some people may have been working in serverless, but not aware that they could ship their logs to an aggregated search. And then at the same time, eventually you'd like to be in a system like Rollbar because it just makes everything easier. You include the SDK, it works directly from Lambda, and you don't need to set up all these other things, and you have the ability to go directly there to your dashboard, and you can set up alerts. And we have a workshop coming up with David Waller, which does a very deep dive. And for those that are interested in learning more about Rollbar, it's a great, great place to go and actually set up your account, set up your code with Rollbar.

Yeah, it's a nice tool. I've used it at multiple companies in the past, so thank you for your work. A question from the audience. So if you ask 10 people what Serverless is, you will get 10 different answers, basically. So if I would ask you, you are person number 11, what would you say Serverless is?

I would characterize Serverless by basically the constraint that you are handing code over to whatever the Serverless provider is, and you are not managing any of the infrastructure involved. So you are deploying it to this system, and it is deciding where, when, and how to run your code. And all of the ways that is done is abstracted from you, so you don't know if it is running on a VM, in a container, or on bare metal; to you it doesn't matter. You're just handing over a function, and then when you have an endpoint and you call that endpoint, your function will run. That is the guarantee they provide.

So for me, I'm a front-end developer, and let's say I want to kickstart a startup or a pet project. Then would you always recommend going serverless, just so you can focus on doing what you know, which is front-end? And I know this is sometimes a barrier for front-enders I know to actually do more than developing something on their local machine, and getting it off their localhost.

I think that approaching it from a serverless standpoint, at the beginning, is actually a really great approach, because one of the things it does is force you to focus and think about being more encapsulated in your API calls. And you can always take a bunch of serverless functions and combine them into an API if you want to run that on a server. It's harder to go back the other way. So starting off with serverless actually increases your ramp up time, or I mean, decreases your ramp up time, gets you to a deployable unit faster, and at the same time, you can always move in a more consolidated direction. And it also provides you some best practices around programming and focusing on the inputs and outputs and what your API should look like on each individual endpoint.

Awesome.
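The point about combining serverless functions into a single API on a server can be illustrated with a small sketch. The routes and handlers below are hypothetical; the idea is only that functions written against an event-in, response-out contract can be mounted behind one long-running Node HTTP server with little change:

```javascript
const http = require("http");

// Two Lambda-style handlers behind hypothetical endpoints.
const routes = {
  "GET /health": async () => ({ statusCode: 200, body: "ok" }),
  "POST /lgtm": async (event) => ({ statusCode: 200, body: `received ${event.body}` }),
};

// Dispatch a request to the matching handler, or answer 404.
async function dispatch(method, url, body) {
  const path = url.split("?")[0]; // ignore any query string
  const handler = routes[`${method} ${path}`];
  if (!handler) return { statusCode: 404, body: "not found" };
  return handler({ httpMethod: method, path, body });
}

// One long-running server in front of the same functions.
const server = http.createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    dispatch(req.method, req.url, body)
      .then((result) => {
        res.writeHead(result.statusCode);
        res.end(result.body);
      })
      .catch(() => {
        res.writeHead(500);
        res.end("internal error");
      });
  });
});

// server.listen(3000); // uncomment to serve the routes locally
```

Going the other way, splitting a server with shared state and middleware into independent functions, usually takes more untangling, which matches the remark that it is harder to go back.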

8. Using and Contributing to the Slackbot

Short description:

Hannah asked if the Slackbot can be used and if it should be hosted or use a serverless hosting platform. The bot still needs some fixes, and anyone is welcome to contribute by forking the project and submitting pull requests. Jeff encourages collaboration and mentions that solving his own problem led to this invention. Anna expresses interest in helping, and Jeff shares his GitHub account with her. The Q&A session concludes, and Jeff thanks the audience for their time.

So, we got a comment from Hannah that says, I love the Slackbot. Maybe you mentioned it and I missed it in the talk, but can we use it or should we host it ourselves or basically use a serverless hosting platform for that, of course, but can we already use the bot? It needs a few more pieces. You know, I cheated in that. I authorized the Slackbot. I authorized the app through GitHub and it's always posting as me. So there are some fixes and it's out there. So if anybody wants to fork it and harden it up and make it more applicable, happy to do it. Happy to take any pull requests if somebody wants to add some code and submit a pull request and maybe we can build this together as a community. I think it would be fun to have. I certainly was solving my own problem with this. Well that's how 9 out of 10 inventions are made, I guess, right? That's right. Identify a problem, and you might be the only one in the world having this problem, but at least you have someone who identifies with this problem, Anna. So Anna, you can help Jeff out, right? PR is welcome. Absolutely.

All right, so we have no more questions coming in from our audience. Oh, Anna says, very cool. Thank you. Thanks, Anna. Can you plug your GitHub account once more, so Anna can find you? Yeah, so it's E-U-D-A-I-M-O-S. All right, Anna, I hope you got that, and Jeff will be in his speaker room right after this. Jeff, I'm going to thank you for your time, and hope to see you in real life maybe one day soon. That would be great. Have a nice day. Bye bye. Thanks, man.
