Troubleshooting your Serverless Node.js doesn't have to be a Pain
AI Generated Video Summary
Welcome to the DevOpsJS conference where Jeff Hopper introduces his Slack bot called LGTM ReplyGIF for posting GIFs on his behalf. He troubleshoots issues with the serverless Node code, uses CloudWatch logs and stack traces for debugging, and ships logs to Elasticsearch for analysis. Jeff explores troubleshooting options with Rollbar and discusses serverless deployment recommendations. The audience is invited to contribute to the Slack bot project, and the session concludes with thanks from Jeff.
1. Introduction to Troubleshooting Serverless Node
Welcome to the DevOpsJS conference. My name is Jeff Hopper and I'm the technical leader for growth at Rollbar. I have a problem. I like to include some flair when I give a "looks good to me" on a PR. Unfortunately, my go-to site for these specific GIFs, LGTM.io, shut down a couple of years ago. I have another problem. I'm lazy. I came up with a solution, a Slack bot called LGTM ReplyGIF that combines replygif.net with a Slack slash command to post on my behalf. Let's troubleshoot that together, shall we?
Welcome to the DevOpsJS conference, and thank you for attending my talk, troubleshooting your serverless node doesn't have to be a pain. My name is Jeff Hopper and I'm the technical leader for growth at Rollbar, where I started last September. You can find me on GitHub as eudaimos.
For those familiar with the Los Angeles, California area (and people who live here know this), I'm a pretty rare person who was actually born and raised in LA, Santa Monica to be specific, and I still live in LA in a different part of the city. So I have a problem. I like to include some flair when I give a "looks good to me" on a PR. And unfortunately my go-to site for these specific GIFs, LGTM.io, shut down a couple of years ago.
I have another problem. I'm lazy. It's not just that I don't want to search for a good LGTM GIF for a PR in GitHub. I also need to respond in Slack to the teammate who asked for my review. I want to do it all at the same time. So I came up with a solution, a Slack bot called LGTM ReplyGIF that combines one of my favorite GIF sites, replygif.net, with a Slack slash command to post on my behalf. I built it in Node.js and deployed it to AWS Lambda, where it will call the replygif API, select a GIF at random, post a comment to the GitHub PR, and respond back to the Slack channel with a message to our requester. So it sounds pretty great, right? So what went wrong? Let's troubleshoot that together, shall we?
2. Troubleshooting Serverless Node
I'm importing some things, doing some stuff in the body, and returning a response. The handler is a function that receives an event and optionally a context. We ran a local stack and encountered a 500 error from the reply-gif API. We switched to scraping the website using Cheerio. After testing locally, we published the code to AWS and tested our SlackBot. We encountered a dispatch failed error and troubleshot it using CloudWatch logs, where we found the error message 'N is not a function' in the stack trace.
I'm importing some things, I'm doing some stuff in the body here, the logic related to my function, and then I'm returning a response. But in the end, it's a handler that is a function, it receives an event and optionally a context. So I can import it into my tests. I can include it, I can require it in the node REPL. Or I can run the local stack and write some curl commands against it.
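Because the handler is just an exported function, it can be exercised without any AWS tooling. Here is a minimal sketch, assuming an API Gateway style event whose `body` is a JSON string. The `name` field and the response shape are illustrative and are not taken from the bot's actual source.

```javascript
// A minimal Lambda-style handler (illustrative, not the bot's real code).
// It receives an event and, optionally, a context object.
async function handler(event, context = {}) {
  // Assumption: the event carries a JSON string in `body`,
  // as API Gateway proxy events do.
  const payload = JSON.parse(event.body || "{}");
  return {
    statusCode: 200,
    body: JSON.stringify({
      message: `Hello, ${payload.name || "world"}`,
      requestId: context.awsRequestId || "local",
    }),
  };
}

module.exports = { handler };

// Calling it directly, the way a test or the Node REPL would:
handler({ body: JSON.stringify({ name: "Demo Slack" }) }).then((res) => {
  console.log(res.statusCode, res.body);
});
```

The same call works from a test runner, which is what makes the local feedback loop fast compared with deploying first.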
So let's do that first. So I'm running a local stack on the left side. And on the right, we're calling our curl command. We notice we get a 500 error, and then we are not returning any response. So it turns out that the 500s come from the reply-gif API, which doesn't actually work, isn't being maintained and returns a 500 error on any request to any of the endpoints. So knowing that the reply-gif API doesn't work, but I saw the website does, I switched to scraping the page with a node package called Cheerio that gives us a jQuery API to grab the image URLs. So we've tested locally and things seem to be working fine. We publish the code to AWS using our deploy command and then test our SlackBot. So let's give that a shot.
My friend Demo Slack sent me a request through Slack to review his PR. So I'm going to say I gave it a review, say lgtm. I get my prompts to tell me how to write the command. So, identify the repo, identify the PR, give them a little @ shout-out. And we get a dispatch failed error, which is a default error from Slack when it doesn't know what to say. So, let's take a look at how we troubleshoot this in the cloud using CloudWatch.
From the AWS web console, we come to the lambda area, we find our function, and we take a look at this monitor tab here. Now, it's going to give us a list of recent invocations, and it's going to tell us what log stream they're in. So, these recent invocations are interesting, but they're not really giving us troubleshooting information. So, we're going to go check out our CloudWatch logs directly. Now, loading here, loading in the streams, our most recent is at the top. We're going to take a look there and we see our one invocation is happening in this stream and we see this error message here. N is not a function. So, that's good. We found the error. Let's take a look at the stack trace.
3. Troubleshooting with Stack Trace and CloudWatch
Let's take a look at the stack trace and enable source maps for troubleshooting. We can use CloudWatch to search for logs related to our invocation using the request ID. After deploying the code fix, we still encounter a dispatch failed error. Let's explore troubleshooting in aggregated logs using log aggregation tools like Elasticsearch.
Let's take a look at the stack trace. It's from our index file. It's line two, column 570,000. Okay, so, in order to troubleshoot this, we're going to need some source maps. So, let's go ahead and enable those and redeploy. And while that happens, let's poke around CloudWatch a little more to get to know it better.
So, one thing we could do is take a look at view as text, which expands all of these items. So, we see a long running history. Now, when you are troubleshooting log events in CloudWatch from lambdas, it is possible that they are interleaved when you have multiple invocations happening at the same time. So, what becomes useful for you is to take this request ID here, which you see shows up on all of the logs, and put that in the search. Let's go back and search all of our streams. Make sure to quote it, because the hyphens will work as negators for search terms if you don't. So now we see the same view, where everything is related to just our one invocation.
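To make the interleaving problem concrete, here is a small local sketch of what the quoted request ID search accomplishes. The request IDs and log lines are made up for illustration. They follow the general shape of Lambda's Node.js log output but are not copied from the talk.

```javascript
// Wrap the request ID in double quotes so the whole ID is one search term
// and, per the talk, its hyphens are not treated as negators.
function quoteForSearch(requestId) {
  return `"${requestId}"`;
}

// Keep only the lines that belong to one invocation.
function linesForRequest(lines, requestId) {
  return lines.filter((line) => line.includes(requestId));
}

// Hypothetical request IDs for two invocations running at the same time.
const requestA = "11111111-aaaa-4aaa-8aaa-111111111111";
const requestB = "22222222-bbbb-4bbb-8bbb-222222222222";

const interleaved = [
  `START RequestId: ${requestA} Version: $LATEST`,
  `START RequestId: ${requestB} Version: $LATEST`,
  `2021-01-01T00:00:00.000Z\t${requestA}\tERROR\tTypeError: n is not a function`,
  `2021-01-01T00:00:00.100Z\t${requestB}\tINFO\tposted comment`,
  `END RequestId: ${requestA}`,
  `END RequestId: ${requestB}`,
];

const searchTerm = quoteForSearch(requestA);
const onlyA = linesForRequest(interleaved, requestA);
console.log(searchTerm);
console.log(onlyA.join("\n"));
```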
Let's take a look at our deployment. It's done. So let's give another shot to our Slack bot. Now let's go find a log that can give us the information we need. Hopefully this is the most recent. Yeah, we see here, Fetch line 69, column 3. So now our source maps are working. We can go take a look at the code here and it's saying that this dollar sign is not a function. And that's because I should have called Cheerio.load instead of just Cheerio as a function. So we fix that, we redeploy. And we're gonna give it another shot. Still getting a dispatch failed. So what's going wrong here? We are scraping the ReplyGIF page for images. But now let's take a look at troubleshooting in aggregated logs. Since going through each log individually is not terribly efficient, there are some filtering capabilities we saw. But if you're used to log aggregation tools like Elasticsearch and already have your other apps hooked up to it, you'll want to use that for your Lambda log aggregation as well.
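The "dollar sign is not a function" failure comes from treating the module object as the selector function, instead of first loading a document and using the function that `load` returns. The sketch below reproduces that mistake with a stand-in object so that it runs without installing anything. `cheerioLike` is hypothetical and heavily simplified, and only its shape (a `load` method that returns a selector function) mirrors the pattern described in the talk.

```javascript
// Stand-in with the same shape as the module described in the talk:
// `load` returns the selector function. (Hypothetical and simplified;
// the real package parses HTML properly rather than using a regex.)
const cheerioLike = {
  load(html) {
    return (tag) => {
      // Collect the src attribute of every <tag ... src="..."> occurrence.
      const pattern = new RegExp(`<${tag}[^>]*\\ssrc="([^"]*)"`, "g");
      return [...html.matchAll(pattern)].map((match) => match[1]);
    };
  },
};

const html = '<img src="/i/1.gif"><img class="x" src="/i/2.gif">';

// The bug: using the module object itself as the selector function.
let bugMessage = null;
try {
  const $ = cheerioLike;
  $("img");
} catch (err) {
  bugMessage = err.message; // minified builds rename `$`, hence "n is not a function"
}

// The fix: load the document first, then query with the returned function.
const $ = cheerioLike.load(html);
const sources = $("img");
console.log(bugMessage);
console.log(sources);
```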
4. Shipping CloudWatch Logs to Elasticsearch
In serverless, there are no long-running processes to ship logs to Elasticsearch. To ship CloudWatch logs to Elasticsearch, create a subscription filter in your log grouping. Select the cluster, log format, and test the pattern. Another Lambda function reads CloudWatch logs and ships them to Elasticsearch. Elasticsearch provides columns to analyze logs. We encountered a bad credentials error from our last request, which we can investigate using Elasticsearch properties. Locally, we use a .env file.
So the difference is that in your apps running on a server you manage, or within a container, you have some process that is either reading a log file that you're writing to, or it's reading from stdout, and it's shipping that off to Elasticsearch.
So we need to ship our CloudWatch logs, which AWS already instruments for us, to our Elasticsearch cluster.
One way to do this if you're already using Elasticsearch as a service in AWS is to go to your log grouping. Select the log group that you're interested in and create a subscription filter to Elasticsearch.
Here you select which cluster you want it to go to. You can choose a log format. We're using Lambda in this case. But you can just as easily do whatever you're logging in. And you can parse it. So we'll just go with Lambda.
It gives us this built-in filter pattern. You have to give it a name. And you can test it against your existing log data. So we'll go ahead and test the pattern. Yes, we see these coming through. So you start streaming. I've already enabled that. And what ends up getting created for you is another Lambda function that actually reads from your CloudWatch logs and is the one shipping it over to Elasticsearch.
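The Lambda function that AWS generates for the subscription is not shown in the talk. As a rough sketch of the first half of its job: CloudWatch Logs delivers batches to a subscribed function as base64-encoded, gzip-compressed JSON under `event.awslogs.data`. The code below decodes such a payload and flattens it into documents. The document field names and the sample log group name are assumptions for illustration, and the actual indexing call to Elasticsearch is left out.

```javascript
const zlib = require("node:zlib");

// Decode the payload CloudWatch Logs delivers to a subscribed Lambda:
// base64-encoded, gzip-compressed JSON under event.awslogs.data.
function decodeLogEvent(event) {
  const compressed = Buffer.from(event.awslogs.data, "base64");
  return JSON.parse(zlib.gunzipSync(compressed).toString("utf8"));
}

// Turn each log event into a flat document ready for indexing.
// Field names are illustrative, not what the AWS-generated function uses.
function toDocuments(decoded) {
  return decoded.logEvents.map((logEvent) => ({
    id: logEvent.id,
    timestamp: new Date(logEvent.timestamp).toISOString(),
    logGroup: decoded.logGroup,
    logStream: decoded.logStream,
    message: logEvent.message,
  }));
}

// Build a sample event the way CloudWatch would, to exercise the decoder locally.
const sample = {
  logGroup: "/aws/lambda/example-function", // hypothetical name
  logStream: "example-stream",
  logEvents: [{ id: "1", timestamp: 0, message: "ERROR\tBad credentials" }],
};
const event = {
  awslogs: { data: zlib.gzipSync(JSON.stringify(sample)).toString("base64") },
};

const documents = toDocuments(decodeLogEvent(event));
console.log(documents);
```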
So we'll take a look here. And the nice thing about having Elasticsearch as opposed to just CloudWatch is we've got these nice columns we can add and help us hone in on what's going on.
So we'll take a look here. Let's see. All right. So we have a bad credentials error showing up from our last request. We can take a look and we have the nice properties available to us that you expect in Elasticsearch. So it's taken all these for you. And we see the token, though you can't see what the token is. But I notice, that's right, when we develop locally, we're using a .env file.
5. Troubleshooting Slack Bot and Rollbar
Just like with Heroku, we need to add environment variables to avoid including credentials in the repo. We successfully posted a reply to a Github PR on behalf of me. We can monitor logs to catch errors and ensure the Slack bot is performing. If someone else encounters an error, we won't know unless they inform us. Let's explore Rollbar as another option for troubleshooting. We can see the most recent error, HTTP error not found, and analyze it in detail.
Just like with Heroku, we need to add that because we don't want to include those credentials in the repo. So we got to make sure we have the environment variables there. All right. Now let's give it another shot.
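One way to surface a missing credential at startup, rather than as a bad credentials response several calls later, is to read required settings through a small helper that fails loudly. This is a sketch under assumptions: the variable name `GITHUB_TOKEN` is hypothetical, and the talk does not show how the bot actually reads its configuration.

```javascript
// Read a required setting from the environment and fail loudly if it is missing.
// `env` defaults to process.env but can be swapped out in tests.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Locally a .env file would populate these values; in Lambda they come from
// the function's configured environment variables.
const localEnv = { GITHUB_TOKEN: "example-token" }; // hypothetical name and value
const token = requireEnv("GITHUB_TOKEN", localEnv);

// What a deployment without the variable would see:
let missingMessage = null;
try {
  requireEnv("GITHUB_TOKEN", {});
} catch (err) {
  missingMessage = err.message;
}
console.log(token, missingMessage);
```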
Okay. And it worked. All right. So now we have our slack message, which is saying, calling out our friend, Demo Slack, saying that we posted a reply to their PR on behalf of me, and it looks good to me. And if I click this link, this will take me to my issue comment. My comment in the PR that lets them know.
So we've got the posted comment to the Github PR, we respond back to Slack with the message in the channel. Now we put our Slack bot into the wild, and how do we know if it's performing? We could monitor the logs on a regular basis and hope we catch errors in a timely fashion, and we're able to do something about it. Let's say someone else gets an error when using the Slack bot, and because we're not the one typing the command, we can't see the error, or even that there was an error.
So let's give it another try here. Oh, yeah. So that dispatch failed. And we noticed the PR number is wrong. So what's going on? Let's take a look at another option for troubleshooting, and that's the continuous code improvement platform from Rollbar. So if we go to Rollbar, we've already instrumented the app. We see this error come up. It's the most recent. And we notice it's HTTP error not found. Now, if I try this again, because I'm stubborn. And one more time. Notice that our total's gone up to 5. But if we were to look at Elasticsearch, we would see just a bunch of these show up here. Just keeps pounding it. So this shows up nicely as a single line item for us. We can drill into it. And because we are able to connect with GitHub, we actually have source code in here.
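The difference between "a bunch of these" in a log search and "a single line item" with a total can be illustrated with a toy grouping function. This is only a sketch of the idea. It keys on error name and message, which is an assumption made for illustration and not a description of how Rollbar actually groups occurrences.

```javascript
// Collapse repeated error occurrences into one item per distinct error,
// keeping a running total for each.
function groupOccurrences(occurrences) {
  const items = new Map();
  for (const { name, message } of occurrences) {
    const key = `${name}: ${message}`;
    const item = items.get(key) || { title: key, total: 0 };
    item.total += 1;
    items.set(key, item);
  }
  return [...items.values()];
}

// Five identical failures, as a raw log search would list them.
const occurrences = Array.from({ length: 5 }, () => ({
  name: "HttpError",
  message: "Not Found",
}));

const items = groupOccurrences(occurrences);
console.log(items);
```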
6. Troubleshooting with PR and Slack
We fixed the issue with the wrong PR and improved our solution by fetching the PR from GitHub and providing a friendly Slack response. Attend David's workshop to learn more about Rollbar's continuous code improvement platform. Find the source code for the LGTM reply-gif slackbot on GitHub. Sign up for a new account on Rollbar using the Git Nation promo code for a free full month.
And we've got our source maps uploaded. Now we see not found. We get a response when we make the GitHub call to issues 88. Well, that doesn't exist. That's the wrong PR. But we're not handling it properly. So we'll go ahead and fix that code. And before we do, we're going to resolve this. Now we go back. Try it again. Still failed. So now this time we're going to check the PR first. Make sure it's there. And if it's not, we tell the person back that it does not exist or the bot has no access. Now if we do a PR that does exist, it works. So in our new solution, we fetch the PR from GitHub, we provide a friendly Slack response that's only visible to the caller, and we also get the author, and make sure to call them out in our comment. So they get an alert from GitHub and Slack, so they can't run away from our LGTM. So if you want to learn more about Rollbar's continuous code improvement platform, make sure to check out David's workshop, with a deep dive and tutorial that will have you set up and running it on your own. You can find the source code for the LGTM reply-gif slackbot on GitHub here. And thank you very much for attending. If you sign up for a new account on Rollbar, please use this URL with the Git Nation promo code so that you can get a free full month beyond the normal trial period.
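The "check the PR first" fix described above can be sketched roughly as follows. This is not the bot's actual code. The function names, the placeholder repository, and the message wording are assumptions. It relies on a global `fetch` (Node 18 or later) unless a replacement is passed in, and on Slack's `response_type: "ephemeral"`, which makes a slash command reply visible only to the person who ran the command.

```javascript
// Map the HTTP status of a GitHub pull request lookup to a Slack reply.
// Returns null when the PR exists and the bot can go on to post its comment.
function slackReplyForPrLookup(status, pr) {
  const ref = `${pr.owner}/${pr.repo}#${pr.number}`;
  if (status === 200) return null;
  const text =
    status === 404
      ? `Pull request ${ref} does not exist or the bot has no access.`
      : `GitHub returned status ${status} while looking up ${ref}.`;
  // "ephemeral" replies are shown only to the user who ran the command.
  return { response_type: "ephemeral", text };
}

// Look the PR up before commenting. `fetchImpl` is injectable so the
// sketch can run offline with a fake.
async function checkPullRequest(pr, token, fetchImpl = fetch) {
  const url = `https://api.github.com/repos/${pr.owner}/${pr.repo}/pulls/${pr.number}`;
  const res = await fetchImpl(url, {
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: "application/vnd.github+json",
    },
  });
  return slackReplyForPrLookup(res.status, pr);
}

// Offline demonstration with a fake fetch that always answers 404.
const missingPr = { owner: "example-org", repo: "example-repo", number: 88 };
checkPullRequest(missingPr, "example-token", async () => ({ status: 404 })).then(
  (reply) => console.log(reply)
);
```

Keeping the status-to-message mapping in a pure function makes the unhappy paths easy to test without calling GitHub.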
7. Serverless Deployment and Recommendations
Hey Jeff, thanks for joining. Hi Mitten, how are you? You asked the question, where do you deploy your application? And 65% answered to a PaaS, 25% to a Kubernetes cluster, and 19% to virtual machines. I'm surprised that the Kubernetes cluster isn't higher, but it makes sense. We now have time to go over the questions from our audience. You're working at Rollbar, so why did you choose to go completely off company? I wanted to show different ways to troubleshoot serverless, including shipping logs to an aggregated search. Rollbar makes everything easier with its SDK, Lambda integration, and dashboard. There's a workshop coming up for those interested in learning more. If I asked you, what would you say Serverless is? Serverless is handing code to a provider, without managing infrastructure. It decides where and how to run your code. For front-end developers, starting with serverless is a great approach to focus on encapsulated API calls. You can always move to a more consolidated direction later. It provides best practices and simplifies deployment.
Hey Jeff, thanks for joining. Hi Mitten, how are you? Very good, thank you. So, you asked the question, where do you deploy your application? And 65% has answered to a PaaS and only 25% to a Kubernetes cluster and 19% to virtual machines. So how do you feel? Is this something you were expecting? For the most part, I'm actually surprised that the Kubernetes cluster isn't higher, but it makes sense to me. Yeah, it seems like we have a pretty modern audience, so thank you for that.
So we now have time to go over the questions from our audience, and we're going to give our audience some time to present their questions, of course. What I was curious about is, well, you're working at Rollbar, and usually when I see people from a product company, a developer product company, they will be talking about their product, and you didn't. So why did you choose to go completely off company? Well, I didn't want to go completely off. If you notice, I progressed my way towards Rollbar, and my goal was to show people who are not familiar with serverless and want to get into it the different ways they could troubleshoot. And then some people may have been working in serverless, but not aware that they could ship their logs to an aggregated search. And then at the same time, eventually you'd like to be in a system like Rollbar because it just makes everything easier. You include the SDK, it works directly from Lambda, and you don't need to set up all these other things, and you have the ability to go directly there to your dashboard, and you can set up alerts. And we have a workshop coming up with David Waller, which does a very deep dive. And for those that are interested in learning more about Rollbar, it's a great, great place to go and actually set up your account and set up your code with Rollbar.
Yeah, it's a nice tool. I've used it at multiple companies in the past, so thank you for your work. A question from the audience. So if you ask 10 people what Serverless is, you will get 10 different answers, basically. So if I would ask you, you are person number 11, what would you say Serverless is? I would characterize Serverless basically by the constraint that you are handing code over to whatever the Serverless provider is, and you are not managing any of the infrastructure involved. So you are deploying it to this system, and it is deciding where, when, and how to run your code. And all of the ways that is done is abstracted from you, so you don't know if it is running on a VM, in a container, or on bare metal; to you it doesn't matter. You're just handing over a function, and then when you have an endpoint and you call that endpoint, your function will run. That is the guarantee they provide. So for me, I'm a front-end developer, and let's say I want to kickstart a startup or a pet project, then, would you always recommend going serverless, just so you can focus on doing what you know, which is front-end? And I know this is sometimes a barrier for people that I know as front-enders to actually do more than developing something on their local machine, and getting it off their local host. I think that approaching it from a serverless standpoint, at the beginning, is actually a really great approach, because one of the things it does is force you to focus and think about being more encapsulated in your API calls. And you can always take a bunch of serverless functions and combine them into an API if you want to run that on a server. It's harder to go back the other way. So starting off with serverless actually increases your ramp-up time, or I mean, decreases your ramp-up time, gets you to a deployable unit faster, and at the same time, you can always move in a more consolidated direction.
And it also provides you some best practices around programming and focusing on the inputs and outputs and what your API should look like on each individual endpoint. Awesome.
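The point about combining serverless functions into a single API later can be illustrated with a small dispatcher. This is a sketch only. The two endpoints and the event fields passed to each handler are hypothetical, and a real consolidation would sit behind an HTTP server or framework.

```javascript
// Two Lambda-style handlers, each focused on its own inputs and outputs.
// (Hypothetical endpoints, for illustration.)
const handlers = {
  "GET /health": async () => ({ statusCode: 200, body: "ok" }),
  "POST /echo": async (event) => ({ statusCode: 200, body: event.body }),
};

// One dispatcher that a single long-running server could sit in front of.
async function dispatch(method, path, body) {
  const handler = handlers[`${method} ${path}`];
  if (!handler) return { statusCode: 404, body: "not found" };
  return handler({ httpMethod: method, path, body });
}

dispatch("POST", "/echo", "hello").then((res) => console.log(res));
```

Because each handler already takes an event and returns a response, moving in the consolidated direction is mostly a matter of routing.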
8. Using and Contributing to the Slackbot
Hannah asked if the Slackbot can be used and if it should be hosted or use a serverless hosting platform. The bot still needs some fixes, and anyone is welcome to contribute by forking the project and submitting pull requests. Jeff encourages collaboration and mentions that solving his own problem led to this invention. Anna expresses interest in helping, and Jeff shares his GitHub account with her. The Q&A session concludes, and Jeff thanks the audience for their time.
So, we got a comment from Hannah that says, I love the Slackbot. Maybe you mentioned it and I missed it in the talk, but can we use it or should we host it ourselves or basically use a serverless hosting platform for that, of course, but can we already use the bot? It needs a few more pieces. You know, I cheated in that. I authorized the Slackbot. I authorized the app through GitHub and it's always posting as me. So there are some fixes and it's out there. So if anybody wants to fork it and harden it up and make it more applicable, happy to do it. Happy to take any pull requests if somebody wants to add some code and submit a pull request and maybe we can build this together as a community. I think it would be fun to have. I certainly was solving my own problem with this. Well that's how 9 out of 10 inventions are made, I guess, right? That's right. Identify a problem, and you might be the only one in the world having this problem, but at least you have someone who identifies with this problem, Anna. So Anna, you can help Jeff out, right? PR is welcome. Absolutely.
All right, so we have no more questions coming in from our audience. Oh, Anna says, very cool. Thank you. Thanks, Anna. Can you plug your GitHub account once more, so Anna can find you? Yeah, so it's E-U-D-A-I-M-O-S. All right, Anna, I hope you got that, and Jeff will be in his speaker room right after this. Jeff, I'm going to thank you for your time, and hope to see you in real life maybe one day soon. That would be great. Have a nice day. Bye bye. Thanks, man.