AWS ushered in a new landscape for deploying JavaScript applications using Node.js hosted in AWS Lambda, and since then the management simplicity it provides has made serverless applications and APIs grow rapidly in both popularity and use cases. However, for many who are starting out, troubleshooting operational issues can be painful. I'll walk through some techniques to make this easier and show how we can evolve toward a better solution, with tips and tools you can use in your serverless deployments right away.
Troubleshooting your Serverless Node.js doesn't have to be a Pain
From:

DevOps.js Conf 2021
Transcription
Welcome to the DevOps.js conference, and thank you for attending my talk, "Troubleshooting your Serverless Node.js doesn't have to be a Pain." My name is Jeff Hoffer and I'm the technical leader for growth at Rollbar, where I started last September. You can find me on GitHub as eudaimos. For those familiar with the Los Angeles, California area, and as people who live here know, I'm a pretty rare person who was actually born and raised in LA, Santa Monica to be specific, and I still live in LA, in a different part of the city.
So I have a problem. I like to include some flair when I give a "looks good to me" on a PR. And unfortunately, my go-to site for these specific GIFs, lgtm.io, shut down a couple of years ago. I have another problem: I'm lazy. It's not just that I don't want to search for a good LGTM GIF for a PR in GitHub. I also need to respond in Slack to the teammate who asked for my review, and I want to do it all at the same time. So I came up with a solution, a Slack bot called LGTM Reply GIF, that combines one of my favorite GIF sites, replygif.net, with a Slack slash command to post on my behalf. I built it in Node.js and deployed it to AWS Lambda, where it will call the ReplyGif API, select a GIF at random, post a comment to the GitHub PR, and respond back to the Slack channel with a message to our requester. So it sounds pretty great, right? And I'm going to show you how to do it.
Demo Slack sent me a request through Slack to review his PR. So I'm going to say I gave it a review. I type LGTM, and I get my prompts that tell me how to write the command. So identify the repo. Identify the PR. Give him a little @ shout-out. And we get a dispatch failed error, which is a default error from Slack when it doesn't know what to say.
So let's take a look at how we troubleshoot this in the cloud using CloudWatch. From the AWS web console, we come to the Lambda area. We find our function. We take a look at this Monitor tab here. Now, it's going to give us a list of recent invocations, and it's going to tell us what log stream they're in. So these recent invocations are interesting, but they're not really giving us troubleshooting information. So we're going to go check out our CloudWatch logs directly. Now, loading in the streams, our most recent is at the top. We're going to take a look there. And we see our one invocation is happening in this stream. And we see this error message here: n is not a function. That's good. We found the error. Let's take a look at the stack trace. It's from our index file. It's line 2, column 570,000. OK, so in order to troubleshoot this, we're going to need some source maps. So let's go ahead and enable those and redeploy.
And while that happens, let's poke around CloudWatch a little more and get to know it better. So one thing we could do is take a look at View as Text, which expands all of these items. So we see a long-running history. Now, it is possible, when you are troubleshooting log events in CloudWatch from Lambdas, that they are interleaved when you have multiple invocations happening at the same time. So what becomes useful for you is to take this request ID here, which you see shows up on all of the logs, and put that in the search. So let's go back and search all of our streams.
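The request-ID search described here can be sketched in code. This is a minimal illustration, not the bot's actual source: `logWithRequestId` and `quoteFilterTerm` are hypothetical helper names, and the only Lambda-specific piece relied on is `context.awsRequestId`, which the Node.js runtime provides to the handler. Tagging each log line with the request ID keeps interleaved invocations separable, and wrapping the ID in double quotes keeps its hyphens from being read as exclusions in a CloudWatch search.

```javascript
// Hypothetical helper: emit one JSON log line tagged with the Lambda request ID.
// `context` is the second argument passed to a Lambda handler.
function logWithRequestId(context, level, message, extra = {}) {
  const line = JSON.stringify({
    level,
    requestId: context.awsRequestId, // provided by the Lambda Node.js runtime
    message,
    ...extra,
  });
  console.log(line);
  return line;
}

// Hypothetical helper: wrap a search term in double quotes so that
// hyphens inside it are treated as part of the term.
function quoteFilterTerm(term) {
  return `"${term}"`;
}

// Example handler shape (assumed): log once per invocation with the request ID.
async function handler(event, context) {
  logWithRequestId(context, "info", "slash command received");
  return { statusCode: 200, body: "ok" };
}
```

Pasting the output of `quoteFilterTerm(context.awsRequestId)` into the log search is then enough to narrow the results to one invocation.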
And make sure to quote the request ID, because the hyphens will work as negators for search terms if you don't. So now we see the same view, where everything is related to just our one invocation. Let's take a look at our deployment. It's done. So let's give our Slack bot another shot. Now, let's go find a log that can give us the information we need. Hopefully, this is the most recent. Yeah, we see here: fetch, line 69, column 3. So now our source maps are working. We can go take a look at the code here. And it's saying that this dollar sign is not a function. And that's because I should have called cheerio.load instead of just calling cheerio as a function. So we fix that. We redeploy. And we're going to give it another shot. Still getting a dispatch failed. So what's going wrong here? We are scraping the ReplyGif page for images.
But now let's take a look at troubleshooting in aggregated logs. Going through each log individually is not terribly efficient, although there are some filtering capabilities, as we saw. But if you're used to log aggregation tools like Elasticsearch and already have your other apps hooked up to it, you'll want to use that for your Lambda log aggregation as well. The difference is that in your apps running on a server you manage, or within a container, you have some process that is either reading a log file that you're writing to or reading from standard out, and it's shipping that off to Elasticsearch. In the serverless environment, you don't have any long-running processes to do this for you, because our serverless functions are ephemeral by design. So we need to ship the CloudWatch logs that AWS already instruments for us to our Elasticsearch cluster. One way to do this, if you're already using Elasticsearch as a service in AWS, is to go to your log groups, select the log group that you're interested in, and create a subscription filter to Elasticsearch.
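To make the shipping step less opaque, here is a hedged sketch of what a function on the receiving end of a subscription filter has to do first. When CloudWatch Logs delivers to a Lambda function, the batch arrives base64-encoded and gzip-compressed under `event.awslogs.data`. The function name `decodeSubscriptionEvent` and the output shape are assumptions for illustration; the function AWS generates for Elasticsearch does more than this.

```javascript
const zlib = require("node:zlib");

// Decode the batch that CloudWatch Logs delivers to a subscribed Lambda.
// Returns one flat record per log event, ready to forward to an aggregator.
function decodeSubscriptionEvent(event) {
  const compressed = Buffer.from(event.awslogs.data, "base64");
  const payload = JSON.parse(zlib.gunzipSync(compressed).toString("utf8"));
  return payload.logEvents.map((logEvent) => ({
    logGroup: payload.logGroup,
    logStream: payload.logStream,
    timestamp: logEvent.timestamp,
    message: logEvent.message,
  }));
}
```

Each returned record carries its log group and stream, which is what makes per-function filtering possible once the data is in the aggregator.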
Here you select which cluster you want it to go to. You can choose a log format. We're using Lambda in this case, but you could just as easily pick another format, whatever you're logging in, as long as you can parse it. So we'll just go with Lambda. It gives us this built-in filter pattern. We have to give it a name. And you can test it against your existing log data. So we'll go ahead and test the pattern. Yes, we see these coming through. So you start streaming. I've already enabled that. And what ends up getting created for you is another Lambda function that actually reads from your CloudWatch logs and is the one shipping them over to Elasticsearch.
So we'll take a look here. And the nice thing about having Elasticsearch as opposed to just CloudWatch is we've got these nice columns we can add to help us hone in on what's going on. Let's see. All right, so we have a bad credentials error showing up from our last request. And we can take a look, and we have the nice properties available to us that you expect in Elasticsearch. So it's parsed all of these for you. And we see the token. We can't see what the token is, but I notice that it's a token of the same name. And I realize, that's right: when we develop locally, we're using a .env file. Just like with Heroku, we need to add those values, because we don't want to include those credentials in the repo. We've got to make sure we have the environment variables there. All right. Now let's give it another shot. Hey, it worked.
All right. So now we have our Slack message, which is calling out our friend, Demo Slack, saying that we posted a reply to their PR on behalf of me. And it looks good to me. And if I click this link, it will take me to my issue comment, my comment in the PR that lets them know. So we've got the posted comment on the GitHub PR, and we respond back to Slack with the message in the channel. Now we put our Slack bot into the wild. And how do we know if it's performing?
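The bad-credentials failure above came from an environment variable that existed in a local .env file but not in the deployed function. One way to surface that earlier is a startup check, sketched below. The variable names in `REQUIRED_ENV` are hypothetical, since the talk does not name them, and the check reports only the names of missing variables, never their values.

```javascript
// Hypothetical variable names; substitute whatever the function actually needs.
const REQUIRED_ENV = ["GITHUB_TOKEN", "SLACK_SIGNING_SECRET"];

// Return the names of required variables that are unset or blank.
function missingEnv(env = process.env, required = REQUIRED_ENV) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

// Throw a descriptive error listing names only, so secrets never reach the logs.
function assertEnv(env = process.env, required = REQUIRED_ENV) {
  const missing = missingEnv(env, required);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

Calling `assertEnv()` at the top of the module would make a misconfigured deployment fail with a readable message in the logs instead of a bad credentials response from a downstream API.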
We could monitor the logs on a regular basis and hope that we catch errors in a timely fashion and that we're able to do something about them. Let's say someone else gets an error when using the Slack bot. Because we're not the one typing the command, we can't see the error, or even that there was an error. So let's give it another try here. Oh, yeah. So that dispatch failed. And we notice the PR number's wrong. So what's going on?
Let's take a look at another option for troubleshooting, and that's the continuous code improvement platform from Rollbar. So if we go to Rollbar, we've already instrumented the app. We see this error come up. It's the most recent. And we notice it's an HTTP error, not found. Now if I try this again, because I'm stubborn, and one more time, we notice that our total's gone up to five. But if we were to look at Elasticsearch, we would see just a bunch of these show up here. It just keeps pounding it. So in Rollbar this shows up nicely as a single line item for us. We can drill into it. And because we're able to connect with GitHub, we actually have source code in here. And we've got our source maps uploaded. Now we see not found. We get that response when we make the GitHub call to issues 88. Well, that doesn't exist. That's the wrong PR. But we're not handling it properly. So we'll go ahead and fix that code. And before we do, we're going to resolve this. Now we go back. Try it again. And it still failed. So now, this time, we're going to check the PR first and make sure it's there. And if it's not, we tell the person that it does not exist or the bot has no access. Now if we do a PR that does exist, it works.
So in our new solution, we fetch the PR from GitHub. We provide a friendly Slack response that's only visible to the caller. And we also get the author and make sure to call them out in our comment. So they get an alert from GitHub and Slack.
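The check-the-PR-first fix can be sketched as follows. This is an illustration under stated assumptions, not the bot's real code: the function names are hypothetical, the lookup uses GitHub's REST endpoint for a single pull request, and it relies on the global `fetch` available in Node.js 18 and later. A Slack slash command response with `response_type: "ephemeral"` is shown only to the caller, while `"in_channel"` is visible to everyone.

```javascript
// Map the HTTP status of a GitHub pull request lookup to a Slack response.
function slackReplyForPrLookup(status, { repo, number, author }) {
  if (status === 200) {
    return {
      response_type: "in_channel", // visible to the whole channel
      text: `LGTM posted on ${repo}#${number} for @${author}`,
    };
  }
  if (status === 404) {
    return {
      response_type: "ephemeral", // visible only to the person who ran the command
      text: `PR ${repo}#${number} does not exist or the bot has no access.`,
    };
  }
  return {
    response_type: "ephemeral",
    text: `GitHub returned ${status} while looking up ${repo}#${number}.`,
  };
}

// Hypothetical lookup: `repo` is "owner/name"; `token` is a GitHub access token.
async function lookupPullRequest(repo, number, token) {
  const response = await fetch(`https://api.github.com/repos/${repo}/pulls/${number}`, {
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
    },
  });
  const author = response.ok ? (await response.json()).user.login : undefined;
  return slackReplyForPrLookup(response.status, { repo, number, author });
}
```

Keeping the status-to-message mapping in a pure function means the not-found path can be tested without calling GitHub at all.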
So they can't run away from our LGTM. So if you want to learn more about Rollbar's continuous code improvement platform, make sure to check out David's workshop, a deep dive and tutorial that will have you set up and running it on your own. You can find the source code for the LGTM Reply GIF Slack bot on GitHub here. And thank you very much for attending. If you sign up for a new account on Rollbar, please use this URL with the GitNation promo code so that you can get a free full month beyond the normal trial period.
Hey, Jeff. Thanks for joining. Hi, Metin. How are you? Very good. Thank you. So you asked the question, where do you deploy your application? And 65% answered a PaaS, only 25% a Kubernetes cluster, and 19% virtual machines. So how do you feel? Is this something you were expecting? For the most part. I'm actually surprised that the Kubernetes cluster isn't higher, but it makes sense to me. Yeah, it seems like we have a pretty modern audience. So thanks for that.
So we now have time to go over the questions from our audience. I'm going to ask you a few questions myself, and we're going to give our audience some time to submit their questions, of course. What I was curious about is, well, you're working at Rollbar, and usually when I see people from a developer product company, they will be talking about their product. You didn't. So why did you choose to go completely off company? Well, I didn't want to go completely off. If you notice, I progressed my way towards Rollbar. My goal was to show the different ways you can troubleshoot, for people who are not familiar with serverless and want to get into it. And then some people may have been working in serverless but were not aware that they could ship their logs to an aggregated search. And then at the same time, eventually you'd like to be in a system like Rollbar, because it just makes everything easier. You include the SDK.
It works directly from Lambda. And you don't need to set up all these other things. And you have the ability to go directly to your dashboard. You can set up alerts. And we have a workshop coming up with David Waller, which does a very deep dive. For those who are interested in learning more about Rollbar, it's a great place to go and actually set up your account and set up your code with Rollbar. Yeah, it's a nice tool. I've used it at multiple companies in the past. So thank you for your work.
A question from the audience. If you ask 10 people what serverless is, you'll get 10 different answers, basically. So if I would ask you, as person number 11, what would you say serverless is? I would characterize serverless by basically the constraint that you are handing code over to whatever the serverless provider is, and you are not managing any of the infrastructure involved. So you are deploying it to this system, and it is deciding where, when, and how to run your code. And all of the ways that's done are abstracted from you. So you don't know if it's running on a VM, in a container, or on bare metal. To you, it doesn't matter. You're just handing over a function. And then when you have an endpoint and you call that endpoint, your function will run. That's the guarantee they provide.
Yeah, so for me, I'm a front end developer. Let's say I want to kickstart a startup or a pet project. Would you always recommend going serverless, just so you can focus on doing what you know, which is front end? I know this is sometimes a barrier for front enders I know to actually do more than develop something on their local machine, to get it off their localhost, right? I think that approaching it from a serverless standpoint at the beginning is actually a really great approach, because it forces you to focus and think about being more encapsulated in your API calls.
And you can always take a bunch of serverless functions and combine them into an API if you want to run that on a server. It's harder to go back the other way. So starting off with serverless actually increases your ramp-up time, or I mean, decreases your ramp-up time, and gets you to a deployable unit faster. And at the same time, you can always move in a more consolidated direction. It also provides you some best practices around programming, focusing on the inputs and outputs and what your API should look like on each individual endpoint. Awesome.
So we got a comment from Anna that says, I love the Slack bot. Maybe you mentioned it and I missed it in the talk, but can we use it, or should we host it ourselves, or basically use a serverless hosting platform for that, of course? But can we already use the bot? It needs a few more pieces. I cheated in that I authorized the Slack bot and I authorized the app through GitHub, so it's always posting as me. So it needs some fixes, but it's out there. If anybody wants to fork it and harden it up and make it more applicable, I'm happy to take any pull requests. If somebody wants to add some code and submit a pull request, maybe we could build this together as a community. I think it would be fun to have. I certainly was solving my own problem with this. Well, that's how 9 out of 10 inventions are made, I guess, right? That's right. Identify a problem, and you might be the only one in the world having this problem, but at least you have someone who identifies with this problem, Anna. So Anna, you can help Jeff out, right? PRs welcome. Absolutely. Yeah.
All right, so we have no more questions coming in from our audience. Oh, Anna says, very cool. Thank you. Thanks, Anna. Can you plug your GitHub account once more so Anna can find you? Yeah, so it's E-U-D-A-I-M-O-S. All right, Anna, I hope you got that. Otherwise, Jeff will be in his speaker room right after this.
Jeff, I'm going to thank you for your time and hope to see you in real life maybe one day soon. That would be great. Have a nice day. Bye bye. Thanks, man. Thanks, Jeff.