AWS Lambda under the hood

In this talk I explain how the AWS Lambda service works: the architecture, how it scales, and how developers should think when they design their software using Lambda functions.

22 min
17 Apr, 2023


AI Generated Video Summary

In this talk, key characteristics of AWS Lambda functions are covered, including service architecture, composition, and optimization of Node.js code. The two operational models of Lambda, synchronous and asynchronous invocation, are explained, highlighting the scalability and availability of the service. The features of Lambda functions, such as retries and event source mapping, are discussed, along with the micro VM lifecycle and the three stages of a Lambda function. Code optimization techniques, including reducing bundle size and using caching options, are explained, and tools like webpack and Lambda Power Tuning are recommended. Overall, Lambda is a powerful service for handling scalability and traffic spikes while enabling developers to focus on business logic.

1. Introduction to AWS Lambda

Short description:

In this session, we will cover the key characteristics of Lambda functions, how the service architecture works, how to compose a Lambda function in AWS, and how to optimize your Node.js code for Lambda. A Lambda function allows you to focus on creating value for your customers by writing code that maps your business capabilities into production. You only pay for the execution time, making it cost-effective for both production and testing environments. You can provide your code through a zip file or container images, and choose from a range of built-in languages or bring your own. A customer even created a custom runtime in COBOL.

Hi, everyone, and welcome to this session on AWS Lambda under the hood. I know several of you are interested not only in how to build some code, but also in why you should build the code in a certain way. Whether you are an expert at writing Lambda functions or you are thinking of using serverless and AWS Lambda in your next workload, I think that at the end of this talk you will be able to make a conscious decision about why to write your code in a certain way.

So without further ado, let's go ahead. My name is Luca. I'm a principal serverless specialist at AWS, based in London. I'm an international speaker and a book author. In this talk we're going to cover quite a lot of ground. First of all, what is a Lambda function? Maybe you have an idea of what it is, but let's try to nail down the key characteristics of Lambda functions. Then we move on to understanding how the service architecture works under the hood. We then discuss how to compose a Lambda function in AWS. After that we move into the function lifecycle and how to leverage it to maximize the benefit of your code. And last, but not least, we talk about how to optimize your Node.js code for Lambda.

There is a lot of ground to cover, so let's start. What is a Lambda function? A Lambda function, in a nutshell, is as simple as this: you provide some code, and we take care of provisioning and managing the service for you. You don't have to think about the networking side or how to orchestrate scalability and so on. You just focus on what really matters for you: creating value for your customers, and writing Node.js code that maps your business capabilities into production. You pay by the millisecond. Every time you invoke a Lambda function, you pay only for the execution time and nothing more. That is a great way to think not only about production, but also about testing and staging environments that are not used 24/7 like your production environment: there, you pay only for what you use, without provisioning containers or virtual machines that sit there 24/7. You can provide your code in two ways: through a zip file, when the deployment package is up to 250 megabytes, or, if it's bigger than that, through container images up to 10 gigabytes. In both cases, you can leverage the benefits of AWS Lambda without any problem. We also offer built-in runtimes that are managed by us for you, like Java, Go, Node.js, .NET, Python, and many others. But if you have a specific use case, you can bring your own runtime, and we operationalize it in the exact same way. A classic example: we had a customer working in finance that needed to use COBOL, but they really fell in love with Lambda, so they created their own custom runtime in COBOL.

2. Lambda Invocation Modes and Architecture

Short description:

Last but not least, Lambda scales up in milliseconds. Lambda has two operational models: synchronous invocation and asynchronous invocation. The code you write runs inside a MicroVM sandbox, which is part of a Worker that operates in an availability zone. AWS takes care of making your code highly available across multiple data centers. In synchronous mode, the API gateway calls a frontend service in AWS Lambda, which immediately invokes a worker to run your code and return the response. In the case of asynchronous Lambda functions, events are pushed into a message queue and the caller receives an acknowledgement.

Last but not least, Lambda scales up in milliseconds. Based on your traffic pattern, we take care of scaling your Lambda function and answering all the requests that are coming in. Usually, from a developer perspective, what you see is that you write some code, as you can see on the left of this slide, you upload it to your AWS account, some magic happens, and you have a functioning API.

The reality is that there is way more. So the question now is: have you ever thought about how Lambda works under the hood? Let's take a look. First of all, there are two operational models for Lambda functions. The first is synchronous invocation, where, for instance, you have an API Gateway that exposes your API, a request comes in from a client, a Lambda function is triggered, and then you serve the response synchronously to your client. The other option is asynchronous invocation, where you have a service that pushes an event into the Lambda service; the Lambda service stores the event inside an internal events queue, and then the Lambda function starts to retrieve these events, slowly but steadily, and does some work on them. The requester, in this case Amazon EventBridge, for instance, receives just an acknowledgement and nothing more. Those are the two ways that Lambda invocation works.

In the grand scheme of things, how requests move from what you see on the left of the slide — multiple services, synchronous or not, sending requests to the AWS Lambda service, the big rectangle in this slide — into your code on the far right, inside a MicroVM sandbox, is an interesting journey. First, I want to highlight what happens inside your sandbox. The sandbox is where your code is running. Your MicroVM, containing the code that you have written and that is operationalized by us, runs inside a Worker. And obviously there isn't just one Worker; there are many more. In AWS we have multiple availability zones, and as you can see here, multiple Workers run inside one availability zone. An availability zone is a data center: think about what a data center looks like, and that's your availability zone. Every time we create a region, it is composed of multiple availability zones. Therefore, every time you push code into Lambda, your code automatically becomes available in multiple data centers. You don't have to do anything. You just focus on which region you want to deploy to and what the business logic is, and we take care not only of operationalizing the code but also of making it highly available across our infrastructure. Now let's take a deeper look at the invocation modes and how they work inside this architecture. In the synchronous mode, the API Gateway, for instance, synchronously calls a frontend service inside the AWS Lambda service, which invokes a specific worker, spins up a micro VM, and your code starts to run and immediately returns either the response or the error to the client.
Instead, the invocation mode for asynchronous Lambda functions is slightly different. In this case, for instance, SNS pushes an event as a message into the front end; the front end stores that specific message inside an internal queue; the caller receives an acknowledgement just saying, yes, we took your request into account; and then the message waits inside the internal queue to be processed.

3. Lambda Function Features and Micro VM Lifecycle

Short description:

The beauty of Lambda is that it supports features like retries, destination definition, and event source mapping. With event source mapping, you can easily handle batching and error handling without writing additional logic. When you upload code, your micro VM is created using Firecracker, an open-source micro VM technology designed for Lambda functions. The first invocation loads the code and creates the micro VM, which is where cold starts come from. Once the micro VM is created, your Lambda function runs without that latency. The lifecycle of a Lambda function involves three stages.

And the beauty of this approach is that we're not only running your code, but also supporting a bunch of features. You can set up the retry mechanism — up to three invokes in total. When something fails, you can define a destination, and you can also define a dead-letter queue, which is a useful pattern when you want to automatically retry, build your own error-handling workflow, or debug your system.

There is a third method that we haven't talked about yet: event source mapping. Certain services like MSK, Kinesis, SQS, or DynamoDB Streams use another component available in the AWS Lambda service called event source mapping. The event source mapping takes care of polling messages from the source and then synchronously invoking your Lambda function. Thanks to event source mapping, you can do batching, error handling, and way more. The beauty of this is that with other services you would need to write your own logic; with Lambda it is completely abstracted for you, so you just configure what your batch of messages looks like, and the batch is sent to your Lambda function code to run your business logic.

Now we have seen how the service works, but let's try to understand, when you upload some code, what it looks like when your micro VM is spun up to run the code. Every micro VM uses Firecracker. Firecracker is an open-source micro VM technology that we created for Lambda functions, and for serverless computing in general, and it is now also used by other AWS services. It's completely open source, as I mentioned, so you can look into it and see how it works. It is a fantastic piece of software that we use to operationalize your code for Lambda functions.

So how does it work? Usually it works this way. When there is an input — someone triggers and invokes a Lambda function — we select a worker host, and the event input arrives at the worker host. The first thing it does is load the code that you have uploaded. This is usually called a cold start, mainly because it takes slightly longer than usual: retrieving the code at runtime and creating your micro VM with Firecracker, along with the runtime you selected or the container. There are different levels at which these things are cached inside the system, but in general the idea is that you pay the cold start when the micro VM is built for the first time with your code. After that, your Lambda function is warm: you hit the micro VM multiple times with different inputs, and suddenly there are no cold starts anymore; responses no longer carry any latency from generating the micro VM. And that is great. So now I think it's a good time to think about the lifecycle, or how the Lambda function works when you generate a new micro VM. This is probably the best diagram I can show you, and it's available on our website. Usually, a Lambda function goes through three stages.

4. Lambda Initialization, Invocation, and Shutdown

Short description:

You have the initialization, invocation, and shutdown phases in Lambda functions. During initialization, extensions and runtimes are loaded, and function-specific tasks like retrieving secrets and establishing database connections are performed. In the invocation phase, the execution environment is warm, allowing for fast response to requests. Finally, during shutdown, the runtime and extensions are shut down. For more information, refer to the documentation.

You have the initialization; then the invocation, which happens multiple times as long as your execution environment is up, running, and available; and then the shutdown, which is when your Lambda function is not called anymore. As we said, you pay for your execution time, so after a while we reclaim the infrastructure, and you simply go ahead until the next invocation.

Now, in every single one of those steps, different things happen. Let's start with the initialization phase. In the init phase we first have the extensions, because you can use extensions with your Lambda function — think of sidecars that are available in your execution environment — and that's usually the first thing that is loaded. Then there is runtime loading: if, for instance, you selected Node.js, the Node.js runtime is initialized. Then you have the function initialization. Here is one of the key parts of the system. In the function initialization, you typically do things like retrieving secrets that are used during the invocation of your Lambda function, or parameters that are stored outside of it. The beauty of this is that it's done only in the initialization phase, so you don't need to retrieve that information every single time. Even establishing a connection with a database can be done in the initialization phase; then you can forget about it, and it will be available for all the invocations that the execution environment handles. Then we move into the invocation phase, where the execution environment is already warm and you start to respond to every request. Bear in mind that as long as the execution environment is up and running, you skip the whole initialization phase, so your Lambda function will be quite fast: it is simply invoked and is there, available for you. At the end, when we reclaim the Lambda function, you have a runtime shutdown and the extensions shutdown. Pretty easy. If you want to know more, you can look at the documentation at this link and see more information about this diagram and how the lifecycle works.

5. Optimizing Code in Lambda Functions

Short description:

To optimize your code in Lambda, reducing the bundle size is crucial. Tools like webpack and esbuild can help eliminate unnecessary dependencies and unused functions, resulting in faster cold start times. Provisioned concurrency is useful for predictable workloads, allowing you to keep Lambda functions warm during peak traffic periods. The AWS SDK for JavaScript version 3 offers a smaller package size and built-in TCP connection keep-alive. Lambda caching options include in-memory caching in the execution environment and using services like Elastic File System or ElastiCache for data shared across multiple functions. The open-source tool Lambda Power Tuning helps optimize invocation time and cost by determining the best setup, including architecture and memory size. Lambda Powertools streamlines observability with logging, tracing, and metrics integration. Lambda is a powerful service that handles scalability and traffic spikes while enabling developers to focus on business logic. For more information, refer to additional talks and the security white paper.

Now, let's talk about how you can optimize your code in Lambda functions, especially Node.js code. As we said, we have an input received from outside the service, and then a micro VM is generated with your code. To reduce the cold start time, one of the key techniques is reducing the size of your bundle. How does it work? You can use webpack, esbuild, or whatever tools are needed to reduce your bundle size. This obviously makes for a faster cold start, because your code becomes just your piece of logic plus the libraries you actually use. Eliminating dev dependencies and functions your code never calls — applying tree shaking — is very helpful for reducing cold starts when you work with Lambda functions.
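A minimal build-script sketch of this idea with esbuild (the entry point and output paths are illustrative): bundling plus minification applies tree shaking, and marking the AWS SDK as external leans on the copy shipped with the Node.js runtime, which is discussed below.

```javascript
// deploy-time build script (not Lambda code): bundle + tree-shake the handler
const esbuild = require("esbuild");

  entryPoints: ["src/handler.js"], // illustrative path to your handler
  bundle: true,                    // inline only the code actually imported
  minify: true,
  platform: "node",
  target: "node18",                // match your chosen Lambda runtime
  external: ["@aws-sdk/*"],        // rely on the SDK shipped with the runtime
  outfile: "dist/handler.js",
}).catch(() => process.exit(1));
```

The resulting `dist/handler.js` is what goes into the zip file, so dev dependencies and unused exports never reach the deployment package.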

Another functionality available in Lambda, if you have a specific workload — imagine a workload that is predictable, where you know upfront that on Sunday evening from 7pm to 9pm you are going to have a surge of traffic — is provisioned concurrency. Provisioned concurrency allows you to set a time frame and select a specific Lambda function to keep warm. What happens is that we start initializing your Lambda function upfront, so for that specific period of time you have Lambda functions that are already warm and available to fulfill the surge of traffic. That is a fantastic way to scale your workloads without cold starts. Moreover, if you use top-level await, the initialization phase takes care of that work, so things like retrieving parameters from external sources, such as Parameter Store in Systems Manager, are resolved during initialization and aren't needed again at invocation time, reducing the execution time of your Lambda function.

Another suggestion is to use the AWS SDK for JavaScript version 3. It is a fairly new release, and its modular packages are smaller compared to v2, which is a great thing. There are also a bunch of other handy features. With version 2 of the SDK and DynamoDB, one suggestion we usually gave was to keep the TCP connection alive; otherwise, every time you call and execute the Lambda function, it establishes a new TCP connection. With version 3 this is handled for you, embedded inside the SDK, so you no longer have to deal with it yourself. Moreover — and this is something people often don't know — the Node.js runtimes embed a specific version of the AWS SDK. So if you don't need to pin a specific SDK version, you don't need to include the SDK in your dependencies; you can leverage the one shipped alongside the runtime, and you have one dependency less to handle.

For caching in Lambda, there are two in-memory-style options inside the execution environment. The first happens at the execution environment level — inside your micro VM, as we have seen — and is populated during the initialization phase, as discussed extensively. You also have a temp folder that can store up to 10 gigabytes of data; by default it is 512 megabytes, but you can increase this limit. You can also store and cache data across multiple Lambda functions if needed: for instance, you can use services like Elastic File System, or ElastiCache with Redis or Memcached, to store data shared across multiple Lambda functions. In that case, they retrieve the data from EFS or ElastiCache and you are good to go.

Another tool worth mentioning, and a high-leverage optimization, is an open-source tool called Lambda Power Tuning. Power Tuning runs your Lambda function in a way that lets you understand the best setup for invocation time and invocation cost. There are different parameters you can set, and very often people assume that the lowest amount of memory will always be the cheapest, but in reality it isn't. That's why we have this tool: you run it and learn the right architecture for your Lambda function — Arm or x86 — and the right memory size, based on the optimization you want. As you can see in this slide, you can optimize for cost or for time, and both are valid dimensions to think about.

Last but not least, there is a library called Lambda Powertools that streamlines the integration of observability into your Lambda functions. Logging, tracing, and metrics become a breeze thanks to Powertools, and it gives you best practices embedded in your code out of the box. If you want to know more, you can find the link in this slide. To recap, what we have seen so far is that Lambda is a fantastic service that allows you to use just what you really need. You focus your day-to-day on creating business logic rather than operationalizing your infrastructure, and the service handles spikes of traffic, to a certain extent, as well as other types of workloads. If you want to know more about how Lambda works under the hood, there is another talk that I really recommend, done by my colleague Julian Wood together with a Principal Engineer from the Lambda team, which goes deeper into what we have covered. You can also look at the security white paper available on the AWS website, which provides a lot of insight into the security model and how Lambda works under the hood.
