Building Reliable Backends with Durable Execution

Sylwia Vargas
21 min
04 Apr, 2024

Video Summary and Transcription

This Talk explores the paradigm of message queues for reliable backend execution. It highlights the benefits of message queues, such as guaranteed delivery and offloading of long-running processes. The drawbacks of using queues are discussed, including the complexity of managing infrastructure and applications. The solution of using a reliability layer called Inngest is presented, which allows for non-blocking background tasks and provides a dashboard for monitoring and managing jobs. The Talk also emphasizes the importance of reliability in building software systems and introduces the expanding scope and functionality of Inngest.


1. Introduction to Message Queues

Short description:

Hello, everyone. Welcome to my talk about reliability, backend, and execution. I will discuss the paradigm that makes life easier. We are now living in a constant 90s nostalgia. The 90s brought us many great things, but there is one thing we could say goodbye to: queues. Message queues are a form of asynchronous service to service communication. They allow for guaranteed delivery and offloading of long-running processes.

Hello, everyone. Welcome to my talk where, for the next 20 minutes, I will talk about reliability, backend, and execution. Just a quick introduction. My name is Sylwia Vargas. I'm from Poland. I really love pierogi and previously I worked at StackBlitz. Now I'm a developer relations lead at Inngest.

This talk is about the paradigm that makes life easier. But before we talk about the good, let's talk about the bad. We are now living in a constant 90s nostalgia. And, of course, this is no surprise. The 90s brought to us a lot of different things, great stuff that really is still with us. However, there is one thing that possibly we could say goodbye to. And these are the queues.

So let's look at what message queues are. A message queue is a form of asynchronous service-to-service communication used in serverless and microservices architectures. Messages are stored on the queue until they are processed and deleted. Each message is processed only once by a single consumer. But here I need to interject because in actuality, multiple workers can consume messages from a queue. In order to preserve the ordering of tasks, they will need to execute serially. But back to the definition now. Message queues can be used to decouple heavyweight processing, to buffer or batch work, and to smooth spiky workloads. So you can think about it like this: once you add something to the queue, it will reach its destination, one by one. The delivery is guaranteed. And what's happening in the queue does not impact other parts of the infrastructure. And queues can be really massive.
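To make the definition concrete, here is a minimal in-memory sketch of the behavior described above: messages are stored in order, each is handled by one consumer, and a message is deleted only after it has been processed. The names `MessageQueue`, `enqueue`, and `processNext` are assumptions made up for this illustration; a real queue would persist messages and run as a separate service.

```typescript
// Hypothetical in-memory queue, for illustration only.
type QueuedMessage<T> = { id: number; body: T };

class MessageQueue<T> {
  private messages: QueuedMessage<T>[] = [];
  private nextId = 1;

  // Producers add work and return immediately.
  enqueue(body: T): number {
    const id = this.nextId++;
    this.messages.push({ id, body });
    return id;
  }

  // A consumer handles the oldest message. The message leaves the queue
  // only if the handler finishes without throwing.
  processNext(handler: (body: T) => void): boolean {
    const next = this.messages[0];
    if (next === undefined) return false; // queue is empty
    handler(next.body);    // a throw here keeps the message queued
    this.messages.shift(); // delete after successful processing
    return true;
  }

  get size(): number {
    return this.messages.length;
  }
}

const queue = new MessageQueue<string>();
queue.enqueue("resize-image-1");
queue.enqueue("resize-image-2");

const processed: string[] = [];
while (queue.processNext((job) => processed.push(job))) {
  // Drain serially so that ordering is preserved.
}
```

Processing one message at a time is what preserves ordering here; adding more workers would raise throughput at the cost of that guarantee.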

So let's recap. With queues, you get guaranteed delivery because you know that once something is added to the queue, it will leave it only once it's processed. And queues allow developers to offload long-running processes to the background so that your application does not choke. You would use queues for data-intensive processes or when integrating with external systems.

2. Drawbacks of Using Queues

Short description:

And another benefit of queues is horizontal scalability. However, there are drawbacks to using queues. Building additional infrastructure and managing complex applications can be a lot of work. In times of limited budgets and resources, it's worth considering if managing queues is the right choice. Instead, durable execution allows us to define workflow logic in our application code and ensures reliable execution.

And another benefit is horizontal scalability because multiple messages can be processed in parallel. As the workload increases, multiple application instances can handle high throughput while remaining reliable.

However, there is a but. So let's look at this Reddit comment. So queues are great for data-intensive processes, as I said, that don't need to run on the main thread because they execute asynchronously. The tasks are processed in the background and the application is still responsive. However, there are some drawbacks to queues, which this Reddit user delicately mentions in this quote. Once you take something from the queue, the rest is on you. And the queuing service does not care anymore. So what does it even mean? Let's look at that. So queues are great when your application is simple. When it grows in complexity or if it's distributed, you all of a sudden need to worry about a whole wealth of additional infrastructure that you need to build.

And it's going to be you who needs to build it. So, for example, you will need to build concurrency control because you want to be able to control how many steps are executed at one time. Or, for example, debouncing because we all know how costly it is when functions execute multiple times. Or state persistence and management because now that you have a distributed or complex application, you have to share state across different functions and queues. Then there's also error handling because what if, just hypothetically, one service provider has an outage? You will need to include retries for failures and also timeouts. And in that case, you also need recovery tooling to understand and process the errors and failed events.

So this already sounds like a lot of work and it's not even an exhaustive list. So you don't have to listen to me on that. In times like this, when engineering budgets and headcounts are slashed, we as individual developers and engineers need to do more with less. So it is really worth asking at this point, do you really want to be in the business of managing and operating your own queues? Well, Matthew Druker, the CEO of SoundCloud, doesn't think we should. So if this is now common knowledge, why are people still using queues? Well, we are used to them. It feels familiar and cozy even if it's not the coziest solution. You can make everything work with just enough effort.

Fortunately, there is a better solution that builds on the concept of message queues. So instead of separating our infrastructure, such as queues, from our code, what if we could define our workflow logic purely in our application code and ensure it executes reliably? So this is what durable execution gives us. Durable execution is, as the name says, durable. It guarantees that our code will run and will be completed, even if there are failures along the way.

3. Additional Functionalities and Real-World Example

Short description:

This part explains the additional functionalities provided by the system. It also presents a real-world example of building a signup flow using third-party services, emphasizing the simplicity and ease it offers to developers.

So this part is the same as message queues. However, unlike message queues, you also get retry logic, error handling, and state persistence, which all come out of the box. You don't have to build it. You are given this.

The other part is flow control, which is everything else that is needed for you to be able to run your functions reliably, such as concurrency, debouncing, task prioritization, handling failures, or recovery tooling.

So enough theory. Let's look at a real-world example. So you work at a new hot restaurant booking startup. Your boss asks you how long it will take you to build a signup flow. So you get a list of requirements from the product manager. You look at it with an iced latte in hand. You are completely cool.

So, here's the code. And you would create a user in the database. You would send a welcome email. And you would add the user as a member to a mailing list. Well, this looks very simple. And we like simple. Right? We are lazy as developers. Building applications using third party services is smart and makes your life easier.
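The slide itself is not part of the transcript, so the following is only a sketch of what such a signup handler might look like. `createUser`, `sendWelcomeEmail`, and `addToMailingList` are hypothetical stand-ins for the database and the third-party services; they record calls instead of doing real I/O.

```typescript
// Hypothetical stand-ins for the database, email provider and mailing list.
type User = { id: number; email: string };
const calls: string[] = [];

function createUser(email: string): User {
  calls.push(`db.createUser:${email}`);
  return { id: 1, email };
}

function sendWelcomeEmail(user: User): void {
  calls.push(`email.sendWelcome:${user.email}`);
}

function addToMailingList(user: User): void {
  calls.push(`mailingList.addMember:${user.email}`);
}

// The naive handler: three calls in a row, all inside the signup request.
function handleSignup(email: string): User {
  const user = createUser(email);
  sendWelcomeEmail(user);
  addToMailingList(user);
  return user;
}

const newUser = handleSignup("ada@example.com");
```

Everything runs in sequence on the request path, which is what makes the version this short and, as the talk goes on to show, what makes it fragile.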

4. Challenges of Distributed Applications

Short description:

Building applications using third-party services is smart and makes your life easier. However, there's a downside. Sometimes services can be slow, causing blocking code and slowing down the API. This affects user experience. Managing retries, logging errors, and implementing a recovery system adds complexity and increases the development timeline.

Job done. You can take a big sip of your iced latte and go back to playing Wordle or reading one of those thousands of mega threads on Twitter.


So, we like easy. But where is the problem? Is my presentation finished here? Well, no. Well, there's a downside. Because now we have created a distributed application where we have no control over large parts of the infrastructure that we rely on. For example, sometimes services can be slow. Sending an email, even on a good day, can take half a second. So, we have a problem now. We have blocking code in the critical path of our request. As a result, we are making our API slower. In other words, your user is wasting time. User experience shouldn't suffer because of the business requirements.

So, how are we doing so far? Is it fast? Well, no. But is it reliable? Also, no. Imagine that as you're adding a user to a mailing list, a service goes down. You need to manage retries. One, two, three, four retries. And what if something fails permanently? You need to log these errors to a logging service. And you also need to figure out a recovery system. So, the estimation to build this very simple feature is weeks now instead of days. You need to set up the infrastructure and processes.
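As a rough idea of the plumbing this implies, here is a minimal sketch of a hand-rolled retry helper that gives up after a fixed number of attempts and collects the errors for a logging service. `withRetries` and the failing `addToMailingList` stub are assumptions made up for this example; a production version would also wait between attempts (backoff) and enforce timeouts.

```typescript
type Attempt<T> = { ok: true; value: T } | { ok: false; errors: string[] };

// Retry `task` up to `maxAttempts` times, collecting each error message.
function withRetries<T>(task: () => T, maxAttempts: number): Attempt<T> {
  const errors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { ok: true, value: task() };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`attempt ${attempt}: ${message}`);
    }
  }
  return { ok: false, errors };
}

// Hypothetical mailing-list call that fails twice before succeeding.
let mailingListCalls = 0;
function addToMailingList(email: string): string {
  mailingListCalls++;
  if (mailingListCalls < 3) throw new Error("503 Service Unavailable");
  return `subscribed:${email}`;
}

const result = withRetries(() => addToMailingList("ada@example.com"), 4);

// A permanent failure uses up every attempt; these errors would be sent
// to a logging service and picked up by some recovery process.
const failed = withRetries<string>(() => {
  throw new Error("invalid API key");
}, 4);
```

Even this small helper leaves open where the errors are stored, who retriggers the permanently failed work, and what happens if the process dies between attempts.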

5. Solving Partial Failures with Inngest

Short description:

Partial failures are bad. Ignoring errors, showing errors to users, or having users retry signing up can lead to lost customers and duplicate entries. This results in a slow and frustrating app. To solve this, we can move non-blocking tasks to background jobs using Inngest. Inngest is a reliability layer that allows you to define and execute functions asynchronously. It provides a dashboard for monitoring and managing jobs.

Also, partial failures are really bad. So, imagine this. You've added the user to the database, but haven't sent them an email. So, now we have three options. First, you can ignore the error, which means that the user is not on the mailing list. Second, even worse, we show the user the error, which will lead to a lost customer. And third, worst case of all, the user will retry signing up. Let's see that.

So, let's assume that the user gets the error and tries to sign up again. But now, the user creation will error out because there's a duplicate. Well, good luck recovering from that. So, this is the mess we are in. And the app takes forever to work. It takes me forever to build. I will be sore at my work because all of a sudden I have to be dealing with support backlog. We are still not dealing with the persistent failures. And everyone is unhappy. My boss is unhappy. I am stressed. I'm losing my sleep. But there's a solution. We can make our code faster and more reliable.
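The retry scenario can be reproduced in a few lines. This is a sketch with a hypothetical in-memory user table and an email provider stub that is down during the first request only; none of these names come from the talk.

```typescript
// Hypothetical in-memory "database" with a unique constraint on email.
const usersByEmail = new Map<string, { email: string }>();
function createUser(email: string): void {
  if (usersByEmail.has(email)) throw new Error("duplicate user");
  usersByEmail.set(email, { email });
}

// Hypothetical email provider that is down for the first request.
let emailProviderUp = false;
const sentEmails: string[] = [];
function sendWelcomeEmail(email: string): void {
  if (!emailProviderUp) throw new Error("email provider outage");
  sentEmails.push(email);
}

// Returns "ok" or the error message that would be shown to the user.
function handleSignup(email: string): string {
  try {
    createUser(email);       // step 1 is persisted
    sendWelcomeEmail(email); // step 2 can fail after step 1 succeeded
    return "ok";
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

const firstTry = handleSignup("ada@example.com");  // user saved, no email
emailProviderUp = true;                             // the provider recovers
const secondTry = handleSignup("ada@example.com"); // the user tries again
```

The second attempt fails for a different reason than the first: the provider is healthy again, but the half-finished first attempt left a row behind, so the welcome email is never sent.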

So, let's move the non-blocking tasks to background jobs. So, first, we'll add Inngest to the project. Inngest is a reliability layer for your app. So, with Inngest, you define functions or workflows using its SDK right in your code base, and then you serve them through an HTTP endpoint in your application. So, Inngest then takes care of reliably executing functions asynchronously. So, there's also a dashboard where you can monitor, debug, and manage your jobs. It's all visual. So, this is what your app looks like now.

6. Using Inngest for Reliable Function Execution

Short description:

We add a reliability layer called Inngest. Functions are wrapped in inngest.createFunction to be executed when triggered by an event. The same event name is used for multiple functions to fire simultaneously, known as fan out. Instead of invoking functions directly, we trigger events in Inngest via an HTTP endpoint. Inngest executes the functions and provides notifications on the dashboard. Inngest automatically retries failed functions until they succeed.

We are going to add a reliability layer, which is Inngest. First, we are going to wrap this function in inngest.createFunction. As you see, we are providing an event name. Later when the user signs up, we will trigger an event with this name. You will see that in a second.

This will then tell Inngest to execute the function. So, this is how it would look in code. We are creating the function. We provide the event name, and we are invoking our existing Mailchimp code from before. And now we'll do the same for the other function. As you see, we are using the same event name. This is because we want these two functions to fire at the same time. So, when the user signs up, we want these two functions to fire. This pattern is usually called fan out.
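The fan-out pattern can be sketched without any SDK. The registry below is a hypothetical in-memory stand-in, not the real product API: `createFunction` and `send` are names invented for this illustration, and a real reliability layer would persist the event and run each function independently with its own retries.

```typescript
type AppEvent = { name: string; data: { email: string } };
type Handler = (event: AppEvent) => void;

// Hypothetical in-memory registry of functions, keyed by event name.
const handlersByEvent = new Map<string, Handler[]>();

function createFunction(eventName: string, handler: Handler): void {
  const existing = handlersByEvent.get(eventName) ?? [];
  handlersByEvent.set(eventName, [...existing, handler]);
}

// Sending an event runs every function registered under that name
// and returns how many functions were triggered.
function send(event: AppEvent): number {
  const handlers = handlersByEvent.get(event.name) ?? [];
  handlers.forEach((handler) => handler(event));
  return handlers.length;
}

const log: string[] = [];
createFunction("user.signup", (e) => log.push(`welcome email -> ${e.data.email}`));
createFunction("user.signup", (e) => log.push(`mailing list -> ${e.data.email}`));

// The signup handler now only emits the event instead of calling both services.
const triggered = send({ name: "user.signup", data: { email: "ada@example.com" } });
```

Because the two functions share nothing but the event name, either one can fail and be retried without affecting the other.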

So, now when the user clicks the button, we will send an event to Inngest. This is how it looks in the code. Instead of invoking these functions directly, we'll trigger an event in Inngest. Like I mentioned before, we expose functions to Inngest via an HTTP endpoint. Inngest uses this endpoint to execute the specific functions when an event is triggered. This is the endpoint. Inngest will use it to download the function definitions and then to execute them.

And here is the complete flow. So, Inngest calls the correct functions at the precise time you want. And then on the dashboard, you will get a notification that there was an event triggered, which in turn called two functions. So, we see that they completed, and also when. So, all this is looking good. But what happens when there is a failure? Well, let's look at that. Inngest invokes the function and let's say it fails with an error code. So, it will retry it and retry it until it finally succeeds. You don't need to worry about it.

7. Exploring Additional Inngest Functionalities

Short description:

You can easily debug errors with Inngest's console tool and recover by retriggering failed events. Inngest allows scheduling tasks in the future and orchestrating multi-step processes. Increase user retention by sending activation email drip campaigns.

Moreover, you'll get a detailed log of what happened. I know it's difficult to believe this, but sometimes the errors are persistent not because of service outages, but because there are bugs in our code. I know this is very difficult to believe. But in those cases, you can actually easily debug it with Inngest's console tool, and once you've fixed your function, you can recover by retriggering failed events.

So, how are we doing now? We wanted to make our app faster, so we moved the non-blocking tasks from the user's critical path to background jobs. But also, we got reliability as a nice addition. So, now we have access to this great infrastructure. So, let's see what else we can do with it.

This is our app right now. I didn't tell you yet, but Inngest actually allows you to schedule tasks in the future and also orchestrate multi-step processes. So, let's look at the send email function. Here, we are just sending a welcome email, but it is always nice to increase user retention. We could send them an activation email drip campaign in the first week. How would we go about that?

8. Building a Drip Campaign with Inngest

Short description:

We can use Inngest steps to build a drip campaign with sequential progression. Inngest handles scheduling for pausing function execution. The last step is the final email with tips. The campaign can be dynamic based on user actions. Use the booking event to determine the course of action.

So far, we have been talking about the fan out pattern, where you have multiple functions firing on the same event. However, many tasks require sequential progression. Here we are building a drip campaign, so it's convenient that we can express the whole timeline as procedural code.

So, we will use Inngest steps within this function. Here, as you can see, we are using step.run. In this way, the code will get automatically retried if it fails. But the code that runs correctly will never be retried again. We will see it in action in a bit. And first, we are sending a welcome email. This is the same part that we did before. Then Inngest will pause execution of this function for four days. For this, we are using step.sleep. From a programmer's perspective, it looks similar to using setTimeout, but actually, in the background, Inngest handles the scheduling for you. So, this means that your serverless function does not run for four days. So, you don't have to sell your kidney to pay your AWS bill.
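The idea that a step which already succeeded is never run again can be illustrated with a small memoization sketch. This is not the SDK's actual implementation: `makeStep` and the `Map`-based state are assumptions for the example, a real engine would persist step results between invocations, and the four-day sleep between the emails is left out.

```typescript
// Results of completed steps, keyed by step id (kept in memory here).
type StepState = Map<string, unknown>;

function makeStep(state: StepState) {
  return {
    // Run `fn` once per id; later invocations reuse the saved result.
    run<T>(id: string, fn: () => T): T {
      if (state.has(id)) return state.get(id) as T;
      const result = fn(); // a throw here leaves no saved result
      state.set(id, result);
      return result;
    },
  };
}

const sent: string[] = [];
let tipsProviderUp = false;

// Hypothetical two-step campaign; each invocation starts from the top.
function dripCampaign(state: StepState): void {
  const step = makeStep(state);
  step.run("send-welcome-email", () => sent.push("welcome"));
  step.run("send-tips-email", () => {
    if (!tipsProviderUp) throw new Error("provider outage");
    return sent.push("tips");
  });
}

const state: StepState = new Map();
let firstRunFailed = false;
try {
  dripCampaign(state); // welcome succeeds, tips fails
} catch (err) {
  firstRunFailed = true;
}
tipsProviderUp = true;
dripCampaign(state);   // re-invoked: welcome is skipped, tips runs
```

The function body is re-executed from the top on the retry, yet the welcome email goes out only once, because its step result is read back from the saved state.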

And the last step is the final email with tips. If there is a failure on one of the steps, Inngest will know that the other steps worked and only retry that one step until it works. So, now we have built a successful drip campaign and we deserve a round of applause. But we can actually do better. Imagine that someone has already signed up and immediately made a booking. It wouldn't make sense for them to receive the same email as someone who didn't finalize a booking. Maybe they need different types of tips or a different CTA. The campaign could actually be dynamic based on the user's actions. So, let's do it. Let's delete the last two steps. So, elsewhere in your app, when the user completes a booking, there's an event sent that's called booking.created, just like user.signup. So, now we use this event to determine the course of action. Here, we are waiting for four days to see if this event will even happen. If the booking was made, we'll reward this person with power user tips.

9. Expanding the Scope of Inngest

Short description:

If the booking was made, we'll reward this person with power user tips. There are numerous use cases that go beyond just marketing campaigns. You can build complex payment flows, LLM prompt chaining, or multi-step data transformation. Inngest is also framework and language agnostic.

And, well, if they need four days to make a booking, they need some basic tips. And this is honestly so much fun, that's why I stopped there. You could go wild and create a lot of emails with a lot of tips.
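The branching described here, waiting up to four days for a booking.created event and choosing the follow-up email accordingly, can be sketched as a pure function over a list of timestamped events. `waitForEvent` and `chooseFollowUp` below are hypothetical helpers that look at recorded events; a real engine would suspend the function and resume it when the event arrives or the timeout passes.

```typescript
type AppEvent = { name: string; userId: string; at: number }; // `at` in ms

const DAY_MS = 24 * 60 * 60 * 1000;
const FOUR_DAYS_MS = 4 * DAY_MS;

// Return the first matching event for this user inside the waiting
// window, or null if the window passes without one (a timeout).
function waitForEvent(
  events: AppEvent[],
  name: string,
  userId: string,
  from: number,
  timeoutMs: number,
): AppEvent | null {
  const match = events.find(
    (e) => e.name === name && e.userId === userId && e.at >= from && e.at <= from + timeoutMs,
  );
  return match ?? null;
}

// Pick the follow-up email based on whether a booking happened in time.
function chooseFollowUp(events: AppEvent[], userId: string, signedUpAt: number): string {
  const booking = waitForEvent(events, "booking.created", userId, signedUpAt, FOUR_DAYS_MS);
  return booking ? "power-user-tips" : "basic-tips";
}

const events: AppEvent[] = [
  { name: "booking.created", userId: "u1", at: 1 * DAY_MS },
  { name: "booking.created", userId: "u2", at: 6 * DAY_MS }, // after the window
];

const forU1 = chooseFollowUp(events, "u1", 0);
const forU2 = chooseFollowUp(events, "u2", 0);
```

Modeling the timeout as "no event found" keeps the branch a plain `if`, which is what lets the whole campaign read as procedural code.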

And speaking of tips, all of this is just the tip of the iceberg. We are talking here about sending emails. But Inngest is not a tool for sending emails. There are numerous use cases that go beyond just marketing campaigns. You can build complex payment flows, LLM prompt chaining, or multi-step data transformation. Whenever you need to have a bunch of stuff happening in response to a given event, you could consider Inngest.

Moreover, you can also migrate from one cloud to the other with zero downtime. Inngest is also framework agnostic. Here are just a few of the supported frameworks. But we also recently added support for Bun and Astro, for example. And finally, it is also language agnostic. In this example, we saw a lot of TypeScript and type safety. But in addition to TypeScript, we also have SDKs for Python and Go. We are also looking to add more.

10. Expanding SDKs and Importance of Reliability

Short description:

We recently added support for Bun and Astro. Our SDKs are available for TypeScript, Python, and Go, with plans to add more. Mix and match SDKs in your workflows and invoke functions written in one language from another. We also have a local dev server for easy testing. Building reliable systems is crucial for user satisfaction and team productivity. Inngest serves as a reliability layer, but sooner or later, you will need a solution. Reliability should be considered early in architectural choices. Thank you.

And by the way, our SDK spec is open source and we are inviting contributions. And fun fact, you can also mix and match all those SDKs in your workflows and invoke functions written in one language from another. There is also, finally, a local dev server, which doesn't require you to log in, so you can go and check it out right now.

But, you know, here I spoke a lot about Inngest. But this talk is not only about Inngest. When building real-world production applications, reliability is really important. Not only does it keep your users happy, but it makes your team more productive. And you as a developer are less bogged down by maintenance and operations. Achieving reliability is hard. Every engineer who has ever had to build a reliable system at scale knows the amount of iteration and infrastructure that goes into that. In this example, we use Inngest as the reliability layer. But whether or not you use third-party solutions, sooner or later, you will end up needing one. Reliability is like security. It's hard to add afterwards. So, baking it into your architectural choices from the get-go is usually quite a good idea.

And, yeah, thank you all. If you would like to reach out to me or be friends, here's where you can find me. Thank you.
