Getting Real with NodeJS and Kafka: An Event Stream Tale

In this lightning talk we will review some basic principles of event-based systems using NodeJS and Kafka, and gain insights into real mission-critical use cases where the default configurations are not enough. We will cover quick tips, tricks, and good practices for Kafka with NodeJS using the KafkaJS library, and see some real code at lightning speed.

8 min
16 Jun, 2022

Video Summary and Transcription

This lightning talk introduces distributed computing and discusses the challenges, patterns, and solutions related to using Kafka for event sharing. It emphasizes the importance of separating services and using strong typing to avoid broken messages. The talk also covers Kafka's transaction configuration and guarantees, highlighting the need for proper configuration and the use of transaction IDs. Overall, it provides valuable insights into scaling companies, big data, and streaming platforms.

1. Introduction to Distributed Computing and Bitvavo

Short description:

Hello, everyone. This is a lightning talk where I'll be discussing distributed computing, scaling companies, big data, and streaming platforms. I live in the Netherlands and write technical blog posts. I work for Bitvavo, the biggest crypto exchange in the Netherlands, with a mission to bring the opportunity to trade crypto for everyone.

Hello, everyone. I hope this fits. We had some technical challenges, but let's go. This is a lightning talk, so I'll be very fast.

I spent a lot of time, much more than the talk itself, thinking about what I can say in this short time that will help you leave here feeling like, OK, I learned something, or he made me think about something. My background is distributed computing. I usually work with scaling companies, helping systems to really scale. I worked a lot with big data, and streaming platforms have been my bread and butter for the past seven years or so. I'm from Brazil and have lived in the Netherlands for seven years now. I also write technical blog posts. Currently in one place; before that I wrote in another three or four different places. You can find my most recent articles there, usually about Kafka, about Kubernetes, mostly back end in my case. I work for Bitvavo. It's the biggest crypto exchange in the Netherlands. So if you are into crypto, or want to be into crypto, it's as quick as clicking a button like you see there. And that's the goal of the company: to really bring the opportunity to trade crypto to everyone. That's what we're doing. That's our mission.

2. Challenges and Solutions with Kafka Event Sharing

Short description:

In this section, I will discuss the challenges, common mistakes, patterns, and solutions related to using Kafka to share events between systems. In a global platform it is important to separate services rather than rely on a shared database. Sending events as JSON can be convenient, but without a contract, broken events can disrupt the system. Because Kafka processes messages as a queue, a message that cannot be processed (a poison pill) can halt the whole system.

So what I'm going to try to cover in this short time is a bit about this world where we live. Many of us, I'm sure a lot of you, use Kafka to share events between systems. And this is a requirement, of course, because in the current world, where we go global with our platforms and applications, we cannot rely on a shared database. So we really need to separate our services, right?

And I'm going to talk about a few challenges, common mistakes, patterns that we use, and solutions for those. So this is a normal services architecture. You can call it microservices; it really depends on where you are and how you do it. It doesn't matter. The important thing here is that you have multiple databases, multiple data sources, and you are integrating things through Kafka. And that's a common pattern, more and more. I bet many of you have this.

And a common way to do it, and I've seen this especially in the TypeScript and JavaScript world, is to send events as plain JSON. That's very easy because everything is JSON, but the problem is you don't really have a contract, right? If you send events as JSON, the producer, the sending side, might send something that's actually broken, or another producer might, and the consumer starts processing it and breaks. And the way Kafka works is as a queue of events: if you cannot process a message, you don't move forward to the next messages. Then suddenly you are stuck, and your whole system stops, because you have what we call a poison pill.
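To make that failure mode concrete, here is a minimal KafkaJS sketch, assuming hypothetical broker, topic, and handler names, of a consumer that breaks on a malformed JSON payload and gets stuck on it:

```ts
import { Kafka } from 'kafkajs'

const kafka = new Kafka({ clientId: 'orders-service', brokers: ['localhost:9092'] })
const consumer = kafka.consumer({ groupId: 'orders-group' })

// Hypothetical business logic, just for illustration.
async function handleOrder(event: unknown): Promise<void> {
  console.log('processing', event)
}

async function main() {
  await consumer.connect()
  await consumer.subscribe({ topics: ['orders'], fromBeginning: true })

  await consumer.run({
    eachMessage: async ({ message }) => {
      // If a producer published malformed JSON, JSON.parse throws here.
      // The offset never advances past the failing message, so the
      // partition keeps retrying it: a poison pill.
      const event = JSON.parse(message.value!.toString())
      await handleOrder(event)
    },
  })
}

main().catch(console.error)
```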

3. Avoiding Broken Messages with Dead Letter Queue

Short description:

To avoid broken messages, apply a pattern called dead letter queue and use strong typing. Guarantee on the producer side that the message type is correct, so consumers don't break. Implement a dead letter queue to handle broken messages, and commit offsets manually.

So how do you avoid that? It's actually quite simple. You apply a pattern called dead letter queue, and you use schemas to actually define a contract, so it's strongly typed.
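As a sketch of what such a contract can look like, here is one hand-rolled option in TypeScript: a typed event plus a runtime guard on the producer side. The event shape and topic name are illustrative assumptions, and in practice you might enforce the contract with a schema registry instead:

```ts
import { Producer } from 'kafkajs'

// The event contract for one topic.
interface OrderCreated {
  orderId: string
  amount: number
}

// Runtime guard: TypeScript types vanish at runtime, so validate explicitly.
function isOrderCreated(value: unknown): value is OrderCreated {
  const v = value as Partial<OrderCreated>
  return typeof v?.orderId === 'string' && typeof v?.amount === 'number'
}

// Producer side: refuse to publish anything that violates the contract,
// so consumers of the 'orders' topic never see a malformed payload.
export async function publishOrderCreated(producer: Producer, event: unknown): Promise<void> {
  if (!isOrderCreated(event)) {
    throw new Error('OrderCreated contract violation')
  }
  await producer.send({
    topic: 'orders',
    messages: [{ key: event.orderId, value: JSON.stringify(event) }],
  })
}
```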

So now your messages have to comply, on the producer side, with a specific typing system. You guarantee that the type is not going to go wrong for that specific topic, and then your consumers will not break for that case. And you can use a dead letter queue approach: if you start consuming a message and it's broken, instead of retrying in that cycle forever, you try a couple of times, and if it still doesn't work, you push it to a different queue and move on to the next message. For that you might need to commit offsets manually, and I'm going to go through that quickly and show some code, because there is a timer here. It's really scary. I have to go fast.
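Here is a minimal sketch of that dead letter queue approach with KafkaJS; the retry budget, topic names, and handler are assumptions for illustration:

```ts
import { Kafka } from 'kafkajs'

const kafka = new Kafka({ clientId: 'orders-service', brokers: ['localhost:9092'] })
const consumer = kafka.consumer({ groupId: 'orders-group' })
const producer = kafka.producer()

const MAX_ATTEMPTS = 3 // illustrative retry budget

async function handleOrder(event: unknown): Promise<void> {
  // hypothetical business logic
}

async function main() {
  await Promise.all([consumer.connect(), producer.connect()])
  await consumer.subscribe({ topics: ['orders'] })

  await consumer.run({
    autoCommit: false, // we decide when an offset counts as processed
    eachMessage: async ({ topic, partition, message }) => {
      for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
        try {
          await handleOrder(JSON.parse(message.value!.toString()))
          break
        } catch (err) {
          if (attempt === MAX_ATTEMPTS) {
            // Park the broken message on a dead letter topic instead of
            // blocking the partition forever.
            await producer.send({
              topic: 'orders.dlq',
              messages: [{ value: message.value }],
            })
          }
        }
      }
      // Commit the next offset either way, so processing moves on.
      await consumer.commitOffsets([
        { topic, partition, offset: (Number(message.offset) + 1).toString() },
      ])
    },
  })
}

main().catch(console.error)
```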

4. Kafka Transaction Configuration and Guarantees

Short description:

Kafka offers exactly-once semantics and transaction boundaries for processing messages in a single transaction. To ensure message integrity, enable idempotence on the producer side and disable auto-commit of offsets on the consumer side. Use a transaction ID to establish boundaries and guarantee atomicity. Keep in mind that Kafka transactions are not distributed transactions like XA transactions. Proper configuration of cluster partitions and in-sync replicas is crucial. Start, send, and commit transactions so that messages are fully processed or rolled back. For more information, refer to the blog post, and try the provided docker compose with multiple Kafka nodes for local experimentation.

So think about a standard producer and consumer, if you just use KafkaJS or any other client you decide to use. A normal producer and a normal consumer don't have strong guarantees. What I mean by that: it is not guaranteed that you send a message exactly once on the producer side; there might be duplicates. It's what we call at least once. That's the guarantee, which means you can have duplicates. And in some systems, you cannot have that. If you're depositing money, you should not do that. Especially when withdrawing money from a customer, you should not do that. And on the consumer side, you can also have duplicate processing. So that's what you get if you use the defaults. There are solutions for that.

So Kafka offers exactly-once semantics (that's the EOS there) and transaction boundaries. And it's very common to have a pattern like this: a message initiates a process, and you want to consume the message, do some processing, and produce a message to another topic in a single transaction. So you want to make sure that if something goes wrong while you consume and process, and you reprocess or restart your system, you process that message again, because you didn't finalize those three steps, the last of which is sending the message to the next step. And for this, you have to do some configuration in Kafka.

On the producer side, you have to set idempotence to true. That guarantees that a message is not going to be duplicated when sent. On the consumer side, you have to disable the auto-commit of offsets, which means your client reads the message, but now you have the control: when you decide you have really processed the message, you commit the offset back to Kafka. You also want to set one property, max in-flight requests, to one, so you don't have parallel processing and you can keep the ordering guarantees. And on the producer side, for the last step (remember, it's consuming and then producing, and you want all of this in the same transaction), you want to set a transaction ID, so the broker can set the transaction boundaries, and set idempotence to true as well. What this guarantees is that when you consume, process, and send, it's part of a single atomic transaction. Keep in mind, this is not a distributed transaction, like an XA transaction. It is not guaranteed that a database call you make on the side will be rolled back; you have to take care of that yourself. It stays within the boundaries of Kafka. It's the same as a normal database transaction between two tables that you are used to, so just keep that in mind.
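In KafkaJS terms, that configuration might look like the sketch below; the client ID, group ID, and transactional ID are placeholder names:

```ts
import { Kafka } from 'kafkajs'

const kafka = new Kafka({ clientId: 'order-processor', brokers: ['localhost:9092'] })

// Producer: idempotent, single in-flight request, and transactional.
const producer = kafka.producer({
  idempotent: true,                        // retries cannot create duplicates
  maxInFlightRequests: 1,                  // keep the ordering guarantees
  transactionalId: 'order-processor-tx-1', // lets the broker set transaction boundaries
})

// Consumer: offsets will be committed through the transaction instead,
// so auto-commit is disabled when calling consumer.run (see the next sketch).
const consumer = kafka.consumer({ groupId: 'order-processor-group' })
```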

Some configurations of the cluster, and then I'm done, because my clock is blinking already: you should have at least three partitions and at least two in-sync replicas for your topics for this to work. So if you try it locally and you don't have that, it will not work. You also have to drive the transaction like you can see there: start a transaction, send the message, send the offsets, and then commit the transaction, which means either everything happens or it is rolled back (there is a sketch of this flow below). And that's basically what you want to do, and it's really important to take care of that. That's it. If you want to know more about this, I wrote a specific blog post about it, and I also have a docker compose with multiple Kafka nodes, so you can play with this kind of more advanced setup on your local machine.
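For reference, the consume-transform-produce flow described above might look like this with KafkaJS, reusing the producer and consumer configured in the previous sketch; the topic names and the transform step are placeholders:

```ts
// Hypothetical transform step.
const transform = (value: Buffer | null): string =>
  JSON.stringify({ processed: value?.toString() })

// Reuses `producer` and `consumer` from the previous sketch.
async function runProcessor() {
  await Promise.all([producer.connect(), consumer.connect()])
  await consumer.subscribe({ topics: ['orders'] })

  await consumer.run({
    autoCommit: false,
    eachMessage: async ({ topic, partition, message }) => {
      const transaction = await producer.transaction() // start the transaction

      try {
        // 1. Produce the result to the next topic inside the transaction.
        await transaction.send({
          topic: 'orders.processed',
          messages: [{ value: transform(message.value) }],
        })

        // 2. Attach the consumed offset to the same transaction.
        await transaction.sendOffsets({
          consumerGroupId: 'order-processor-group',
          topics: [{
            topic,
            partitions: [{ partition, offset: (Number(message.offset) + 1).toString() }],
          }],
        })

        // 3. Commit: the produced message and the offset become visible atomically.
        await transaction.commit()
      } catch (err) {
        // Roll back: neither the output nor the offset is committed.
        await transaction.abort()
        throw err
      }
    },
  })
}

runProcessor().catch(console.error)
```

If the process crashes between steps, the aborted transaction means the consumed offset was never committed, so the message is picked up and reprocessed on restart.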

Check out more articles and videos

We constantly curate articles and videos that might spark people's interest, skill us up, or help build a stellar career.

Node Congress 2022
26 min
It's a Jungle Out There: What's Really Going on Inside Your Node_Modules Folder
Top Content
Do you know what’s really going on in your node_modules folder? Software supply chain attacks have exploded over the past 12 months and they’re only accelerating in 2022 and beyond. We’ll dive into examples of recent supply chain attacks and what concrete steps you can take to protect your team from this emerging threat.
You can check the slides for Feross' talk here.
Node Congress 2022
34 min
Out of the Box Node.js Diagnostics
In the early years of Node.js, diagnostics and debugging were considerable pain points. Modern versions of Node have improved considerably in these areas. Features like async stack traces, heap snapshots, and CPU profiling no longer require third party modules or modifications to application source code. This talk explores the various diagnostic features that have recently been built into Node.
You can check the slides for Colin's talk here. 
JSNation 2023
22 min
ESM Loaders: Enhancing Module Loading in Node.js
Native ESM support for Node.js was a chance for the Node.js project to release official support for enhancing the module loading experience, to enable use cases such as on the fly transpilation, module stubbing, support for loading modules from HTTP, and monitoring.
While CommonJS has support for all this, it was never officially supported and was done by hacking into the Node.js runtime code. ESM has fixed all this. We will look at the architecture of ESM loading in Node.js, and discuss the loader API that supports enhancing it. We will also look into advanced features such as loader chaining and off thread execution.
JSNation Live 2021
19 min
Multithreaded Logging with Pino
Top Content
Almost every developer thinks that adding one more log line would not decrease the performance of their server... until logging becomes the biggest bottleneck for their systems! We created one of the fastest JSON loggers for Node.js: pino. One of our key decisions was to remove all "transport" to another process (or infrastructure): it reduced both CPU and memory consumption, removing any bottleneck from logging. However, this created friction and lowered the developer experience of using Pino, and in-process transports are the most requested feature from our users. In the upcoming version 7, we will solve this problem and increase throughput at the same time: we are introducing pino.transport() to start a worker thread that you can use to transfer your logs safely to other destinations, without sacrificing either performance or the developer experience.

Workshops on related topic

Node Congress 2023
109 min
Node.js Masterclass
Workshop
Have you ever struggled with designing and structuring your Node.js applications? Building applications that are well organised, testable and extendable is not always easy. It can often turn out to be a lot more complicated than you expect it to be. In this live event Matteo will show you how he builds Node.js applications from scratch. You’ll learn how he approaches application design, and the philosophies that he applies to create modular, maintainable and effective applications.

Level: intermediate
Node Congress 2023
63 min
0 to Auth in an Hour Using NodeJS SDK
WorkshopFree
Passwordless authentication may seem complex, but it is simple to add to any app using the right tool.
We will enhance a full-stack JS application (Node.JS backend + React frontend) to authenticate users with OAuth (social login) and One Time Passwords (email), including:
- User authentication: managing user interactions, returning session / refresh JWTs
- Session management and validation: storing the session for subsequent client requests, validating / refreshing sessions
At the end of the workshop, we will also touch on another approach to code authentication using frontend Descope Flows (drag-and-drop workflows), while keeping only session validation in the backend. With this, we will also show how easy it is to enable biometrics and other passwordless authentication methods.
Table of contents:
- A quick intro to core authentication concepts
- Coding
- Why passwordless matters
Prerequisites:
- IDE of your choice
- Node 18 or higher
JSNation 2023
104 min
Build and Deploy a Backend With Fastify & Platformatic
WorkshopFree
Platformatic allows you to rapidly develop GraphQL and REST APIs with minimal effort. The best part is that it also allows you to unleash the full potential of Node.js and Fastify whenever you need to. You can fully customise a Platformatic application by writing your own additional features and plugins. In the workshop, we'll cover both our Open Source modules and our Cloud offering:
- Platformatic OSS (open-source software): tools and libraries for rapidly building robust applications with Node.js (https://oss.platformatic.dev/).
- Platformatic Cloud (currently in beta): our hosting platform that includes features such as preview apps, built-in metrics, and integration with your Git flow (https://platformatic.dev/).
In this workshop you'll learn how to develop APIs with Fastify and deploy them to the Platformatic Cloud.
JSNation Live 2021
156 min
Building a Hyper Fast Web Server with Deno
WorkshopFree
Deno 1.9 introduced a new web server API that takes advantage of Hyper, a fast and correct HTTP implementation for Rust. Using this API instead of the std/http implementation increases performance and provides support for HTTP2. In this workshop, learn how to create a web server utilizing Hyper under the hood and boost the performance for your web apps.
React Summit 2022
164 min
GraphQL - From Zero to Hero in 3 hours
Workshop
How to build a fullstack GraphQL application (Postgres + NestJs + React) in the shortest time possible.
All beginnings are hard. Even harder than choosing the technology is often developing a suitable architecture. Especially when it comes to GraphQL.
In this workshop, you will get a variety of best practices that you would normally have to work through over a number of projects - all in just three hours.
If you've always wanted to participate in a hackathon to get something up and running in the shortest amount of time - then take an active part in this workshop, and participate in the thought processes of the trainer.
TestJS Summit 2023
78 min
Mastering Node.js Test Runner
Workshop
Node.js test runner is modern, fast, and doesn't require additional libraries, but understanding and using it well can be tricky. You will learn how to use Node.js test runner to its full potential. We'll show you how it compares to other tools, how to set it up, and how to run your tests effectively. During the workshop, we'll do exercises to help you get comfortable with filtering, using native assertions, running tests in parallel, using CLI, and more. We'll also talk about working with TypeScript, making custom reports, and code coverage.