Getting Real with NodeJS and Kafka: An Event Stream Tale

In this lightning talk we will review some basic principles of event-based systems using NodeJS and Kafka, and gain insights into real mission-critical use cases where the default configurations are not enough. We will cover quick tips, tricks, and good practices for Kafka with NodeJS using the KafkaJS library, and see some real code at lightning speed.

8 min
16 Jun, 2022

Video Summary and Transcription

This lightning talk introduces distributed computing and discusses the challenges, patterns, and solutions related to using Kafka for event sharing. It emphasizes the importance of separating services and using strong typing to avoid broken messages. The talk also covers Kafka's transaction configuration and guarantees, highlighting the need for proper configuration and the use of transaction IDs. Overall, it provides valuable insights into scaling companies, big data, and streaming platforms.

1. Introduction to Distributed Computing and Bitvavo

Short description:

Hello, everyone. This is a lightning talk where I'll be discussing distributed computing, scaling companies, big data, and streaming platforms. I live in the Netherlands and write technical blog posts. I work for Bitvavo, the biggest crypto exchange in the Netherlands, with a mission to bring the opportunity to trade crypto for everyone.

Hello, everyone. I hope this fits. We had some technical challenges, but let's go. This is a lightning talk, so I'll be very fast.

I spent a lot of time, much more than the talk itself, thinking about what I can say in this short time that will help you leave here feeling like, OK, I learned something, or he made me think about something. My background is distributed computing. I usually work with scaling companies, helping systems to really scale. I worked a lot with big data, and streaming platforms have been my bread and butter for the past seven years or so. I'm from Brazil and have lived in the Netherlands for seven years now. I also write technical blog posts. Currently in one place; before that I wrote in another three or four different places. You can find my most recent articles there, usually about Kafka, about Kubernetes, mostly back end in my case. I work for Bitvavo. It's the biggest crypto exchange in the Netherlands. So if you are into crypto, or want to be into crypto, it's as quick as clicking a button like you see there. And that's the goal of the company: to really bring the opportunity to trade crypto to everyone. That's what we're doing. That's our mission.

2. Challenges and Solutions with Kafka Event Sharing

Short description:

In this section, I will discuss the challenges, common mistakes, patterns, and solutions related to using Kafka to share events between systems. In a global platform it is important to separate services rather than rely on a shared database. Sending events as JSON can be convenient, but without a contract, broken events can disrupt the system. Because Kafka processes messages as a queue, a message that cannot be processed (a poison pill) can halt the whole system.

So what I'm going to try to cover in this short time is a bit about this world where we live. Many of us, I'm sure a lot of you, use Kafka to share events between systems. And this is a requirement, of course, because in the current world, where we go global with our platforms and applications, we cannot rely on a shared database. So we really need to separate our services, right?

And I'm going to talk about a few challenges, common mistakes, patterns that we use, and solutions for those. So this is a normal services architecture. You can call it microservices; it really depends on where you are and how you do it. It doesn't matter. The important thing here is that you have multiple databases, multiple data sources, and you are integrating things through Kafka. And that's a common pattern, more and more. I bet many of you have this.

And a common way to do it, and I've seen this especially in the TypeScript and JavaScript world, is to send events as plain JSON. That's very easy because everything is JSON, but the problem is you don't really have a contract, right? If you send events as JSON, the producer, the sending side, might send something that's actually broken, or another producer might, and the consumer starts processing it and breaks. And the way Kafka works is as a queue of events: if you cannot process a message, you don't move forward to the next messages. Then suddenly you are stuck, and your whole system stops, because you have what we call a poison pill.
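To make that failure mode concrete, here is a minimal KafkaJS sketch, assuming hypothetical broker, topic, and handler names, of a consumer that breaks on a malformed JSON payload and gets stuck on it:

```ts
import { Kafka } from 'kafkajs'

const kafka = new Kafka({ clientId: 'orders-service', brokers: ['localhost:9092'] })
const consumer = kafka.consumer({ groupId: 'orders-group' })

// Hypothetical business logic, just for illustration.
async function handleOrder(event: unknown): Promise<void> {
  console.log('processing', event)
}

async function main() {
  await consumer.connect()
  await consumer.subscribe({ topics: ['orders'], fromBeginning: true })

  await consumer.run({
    eachMessage: async ({ message }) => {
      // If a producer published malformed JSON, JSON.parse throws here.
      // The offset never advances past the failing message, so the
      // partition keeps retrying it: a poison pill.
      const event = JSON.parse(message.value!.toString())
      await handleOrder(event)
    },
  })
}

main().catch(console.error)
```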

3. Avoiding Broken Messages with Dead Letter Queue

Short description:

To avoid broken messages, apply a pattern called dead letter queue and use strong typing. Guarantee on the producer side that the message type is correct, so consumers don't break. Implement a dead letter queue to handle broken messages, and commit offsets manually.

So how do you avoid that? It's actually quite simple. You apply a pattern called dead letter queue, and you use schemas to actually define a contract, so it's strongly typed.
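As a sketch of what such a contract can look like, here is one hand-rolled option in TypeScript: a typed event plus a runtime guard on the producer side. The event shape and topic name are illustrative assumptions, and in practice you might enforce the contract with a schema registry instead:

```ts
import { Producer } from 'kafkajs'

// The event contract for one topic.
interface OrderCreated {
  orderId: string
  amount: number
}

// Runtime guard: TypeScript types vanish at runtime, so validate explicitly.
function isOrderCreated(value: unknown): value is OrderCreated {
  const v = value as Partial<OrderCreated>
  return typeof v?.orderId === 'string' && typeof v?.amount === 'number'
}

// Producer side: refuse to publish anything that violates the contract,
// so consumers of the 'orders' topic never see a malformed payload.
export async function publishOrderCreated(producer: Producer, event: unknown): Promise<void> {
  if (!isOrderCreated(event)) {
    throw new Error('OrderCreated contract violation')
  }
  await producer.send({
    topic: 'orders',
    messages: [{ key: event.orderId, value: JSON.stringify(event) }],
  })
}
```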

So now your messages have to comply, on the producer side, with a specific typing system. You guarantee that the type is not going to go wrong for that specific topic, and then your consumers will not break for that case. And you can use a dead letter queue approach: if you start consuming a message and it's broken, instead of retrying in that cycle forever, you try a couple of times, and if it still doesn't work, you push it to a different queue and move on to the next message. For that you might need to commit offsets manually, and I'm going to go through that quickly and show some code, because there is a timer here. It's really scary. I have to go fast.
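Here is a minimal sketch of that dead letter queue approach with KafkaJS; the retry budget, topic names, and handler are assumptions for illustration:

```ts
import { Kafka } from 'kafkajs'

const kafka = new Kafka({ clientId: 'orders-service', brokers: ['localhost:9092'] })
const consumer = kafka.consumer({ groupId: 'orders-group' })
const producer = kafka.producer()

const MAX_ATTEMPTS = 3 // illustrative retry budget

async function handleOrder(event: unknown): Promise<void> {
  // hypothetical business logic
}

async function main() {
  await Promise.all([consumer.connect(), producer.connect()])
  await consumer.subscribe({ topics: ['orders'] })

  await consumer.run({
    autoCommit: false, // we decide when an offset counts as processed
    eachMessage: async ({ topic, partition, message }) => {
      for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
        try {
          await handleOrder(JSON.parse(message.value!.toString()))
          break
        } catch (err) {
          if (attempt === MAX_ATTEMPTS) {
            // Park the broken message on a dead letter topic instead of
            // blocking the partition forever.
            await producer.send({
              topic: 'orders.dlq',
              messages: [{ value: message.value }],
            })
          }
        }
      }
      // Commit the next offset either way, so processing moves on.
      await consumer.commitOffsets([
        { topic, partition, offset: (Number(message.offset) + 1).toString() },
      ])
    },
  })
}

main().catch(console.error)
```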

4. Kafka Transaction Configuration and Guarantees

Short description:

Kafka offers exactly-once semantics and transaction boundaries for processing messages in a single transaction. To ensure message integrity, enable idempotence on the producer side and disable auto-commit of offsets on the consumer side. Use a transaction ID to establish boundaries and guarantee atomicity. Keep in mind that Kafka transactions are not distributed transactions like XA transactions. Proper configuration of cluster partitions and in-sync replicas is crucial. Start, send, and commit transactions so that messages are fully processed or rolled back. For more information, refer to the blog post, and try the provided docker compose with multiple Kafka nodes for local experimentation.

So think about a standard producer and consumer, if you just use KafkaJS or any other client you decide to use. A normal producer and a normal consumer don't have strong guarantees. What I mean by that: it is not guaranteed that you send a message exactly once on the producer side; there might be duplicates. It's what we call at least once. That's the guarantee, which means you can have duplicates. And in some systems, you cannot have that. If you're depositing money, you should not do that. Especially when withdrawing money from a customer, you should not do that. And on the consumer side, you can also have duplicate processing. So that's what you get if you use the defaults. There are solutions for that.

So Kafka offers exactly-once semantics (that's the EOS there) and transaction boundaries. And it's very common to have a pattern like this: a message initiates a process, and you want to consume the message, do some processing, and produce a message to another topic in a single transaction. So you want to make sure that if something goes wrong while you consume and process, and you reprocess or restart your system, you process that message again, because you didn't finalize those three steps, the last of which is sending the message to the next step. And for this, you have to do some configuration in Kafka.

On the producer side, you have to set idempotence to true. That guarantees that a message is not going to be duplicated when sent. On the consumer side, you have to disable the auto-commit of offsets, which means your client reads the message, but now you have the control: when you decide you have really processed the message, you commit the offset back to Kafka. You also want to set one property, max in-flight requests, to one, so you don't have parallel processing and you can keep the ordering guarantees. And on the producer side, for the last step (remember, it's consuming and then producing, and you want all of this in the same transaction), you want to set a transaction ID, so the broker can set the transaction boundaries, and set idempotence to true as well. What this guarantees is that when you consume, process, and send, it's part of a single atomic transaction. Keep in mind, this is not a distributed transaction, like an XA transaction. It is not guaranteed that a database call you make on the side will be rolled back; you have to take care of that yourself. It stays within the boundaries of Kafka. It's the same as a normal database transaction between two tables that you are used to, so just keep that in mind.
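In KafkaJS terms, that configuration might look like the sketch below; the client ID, group ID, and transactional ID are placeholder names:

```ts
import { Kafka } from 'kafkajs'

const kafka = new Kafka({ clientId: 'order-processor', brokers: ['localhost:9092'] })

// Producer: idempotent, single in-flight request, and transactional.
const producer = kafka.producer({
  idempotent: true,                        // retries cannot create duplicates
  maxInFlightRequests: 1,                  // keep the ordering guarantees
  transactionalId: 'order-processor-tx-1', // lets the broker set transaction boundaries
})

// Consumer: offsets will be committed through the transaction instead,
// so auto-commit is disabled when calling consumer.run (see the next sketch).
const consumer = kafka.consumer({ groupId: 'order-processor-group' })
```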

Some configurations of the cluster, and then I'm done, because my clock is blinking already: you should have at least three partitions and at least two in-sync replicas for your topics for this to work. So if you try it locally and you don't have that, it will not work. You also have to drive the transaction like you can see there: start a transaction, send the message, send the offsets, and then commit the transaction, which means either everything happens or it is rolled back (there is a sketch of this flow below). And that's basically what you want to do, and it's really important to take care of that. That's it. If you want to know more about this, I wrote a specific blog post about it, and I also have a docker compose with multiple Kafka nodes, so you can play with this kind of more advanced setup on your local machine.
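For reference, the consume-transform-produce flow described above might look like this with KafkaJS, reusing the producer and consumer configured in the previous sketch; the topic names and the transform step are placeholders:

```ts
// Hypothetical transform step.
const transform = (value: Buffer | null): string =>
  JSON.stringify({ processed: value?.toString() })

// Reuses `producer` and `consumer` from the previous sketch.
async function runProcessor() {
  await Promise.all([producer.connect(), consumer.connect()])
  await consumer.subscribe({ topics: ['orders'] })

  await consumer.run({
    autoCommit: false,
    eachMessage: async ({ topic, partition, message }) => {
      const transaction = await producer.transaction() // start the transaction

      try {
        // 1. Produce the result to the next topic inside the transaction.
        await transaction.send({
          topic: 'orders.processed',
          messages: [{ value: transform(message.value) }],
        })

        // 2. Attach the consumed offset to the same transaction.
        await transaction.sendOffsets({
          consumerGroupId: 'order-processor-group',
          topics: [{
            topic,
            partitions: [{ partition, offset: (Number(message.offset) + 1).toString() }],
          }],
        })

        // 3. Commit: the produced message and the offset become visible atomically.
        await transaction.commit()
      } catch (err) {
        // Roll back: neither the output nor the offset is committed.
        await transaction.abort()
        throw err
      }
    },
  })
}

runProcessor().catch(console.error)
```

If the process crashes between steps, the aborted transaction means the consumed offset was never committed, so the message is picked up and reprocessed on restart.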

Check out more articles and videos

We constantly curate articles and videos that might spark people's interest, skill us up, or help build a stellar career.

Node Congress 2022
26 min
It's a Jungle Out There: What's Really Going on Inside Your Node_Modules Folder
Top Content
Do you know what’s really going on in your node_modules folder? Software supply chain attacks have exploded over the past 12 months and they’re only accelerating in 2022 and beyond. We’ll dive into examples of recent supply chain attacks and what concrete steps you can take to protect your team from this emerging threat.
You can check the slides for Feross' talk here.
Node Congress 2022
34 min
Out of the Box Node.js Diagnostics
In the early years of Node.js, diagnostics and debugging were considerable pain points. Modern versions of Node have improved considerably in these areas. Features like async stack traces, heap snapshots, and CPU profiling no longer require third party modules or modifications to application source code. This talk explores the various diagnostic features that have recently been built into Node.
You can check the slides for Colin's talk here. 
JSNation 2023
22 min
ESM Loaders: Enhancing Module Loading in Node.js
Native ESM support for Node.js was a chance for the Node.js project to release official support for enhancing the module loading experience, to enable use cases such as on the fly transpilation, module stubbing, support for loading modules from HTTP, and monitoring.
While CommonJS has support for all this, it was never officially supported and was done by hacking into the Node.js runtime code. ESM has fixed all this. We will look at the architecture of ESM loading in Node.js, and discuss the loader API that supports enhancing it. We will also look into advanced features such as loader chaining and off thread execution.
JSNation Live 2021
19 min
Multithreaded Logging with Pino
Top Content
Almost every developer thinks that adding one more log line would not decrease the performance of their server... until logging becomes the biggest bottleneck for their systems! We created one of the fastest JSON loggers for Node.js: pino. One of our key decisions was to remove all "transport" to another process (or infrastructure): it reduced both CPU and memory consumption, removing any bottleneck from logging. However, this created friction and lowered the developer experience of using Pino, and in-process transports are the most requested feature from our users. In the upcoming version 7, we will solve this problem and increase throughput at the same time: we are introducing pino.transport() to start a worker thread that you can use to transfer your logs safely to other destinations, without sacrificing either performance or the developer experience.

Workshops on related topic

Node Congress 2023
109 min
Node.js Masterclass
Workshop
Have you ever struggled with designing and structuring your Node.js applications? Building applications that are well organised, testable and extendable is not always easy. It can often turn out to be a lot more complicated than you expect it to be. In this live event Matteo will show you how he builds Node.js applications from scratch. You’ll learn how he approaches application design, and the philosophies that he applies to create modular, maintainable and effective applications.

Level: intermediate
Node Congress 2023
63 min
0 to Auth in an Hour Using NodeJS SDK
WorkshopFree
Passwordless authentication may seem complex, but it is simple to add to any app using the right tool.
We will enhance a full-stack JS application (Node.JS backend + React frontend) to authenticate users with OAuth (social login) and One Time Passwords (email), including:
- User authentication: managing user interactions, returning session / refresh JWTs
- Session management and validation: storing the session for subsequent client requests, validating / refreshing sessions
At the end of the workshop, we will also touch on another approach to code authentication using frontend Descope Flows (drag-and-drop workflows), while keeping only session validation in the backend. With this, we will also show how easy it is to enable biometrics and other passwordless authentication methods.
Table of contents:
- A quick intro to core authentication concepts
- Coding
- Why passwordless matters
Prerequisites:
- IDE of your choice
- Node 18 or higher
JSNation 2023
104 min
Build and Deploy a Backend With Fastify & Platformatic
WorkshopFree
Platformatic allows you to rapidly develop GraphQL and REST APIs with minimal effort. The best part is that it also allows you to unleash the full potential of Node.js and Fastify whenever you need to. You can fully customise a Platformatic application by writing your own additional features and plugins. In the workshop, we'll cover both our Open Source modules and our Cloud offering:
- Platformatic OSS (open-source software): tools and libraries for rapidly building robust applications with Node.js (https://oss.platformatic.dev/).
- Platformatic Cloud (currently in beta): our hosting platform that includes features such as preview apps, built-in metrics, and integration with your Git flow (https://platformatic.dev/).
In this workshop you'll learn how to develop APIs with Fastify and deploy them to the Platformatic Cloud.
JSNation Live 2021
156 min
Building a Hyper Fast Web Server with Deno
WorkshopFree
Deno 1.9 introduced a new web server API that takes advantage of Hyper, a fast and correct HTTP implementation for Rust. Using this API instead of the std/http implementation increases performance and provides support for HTTP2. In this workshop, learn how to create a web server utilizing Hyper under the hood and boost the performance for your web apps.
React Summit 2022
164 min
GraphQL - From Zero to Hero in 3 hours
Workshop
How to build a fullstack GraphQL application (Postgres + NestJs + React) in the shortest time possible.
All beginnings are hard. Even harder than choosing the technology is often developing a suitable architecture. Especially when it comes to GraphQL.
In this workshop, you will get a variety of best practices that you would normally have to work through over a number of projects - all in just three hours.
If you've always wanted to participate in a hackathon to get something up and running in the shortest amount of time - then take an active part in this workshop, and participate in the thought processes of the trainer.
TestJS Summit 2023
78 min
Mastering Node.js Test Runner
Workshop
Node.js test runner is modern, fast, and doesn't require additional libraries, but understanding and using it well can be tricky. You will learn how to use Node.js test runner to its full potential. We'll show you how it compares to other tools, how to set it up, and how to run your tests effectively. During the workshop, we'll do exercises to help you get comfortable with filtering, using native assertions, running tests in parallel, using CLI, and more. We'll also talk about working with TypeScript, making custom reports, and code coverage.