GraphQL Subscriptions with Debezium and Kafka

Reacting to data changes and publishing those changes as GraphQL events with subscriptions can be hard, especially in a multi-service environment with multiple databases or when scaling your GraphQL server with multiple instances. GraphQL clients shouldn't miss events or receive them twice, no matter what your backend architecture looks like or what trouble (service goes down, database connection lost, ...) the backend might have when serving a subscription request.

In this talk, I will show you how Debezium and Apache Kafka can help you build reliable subscriptions from changes in your database. Debezium is a change data capture (CDC) tool that can forward changes from a database's transaction log into the Kafka message broker.

In my talk I will use a GraphQL backend implemented in Java with "Spring for GraphQL", but as Debezium and Kafka are not tied to Java, the idea can also be used with other GraphQL frameworks and programming languages. You do not need to have knowledge of Java or "Spring for GraphQL" to understand the talk.

7 min
08 Dec, 2022

AI Generated Video Summary

This lightning talk explores the use of GraphQL subscriptions with Kafka and Debezium. By adding a message broker like Apache Kafka and a change data capture tool like Debezium to the deployment, issues with multiple service instances and database writes can be resolved. Debezium picks up changes directly from the database and sends CDC event messages to the connected message broker, ensuring that any change in the database will be published to Kafka and received by the service instance. This technology stack can also be used for queries by building a dedicated read model database for the GraphQL API.

1. GraphQL Subscriptions with Kafka and Debezium

Short description:

Welcome to my lightning talk about GraphQL subscriptions with Kafka and Debezium. We have three clients and a service that provides a GraphQL API. When client one adds a new customer, the service can send events to clients two and three. However, there can be issues when multiple service instances are involved, or when writing data to a database. To solve these problems, we can add a message broker like Apache Kafka and a change data capture tool like Debezium to our deployment.

Hello and welcome to my lightning talk about GraphQL subscriptions with Kafka and Debezium. My name is Nils and I'm a freelance software developer from Hamburg in Germany.

Let's have a look at this image here. We have three clients and we have a service that provides a GraphQL API. Client number two and client number three send subscriptions to the service to get informed about new customers. When client number one sends a mutation to add a new customer, our service and our GraphQL API can send events to client number two and three informing them about new customers.

In real life, this setup might be a little bit more complex because we might have more than one instance of the same service. In this case, client number two sends the subscription request to service instance number one, while client number three sends its request to service instance number two. Now when client number one executes the mutation in service instance number one, service instance number one can inform client number two about the new customer. But unfortunately, client number three does not receive an event because service instance number two does not know anything about the newly added customer or the executed mutation.

To solve this problem, service instance number one must inform service instance number two about things that happen, like the mutation. We can do this by adding a message broker like Apache Kafka to our deployment. In this case, client one still sends a mutation to service instance number one. But service instance one, instead of sending the subscription event directly to client two, sends a message to the message broker. The message contains the information about the new customer, and both service instances one and two are listening for this message from the message broker. When they receive the message, they can send out the subscription data to their connected clients two and three. Both clients are happy now.
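The fan-out described above can be sketched in a few lines of Java. This is only a minimal illustration under stated assumptions: `InMemoryBroker` and `ServiceInstance` are hypothetical names, and the broker is an in-memory stand-in rather than the Kafka client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a broker topic; this is not the Kafka client API.
class InMemoryBroker {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    void subscribe(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Delivers the message to every subscribed service instance.
    void publish(String message) {
        for (Consumer<String> listener : listeners) {
            listener.accept(message);
        }
    }
}

// One running instance of the GraphQL service with its own connected subscription clients.
class ServiceInstance {
    private final InMemoryBroker broker;
    // Events this instance has pushed to the clients subscribed to it.
    final List<String> pushedToClients = new ArrayList<>();

    ServiceInstance(InMemoryBroker broker) {
        this.broker = broker;
        broker.subscribe(pushedToClients::add);
    }

    // The mutation publishes to the broker instead of notifying local clients directly.
    void addCustomer(String name) {
        broker.publish("customer-added:" + name);
    }
}

class BrokerFanOutDemo {
    public static void main(String[] args) {
        InMemoryBroker broker = new InMemoryBroker();
        ServiceInstance instanceOne = new ServiceInstance(broker);
        ServiceInstance instanceTwo = new ServiceInstance(broker);
        instanceOne.addCustomer("Ada"); // the mutation arrives at instance one only
        System.out.println("instance one pushed: " + instanceOne.pushedToClients);
        System.out.println("instance two pushed: " + instanceTwo.pushedToClients);
    }
}
```

With a real Kafka deployment, each service instance would need its own consumer group so that every instance sees every message; consumers that share a group split the messages between them.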

In real life, things are a little bit more complex because we are writing data to a database. In this case, service instances one and two write to the same database, and when service instance one writes something to the database, the message is still sent to Apache Kafka and both clients two and three get informed about the new customer. But in real life, things can go wrong. For example, after committing the new customer, service instance number one is not able to send a message to Kafka for whatever reason. In that case, none of the clients will receive an event. What can also happen is that we have another application that writes directly to the database, so that service instance number one does not know about these changes and thus cannot send a message through the message broker. And again, clients two and three are not informed about the change to our data.
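The first failure case can be reproduced with a small sketch. It assumes a hypothetical `DualWriteService` in which two in-memory lists stand in for the database table and the Kafka topic; the point is only that the commit and the publish are two separate steps.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical service that writes to the database first and to the broker second.
class DualWriteService {
    final List<String> database = new ArrayList<>();       // stand-in for the customer table
    final List<String> brokerMessages = new ArrayList<>(); // stand-in for a Kafka topic
    boolean brokerReachable = true;

    void addCustomer(String name) {
        database.add(name); // step 1: the commit succeeds
        if (!brokerReachable) {
            // step 2 fails after the commit, so no event is ever published
            throw new IllegalStateException("could not reach the message broker");
        }
        brokerMessages.add("customer-added:" + name);
    }
}

class DualWriteDemo {
    public static void main(String[] args) {
        DualWriteService service = new DualWriteService();
        service.brokerReachable = false;
        try {
            service.addCustomer("Ada");
        } catch (IllegalStateException e) {
            System.out.println("publish failed: " + e.getMessage());
        }
        // The customer is stored, but no subscriber will ever hear about it.
        System.out.println("rows in database: " + service.database.size());
        System.out.println("messages in broker: " + service.brokerMessages.size());
    }
}
```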

To solve these kinds of problems, we can add a change data capture tool like Debezium to our tool stack. A change data capture tool reads everything that happens in your database, like inserts, updates, and deletes, and writes events for these actions to a message broker. Debezium publishes its change events to Apache Kafka. A Debezium change event might look like this. It has a source attribute where, for example, the table is set. It has an operation like update, delete, or insert that describes what has happened in the database, and it has the before and after data.
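As a rough illustration of the event shape just described, the following sketch models a trimmed-down change event in Java. The record `CustomerChangeEvent`, the table name, and the sample rows are assumptions made for the example; a real Debezium envelope carries more fields (for instance a timestamp and further source metadata) and abbreviates the operation to a one-letter code.

```java
import java.util.Map;

// Simplified, hypothetical view of a Debezium change event; the real envelope carries more fields.
// "before" is null for an insert and "after" is null for a delete.
record CustomerChangeEvent(String table, String op,
                           Map<String, Object> before, Map<String, Object> after) {

    // Debezium operation codes: "c" = create (insert), "u" = update, "d" = delete, "r" = snapshot read.
    String operationName() {
        return switch (op) {
            case "c" -> "insert";
            case "u" -> "update";
            case "d" -> "delete";
            case "r" -> "snapshot read";
            default -> "unknown";
        };
    }
}

class ChangeEventDemo {
    public static void main(String[] args) {
        // An update event: both the old and the new row are present.
        CustomerChangeEvent event = new CustomerChangeEvent(
                "customers", "u",
                Map.of("id", 1, "name", "Ada"),
                Map.of("id", 1, "name", "Ada Lovelace"));
        System.out.println(event.operationName() + " on " + event.table()
                + ": " + event.before().get("name") + " -> " + event.after().get("name"));
    }
}
```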

2. Architecture with Debezium and Kafka

Short description:

In this case, Debezium picks up changes directly from the database and sends CDC event messages to the connected message broker. The service instances receive these events, interpret them, and send subscription data to the clients. Thanks to Debezium and Apache Kafka, we can be sure that any change in the database will be published to Kafka and received by our service instance. We can also use this technology stack for queries by building a dedicated read model database for our GraphQL API.

In the example event, these are the before and after data of an update operation. Our architecture with Debezium would look like this. Client one still sends the mutation directly to service instance one. Service instance number one writes the new customer to the database, or another application writes something to the database.

And in both cases, Debezium picks up the changes directly from your database and sends a CDC event message to the connected message broker. Both service instance number one and number two receive these CDC (change data capture) events, can interpret these events, and send subscription data via their GraphQL API to client number two and client number three. And both clients are happy now.
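How a service instance might interpret such events can be sketched as follows. The names `CdcEvent` and `NewCustomerSubscription`, the table name `customers`, and the `name` column are hypothetical, and a plain listener list stands in for the reactive stream that a Spring for GraphQL subscription handler would normally return.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical, trimmed-down CDC event: table name, operation code, and the row after the change.
record CdcEvent(String table, String op, Map<String, Object> after) {}

// Runs inside every service instance and turns CDC events into subscription payloads.
class NewCustomerSubscription {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    // Called when a client opens a "new customer" subscription at this instance.
    void subscribe(Consumer<String> client) {
        subscribers.add(client);
    }

    // Called for every CDC event consumed from the broker.
    void onCdcEvent(CdcEvent event) {
        boolean isNewCustomer = "customers".equals(event.table()) && "c".equals(event.op());
        if (!isNewCustomer) {
            return; // other tables and operations are irrelevant to this subscription
        }
        String payload = String.valueOf(event.after().get("name"));
        subscribers.forEach(client -> client.accept(payload));
    }
}

class CdcSubscriptionDemo {
    public static void main(String[] args) {
        NewCustomerSubscription subscription = new NewCustomerSubscription();
        subscription.subscribe(name -> System.out.println("client two received: " + name));
        // It does not matter who wrote the row; the event comes from the database log.
        subscription.onCdcEvent(new CdcEvent("customers", "c", Map.of("name", "Ada")));
        subscription.onCdcEvent(new CdcEvent("orders", "c", Map.of("id", 7)));
    }
}
```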

Thanks to the delivery guarantees that Debezium and Apache Kafka give us, we can be sure that any change in the database, any update, insert, or delete, will be published to Kafka and will be received by our service instances. So we can send a subscription event for any change in the database, for whatever reason the database has been changed.
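Debezium is usually described as delivering each change at least once, which means an event can occasionally arrive twice, for example after a restart. Assuming that is the guarantee in play, a consumer can skip repeats before forwarding to clients. The sketch below is one possible approach, not something taken from the talk; `DeduplicatingForwarder` is a hypothetical name, and `position` is assumed to be a value that grows with every database change, such as a transaction log position carried in the event's source metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical consumer-side filter that forwards each change to clients only once.
class DeduplicatingForwarder {
    private long lastForwardedPosition = -1; // position of the last event sent to clients
    final List<String> forwarded = new ArrayList<>();

    // "position" is assumed to increase with every change in the database log.
    void onEvent(long position, String payload) {
        if (position <= lastForwardedPosition) {
            return; // redelivered event that was already forwarded
        }
        forwarded.add(payload);
        lastForwardedPosition = position;
    }
}

class DeduplicationDemo {
    public static void main(String[] args) {
        DeduplicatingForwarder forwarder = new DeduplicatingForwarder();
        forwarder.onEvent(0, "customer Ada added");
        forwarder.onEvent(1, "customer Grace added");
        forwarder.onEvent(1, "customer Grace added"); // duplicate delivery
        forwarder.onEvent(2, "customer Linus added");
        System.out.println(forwarder.forwarded.size() + " events forwarded");
    }
}
```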

If you want to try this out yourself, I built a small sample application with GraphQL Java and Spring for GraphQL. You find the source code in the GitHub repository at the URL below.

By the way, we can use this technology stack not only for subscriptions, but I think also for queries. We could build a dedicated read model database for our GraphQL API. Imagine we have a list of microservices, each connected to its own database. Using Debezium and Apache Kafka, we can pick up all changes to all databases and build a dedicated, optimized database only for our GraphQL API. The GraphQL API can then read the data from this specific database and does not need to query all the microservices to get the data that is requested in a GraphQL query.
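The read model idea can be sketched by replaying change events into a local store. In this minimal example, `RowChange` and `CustomerReadModel` are hypothetical names and a `HashMap` stands in for the dedicated database; the operation codes follow Debezium's one-letter convention.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, trimmed-down change event: operation code, primary key, and the row after the change.
record RowChange(String op, Long id, Map<String, Object> after) {}

// In-memory stand-in for the dedicated read model database behind the GraphQL API.
class CustomerReadModel {
    private final Map<Long, Map<String, Object>> rows = new HashMap<>();

    // Applies one change event so the read model follows the source database.
    void apply(RowChange change) {
        switch (change.op()) {
            case "c", "u", "r" -> rows.put(change.id(), change.after()); // insert, update, snapshot
            case "d" -> rows.remove(change.id());                          // delete
            default -> { } // operations this sketch does not model are ignored
        }
    }

    // What a GraphQL query resolver would call instead of asking the owning microservice.
    Map<String, Object> findById(long id) {
        return rows.get(id);
    }
}

class ReadModelDemo {
    public static void main(String[] args) {
        CustomerReadModel readModel = new CustomerReadModel();
        readModel.apply(new RowChange("c", 1L, Map.of("name", "Ada")));
        readModel.apply(new RowChange("u", 1L, Map.of("name", "Ada Lovelace")));
        System.out.println("customer 1: " + readModel.findById(1));
    }
}
```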
