Hello and welcome to my lightning talk about GraphQL subscriptions with Kafka and Debezium. My name is Nils, and I'm a freelance software developer from Hamburg, Germany.
Let's have a look at this image. We have three clients and a service that provides a GraphQL API. Clients two and three send subscriptions to the service to be informed about new customers. When client one sends a mutation to add a new customer, our GraphQL API can push events to clients two and three, informing them about the new customer.
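In schema terms, this setup could look roughly like the following sketch. The type and field names are my own illustration, not taken from the talk:

```graphql
type Customer {
  id: ID!
  name: String!
}

type Mutation {
  # Client one calls this to create a customer.
  addCustomer(name: String!): Customer!
}

type Subscription {
  # Clients two and three subscribe here to be pushed new customers.
  customerAdded: Customer!
}
```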
In real life, this setup might be a bit more complex, because we might run more than one instance of the same service, like in this case. Here, client two sends its subscription request to service instance one, while client three sends its request to service instance two. Now, when client one executes the mutation on service instance one, that instance can inform client two about the new customer. Unfortunately, client three does not receive an event, because service instance two knows nothing about the newly added customer or the executed mutation. To solve this problem, service instance one must inform service instance two about things that happen, like the mutation.
We can solve this problem by adding a message broker like Apache Kafka to our deployment. Client one still sends the mutation to service instance one, but instead of pushing the subscription event directly to client two, service instance one sends a message to the message broker. The message contains the information about the new customer, and both service instances one and two listen for these messages from the broker. When they receive the message, they send the subscription data to their connected clients, two and three. Both clients are happy now.
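This fan-out can be sketched in plain Java. Note this is an in-memory toy standing in for Kafka, and the class and method names are my own illustration, not from the talk:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BrokerFanOut {

    // Stands in for a Kafka topic: every subscribed service instance
    // receives every published message.
    static class Broker {
        private final List<Consumer<String>> listeners = new ArrayList<>();
        void subscribe(Consumer<String> listener) { listeners.add(listener); }
        void publish(String message) { listeners.forEach(l -> l.accept(message)); }
    }

    // A service instance forwards broker messages to its connected clients.
    static class ServiceInstance {
        private final List<List<String>> clientInboxes = new ArrayList<>();
        ServiceInstance(Broker broker) { broker.subscribe(this::onMessage); }
        void connectClient(List<String> inbox) { clientInboxes.add(inbox); }
        private void onMessage(String message) {
            clientInboxes.forEach(inbox -> inbox.add(message));
        }
    }

    static List<String> demo() {
        Broker broker = new Broker();
        ServiceInstance instance1 = new ServiceInstance(broker);
        ServiceInstance instance2 = new ServiceInstance(broker);

        List<String> client2 = new ArrayList<>(); // connected to instance 1
        List<String> client3 = new ArrayList<>(); // connected to instance 2
        instance1.connectClient(client2);
        instance2.connectClient(client3);

        // Client one's mutation arrives at instance 1, which publishes to
        // the broker instead of notifying its own clients directly.
        broker.publish("customerAdded: Alice");
        return List.of(client2.toString(), client3.toString());
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In a real deployment, each service instance would be a Kafka consumer in its own consumer group, so that every instance receives every message and can notify its own connected clients.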
In real life, things are a little more complex, because we are also writing data to a database. In this case, service instances one and two should write to the same database. When service instance one has written something to the database, the message is still sent to Apache Kafka, and both clients two and three are informed about the new customer. But in real life, things can go wrong. For example, after committing the new customer, service instance one might not be able to send the message to Kafka for whatever reason. In that case, neither client receives an event. It can also happen that another application writes directly to the database, so that service instance one does not know about these changes and thus cannot send a message to the message broker. Again, clients two and three are not informed about the change to our data.
To solve these problems, we can add a change data capture (CDC) tool like Debezium to our stack. A change data capture tool reads everything that happens in your database, such as inserts, updates, and deletes, and writes events for these actions to a message broker.
In the case of Debezium, the change events are published to Apache Kafka. A Debezium change event might look like this. It has a source attribute that names, for example, the table; an operation such as create, update, or delete that describes what happened in the database; and the before and after data, in this case the before and after data of an update operation.
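A simplified change event for an update might look roughly like this. The field values are illustrative, and real Debezium events wrap this payload in an additional envelope depending on the configured converter:

```json
{
  "before": { "id": 42, "name": "Alice", "city": "Hamburg" },
  "after":  { "id": 42, "name": "Alice", "city": "Berlin" },
  "source": { "connector": "postgresql", "db": "shop", "table": "customer" },
  "op": "u",
  "ts_ms": 1700000000000
}
```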
Our architecture with Debezium would look like this. Client one still sends a mutation to service instance one, which writes the new customer to the database; or another application writes something to the database directly. In both cases, Debezium picks up the changes from the database and sends a CDC event message to the connected message broker. Both service instances one and two receive these change data capture events, interpret them, and send subscription data via the GraphQL API to clients two and three. Both clients are happy now.
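To hook Debezium up to the database, you register a connector with Kafka Connect. A minimal configuration for a PostgreSQL source might look like this; the hostnames, credentials, and table names are placeholders, and the exact property names vary between Debezium versions:

```json
{
  "name": "customer-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "shop",
    "topic.prefix": "shop",
    "table.include.list": "public.customer"
  }
}
```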
Thanks to the delivery guarantees that Debezium and Apache Kafka give us, we can be sure that any change in the database, whether update, insert, or delete, will be published to Kafka and received by our service instances, so we can send a subscription event for every change, no matter how the database was changed.
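On the service side, interpreting a change event could be sketched like this. The event is represented here as an already-deserialized map, and the names are my own; a real service would deserialize the Kafka record with a JSON library and publish the result to its subscription transport, for example a WebSocket session:

```java
import java.util.Map;

public class CdcToSubscription {

    // Maps the Debezium "op" code of a decoded change event
    // to the subscription event we push out to clients.
    static String toSubscriptionEvent(Map<String, ?> changeEvent) {
        String op = (String) changeEvent.get("op");
        Map<?, ?> after = (Map<?, ?>) changeEvent.get("after");
        Map<?, ?> before = (Map<?, ?>) changeEvent.get("before");
        switch (op) {
            case "c": return "customerAdded: " + after.get("name");
            case "u": return "customerUpdated: " + after.get("name");
            case "d": return "customerDeleted: " + before.get("name");
            default:  return null; // e.g. "r" (snapshot read): ignored here
        }
    }

    public static void main(String[] args) {
        Map<String, Object> insert = Map.of(
            "op", "c",
            "before", Map.of(),
            "after", Map.of("id", 42, "name", "Alice"));
        System.out.println(toSubscriptionEvent(insert));
    }
}
```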
If you want to try this out yourself, I built a small sample application with GraphQL Java and Spring for GraphQL. You find the source code in the GitHub repository at the URL below. By the way, we can use this technology stack not only for subscriptions, but, I think, also for queries. We could build a dedicated read model
database for our GraphQL API. Imagine we have a list of microservices, each connected to its own database. Using Debezium and Apache Kafka, we can pick up all changes to all databases and build a dedicated, optimized database just for our GraphQL API. The GraphQL API can then read the data from this specific database and does not need to query all the microservices to get the data requested in a GraphQL query. So far, thanks a lot for today. See you, and have fun building GraphQL APIs.