GraphQL exposes application data as a graph, which can introduce challenges if your backend isn't graph-ready (think slow JOINs as a result of nested GraphQL queries and the dreaded n+1 query problem). The Neo4j GraphQL library enables developers to build GraphQL APIs backed by a native graph database using only GraphQL type definitions. In this talk we'll see how to build a GraphQL API without writing any resolvers, add custom logic, and deploy to the cloud.
Putting the Graph In GraphQL With The Neo4j GraphQL Library
From:

GraphQL Galaxy 2021
Transcription
Hi everyone, the title of this talk is Putting the Graph in GraphQL with the Neo4j GraphQL Library. You can find the slides at dev.neo4j.com/graphql-galaxy. My name is Will, and I work for a company called Neo4j, which makes a graph database. We'll talk about what that is in just a minute. The best way to get ahold of me online is probably on Twitter, which I've linked there, as well as my personal site with a blog and a newsletter that I publish. I also host the graphstuff.fm podcast, so if you like podcasts and graph technology, definitely check out graphstuff.fm.
Neo4j is a graph database. It is similar to other databases that you may be familiar with, like relational databases or document databases, but the biggest difference is that the data model is a graph. Nodes are the entities, relationships connect them, and with Neo4j we use a query language called Cypher, which we'll take a little bit of a look at today as well. There are lots of interesting things that we can do with graph databases, and the graph model has implications for performance optimizations that differ from other databases. But of course, what we're going to talk about today is building GraphQL APIs backed by a graph database.
So really what I want to talk about today is the graph part of GraphQL. Fundamentally, a graph is a data structure made up of nodes, which are the entities in the graph, and relationships, which connect nodes. To work with data in the graph, we often use a data model called the property graph, where we add labels to nodes that describe the type of thing that they are, and key-value pair attributes that describe the actual data that we're working with. You also might hear about knowledge graphs. Knowledge graphs, I think, are really an implementation of a property graph for a specific domain that puts things in context; that's how I like to think about it.
Google, when they announced the Google Knowledge Graph API, published a blog post called "Things, not strings", which I think did a really good job of explaining what a knowledge graph is and how we can work with the data in a knowledge graph. In this example, we have a graph of news articles, their topics, and the geo regions that are mentioned in these articles. So when we're looking at an article node, we know what it is and we know its attributes, but we also have the context around it. We know what geo region it's referring to, and we know what people and topics the article is about as well.
I pulled some data down from the New York Times API and built a knowledge graph of this in Neo4j. So we have information about articles, topics, people mentioned in the articles, this sort of thing. I thought this was an interesting data set for a number of different reasons and a number of different applications. What I want to use it for today is to talk about different ways to query this news graph if I'm interested in building an application.
I could use GraphQL to query this news graph. In fact, there's a GraphQL API at news-graph.vercel.app that we can use to write GraphQL queries. In this case, we're searching for specific geo regions within a hundred kilometers of a certain latitude and longitude. What are the articles mentioning that region? What are the topics of those articles?
We could also query this news graph using Cypher, the query language that we use with graph databases like Neo4j. This is a screenshot from Neo4j Browser, which is like a query workbench for working with Neo4j. You can see we've written a Cypher query here, searching for a specific article, something to do with cryptocurrency in Ukraine. Then we're looking for the topics of that article and other articles that share those same topics. And you can see that with Cypher, we draw this sort of ASCII art representation of the graph pattern that we want to work with.
And there are other tools for querying the news graph as well. We could use visual tools, for example Neo4j Bloom, which is a graph exploration tool. All of these are querying the same data, this news graph, but in different modalities and in different contexts.
So let's zero in on comparing Cypher and GraphQL in the context of this news graph data. This is a question that comes up a lot: what's the difference between Cypher and GraphQL? They both seem to do something with graphs. Well, fundamentally Cypher is a graph database query language. As I said earlier, it is very much focused on declarative pattern matching. We draw these ASCII art representations of the graph to describe the pattern we want to work with. Cypher has lots of functionality that we would expect in a database query language: things like aggregations, math functions, database operations like creating indexes and importing data from CSV format, and then lots of graph-specific operations like variable-length paths, node and relationship functions, these sorts of things.
If we compare that with GraphQL, GraphQL is very much a query language designed for working with APIs. We have a type system that describes exactly the data that's available to the client and how it's connected. This is the data graph. And then to describe traversals through the data graph, we define a selection set in GraphQL.
So let's look at some examples. Let's say I want to see all of the articles in the news graph. In Cypher, I write an ASCII-art-like pattern where parentheses represent nodes; in this case, find all the article nodes and return them. With GraphQL, in my selection set, I start with the articles query field and then describe the fields of the articles that I want to return. What if I want to see the 10 most recent articles? Well, Cypher has functionality for ordering, limiting, and skipping for basic pagination.
This isn't built into GraphQL, but we can work with these things as field arguments. So perhaps our articles query field has a sort order argument and a limit argument that allow us to accomplish the same thing.
What if I want to see the 10 most recent articles and their topics? Well, in Cypher, I add a more complex graph pattern. We can see here that first we're matching on all of the articles and returning the first 10 by date published. Then we have another graph pattern where we're traversing out from this article node along the has topic relationship to the topic nodes and returning both of those. And now we see the 10 most recent articles and their connected topics. In GraphQL, we would just add to our selection set to describe this traversal from the articles to the topics. We're starting to create a nested selection set here.
But if I want not only the 10 most recent articles and their topics, but also the other articles in those topics, well, in Cypher, I just add on to my graph pattern. So now I want to traverse along the has topic relationship again to find articles that share similar topics. And in GraphQL, I add to my nested selection set, now going from the topics to the articles and in this case returning the title of those articles.
But what if I want something like finding the shortest path in the graph between two nodes, in this case the National Park Service and the FAA? Well, Cypher has shortest path and variable-length path functionality built in. So I can say: find the shortest path connecting these two organizations, in this case following any relationships. The asterisk in brackets there says to follow any number of relationships to find the shortest path. And I can find it through a couple of articles about labor shortages that both of these organizations are facing. This functionality isn't really built into GraphQL.
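Going back to the "10 most recent articles" comparison above, here is a minimal sketch of how GraphQL field arguments could map onto Cypher's ordering and limiting clauses. Everything named here is an assumption for illustration: the `Article` label, the `published` and `title` properties, the `sort`/`limit` argument shapes, and the `articlesQuery` helper are not taken from the talk's schema or from the Neo4j GraphQL library.

```javascript
// A GraphQL query using hypothetical `sort` and `limit` field arguments.
const graphqlQuery = `
{
  articles(sort: "published", direction: DESC, limit: 10) {
    title
    published
  }
}`;

// Hypothetical helper: turns those field arguments into a Cypher query string.
// Sort fields are checked against an allow-list so that arbitrary text never
// ends up inside the generated query.
function articlesQuery({ sort = "published", direction = "DESC", limit } = {}) {
  const sortableFields = new Set(["published", "title"]);
  if (!sortableFields.has(sort)) {
    throw new Error(`Cannot sort by "${sort}"`);
  }
  if (direction !== "ASC" && direction !== "DESC") {
    throw new Error(`Unknown sort direction "${direction}"`);
  }
  const lines = ["MATCH (a:Article)", "RETURN a", `ORDER BY a.${sort} ${direction}`];
  if (limit !== undefined) {
    if (!Number.isInteger(limit) || limit < 0) {
      throw new Error("limit must be a non-negative integer");
    }
    lines.push(`LIMIT ${limit}`);
  }
  return lines.join("\n");
}

console.log(graphqlQuery.trim());
console.log(articlesQuery({ limit: 10 }));
```

The point of the sketch is only that ordering and limiting live in the query language on the Cypher side, while on the GraphQL side they are ordinary field arguments that the server has to interpret.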
GraphQL doesn't have a native way to express this idea of a shortest path. We could certainly implement this functionality and expose it through certain fields in GraphQL, but it's not something built in.
What about recommended articles? On a lot of news sites, as I'm reading an article, they show me something like "here are other articles you may be interested in". Well, in Cypher, there are lots of ways to express those sorts of things. I could look at other articles that similar users are viewing. I could look at an overlap of topics. I could look at the overlap of geo regions based on my reading history, this sort of thing. So here we're describing a traversal through the graph from an article that I'm reading to articles that have either the same author or similar topics, or are about similar geo regions. GraphQL doesn't really expose these concepts. Again, we could implement a recommended field on the article type in our GraphQL API to expose this, but it's not something that is inherently built in.
So graphs, I think, are really everywhere around us in different technologies. We see them in GraphQL. We see them in graph databases. But I think a question that comes up for developers is knowing when to leverage the right graph technology at the right time. Let's say, for example, we're building a React application that is going to be the front end for our news organization. We want to show articles. We want users to be able to log in, to save articles, to view recommended articles, this sort of thing. We've talked about how to query this news graph with Cypher directly against the database, or with GraphQL. How should we structure the basic architecture of our app to do this? Well, looking at the examples that we saw, we don't really want to expose the database to our client applications and have them free to query whatever they want in the database.
Instead, that's where GraphQL really shines: we can give all of the benefits of GraphQL to our client application, but also have a layer that sits between the client and the database where we're able to add things like authorization, caching, and custom logic. So our architecture looks something more like this, where our React application is querying a GraphQL API. Maybe this is deployed as serverless functions or edge workers, something like that. And then our API layer is the layer actually going out to the database.
To make this type of application architecture easier to build, Neo4j has released the Neo4j GraphQL library. This is a JavaScript library for building Node.js GraphQL APIs backed by Neo4j. There are a lot of really powerful features in the GraphQL library. Let's go through a couple of those.
One is this idea of GraphQL-first development. We start with our GraphQL type definitions, which define the data model that we're working with, and then the Neo4j GraphQL library uses them to drive the data model for both the database and the API. So I don't need to maintain two separate schemas, one for the database and one for the API. Everything is driven from these GraphQL type definitions.
The Neo4j GraphQL library will take those type definitions and then generate a full CRUD GraphQL API with create, read, update, and delete operations for each type declared in the schema. Of course, there's a lot that can be configured in what is generated, but by default we get query and mutation fields for each one of our types, ordering, pagination, Relay connection pagination, and complex filtering, as well as the geo and date types that are supported natively in the database. This is how we built the news graph GraphQL API that I linked earlier.
Now one of the really powerful features of the Neo4j GraphQL library is generating database queries.
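To make the GraphQL-first idea concrete, here is a small sketch of what type definitions for a slice of the news graph could look like. The `@relationship` directive is part of the Neo4j GraphQL library, but the type names, field names, and the `HAS_TOPIC` relationship type are assumptions based on the talk's description rather than the actual schema, and the wiring shown in the comment depends on the library version, so check the documentation before relying on it.

```javascript
// GraphQL type definitions for a small, assumed slice of the news graph.
// Node labels come from the type names; relationships are declared with the
// library's @relationship directive.
const typeDefs = /* GraphQL */ `
  type Article {
    title: String!
    published: DateTime
    topics: [Topic!]! @relationship(type: "HAS_TOPIC", direction: OUT)
  }

  type Topic {
    name: String!
    articles: [Article!]! @relationship(type: "HAS_TOPIC", direction: IN)
  }
`;

// With the library and a Neo4j driver installed, these definitions would be
// handed over roughly like this (not executed here, and version dependent):
//   const { Neo4jGraphQL } = require("@neo4j/graphql");
//   const neoSchema = new Neo4jGraphQL({ typeDefs, driver });

// List the declared type names. This is a plain regular expression used only
// for the demo, not a GraphQL parser.
const declaredTypes = [...typeDefs.matchAll(/^\s*type\s+(\w+)/gm)].map((match) => match[1]);
console.log(declaredTypes);
```

Note that there are no resolver functions anywhere in the sketch; the type definitions are the only input the developer writes.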
What this means is that for any arbitrary GraphQL request, a single database query is generated by the library. So as a developer, I don't need to implement resolvers. I simply need to define my type definitions, and then the library will generate a single database query at query time. This is great for developer productivity because I don't have to build these resolvers, but it also helps performance: it basically solves the n+1 query problem, where I would otherwise need to think about batching or caching in my GraphQL implementation so that I'm not making multiple round trips to the database. Instead, I can rely on a single database query being generated and sent to the database, and the database is going to optimize how to handle that query.
So we've talked about the CRUD functionality that is generated for us. What about custom logic? How do we add that? Well, this is probably my favorite feature of the Neo4j GraphQL library: the Cypher GraphQL schema directive. Schema directives are GraphQL's built-in extension mechanism. With directives, I can indicate that there's some custom logic that should happen on the server. There are a lot of different directives available for configuring our schema with the Neo4j GraphQL library, but the Cypher GraphQL schema directive is, I think, the most powerful because it allows us to annotate fields in our GraphQL API with Cypher queries.
Here we've added a similar field to the article type. This is kind of like that recommendation query we saw earlier: if you're reading this article, what are similar articles you might be interested in? In this case, we're using Graph Data Science Jaccard similarity to find similar articles based on topics. So this is super powerful. We can expose basically any of the functionality of Cypher through GraphQL using the Cypher schema directive. So that's the basic functionality of the Neo4j GraphQL library.
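The n+1 problem mentioned above is easy to see by counting round trips. The following is a self-contained sketch with an in-memory stand-in for a database; the data, the `createFakeDb` helper, and its method names are all hypothetical, and nothing here talks to Neo4j.

```javascript
// In-memory stand-in for a database. Every method call counts as one round trip.
function createFakeDb() {
  const topics = new Map([
    [10, "Energy"],
    [11, "Labor"],
  ]);
  const articles = [
    { title: "Article A", topicIds: [10] },
    { title: "Article B", topicIds: [10, 11] },
    { title: "Article C", topicIds: [11] },
  ];
  let roundTrips = 0;
  return {
    get roundTrips() {
      return roundTrips;
    },
    // One query that returns only the articles.
    allArticles() {
      roundTrips += 1;
      return articles.map((article) => ({ ...article }));
    },
    // One query per article to fetch its topics.
    topicsFor(article) {
      roundTrips += 1;
      return article.topicIds.map((id) => topics.get(id));
    },
    // One query that returns articles together with their topics.
    articlesWithTopics() {
      roundTrips += 1;
      return articles.map((article) => ({
        title: article.title,
        topics: article.topicIds.map((id) => topics.get(id)),
      }));
    },
  };
}

// Naive field-by-field resolution: 1 query for the list plus 1 per article.
function resolveNaively(db) {
  return db.allArticles().map((article) => ({
    title: article.title,
    topics: db.topicsFor(article),
  }));
}

// Single-query resolution: the whole nested request becomes one database call.
function resolveWithSingleQuery(db) {
  return db.articlesWithTopics();
}

const naiveDb = createFakeDb();
const naiveResult = resolveNaively(naiveDb);
const singleDb = createFakeDb();
const singleResult = resolveWithSingleQuery(singleDb);

console.log("naive round trips:", naiveDb.roundTrips);
console.log("single-query round trips:", singleDb.roundTrips);
```

Both approaches return the same data; only the number of trips to the database differs, and with the naive approach that number grows with the size of the result.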
There are lots of other interesting things in there. There's a super powerful authorization model. I mentioned Relay cursor pagination, and there is also support for working with unions and interfaces. So lots of interesting, powerful things in the Neo4j GraphQL library.
Let's take a quick look at some code. This is a link to CodeSandbox. It's just pulling from this GitHub repo, so you can find all the code on GitHub or in this CodeSandbox. But let's take a quick look to see what's going on here. This is our index.js file. We're just pulling in some dependencies and reading from this schema.graphql file, which holds our GraphQL type definitions. We're passing those type definitions to the Neo4j GraphQL library. We create a Neo4j driver instance to create a connection to the database. And then we pass the schema that we created with the Neo4j GraphQL library to Apollo Server, which handles serving our GraphQL API.
If we look at our type definitions, this is basically where the interesting bits are. We've defined types for article, author, and topic: all those nodes that we saw in our news graph, and how they are connected. We didn't write any resolvers. We've just defined our type definitions.
And here's GraphQL Playground. Here's a GraphQL query that I'm going to run. It's searching for the 100 most recent articles and their authors, photos, and geos, so basically everything connected to those articles. And we can see the results that come in. One thing I want to point out here is that we can see the generated database query that's logged to the console. That's generated at query time for any arbitrary GraphQL request. So as we modify the GraphQL query, maybe to query only for articles, we'll see in the generated database query that we're only fetching article nodes. So this is super powerful for developer productivity: we get a GraphQL API up and running without writing any resolvers.
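To illustrate how a generated query can follow the shape of the request, here is a simplified sketch that turns a nested selection into one Cypher query using map projections and pattern comprehensions. This is not the library's implementation or its actual output: the selection format, the `relationships` mapping, and the `toCypher` function are hypothetical and leave out arguments, filtering, and parameters.

```javascript
// Hypothetical mapping from relationship fields to graph patterns.
const relationships = {
  Article: { topics: { type: "HAS_TOPIC", direction: "OUT", target: "Topic" } },
  Topic: { articles: { type: "HAS_TOPIC", direction: "IN", target: "Article" } },
};

// A selection is a list of scalar field names (strings) and nested
// relationship fields ({ name, selection }).
function projection(label, variable, selection) {
  const parts = selection.map((field) => {
    if (typeof field === "string") {
      return `.${field}`;
    }
    const rel = (relationships[label] || {})[field.name];
    if (!rel) {
      throw new Error(`Unknown relationship field ${label}.${field.name}`);
    }
    const child = `${variable}_${field.name}`;
    const pattern =
      rel.direction === "OUT"
        ? `(${variable})-[:${rel.type}]->(${child}:${rel.target})`
        : `(${variable})<-[:${rel.type}]-(${child}:${rel.target})`;
    // Recurse so that deeper nesting stays inside the same single query.
    return `${field.name}: [${pattern} | ${child} ${projection(rel.target, child, field.selection)}]`;
  });
  return `{ ${parts.join(", ")} }`;
}

function toCypher(label, selection) {
  return `MATCH (a:${label}) RETURN a ${projection(label, "a", selection)} AS result`;
}

// Selecting only article fields produces a query that touches only Article nodes.
const articlesOnly = toCypher("Article", ["title"]);
// Adding a nested field extends the same query instead of adding a second one.
const articlesAndTopics = toCypher("Article", ["title", { name: "topics", selection: ["name"] }]);

console.log(articlesOnly);
console.log(articlesAndTopics);
```

Changing the selection changes the generated query, which mirrors what the console output in the demo shows when the GraphQL query is edited.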
And we're leveraging all the power of the graph model with the Neo4j GraphQL library. Again, all the code for that is linked on GitHub.
As a bit of an aside, you may be wondering how these GraphQL database integrations work under the hood, and how they are able to generate database queries from a GraphQL request. The answer is that inside every resolver, one of the arguments passed is the resolve info object, which contains a lot of information about the GraphQL schema and the currently resolving GraphQL operation. Here you can see all the things that are in the resolve info object. Basically what these database integrations do is inspect this resolve info object, look at the nested selection set for the query, and iterate through that to generate a database query at the root resolver, which is a super powerful pattern. I gave a talk at GraphQL Summit about this a while ago, so the recording is there if you're interested in digging into that in a bit more detail.
Great. Well, I think that's all the time we have for today. I want to end by talking about a few resources if you're interested in learning more. One good place to start is the Neo4j GraphQL landing page, which has links to documentation and examples, as well as GraphAcademy, which is self-paced online training that goes into a lot more detail, all focused on building GraphQL APIs with Neo4j. For trying out Neo4j, the best place to go, I think, is the Neo4j Sandbox. It allows us to spin up Neo4j instances with preloaded datasets. We also have GraphQL exposed through links to prebuilt CodeSandbox examples in Neo4j Sandbox.
I also want to mention that Neo4j is hiring, and specifically the GraphQL team at Neo4j is hiring and looking to grow. So if this sounds like interesting things to work on, definitely please reach out. You can find the postings on our job site or just email graphql.neo4j.com. So thanks so much for joining us today.
And again, please reach out to me on Twitter if you'd like to follow up. Cheers.
"Are you using GraphQL in production?" was the question, and 60% of our audience is saying yes. That's amazing. Were you expecting so many people to already be using GraphQL at a conference, or is this not what you were expecting at all?
Yeah, I guess that makes sense. I mean, since we're at a GraphQL-focused conference, I guess there are kind of two personas of people that are interested in a GraphQL conference, right? One is "I'm using GraphQL in production, and I'm ready to level up and think about scale and the more advanced things", and the other is "I know about GraphQL, it's something we're thinking about, so I want to learn the more introductory things". So yes, for a GraphQL-focused conference, I guess maybe I was thinking it would be a little more than that, but I guess that shows there's a good mix of both of those personas, right? People that are ready to scale up, and the introductory sort of persona. So yeah, I get that. One thing I've really noticed in the GraphQL community overall, in the last few years that I've been involved in working with GraphQL, is that the community is really maturing. If you look at the types of tooling, best practices, and trends that you're seeing, I think people are hitting that issue of "okay, I'm using GraphQL in production, and now I need to think about how I scale and how I address more advanced problems". So yeah, it makes sense to me.
Yeah. Well, I was also quite surprised, as there's 20% that just said no, and not "no, but planning to". So you're not using GraphQL and you're not planning to, but you're here. Still happy to have you, of course. But yeah, that was surprising for me. So enough talk about the poll questions.
Let's jump into the Q&A. And I lost my window. Where are you? Question window. Here it is. So the first question is from a friend. That's a nice name. You mentioned that Neo4j is hiring for the Neo4j GraphQL integration work. Can you describe a bit what the position is like and what kind of background experience is necessary? What profile are you actually looking for?
Oh yeah, that's a great question. So in my talk, I talked about the Neo4j GraphQL library, which is a Node.js library that makes it easier to build GraphQL APIs backed by Neo4j. We talked about some of the features in that. The team that works on that library is hiring engineers in Europe to work on the library, which is written in TypeScript, so some familiarity with TypeScript, the Node.js ecosystem, and the GraphQL ecosystem is needed. I think the team is also thinking about next steps: we have this library that you can use to build GraphQL APIs, so what comes next? That means looking at some of those more advanced GraphQL use cases we were talking about earlier, like pushing scale and pushing performance. Those, I think, are the kinds of things that team is thinking about next. So certainly if you've scaled GraphQL in production or that sort of thing, that is experience I think the team would really be looking to add. But yeah, certainly TypeScript is what that team does on a day-to-day basis. If that doesn't quite fit, Neo4j is hiring for a lot of different roles and engineering talents and skills. So certainly check out the careers page that I linked in the slides. You'll probably see something that might match your skillset.
There's something for everyone.
That's true. When you're a database company, a database is so central to your infrastructure and application that there are really so many different pieces that you have to work with. I mean, we have core Java engineers who work on optimizing the database. The Cypher query language is written in Scala.
We have desktop tooling and graph visualization tooling, where you're working on high-performance WebGL kinds of things. So yeah, there are lots of different skills and tool sets out there.
Yeah, that's always funny. I always find it nice to hear how many different types of competencies you need for such a tool. Like you say, you have Java and Scala developers working for you at your company. Yeah, it's nice to see so many people coming together from different backgrounds to build products. The next question is from, I guess, an anonymous user. What about authorization? How would we handle application authorization when using the Neo4j GraphQL library? Is it just up to the developer to implement it themselves?
Yeah, good question. There are a few options here. One of the design principles of the Neo4j GraphQL library is to be as flexible as possible. So you can certainly implement your own authorization layer as you would when building any other GraphQL service. There are lots of different options there. But there is an authorization feature built into the core of the Neo4j GraphQL library that uses GraphQL schema directives. In my talk, I showed a few examples of using GraphQL schema directives to configure the API a little bit. We looked at the relationship directive for defining relationships in the schema, and also the Cypher GraphQL schema directive for adding custom logic to your API. There's also an auth GraphQL schema directive that you can use to define authorization rules in your schema. For example, you can create a rule that says only the authors of a blog post should be able to edit the blog post. Or maybe if you have the role admin, you can also edit it. These sorts of things. And it works with JSON Web Tokens (JWT), so you can use any identity provider as long as it generates a JSON Web Token.
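As a rough illustration of the kind of rule described above (only the author of a post, or someone with the admin role, may edit it), here is a plain JavaScript sketch of the check such a rule expresses. It is not the library's auth directive or its implementation. The `sub` and `roles` claim names, the `authorId` field, and all helper functions are assumptions, and the demo token is unsigned, so a real API would have to verify the token's signature before trusting any claim.

```javascript
// Base64url helpers for building and reading the demo token.
function base64UrlEncode(text) {
  return Buffer.from(text, "utf8")
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

function base64UrlDecode(text) {
  return Buffer.from(text.replace(/-/g, "+").replace(/_/g, "/"), "base64").toString("utf8");
}

// Builds an UNSIGNED token for demonstration purposes only.
function makeUnsignedDemoToken(claims) {
  const header = base64UrlEncode(JSON.stringify({ alg: "none", typ: "JWT" }));
  const payload = base64UrlEncode(JSON.stringify(claims));
  return `${header}.${payload}.`;
}

// Reads the claims from a token WITHOUT verifying its signature.
function decodeClaims(token) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("Malformed JWT");
  }
  return JSON.parse(base64UrlDecode(parts[1]));
}

// The rule itself: the post's author or any admin may edit.
function canEditPost(post, claims) {
  const isAuthor = typeof claims.sub === "string" && claims.sub === post.authorId;
  const isAdmin = Array.isArray(claims.roles) && claims.roles.includes("admin");
  return isAuthor || isAdmin;
}

const post = { title: "Hello graphs", authorId: "user-1" };
const authorToken = makeUnsignedDemoToken({ sub: "user-1", roles: [] });
const adminToken = makeUnsignedDemoToken({ sub: "user-2", roles: ["admin"] });
const readerToken = makeUnsignedDemoToken({ sub: "user-3", roles: ["reader"] });

console.log(canEditPost(post, decodeClaims(authorToken)));
console.log(canEditPost(post, decodeClaims(adminToken)));
console.log(canEditPost(post, decodeClaims(readerToken)));
```

Because the rule only depends on claims inside the token, it works the same way regardless of which identity provider issued the JWT.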
So again, it's meant to be as flexible as possible, but still have these features focused on developer productivity. It's quite nice, I think, to be able to define these sorts of things in your GraphQL schema, and they are quite powerful. So my first approach would be to look at the features supported by this auth GraphQL schema directive and the rules we can create. Does that match your needs for adding authorization and authentication? If so, that can be a super powerful feature that's built in.
Nice. Thanks, William. The next question is from Sam. How could we deploy the GraphQL API layer? You said that the Neo4j GraphQL library is for building Node.js GraphQL APIs. But what if I want to deploy it as a serverless function?
Yeah. So again, going back to this idea of designing the library to be focused on flexibility, you can really use any JavaScript GraphQL implementation with the Neo4j GraphQL library to take advantage of things like the database query generation and the GraphQL schema augmentation process. Basically, what you get is a GraphQL executable schema object that you can then use with Apollo Server or really any JavaScript GraphQL implementation. So it's easy to use as a Lambda function or deploy as a serverless function. I like to use Next.js for building full stack applications. Next.js is a framework that's built on top of React, so I can build my front ends with React and Next.js. But Next.js also has this really cool feature called API routes. So in the same code base, in the same framework, I can define an endpoint for my GraphQL API, and I can take advantage of the Neo4j GraphQL library with that. When I go to deploy, there are different ways to deploy a Next.js app, but Vercel, who works on Next.js, will deploy your API endpoints as serverless functions without you having to think about that.
So a good combination that I like to use for my full stack GraphQL applications is Next.js and Vercel. It's, again, something focused a lot on developer productivity, which is super nice if you're building full stack apps.
Yeah, that's a really nice combination. Really nice way to work. Also a big fan here. Great. Thanks. We have time for some more questions. Yeah. The next question is from Daria. What about authorization? How would we handle application authorization? Wait. I think we already got that question, right? It's the same question, but asked differently. Oh, no. I'm just reading it twice. I'm sorry. Those were the questions that we have from our audience. So if there's anyone who still wants to know anything from William, now is the time to speak up or forever hold your silence. Otherwise, I'm going to let you go, William. We'll have a short break and be back in five minutes. If you want to talk to William, he is going to his speaker room on Spatial Chat to discuss anything you want on Neo4j. William, it's been lovely talking to you. Have a nice day. Bye-bye.
Great. Thanks a lot. Thanks for having me.