So just a little bit about me. My name is Will. I work for a database company called Neo4j. Neo4j is an open source graph database. We'll talk about what that means, and we'll do some hands-on exercises with Neo4j as we build our GraphQL API. I work on the developer relations team at Neo4j. Roughly, that means helping people build applications with graphs and Neo4j. I also work on integration tooling, so making sure that you can use Neo4j with other tools in the ecosystem. GraphQL is one of those. I also recently wrote a book published by Manning called Full Stack GraphQL Applications, which you can actually download for free at this link: dev.neo4j.com/graphql-book. What we're going to talk about today is largely the backend piece of a full stack application, right? So we're going to work with the database, we're going to build a GraphQL server, and we'll use some cloud services to host that in the cloud, but we're not going to talk about how we integrate GraphQL into our front-end application. The book goes into a lot more detail on that.

So the rough outline for today, we have a few different
modules to work through. I'll have a mix of some slides and examples, and then we have at least one or two hands-on exercises for each one of these modules that we're going to go through. So we're going to start off with what I think for some folks will be a recap, an intro to GraphQL, just to make sure that we're all roughly on the same page starting out here. Then we'll take a look at Neo4j. We're going to use a hosted database as a service called Neo4j Aura. There's a free tier. We can just click a couple of buttons and spin one up. So that's going to be our database for today. And then we're going to look at using the Neo4j GraphQL Library, which is a database integration for GraphQL and Neo4j that makes it easier to build GraphQL APIs backed by Neo4j. So we'll look at a couple of different ways to get started using the Neo4j GraphQL Library. We'll look at how we can add custom logic to our GraphQL API using the Cypher query language. Cypher is the database query language that we use with Neo4j, but there's a very interesting way that we can combine that with GraphQL to add custom logic. And then if we have time, and I'm not sure if we'll get to this, but we'll at least talk about it, is how we add authorization to our GraphQL API using some of the features in the Neo4j GraphQL Library to define authorization rules in the GraphQL schema and JSON Web Tokens.

So you don't need to worry about setting up a local development environment or cloning a GitHub repo or anything like that. We're just going to use hosted services. So I mentioned we're using Neo4j AuraDB, the free tier of that. We're going to use a tool called the Neo4j GraphQL Sandbox, which is a sort of in-browser tool for working with GraphQL and Neo4j. And then we're going to use CodeSandbox to run some JavaScript code. Links to slides here and various resources,
documentation, but I'll talk about these as we go through them.

Cool. So let's talk a little bit about Neo4j. I think some folks said they're not familiar with Neo4j. So Neo4j is a graph database. Unlike other databases that use tables or documents as the data model, Neo4j and other graph databases use what's called the property graph data model. So in a graph we have entities, those are nodes, and relationships that connect nodes. That's the basic data model that we work with in a graph database like Neo4j. We use the Cypher query language primarily to work with Neo4j. There are other ways of interacting with the database, which we'll see today, but primarily you can think of Cypher as kind of like SQL, but for graphs.

There's an example here in the upper right, talking about addresses, something in New York, registered address, connected to officers and entities, and there's this sort of ASCII art graph pattern drawn here. So what is this about? Well, graph pattern matching is kind of a core part of Cypher. So the way that we define patterns in Cypher is to use this ASCII art representation. So the parentheses first around Address, that's indicating a node. So we're looking for Address nodes where the address contains New York. So find addresses in New York. And then this arrow that we've drawn here with the registered address, that's representing a relationship connected to Officer nodes. And then we have the shorthand for an outgoing relationship, now to these Entity nodes. So this is saying find addresses in New York, find Officer nodes connected to those addresses through this registered address relationship, and then find any entities connected to those officers with an address in New York. I like to use this query as an example because this comes from the Panama Papers data set, which was a data journalism investigation a few years ago, where the data journalists at the ICIJ used Neo4j to make sense of these leaked documents about offshore companies, because that was a very graphy problem with sort of nested structures of offshore companies and things like that. So we'll talk a bit more about Cypher after we sort of dig into
GraphQL a bit.

This slide is really showing that there's a spectrum of use cases for graphs and Neo4j. I'd like to just show that there's a large ecosystem of tooling, right? So as a database, we sit kind of in the middle of infrastructure and architecture for lots of different use cases and users. And there's this spectrum from application development on the left to analytics on the right, so things like graph data science, graph algorithms, these sorts of things. And depending on where we are in that spectrum, what you're trying to accomplish, what your use cases are, the tools in the ecosystem that you're integrating with are a bit different. So for things like analytics and data science, we may be more interested in integrating with some visualization tooling, with some machine learning pipelines, things like that. On the left end of the spectrum, where we're talking about application development, building APIs for transactional use cases, these sorts of things, that's more where we are today, talking about building a GraphQL API for our database. But there's lots of tooling in the ecosystem because the database is so core to your infrastructure that you need to be able to work with data in lots of different tools.

Cool. So let's talk about
GraphQL. I think most folks, it sounds like, have some level of familiarity with GraphQL, maybe just on the client side, maybe just the server side. So let's talk about some GraphQL concepts. And then we'll take a look at a running GraphQL API to write some GraphQL queries.

So first of all, what is GraphQL? GraphQL is an API query language. We use a strict type system to define the data that we're working with in GraphQL. They're called type definitions. Those type definitions define the data we're working with and how those types are connected, and that's where the graph part of GraphQL comes in, right? We can reference other types on fields. At query time, the client asks for exactly the data needed to render a view in an application, whatever that may be. And then the response, the data sent back, matches the shape of the query. So we know exactly what data we're getting back from the API.

So in our type definitions, we typically use the Schema Definition Language, or SDL, to define our types. Of course, we can define the types programmatically. SDL is nice because it's language agnostic. I can use SDL if I'm building my GraphQL API in JavaScript, Python, or Go. There's a few interesting concepts in the type definitions. So first of all, we define types. Types have fields that have a type as well, right? So here we have Movie, Genre, and Actor, which are our GraphQL object types. And Movie has things like the title, that's a string, year, an integer, and so on. And then here we have a relationship field, genres, that is connecting the Movie type to the Genre type. And we'll see in a minute when we start working with the Neo4j GraphQL Library that we can annotate these type definitions using what are called GraphQL schema directives. Schema directives are GraphQL's built-in extension mechanism that allows us to say, hey, some custom logic should occur here. And we'll see what that is and why we use that in a minute.

So that's type definitions. Let's talk about
GraphQL operations, or as they're commonly called, queries. So operations can actually be of three different kinds: a query, a mutation, or a subscription. And these map to fields on special types, the Query type, the Mutation type, and the Subscription type. So here's a GraphQL query. The entry point, which in this example is the movies entry point, maps to a movies field on the Query type. And we can pass arguments there. So here we are passing a where argument to filter for only movies with the title A River Runs Through It. And then the rest of our GraphQL operation, the rest of our GraphQL query, is a nested object called the selection set. And the selection set specifies, I guess it specifies two things, really. One is it's specifying a traversal through the data graph. So we've said that the graph in GraphQL is these relationship fields, where types reference connections to other types. And that traversal in a selection set is expressed through nesting. So here, for example, we start with the movie A River Runs Through It, we then bring back the title of the movie, and then we traverse to the actors and grab the name of each actor connected to this movie that we're searching for. And here, when we go to directors, we find directors of this movie, and then other movies that those directors have directed. So this is a traversal through this data graph, just by sort of nesting our selection set. And then the response matches the shape of our selection set. So on the right, there is the JSON object that we get back from the results of this GraphQL query. We can see exactly the fields specified in our selection set are returned.

Okay, so that is talking about type definitions, we talked about
GraphQL operations and this concept of a selection set. How do we actually define the logic for fetching data from the data layer when we're building a GraphQL API? Well, that comes in with these resolver functions. So resolver functions are where the logic for resolving a GraphQL request lives. So here in this example, we have a resolver map of functions for a conference app. And so we have one entry point, so one field on the Query type called session. And in that resolver function, we are accessing some database ORM layer, and we're searching for sessions by some search string. So imagine you have a conference application, and we want to allow users to search for, like, GraphQL or something in the conference schedule, and then see the sessions, what room they're in, similar sessions that they might be interested in, and so on. And we have resolvers for each of these fields. So once we've found the sessions in the database that match our search string, well, then if the user has selected it, we need to then go back to the data layer to find out what room that session is in. We need to do the same thing for the theme, and then we need to go back again to find any sessions that are recommended based on the session that we found searching the schedule.

And this is due to the nested way that resolver functions are called. So resolver functions are called first starting at the root level, in this case, the session field on the Query type. And then in this case, we have three resolvers on Session: room, theme, and recommended. So if those fields are requested in the selection set, we're going to make three more round trips to the data layer for each session that we've found. And you can see where some performance issues might come up here. We don't want to be making potentially expensive round trip queries to the data layer multiple times for each GraphQL request. And we certainly don't want to do it in the case where we found a whole bunch of sessions that match our search string, and then make multiple requests for each session that we found. That could potentially be really slow. So this is pointing out what's commonly called the N+1 query problem, where we end up making lots of requests to the data layer for any arbitrary GraphQL request.

There are a few different ways to address this. One common approach is called DataLoader, which allows us to batch and cache the queries so that we're reducing the number of requests sent to the data layer. That can add some additional complexity. So another way to address this problem is to use database integrations for GraphQL that can generate a single database query from the root level resolver. And that's exactly what the Neo4j GraphQL Library does. So instead of calling multiple nested resolvers this way, when we're using the Neo4j GraphQL Library, we'll actually generate a single database query at the root resolver so we don't have to worry about this N+1 query problem.

So some benefits of
GraphQL. I think folks have probably seen most of these things before. The most commonly talked about benefit of GraphQL that I see, I think, is this idea of avoiding overfetching and underfetching. So being able to make just one request to the GraphQL layer to fetch all the data needed to render a view of an application, rather than making multiple requests to the backend. And then slimming down the network response so that if I'm only showing, I don't know, like in a list of blogs, I'm only showing the title and author of the blog, I don't need to fetch all this other metadata or even the full content of the blog post, things like this, that are not being used to render data in my view. So GraphQL addresses both of those issues. There are also lots of big developer productivity boosts, I think, with GraphQL.

Of course, there are some challenges that come up. I think the biggest category of challenge that can come up with GraphQL is that a lot of things that are well understood from the world of REST don't necessarily apply when we're building a GraphQL API. So things like HTTP status codes don't quite mean the same thing. Error handling and caching can be done a bit differently. We talked about this N+1 query problem, which can be problematic if you're not anticipating that when you're getting started building your API. So there are best practices and tooling, so software libraries and packages, that address all of these things, but these are some of the most common things that can come up.

So one of the things that is really nice about the
GraphQL ecosystem is tooling. So there's one feature of GraphQL that we haven't talked about yet that powers a lot of the cool GraphQL tooling that we see, and that is this idea of introspection. So when we have a running GraphQL API, we can send what's called an introspection query to the API. An introspection query basically says send me all of the types that you have available in the API, send me the schema of the API, and then we can use that to build really powerful developer tooling. So things like client-side syntax checking for GraphQL queries, we can have autocomplete based on the introspection, and then these in-browser tools, so things like GraphQL Playground, GraphiQL, or Apollo Studio.
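
To make introspection a bit more concrete, here is a minimal sketch in Python. The `__schema` query is standard GraphQL introspection. The sample response and the helper names `build_request_body` and `object_type_names` are assumptions made up for illustration, not output from any real API, and the sketch does not make a network call.

```python
import json

# Standard GraphQL introspection query: asks the server for the name and
# kind of every type in its schema.
INTROSPECTION_QUERY = """
query {
  __schema {
    types {
      name
      kind
    }
  }
}
"""


def build_request_body(query):
    """Serialize a GraphQL query into the JSON body typically sent via HTTP POST."""
    return json.dumps({"query": query})


def object_type_names(response):
    """Return sorted names of object types, skipping built-in '__' types."""
    types = response["data"]["__schema"]["types"]
    return sorted(
        t["name"]
        for t in types
        if t["kind"] == "OBJECT" and not t["name"].startswith("__")
    )


# Hypothetical, hand-written response used only to show the shape of the data.
SAMPLE_RESPONSE = {
    "data": {
        "__schema": {
            "types": [
                {"name": "Query", "kind": "OBJECT"},
                {"name": "Movie", "kind": "OBJECT"},
                {"name": "Actor", "kind": "OBJECT"},
                {"name": "String", "kind": "SCALAR"},
                {"name": "__Schema", "kind": "OBJECT"},
            ]
        }
    }
}

if __name__ == "__main__":
    print(build_request_body(INTROSPECTION_QUERY))
    print(object_type_names(SAMPLE_RESPONSE))
```

A tool could use a list like this to drive autocomplete or generated docs. Real introspection responses carry much more detail, such as fields and arguments for each type.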
Apollo Sandbox is the part of Apollo Studio that has this sort of in-browser query functionality where we can basically view documentation of the GraphQL API generated from the introspection query, and then also have these nice autocomplete and syntax checking features as we're writing GraphQL queries and working with the results. So today we're going to look at GraphQL Playground in a couple of different examples, but the functionality is similar whether you're using GraphQL Playground, GraphiQL, or Apollo Studio. Similar concepts apply.

Okay, cool. So let's dive in and take a look at a running
GraphQL API. So let's go to movies.neo4j-graphql.com. Maybe this is easier. Maybe I can have the slides and GraphQL Playground side by side so we can keep track of what we're doing. So let's see if we can write some GraphQL queries. So we'll open up this movies.neo4j-graphql.com, and you'll probably have a blank GraphQL editor there on the left. And so this is GraphQL Playground. This is a GraphQL API that's up and running that has data about movies. And if we go to the schema tab, we can see the full schema for the API. But if we go to docs, we have a little more human readable view of the data available in our API. And so first off, we can see the query fields that are available. So things like movies, genres, we have some aggregations and connection fields as well. We'll talk about those later on. But we have things like movies, users, actors, directors. And if we drill down, we see we have detailed information about movies, and movies are connected to things like actors. And we can go from actors to other movies they've acted in and see all the fields available. We can also see the arguments that we can pass in for things like filtering, apparently, or options at the query field level.

Okay. And so our exercise here is to explore the docs tab and learn more about the API schema. Well, we did that. And then write a few GraphQL queries to answer a few questions. So find the titles of the first 10 movies ordered by title. Who acted in the movie Jurassic Park? What are the genres of Jurassic Park? And what are other movies in those genres? So let's spend a couple of minutes and see if we can write queries to answer those questions. And then we will move on to the next section, where we will dive in to see how this
GraphQL API is built and take a look at Neo4j. So I'll pause for a minute or two to give folks some time to play around with this. And then we will take a look at the answers for this.

Okay, did everyone get this first query, which is find the titles of the first 10 movies ordered by title? Well, we can look in the docs to see that we have the movies entry point, so we'll start there. So movies, and then we can see with Control+Space the different fields available. There's the title, and we want that. But we know we want just the first 10. So if I look in the docs, I can see for the movies query field, there's a couple of arguments to explore down here. So there's a where argument, which has things like filtering functionality, it looks like. So that's not quite what we need. In options, though, we have sorting and limiting. So I think that's what we want. So let's add an options argument here, and we'll do a limit. We said first 10 movies. Let's test this. That gives us 10 movies, but we want these to be sorted by title. So let's add sorting. And then we need to specify a sort direction. And so in this API, we can sort by title, either in ascending or descending order. Let's do that in ascending order. We can see here, we've got some special characters that sort first. So here's our first 10 movies ordered by title.

Cool. Anyone get this next question, which is who acted in the movie Jurassic Park? Did anyone get that one? Well, let's take a look. So we saw previously that movies has a where argument. And if we look at the fields in that, we basically have filtering for each of the fields available on the Movie type. And you can see for string fields, we have these various contains, starts with, and ends with filters, and for numeric fields, we have things like greater than, less than, and so on. And so here, let's do title is Jurassic Park, and let's see what that's going to find us. Okay, so here's Jurassic Park, found it, but we want to know the actors. And so again, just searching for things with Control+Space, autocomplete helps us to kind of see what's available. Actors, that sounds like what we're looking for. And for actors we'll bring back the name field. So here's our actors in Jurassic Park. And then similarly, what are the genres? So let's bring back the genres. Jurassic Park: adventure, action, sci-fi, thriller. And then what other movies are in those genres? So now we can traverse from the genres to other movies by adding this movies field. However, if I just run this, well, we have potentially, I don't know, tens of thousands of movies in the database, maybe thousands of movies in the database. So there's probably, I don't know, thousands of thrillers, thousands of sci-fi movies. So let's add a limit here. So we can also add arguments at the field level in our selection set. So let's just add a limit of 10 for each of these genres. Let's just bring back 10 other movies in each of those genres.

Cool. Did everyone get those? Anyone stuck or confused on those? I think for those familiar with GraphQL, hopefully this was a bit of a review of the concepts, although maybe you haven't seen this specific API before. Someone says, caught up. Okay, cool. Great. Well, let's go ahead and move on then to talking about
data and Neo4j. So that's writing some GraphQL queries and an overview of some GraphQL concepts. Let's talk about building our own GraphQL API. But first, we're going to set up our database. And so for this, we're going to use Neo4j AuraDB, which is a hosted database as a service from Neo4j that gives us hosted Neo4j instances. And I'll share a link here in the chat, dev.neo4j.com/neo4j-aura, or you can just search Neo4j Aura in Google. And so we're going to select the free tier. We can sign up with Google and maybe a couple of other services, or you can create a username with email and password. And then you'll be asked to choose the free tier. There's a few different tiers. There's free, professional, and an enterprise tier. So the free tier allows us to create databases that are private to us. So we get connection credentials that are private to us. But the database stays around. So this is good for hobby projects if we want the database to stick around for a while. We can also start by loading an existing data set. So we're going to choose the graph-based recommendations data set. That's going to load actually the same movie and user rating data that we were just working with.

So let's see what this looks like. So I'm going to go to dev.neo4j.com/neo4j-aura, and we'll just zoom in a bit. So I'm going to click on start free. And I was already signed in. I think I've signed in with my Gmail. And initially, I think you'll see this screen. If you don't, just click create instance. And we want the graph-based recommendations. This is going to load some initial data in our database. So I'll hit create. This is an important step. So this is the password for my database. And I can copy the password or I can download a .env file that has the connection credentials. So I'm going to copy this and I will also download the .env file. And yes, I have copied that. All good. Okay. And this will take a few minutes to spin up in the cloud. So let's pause for a minute or two. Everyone will want to create your own Neo4j Aura database. We're then going to build a GraphQL API on top of your private instance. So go ahead and follow these steps. Be sure to save your generated password. Either download the .env file or copy and paste the password somewhere, because you will need that in a moment. So we'll pause for just a minute for everyone to go through this process. And then we'll take a look at the
data that we just created. Let us know in the chat if you have any problems or get stuck. Someone's asking, can I save my password here in the chat? I wouldn't recommend that in the chat. I guess there's a way to send a chat just to yourself. Maybe that makes sense. But you generally don't want other folks to see the password for your instance. Great. Let's take a look at what we got then.

Okay. So once your instance is running, you'll see this green dot saying that, hey, this is running. So there's a few things. The free tier of Aura is limited by the number of nodes and relationships you can store in the database. So that's what these 14 and 42% things are. And then the connection URI, this is how we connect to this database from the database drivers or some of the [inaudible]. And if I click on this open button here, that's going to take me to Neo4j Workspace, which will ask me for my password here. That's the password for my specific instance. Now, instead of open, you may see something that looks like this: Explore, Query, Import. This is the classic view. These are three different developer tools broken out. If you don't see where it just says open, if you click here on your user icon, it'll tell you if you're using the classic experience or the workspace, which I think should now be the default for folks. This is a relatively new way of organizing the developer tools. It's basically combining those three, Query, Explore, and Import, into one tool. I would recommend using the workspace view since that's the new one. But again, I think that's the default. But anyway, you should see something kind of like this with, let's zoom in a bit, with Explore, Query, and Import at the top here.

So Explore, this is a visualization tool that allows us to explore the graph visually without writing any code. So I'm just going to generate a perspective here. It's just going to inspect the data and allow us to visualize the data. The perspective is what basically allows us to map the data to the visualization. So maybe I don't want all of the data to be available in the visualization tool, that sort of thing. But what's nice about this Explore tab, and this is a tool called Neo4j Bloom, if you've seen this before. And someone says, yes, the scene is empty. Yeah, that's fine. The scene is empty, it just means that we haven't returned any data yet. So let's try to search for a movie. So what's nice about this is we have this sort of natural language way of searching. So I'm going to search for a movie by title. We were looking at Jurassic Park earlier. So let's look for Jurassic Park. And you can see as I'm typing, it's giving me some options here. So let's find what the genres of Jurassic Park are. Oh, we end up finding a couple. We have Jurassic Park and Jurassic Park 3. And we can see the genres of those. We can double click to get details on the nodes. We can expand out to see the actors in the film and so on. So this is just a way to interact visually. This is useful if we're building something like a tool for an analyst who is not going to write database queries, but we want to give them a visual tool to explore our
data. We can configure this visualization, and we can add custom logic to it as well to sort of set it up for more of an analyst role. We're going to also use this Query tab. Let's zoom in quite a bit here. Maybe not that much. So the Query tab, this is where we can write Cypher queries and then work with the results. Before we start writing some Cypher, I think I have just a couple of slides to talk a little bit about some Cypher concepts. Okay, so we did all this. This is just walking through setting up Neo4j Aura.

So we saw that example at the beginning, on one of the first slides, when we were looking at an example of a Cypher query looking at folks with an address in New York that were connected to offshore legal entities. And the foundation of Cypher is this idea of graph pattern matching. So here we're using the MATCH keyword. MATCH is saying, hey, find this pattern where it exists in the graph. And we can build up these patterns using this ASCII art representation of a graph. And so the most fundamental pattern we can define is a node. And so that's an open and a close parenthesis. That's sort of drawing a circle, right? Drawing a node. We can add onto that pattern by adding the label of the node. So labels are a way to group nodes. You can think of them, if you're familiar with relational databases, as similar to a table from a relational database. So this pattern on the second line says find nodes with the label Movie. Once we've matched on a pattern, it's useful to then be able to refer to pieces of that pattern later on in our Cypher query. And so that's where the m here on the third line comes in, before the colon. And so this is saying, bind the variable m to a piece of the graph pattern, in this case the node, so the Movie node, that I can then use to refer to this piece of the pattern later on in the query. So m is now my Movie node. I can add inline predicates, or I can also use this WHERE clause to define predicates. But for inlining property equality predicates, I do that inside curly braces. So here, these curly braces now inside the parentheses say, find movies with the title Jurassic Park. This is equivalent to MATCH (m:Movie) WHERE m.title = "Jurassic Park". So you can see here, we're using the m variable that we bound to the Movie node over here in this part of the query. So these two, with the curly braces or the WHERE clause, are equivalent. But with the WHERE clause, instead of the equality operator, we could also say WHERE m.title CONTAINS "Jurassic Park" or STARTS WITH "Jurassic Park", things like that. And then once we've matched on a pattern, we can return it, we can return the results to see what
data we've matched on.

So that's nodes. But we're working with graphs. So relationships connect nodes. So how do we define relationships in the graph? Well, relationships are square brackets. So here in that first line, that's the most basic form of relationship that we can define, which is just a blank relationship. So it's basically saying match on all relationships in the graph. We have a concept for relationships similar to the labels of nodes: relationships have a type. One difference between relationship types and node labels is that we can have multiple node labels.

So someone was asking if we're saying search all tables tagged as movie. Yeah, so we are specifically searching for nodes that have the label Movie. So we don't have tables in a graph database. But conceptually, we can think of the concept of labels as similar to tables. I guess the equivalent analogy would be nodes are like a row in a table, if that makes sense. But yeah, so this is basically saying find all nodes that have the label Movie. And we'll look at some examples of this in a moment. Hopefully that will clarify a bit.

And then the concept of labels for relationships, with the relationship type, is very similar, but we have a single relationship type. Whereas with nodes, we can have multiple node labels. And in fact, in our data set, we do. We'll see that in a minute. We have nodes that can be both, well, not Movie in this case. What do we have? We have Actor and Person, or you can also be an Actor and a Director in the data in the database as well. Anyway, relationships also have a direction. And that's where this caret comes in, to kind of draw an arrow to indicate the direction. We can treat our graph as undirected. So when we're searching for patterns, if we don't want to include direction as part of the pattern that we're searching for, we just leave that arrow off. But when that relationship is stored in the database, there is a direction. This idea of direction and relationship type is important, because we'll come back to these when we start working with GraphQL to see how we map this piece of the property graph model that we use in the database, how we map that to GraphQL. So I mentioned schema directives earlier, and we're going to use the relationship schema directive to encode the relationship type and direction. But anyway, that's getting a little bit ahead of ourselves.

And so similarly, we can see how we're building up our pattern here to now search for a relationship connecting an actor and a movie. We can bind variables to our relationship, just like we did with nodes. So here we're using r to refer to the relationship that connects the movie Jurassic Park and any actors. And I linked here the Cypher cheat sheet, which is like a reference card for Cypher. There's lots of examples here. This is helpful for seeing what's available in Cypher, and it's a lot more concise than the official
documentation. And while we're talking about documentation, let's bring up the Neo4j documentation, which I'll drop a link to in here. There's a section specific to Cypher that goes into a lot more detail than the Cypher cheat sheet, which is mostly just showing syntax examples. And then the other piece of documentation that we want to be aware of for today is the documentation for the Neo4j GraphQL library. I'll drop a link to that too. This is linked in the slides, and we're going to start using it in a moment as well. Okay, so we talked about some basics of Cypher. Let's see how we use it. I'm going to go back to Neo4j Workspace. We looked at the Explore tab, which is for visualizing and exploring the graph visually. The Query tab is where we can write our Cypher queries and visualize the results. So MATCH, m colon Movie, with the title Jurassic Park, that's what we're looking for. And then we can RETURN m. This is what we saw on the slide a minute ago, and we get back this single node. So we get a graph view. If we double-click on this, we can start to explore the data a bit, but this is perhaps not as helpful as we'd like, because we've now also added all of these users to the data, all of these user ratings. So here's a user, and they rated Jurassic Park a 4.5. So we can store properties, or attributes. That's the actual data. Things like title, we can store on both nodes and relationships. So here's this RATED relationship that has a rating property of 4.5. It has the timestamp of when this was created. So we can store arbitrary key-value pairs, which we call properties, on both nodes and relationships. Okay. Let's modify our query a little bit to include the relationship piece of things. So ACTED_IN, Actor. And then I need to add the direction. Note that we can write our queries either way around. We just have to make sure we get the direction of our relationship correct. So Actor, ACTED_IN, Movie. And let's just return star to return everything, not just the movie node. Cool. So here we go. Here's Jurassic Park. We have the actors. And again, I can double-click to explore the graph. So here's Jeff Goldblum, and here's a bunch of other movies that he was in as well. But rather than just clicking through, I can define any arbitrarily complex graph pattern here using Cypher. So if I wanted to find who acted in Jurassic Park and what other movies they acted in, I would just add on to that pattern like this. Oh, and the reason we're seeing these as disconnected is because I didn't bind anything to this second ACTED_IN relationship. So I can call that r2. And that will give me Jurassic Park, the actors of Jurassic Park, and then what other movies those actors acted in. Interestingly, it doesn't look like these four, Jeff Goldblum, Sam Neill, Richard Attenborough, and Laura Dern, were in any other movies together, at least not in this
database. So I can bind pieces of the graph pattern to variables, or I can just bind the entire path. I could do something like this, where p is the path, and then I'm returning the path, and I don't have to think about declaring variables in Cypher for all of these pieces of the pattern. So I can remove those. Okay, let's take a couple of minutes for folks to go through the exercises here. Let's write some Cypher queries to find the movie The Matrix. Can we find other movies in The Matrix series? Who acted in The Matrix movies? What other movies were they in? What's the average user rating of each of The Matrix movies? And then this last one, which is going to be useful when we start adding custom logic to our GraphQL API: what movies would we recommend to someone who likes The Matrix? So we're starting to think a bit about whether we can write a recommendation query. Use the Cypher documentation that we saw, and the cheat sheet, to try to write some graph patterns to answer these questions. We'll pause here for a few minutes to give folks some time to work through these, and then we'll come back and go through the solutions together. And just to give you a look ahead at where we're going next: the next thing we're going to look at is the features of the Neo4j GraphQL library, and we're going to start to build a GraphQL API on top of the Aura database that we just created. So the first question is, find the movie The Matrix. Raphel said he got the first two. Okay, cool. To find The Matrix, we're going to say MATCH on Movie where the title is, and we specifically wrote this out as Matrix, comma, The, because that's how, I think, we have it in the
database. Yep, here's The Matrix. Cool. So if I search for "The Matrix", I'm not going to find it, because we're searching for exact equality. So I get no changes, no records. And then the next question, I think, was: can I find all the movies in the Matrix series? That's where I can change my syntax a little bit and say WHERE m.title CONTAINS Matrix, and RETURN m. At the time this database was created, there were three Matrix movies. So here are the three Matrix movies. Cool. Okay. Good. And then who acted in The Matrix movies, and what other movies were they in? That's going to be similar to what we saw before, where we added on to our pattern. So Movie, ACTED_IN, or the other way, rather, right? Actor, ACTED_IN, Movie. So I want the arrow going into the movie node. And here are our actors. Oh, but I'm just returning m. Let's return everything. So Keanu Reeves is in all three. Cool. Carrie-Anne Moss is in two of them. Okay, so that looks right. And then the other question was, what other movies are they in? We saw that again when we were doing Jurassic Park. We just add on to the graph pattern that we're creating. And I'm going to switch now to binding the whole path and returning the whole path. So here's The Matrix, here are the actors in The Matrix, and here are other movies that they acted in. So we can see the graph. Cool. The next question was the average user rating for each of The Matrix movies. Well, if we click on this database information slide-out, this will give us some information about the data that we have. We have Actor, Director, Genre, Movie, Person, and User for node labels, and we have ACTED_IN, DIRECTED, IN_GENRE, and RATED as relationship types. We want to calculate the average of the rating, so let's take a look at this RATED relationship. I just clicked on RATED here in the relationships in this database drawer, or slide-out, I think we call it, which you open by clicking on this database icon. And this just executed a simple query to find 25 RATED relationships. We know this because we looked at it earlier, but if we click on one of these RATED relationships, we can see that we're storing properties: the rating, and then a timestamp, as properties on the relationship. So that's what we want to calculate the average of, this rating property. Okay, so let's go back to the query that we're building up here. Maybe we can remove some of this. We know we want to do this for all Matrix movies, so we'll remove this piece; we're not looking at actors and other movies now, just the movies. But now we want this: we'll say r colon RATED, and it's a User who rates a Movie. So now what we want to do is return. If we look in the Cypher cheat sheet and search for, I think these are in the aggregating functions section. Yeah, in the aggregating functions section, we have things like count, collect, and sum, and then also avg, min, and max. So this is how we can figure out that we have an average function in Cypher. RETURN the average of r dot rating. And this gives us 3.543 blah, blah, blah as the average across all movies. Oh, I deleted WHERE m dot title CONTAINS Matrix. Okay, so now for just the Matrix movies, 3.83. Okay, so the Matrix movies are rated better than average in the
database. But now this is the average across all three of the Matrix movies, right, because we had three Matrix movies and we're not searching for just the exact title. So it'd be nice to be able to group the results by movie. For each of the Matrix movies, what are the ratings? My hypothesis is that the first Matrix movie has a higher rating than the others. In SQL, to do this, we would write what's called a GROUP BY, where we're executing an aggregating function, in this case average, and we want to run that over groups of intermediate results. In Cypher, there is an implicit group by: anytime we return the result of an aggregating function alongside some other value, we implicitly group by that value. So if I return the title of the movie, and this AS keyword is new, it's just aliasing the result to a new name, movie, and then the average of r dot rating, that's grouped by the distinct values of title. And let's call this average rating or something like that. And we can see that yes, my hypothesis was correct: The Matrix has a 4.18 average, and the others are 3.2 and 3.0. Notice that previously we were getting a graph representation of the results. Now we're getting a table view, and that's because we're returning tabular data, just rows here. Oftentimes, the answer to our question is a table, even though we're traversing the graph to get those results. So depending on what data we're returning, if it's graph data, we'll have a little graph visualization. If we're returning tabular data, we'll work with tables. Okay, now this last question: what movies would you recommend to someone who likes The Matrix? This question is getting at a common use case for graph databases, which is recommendations or personalization. We see these on e-commerce sites: users who bought this book also bought these other books, so we show those as recommended products that the user might be interested in. When we're generating these recommendations, there are a couple of different ways to approach it. There's the category called collaborative filtering, where we're looking at user interactions in our data set. In this case, we have user ratings, so we're looking at user ratings to help us determine what would be a good recommendation. Basically, the approach here is to find users similar to me based on my rating history, and then ask what those similar users are purchasing. Another approach would be to look at content-based options. In this case, we have things like the genre of the movie, the director, the actors. So if I like movies of a certain genre, recommend me more movies of that genre. So let's take a look at how we could modify this query for the content-based approach. I like the Matrix movies. What are other movies that might be a good recommendation for me? We can get rid of our user ratings here, and we'll do a fairly simple query. So IN_GENRE. Let me go ahead and return that. This is saying: find the genres of the Matrix movies, things like sci-fi, action, thriller, adventure. Well, to recommend other movies, we just traverse out, now following that incoming IN_GENRE relationship, and we'll call these our recommendations. So rec.title is going to be our recommendation. And we can use another aggregating function here, count, to count the number of overlapping genres. We'll say count star AS score, and order by score in descending order. Cool. So here are our recommendations. If we like the Matrix movies, the top recommendation is The Three Musketeers, followed by King Kong, Total Recall, Godzilla, and so on. I'm not sure how good those recommendations are. What do you all think? That's a simple content-based approach. We're going to use this idea, and maybe we'll try the collaborative filtering one, when we get to adding custom logic to our
GraphQL API. So what we're going to do now is build a GraphQL API exposing the data in this recommendations database, and we're going to see how to do that using the Neo4j GraphQL library. Then we're going to add custom logic to that to be able to show personalized recommendations: if I like a certain movie, show me other similar movies based on user ratings in the database. So let's talk a little bit about the Neo4j GraphQL library. We talked a little bit about the approach of building GraphQL APIs by first defining our GraphQL type definitions, which define the data that we're working with in our GraphQL API. Then we write resolver functions that define the logic for how to go to our data layer, either to search for data in the database or to create data using mutations. We saw a couple of issues that came up. One was the n plus one query problem, where, because of the nested way that resolvers are called, we make multiple requests to the data layer. And honestly, there's a lot of boilerplate data fetching code for fetching things from the data layer, which can be a bit of a drag on developer productivity. If I want to get something up and running fairly quickly, I want to focus on the areas of my app where I have more competitive advantage, like adding custom logic, these sorts of things. And so to address those issues, I would say the boilerplate and
performance aspects specifically, there's a whole crop of tooling: database GraphQL integrations. One of those is the Neo4j GraphQL library, which makes it easier to build JavaScript, so specifically Node.js, GraphQL APIs backed by Neo4j. We're not talking about querying the database directly with GraphQL. We're talking about a library that we use alongside our GraphQL server code to build that API layer that sits between the database and the client. So let's talk a bit about the high level goals of the Neo4j
GraphQL library. The first is to support this idea of GraphQL-first development, where we start with GraphQL type definitions, and those type definitions then drive our database data model. So we don't need to maintain two separate schemas, one for our API and one for our database. Instead, we're letting the GraphQL type definitions define the schema for both the API and the database. Now, we'll see that we can configure the data model for the database using schema directives. So it's not the case that we have to have an exact one-to-one mapping all the time. Sometimes we want some configuration options, or there are some nodes in the database that we don't want to expose, and so on. We'll see how we can configure those using schema directives. The next high level goal of the Neo4j GraphQL library is to take those type definitions, where we describe just the types, the fields on those types, and how those types are connected, so that basic graph, and then auto-generate all of the common GraphQL API operations that we would need to work with that data. These are the CRUD operations, create, read, update, delete, so the query and mutation fields, but also all of the little pieces that go along with them: the field arguments to support ordering, pagination, and complex filtering, the types for things like DateTime or the geospatial data types, functionality for aggregations. These sorts of things are generated from those basic type definitions and added to the API schema. Then at query time, the Neo4j
GraphQL library is able to take any arbitrary GraphQL request and translate it into a single database query, in this case Cypher. That's important for a couple of reasons. One is that we're able to avoid the n plus one query problem: we make just one request to the database, and a graph database like Neo4j is optimized for traversing the graph, from one node to any others that it's connected to, which is the equivalent of the nesting in our selection set. Because our GraphQL queries are often fetching all of the data needed to, for example, render a view in our application, we can have lots of nested structures in our selection set. That would be the equivalent of a lot of joins in a relational database, which can cause performance issues: when I have a lot of data and a lot of joins, things start to break down. But that's exactly the thing graph databases are optimized for, these traversals through the data graph. So that's one benefit: addressing the n plus one query problem and letting the database optimize the traversal through the data graph. The other big advantage is that we don't need to write resolver functions, because the database queries are generated at query time. That's taken care of for us. So it's a big developer productivity boost: in order to get started, we basically just need to define GraphQL type definitions and point the package at our database. Now, we've mentioned a few times this idea that we want to be able to add custom logic to our
GraphQL API. So far, we've just seen basic CRUD logic. But here's an example where we've attached a Cypher statement. This comes from a business reviews application. Actually, I think this is an example from the Full Stack GraphQL book, where we have a site about businesses and user reviews of businesses. We've added a recommended field on the Business type that returns a list of businesses, and here we're traversing through user reviews to find similar businesses. So it's a recommendation query, similar to the one for the Matrix movies that we wrote a moment ago. Now, for this cypher schema directive, you can see the syntax here is @cypher and then a statement argument, and that is our Cypher statement that has the logic for the field. We said earlier that schema directives are GraphQL's built-in extension mechanism. So we're extending the type definitions for our GraphQL API, using this cypher schema directive to attach custom logic to our GraphQL type definitions. Then at query time, the Cypher query that we've attached here to the recommended field gets picked up in the overall generated database query. So we're still able to generate just a single database query at query time; this is added as a sort of subquery in the overall generated statement. Okay, so how do we use this Neo4j
GraphQL library? Well, it's published as an npm package. We would commonly use it alongside something like Apollo Server or GraphQL Yoga, if you're familiar with the Node.js GraphQL ecosystem. There are a couple of peer dependencies: the Neo4j JavaScript driver and the GraphQL JavaScript implementation. Here's the basic snippet. This is the minimum code that we need to get started. Let's drill in a little bit more. First we're pulling in some package imports: the Neo4j GraphQL library; the Neo4j JavaScript driver, which just allows us to make a connection to our database; and then Apollo Server, which is, I think, probably the most common Node.js GraphQL server implementation.
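The minimal setup walked through in this part of the talk can be sketched roughly as follows. This is not the slide's exact code: the field names `title` and `name`, the environment variable names, and the local connection defaults are assumptions for illustration, and the exact API depends on the library version (`getSchema()` is assumed to be available, as in version 3 and later of `@neo4j/graphql`; newer major versions may also require additional directives on the types).

```javascript
// Sketch of a minimal Neo4j GraphQL server. Assumes the npm packages
// @neo4j/graphql, neo4j-driver, graphql, and apollo-server are installed.

// Type definitions for a piece of the movies data model. The @relationship
// directive records the Neo4j relationship type and its direction.
const typeDefs = /* GraphQL */ `
  type Movie {
    title: String
    genres: [Genre!]! @relationship(type: "IN_GENRE", direction: OUT)
  }

  type Genre {
    name: String
    movies: [Movie!]! @relationship(type: "IN_GENRE", direction: IN)
  }
`;

async function startServer() {
  // Packages are loaded inside the function so the type definitions above
  // can be inspected without the dependencies or a database being present.
  const { Neo4jGraphQL } = require("@neo4j/graphql");
  const neo4j = require("neo4j-driver");
  const { ApolloServer } = require("apollo-server");

  // Placeholder credentials (assumptions): a local database by default.
  const driver = neo4j.driver(
    process.env.NEO4J_URI || "bolt://localhost:7687",
    neo4j.auth.basic(
      process.env.NEO4J_USER || "neo4j",
      process.env.NEO4J_PASSWORD || "password"
    )
  );

  // Schema augmentation: generates the query and mutation fields,
  // their arguments, and the resolver functions.
  const neoSchema = new Neo4jGraphQL({ typeDefs, driver });
  const schema = await neoSchema.getSchema();

  // Hand the generated schema to the GraphQL server and start listening.
  const server = new ApolloServer({ schema });
  const { url } = await server.listen();
  console.log(`GraphQL API ready at ${url}`);
}
```

To try it, call `startServer().catch(console.error)` after setting the connection variables for your own database. Note that no resolver functions appear anywhere in the sketch; they are generated by the library.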
GraphQL Yoga is another popular one, but this works with any of the JavaScript GraphQL server implementations. Then we define some type definitions. Here we have movies and genres, which is a piece of the data that we're working with today. And then we have this @relationship directive with a type and a direction. I said earlier that in the property graph model that we work with in Neo4j, every relationship has a single type and a direction. The naming conventions are a bit different between the property graph model that we use with Neo4j and GraphQL. In GraphQL, for a field name, we typically use camel case like this, where we start with lowercase and then capitalize each subsequent word. The convention in the property graph model with Neo4j for relationship types is to use, I think this is called snake case, where we have everything in all caps and then underscores for the spaces. So encoding this information in a relationship directive allows us not to mix these conventions, and it also allows us to capture the direction of the relationship, which is a concept we don't have in GraphQL. So here we use that relationship schema directive to encode a bit more information. Then we create a Neo4j JavaScript driver instance with the connection credentials for our database. Here it's just running locally, with a username and password. Then we create a new Neo4jGraphQL instance that we're calling neoSchema here, passing our typeDefs and our driver. This goes through a schema augmentation process where we're generating all of those query and mutation fields, things like the field arguments for pagination, sorting, and filtering, and generating the resolver functions. Then we pass that schema object off to, in this case, Apollo Server and serve our GraphQL API. So we didn't have to write any resolver functions. Those are all generated for us. Okay, so those are the high level goals of the Neo4j
GraphQL library. Let's take a look at an example, and we'll walk through the query API to see how we can use it. We're going to look at a different example here, working with an API for an online bookstore. We have orders, customers, addresses, books, and reviews, and you can see the graph model that this defines. On the left, we're defining our GraphQL type definitions, and on the right, that's mapping to the equivalent property graph model in the database, in Neo4j. So let's look at a couple of the directives and some of the types that we're using here, which I think are quite interesting. In general, the way that we configure the generated GraphQL API is by using these schema directives, so they're really powerful. We already talked about the relationship schema directive, which we use to encode the relationship type and direction that we want to define in the database. Another one we're using here is the id directive. In this case, we're saying that type Order has an order ID field that's of type ID and is non-nullable, so it's required to exist on every order. By adding this id directive, anytime we create an order object, we'll automatically generate a UUID, so a random ID, and store that on the order node. It's a similar idea for timestamp: by adding this timestamp schema directive, the placedAt field will automatically be generated for us when we create or update the order. We're also working with the DateTime and Point scalar types that are available in the database, so those we can use in our type definitions. Okay. So let's look at the query API, and then we'll see how we can use this for our movie recommendations
data that we're working with. Let's talk through a few of the generated query fields and how we can use those. By default, each type that we define in our type definitions is mapped to a node label, and then we get a query field for that type, but pluralized. So in our example here, we have type Book, and that maps to a books query field, pluralized because it returns an array of books. You can think of these query fields as the starting point for a traversal through the data graph. Note that the response matches our selection set, just as we would expect in GraphQL, so any fields that we add are included in the generated database query behind the scenes. For sorting and pagination, we saw a little bit of this early on in the first movies example data set we were using, where we have limit, and we also have offset, which allows us to do pagination. And here we're sorting by the price of books. We can also do cursor-based pagination based on the Relay connection type. If you've seen the GitHub API, I think that's a common GraphQL API that uses the Relay spec for cursor-based pagination, where instead of specifying limits and offsets to page in, say, groups of 10, we get a cursor back from each query that we can use to fetch the next page. So we have a couple of different options for pagination available. We saw a little bit of this as well when we were looking at our first movies GraphQL example, but we have complex filtering options that are available in this where argument. Here we're searching for books where the price is less than 20. And we saw that these filtering inputs are generated for each field on each type, based on the type of the field. For string fields, we have the string comparison operators; for numeric fields, we have things like greater than, less than, and so on. We can also use this filtering nested in the selection set. Here we're searching for books with a price of less than $20, and then for the reviews of those books, we're only showing reviews that were created after a certain date. Note, though, that when we include the filter arguments in a nested piece of the selection set, the filtering is applied at the level where we've inserted it. So the review filter isn't filtering books; we're getting all the books that cost less than 20, and the reviews field is where the date filter is applied. We'll see an example where we can have the filter applied at the root level. We can also search by geo distance, so radius distance. In this case, we have addresses that have a latitude and longitude, and here we're searching for address nodes that are within a kilometer of a certain point. That's available on any of the point types in the
API. So here's an example where we're applying the filtering at the root level based on a traversal through a relationship. In the previous example, where we have the filter on the reviews field, the filtering is applied to the reviews, not to the books. We're getting any books with a price less than 20, and then we're excluding some reviews. But what if we want to find orders where the address for the order is within a kilometer of this point? Address is its own node connected to the order, so in this case we have a ship-to relationship to traverse. We basically just move that filter up to the root level, but we specify the relationship that we want to traverse for it to apply to. We want it to apply to the ship-to address, and here we have the location less than a distance of one kilometer from this point. So now we're only going to find orders where the ship-to address matches this distance filter. And we can apply this nested logic not just with distance filters, but with any of the filter arguments. Okay, so that's a quick look at the query API. We kind of skipped over the generated mutation API, but that's okay; we'll see some examples of that in a minute. So let's get back to some hands-on stuff. Let's take a look at a couple of different ways to use the Neo4j GraphQL library. One way, as we saw in the code snippets, is to create a JavaScript GraphQL server application using something like Apollo Server or GraphQL Yoga. Or we can use the Neo4j GraphQL Toolbox, which is a low-code, in-browser tool for developing and testing GraphQL APIs with Neo4j. So let's start with that one, the Neo4j
GraphQL Toolbox. This is a hosted web tool. I'll drop a link to it in the chat here. So, a question from B: how do we know the distance is in meters and not feet? Oh, yeah, that's a good one. We know that simply because that is the convention used by the distance function in Neo4j. It's specified in the documentation that that is the default unit, and I think it's also specified in the GraphQL documentation, if we dig into that. So yeah, simply convention. Good question. Cool. So let's use the Neo4j GraphQL Toolbox to connect to the Neo4j Aura instance that we created in the previous exercise, and let's see how we can use it to develop and test a GraphQL API using the Neo4j GraphQL library. So, the Neo4j GraphQL Toolbox: I dropped a link in the chat, graphql-toolbox.neo4j.io. It's also right here on the screen. This is a neat tool. It's currently in beta, so the naming and features are not finalized, but I thought it would be worth exploring because it's a great way to get our hands dirty quickly without having to write much code. Basically, what this does is bundle the Neo4j GraphQL library in a way that allows us to generate a GraphQL API and test queries against a database without actually creating a GraphQL server. There are some exercises defined here. Basically, what we want to do is open up the Toolbox app and put in the connection credentials for the Neo4j Aura instance that we created. So hopefully we saved those connection credentials. Then the first thing that's going to pop up is an option to generate type definitions by inspecting the database. Initially we're going to hit cancel and just use the default Movie type that's there. I want to start with that because it's going to simplify some things for us. Maybe I'll just go through this to make it a bit more clear what we're doing. So
graphql-toolbox. The first thing I need to do is put in my connection credentials. If you downloaded the .env file, that will have your connection URI. Or we can go back to Aura. Where's my Aura instance? Oh, I got logged out. Let's log back in. There's another way to find our connection URI, which is this part right here: in the Aura dashboard, there's a connection URI for your instance, and we can click that little icon to copy it. That's the connection URI I want. And then the password, hopefully you saved that. I saved mine over here somewhere. There it is. It will also be in that .env file, if you downloaded it. Paste that in, hit connect, and then you'll see this prompt to generate type definitions. This is basically saying that there's data in the database, and we can generate GraphQL type definitions to match the data in your database. I'm going to hit cancel, because I want to start simple and see a bit more clearly how the type definitions we define drive the database. Starting off, we should see just this single Movie type with one field, title, which is a String, and which happens to match the data that we have. If I hit build schema, this takes me to a different view, and we can toggle back and forth over here between type definitions and the query editor. That build schema step is generating the GraphQL API: it goes through that process of adding the query and mutation fields, adding the arguments for filtering, pagination, these sorts of things, for a fairly simple API, just the Movie type with a single field. But we can see what is generated for us. In the GraphQL Toolbox we're using the Explorer integration for GraphQL, which combines showing us the documentation with a one-click option for writing our selection set, which can be kind of nice. Then I can click this little play button to run the query. This runs against the database and gives back my 10 movies, even though no server is running. It's all managed in the browser. If I go to the developer tools, so let's go to the
JavaScript console. And if I run this query again, do I need to turn on debug mode? Where is that? Oh, yeah, enable debug. What I'm trying to show is the generated query. So if we go back to type definitions and enable debug mode, now when I run the query, we should see, oh, do I need to hit build schema again, maybe? Debug mode, build schema. There we go. So if we enable this debug mode and hit build schema again, then when we run the query, we'll see the generated Cypher statement. Let me zoom in on this. The debug mode shows us the generated Cypher statement before it is executed. In this case, the generated Cypher statement is fairly simple: match movie, return, limit 10. But it's helpful to see what is being generated in terms of
database queries. Okay, so that's the basic idea. There are a few things to work through here. One is a query to find all movies, and then one to find just Jurassic Park. Well, we can do that one now; it's very similar to what we did before. Let's do it by clicking through in Explorer. We know this is going to be a where argument, where title is Jurassic Park. Let's run that. Here's our movie. And now we can see the generated Cypher statement includes this Jurassic Park predicate, where this.title is Jurassic Park. Okay, so now what we want to do is update our type definitions to include actors and write a query to find who acted in Jurassic Park. And there's a reminder here to use the relationship schema directive, with a link to the
documentation. So basically, what we want to do here is go back to the type definitions and edit them to include the Actor type and the connection between Actor and Movie, so that we can then write a query to find everyone who acted in Jurassic Park. The point is just to get across how our type definitions drive the generated GraphQL API. Once we've done that, we're going to check out the introspection feature and see how that works. But let's stop here for a couple of minutes to give folks time to work through these steps. If you haven't already, open up GraphQL Toolbox and connect it to your Aura instance using the connection credentials. First hit cancel; don't use the generated type definitions. See how the GraphQL Toolbox tooling works, and then we'll write some queries and take a look at that. So we'll pause here for a couple of minutes. The next thing we're going to look at is sort of the equivalent of this, but actually writing a JavaScript Node.js GraphQL server application. And then we're going to take a look at custom logic and adding the personalized recommendation feature to our
API.

What we wanted to do here was start off with just the Movie type, and we wrote some queries. We saw how the arguments for filtering are generated with just a simple field like the title. Now let's add the Actor type. We know that actors have a name, so we'll make that a required String. And then we want to be able to connect Movie to Actor, so we're going to say actors gives us an array of Actor objects. Because we're using the Neo4j GraphQL library, we need to add the @relationship directive for the connection between Movie and Actor. That has a type argument, which is going to be ACTED_IN, and a direction, which is an enum with two options, IN or OUT. The way to think about this is, well, it's ACTED_IN, so we're going from the actor to the movie: the actor acted in a movie. In that case, the relationship is coming in to the Movie type.

Now if we hit build schema, we get this error that says list type relationship fields must be non-nullable and have non-nullable entries. That's just saying that this needs the exclamation marks. The reason is that we will always return an array from this actors field. It may be an empty array if we don't have any actors connected to this movie, but we're not going to return null. And inside the array, the only things will be Actor objects, that is, actor nodes from the database. That's why we have the inner non-nullable: we're never going to return an actor, an actor, and then a null inside the array. So that's just some type checking to make sure we do a good job there.

Now if we look at the generated fields here, we have actors, and on the Movie type we have an actors field as well. So let's add actors and the name. And for the movies, we are searching for Jurassic Park here; that was the assignment. So who are the actors of Jurassic Park? Just to recap, if folks couldn't quite hear me: we added the Actor type in our type definitions, and we added the @relationship directive on the actors field of Movie to allow us to query from the movie to the actors. That allowed us to write this query to search for Jurassic Park and ask, who are the actors of Jurassic Park? Cool. And we can see the generated Cypher here. The generated Cypher statement now includes this traversal from the actor to the movie. You can see that first the filtering is applied to find the movie Jurassic Park, and then we have this connection from the movie to the actors. Cool.

One thing to note here is that this doesn't give us the ability to go from an actor node to see what movies the actor has acted in. For that, we would need to add a relationship field to the Actor type as well. So that would look something like this: the relationship type is again ACTED_IN, and this time the direction is OUT. If we hit build schema, we can test this, because now for the actors, we can see what other movies they have acted in. So here are the other movies that Jeff Goldblum has acted in. Cool. And I think the last question on our exercise here is to select the introspection button, which we could have done the first time.
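As a rough sketch of where this exercise ends up, the type definitions and the query might look like the following. The type and field names follow what was described above; ACTED_IN is the usual spelling of the "acted in" relationship type and is assumed here, and the small checker function is a hypothetical helper that only mimics the build error about list fields. It is not part of the Neo4j GraphQL library.

```javascript
// Type definitions from the exercise: Movie and Actor connected by ACTED_IN.
// Relationship list fields are written as [Type!]! (non-null list, non-null entries).
const typeDefs = `
  type Movie {
    title: String
    actors: [Actor!]! @relationship(type: "ACTED_IN", direction: IN)
  }

  type Actor {
    name: String!
    movies: [Movie!]! @relationship(type: "ACTED_IN", direction: OUT)
  }
`;

// Query: who acted in Jurassic Park, and what else have they acted in?
const query = `
  query {
    movies(where: { title: "Jurassic Park" }) {
      title
      actors {
        name
        movies {
          title
        }
      }
    }
  }
`;

// Hypothetical helper (an assumption for illustration): returns the names of
// @relationship fields whose type is not of the form [Type!]!.
function invalidRelationshipFields(sdl) {
  const fieldPattern = /(\w+)\s*:\s*([^\s@]+)\s*@relationship/g;
  const invalid = [];
  for (const [, name, type] of sdl.matchAll(fieldPattern)) {
    if (!/^\[\w+!\]!$/.test(type)) invalid.push(name);
  }
  return invalid;
}

console.log(invalidRelationshipFields(typeDefs));
console.log(query.trim());
```

Pasting the typeDefs string into the type definitions editor and the query into the query editor should reproduce the steps above; dropping the exclamation marks from actors is what triggers the error message mentioned earlier.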
So if we go back to type definitions, we have this introspect option here, which warns that it will overwrite your current type definitions. That's fine. We're now introspecting the database: we're looking at the data model in the database and generating GraphQL type definitions for all of the data in the database. Which is cool. We can see here, though, that there are a few things we haven't seen before, and some things that read a little bit funky that we might want to tweak before we actually use this.

One thing we see right away is an interface here called ActedInProperties that uses this @relationshipProperties directive. We should have one of these for the rating as well, so let's look at that one, since that's one we actually used: RatedProperties. What's going on here is that in the property graph model, we stored the rating, as well as the actor's role in a movie, on the relationship. That's information that is relevant to the connection between two nodes, and so we have a different way of modeling relationship properties. In GraphQL, we use the Relay connection type, which introduces the edge type, if you've seen this before, to work with meta information about the connection between two types. That's a good use for the Relay connection type. So to define which properties we've stored on relationships, we use this @relationshipProperties directive, which then gets mapped to Relay connection types. That's one thing we haven't seen before.

The other thing we haven't seen before is the @node directive with additionalLabels on Actor. We said earlier that in the property graph model, we can store multiple labels on a node. For example, Actor nodes also have the Person label to indicate that this is a person, and similarly for Director and so on. So in GraphQL, we have a directive to capture that additional label. You'll also notice we generate a second actor type, and this is because there are some nodes that have all of the Actor, Director, and Person labels. That gets added as a separate type, which we don't really need, so we can remove it, along with the relationship field that connects it to the Movie type. Then there are field names like this actors-acted-in one: the introspection function uses a convention that encodes the relationship type and the type of the other node that we're connecting to. We don't need to be that verbose. We can just call this the actors field, and similarly, user ratings is maybe a better way to word this one. So you can see the introspection feature is really powerful when I have existing
data in the database and I want to generate the GraphQL API based on that. But oftentimes the generated result is something that we want to tweak a little bit, which we can do in GraphQL Toolbox. So now if I hit build schema, we can see the full schema that is generated for us. Cool.

So we've got 45 minutes left. Let's move on to the next exercise, which is connecting our Aura instance to an actual Node.js GraphQL server. In this case, we want to actually write some code that has the logic for our GraphQL API rather than using GraphQL Toolbox. Toolbox is nice for testing and development, but it doesn't give us a GraphQL API that we can then deploy somewhere. For this one, let's take a look at a CodeSandbox. I'm going to copy the link and drop it in the chat. Have folks used CodeSandbox before? If you're not familiar, I'll show you what this looks like. CodeSandbox allows us to run JavaScript code in the browser. In this case, because it's Node.js code, it spins up a container for us and allows us to run that container and write and edit code in the browser. The link that I shared pulls some code from GitHub to open up a sandbox. You can then sign in to CodeSandbox with GitHub, and if you click this fork button, it will make that sandbox private to you. The first thing that we want to do here is take a look at this .env file. There's kind of a dummy
database connection in there already, so we want to update this to point to our Aura instance. I grab the connection URI and replace that environment variable here. This .env file sets environment variables, in this case the secrets, the things that I don't want to check in to version control. The user is going to be neo4j. And then the password I saved. There we go, that's my password. I'll save that, and that will restart my sandbox. Now I should be able to write some queries. Let's try a simple one: movies, with options of just a limit of 10, to test that we're connecting to our database and pulling in some movies. I think we need to restart after we update our connection credentials, so let's restart the whole container. The point I want to share here is really to show what this looks like running outside of Toolbox, connected to our Aura instance. I'm getting this error message: access on database neo4j is not allowed for user recommendations with roles public. That is the expected error for the previous data that I had loaded in here, so I think it did not pick up my changes to the .env file. I'm going to go to restart sandbox. We also don't need this database setting; we're just using the default database. While we're waiting for that to restart, let's take a look at the code that we have in here, starting with index.js. Oh yeah, there we go. Now the sandbox has restarted and picked up my changes to the .env file, and we're connected to my Aura instance. So that looks good. But let's look at the code. This is similar to the code snippet we saw before, perhaps just with a few extra pieces. We're pulling in the Neo4j
GraphQL integration, Apollo Server, and the Neo4j driver, plus a couple of helpers: fs, which allows us to read from the file system, and dotenv, which allows us to read that .env file and set those environment variables. We read our type definitions from this schema.graphql file. This should look similar to the type definitions we were working with earlier; it's a simplified version, so I'm not including all the relationship properties and things like that. So we read those type definitions. We create a Neo4j driver instance using our environment variables. We create an instance of Neo4jGraphQL with our type definitions and driver. And then we go through the schema generation process, which takes our type definitions and generates a fully functional GraphQL API from them, including our resolver functions. Then we pass that to Apollo Server. I'm also pulling in the Apollo Server plugin that allows us to use GraphQL Playground instead of Apollo Studio, which is now the default. Just trying to be consistent in our use of tools there.

So Rafael is asking: with the Neo4j schema, can we also use the Envelop framework? I'm not too familiar with Envelop. That is The Guild's tooling for... what is that exactly? Is that for combining GraphQL services? Oh, for plugins. I'm not sure; I haven't used it myself. So that is the answer: not sure. Good to try, though. I will add it to the list. Basically, all the Neo4j GraphQL library is doing is generating resolvers from type definitions and combining those to create an executable GraphQL schema object. It gives you that object to work with, and the object follows the GraphQL spec. So you should be able to use that schema object with other pieces of tooling from the GraphQL ecosystem. So, not sure specifically about Envelop. And also for extending the context? Okay, so you can have more information injected into the context object with Envelop. Okay, cool. Yeah, that would be one to try. It's not something I've tried. Cool.

So that is a look at connecting our code example running in CodeSandbox to our Aura instance. I had an exercise for this one. Maybe we'll skip that and just talk a bit about custom logic, and maybe we'll talk through authorization as well. We have about 30 minutes left, and I'll make sure we can cover everything. Even if we don't have time to work through some of the exercises, those can be things to play with afterwards. Cool. So let's talk about custom logic in our
GraphQL API. So far, we've just seen what we call CRUD (create, read, update, delete), that sort of basic functionality in the generated GraphQL API. But we haven't seen how to add custom logic. Maybe we have some business logic. I always like to use the example of recommendations, because that's something that can apply to lots of different domains. But this can really be any sort of custom logic beyond your basic CRUD functionality. There are two ways to do this with the Neo4j GraphQL library. One is to use the @cypher GraphQL schema directive. We saw an example of this when I was talking about the features of the Neo4j GraphQL library: it allows us to take custom Cypher statements and bind those to our GraphQL schema, and those statements are then included in the single generated database query. The other approach is that we can always implement custom resolvers, include those in our GraphQL server, and have them called alongside the generated resolvers. The thing to be aware of here is that those resolvers will still be called in that nested fashion, so we may end up making multiple round trips between the generated database query and our custom resolver. But this is common if we have another system outside of Neo4j that we need to call out to. So let's look at some examples of using this @cypher directive. We're going to go back to the bookstore
API example that we saw earlier. Here we're adding a subtotal field to the Order type. Remember, we had orders, and orders were connected to books; we could place one or more books in an order. It would be helpful to know the subtotal: take the price of every book and add those up. We can easily define that logic with this Cypher statement. With the @cypher directive, there's a variable, this, that is automatically defined in these Cypher statements and refers to the currently resolved object. In this case, this refers to the order that we are currently resolving, similar to the first argument passed to a resolver. So: match where this order contains books, and return the sum of the book prices. That gives us the subtotal, which here we've indicated is a Float field. Once we define that in the schema, the subtotal field is available alongside all the other fields. There's no indication to the client of the GraphQL API that there's anything special about this field.

We can return scalars from the Cypher statement: a string, an integer, or a float as in the previous example. We can also return nodes, that is, objects. Here's the recommended field, which I like to use as an example. We're adding a recommended field on the Customer type: for a customer, based on their purchase history, what are other books they may be interested in? The query goes through their orders, which contain books, and asks what other orders, placed by other customers, contain books that this user did not order. Those might be good recommendations. So of the people reading the books that I'm reading, what other books are they reading that I haven't read yet? That's basically what the Cypher query is saying. The important part here is that we're returning rec, which is one or more Book nodes. We indicate in our GraphQL type definitions that recommended resolves to an array of Book objects, and to make that work in the Cypher statement, we just return nodes. And here, searching for a customer's recommended books, we get an object field, so we need to select the title from it.

We can also use field arguments with @cypher directive fields. Any field arguments that we add in the type definitions are passed to the Cypher statement as Cypher parameters. Here we've added a limit argument, which is an integer with a default value of three. This allows us to specify the number of book recommendations to return: by default three books, but at query time we can change this. If we go down this path, it's helpful to set default values for these field arguments, because then we know the corresponding Cypher parameters are always defined when we're writing our Cypher statements. So here we've specified that we just want one recommended book. Now, we can also return objects that are not actually stored in the
database from a @cypher directive field. It sounds a little odd, but this is something we can do. Oftentimes, maybe we want to change the shape of the data that we're returning. Or, as in this example, we're calling out to an external system using a Cypher procedure. Cypher has a standard library called APOC that adds some additional functionality to the query language, and one of those procedures allows us to call out to other JSON APIs. So here we're doing an apoc.load.json call out to this weather API. The idea is that for our delivery drivers, orders are connected to an address node, and on that address node we want to add this weather object that tells the driver what the weather is at that address. So we add a new type, Weather, that has things like temperature, wind speed, precipitation, these sorts of things, and we call out to an external weather API to fetch that data. When we return it, you can see in the return clause of our Cypher statement a feature of Cypher we haven't talked about yet: we can return objects (maps, or dictionaries) that are projected from data. In this case, we're returning a weather object even though it doesn't map to data that we've stored in the database; as long as we've defined the type in our type definitions, that will work with GraphQL. And again, that's added as an additional object field on the Address type, so we can see the weather for these various orders.

So far, the fields we've added have been fields on types like Book or Address. We haven't seen custom root-level query or mutation fields, but we can use @cypher directives on those as well. Here, we're creating a full-text index in the
database and then adding a new root-level query field to do a full-text fuzzy search. This is a bit of a contrived example, because we can actually use a directive to enable full-text search with the Neo4j GraphQL library. But if we had some other logic for defining root-level queries, this is how it would work; here it lets us do fuzzy matching against the title of a book. We misspell the title of the book, and it uses the full-text index to find any matches. We can do the same thing for mutations. We didn't talk much about the mutation API that is generated when we use the Neo4j GraphQL library, but it follows the convention that we generate a create, update, and delete operation for each type that we define. If we want more custom logic, we can take an input object and then define in Cypher how we want to handle that data in a mutation.

And then, as I said earlier, we can use the @cypher schema directive, or we can write custom resolvers. Here, we've written a custom resolver to add an estimated delivery time for our order. This, again, is quite contrived, because it's just a JavaScript function that picks a value at random. But imagine that we had a more complex logistics system, where we had to write some custom code to call out to that system. We write the resolver function, and then in the type definitions we add the @ignore schema directive, which tells the Neo4j GraphQL library to ignore this field and not try to fetch it from the database, because we're writing a custom resolver for it. We have an exercise for this, to add a similar field to the Movie type. I think we have 20 minutes left, so we'll leave this as an exercise, and instead we'll talk a bit about authorization. I'll just talk through this and then answer any questions that folks have. So if you have any questions, drop them in the chat. Anything we didn't cover, I'll try to address and at least find follow-up links to share with folks. But let's talk through some of the authorization functionality, and then we can answer some questions at the end here.

There is some authorization functionality built into the Neo4j
GraphQL library. It's powered by another schema directive; these schema directives are really quite powerful. This page in the documentation lists all of the schema directives that are available, and like I said, this is how we configure the API generated by the Neo4j GraphQL library. Let's take a look at the docs for the @auth directive. Here we go. This is how we define authorization rules for protecting data, ensuring that only users who should have access to data are able to access it through our API.

The convention with the @auth directive is that we use JSON web tokens, or JWTs ("jots", as they're commonly called). The idea with a JSON web token is that we take a JSON payload that has what are called claims, which can be things like the user ID, the email, or the roles that a user has, and those claims are cryptographically signed by the server. We can then verify that the claims in the JSON payload are valid, because we can verify that the token was cryptographically signed by our application. We pass this JSON web token in an authorization header in the GraphQL request, and the Neo4j GraphQL library will then decode the token and apply those claims to the GraphQL request. Here's an example showing an encoded JSON web token and the JSON payload inside it. You can often find these if you go into your browser's network tab, where you can see the tokens that are used in a lot of web applications. You can copy a token, paste it into a tool like this to decode it, and see what claims are in there. It's kind of an interesting thing to do. You don't need the cryptographic key to decode a JSON web token; you can always read the claims that are in it. But it is signed, so that we know those claims are verified by the server. So let's talk through a couple of the auth rules. Again, this is for the book
API that we're looking at. The simplest @auth directive rule is called isAuthenticated, and this just means that the GraphQL request must contain a valid signed JSON web token. In this case, books have a subject, and in order to access the Subject type, the user just needs to be authenticated. The use case for this might be that in your web application, you show some basic information to unauthenticated users, but if they want the more detailed information, they need to at least register and sign in to your application. Here we're showing the difference between an unauthenticated user requesting a protected field and an authenticated user, and you can see how we attach the token as an authorization header in the GraphQL request.

The next authorization rule is called roles. This means that in order to access the protected field or type in the GraphQL API, the JSON web token must have a certain role in its roles claim. Here we're saying that for the Book type, in order to do any of the mutation operations (create, update, or delete), the user needs to have an admin role. So only admins can create, update, or delete data.

Allow is the next rule that we'll talk about. This means that in order to access the information, a value from the JWT must match some value in the database. In this case, we're looking at the sub claim. This is how we access the claims of the JSON web token: we use $jwt. and then the name of the claim in the JSON payload. Sub is the conventional claim for the subject of a JSON web token, typically the user the token was issued for. So this is saying that the username stored in the database needs to match the sub claim of the JSON web token in the GraphQL request in order to access a customer's orders. The rule follows a relationship going from the order node to the customer node, ensuring that the username matches the value in the token to be able to see the order. Here's an example where we're looking for a specific customer and their orders, and we are properly authenticated as that user. However, if we are authenticated as a different user and ask for all orders, we get an error. This is a bit problematic, because maybe the client doesn't know what
data it doesn't have access to. Instead of throwing an error, what would be nice is if we asked for all orders and just got back whatever orders we had access to. That's where the next rule comes in. But one more thing on allow, or any of these rules really: we can combine them. The rules array that we're passing in the @auth directive uses OR logic. So this is saying that in order to access an order, you need to be authenticated as the user that placed that order, or you need to be an admin.

To go back to the issue where we're returning an error because the client has requested information without knowing that it doesn't have access to it: to get around that, we can use the where rule. Here we're saying that in order to access customer information, the username of the customer node needs to match the sub claim in the JSON web token. This means that a predicate will be added to the generated Cypher statement to filter out any data that the user doesn't have access to. So here we're authenticated as a user, and we're asking for all customers and their orders. Because we have that where authorization rule, the generated query only includes data that the authenticated user has access to, which is just the currently authenticated user's own information.

Bind is useful for mutations. It allows us to define a rule that must hold when we are executing a mutation. Here, for a review, we're saying that the author of the review must be the currently authenticated user. We don't want you to be able to create or update reviews and then say that someone else wrote them.

We can also use authorization rules on @cypher directives. There are a couple of ways to do that. One is to apply rules like isAuthenticated and roles; those will still be checked before executing that Cypher statement. Or we can access the JWT payload in our Cypher statement as Cypher parameters. Here we're using the isAuthenticated rule on a Cypher statement for a books-for-current-user field. It doesn't make sense to try to look up books for an unauthenticated user, because looking up the customer by username would give us an error. You can see here how we're accessing auth.jwt.sub. This auth.jwt is a special parameter that lets us inspect the payload of the JSON web token that we attached in the header of the GraphQL request, now decoded and injected into our Cypher statement for adding custom logic. So we look up the customer by username, and then we traverse the graph to find all books that they've ordered, find the subjects of those books, and then use the subjects to find other books that they might be interested in. So this is a recommendation query. You can see I like to use the recommendation query; it's a use case that applies across many domains. Okay, and then there's an exercise here for authorization, which we will skip as well in the interest of time.

There are lots of things that we didn't cover today. There's a lot of interesting functionality that we didn't touch on, things like using the abstract types in
GraphQL, so unions and interfaces. Those are also available with the Neo4j GraphQL library, which gives us a very expressive way of working with abstract types. There is an entire OGM, or object graph mapper, available for programmatically building database queries, where we use GraphQL type definitions to define the models that we work with. We talked a little bit about relationship properties. We talked a little bit about aggregations, and I mentioned some of the other directives that are available. If you're interested in seeing more of the functionality, browsing the documentation linked here can be helpful.

Just to reiterate some of the problems that come up with GraphQL: performance and the N+1 query problem, and developer productivity and boilerplate. Hopefully you can see how those are addressed by Neo4j and the Neo4j GraphQL library. There are a few resources linked at the end of the slides here. Earlier I mentioned I wrote this book, Full Stack GraphQL Applications. It's available as a free download at this link, which I'll share in the chat. And again, the book goes into a lot more detail on not only how we build the back end of our application, but also how we integrate a GraphQL API that we've built into our front-end application. We go into a lot of detail on things like the authorization functionality and how we get that to work with Auth0, cloud services, and so on. So it's definitely worth a download, worth a flip through, if you are interested in the full-stack GraphQL aspect. This is a good landing page for all things related to Neo4j GraphQL. There are a few resources linked there, as well as some examples that explain some of the functionality that we talked about today. And then, of course, slides, links to the code, and links to the docs as well. So thanks so much, everyone, for joining today, and we will see you next time. Cheers!
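For reference, the earlier point about JSON web tokens, that anyone can read the claims but only the signing key can vouch for them, can be sketched with Node's built-in crypto module. This is a minimal illustration that assumes an HS256 (HMAC-SHA256) token with a made-up secret and made-up claims; the function names are hypothetical, and a real application would more likely use a maintained JWT library and also check things like expiry.

```javascript
const crypto = require("node:crypto");

// Encode a string as base64url, the encoding used for each JWT segment.
const toBase64Url = (text) => Buffer.from(text, "utf8").toString("base64url");

// HMAC-SHA256 over "header.payload", returned as a Buffer.
const hmac = (data, secret) =>
  crypto.createHmac("sha256", secret).update(data).digest();

// Hypothetical helper: build a signed HS256 token from a claims object.
function signToken(claims, secret) {
  const header = toBase64Url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = toBase64Url(JSON.stringify(claims));
  const signature = hmac(`${header}.${payload}`, secret).toString("base64url");
  return `${header}.${payload}.${signature}`;
}

// Reading the claims needs no key: the payload is just base64url-encoded JSON.
function decodeClaims(token) {
  const payload = token.split(".")[1];
  return JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
}

// Trusting the claims does need the key: recompute and compare the signature.
function verifyToken(token, secret) {
  const [header, payload, signature] = token.split(".");
  const expected = hmac(`${header}.${payload}`, secret);
  const actual = Buffer.from(signature, "base64url");
  return (
    actual.length === expected.length &&
    crypto.timingSafeEqual(actual, expected)
  );
}

// Example values are invented for this sketch.
const token = signToken({ sub: "user-1", roles: ["admin"] }, "example-secret");
console.log(decodeClaims(token)); // claims are readable without the secret
console.log(verifyToken(token, "example-secret")); // signature check with the right key
console.log(verifyToken(token, "some-other-secret")); // signature check with the wrong key
```

The sub and roles claims here correspond to the values that the allow, where, and roles rules compare against, which is why the server only trusts them after the signature check passes.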