
106 min
Relational Database Modeling for GraphQL
Workshop · Free
In this workshop we'll dig deeper into data modeling. We'll start with a discussion about various database types and how they map to GraphQL. Once that groundwork is laid out, the focus will shift to specific types of databases and how to build data models that work best for GraphQL within various scenarios.
Table of contents
Part 1 - Hour 1
a. Relational Database Data Modeling
b. Comparing Relational and NoSQL Databases
c. GraphQL with the Database in mind
Part 2 - Hour 2
a. Designing Relational Data Models
b. Relationships, Building Multi-join Tables
c. GraphQL & Relational Data Modeling Query Complexities
Prerequisites
a. Data modeling tool. The trainer will be using dbdiagram
b. Postgres, albeit no need to install this locally, as I'll be using a Postgres Docker image from Docker Hub for all examples
c. Hasura
Transcript
Intro & Prerequisites
Welcome to today's workshop. I am Adron. This is relational modeling with GraphQL in mind. There are a few prerequisites pointed out. Of course, nobody has to follow along. However, I have this material very much oriented toward speaking about particular topics and then actually working through the content with all of you. So there's going to be some coding. There's going to be some design, and then there's going to actually be some spinning up of services and things like that. So if you do want to follow along to get the most from this workshop, you can absolutely do that.
And I put here that I'll be using Postgres on Docker a little bit, but I have actually made it so that I don't even need to use that. The only thing that you'll need is an account, which you can sign up for even right now, it's pretty quick to do so, with dbdiagram and Hasura and prospectively Heroku. Heroku will be providing the Postgres database backend for Hasura when I get to that part of the workshop, and there will be time to sign up for each of those quickly, if you haven't done so yet.
[01:44] I've split this into two parts. We'll go through the first part, get a database schema built, design out some things. Thank you, Daria. And then we will take a short break, basically a bio break, before getting into part two. Part one, again, relational modeling: we'll build ourselves out an actual schema, something like this; that's dbdiagram, by the way. So, that's what we'll be working in. And then in part two, we'll get more into the relational data models. What we get when we do multi-join queries and the relationships within the database, but also what we would get using a tool like Hasura, which I'll be using to show how it sees relationships and how it builds those together for the GraphQL queries, based on the relationships and the modeling of the underlying database.
That is what the Hasura tool looks like. We'll be using primarily the GraphiQL section, which makes it super easy to build out the GraphQL queries, and we'll look at some of the other parts of it, like the Explorer, the Code Exporter, things like that. We'll also be looking at the Data section, to actually interact directly with the database for some of the design and everything, and to execute some of our initial SQL that we'll write out.
[03:22] And I may use the Hasura CLI a little bit. It's a great tool to get if you're going to use Hasura and build yourself a development workflow around it; definitely get the CLI too. It's got a lot of great features around doing migrations, building out the metadata, and just working with and interacting with the server in an automated way, or being able to build an automated pathway for your tooling.
Database Concepts
[03:49] All right, we're going to start off with relational concepts. I will mention a few things about NoSQL databases. Primarily though, this entire workshop is going to be relational database oriented, so focused on the ideas behind that, but I'll mention some of the others because they are quite prevalent these days, there are a lot of shared concepts between them, and it helps to know the difference between those concepts and the concepts in the relational models. So I'll mention that in a few parts, and the things that I'll mention are primarily focused around the columnar and document-based distributed databases like Mongo and Apache Cassandra, and I'll mention a little bit about graph databases like Neo4j, Titan, and the other ones. I'm not going to bring up time series. I'm not going to bring up some of the other NoSQL niche databases at this point. However, if you are curious, be sure to follow me and sign up for future workshops and events at Hasura and under my accounts, which I will provide at the end of this. So no reason to worry about those right now. I do a lot of streaming and other things like that around all sorts of database types, so I've done a lot of work in that and love to share it.
Schema Design
[05:17] All right. So, starting off in the relational world, to cover the standard way in which things are designed for relational databases: we have schema design, which lays out the relations between tables. Now, in a relational database, these are the tables. These are the core elements within the database that are going to be used to store data. You can think of a table almost like a spreadsheet; it's going to store all of the things that we put in, and it does so by splitting things up into columns. So the source table here has an ID, a date stamp, name, URI, details, and they each have a data type: timestamp, text, et cetera.
The tables then have relationships, which are of various types. In a relational database, you primarily have a one-to-many relationship, a many-to-many relationship, or a recursive relationship, i.e., a table related back to itself. If you look at the NoteJot table there, you'll see on the far side that's actually a recursive relationship line being drawn back to it.
[06:46] If you're ever working with graph data, graph databases, you will notice that graph data is oriented toward the edges and nodes of the data and how it connects, so relationships vary and can occur in many different ways beyond just one-to-many and many-to-many and things like that. But in a relational database, it's focused around one-to-many, many-to-many, or many-to-one, depending on which way you look at the relationship.
Here, I've connected these tables: the source ID, you'll see source ID in source notes here, is connected back to the ID here to draw that relationship. In the tool that we will be using, dbdiagram, the primary key is designated with a one, and an asterisk designates the foreign key side, which makes this a one-to-many relationship between those tables.
[07:53] There's another where I pointed out directly what that is. So this NoteJot goes one-to-many to source notes. And what I have created in this particular situation shows a many-to-many relationship, because of that middle table: I can relate sources to NoteJots, and there can be one or many sources for each NoteJot, or one or many NoteJots per source. That's what a many-to-many is, where you can relate that back and forth, but there's no limit to how many of one thing can be related to the other thing.
With the recursive relationship drawn on NoteJot here, it means that a note, which I've called NoteJot, can be related back to another note; whatever it's related back to would be the parent of that note, and the ones that are related to that parent would be the children of that note. In that sense, you can draw out a recursive relationship with that data. The easiest example of a recursive relationship, shown in computers all the time, is a folder tree within your directory structure on your computer. That is a recursive relationship. You have as many folders as you want, as many files as you want in the folders, generally speaking, and it just nests inwards as you go. So that's a recursive relationship, and you can simulate the same kind of thing in a relational database doing the same thing there.
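As a rough SQL sketch of those two patterns, assuming Postgres and the table names used in this model (columns trimmed to just what's needed to show the relationships):

```sql
CREATE TABLE source (
  id UUID PRIMARY KEY
);

CREATE TABLE notejot (
  id      UUID PRIMARY KEY,
  -- Recursive (self-referencing) relationship: a note can point at a parent note
  note_id UUID REFERENCES notejot (id)
);

-- The join table in the middle is what gives source and notejot a many-to-many
CREATE TABLE source_notes (
  source_id UUID REFERENCES source (id),
  note_id   UUID REFERENCES notejot (id)
);
```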
Building a data model
[09:35] All right. So, building a data model. This is what I'm going to build: somewhat of a minimal database. We were just looking at these three tables, Source, Source Notes, and NoteJot, but here there's a parent table called Connection and then three others called Formatter, Schema, and Action. I'm going to show you in dbdiagram how to build a structure like this, and we'll talk through some of the particular pieces of this model; I'll elaborate on why I picked certain data types, certain column names, et cetera, as we go through this.
All right. So open up dbdiagram, if you haven't already. When you go to the site at first, you'll probably end up here, and you'll need to create an account. So go ahead and do that if you haven't already. If you have, when you click Go to App, it shoots you right over here like this. You saw that ghosted schema that it showed; generally speaking, a great way to work with dbdiagram is to use one of the reference diagrams that it gives you. If you scroll over here to this dropdown and just click Load Sample, it'll load the sample for you. And it gives you information in the markup, over on the left-hand side of the screen here, about how to draw out all the relationships and build the tables in this tool.
[11:19] You can see, here's a primary key with an auto increment value. They're using a varchar here, a timestamp here, and I'll get to more of that in just a minute as I start to work through this. It also shows some of the references: just right here is a many-to-many relationship from country code to countries, and then merchants to countries. Merchants can have many countries, countries can have many merchants, and the respective country codes, et cetera. And there are some other reference points in here also. So this is a great tool, and we're going to be using it to build out our initial schema. So let's do that. I'm going to just click on the dropdown and click on a new diagram.
And then over here, I'm going to start my first table like this, and I'm just going to call it source, open close brace, and then let's see, we'll go ID, UUID. So a UUID is a 128-bit (16-byte) integer value. Usually it's represented with dashes in it to make it a little bit more readable. Every database has it. It's one way to ensure that your ID is unique, unlike the auto increment, where you can run into conflicts, because it will go 1, 2, 3, 4, 5, and then if more than one person is trying to insert something, you may run into a conflict where somehow they both get the five, or five gets overwritten or five gets deleted and then someone else tries to add it. And there might be a collision between those, because they're not particularly unique numbers. Someone may end up with the same ID in that situation.
[13:14] A UUID, however, if you use a UUID generator: every database has a function or something that will allow you to create one automatically, and in most code bases, whether they're using JavaScript or Python or whatever, there's something that will enable you to create a UUID. So you can insert without worry of conflict, and you can reference without worry of needing to de-duplicate if, for some reason, somebody turns off the key around the UUID that keeps it unique.
So in this case, I'm going to start off right off the bat setting this UUID as the primary key, PK. Then I'm going to add a stamp. This is something very important to data modeling: it's a good idea to have a default set of columns that are basically duplicated across all tables and that offer you audit capabilities, especially when working in an enterprise environment where audits happen on a semi-regular basis. It's always great to have certain things that the audit will require, such as a timestamp to represent when the data was last changed, and other pertinent information that may be required by the audit, like the user who made the last change. It gives a path, a trail that the audit can follow, so that you know all of the things that have changed, altered, or manipulated the data in any way throughout the database. Some of the default things that I always add are a timestamp and, often, the system user, not particularly the database user, because a lot of the time the database system will keep track of the user that's changing the data anyway.
[15:11] So if you want to make it easy to be queryable in the tables themselves, you want to add a timestamp and you want to add a user, maybe the system user or the database user, depending on whatever your requirements are. In this case, I'm just keeping it simple: I'm just going to have the timestamp and not get into the users and everything. So that's what that is for. And again, the database has a function which we can use to have it timestamp it. It puts the timestamp in the column, so we don't have to manually enter that or derive it from code or anything like that. It can be overridden, but the database can do it for us too.
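Putting those pieces together for the source table, a minimal SQL sketch of those defaults in Postgres might look like this (gen_random_uuid() is built into newer Postgres versions; older ones need the pgcrypto extension):

```sql
-- Sketch: server-generated UUID primary key plus a default audit timestamp
CREATE TABLE source (
  id      UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
  stamp   TIMESTAMPTZ NOT NULL    DEFAULT now(),
  name    TEXT,
  uri     TEXT,
  details TEXT
);
```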
I think I answered the UUID question. So if you have any other questions about that, feel free to ask and I will get into that more. And we'll also look at an example, so beyond just talking about this, you'll see the database generate one when we enter some data into the database shortly.
[16:16] All right. So the next thing in source: I want to do name, so put text, and then URI. For the name, I'm thinking of this table as something that provides sources of where data is at, or a reference to where data is at. Think of it as something you had to look up in a dictionary or in a library or something like that. But in this case, it's the internet. You've looked it up on the internet and you want the URI.
So the actual location, the uniform resource identifier, and you want a name so you can give it a friendly name that we humans can read and understand what it is. And then finally I'm going to add details, and details will basically be something like notes that we might keep about a particular source that we've collected for some reason. Like, if I find an awesome page on GraphQL and React or something like that, I'd put the URI in there, add a friendly name, and then add more details about what it is specifically. So basically a very glorified table of bookmarks.
[17:25] Then, let's add one for source notes like this. I'll add a source ID. The reason I named it that is because it's not the ID of this table; it's the ID that references back to another table that this table will be related to. So I'll do a UUID, because it needs to be the same type as the table that it references back to, and then I'm going to add a reference. In dbdiagram, that's ref: then a chevron, then the name of the table that it's related back to, .id, the column that it's related to. So there's that, and then details, text, and then stamp. I'm going to have a timestamp for this table too.
There was a question from Nivertech: what is faster, UUID or auto increment? If you're talking about a large volume of data being inserted, you're going to win out with UUIDs. And I'll tell you the reason: if you have one system, only one system, with just that one system's throughput writing inserts to a database with auto increments, it can be pretty fast; auto increments can be fast in all sense of reality. However, often if you have a lot of inserts, you also have a lot of clients trying to do the inserts. They're going to get very slow with auto increments, because they're going to run into collisions and they're going to have other conflicts come up as you work through those inserts. If you also have updates and deletes, even more conflicts. UUIDs are not going to have those conflicts. If you're really worried about insert speed though, don't have the database call any functions to create UUIDs or to do auto increments; create the unique identifier purely on the client side, make the client do that work. However, again, that can be complicated in nature, because you're taking something that is going to be a database reference, specifically the UUID in primary key/foreign key relationships, and you're putting the onus onto the client, which is not the database and doesn't know why it would be creating this.
[20:12] You remove a lot of the conflicts and issues at the database level, but you also remove some of the database's responsibility. I like to have the database use a database function to create the UUID. But sometimes you do need to have the client code, the client, whatever is creating this data, actually create the UUID. As far as performance goes, though, the fastest way is to not have the database do it. If you have one client writing data only, auto increment is pretty fast. If you have multiple clients writing the data, you need to move to UUIDs regardless of the performance.
The other thing I would add is the difference between a UUID function and an auto increment function in almost every single relational database is so negligible that it shouldn't harm your inserts.
[21:08] The other thing to do is to take into account batch inserts, if you actually have large volumes of data to insert into the database. So those are some of the things to take into account there. Unfortunately, the answer isn't straightforward: it depends. It depends on how you're going to write that data into the database.
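As a quick sketch of the two options in plain SQL, assuming the source table above (the client-supplied UUID is just a made-up example value):

```sql
-- Option 1: let the database generate the UUID via the column default
INSERT INTO source (name, uri)
VALUES ('GraphQL docs', 'https://graphql.org');

-- Option 2: the client generates the UUID up front and supplies it explicitly
INSERT INTO source (id, name, uri)
VALUES ('0b0f3f3e-9d2c-4a1b-8e5f-6a7b8c9d0e1f', 'GraphQL docs', 'https://graphql.org');
```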
All right. So we have source and source notes, and now I'm going to get the note taker thing in there. We're going to add NoteJot like this and again we'll have an ID in there, UUID and we'll make it the primary key. And then add a stamp again, like that timestamp. Then let's add the note ID. And this is that elephant ear, that recursive relationship back to the table itself, NoteJot.id, and you see it pop into the diagram there to the right. Really cool feature of this is that it draws it as you do this.
[22:19] I'm going to add details and then let's go back to source notes and we'll relate this. Let's put note ID UUID reference NoteJot.id. So now let's move this around so we can look at it better. You can see, we have that many-to-many relationship and that recursive one over here.
So that was the first part of the schema that we looked at when I discussed the specific details. Well, the details for the note would be the body of the note itself, like if you wanted to have a recursive, wide range of notes around a particular source. And in the source, the details is specific to just the correlative thing that's held in that table, in this case the URI only. Also, it's just an example. I probably wouldn't have details in both, unless there was a very specific reason to have them tightly coupled to each of those entities like that. But in this case, it just seemed like an easy thing to add that could be of use as this is fleshed out. Like, when someone starts to do application development against it, they could prospectively use the details in both of those, related specifically to the note or the source.
[23:56] All right. So next thing, I want to create the connection. Let's see, we'll do ID, UUID, make that primary key, and then we're going to have a stamp again, another timestamp. Because, again, the timestamp is for any updates or changes to the data at all. Stamp, and then let's see what we want. We're going to do two other things. So the connection, just to describe what I'm thinking with the connection: these three tables here, we'll change their color. Let's change them to this color. You can see in the markup they change over here, like this, with the header color, just so we can see these differentiated from what I'm about to do. So this is just purely the source and the notes about the source that are collected. With the connection, I'm going to connect the sources to particular actions.
So the idea is, if the source is, let's say, not just a webpage but an FTP location, like a file transfer location, or some other REST API location or a GraphQL location, I could set up actions and other things: let's say, tell it what type of format it is, is it XML or JSON or something like that, or is there a schema to the data, so that it could commit an action against that particular source and pull data down from it.
[25:39] So with the connection, I'll add a stamp, and then there's going to be an action, a formatter, and a schema. So let's add the tables for that, and then we'll go back and add the IDs to this.
So first, let's do table schema, again with an ID just like everywhere, we'll go PK, and then we'll go table formatter. I'm just going to copy this and put it in there, since it's the same and all, and then table, oh yeah, action. I want to do an action like that. Okay. So those are there, and now we can relate these back to here with, let's see, it's over the connection we want to do. How do I want to do this? I think what we'll do is an action ID and then a source ID. We'll need ID, we'll need that. Oh, changing the colors is a pro feature. I must have been, what's the term, grandfathered in or whatever. I'm not paying, yeah, I have the upgrade button. So I don't know. I must have gotten pulled in there somehow. Sorry about that. Reading the questions real quick. Give me just a second.
[27:22] Nick, so, this is true. However, you can deduce that 41 likely exists, but as you stated, emphasis on "likely" exists, but it may not. It may not have even ever existed. This is a thing that we don't know, because the database may have killed that and jumped to the next increment. So even though it seems like a good thing for an audit trail, it's not super reliable. And I'm not saying that "UUID" fixes this, as per your reference there, but in an audit, one definitely needs to take that into account from the audit perspective, that just because there's an auto increment, doesn't mean that there's always the next succession of numbers in order.
The database could have skipped things for various reasons. There could have been conflicts and it skipped them and it never got that entered. A whole bunch of things. A database administrator could have gone in and said: "Skip a bunch of things. Start incrementing from 4,000," when it only got up to like 399 or something. So, from the audit perspective, one definitely needs to take into account that there's not really a smooth back and forth. There's no linked list of the data flow. The best way to get a data flow is you're going to have to have a timestamp, you're going to have to have additional data, and information and logs about the table and the data in the table. But that's a good point to bring up. Good call-out there.
[28:56] All right. Here, put that over here, here, and here. And then, let's get this in. So, this is "ref." This can be "Action.Id." And this can be "ref," "Source.Id." Now, as you can see, my naming generally follows an approach with IDs where I can easily ascertain what the relationship is without looking at the physical relationship drawn in the database. I can just look at the name and know that "Action.Id" points to the action table. I can look at "Source.Id" and know that it points at the source table.
The only one that's a little weird is "NoteJot." And the reason I called it NoteJot is because one might jot down a note; that's just kind of a slang way to say it. "Note," however, I have found is often a reserved word in databases, so you can't really call a table "note." Or at least in some databases; I don't know about all of them. In that case, I called the table NoteJot to remove that conflict, but I routinely refer to that table when I'm calling a relationship just as "note-dot-something." Because at the core, it's just intended to store notes that someone would write about a particular URI.
[30:19] All right, so we got these in there. Schema. Oh, let's see here. Schema. What else do we want in there? Basically, we'll do the standard timestamp real quick. Oops. Timestamp. And then, we'll have the ConnectionId; we want to relate back to that, like so. And then, let's do a map for the schema. Because this is going to be backed by Postgres, we'll do JSON as the data type. So, that's just open-ended. It's going to describe the schema in whatever way it needs to, so that we can know what the schema is of the underlying data that's in the URI, or that would be pulled from the URI.
All right, what does it not like about my thing here? Nothing like errors while you're doing stuff live. Let's just release it for a second, and then Connection. Oops. ConnectionId. Oh, I know what I did. I didn't set the type. Derp. There we go. Okay. All fixed. And then, for Formatter, we'll go with Stamp again, and then I think we'll need the ConnectionId there, too, so I'll just copy that in. And then, we'll do "Formatter," we'll do a map of that too and just make it a JSON map, basically. Let's see here. And then, action. We'll add Stamp. Do we need the ActionId? No, because multiple actions can take place on a connection, so per the relationship in connection, it will be a one-to-many relationship to action. So that should give us what we need there. And then, action. Oh yeah, I need to declare something to be the action, so we'll make it JSON, too. So then, we can define actions and really extend and elaborate on what that would be, because I don't really know what the client system would be like here. This is just, again, kind of an idea of how to build out a model for a system that would prospectively do all these things.
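Pulled together, a rough SQL sketch of that second group of tables might look like the following, building on the source table from earlier (the *_map column names and the choice of JSON are illustrative; json versus jsonb is a judgment call):

```sql
CREATE TABLE action (
  id         UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
  stamp      TIMESTAMPTZ NOT NULL    DEFAULT now(),
  action_map JSON
);

CREATE TABLE connection (
  id        UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
  stamp     TIMESTAMPTZ NOT NULL    DEFAULT now(),
  action_id UUID REFERENCES action (id),
  source_id UUID REFERENCES source (id)
);

CREATE TABLE formatter (
  id            UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
  stamp         TIMESTAMPTZ NOT NULL    DEFAULT now(),
  connection_id UUID REFERENCES connection (id),
  formatter_map JSON
);

CREATE TABLE schema (
  id            UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
  stamp         TIMESTAMPTZ NOT NULL    DEFAULT now(),
  connection_id UUID REFERENCES connection (id),
  schema_map    JSON
);
```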
[32:56] So, there we go. We had that data model that I had previously shown in the slides, so this is how you would build that out. Now, the next big thing is for all of you to build out a model yourselves that we will then use momentarily as we spin it up in the Hasura product.
So, everybody take 10 minutes, take a bio break, grab a drink real quick, and also, if you haven't already, begin to build out the data model, which has at least some primary key foreign key, and some many-to-many relationships as we have done here. You could, if you want, basically just use this, and I'll make it super easy. Let's see here. I will paste it in the Discord. Boom. There it is. So if you want to grab that from Discord, you can have this exact data model or you can build out whatever you want. So, take 10 minutes. It is currently 9:38. We will come back at 9:48 and begin part two. Also, in this interlude, if you have any further questions or thoughts about this, or questions about how to model these things, please ask those questions and I'll cover anything that comes up whenever I get back. So, I will be back in exactly 10 minutes at 9:48. Or 48 after the hour wherever you're at. Whatever time zone you're at. All right. So, see you in just a few.
Data Modeling
[34:47] All right. Hope everybody was able to get a drink, take a bio break, et cetera. A few questions that I'll verbally go over right now.
Nick's asked: Why was I using Pascal case? Was it just a personal choice or some specific reason? In Discord, I also posted a blog entry where I talked about that a bit more, especially in the case of database naming conventions and such. In databases, I would strongly always stick with Camel or Pascal case. It doesn't always matter which one you go with. However, I personally tend to stick with the case-naming that is used in the tech stack.
So if it's .NET, I often use Pascal case, because a lot of .NET code generally tends to follow the convention of being Pascal case. With Java, for example, I often will switch to camel case. The database, the tables, the entities that might be generated through GraphQL or whatever, generally tend to follow that casing. So the coders who are writing code against these tables and these entities and other things that will be created from the database will be using the same case that they already use in their code base, and then it will just look like whatever it is for them: their objects or their entities or their tuples or elements or whatever. It encourages familiarity from that perspective.
[36:24] Oh, interesting. Nivertech mentions they use ERD diagramming and CLI tools and VS Code extensions, but it doesn't generate SQL DDL. That almost seems odd. I would check into JetBrains' tools. I would check into a number of other tools. Because many, many, many ERD diagramming tools that I have seen almost always have DDL spec that they'll export. So I would definitely check out a few new tools here or there. There's got to be something somewhere.
There was another question, too, from Melanie Berger. It says: How do I generally share, communicate, and iterate on the data model with the team? So, the dbdiagram tool, which we'll look at in just a second, actually does export of the DDL: it gives you SQL, it'll give you images, and then you can put that into the repo. Also, on the tool itself, you can share these diagrams, the actual diagram itself, in dbdiagram, out to the team via a link. And then, if you give them permissions, they can go and they can edit it or add to it or whatever they need to. A lot of the other tools that I use tend to have this type of capability. So, you can at least put an image in the repo or put the SQL in the repo. I mean, you need the SQL at some point anyway, especially when you're building out that initial design and you need to get it into the database. If you don't have that, you then need to recreate by hand all of the tables and everything in the database. That's quite frustrating, and that would be a lot of redundant work. So at this point, with dbdiagram, we'll get the SQL and then we'll export it into the tool that we're going to use, which is the Hasura GraphQL server.
Hasura.io
[38:34] With that, let's get rolling here. So, we have our database schema, so we're going to get that running and actually get it built by executing some SQL. We're going to look at some GraphQL queries around that. We're going to review some of the queries and execution plans of those queries, put some data in the database, things like that. We'll further discuss the data model and how that would or wouldn't apply to the GraphQL. Also, if you have any NoSQL questions, be sure to ask me, too, because I'm going to leave that kind of open-ended. But one of the things I will broach a little bit toward the end here is how you'll deal with what you get from a distributed database from a GraphQL perspective, because it can depend on the distributed database, or on the graph database or whatever it is, and you really have to do a lot of the heavy lifting for the GraphQL yourself to determine relationships or graphs back to things.
Like in the case of, say, Apache Cassandra, there's no relationships. There's also no joins in that database. Vastly more scalable for many reasons, because it's distributed, horizontally scalable, et cetera. But then, in the GraphQL, to get the value and the power of being able to relate the entities and such, you'll need to make those connections in whatever that server is that you'd be using for your GraphQL. So a little tricky.
[40:09] With Hasura in a relational database however, you can draw those, which makes it very, very powerful in that sense. So, before doing anything else, if you don't have a Hasura account and the requisite database backend via Heroku, go sign up right now for an account on Hasura and Heroku. With Heroku, all you need to do is go create that sign in, and then in Hasura you'll be able to do an authentication against that to get the database. Very easy in that sense. Just like a click of the button. So, if you don't have that, start signing up for those right now, and I'm going to continue just a little bit and we'll come back to that.
This is what the cloud interface looks like in the Hasura solution. We'll be clicking on New Project, which will create us a new project that looks just like that. And then, we'll use the free tier. Again, Heroku and Hasura combined give you a free database and a free GraphQL API. In it, you just enter the project name and the region, and then you go through; there's a button to click that will create the Heroku database for you, as long as you have that account authenticated, which the button will handle. It'll initiate that process and allow you to just easily finish it by clicking some buttons. And then, you click "Create project," and it spins up a project that'll look just like that. And then, we'll click on Launch Console, and that's what we'll be working in through most of the remainder of this workshop.
[41:51] And again, this is what I built and prospectively you built, or you can grab the dbdiagram code from the Discord chat. In dbdiagram, right up under Export, you can export to PDF, Postgres, MySQL, SQL Server, or PNG. Of course PNG is an image format, so you'll get an image of just the diagram, I believe that's what that is. And Postgres, MySQL, and SQL Server will all give you SQL that is specific to that relational database, which you can execute against that particular database. In our particular situation, we'll be using Postgres, because Hasura primarily works against Postgres at this time. And we don't even need to deal with the PDF. We don't need a PDF of any of this. But it's there if you need it.
I don't know if you can do this without a paid account, or whatever account I'm grandfathered in as, but I've been able to save the diagram. So that's another thing you can do. You can save it and you can share it, and it'll give people a link that points back at the diagram you just created.
Executing SQL in Hasura
[43:07] In Hasura, when we get logged in... And don't worry, I'll be navigating through this; I'm just showing you a quick overview of what we're about to do. We'll go in and we'll paste the SQL code that we get under the Data tab. Under the SQL section of the Data tab is a place where we can just paste in raw SQL and then execute that raw SQL. A little bit more about that in just a second. We'll do that and we'll end up with a database. We'll be able to draw connections, look at relationships, look at the entities, execute GraphQL queries against it, and do whatever we need to do at that point.
We'll also take a look at the query analysis. As we execute some GraphQL queries, we'll be able to see those and look at the SQL that is actually being executed by the GraphQL server. It is very important, as with any data schema and data diagram, to know how the server's actually executing the SQL, how it's executing the query, whatever that might be. In this particular case, it's just a basic query against something that I had executed. Below the generated SQL, you see the execution plan. Because even if you have the SQL statement, you don't particularly know how a database server will execute something unless you look at the execution plan. The execution plan shows you the timing and other characteristics around that. Postgres has it. Oracle has it. SQL Server has it. Every relational database has some way to look at the execution plan. So, Hasura provides the capability to look at what SQL statement it's created, and it provides the execution plan just below that. So we'll look at that and talk about that somewhere.
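Outside of the Hasura console, you can ask Postgres for the same thing directly; a minimal sketch, assuming the source table from this model:

```sql
-- EXPLAIN ANALYZE runs the query and reports the plan plus actual timings
EXPLAIN ANALYZE
SELECT id, name, uri, stamp
FROM source
WHERE name = 'GraphQL docs';
```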
[45:00] All right. To the action. So if we go back over here to Export, Postgres. And even if you didn't do any of the steps so far, I have an easy cheat for you. It'll get you right up to where we are at. I'm going to copy all of the SQL into the Discord chat. So in two seconds, one, two, boom, all of it is now in the Discord chat. So even if you didn't do any of that, you can still follow along from this point, if you would like to.
Now, I just want to cover a few tidbits, little pedantic details around relational databases, especially if you want to speak in great, minute, pedantic detail about a database. These are tables, as I have mentioned. The things in the tables are columns. The data that is stored in those columns within a table, a single row of it, is actually referred to as a tuple. So each single row is considered a tuple. Now, in programmer speak, that could be a little confusing, because in some languages a tuple may be a thing with two or three values or whatever; it varies just a little bit across programming languages. But in databases, relational databases, a tuple is the row.
[46:42] Okay. The relation schema is the schema with all of these relationships here. A schema could be just tables without the relationships, so sometimes you hear it very specifically stated as "relation schema" versus just schema. The degree is the total number of attributes. The individual values within a row, those are attributes, and the number of attributes in a relation is called the degree of the relation. The cardinality, that's a word that comes up quite often with relational databases. The cardinality is the total number of rows present in the table. So high cardinality would mean a lot of data around various things. A column, which I've stated, that's these. Let's see. What else? Making sure I haven't missed anything. I'm going through a little punch list in my head here.
The relation key. Sometimes things are referred to specifically as, "This is the relation key." And the primary key or the foreign key, this other thing that it's related to, are both referred to generally as a "key." Now, from a performance perspective, a primary key receives an index automatically, and foreign key columns are usually worth indexing as well; you can add indexes at the database level to other columns, too. I'm not going to go into that, but it's something that's very important to know, especially in relational databases, because indexes can dramatically improve the performance of scans, which in turn, as we look at these queries, you'll see in the execution plans what the scans are doing. So if you see something taking a lot of time on a particular column, like if you say "something-something based on this thing," you can often improve that performance by ensuring that the particular column you're doing that scan against is indexed.
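A tiny sketch of adding such an index by hand, assuming the source_notes table from this model (the index name is just a made-up convention):

```sql
-- Index a frequently-scanned foreign key column so lookups and joins are faster
CREATE INDEX idx_source_notes_source_id ON source_notes (source_id);
```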
[48:57] Yes. Nivertech points out: there's also a cardinality of the distinct values in a specific column. So cardinality is one of those words that can mean a thing or many things, depending on the specific context you're talking about. And it is quite often used; you'll see it in newspaper articles, you'll see it in database articles, et cetera. So it's important to note what it means in a general sense, and then specifically what it's applied to in the particular context it's used in. It's the same thing with tuple. And I bring that up because as you go from the development tech stack in JavaScript or C# or Java or Python, or whatever language you may be using, to the database and then back again, you may see these similar words, which have either very distinctly different meanings or just slight variations on the meaning. It helps to know the difference, so that you know what the context of it is with either the tech stack or the database.
All right. I think that covers the basics of just some of that pedantic detail there. All right, so we have the SQL, and hopefully you have created your Hasura account and your Heroku account. I'm going to go in and we'll create ourselves a new project. And I'll call this the "awesome-workshop-project." Just going to click on Free Tier. Select the region. The free region is Ohio. There are a lot of other regions, depending on where you're at, that you might want to use, but I just go with the free tier, because then all of us can use it and we can do exactly the same thing and get it to work. I have made the name too long. I should know better than that. Let's just call it the "awesome-workshop."
[51:00] There we go. Okay. So then, go to "Database setup." Going to click "Try with Heroku." I'm going to click the Heroku button. Right here, if you haven't done so already, this is where the authentication would begin. You'd click "okay," et cetera, and then it would bounce through and create the Heroku app, install Postgres, get the database URL, and put that into here. Oh, quick note: you can take this database link and use it in a Postgres connector in any tool that you may be using against Postgres, if you're doing that. So it's helpful to know that's what that is. And click "Create project." Did I just delete something? Well, I'm just going to quickly refresh. I'll have to do that over again.
There we go. I don't know. I think when I highlighted that, I might have messed up the link there. But anyway, here we are. While we're in this, just some quick context while it's generating the server: it's useful to know where the environment variables are. If you do this, one of the first things you often want to do is remove the console from the live view, because literally anybody, like everybody right now, could go to each other's console if you're creating this, because it's initially created wide open. What you do is you just add a new environment variable, and you go down, and... Well, I'm going to type it because it'll be faster. Yeah, enable console, the HASURA_GRAPHQL_ENABLE_CONSOLE variable, and you'd want to set that to "false." I'm going to leave it enabled for now, just so we have something to work through here.
[53:01] Now, looks like we're all set. All good. And "Launch Console." There we go. So, right now it's at the GraphiQL view and we have nothing in here. If you click on Data, it'll go and show that we just have these two schemas in our database. The "hdb_pro_catalog": unrelated, we don't have to deal with that at all. "Public" is just our general public schema. And of course, there's nothing in here, because we've created nothing here. It's an empty database with a respectively empty GraphQL API server. But click on "SQL" and we'll change all of that. So, I'm going to literally copy the SQL from the Discord chat, so that if I goofed up on my copy and paste earlier into Discord, I'll catch it, too. And I'm going to paste it into here. There we go.
So, as you can see, it does a nice little color coding for us. We have all our notes and everything, our types, et cetera. So, we should be all set, right? Another thing: you can unclick, or click, Track This. Click on Track This, so that it will initially track all these tables and start to add the requisite metadata into the Hasura server, so that it will know what GraphQL queries and capabilities and other connections we want. So, I'm going to click run. And in a few seconds, we'll have a database.
[55:00] SQL executed. And you'll note to the left here: action, connection, formatter, NoteJot, schema, source, source notes all just popped up as they were created. Now, click on Data one more time and it'll leave the SQL screen and go to the general data screen that you usually see when you log in. Now, there are some foreign key relationships that are not currently tracked. So, it tracked the tables, but it did not track foreign key relationships. That's one thing that we're going to want to do. But, before we do that, let's take a look at the relationships, to see what we get here. Action, for example: if I click on action and then go over here to Relationships, you'll see that the relationship is not currently added. So, don't click anything yet. We'll go back to that.
The other thing you can do is... Let's say source, go to source. We've got Insert Row, Modify. This modifies the table itself. We have, again, the relationships, and the two connections that it believes we may want to infer, so that the GraphQL entities will know that they're connected in that way, right? The database knows, because we drew the database connections and the primary/foreign key relationships in the SQL. But, the Hasura server hasn't made that connection in its metadata, to strongly connect those just yet, because it's giving us the option to do so. But, it sees in the database that those are obviously there, because they are actual connections within the database itself.
[56:38] So, let's go back... Just click on Data, and then click on Track All. It's going to add all those relationships. Should just take a second or two here. How's the font size, by the way, everybody? Is the font size okay? Should I bump up the font size? Any other qualms of that sort? Just let me know and I'll see if I can tweak it up and down, or anything. Because I know some of these fonts get a little small on the 1080p screen size.
All right, so the relationships are added. Now, if we go into, let's say, action, and go to Relationships, you see, now it's an actual relationship. There's not an Add button anymore, there's just an Edit. So, you can click that and actually change the relationship if you wanted to, but we don't need to do that. However, there are a few other things that we do want to do in this database that were not done before.
[58:06] This is a point where, if I was doing this from a team perspective, from a workflow perspective, right? With trying to get everything into the repo, as Melanie mentioned earlier: how do you share all this stuff with the team? At this point, this would be the first point where I would immediately set up migrations to automate the creation of, or changes to, the database. However, I'm not going to do that today, because we're focused on a different topic. But, it's very important that that's started as soon as possible, so you can have a clean flow of migrations from your version one database to your version two database. And they can stay in sync with your software development, or your report writers, or whoever's using your database and executing GraphQL queries et cetera against it. You want to keep that in sync with them, so you can always give them the version spec and they know exactly what they're getting in the database.
So, if we look at relationships, that's good. But, the other thing is, let's go to Modify. You see, this is UUID, primary key, and it is unique. That is what we want, but we want to change the default to the random UUID function, so that the database creates the UUID for us, okay? Doing this in Postgres, as well as many other databases, you get the UUID; the database generates it for you. However, that doesn't prevent you, on the client side, from creating a UUID if you need to. And this is a very cool thing that you're able to do with UUIDs that you cannot do with auto increments. It's very important, especially if you have a system with a lot of users reading and writing data, that's going to have relationships between the primary data created and the foreign key data created that may need to be written into multiple tables, but you don't want to do a bunch of trips to the database, because you want to speed up the performance of the website.
[01:00:15] So, the way to do that is you have the website create that initial primary key value. In this case, let's say source, okay? As soon as they create that, it's set up, it's ready to be saved. But the user also wants to do some other things, like create the action, the formatter, or other information like the notes around the source, at the time of the creation of the source itself. And if that's the case, and they're still creating that data, like you have all of this stuff on a form where they can create these things, you may want them to create more of that information, and then write all of it at one time to the database. The only way to do that is to use UUIDs. Then, when you draw the primary/foreign key relationship on the client, as they create that data, you'll have created that UUID to pass to the database, so that the database correctly draws the correlations between this data that's created on the client side.
You can push all of it at one time. So, you have basically a batch insert against a multitude of tables, and UUIDs enable you to do that. Whereas auto increments would not allow you to do that, because you may run into conflicts as multiple users are trying to write things to the database. And it would necessitate a round trip every single time: you would write the insert, you would have to ask for that auto-incremented ID back, then take that for the foreign-key-related data and use it, and then send that data in. So, there's that whole round trip that is required in those situations. Whereas with UUIDs, it's not necessary.
[01:02:01] This is also very, very important with GraphQL, because if you take a complex entity, and you have the related elements within that entity of entities, and you try to write all of it back, you may just have the pertinent things you need, but you want to minimize the trips to the database and the looping back and forth that occurs. So GraphQL will hide that, but as the database administrator, the person who's managing the database, you'd want to make sure that it's not doing any more round trips than is necessary. And that is fixed by using UUIDs as your primary key/foreign key relationships throughout your data model. So, I'm going to save that with the generate function there, so that way I don't always have to get that from the client side if I don't want to.
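As a rough sketch of that single-round-trip idea in plain SQL, assuming the tables from this model (the UUIDs below stand in for values the client generated up front):

```sql
-- One trip to the database: the client already knows every id it needs
BEGIN;

INSERT INTO source (id, name, uri)
VALUES ('11111111-1111-4111-8111-111111111111', 'GraphQL docs', 'https://graphql.org');

INSERT INTO notejot (id, details)
VALUES ('22222222-2222-4222-8222-222222222222', 'Great reference for schema design');

INSERT INTO source_notes (source_id, note_id)
VALUES ('11111111-1111-4111-8111-111111111111',
        '22222222-2222-4222-8222-222222222222');

COMMIT;
```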
And then, for the stamp, I'm going to take the default and I'm going to go with now. Notice how Hasura gave you the auto-complete, and it gave it to you based on the data type. Very, very cool. Partially it's from the data type, partially the naming; it pulls it off of that. So it helps a lot when you're trying to go through and make some quick changes, to get the auto-completes that are specific to what you're trying to do.
[01:03:27] So, all right, we've got the ID and the stamp in there. So, clock_timestamp, I think that's dependent on database type. See, I don't actually know the specific difference between those. Do elaborate if you know, Pravin. I just use now, because it tends to be the default function in other databases also, which I like. And Nivertech asks about the upsert pattern with UUIDs, or maybe idempotent mutations based on a UUID primary key. So, upserts are interesting. And depending on the database, they behave differently, because this is definitely a good NoSQL question.
In Apache Cassandra, upserts can be dangerous. Because you may have an ID which is your primary key, and it may be multi-column, because the primary key can have multiple columns; it's called a, what's it, a composite key. So, if you have that, and one column is a UUID, one column is a name, and one column is an email address, and then you try to insert a new record with the same name and email address, because you just want to update the name and email address: if you're using a UUID, you end up with a duplicate, because the UUID would be generated differently and would be a different value. So, even though the name and the email address are the same, it would make a new row in the database. That's one of the problems with upserts.
[01:05:32] And upserts aren't a separate operation that occurs; there's still just an insert occurring, but it's referred to as an upsert, because you specifically have to map what your ID is in the upsert, to ensure that the update that you're expecting is actually occurring. It's also probably good to mention that updates are a bit of a myth within databases, because what's actually happening on the database side is a delete and an insert whenever you do an update. When you do an update, those two things occur. So keep in consideration that you're doing two actions every time you do an update in a database.
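For contrast, in Postgres an upsert is spelled out explicitly and you have to name the key it conflicts on; a minimal sketch against the source table, assuming (just for this example) a unique constraint on uri:

```sql
-- Requires a unique constraint or index on source.uri for ON CONFLICT to key off of
INSERT INTO source (id, name, uri)
VALUES (gen_random_uuid(), 'GraphQL docs', 'https://graphql.org')
ON CONFLICT (uri)
DO UPDATE SET name = EXCLUDED.name, stamp = now();
```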
Oh, what else is there to mention about that? There's a lot. There's a lot to think about in database design and development. So, yes. Back to the database, specifically here around these tables. So, in actions I've updated that. I'm going to update source, because I want to play around with source a little bit, to show some things. So, let's actually get this and add the UUID there and we'll add the stamp.
[01:06:57] All right, there we go. And then, let's go ahead and do it for... Source notes, doesn't need it, because in every situation you would always be taking the source ID and the note ID to draw this relationship, as you do the insert in the source notes. So, no function necessary there. However, in stamp you want to do now also...
Excellent. So, what is the difference, as Pravin asked, between now and clock_timestamp? So, quick little side diversion here on good old Stack Exchange. So, the SQL-standard functions return values based on the start time of the current transaction: current_timestamp and transaction_timestamp() are equivalents, and now() is the traditional Postgres equivalent to transaction_timestamp(). clock_timestamp(), on the other hand, gives you the actual current time. So, there we go. Good to know. So, now is basically a wrapper to the transaction timestamp, just like now is often a wrapper in other databases to those things. It's a very good question though. And to elaborate on the concern that the question is often derived from, you need to make sure...
[01:08:20] Did I? Oh, there, okay. Thought I lost my tab for a second. But, you need to make sure that whenever you're using any type of timestamp, it's stamping it in the zone, in the format and with other details that you expect and, you know the criteria of that. For example, is it using a time server? Right? Or, is it using the local database time server? Or, time clock, right? The clock of the local server. Is the local server in your time zone? Or, is it on UTC? Or, is it even set to UTC, but in a different location? There's a lot of different variables around times and timestamps, you need to take into account.
Especially, even more so, if the timestamp is being used for a legal audit, that has to be accurate then. So, often in that situation, you'd want to standardize on something specific, that is well-documented, like UTC, and then do any conversions to other time zones, or whatever, based on the UTC for that audit and the timestamp. So, knowing what the function does and how accurate, or possibly inaccurate a function can be, when doing a timestamp is hugely important.
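A tiny Postgres-specific sketch you can run to see that timestamp behavior for yourself:

```sql
-- Inside one transaction, now() stays fixed at the transaction start
-- while clock_timestamp() keeps moving
BEGIN;
SELECT now() AS txn_start, clock_timestamp() AS right_now;
SELECT pg_sleep(2);
SELECT now() AS txn_start, clock_timestamp() AS right_now;  -- now() is unchanged
COMMIT;
```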
[01:09:37] All right. So, we got source notes, schema, NoteJot, let's look at that one. Pop the UUID function in there. And then, I'm just going to real quick do all of them, so, again, stamp. Now, the cool thing: if you use the Hasura CLI, all of this that you do in the console would be automated and kept in migrations. So, whenever you execute your DDL, you execute your database schema, you can move back and forth between the versions that you keep; this would be version two that I'm working on now, with these extra default functions in place, right? So, with migrations and using the CLI, you can automatically go back and forth between those with a simple hasura migrate apply, and from the console you can execute from your migrations to bring it up, and then make these changes. And it writes the SQL into your migrations folder within your project. So, it's very easy, as Melanie mentioned earlier, to keep up with these things in your project, because it creates the SQL for you to have in your migrations directory, which you can put into your repository and into whatever other collateral you need for your project. Whether it's JIRA cards or something else, it's a great way to be able to do that. It makes it very easy, so that you don't have to write any SQL that you don't really need to be writing. So, really great option there.
So, next the connection and formatter. Going to quickly click through the last few here. Oops. If you have a preference too, as Pravin mentioned, with the stamps, and I just clicked on something here, oh, there we go. With stamp, if you don't want to use now, there are all those other functions. You can just go with another default; they all show up in the dropdown. So, it makes it easy to determine which one you want and to just go with that. All right. So, we got those.
[01:12:10] Now, let's go into the world of GraphQL. With Hasura, it introspects against the database, which in this case is the database that we've just created, against our particular tables. It looks at our relationships and it infers things, including what ought to be entities. In this case, the entities are generally the tables. So again, as I mentioned earlier, it's important to pick the casing; it helps to bring familiarity to the concept of the thing that you're bringing back, that you're querying against, et cetera, for the people writing code against this particular API, right? So, pick your camel case or your Pascal case accordingly.
In here, let's see. We were going to do source. Let's do a basic query and pick id, name, stamp, and uri. Of course, we have nothing in here, so we execute that and, boom, nothing comes back. However, we do have something: we've executed a query. If you look at the history, my query has been executed. If you change the name to AnotherQuery, for example, and execute that, boom, it shows up in the history too. And you can go back to these various queries and look at them like that. Very easy, just clicking on things.
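That first query looks roughly like this, assuming Hasura's default root field naming for the source table (the exact field names follow whatever casing your DDL used):

```graphql
query AnotherQuery {
  source {
    id
    name
    stamp
    uri
  }
}
```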
The Explorer, I'm going to close that for a second, and I'll close the history for a second too. Now, let's say you need to write some code that's going to execute this GraphQL statement. Click on Code Exporter. You have options here for TypeScript and JavaScript, and as you can see below, it provides the code you need to execute that, basically in copy-and-paste format. You can switch between fetch or React Apollo, and it shows the pertinent code for each. So, a super handy feature there. Let's get rid of the code exporter for a second and click on Voyager — there it goes, transmitting the space.
All right, this may be a little overwhelming at first, so we're going to zoom in. As you can see, this is a map of the metadata that was created about the database, with all the correlations and so on. There are query roots for each of the tables we created: action, action aggregate, connection, formatter, NoteJot, schema, source, et cetera. And then that maps out further. Look at the nodes: the min/max aggregates, the data types, and other things like that. You've got your connection source, and the list goes on. So it's a great way to figure out what is happening, as far as the database is concerned, in the design of the GraphQL on top of the DDL-built database schema. It's a very useful, very important place as you begin building more complex models and need to troubleshoot what you want to do with your GraphQL API, and determine what you want as far as the model, but also your entities and things like that.
[01:15:39] Then, let's do an analyze real quick, let's see what we get. That is the query analysis that I showed a little bit earlier in the slides. So, a pretty easy query, because it only executes against a single entity and in this case a single table. So, there's some nested selects, which is how the default generation of these things occurs. As you get more complex in your GraphQL queries, this becomes more complex and you can go in here and see what you're getting and executing on the server side. So, let's get into that a little bit.
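The generated statement isn't reproduced literally here, but the shape it takes is roughly this: an inner select over the table, wrapped in an outer select that aggregates the rows into one JSON value.

```sql
-- Not Hasura's exact output, just the same general shape it generates
-- for a simple single-table query like the one above.
SELECT coalesce(json_agg(row_to_json(s)), '[]') AS root
FROM (
  SELECT id, name, stamp, uri
  FROM   source
) s;
```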
First, let's close that out and go back to the Explorer. This is one of the features that I love about Hasura and its GraphQL integration. Let's go into source, then change this to a mutation, and you'll see all the inserts, updates, deletes, et cetera now. Let's go with source. We'll do an insert, and I'm not going to insert the ID, so I'll take that out. I want to do name and I want to do URI, that's it, because the ID and the timestamp will be done by the database. So let's get rid of this thing.
[01:17:07] And then Control-Space, just like in a normal GraphQL editor, will give you the auto-complete for things. I just want the returning fields from that, so I'm going to ask for the id and the stamp. That'll give me confirmation around several things: one, that the record was created when I asked for it to be created. It'll give me the ID and the stamp, so I'll know when it was done, and I'll also have the ID if I want to reference back to that particular record with whatever I'm doing in my code base. Pretty common query tactic that I use.
All right. So, first source. What's a good webpage? Anybody got a good webpage? Actually, I'm going to go to my blog and use it as a reference. So, compositecode.blog, put that there. And the name of it is Composite Crashing Code, and I'm going to add that. There we go. We have the ID and the stamp coming back from that. We're good to go.
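Put together, that mutation looks roughly like this, assuming Hasura's default insert_source root field for the source table:

```graphql
mutation AddSource {
  insert_source(
    objects: [{ name: "Composite Crashing Code", uri: "compositecode.blog" }]
  ) {
    returning {
      id
      stamp
    }
  }
}
```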
[01:18:30] Let's go to another website. Let's go to YouTube, to the HasuraHQ channel. Lots of good content out there, so a big shout-out to our content on YouTube. I'm going to take that URL and just put it in here. So now we have a few records, and boom, a new ID, a new stamp. That's great. Now let's take that ID and add a related value in. Let's see, what was it, source? And then we want to do formatter. So let's say YouTube has an API, so we'd be adding something to the formatter. Oh, that's the update, I want the insert. We'll have the connection ID. Wait, did I draw this badly? Oh, here we go. We need to do the connection first. So we'll do source ID. And then, yeah, there we go. Oops, hide that. Schema, okay, that's good. And we'll want to do... I thought I had called... Oh yeah, the action.
So let's see how we can do this. Formatter. You can see where this starts to get complicated when you're doing a multi-insert. Okay, I've actually confused myself, let me look at this again. I put in the source and I want to just add... oh, I need to add the connection to the source first. So what does the connection need? It needs action ID and source ID. I think I left those nullable; let's make sure we left them nullable. Yeah, so you can add one or the other as it's being done.
[01:21:02] To note, that's another thing: if you have foreign key columns that are required, the primary key value they point at needs to be available to insert with this record when the record is finally put together with all of its required values. And if that takes a minute, you may not have that unique identifier yet if you've just sent the primary key row to the client and, for some reason, haven't gotten the auto-increment value back or something. So that's important to note.
So that's here, connection. We need to go with... how do I stay on this? Let's go back and do that. The ID will be generated for us, and then we want to do source ID. That's what we want to add. Oh, and change this to a mutation. There we go. I'm going to get rid of this and go to insert connection. The ID will be done for us; we want the source ID, which is what I copied. Okay, there we go. We won't need the stamp, and that's just going to provide the connection. "Must have a selection of subfields" — so that's your source. Actually, let's do this.
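For reference, the insert being worked toward here looks roughly like this once the source's UUID is in hand (column names are written snake_case here; use whatever casing your DDL actually created):

```graphql
mutation AddConnection {
  insert_connection(
    objects: [{ source_id: "00000000-0000-0000-0000-000000000000" }]  # the id returned by AddSource
  ) {
    returning {
      id
      stamp
    }
  }
}
```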
[01:22:57] I feel like maybe we should create the other items first and then this one. So rather than staying on that, I'm going to skip through it and we're going to look at the execution again. We have the sources, let's look at those. There we go, we have two, no details in there; one's the blog, one's the other one. So if we do that, but then we say where... let's go with name equals HasuraHQ, like that. Now we should just get the one. Yep. It's a single-record query, so we can look right at it. Then I'm going to pull up the history, and there it is, along with the others. I'm going to run an analysis of that and we'll look at what we got. Here you can see how the where is built: public.source.name compared equal to 'HasuraHQ' cast to text.
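That single-record style query looks roughly like this, using Hasura's _eq comparison operator (again assuming the default root field name for the source table):

```graphql
query SingleSource {
  source(where: { name: { _eq: "HasuraHQ" } }) {
    id
    name
    details
  }
}
```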
It's a little weird, because this might not be the SQL you would write yourself if you were writing this query by hand, but it shows you how it's done and how these things are filled in, and you can make that determination. Then, of course, here are the three fields from this query. That query is executed, then this query is executed on that result set to show the sets that are returned, i.e. the tuples that come back. And it maps id to id, name to name, details to details, so that when you see it, it comes back just like that.
[01:25:00] So as this gets more complicated, though, let's look at one of the crazier ones. Let's say we query connection and you want to show various pertinent things: you want the action, then you want the formatter with its formatter map, stamp, and ID, and then you want the source with its name, ID, and stamp. You can see how this starts to nest. You have connection, then action within that, within the action you have connections, and then source. Now let's see if this even executes. It did. However, let's talk about this for a second.
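A sketch of that kind of nesting against the workshop schema; the relationship and field names are assumptions based on Hasura's defaults, and the point is only that each level of nesting adds more joins underneath:

```graphql
query NestedConnections {
  connection {
    id
    stamp
    action {
      id
      action
      connections {        # array relationship back from action to its connections
        source {
          id
          name
          stamp
        }
      }
    }
  }
}
```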
You can see that there are dramatically more steps to what just occurred, even though there's no data to execute against; it still has to go through these steps. There's a nested loop, a sequence scan, a nested loop, an index scan, another aggregate, nested loop, nested loop, a sequence scan inside of that, another index scan, et cetera. So now, if this is, say, one record on this join, the next one is then executed and queried against the set of all of that. You can end up with an exponential, or even a Cartesian, product out of the query. You could technically kill your server with certain types of queries like this, and you need to be aware of that so you can take precautions. For example, this connection and action: you most likely wouldn't want to query that deep into it; you'd want to keep it minimal, at a higher level, at most one, maybe two levels deep.
[01:26:49] The good rule to follow, as with relational design generally, is that you never really want to go too far down the path with too many joins. As joins are added, you see performance drop sharply because of the correlated scans that have to be done against the database. To minimize that, you keep things one level deep, two levels deep, and you can think of those levels as your joins. As you nest into an entity object like that, you increase your joins. So when you design your database schema, think about the GraphQL entities and how they will be queried. One of the things you can do to make queries faster or more easily queryable, if you know certain sets of data are always needed together, is to create a de-normalized table and, maybe with code or something else, have it filled at the same time that you fill your system tables of record.
One way to do that: if we go back to the data, let's say we want that connection with its requisite related data. We have id and stamp, but it also has an action ID and a source ID. So we might want a table that has all of the columns behind the action ID and the source ID combined. Very de-normalized, with a lot of redundant data inside it. As was pointed out earlier about cardinality, your distinct cardinality decreases and you have a lot of redundancy within that table, but you have a single table, a single entity in GraphQL, to query against. Then you can query against multiple elements within that entity, or in SQL speak, multiple columns in that table, to get faster results and to get results that are only nested one level deep, minimizing the joins and nesting within the query that has to be executed against the database.
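A minimal sketch of that kind of rollup table, with column names assumed from the workshop schema; something like application code, a trigger, or a scheduled job would keep it filled alongside the normalized tables:

```sql
CREATE TABLE connection_flat (
  id          uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  stamp       timestamptz NOT NULL DEFAULT now(),
  source_id   uuid,
  source_name text,
  source_uri  text,
  action_id   uuid,
  action      json
);

-- Queries, and the GraphQL entity exposed for this table, then stay one level deep:
SELECT source_name, source_uri, action
FROM   connection_flat
WHERE  source_name = 'HasuraHQ';
```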
[01:29:13] With that, any questions? Because that's where the bulk of performance and design work comes into play: around normalization or de-normalization of the data to change how we ask for things with our queries and with the GraphQL entities. What parts of the entity do we want at a particular time? Do we want more than is available in a single entity? Do we need to expand the entity to be more inclusive of other things? Those are the things we need to take into account. So, questions, comments, et cetera?
Links
[01:30:13] As you think of any other questions or comments, I'm going to pull up my final slide, because that's basically the material I have for this. To note, this slide has the links. I will make these slides available; I'll post them in the Discord within the next few minutes, so if you want the links, you can just grab the slides and they have the links in them. It's all on Google Slides, so no crazy format or apps needed for it. And I'm going to add a few things here. For example, me: you can reach me at @Adron, and that's for Twitter, Twitch, and YouTube also. I also stream shows on HasuraHQ around this specific topic as it relates to the Hasura tool. There's a lot of stuff out there around migrations, and I'm going to be doing more. In the very near future, in 2021, we will be writing apps with React, Vue.js, and things like that around the Hasura API server using GraphQL. So there are a lot of deep-dive GraphQL topics in there. Be sure to follow us on Twitch, subscribe on YouTube, and follow us on Twitter.
I myself am available on Twitter at twitter.com/adron, on Twitch at twitch.tv/thrashingcode, and on YouTube at the same. We also have GraphQL Asia coming up, so if you're interested, definitely check that out. And for further training on this specific topic, as well as many others around GraphQL and how to do data modeling around GraphQL, check out hasura.io/events; there's a lot of good stuff in there. And that is pretty much it. Let's see... definitely follow me on Twitter and ping me if you're interested in credits, because I think I have some pretty cool credit options I can give you. Follow me and I'll get those to you, because I don't have the code with me right this second.
Questions
[01:32:56] What are the intended use cases for Hasura? Is it mostly for enterprise CRUD apps or for consumer-facing web and mobile apps? Naibertech asks.
Yes. Isn't that the greatest answer, yes? It can definitely be used very, very easily for CRUD applications, because, as you saw as I worked through it, you can go in and all the CRUD features are there by default, out of the box. None of this was canned at the beginning of the workshop; I did this as I would do it in any circumstance, the first time, right in front of all of you. One reason I do that is to make sure that there are no little hiccups in between. I like to make sure any material I present is seamless from start to finish so that your experience is the same as what I'm showing you. I don't want any tricks in there. So you saw it right there: boom, all the CRUD capabilities are there by default.
However, Hasura does a ton more than just give you that. For instance, you can look at actions, and you can add actions that do custom validation or custom authentication; it gives you the capability to add those and then execute GraphQL against them. Remote schemas allow you to connect to another GraphQL endpoint, integrate it in, and even draw relationships between that technically unrelated GraphQL endpoint and your GraphQL endpoint here at Hasura — correlations like the relationships we saw it derive itself from the database it points at directly.
[01:34:50] With remote schemas, you can point out to those locations, pull in all sorts of other GraphQL APIs, and centralize them to provide a single reference point for your developers or report writers or whoever to access, with all the correlative relationships and everything they need to make life simpler as they develop from that single point. So that's far beyond merely the CRUD.
As far as customer-facing mobile apps: yes, I would definitely use it for that. You can set up role-based permissions, authorization, all types of things there. Basically, what's available in Postgres is extended beyond that. And as we add databases like MySQL and SQL Server, which is happening pretty soon, those permissions will also map through accordingly. So you can have all the capabilities you need, especially in more complex enterprise scenarios, to make sure you have the right permissions and roles and other things like that for the client software.
Other things that we have: you can basically scale this out horizontally. So for large customer-facing websites or mobile apps that need very high scale capabilities, we've got options around that too. If you were to use some of the high-scale options for Postgres, you just point this at that option and, boom, you're good to go — like some of the large RDS options within Amazon, even.
[01:36:42] Expose the types of the JSON fields in Postgres through Hasura.
Okay, I'm going to ask you a question back, because I'm not sure of the exact context. So what was it? We had action, and this is JSON, right? You can also do JSONB, and I think that adds more context for some scenarios around typing and such. But if you just do JSON, I'm not remembering exactly what we've done in the past specifically for that, to bring the JSON types across in that JSON object. I'm trying to think how Postgres even does it, if it does it. So that's kind of a tricky one, because otherwise, as you probably know, when you have JSON and there's not something to infer the types within the JSON itself, it's generally just strings. If you have an array, it would be an array of strings unless it's designated, unless you can find some way to infer that it's an int or something like that.
So I think you'd have to do that the same way you would infer types from any JSON you didn't know about. That's a good question, though. I would have to check and see specifically what the detail is around that, because I'm not sure at this point. Somewhere in there, there's something, because I know the types are mapped accordingly, so whatever's in Postgres ought to be available in some way, shape, or form. And if it's not available to infer the types as-is while Postgres does it, then Hasura will at some point be able to do that, if it doesn't already.
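As a rough Postgres-side sketch of the json versus jsonb difference being discussed (the 'retries' key is made up for illustration, and how Hasura surfaces the types may vary by version):

```sql
-- jsonb stores a parsed, binary representation with indexing support and
-- richer operators; plain json stores the original text as-is.
ALTER TABLE action
  ALTER COLUMN action TYPE jsonb USING action::jsonb;

-- ->> always yields text, so an explicit cast is how you get a typed value out:
SELECT (action ->> 'retries')::int AS retries
FROM   action;
```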
Perf Tips!
[01:38:56] And one more thing right here at the end. Of course I remember this right at the end: I created a list that I did want to go through, and I'm just going to paste it right here. Perf tips. These are somewhat arbitrary, not specifically related to schema, but just kind of general; they do in some ways relate back to schema. So let's see here. Paste without formatting, can I do that? Yes, there. These are just a few that I've been collecting over the last few weeks. They're more related to things I've noted that specifically help the GraphQL side of queries from the database perspective. One: minimize joins, I mentioned that. Two: use EXISTS instead of IN to check for the existence of data; I've run into that scenario a few times, and it applies pretty much anywhere you're going to execute ANSI SQL, so SQL Server, Postgres, et cetera. I feel like it's kind of silly to mention, but use the appropriate data type for a thing if it has a particular data type. Ah, yes, I'll get them in there, that is a good call Robin or Naibertech. I want to turn it black, that's weird.
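A small sketch of the EXISTS-instead-of-IN tip against the workshop tables (table and column names are written snake_case here; adjust to your DDL):

```sql
-- IN with a subquery:
SELECT s.id, s.name
FROM   source s
WHERE  s.id IN (SELECT sn.source_id FROM source_notes sn);

-- EXISTS, the form the tip recommends; it only needs to find one matching
-- row per source to satisfy the predicate:
SELECT s.id, s.name
FROM   source s
WHERE  EXISTS (SELECT 1 FROM source_notes sn WHERE sn.source_id = s.id);
```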
Also, I mentioned you can create indexes on columns in the database. Yes, you can. However, that doesn't mean you get a performance boost if you go create an index for every column. Depending on how the database does indexes, which is generally similar between relational databases, you want to pick the one, two, or at most maybe three columns that you're going to be querying against in the where clause of your GraphQL, or your SQL as it maps through; ideally, you want that where clause to execute against indexed columns. But if you're doing where clauses against a whole bunch of different columns, then it's hard to pick and choose. You want to pick the columns that will have the highest distinct cardinality, that's one thing. If there's a lot of redundancy, indexes can be iffy. So you really have to trial-and-error through some of that until you become more familiar with how the database streamlines the execution path and the scans of the queries.
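For example, if the where clauses mostly filter source by name, one index on that column is worth far more than indexing everything (names again assume the workshop schema):

```sql
-- Index the column the where clause actually filters on, not every column.
CREATE INDEX idx_source_name ON source (name);

-- A filtered query like the one from earlier can now use the index:
SELECT id, name, details
FROM   source
WHERE  name = 'HasuraHQ';
```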
[01:41:49] And I also mentioned denormalizing the data, which is taking that standard primary key/foreign key structure and munging a bunch of columns into a single table; that can help. But of course you lose your distinct cardinality, and then you really have to weigh which columns to index and how much should or shouldn't live together because of the redundancy within the table. However, if you have a server that uses Apache Cassandra and you run GraphQL on top of that, Apache Cassandra works very differently here. You have a table structure, but it's column-stored, and your where clauses can generally only be done against one or two columns at a time. But you can literally have petabytes of data in a table and get timely queries against it because of the horizontal scaling of Apache Cassandra, for example. Again, that's a very different database from the relational model, but it gives you reporting capability that isn't even comparable, because the reads are so fast.
So in some of the databases there's a little bit of redundancy in the word "schema." There's the database model schema, and then there's the schema feature. In Postgres, for example, you can name a schema and then put tables within that schema. There's the schema design, where all the tables are related to each other, but then there's the schema almost as a namespace that you can organize tables within. So you write my_schema.table_name whenever you write a query. Putting things into schemas helps organize, but it can also help you streamline for specific performance needs, and break things out per business domain or other needs. SQL Server has options like that, so does Oracle, et cetera; it's not always called a schema, but it's a namespacing element within the database that lets you break tables into groupings. It doesn't really change the table itself in any way; it namespaces it, breaking it out into a domain organizational structure.
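A tiny sketch of that namespacing in Postgres, using a made-up reporting schema:

```sql
-- Group tables by domain, then qualify them as schema_name.table_name.
CREATE SCHEMA reporting;

CREATE TABLE reporting.source_rollup (
  id    uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  name  text,
  stamp timestamptz NOT NULL DEFAULT now()
);

SELECT id, name FROM reporting.source_rollup;
```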
[01:44:17] All right. So yeah, any other questions? I'm going to be hanging out for the next five-plus minutes at least. I am going to share these slides real quick and post them in the Discord right now. So if there are any more questions, feel free to ask them; I'll answer anything and everything I can, and if I can't, I'll help you dig up an answer. In the meantime, I'm going to share these real quick and get them to you.
So thanks for attending, I hope this was useful. And again, go follow me, follow Hasura's Twitch, et cetera, and go read the blog. There's so much more out there that we are producing and will be producing, specifically around connecting database schema and modeling design to doing it well with GraphQL, to make development faster, easier, more robust, and more capable. So again, thanks. I'm going to unshare my screen, get those links to you, and hang out and chat. Thanks again, everybody. Cheers.
Database Concepts
[03:49] All right, we're going to start off with relational concepts. I will mention a few things about no SQL databases. Primarily though this entire workshop is going to be relational database oriented. So focused on the ideas behind that, but I'll mention some of the others because they are quite prevalent these days and there's a lot of shared concepts between a lot of these and it helps to know the difference between the concepts there versus the concepts in the relational models. So I'll mention that in a few parts and those things that I'll mention primarily are only focused around the columnar and document based distributed databases like Mongo and Apache Cassandra and I'll mention a little bit about graph databases like Neo4j or something like that. Neo4j, Titan and the other ones. I'm not going to bring up time series. I'm not going to bring up some of the other, no SQL niche databases at this point. However, if you are curious, be sure to follow me and sign up for future workshops and events at Hasura and under my accounts which I will provide at the end of this. So no reason to worry about those right now. And I will be, I do a lot of streaming and other things like that around all sorts of database types. So done a lot of work in that, love to share it.
Schema Design
[05:17] All right. So starting off in the relational world to cover the standard way in which things are designed for relational databases. We have schema design, which lays out the relations between tables. Now in relational database these are the tables. These are the core elements within the database that are going to be used to store data. That's you can think of a table almost like as a spreadsheet or just whatever it's going to store all of the things that we will put in and it does so splitting up things into columns. So like the source table here has an ID, a date stamp, name, URI details, and they have a data type, timestamp text, et cetera.
The tables then have relationships which are of various types. In a relational database, it's primarily you have a one to many relationship or a many-to-many relationship or you have a recursive relationship, i.e a table back to itself. If you look at the NoteJot table there and you see on the far side, that's actually a recursive relationship line being drawn back to it.
[06:46] If you're ever working with graph data, graph databases, you will notice that graph data is oriented toward edges and sides of the data and how it connects so that relationships vary and can have many different types of ways in which the relationships occur beyond just one to many and many-to-many things like that. But in a relational database, it's focused around related one to many, many-to-many or many to one, depending on which way you look at the relationship.
Here, I've connected these tables where the source ID, you'll see source ID and the source notes here, it is connected back to the ID here to draw that relationship and in the tool that we will be using dbdiagram, there is the primary key designated with a one and an asterisk which designates a foreign key, which would then make this a one to many relationship between that table.
[07:53] There's another where I pointed out directly what that is. So this NoteJot goes one to many to source notes. And what I have created in this particular situation shows a many-to-many relationship because of that middle table, I can relate source to NoteJots and there can be one or many sources for each NoteJot or one or many NoteJots per source. And that's what a many-to-many is where you can relate that back and forth, but there no limit to how many of the thing can be related to the other thing.
With the recursive relationship drawn on NoteJot here, it means that a note which I've called NoteJot can be related back to another note, which would make one of whatever it's related back to would be the parent of that note. And the ones that are related to that parent would be the children of that note. And in that sense, you can draw out a recursive relationship with that data and the easiest way that I've always found that's shown in computers all the time of a recursive relationship. Just think of a folder tree within your directory structure on your computer. That is a recursive relationship. You have as many folders as you want, as many files as you want in the folders generally speaking, and it just nests inwards as you go. So that's a recursive relationship and you can simulate the same kind of thing in a relational database doing the same thing there.
Building a data model
[09:35] All right. So building a data model, This is what I'm going to build. So somewhat of a minimal database, we were just looking at these three tables: Source, Source Notes, and NoteJot, but here there's a parent table called Connection and then three others called Formatter, Schema, and Action. And I'm going to show you in dbdiagram how to build a structure like this and we'll talk through some of the particular pieces of this model and why, I'll elaborate on why I picked certain data types, certain column names, et cetera, as we go through this.
All right. So open up dbdiagram, if you haven't already. When you go to the site at first, you'll probably, you'll either end up here and you'll need to create an account. So go ahead and do that if you haven't already. If you have, when you click the go to app, it shoot you right over here like this. Generally speaking, you saw that like ghosted schema that it showed a great way to work with dbdiagram is to use one of the reference diagrams that it gives you. And if you scroll over here to this dropdown and just click load sample, it'll load the sample for you. And it gives you information in the markup over on the left hand side of the screen here about how to draw out all the relationships and build the tables here in this tool.
[11:19] You can see, here's a primary key with an auto increment value. They're using a varchar here, a timestamp here, and I'll get to more of that in just a minute as I start to work through this and it shows some of the references just right here is a many-to-many relationship between the country code to countries and then merchants to countries. Merchants can have many countries. Countries can have many merchants and the respective country codes, et cetera. And there's some other reference points in here also. So this is a great tool and we're going to be using it to build out our initial schema. So let's do that. I'm going to just click on the dropdown and click on a new diagram.
And then over here, I'm going to start my first table like this, and I'm just going to call it source, open close brace, and then let's see, we'll go ID, UUID. So a UUID is a 16 bit, entered to your value. It looks, usually it's represented with dashes in it to make it a little bit more readable. Every database has it. It's one way to ensure that your ID is unique and unlike the auto increment, where you can run into conflicts, because it will go 1, 2, 3, 4, 5, and then if more than one person is trying to insert something, you may run into a conflict where somehow they get the five and then five gets overwritten or five gets deleted and then someone else tries to add it. And there might be a collision between those because they're not particularly unique numbers. Someone may end up with the same ID in that situation.
[13:14] A UUID, however, if you use a UUID generator, which every database has a function or something that will allow you to create it automatically and most code bases, whether they're using JavaScript or Python or whatever, there's something that will enable you to create a UUID. So you can insert without worry of conflict. And you can reference without worry of running into needing to de-duplicate, if for some reason, somebody turns off the key around the UUID that keeps it unique.
So in this case, I'm going to start off right off the bat setting this UUID as the primary key, PK. Then I'm going to add a stamp. So there's often and this is something very important to data modeling. It's a good idea to have a default set of columns that are basically duplicate across all tables that offer you audit capabilities, especially when working in the enterprise environment where audits happen on a semi-regular basis, it's always great to have certain things that the audit will require such as timestamp to represent when the data was last changed and other pertinent information that may be required the audit, like the user who made the last change. And it gives a path, a trail that the audit can be done on so that you know, all of the things that have changed or altered or manipulated the data in any way throughout the database. Some of the default things that I always add is timestamp and often I'll put in the system user, not particularly the database user, but the system user because a lot of the time the database system will keep track of the user that's changing the data anyway.
[15:11] So if you want to make it easy to be queryable in the tables themselves, you want to add a timestamp and you want to add a user, maybe the system user or the database user, depending on whatever your requirements are. So in this case, I'm just keeping it simple and I'm just going to have the timestamp not going to get into the users and everything. So that's what that is for and again, the database has a function which we can use to have it timestamp it. It puts the timestamp stamp in the column. So we don't have to like manually enter that or derive it from code or anything like that. It can be overridden, but the database can do it for us too.
I think I answered the UUID question problem. So if you have any other questions about that, feel free to ask and I will get into that more. And we'll also look at an example. So beyond just talking about this, you'll see the database generate one when we enter some data into the database shortly.
[16:16] All right. So the next thing in source, I want to do name, so put text and then URI, so like the name of this source, I'm thinking of this table as something that provides sources of where data is at or a reference to data where data is at, think of it as something where you had to look up in a dictionary or in a library or something like that. But in this case, it's the internet. You've looked it up on the internet and you want the URI.
So the actual location of the universal resource indicator, and you want a name to be able to give it a friendly name that us humans can read and understand what it is. And then finally I'm going to add details and details will basically be something like just notes that we might keep about a particular source that we've collected for some reason, like if I find an awesome page on GraphQL and React or something like that. I'd put the URI in there, add a friendly name and then add more details about what it is specifically. So basically a very glorified table of bookmarks.
[17:25] Then, let's add one for source notes like this. I'll add a source ID. So that source ID, the reason I named it that because it's not the ID of the table, it's the ID that references back to another table that this table will be related to, so I'll do a UUID because it needs to be the same as the table that it references back to and then I'm going to add a reference, an ndbdiagram, that's ref: chevron, the name of the table that it's related back to.id, the column that it's related to. So there's that and then details, text, and then stamp. I'm going to have a time stamp for this table too.
There was a question Navitech, what is faster UUID or auto increment. If you're talking about a large volume of data being inserted, you're going to win out with UUIDs. And I'll tell you a reason because if you have one system only one system with just that one system throughput writing inserts to a database with auto increments, it can be pretty fast and auto increments can be fast in all sense of reality. However, often if you have a lot of inserts, you also have a lot of clients trying to do the inserts. They're going to get very slow with auto increments because they're going to run into collisions and they're going to have other conflicts come up as you work through these inserts. If you also have updates and deletes, even more conflicts. UUIDs are not going to have the conflicts. If you're really worried about inserts though, don't have the database call any functions to create UUIDs or to do auto increments and create the unique identifier purely on the client side, make the client do that work. However, again, that can be complicated in nature because you're again taking something that is going to be a database reference, specifically the UUID in primary key foreign key relationships and you're putting the onus onto the client, which is not the database. It doesn't know why it would be creating this.
[20:12] You remove a lot of the conflicts and issues at the database level, but you also remove some of the database's responsibility. I like to have the database, use the database function to create a UUID. But sometimes you do need to have the client code or the client whatever is creating this data, you need it to actually create the UUID but as far as performance goes, the fastest way is to not have the database do it. If you have one client writing data only, auto increment is pretty fast. If you have multiple clients writing the data, you need to move to UUIDs regardless of the performance.
The other thing I would add is the difference between a UUID function and an auto increment function in almost every single relational database is so negligible that it shouldn't harm your inserts.
[21:08] The other thing to do is to take into account like batch inserts if you actually have large volumes of data to insert into the database. So that's some of the things to take into account there. Unfortunately, the answer isn't straightforward, it depends. It Depends on how you're going to write that data into the database.
All right. So we have source and source notes, and now I'm going to get the note taker thing in there. We're going to add NoteJot like this and again we'll have an ID in there, UUID and we'll make it the primary key. And then add a stamp again, like that timestamp. Then let's add the note ID. And this is that elephant ear, that recursive relationship back to the table itself, NoteJot.id, and you see it pop into the diagram there to the right. Really cool feature of this is that it draws it as you do this.
[22:19] I'm going to add details and then let's go back to source notes and we'll relate this. Let's put note ID UUID reference NoteJot.id. So now let's move this around so we can look at it better. You can see, we have that many-to-many relationship and that recursive one over here.
So that was the first part of the schema that we looked at when I discussed the specific details. Well, the details for the note would be the body of the note itself. Like if you wanted to have a recursive wide range of notes around a particular source. And in the source, the details is specific to just the correlative thing that's held in that table, in this case, the URI only. And also it's just an example. I probably wouldn't have details in both, unless there was very specific reason to have them tightly coupled to each of those entities like that. But in this case, it just seemed like an easy thing to add that could be of use as this is fleshed out. Like when someone starts to do act application development against it, they could prospectively use the details in both of those related specifically to the note or the source.
[23:56] All right. So next thing I want to create the connection. Let's see, we'll do ID UUID make that primary key and then we're going to have a stamp again, another timestamp. Because you know, as again, with timestamps it's for any updates or changes to the data at all. Stamp and then let's see what we want. We're going to do two other things. So the connection, just to describe what I'm thinking with the connection is these things here, these three tables we'll change their color. Let's change them to this color. You can see in the markup, they change it over here, like this to head color, just so we can see these differentiated from what I'm about to do. So this is just purely the source and the notes about the source that are collected. With the connection I'm going to connect the sources to particular actions.
So the idea is if the source is, let's say not just a webpage, but an FTP location, like a file transfer location or some other REST API location or a GraphQL location, I could set up actions and other things let's say, tell it what type of format it is, is it XML or JSON or something like that or is there a schema to the data that it could commit an action against and pull down from that particular source.
[25:39] So with the connection, I'll add a stamp and then there's going to be an action, a format and schema. So let's add the tables for that and then we'll go back and add the IDs to this.
So first let's table, schema, again ID just like everywhere. We'll go PK and then we'll go table formatter and I'm just going to copy this, put it in there since it's the same and all and then table, oh yeah, action. I want to do an action like that. Okay. So those are there and now we can relate these back to here with, let's see if it's over the connection we want to do. How do I want to do this? I think what we'll do is, we'll do an action ID and then we'll do a source ID. We'll need ID, we'll need that. Oh, changing the colors of the pro feature. I must have been what's the term great grandfather in or whatever. I'm not paying, yeah, I have the upgrade button. So I don't know. I must have gotten pulled in there somehow. Sorry about that. Sorry about that. Reading the questions real quick. Give me just a second.
[27:22] Nick, so, this is true. However, you can deduce that 41 likely exists, but as you stated, emphasis on "likely" exists, but it may not. It may not have even ever existed. This is a thing that we don't know, because the database may have killed that and jumped to the next increment. So even though it seems like a good thing for an audit trail, it's not super reliable. And I'm not saying that "UUID" fixes this, as per your reference there, but in an audit, one definitely needs to take that into account from the audit perspective, that just because there's an auto increment, doesn't mean that there's always the next succession of numbers in order.
The database could have skipped things for various reasons. There could have been conflicts and it skipped them and it never got that entered. A whole bunch of things. A database administrator could have gone in and said: "Skip a bunch of things. Start incrementing from 4,000," when it only got up to like 399 or something. So, from the audit perspective, one definitely needs to take into account that there's not really a smooth back and forth. There's no linked list of the data flow. The best way to get a data flow is you're going to have to have a timestamp, you're going to have to have additional data, and information and logs about the table and the data in the table. But that's a good point to bring up. Good call-out there.
[28:56] All right. Here, put that over here, here, and here. And then, let's get this in. So, this is "ref." This can be "Action.Id." And this can be "ref," "Source.Id." Now, as you can see, my naming generally follows an approach with IDs where I can easily ascertain what the relationship is without looking at the physical relationship drawn in the database. I can just look at the name and know that "Action.Id" points to the action table. I can look at "Source.Id" and know that it points at the source table.
The only one that's a little weird is "NoteJot." And the reason I called it NoteJot is because one might jot down a note that just kind of a slang way to say it. But note, however, is often in most databases, a reserved word I have found, so you can't really call something a "note" as the table name. Or at least in some databases. I don't know about all of them. And in that case, I called the table NoteJot to remove that conflict, but I routinely refer to that table when I'm calling a relationship just as "note-dot-something." Because at the core, it's just intended to store notes that someone would write about a particular URI.
[30:19] All right, so we got these in there. Schema. Oh, let's see here. Schema. Schema. What else do we want in there? Basically, we'll do the standard timestamp real quick. Oops. Timestamp. And then, we'll have the ConnectionId. We want to relate back to that, like so. And then, let's do a Formattermap. And then, because this is going to be backed Postgres, we'll do JSON as the data type. So, that's just an open-ended... It's going to describe the schema in whatever way it needs to so that we can know what the schema is of the underlying data that's in the URI, or that would be pulled from the URI.
All right, what does it not like about my thing here? Nothing like errors while you're doing stuff live. Let's just release it for a second, and then Connection. Oops. ConnectionId. Oh, I know what I did. I didn't set the type. Derp. There we go. Okay. All fixed. And then, for Formatter, we'll go with Stamp again, and then I think we'll need the ConnectionId there, too, so I'll just copy that in. And then, we'll do "Formatter," we'll do a map of that too and just make it a JSON map, basically. Let's see here. And then, action. We'll add Stamp. Do we need the ActionId? No, because multiple actions can take place on a connection, so per the relationship in connection, it will be a one-to-many relationship to action. So that should give us what we need there. And then, action. Oh yeah, I need to declare something to be the action, so we'll make it JSON, too. So then, we can define actions and really extend and elaborate on what that would be, because I don't really know what the client system would be like here. This is just, again, kind of an idea of how to build out a model for a system that would prospectively do all these things.
[32:56] So, there we go. We had that data model that I had previously shown in the slides, so this is how you would build that out. Now, the next big thing is for all of you to build out a model yourselves that we will then use momentarily as we spin it up in the Hasura product.
So, everybody take 10 minutes, take a bio break, grab a drink real quick, and also, if you haven't already, begin to build out the data model, which has at least some primary key foreign key, and some many-to-many relationships as we have done here. You could, if you want, basically just use this, and I'll make it super easy. Let's see here. I will paste it in the Discord. Boom. There it is. So if you want to grab that from Discord, you can have this exact data model or you can build out whatever you want. So, take 10 minutes. It is currently 9:38. We will come back at 9:48 and begin part two. Also, in this interlude, if you have any further questions or thoughts about this, or questions about how to model these things, please ask those questions and I'll cover anything that comes up whenever I get back. So, I will be back in exactly 10 minutes at 9:48. Or 48 after the hour wherever you're at. Whatever time zone you're at. All right. So, see you in just a few.
Data Modeling
[34:47] All right. Hope everybody was able to get a drink, take a bio break, et cetera. A few questions that I'll verbally go over right now.
Nick's asked: Why was I using Pascal case? Was it just a personal choice or some specific reason? In Discord, I also posted a blog entry where I talked about that a bit more, especially in the case of database naming conventions and such. In databases, I would strongly always stick with Camel or Pascal case. It doesn't always matter which one you go with. However, I personally tend to stick with the case-naming that is used in the tech stack.
So if it's .NET, I often use Pascal case because a lot of.NET code generally tends to follow the convention of being Pascal case. With Java, for example, I often will switch to Camel case so that then... The database, the tables, the entities that might be generated through GraphQL or whatever, generally tend to be cased. However, the coders that are writing code against these tables and these entities and other things that will be created from the database, they'll be using the same case that they already use in their code base, so then it will just look like whatever it is for them, their objects or their entities or their tuples or elements or whatever. Encourages familiarity from that perspective.
[36:24] Oh, interesting. Nivertech mentions they use ERD diagramming and CLI tools and VS Code extensions, but it doesn't generate SQL DDL. That almost seems odd. I would check into JetBrains' tools. I would check into a number of other tools. Because many, many, many ERD diagramming tools that I have seen almost always have DDL spec that they'll export. So I would definitely check out a few new tools here or there. There's got to be something somewhere.
There was another question, too, from Melanie Berger. It says: How generally do I share, communicate, and iterate on the data model with the team? So, the dbdiagram tool, which we'll look at in just a second, actually does export of the TL, gives you a SQL, it'll give you images, and then you can put that into the repo. Also, on the tool itself, you can share these diagrams, the actual diagram itself, in dbdiagram, out to the team via a link. And then, if you give them permissions, they can go and they can edit it or add to it or whatever they need to. A lot of the other tools that I use tend to have this type of capability. So, you can at least put an image in the repo or put the SQL in the repo. I mean, you need the SQL at some point, anyway, especially when you're building out that initial design and you need to get it into the database. If you don't have that, you then need to recreate hand all of the tables and everything in the database. That's quite frustrating. That would be a lot of redundancy in work. So at this point with the dbdiagram, we'll get the SQL and then we'll export it into the tool that we're going to use, which is the Hasura GraphQL server.
Hasura.io
[38:34] With that, let's get rolling here. So, we have our database schema, so we're going to get that running and actually get it built, executing some SQL. We're going to look at some GraphQL queries around that. We're going to review some of the queries and execution plans of those queries, put some data in the database, things like that. We'll further discuss the data model and how that would or wouldn't apply to the GraphQL. And also be sure if you have any NoSQL questions, be sure to add ask me, too, because I'm going to leave that kind of open-ended. But one of the things I will broach a little bit toward the end here is how you'll deal with what you get from a distributed database from a GraphQL perspective, because it can depend on the distributed database, can depend on the graph database or whatever it is, and you really have to do a lot of the heavy lifting for the GraphQL yourself to determine relationships or graphs back to things.
Like in the case of, say, Apache Cassandra, there's no relationships. There's also no joins in that database. Vastly more scalable for many reasons, because it's distributed, horizontally scalable, et cetera. But then, in the GraphQL, to get the value and the power of being able to relate the entities and such, you'll need to make those connections in whatever that server is that you'd be using for your GraphQL. So a little tricky.
[40:09] With Hasura in a relational database however, you can draw those, which makes it very, very powerful in that sense. So, before doing anything else, if you don't have a Hasura account and the requisite database backend via Heroku, go sign up right now for an account on Hasura and Heroku. With Heroku, all you need to do is go create that sign in, and then in Hasura you'll be able to do an authentication against that to get the database. Very easy in that sense. Just like a click of the button. So, if you don't have that, start signing up for those right now, and I'm going to continue just a little bit and we'll come back to that.
This is what the cloud interface looks like in the Hasura solution. We'll be clicking on New Project, which will create us a new project that looks just like that. And then, we'll use the free tier. Again, Heroku and Hasura combined gives you a free database and a free GraphQL API. And in it, you just enter the project name, the region, and then you go through, there's a button to click. It will create the Heroku database for you. As long as you have that account authenticated, which the button will do that. It'll initiate that process and allow you to just easily finish it clicking some buttons. And then, you click "Create project," and it spins up a project that'll look just like that. And then, we'll click on Launch Console and that's what we'll be working in through most of the remainder of this workshop.
[41:51] And again, this is what I built and prospectively you built, or you can grab the dbdiagram code from in the Discord chat. In dbdiagram, right up on the Export, you can export a PDF to Postgres, MySQL, SQL Server, or to PNG. Of course PNG is an image format, so you'll get an image of just the diagram. I believe it's what that is. And Postgres, MySQL, and SQL Server will all give you SQL that is specific to that relational database that you can execute against that particular database. In our particular situation, we'll be using Postgres, because Hasura primarily works against Postgres at this time. And we don't even need to deal with the PDF. We don't need a PDF of any of this. But it's there if, if you need it
I don't know if you can without a paid account, or whatever account that I'm grandfathered in as, I've been able to save the diagram. So that's another thing you can do. You can save it and you can share it and it'll give people the link that'll point back at the diagram that you just created.
Executing SQL in Hasura
[43:07] In Hasura, once we get logged in... And don't worry, I'll be navigating through this; I'm just showing you a quick overview of what we're about to do. We'll go in and paste the SQL code we get under the Data tab. Under the SQL section of the Data tab is a place where we can paste in raw SQL and then execute it. More about that in just a second. We'll do that and we'll end up with a database. We'll be able to draw connections, look at relationships, look at the entities, execute GraphQL queries against it, and do whatever we need to do at that point.
We'll also take a look at the query analysis. As we execute some GraphQL queries, we'll be able to see the SQL that is actually being executed by the GraphQL server. As with any data schema and diagram, it's very important to know how the server is actually executing the SQL, how it's executing the query, whatever that might be. In this particular case, it's just a basic query against something I had executed. Below the generated SQL, you see the execution plan. Even if you have the SQL statement, you don't really know how a database server will execute it unless you look at the execution plan, which shows you the timing and other characteristics around that. Postgres has it. Oracle has it. SQL Server has it. Every relational database has some way to look at the execution plan. So, Hasura provides the ability to look at the SQL statement it's created, and it provides the execution plan just below that. We'll look at that and talk about it as we go.
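For reference, the plan Hasura shows under Analyze is Postgres' own execution plan, and you can pull the same thing up directly with EXPLAIN. A minimal sketch, assuming the workshop's source table:

```sql
-- Ask Postgres for the plan (and actual timings) of a query roughly like
-- the one the GraphQL server generates for the "source" entity.
EXPLAIN ANALYZE
SELECT id, name, stamp, uri
FROM source;
```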
[45:00] All right. To the action. So if we go back over here to Export, Postgres. And even if you didn't do any of the steps so far, I have an easy cheat for you. It'll get you right up to where we are at. I'm going to copy all of the SQL into the Discord chat. So in two seconds, one, two, boom, all of it is now in the Discord chat. So even if you didn't do any of that, you can still follow along from this point, if you would like to.
Now, I just want to cover a few tidbits, little pedantic details around relational databases, especially if you want to speak in great, minute, pedantic detail about a database. These are tables, as I have mentioned. The things in those tables are columns. A single row of data stored across those columns within a table is referred to as a tuple. So each single row is considered a tuple. Now, in programmer speak, that could be a little confusing, because in some languages a tuple may be a thing with two or three values or whatever; it varies a little bit between programming languages. But in relational databases, a tuple is the row.
[46:42] Okay. The relation schema is the schema with all of these relationships here. A schema could be just tables without the relationships, so sometimes you hear it very specifically stated as "relation schema" versus just schema. The degree is the total number of attributes, okay? The individual values within a row are attributes, and the total number of attributes in a relation is called its degree; the references you see drawn between tables connect those attributes. The cardinality, that's a word that comes up quite often with relational databases: the cardinality is the total number of rows present in the table. So high cardinality means a lot of rows of data around various things. A column, which I've stated, that's these. Let's see, what else? Making sure I haven't missed anything; I'm going through a little punch list in my head here.
The relation key: sometimes things are referred to specifically as "the relation key," and the primary key or the foreign key, the other thing it's related to, are both referred to generally as a "key." Now, from a performance perspective: when you make something a primary key, it receives an index. Foreign key columns don't automatically get one in Postgres, but they're prime candidates for adding one. You can add indexes at the database level to other columns too. I'm not going to go deep into that, but it's very important to know, especially in relational databases, because indexes can dramatically improve the performance of scans. As we look at these queries, you'll see in the execution plans what the scans are doing. So if you see something taking a lot of time on a particular column, like a filter based on that column, you can often improve that performance by ensuring that the column you're scanning against is indexed.
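A minimal sketch of that, assuming the workshop's source table with a name column you filter on often:

```sql
-- Add an index on the column used in the filter, then compare the plan
-- before and after; with enough data the planner should switch from a
-- sequential scan to an index scan.
CREATE INDEX idx_source_name ON source (name);
EXPLAIN ANALYZE SELECT * FROM source WHERE name = 'HasuraHQ';
```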
[48:57] Yes. Navertech points out: there's also the cardinality of the distinct values in a specific column. So cardinality is one of those words that can mean one thing or another depending on the specific context you're talking about, and it is used quite often. You'll see it in articles of all kinds, you'll see it in database articles, et cetera. So it's important to note what it means in a general sense, and then what it specifically applies to in the particular context. The same goes for tuple. I bring that up because as you go from the development tech stack in JavaScript or C# or Java or Python, or whatever language you may be using, to the database and back again, you may see these similar words that have either distinctly different meanings or just slight variations in meaning. It helps to know the difference so that you know the context, whether it's the tech stack or the database.
All right. I think that covers the basics of that pedantic detail. So we have the SQL, and hopefully you have created your Hasura account and your Heroku account. I'm going to go in and create us a new project, and I'll call this the "awesome-workshop-project." Just going to click on Free Tier and select the region. The free region is Ohio. There are a lot of other regions, depending on where you're at, that you might want to use, but I just go with the free tier so that all of us can use it, do exactly the same thing, and get it to work. I have made the name too long; I should know better than that. Let's just call it "awesome-workshop."
[51:00] There we go. Okay. So then, go to "Database setup" and click "Try with Heroku." I'm going to click the Heroku button. Right here, if you haven't done so already, is where the authentication would begin. You'd click "okay," et cetera, and then it would bounce through, create the Heroku app, install Postgres, get the database URL, and put that in here. Oh, quick note: you can take this database link and use it in a Postgres connector in any tool that you may be using against Postgres. So it's helpful to know that's what that is. And click "Create project." Did I just delete something? Well, I'm just going to quickly refresh. I'll have to do that over again.
There we go. I don't know, I think when I highlighted that, I might have messed up the link there. But anyway, here we are. While it's generating the server, just some quick context: it's useful to know where the environment variables are. If you do this, one of the first things you often want to do is remove the console from the live view, because literally anybody, everybody right now, could go to each other's console if you're creating this, because it's initially created wide open. What you do is add a new environment variable, and you go down, and... well, I'm going to type it because that'll be faster. Yeah, enable console, and you'd want to set that to "false." I'm going to leave it enabled for now just so we have something to work through here.
[53:01] Now, it looks like we're all set. All good. And "Launch Console." There we go. So right now it's an empty GraphQL API. If you click on Data, it'll show that we just have these two schemas in our database. The "hdb_pro_catalog" is unrelated; we don't have to deal with that at all. "Public" is just our general public schema. And of course, there's nothing in here, because we've created nothing yet: it's an empty database with a correspondingly empty GraphQL API server. But click on "SQL" and we'll change all of that. I'm going to copy the SQL from Discord myself, so that if I goofed up on my earlier copy and paste into Discord, I'll catch it too. Copy that SQL from there and paste it in here. There we go.
So, as you can see, it does a nice little color coding for us. We have all our notes and everything in our types, et cetera. So we should be all set, right? Another thing: you can check or uncheck Track This. Click on Track This so that it will initially track all these tables and start adding the requisite metadata into the Hasura server, so that it knows what GraphQL queries, capabilities, and other connections we want. So, I'm going to click Run, and in a few seconds, we'll have a database.
[55:00] SQL executed. And you'll note to the left here that action, connection, formatter, notejot, schema, source, and source_notes all just popped up as they were created. Now, click on Data one more time and it'll leave the SQL screen and go to the general data screen you usually see when you log in. Now, there are some foreign key relationships that are not currently tracked. So it tracked the tables, but it did not track the foreign key relationships. That's one thing we're going to want to do. But before we do that, let's take a look at a relationship to see what we get here. Action, for example: if I click on action and then go over here to Relationships, you'll see that the relationship is not currently added. So don't click anything yet; we'll come back to that.
The other thing you can do is... let's go to source. We've got Insert Row and Modify; Modify changes the table itself. We have, again, the relationships, and the two connections it believes we may want to infer so that the GraphQL entities will know they're connected in that way, right? The database knows, because we drew the database connections and the primary and foreign key relationships in the SQL. But the Hasura server hasn't made that connection in its metadata to strongly connect those just yet; it's giving us the option to do so. It sees in the database that those are obviously there, because they are actual connections within the database itself.
[56:38] So, let's go back, just click on Data, and then click on Track All. It's going to add all those relationships; it should just take a second or two here. How's the font size, by the way, everybody? Is the font size okay? Should I bump it up? Any other qualms of that sort? Just let me know and I'll see if I can tweak it up or down, or anything, because I know some of these fonts get a little small at the 1080p screen size.
All right, so the relationships are added. Now, if we go into, let's say, action, and go to Relationships, you see it's now an actual relationship. There's no Add button anymore, just an Edit. You can click that and actually change the relationship if you want to, but we don't need to do that. However, there are a few other things that we do want to do in this database that weren't done before.
[58:06] This is the point where, if I were doing this from a team perspective, from a workflow perspective, with trying to get everything into the repo as Melody mentioned earlier, how do you share all this stuff with the team? This would be the first point where I would immediately set up migrations to automate the creation of, or changes to, the database. I'm not going to do that today, because we're focused on a different topic, but it's very important that that's started as soon as possible, so you can have a clean flow of migrations from your version one database to your version two database. They can stay in sync with your software developers, or your report writers, or whoever's using your database and executing GraphQL queries against it. You want to keep that in sync with them, so you can always give them the version spec and they know exactly what they're getting in the database.
So, if we look at relationships, that's good. But the other thing is, let's go to Modify. See, this is UUID, primary key, and it is unique. That is what we want, but we want to change the default to the random UUID function, so that the database creates the UUID for us, okay? Doing this in Postgres, as well as many other databases, the database generates the UUID for you. However, that doesn't prevent you, on the client side, from creating a UUID if you need to. And this is a very cool thing that you're able to do with UUIDs that you cannot do with auto-increments. It's very important, especially if you have a system with a lot of users reading and writing data, where there are relationships between the primary data created and the foreign key data created that may need to be written into multiple tables, but you don't want to make a bunch of trips to the database, because you want to keep the website fast.
[01:00:15] So, the way to do that is to have the website create that initial primary key value. In this case, let's say a source, okay? As soon as they create that, it's set up, it's ready to be saved. But the user may also want to do other things, like create the action, the formatter, or other information like the notes around the source, at the time the source itself is created. If that's the case, and they're still creating that data, like you have all of this on a form where they can create these things, you may want them to create more of that information and then write all of it at one time to the database. The only way to do that is to use UUIDs. Then, when you draw the primary-foreign key relationship on the client as they create that data, you'll have the UUID to pass to the database, so the database correctly draws the correlations between the data that's created on the client side.
You can push all of it at one time, so you have basically a batch insert against a multitude of tables, and UUIDs enable you to do that. Auto-increments would not, because you may run into conflicts as multiple users write things to the database, and it would necessitate a round trip every single time: you write the insert, you have to ask for that auto-incremented ID back, and then take that for the foreign key related data and use it as you send that data in. So there's a whole round trip required in those situations. With UUIDs, it's not necessary.
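Here's a minimal sketch of that single round trip, assuming the workshop's source and connection tables (the column names are assumptions) and a UUID generated on the client:

```sql
-- Parent row and a related row written in one trip, sharing a client-generated
-- UUID as the key. No need to read an auto-incremented ID back in between.
BEGIN;
INSERT INTO source (id, name, uri)
VALUES ('7f9c24e8-3b12-4fef-91f0-5f1a2b3c4d5e',
        'Composite Crashing Code',
        'https://compositecode.blog');
INSERT INTO connection (source_id)
VALUES ('7f9c24e8-3b12-4fef-91f0-5f1a2b3c4d5e');
COMMIT;
```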
[01:02:01] This is also very, very important with GraphQL, because if you take a complex entity with related entities within it and try to write all of it back, you may have just the pertinent things you need, but you want to minimize the trips to the database and the looping back and forth that occur. GraphQL will hide that, but as the database administrator, the person managing the database, you want to make sure it's not doing any more round trips than necessary. And that is helped by using UUIDs for your primary key and foreign key relationships throughout your data model. So, I'm going to save that with the generate function there, so I don't always have to get the value from the client side if I don't want to.
And then, for the stamp, I'm going to take the default and go with now(). Notice how Hasura gave you the autocomplete, and it gave it to you based on the data type. Very, very cool. It's partially from the data type; the naming pulls off of that. So it helps a lot when you're trying to go through and make some quick changes, to get the autocompletes that are specific to what you're trying to do.
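What those two console changes amount to in SQL is roughly this, a sketch assuming the source table's id and stamp columns:

```sql
-- Let the database fill in the primary key and the timestamp by default.
ALTER TABLE source
  ALTER COLUMN id    SET DEFAULT gen_random_uuid(),
  ALTER COLUMN stamp SET DEFAULT now();
-- gen_random_uuid() is built in on Postgres 13+; on older versions, enable
-- pgcrypto first: CREATE EXTENSION IF NOT EXISTS pgcrypto;
```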
[01:03:27] So, all right, we've got the ID and the stamp in there. Now, clock_timestamp, I think that's dependent on database type. See, I don't actually know the specific difference between those; do elaborate if you know, Pravin. I just use now(), because it tends to be the default function in other databases also, which I like. And Nivertech asks about the upsert pattern with UUIDs, or maybe idempotent mutations based on a UUID primary key. So, upserts are interesting, and depending on the database, they behave differently, because this is definitely a good NoSQL question.
In Apache Cassandra, upserts can be dangerous. You may have an ID, which is your primary key, and it may span multiple columns, because the primary key can have multiple columns; it's called a compound primary key. So if you have that, and one column is a UUID, one column is a name, and one column is an email address, and then you try to insert a new record with the same name and email address because you just want to update them, if you're using a UUID you end up with a duplicate, because the UUID would be generated differently and would be a different value. So even though the name and the email address are the same, it makes a new row in the database. That's one of the problems with upserts.
[01:05:32] And upserts aren't a distinct operation that occurs; there's still just an insert occurring, but it's referred to as an upsert because you specifically have to map what your ID is in the upsert, to ensure that the update you're expecting actually occurs. It's also probably good to mention that updates are somewhat of a myth within databases, because what's actually happening under the covers is effectively a delete and an insert whenever you do an update. So keep in consideration that you're doing two actions every time you do an update in a database.
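For contrast, in Postgres an upsert is spelled out explicitly: you name the conflict target yourself, which is exactly the ID mapping being described. A minimal sketch against the workshop's source table (columns assumed):

```sql
-- Keyed on the primary key: the same UUID updates the existing row
-- instead of creating a duplicate.
INSERT INTO source (id, name, uri)
VALUES ('7f9c24e8-3b12-4fef-91f0-5f1a2b3c4d5e',
        'Composite Crashing Code',
        'https://compositecode.blog')
ON CONFLICT (id)
DO UPDATE SET name = EXCLUDED.name,
              uri  = EXCLUDED.uri;
```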
Oh, what else is there to mention about that? There's a lot. There's a lot to think about in database design and development. So, yes, back to the database, specifically these tables. In action I've updated that. I'm going to update source, because I want to play around with source a little bit to show some things. So, let's get in there, add the UUID default, and we'll add the stamp.
[01:06:57] All right, there we go. And then, let's go ahead and do it for... Source notes doesn't need it, because in every situation you would be taking the source ID and the note ID to draw the relationship as you do the insert into source notes, so no function is necessary there. However, for the stamp you want to use now() there also.
Excellent. So, what is the difference, as Pravin asked, between now() and clock_timestamp()? A quick little side diversion here on good old Stack Exchange. The answer states that the SQL-standard functions return values based on the start time of the current transaction: current_timestamp is equivalent to transaction_timestamp(), and now() is the traditional Postgres equivalent of transaction_timestamp(). So now() is basically a wrapper for that, just like now() is often a wrapper in other databases. By contrast, clock_timestamp() returns the actual current time, so it keeps changing even within a single transaction. It's a very good question, and there's a quick sketch of the difference just below.
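A minimal, runnable sketch of that difference in Postgres:

```sql
-- now() is pinned to the start of the transaction; clock_timestamp() keeps moving.
BEGIN;
SELECT now(), clock_timestamp();
SELECT pg_sleep(2);
SELECT now(), clock_timestamp();  -- now() is unchanged, clock_timestamp() is ~2s later
COMMIT;
```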
[01:08:20] Did I? Oh, there, okay. I thought I lost my tab for a second. To elaborate on the concern that question often comes from: you need to make sure that whenever you're using any type of timestamp, it's stamping in the time zone, in the format, and with the other details that you expect, and that you know the criteria of that. For example, is it using a time server, or is it using the clock of the local database server? Is the local server in your time zone, or is it on UTC, or is it set to UTC but in a different location? There are a lot of different variables around times and timestamps that you need to take into account.
Especially, even more so, if the timestamp is being used for a legal audit; it has to be accurate then. So, often in that situation, you'd want to standardize on something specific and well documented, like UTC, and then do any conversions to other time zones based on the UTC for that audit and that timestamp. Knowing what the function does, and how accurate or possibly inaccurate it can be when creating a timestamp, is hugely important.
[01:09:37] All right. So, we've got source notes and schema done; notejot, let's look at that one, pop the UUID default in there. And then I'm just going to quickly do the rest of them, same thing against stamp. Now, the cool thing: if you use the Hasura CLI, all of this, as you do it in the console, would be automated and kept in migrations. So whenever you execute your DDL, your database schema, you can move back and forth between the versions that you keep; this would be version two that I'm working on now, with these extra default functions in place. With migrations and the CLI, you can automatically move between those with a simple hasura migrate apply, and via hasura console you can execute from your migrations to bring the database up and then make these changes. It writes the SQL into your migrations folder within your project. So, as Melanie mentioned earlier, how do you keep up with these things in your project? It's very easy then, because it creates the SQL for you to have in your migrations directory, which you can put into your repository and into whatever other collateral you need for your project, whether that's JIRA cards or something else. It makes it easy so that you don't have to write any SQL you don't really need to be writing. So, a really great option there.
So, that's your notejot, connection, and formatter; quickly clicking through the last few here. Oops. And if you have a preference, as Pravin mentioned, with the stamps and you don't want to use now(), there are all those other functions; you could just go with another default. They all show up in the dropdown, so it's easy to determine which one you want and just go with that. All right. So, we've got those.
[01:12:10] Now, let's go into the world of GraphQL. Hasura introspects against the database, which in this case is the database we've just created, against our particular tables. It looks at our relationships and infers things, including what ought to be entities. In this case, the entities are generally the tables. So again, as I mentioned earlier, it's important to pick the casing; it helps bring familiarity to the concept of the thing you're bringing back, that you're querying against, et cetera, for people writing code against this particular API. So pick your camelCase or your PascalCase accordingly.
In here, let's see. We were going to do source. Let's do a basic query and pick id, name, stamp, and uri. Of course, we have nothing in here, so we execute that and, boom, nothing, right? However, we have done something: we've executed a query. If you look at the history, "My Query" has been executed. If you change the name to "Another Query," for example, and execute that, boom, it shows up in the history. And you can go back to these various queries and look at them like that. Very easy, just clicking on things.
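The query being built in GraphiQL is roughly this, with field names taken from the workshop's source table:

```graphql
# Basic query against the source entity Hasura generated from the table.
query MyQuery {
  source {
    id
    name
    stamp
    uri
  }
}
```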
[01:13:40] The Explorer, I'm going to close that for a second, and the history too. Now, let's say you need to write some code that's going to execute this GraphQL statement. Click on Code Exporter. You have options here, TypeScript or JavaScript, and as you can see below, it provides the code you need to execute the query, basically in copy-and-paste format. You can change from fetch, or you can use React Apollo, and it shows the pertinent code for that. So, a super handy feature there. Let's get rid of the Code Exporter for a second and click on Voyager to traverse the space.
All right, this may be a little overwhelming at first, okay? So, we're going to zoom in. As you can see, this is a map of the metadata that was created about the database, with all the correlations, et cetera: query roots for each of the tables we created, action, action aggregate, connection, formatter, notejot, schema, source, et cetera. Then that maps out to the nodes; it looks at the min and max aggregates, the data types, and other things like that. You've got your connection source, and the list goes on. So, it's a great way to figure out what is happening, as far as the database is concerned, in the design of the GraphQL on top of the DDL-built database schema. It's a very useful, very important place as you begin building more complex models and you need to troubleshoot what you want to do with your GraphQL API, and determine what you want as far as the model, but also your entities and things like that.
[01:15:39] Then, let's do an analyze real quick and see what we get. That is the query analysis that I showed a little bit earlier in the slides. It's a pretty easy query, because it only executes against a single entity, in this case a single table. There are some nested selects, which is how the default generation of these things occurs. As your GraphQL queries get more complex, this becomes more complex, and you can come in here and see what you're getting and executing on the server side. So, let's get into that a little bit.
First, let's close that out and go back to the Explorer. This is one of the features I love about Hasura and its GraphQL integration. Let's go into source, and then change this to a mutation, and you'll see all the inserts, updates, deletes, et cetera now, okay? Let's go with source. We'll do an insert, and I'm not going to insert the ID, so I'll take that out. I want to do name and uri, that's it, because the ID and the timestamp will be done by the database, right? So, let's get rid of this thing.
[01:17:07] And then Control-Space, just like in normal GraphiQL, will give you the auto-complete for things. I just want the return from that, so I'm going to ask for the id and the stamp, okay? That'll give me confirmation of several things: one, that the record was created when I asked for it to be created. It'll give me the ID and the stamp, so I'll know when it was done, and I'll also have the ID if I want to reference back to that particular record with whatever I'm doing in my code base. So, a pretty common query tactic that I use.
All right. So, first source. What's a good webpage? Anybody got a good webpage? Actually, I'm going to go to my blog and use it as a reference. So, compositecode.blog, put that there, and the name of it is Composite Crashing Code. I'm going to add that. There we go: we have the ID and the stamp coming back from that. We're good to go.
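The mutation just executed looks roughly like this; Hasura generates the insert_source root field and the returning block from the tracked table, and the id and stamp come back from the database defaults:

```graphql
# Insert a source and get back the database-generated id and stamp.
mutation AddSource {
  insert_source(
    objects: {
      name: "Composite Crashing Code"
      uri: "https://compositecode.blog"
    }
  ) {
    returning {
      id
      stamp
    }
  }
}
```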
[01:18:30] Let's go to another website. Let's go to YouTube, to the HasuraHQ channel, okay? Lots of good content out there, so a big shout out to our content on YouTube. I'm going to take that and put it in here. So, now we have a few records, and boom, new ID, new stamp. That's great. Now let's take that ID. And let's add a related value in; let's see, what is source related to? We want to do formatter. So, let's say YouTube has an API, so we'd be adding something to the formatter. Oh, that's the update, I want the insert. We'll have the connection ID. Wait, did I draw... I think I drew this badly. Oh, here we go. We need to do the connection first. So, we'll do source ID. And then, yeah, there we go, okay. Oops, hide that. Schema's okay, that's good. And we'll want to do... I thought I had called... Oh yeah, the action.
So, let's see how we can do this. So, formatter. You can see where this starts to get complicated when you're doing a multi-insert. Okay, I've actually confused myself; let me look at this again. So, I put in the source and I want to just add... Oh, I need to add the connection to the source first. So, what does the connection need? Oh, it needs all these things: it needs action ID and source ID. I think I left those nullable; let's make sure. Yeah, so you can add one or the other as it's being done.
[01:21:02] To note, that's another thing: if you have foreign key rows that reference a required primary key row, that primary key ID needs to be available to insert along with the record when it's finally put together with all its required items. And if that takes a minute, you may not have that unique identifier if you've only just sent the primary key record to the client and, for some reason, haven't gotten the auto-increment value back yet. So that's important to note.
So that's here, connection. We need to go with... how do I say this? Let's go back and do that. So the ID will be generated for us, and then we want to do source ID; that's what we want to add. Oh, and I added... so change this to a mutation. There we go. And I'm going to get rid of this. We'll go to insert connection. So, the ID will be done for us; we want the source ID, that's what I copied. Okay, there we go. Then we won't need the stamp, and that's just going to provide the connection. "Must have a selection of subfields," so that's your source. Actually, let's do this.
[01:22:57] I feel like maybe we should create the other items first and then this one. Rather than stall on that, I'm going to skip through it, and we're going to look at the execution again. So we have the sources; let's look at those. There we go, we have two, no details in there; one's the blog, one's the other one. So if we do that, but then we say where... let's go with name equals "HasuraHQ," like that. So now we should just get the one. Yep. It's a single-record query, so we can look right at it. Then I'm going to pull up the history, and there it is, along with the others. I'm going to run an analysis of that and we'll look at what we get. Here you can see how the where is built: public.source.name is compared for equality against the text 'HasuraHQ'.
It's a little weird, because it might not be the SQL you would write yourself if you were to write this query, but it shows you how this is done and how these things are filled in, and you can make that determination. Then, of course, here are the three things from this query: that query is executed, then this query is executed on that result set to show the sets that are returned, i.e., the tuples that are returned. And it maps ID to ID, name to name, details to details, so that when you see it, it comes back just like that.
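For reference, the single-record query that produced this analysis looks roughly like this, using Hasura's where syntax (field names assumed from the workshop schema):

```graphql
# Filter the source entity down to the one row whose name matches.
query SingleSource {
  source(where: { name: { _eq: "HasuraHQ" } }) {
    id
    name
    details
  }
}
```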
[01:25:00] So, as this gets more complicated though, let's look at one of these crazy ones. Let's say we have connection and you want to show various pertinent things: you want the action, and then you want the formatter, the formatter map, stamp, and IP, and then you want the source with its name, ID, and stamp. You can see how this starts to nest: you have connection, then action within that, and within the action you have connections and then source. Now let's see if this even executes. It did. However, let's talk about this for a second.
You can see there are dramatically more steps to what just occurred, even though there's no data to execute against; it still has to go through these steps. There's a nested loop, a sequence scan, a nested loop, an index scan, another aggregate, nested loop, nested loop, a sequence scan inside of that, another index scan, et cetera. So now, if this is, say, one record on this join, the next one is then executed and queried against the set of all that. You can end up with exponential growth, or even a Cartesian product, in the query. You could technically kill your server with certain types of queries like this, and you need to be aware of that so you can take precautions. For example, this connection and action: you most likely wouldn't want to nest that deeply into it; you'd want to keep it minimal, at most one, maybe two levels deep.
[01:26:49] The good rule to follow, as with relational design, is that you never really want to go too far down the path with too many joins. As joins are added, you see a huge drop in performance because of the correlated scans that have to be done against the database. To minimize that, you keep things one level deep, two levels deep, and you can think of those levels as your joins; as you nest into an entity object like that, you increase your joins. So when you design your database schema, think about the GraphQL entities and how they will be queried. One of the things you can do to make queries faster or more easily queryable, if you know certain sets of data are always needed together, is create a denormalized table and, either with code or something else, have it filled at the same time that you fill your system tables of record.
Now, one way to do that: if we go back to Data, let's say we want the connection with its requisite related data. We have ID and stamp, but it has action and source IDs. So we might want a table that has all of the columns behind the action ID and source ID combined. Very denormalized, a lot of redundant data within that. As was pointed out earlier about cardinality, your distinct cardinality decreases and you have a lot of redundancy within that table. But you have a single table, a single entity in GraphQL, to query against. Then you can query against multiple different elements within that entity, or in SQL speak, multiple different columns, to get faster results, and results that are only nested one level deep, minimizing the joins and nesting within the query that has to be executed against the database.
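A minimal sketch of that kind of denormalized table, with all names assumed on top of the workshop schema; a materialized view over the joined tables is another way to get the same single-entity shape without writing the redundant rows yourself:

```sql
-- Connection flattened together with the action and source columns it is
-- usually queried with: one wide, redundant table, one GraphQL entity.
CREATE TABLE connection_flat (
  id          uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  stamp       timestamptz DEFAULT now(),
  action_id   uuid,
  action_name text,
  source_id   uuid,
  source_name text,
  source_uri  text
);
```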
[01:29:13] With that, any questions about that? Because that's where the bulk of the performance and design work comes into play: the normalization or denormalization of the data, which changes how we ask things with our queries and GraphQL entities. What parts of the entity do we want at one particular time? Do we want more than is available in a single entity? Do we need to expand the entity to be more inclusive of other things? Those are the things we need to take into account. So, questions, comments, et cetera?
Links
[01:30:13] As you think of any other questions or comments, I'm going to pull up my final slide, because that's basically the material I have for this. To note, this slide has the links. I will make these slides available; I'll post them in the Discord within the next few minutes, so if you want the links, you can just get the slides and they'll have them. It's all on Google Slides, so no crazy formats or apps needed. And I'm going to add a few things here. For example, me: you can reach me on Twitter at Adron, and that's my handle for Twitch and YouTube also. I also stream shows on HasuraHQ around this specific topic as it relates to the Hasura tool. There's a lot of stuff out there around migrations, and I'm going to be doing more. In the very near future, in 2021, we will be writing apps with React, VueJS, things like that, around the Hasura API server using GraphQL. So there are a lot of deep-dive GraphQL topics in there. Be sure to follow us on Twitch, subscribe on YouTube, and follow us on Twitter.
I myself am available on Twitter at twitter.com/adron, on Twitch at twitch.tv/thrashingcode, and on YouTube under the same name. We also have GraphQL Asia coming up, so if you're interested, definitely check that out. And for further training on this specific topic, as well as many others around GraphQL and how to do data modeling around GraphQL, check out hasura.io/events; there's a lot of good stuff in there. And that is pretty much it. I think I might have... let's see. No, definitely follow me on Twitter and ping me if you're interested in credits, because I think I have some pretty cool credit options I can give you. But follow me and I'll get those to you, because I don't have the code with me right this second.
Questions
[01:32:56] What are the intended use cases for Hasura? Is this mostly for enterprise CRUD apps or for consumer-facing web and mobile apps? Naibertech asks.
Yes. Isn't that the greatest answer, yes? So it can definitely be used very, very easily for CRUD applications, right? Because as you saw, as I worked through it, all the CRUD features are there by default, out of the box. None of this was canned at the beginning of the workshop; I did it as I would do it in any circumstance, the first time, for all of you. One reason I do that is to make sure there are no little hiccups in between. I like to make sure any material I present is seamless from start to finish, so that your experience will be the same as what I'm showing you. I don't want any tricks in there. So you saw it right there: boom, all the CRUD capabilities are there by default.
However, Hasura does a ton more than just give you that. For instance, you can look at Actions, and you can add actions that do custom validation or custom authentication; it gives you the capability to add that and then execute GraphQL against it. Remote schemas allow you to connect to another GraphQL endpoint, integrate it in, and even draw relationships between that technically unrelated GraphQL endpoint and your GraphQL endpoint here at Hasura, drawing correlations between them, like the relationships we saw it derive itself against the database it's directly pointed at.
[01:34:50] With remote schemas, you can point out to those locations and basically pull in all sorts of other GraphQL APIs and centralize them, to provide a single reference point for your developers or report writers or whoever to access, with all the correlative relationships and everything they need, to make life simpler as they do development from that singular point. So that's far beyond merely the CRUD, right?
As far as customer-facing mobile apps: yes, I would definitely use it for that. You can set up role-based permissions, authorization, all types of things there. Basically, what's available in Postgres is extended beyond that. And as we add databases like MySQL and SQL Server, which is happening pretty soon, those permissions will also map through accordingly. So you can have all the capabilities you need, especially in more complex enterprise scenarios, to make sure you have the right permissions and roles and other things like that for the client software.
Other things that we have: you can basically scale this out horizontally. So for large customer-facing websites or mobile apps that need very high-scale capabilities, we've got options around that too. If you were to use some of the high-scale options for Postgres, you just point this at that option and boom, you're good to go, like some of the large RDS options within Amazon, even.
[01:36:42] Expose the types of the JSON fields in Postgres through Hasura.
Okay. I'm going to ask you a question, because I'm not sure of the exact context. So what was it? We had action, and this is JSON, right? You can also do JSONB, and I think that adds more context for some scenarios around typing and such. But if you just do JSON, I'm not remembering exactly what we've done in the past specifically for that, to bring the JSON types across in that JSON object. I'm trying to think how Postgres even does it, if it does it. So that's kind of a tricky one, because otherwise, as you probably know, when you have JSON and there's nothing to infer the types within the JSON itself, it's generally just strings, right? If you have an array, it would be an array of strings, unless you can find some way to infer that it's an int or something like that.
So I think you'd have to do that the same way you would infer types from any JSON you didn't know about. That's a good question though. I would have to check and see specifically what the detail is around that, because I'm not sure at this point. Somewhere in there, there's something, though, because I know the types are mapped accordingly, so whatever's in Postgres ought to be available in some way, shape, or form. If it's not available to infer the types as-is and Postgres does it, then Hasura will at some point be able to do it, if it doesn't already.
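As a minimal sketch of the plain-Postgres side of this, assuming action has a JSON column called details (the column name is an assumption): values pulled out of JSON come back as text unless you cast them yourself, which is where the typing question comes in.

```sql
-- ->> always returns text; cast it yourself if you want a concrete type.
SELECT details->>'count'          AS count_as_text,
       (details->>'count')::int   AS count_as_int
FROM action;
```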
Perf Tips!
[01:38:56] And one more thing right here at the end. Of course I remember this right at the end: I created a list I wanted to go through, and I'm just going to paste it right here. Perf tips. These are somewhat arbitrary, not specifically related to schema, but kind of general; in some ways they do relate back to schema. So let's see here. Paste without formatting, can I do that? Yes, there. These are just a few I've been collecting over the last few weeks, mostly things I've noted that specifically help the GraphQL side of queries from the database perspective. One: minimize joins; I mentioned that. Two: use EXISTS instead of IN to check for the existence of data. I've run into that scenario a few times, and it's pretty much applicable to anything you're going to execute ANSI SQL against, so SQL Server, Postgres, et cetera. And, I feel like it's kind of silly to mention, but use the appropriate data type for a thing if it has a particular data type. Ah, yes, I'll get them in there; that is a good call, Robin or Naibertech. I want to turn it black, that's weird.
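A minimal sketch of the EXISTS-instead-of-IN tip, using the workshop's source and connection tables (column names assumed):

```sql
-- Instead of this:
SELECT * FROM source
WHERE id IN (SELECT source_id FROM connection);

-- prefer this; EXISTS can stop at the first matching row per source.
SELECT *
FROM source s
WHERE EXISTS (
  SELECT 1 FROM connection c WHERE c.source_id = s.id
);
```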
Also, I mentioned you can create indexes on columns in the database. Yes, you can. However, that doesn't mean you get a performance boost if you go create an index for every column. Depending on how the database does indexes, which is generally similar between relational databases, you want to pick the one, two, or at most maybe three columns that you're going to be querying against within the where clause of your GraphQL, or your SQL as it maps through. Ideally you want that where clause to execute against indexed columns. But if you're doing where clauses against a whole bunch of different columns, then it's kind of hard to pick and choose. You want to pick the columns that will have the highest distinct cardinality, that's one thing; if there's a lot of redundancy, indexes can be iffy. So you really have to trial-and-error through some of that until you become more familiar with how the database streamlines the execution path and the scans of the queries.
[01:41:49] And I also mentioned denormalizing the data, which is taking that standard primary/foreign key structure and munging a bunch of columns into a single table; that can help. But of course you lose your distinct cardinality, and then you have to really weigh which columns to index and how many things you should or shouldn't keep together, because of the redundancy within the table. This is something where, if you have a server that uses Apache Cassandra and you run GraphQL on top of that, Apache Cassandra works very differently. You have a table structure, but storage is column-oriented, and your where clauses can generally only be done against one or two columns at a time. But you can literally have petabytes of data in a table and get timely queries against it, because of the horizontal scaling of Apache Cassandra, for example. Again, that's a very different database from the relational model, but it gives you reporting capability that is not even comparable, because the reads are so fast.
So, in some of the databases there's a little bit of redundancy around the word schema: there's the database model schema, and then there's the schema feature. In Postgres, you can name a schema and then put tables within that schema. There's the schema design, where all the tables are related to each other, but then there's the schema almost as if it were a namespace you can organize tables within. So then you have my_schema.table_name whenever you write a query. Putting things into schemas helps organize, but can also help you streamline for specific performance needs, or break things out per business domain or other needs. SQL Server has options like that, so does Oracle, et cetera; it's not always called a schema, but it's a namespacing element within the database that lets you break tables out into groupings. It doesn't really change the table itself in any way; it namespaces it and breaks it out into a domain organizational structure.
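A minimal sketch of that namespacing in Postgres (the schema and table names here are made up for illustration):

```sql
-- Group domain-specific tables under their own schema-as-namespace.
CREATE SCHEMA reporting;

CREATE TABLE reporting.connection_flat (
  id    uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  stamp timestamptz DEFAULT now()
);

-- Queries then reference schema.table.
SELECT * FROM reporting.connection_flat;
```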
[01:44:17] All right. So, yeah, any other questions? I'm going to be hanging out for the next five-plus minutes at least. I am going to share these slides real quick and post them in the Discord right now. So if you have any more questions, feel free to ask; I'll answer anything and everything I can, and if not, I'll help you dig up an answer. In the meantime, I'm going to share these real quick and get them to you.
So thanks for attending, I hope this was useful. And again, go follow me, follow Hasura's Twitch, et cetera, go read the blog, check it out. There's so much more out there that we are producing and will be producing, specifically around connecting database schema, modeling, and design to doing it well with GraphQL, to make development faster, easier, more robust, and more capable as you do so. So again, thanks. I'm going to unshare my screen, get those links to you, and hang out and chat. So, thanks again, everybody. Cheers.