Remix is the best React framework for working with the second most important feature of the web: forms. (Anchors are more important.) But building forms is the fun part: the tricky part is what happens when a web consumer submits a form! Not the client side validation logic but the brass tacks backend logic for creating, reading, updating, destroying, and listing records in a durable database (CRUDL). Databases can be intimidating. Which one to choose? What are the tradeoffs? How do I model data for fast queries? In this talk, we'll learn about the incredibly powerful AWS DynamoDB. Dynamo promises single-digit millisecond latency no matter how much data you have stored, scaling is completely transparent, and it comes with a generous free tier. Dynamo is a different level of database but it does not have to be intimidating.
Remix Persistence With DynamoDB
Transcription
Hi everybody. It's a real pleasure to be here, and I'm stoked to be part of this Remix movement with you. Today I'm going to be talking to you about persistence with DynamoDB. My name is Brian LeRoux. You can find me on the various socials that still exist under that name, and I work for begin.com.
Before I get into Dynamo, let's talk about persistence in general. Persistence is a really important requirement for dynamic web applications. It is an essential complexity for anything that's personalized. For anything that's got an authentication step, where we're saving some data about a person and need to do that in a secure way and a fast way, you're going to need a database. You can't do this with a flat file system. You don't want to do this with the file system because you'll get into concurrency issues. Traditionally people would choose a relational database. DynamoDB is a kind of next-generation database that is key-value based. Most organizations out there these days have settled on Amazon for infrastructure, and Dynamo is the flagship key-value database for AWS, so if you're using AWS, you'd probably like to learn more about Dynamo.
So what exactly is Dynamo? It's a low-latency, wide-column key-value store. That's a really fancy way of saying that we query by key, we store values as JSON, and we can have different shapes of columns on every row. For every row in my database I could have different item attributes, and it's all good. Dynamo is a completely managed database. This means that there's no patching and there's no software to upgrade, and that's really nice. Also really nice: it scales to zero, which is a fancy way of saying that you only pay for what you use and you don't pay for anything else. So it's a hundred percent utilization. You're not keeping a big cluster of servers around just to meet demand that you may or may not have. You just pay for what you use and, happy days, you move on from there.
So the next big question that I often get is: why would I choose DynamoDB out of the millions of database options we have available to us? The key one for me is that it's the flagship managed database for the pioneer of the cloud, and they use it themselves at Amazon to back the amazon.com retail business. So it's just a good choice in that regard. There are great local development options. It's got a huge free tier. It's fast no matter how much data you store, which is kind of a science fiction dream for databases. It used to be that as you added servers you would add latency, and you definitely would have problems, which we just don't experience with the managed database. It's very small, so there's not a whole lot of API to learn. There are really only about six key API calls for dealing with Dynamo, which is really great. It's got a good SDK for just about every runtime you could think of. And the sleeper feature, my favorite feature, is that it works really well with AWS Lambda. "Single-digit millisecond latency for queries, no matter how much data you're storing" is generally the line that gets tossed around, and this is a really big deal because a lot of databases are quite slow today, especially traditional relational databases. They can be made fast, but they're never going to be that fast, and they just don't work as well with Lambda for a variety of reasons.
So the next question would be: why not DynamoDB? Obviously there are trade-offs. This isn't going to be a trade-off-free decision. The first big thing is that it's an Amazon-only thing, so people get worried about that. This isn't actually totally accurate, though. There are examples of people cloning Dynamo now for other key-value stores. We saw this happen with a lot of other Amazon services, like S3, which has very famously been cloned quite a bit, so that'll probably continue to happen. A very valid criticism, though, is that it's just a bit weird to learn.
Modeling for a wide-column key-value store is very query-centric. You have to think about how you want to get the data before you store it, unlike relational data, where we normalize everything and can kind of query ad hoc. The other thing that's a bit weird is that the query syntax is a bit strange. It's not the same as relational. Although I would argue that this is just a familiarity thing: if you'd learned Dynamo first and then learned Postgres after, you'd probably find Postgres weird. So not really a major thing.
"Why not DynamoDB" also has a bunch of myths, and I want to address these head on. One of the first myths is that it's expensive. Another one is that you need to know all of your access patterns completely up front before you do anything. That's not true. I've heard that it's hard to modify and migrate. That's also not true. Sometimes people say you can't use SQL. That's not true. And we'll address the biggest red herrings of all, lock-in and scaling, at the end, because it'll be funny.
So, first one: is it expensive? Well, the DynamoDB free tier gives you 25 gigabytes stored on disk per month for free, and it's 25 cents for every gigabyte after that. That's a lot of starter data. 25 gigs is a ton. And that's just what it costs on disk. You also have to pay for reads and writes, and basically it works out that about 200 million requests per month are going to be in the free tier. So, storing 25 gigs and 200 million requests per month: I think this is not expensive at all. This is very cheap. And even if you find these numbers get out of hand as you scale, it still speaks to using Dynamo for at least prototyping your application. It's going to get you really far for very little. As an aside, the joke I used to like to say is: yeah, Dynamo is expensive, but so are DBAs. That joke really only lands if you've had to shard a database, so I'm not going to assume that everyone knows what I'm talking about there.
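The storage arithmetic quoted in the pricing discussion can be sketched in a few lines. The figures (25 GB free, then 25 cents per additional gigabyte per month) are the ones given in the talk, not a live price list, and the helper name is made up for illustration. Real pricing varies by region and over time, and this ignores read and write charges entirely.

```javascript
// Storage figures as quoted in the talk (assumptions, not a live price list).
const FREE_STORAGE_GB = 25
const PRICE_PER_EXTRA_GB_USD = 0.25

// Hypothetical helper: estimated monthly storage bill in USD for a given
// amount of stored data. Read and write requests are not included.
function monthlyStorageCostUSD(storedGB) {
  const billableGB = Math.max(0, storedGB - FREE_STORAGE_GB)
  return billableGB * PRICE_PER_EXTRA_GB_USD
}

console.log(monthlyStorageCostUSD(10))  // entirely inside the free tier
console.log(monthlyStorageCostUSD(125)) // 100 GB beyond the free tier
```

Under these assumed figures, a prototype that stays below 25 GB pays nothing for storage, and each extra 100 GB adds roughly 25 dollars a month.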
Another big myth is that you need to know access patterns up front. DynamoDB modeling is different from relational database modeling. It's not better and it's not worse. It's just different. In both databases, relational or key-value, you need to have a schema of some kind. You'd say that these values are expected and that's how I'm going to query for stuff. So it really doesn't matter which database you choose: at some point you're going to have to declare a schema. You're going to have to say I have an accounts table, or I have a songs table, or whatever, and this is true whether you're dealing with a relational or a wide-column store. So I don't think it's a great argument. You're going to be migrating and learning about your app, and as it gets bigger and you understand the access patterns, you're going to be able to store the data in more efficient ways. That's true no matter what database you choose.
Sometimes people say you can't use SQL with DynamoDB. That used to be true. That's no longer the case. We now have a query language called PartiQL. Sounds fun. It looks like SQL, so I guess if you find that fun, it is, and you can click through and check it out. It's not just limited to Dynamo. It's actually supposed to be a kind of broader idea, though I think right now they're probably the only people using it.
The other big red herring that gets thrown around is whether or not you're locked in. At a certain point you're always locked in, and I would argue that the biggest lock-in of all is choosing upfront complexity. Sharding and managing a running instance for a database is definitely a lot of complexity. Sometimes people think that if you use a database like Postgres, that means your data is inherently portable, and I think that's a lie. You're going to have to go through a huge amount of pain to move that data between different database providers, regardless of the database you choose. So to me, it's not a very good argument.
There's a much longer, more rational written argument here on Martin Fowler's website that you can check out. I just don't think lock-in is a good reason to choose anything. If you've already selected Amazon as a vendor, you're probably going to be there for a while.
This one's fun, and it comes up all the time. People are like, well, does it scale? Yeah, it super does. Jeff Barr last year says: Amazon DynamoDB powers multiple high-traffic Amazon properties and systems, including Alexa, the amazon.com sites, and all the Amazon fulfillment centers. Over the course of the 66-hour Prime Day, these sources made trillions of API calls while maintaining high availability with single-digit millisecond performance, peaking at 82 million requests per second. That scales. It's also not just Amazon, if social proof is how you like to make decisions. There are a ton of logos here of people using DynamoDB at scale for a variety of applications. I think we can put that one to bed.
So, as a very fast recap: DynamoDB scales up and down dynamically. It has a generous free tier. It supports an SQL-like syntax. It's fairly straightforward to migrate around. It works really well with AWS Lambda. Yes, it scales. And yeah, it's a bit weird to learn, and we're going to get into that.
Before I get into it too much further, if you just want to zone out and not pay attention to the rest of my talk, the one thing I would recommend you do is pick up this book and follow Alex DeBrie on Twitter. If you're still there: if you're using Dynamo, this is just an immediate buy for your whole team. It is the best docs written for DynamoDB by far. When we started adopting Dynamo, I wish this book had been out, because we would have saved a lot. It is the cheapest thing you'll ever be thankful you bought. It's going to give you superpowers, and I can't recommend it enough.
So, Dynamo, let's get into it a little bit deeper. Some terminology to be aware of first of all. Dynamo has tables, and tables are collections of items.
You could think of these like pages in a spreadsheet, and items are like a row in a spreadsheet, or maybe in a database if that's what you're familiar with. Items have attributes. So if the table's name is accounts and a row is a user in the accounts, it might have attributes like name and email and address. The partition key is how we query for a specific item or a collection of items. A partition key is kind of like a primary key in relational terms. It's usually a fairly unique value, but it doesn't have to be, and it's how we get stuff. A sort key is a secondary way to query. A sort key might be a way to narrow down the collection of items, or a way to get a more exact representation of hierarchical data, if you want to get a lower leaf in the tree, as it were. And then the last concept is indexes. Indexes are basically like tables: they are another table with a different primary key and sort key schema so that you can query by different stuff. I'll explain what I mean.
So, basic modeling: a partition key is a unique value for accessing an item. This right here is what we call an arc file, and it's a document we use to provision DynamoDB tables in the Architect open source project. We say, hey, I want a table named accounts, and I want a partition key of accountID. And I'm going to have an extra index for the accounts table. Oh, I have a typo here: that should be accounts with an S. That's terrible, but this should be accounts, so it matches here. It would be a different way to query. I might want to get my user by accountID, but I might only have the email, such as when they're registering, signing in, or logging into the system. So I need an index to query that by email. I like how my typo carried over to here. Sorry about that. So, basically, 101: there's a primary key concept, and then there's a secondary key concept, or the sort key.
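One way to picture "an index is basically another table with a different key" is a pair of in-memory maps over the same items. This is only an illustration of the idea, not how DynamoDB is implemented. The `accountID` and `email` attribute names follow the accounts example, and the helper names are hypothetical.

```javascript
// Illustration only: the "table" is keyed by accountID, and the "index"
// is a second lookup over the same items, keyed by email instead.
const accountsByID = new Map()
const accountsByEmail = new Map()

function putAccount(account) {
  // If this accountID already existed with another email, drop the stale
  // index entry so both lookups stay consistent.
  const previous = accountsByID.get(account.accountID)
  if (previous) accountsByEmail.delete(previous.email)
  accountsByID.set(account.accountID, account)
  accountsByEmail.set(account.email, account)
}

function getAccountByID(accountID) {
  return accountsByID.get(accountID)
}

function getAccountByEmail(email) {
  return accountsByEmail.get(email)
}

putAccount({ accountID: 'a1', email: 'cat@example.com', name: 'Cat Person' })
console.log(getAccountByID('a1').name)
console.log(getAccountByEmail('cat@example.com').accountID)
```

With a real table the database maintains the index for you; the point of the sketch is only that the same item becomes reachable by a second key, such as the email someone types when signing in.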
The sort key, sometimes called a secondary key, is for querying one-to-many relations. Say I was modeling a blog in my system. I've got accounts, and accounts have many blogs, so one accountID would have more than one blog post, and we could have a secondary key of slug to get an individual post. Now, this is cool because we might also have to query blogs by when they were published. So this is a secondary key in the index, where we're going to go by published instead. It's kind of nice. Now we can get all blogs, we can get a blog by its title, and we can get whether or not it's been published. Cool. So: primary keys and sort keys, tables and indexes. That's the most basic high-level stuff. There's a lot more to it.
One of the things to know about DynamoDB, and you hear this come up all the time so I'm going to address it now, is: what about single table design? Amazon recommends that you model your application with as few tables as possible. Ideally that means just one table, and you can get really fancy with partition keys and sort keys to store really complex relationships inside of one table. Now, I don't like to do this with ephemeral data and durable data. What I mean by that is, if I have API keys or tokens which I'm expiring with a time to live, I don't want to store those in the same place where I have user accounts, which are durable data that I want to keep around forever and maybe back up on a regular basis. I don't want to back up short-lived tokens. I want to back up user accounts. So we don't want to put those tables together. You typically will have more than one table, depending on the life cycle of the data involved. Now, this is a personal opinion, and everybody's got one, so take it for what it's worth, but I think single table design is an optimization step.
Once you know your data access patterns, that is. Until you know them, you should probably just lean towards trying stuff out and figuring out the best design for your app at the time. That might mean there are going to be opportunities to improve it later, and that's not a bad thing. That's actually what will happen regardless of whether you design for a single table or not.
Okay, that's a lot of hype. Let's get into modeling some stuff. First of all, if you're going to create a Remix app to work with this, you're going to want to choose the Grunge Stack, which uses Architect under the hood and gives you all these AWS superpowers. You don't have to do that, though, and I'm going to show you how to work with DynamoDB directly on your machine, locally, without an Amazon account, because that makes things a lot easier. If you do check out the Grunge Stack, you will see in the generated code that it gives you this awesome note-taking application that is really close to how I would recommend you work anyway. Typically you'd have some kind of folder called models, and you would put all your data access logic in those models. You don't want your application layer to be interacting with DynamoDB. You want it to be interacting with the entities of your system. In this case the entities are User and Note. Whatever you want to call them, whatever those things are that you're persisting, you want to collect them into objects so that your application layer reads really cleanly and is more focused on business logic and not on data access logic. It's really, really important to separate those concerns. Once this is all compiled, it'll all be mashed together in the right places. But if you dig into these models, you'll see that they encapsulate all the ugliness of working directly with the database, which isn't that ugly, but db.note.get with these pk and sk things is a lot less clear than the interface of just getNote, right?
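The separation described above, where the application calls something like `getNote` and only the model knows about `pk` and `sk`, might be sketched as follows. This is a minimal illustration under stated assumptions, not the Grunge Stack's actual code: the key layout (`pk` holds the user ID, `sk` holds `note#<id>`), the `createNoteModel` factory, and the in-memory `createFakeTable` stand-in are all hypothetical.

```javascript
// Hypothetical in-memory stand-in for a table client exposing get and put.
function createFakeTable() {
  const rows = new Map()
  const keyOf = ({ pk, sk }) => `${pk}|${sk}`
  return {
    async put(item) {
      rows.set(keyOf(item), item)
      return item
    },
    async get(key) {
      return rows.get(keyOf(key))
    },
  }
}

// The model is the only place that knows how entities map onto keys.
function createNoteModel(db) {
  return {
    async createNote({ userId, id, title, body }) {
      await db.note.put({ pk: userId, sk: `note#${id}`, title, body })
      return { id, userId, title, body }
    },
    async getNote({ userId, id }) {
      const row = await db.note.get({ pk: userId, sk: `note#${id}` })
      if (!row) return null
      return { id, userId: row.pk, title: row.title, body: row.body }
    },
  }
}

// Application code reads in terms of notes, not partition and sort keys.
const notes = createNoteModel({ note: createFakeTable() })
notes
  .createNote({ userId: 'u1', id: 'n1', title: 'Hello', body: 'First note' })
  .then(() => notes.getNote({ userId: 'u1', id: 'n1' }))
  .then((note) => console.log(note.title))
```

Passing the table client in as an argument is a design choice for the sketch: it lets the model be exercised against a fake in tests and against a real client in the app.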
So that's a nicer way to work. Now, I was going to go through this and explain how it works, but I feel like it starts with almost too much context, and I don't want to confuse you with all this extra business that's going on. I just want to show you the most basic stuff. So with that, I'm actually just going to jump over to my terminal. And I should fix my example. Super funny.
I'm going to create a new tab, make a directory called remix-dynamodb, and jump in there. I'm going to echo a pair of curly braces into a file called package.json, and I'm going to npm install tape, tap-spec, @architect/sandbox, and @architect/functions, and save them for development. It's going to take a sec. We're all kind of used to the npm tax, but it's worth maybe talking about what's going on here. I just created a package.json file and I added some dependencies to it. Those dependencies are for testing, and that's it.
And there we go. Now I'm going to touch a file called app.arc and modify it. The app is called my app. We will start by defining a DynamoDB table called cats, and we'll give it a primary key, or a partition key, sorry. I always want to call it the primary key. We'll give it a partition key of catID. Typically I like to name the tables in the plural, and then each row would be a singular type of thing. With the way that Architect works, we'll be able to run this locally, which is really, really nice.
Before I do that, let's pop over to package.json. Cool, I got all my deps. So we'll just add a test script. Okay, so this is going to run tape against a file called test.mjs and pipe the results to tap-spec. Now, test.mjs doesn't exist yet, so let's create that. And we'll just add a silly test.
Oops. Works. Now, you can add all kinds of tooling to this, obviously, and you can get really buck nutty if you want, but I'm just trying to get this working. I want to do one little step at a time and see how it works. Just taking baby steps. So we have one passing test. That feels pretty good. But there's not a whole lot going on here.
Remember when I went into this app.arc file and I defined this table called cats? Wouldn't it be cool if we could take a look at that? So why don't we do that. Before we do, we're going to need the Architect Sandbox. Architect ships with a DynamoDB runtime based on, I think it's called, Dynalite, and it lets us run it locally, which is nice, so you don't have to deploy anything to the cloud to test out your DynamoDB. Now, it runs like a dev server, so we have to start and stop it, and we'll do that before and after our tests run. I think... there we go. And we'll see some weird output when we do this. Let's take a look at that. Oh. Missed that, you know. I should install ESLint, but I don't want this to turn into a configure-my-stuff story.
Okay, so you'll see here we got some output. When the sandbox started, it said tables were created in the local database. Nice. I actually don't want to see that every time, though. Okay, I'll tell it to be quiet: true. And instead of just saying "works", why don't we take a look at the Architect Dynamo client? Architect Functions is a collection of small functions to help you work with AWS, and one of those functions is a data access layer. So we're going to say let data = await arc.tables(). This is going to reflect back the tables that we defined in our arc file, and if we were to deploy this, that arc file would generate the CloudFormation needed to create the database at AWS. So let's take a look at that. Well... right. Okay, so that was actually pretty cool.
We'll see now that when we called arc.tables it returned this object called data, and data has this thing on it called cats, and cats has the methods delete, get, put, query, scan, and update. That's our Dynamo client. The reason this is useful: I'm going to quickly jump over to the Architect website and I'll show you. Here on the left is an arc file. Oops. And this on the right is the generated CloudFormation for that arc file. You'll see down here somewhere it's going to define a DynamoDB table. And that's great. We've got scopeID and dataID, and it's got a TTL on it. This table, when it gets generated, is going to just have a meaningless GUID for a name. If I call it cats, for example, there's no "cats" in here. It's going to be mashed in with what gets generated. So at runtime we actually look it up by key and give you this Dynamo client that's a little easier to work with. You're just working with data.cats as opposed to working with some GUID.
Okay, so we've got our table. Let's write some data to it. So let's write a cat. Okay: result equals await data.cats.put. I think I said in the schema that it would have a catID. This isn't intended to be production code, so we'll just do that for the catID. Give it a name. Two Trails, my cat's name. A birthday. Okay. And instead of saying t.pass, let's say t.ok on that catID: has a catID. And log out the result here so we can take a look. Cool. So we wrote a cat to the database, and then we exited right away. It's gone. We don't persist anything to the local file system in Sandbox by default. This will actually just run in memory. That's great because as you add more and more tests and build a bigger and bigger data access layer, the tests are going to run fast. In our case, for begin.com,
I think we have something like thousands of tests, and they run in milliseconds, which is how you want this to work. So we read, whoops, read all cats. And we can add a couple of cats: add a cat, and Tuxedo. I'm going to want to get this ID later, so I'm just going to declare catID up here, outside the scope of my test, and copy this chunk of code up here. This catID will be outside of the scope, and later on I'll be able to look it up. But for now, let's just get all the cats: data.cats.scan, and log the cats.
Okay, so scan is typically considered bad practice. What scan will do is go through and read everything, and if your table's really big, that's going to take a long time and it's going to have to paginate through multiple pages of data. Typically you want to grab things by querying or by directly giving it a key. So scan is just the sloppy, fast way to work. But you know what? This also scales for quite a long time. So don't feel bad about using the tools you have available to you and then iterating to improve. It doesn't have to be perfect up front.
So let's read one cat: data.cats, oops, dot get, and we're going to query by catID. If we look here, with my catID, we pulled back this one. This is just screaming fast. We're just adding queries, no big deal, and all the reading and writing takes place in milliseconds. Not bad. Kind of exactly what you want. So as far as CRUD goes, we've got create, we've got read, and we've got list, with this scan business. Let's do a query, just to round it out. This is going to query cats by catID. This is kind of the recommended way to work with rows: a key condition expression. Now, you can do this in multiple different ways.
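The key condition expression query being typed at this point in the demo might look roughly like the sketch below. `KeyConditionExpression` and `ExpressionAttributeValues` are DynamoDB's own parameter names. The `queryByPartitionKey` helper, its arguments, and the `'cat-1'` value are hypothetical, and the exact way the parameters are handed to the client is an assumption.

```javascript
// Builds parameters for "give me every item whose partition key equals this
// value". The helper itself is a hypothetical convenience, not part of any SDK.
function queryByPartitionKey(keyName, keyValue) {
  // ":catID"-style placeholders keep the value out of the expression string.
  const placeholder = `:${keyName}`
  return {
    KeyConditionExpression: `${keyName} = ${placeholder}`,
    ExpressionAttributeValues: { [placeholder]: keyValue },
  }
}

const params = queryByPartitionKey('catID', 'cat-1')
console.log(JSON.stringify(params, null, 2))

// With the client from the demo these would presumably be passed along the
// lines of `await data.cats.query(params)` (assumed call shape).
```

Attribute names that collide with DynamoDB reserved words would additionally need `ExpressionAttributeNames`, which this sketch leaves out.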
This is just one way, and there's a whole lot to unpack about how to learn proper modeling and build up larger applications. I'm trying to give you a sense here, not trying to say this is the only way or the best way. I'm introducing you to Dynamo for the first time. So we have a key condition expression. We're saying give me all the cats where the catID equals this value, and then we have to have expression attribute values. One of them is going to be catID, and we'll pass the catID that we generated above. That's going to get us a result, and we'll inspect the value of that result. You can see it's actually quite similar to running a scan, but it was prefiltered on the ID that we were looking at. As you design and learn how to model DynamoDB applications, you'll use IDs as a way to query on groups of items in hierarchies. This is how you get more than one item by ID. And you can add all kinds of conditions to this. It doesn't have to be too complicated. Let me do another one of these and call it catID2, and we'll use it for this one. And so, read all cats by that catID: has cats. Cool, cool.
Let's delete a row, so we'll have all of CRUD now: create, read, update, delete. Oh, "destroy a cat". That's sad. "Delete a cat row" is a little more benign. I'm a friend of the cats and don't want to see them get hurt.
Okay. I'm using tape here, by the way, very deliberately. Sometimes people in the React community think you should use Jest. Don't use Jest to test Node projects. It's not very good for that. It's bloated, it's slow, and it patches globals. You want something as close to the metal as possible for your tests. You don't want your tests to be a murder mystery. You want them to be a place where you can trust that what's going on is what you thought was going on. All right, so let's grab some code from there. We're going to get the cats and we're going to say delete.
It takes the catID, so we can just steal that code, and then we'll see if we get a result of some kind. And sadly, yes, we will. The right way to do this would actually be: let's do a scan afterwards and see how many items we have in the scan. Not the right way, but the right way to do this to learn, I should say. Call this one result. Oh, and delete the second one. Okay. Oh, man. I think I know. Right, it was back up here when it's creating this. Yeah. Oh, still have... Read cat by catID: I got one. And delete a row. Scan the cats; let's just see what I got. We deleted both cats, didn't we? I deleted this cat. Did I delete both cats? Where did the other cat go? Oh. Going off script. Read cats by catID: we have a Tuxedo. Okay, that's interesting. It most likely overwrote our cat, here in the data.cats.put stuff. Oh, I think we did, because that's never happened. So close. They shared an ID. Yeah, that's what happened there, maybe. Now, this is obviously for demo purposes. You could use UUIDs here. You'd probably want to use some other data, like email and stuff, too. So, yeah, now we've got just one back. It's doing what we expect. Cool.
So that's a very fast tour of building a DynamoDB client in one file with three dependencies, or four dependencies. As you mature this project, you would likely want to have a thing called cats, right? You wouldn't be interacting with the Dynamo code in here. You would be interacting with the cats code. So the exercise for the reader would be to move this into some kind of model, and then your application would consume that model. It would be tested independently of the UI half of your application, and you wouldn't have to worry about how these things compose. They can compose nicely together as you build it out. Okay, so that was the highest-level, fastest speed run of learning Dynamo I could throw at you.
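The surprise in the demo, where two cats went in but only one came back, can be reproduced with a tiny in-memory stand-in for a table that has a single partition key. This is only a sketch of the behaviour discussed in the talk. It is synchronous for brevity (a real client is asynchronous), it is not the Architect client, and `createFakeCatsTable` and the IDs are hypothetical.

```javascript
// Hypothetical in-memory table: one item per catID, like a table whose only
// key is a catID partition key.
function createFakeCatsTable() {
  const items = new Map()
  return {
    put(item) {
      // Writing an item whose key already exists replaces the old item.
      items.set(item.catID, { ...item })
      return item
    },
    get({ catID }) {
      return items.get(catID)
    },
    scan() {
      return { Items: [...items.values()] }
    },
    delete({ catID }) {
      items.delete(catID)
    },
  }
}

const cats = createFakeCatsTable()

// Two puts that accidentally share an ID: the second overwrites the first.
cats.put({ catID: 'demo-id', name: 'First cat' })
cats.put({ catID: 'demo-id', name: 'Tuxedo' })
console.log(cats.scan().Items.length, cats.get({ catID: 'demo-id' }).name)

// A distinct ID keeps both items, and deleting one leaves the other.
cats.put({ catID: 'demo-id-2', name: 'Another cat' })
console.log(cats.scan().Items.length)
cats.delete({ catID: 'demo-id-2' })
console.log(cats.scan().Items.length)
```

This is why the talk suggests UUIDs or other distinguishing data for keys: a put to an existing key is a replace, not an error.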
Obviously it only just barely scratches the surface. A big recap here: DynamoDB scales up and down dynamically. It's got a huge free tier. It supports SQL-like syntax. It's very easy to migrate. It works super well with Lambda. Yes, it scales. And yeah, it's a bit weird to learn. I am out of time, so please follow me on the various socials at Brian LeRoux. If you want to give this a try, give the Grunge Stack a go with Remix, and if you're having any problems, just join us at the Architect Discord. You can find us at arc.codes. We're always happy to help. Thanks so much.