Hey everybody, I'm Nick Marino. Hi, I'm Griffin Smith. And we're going to talk about scaling up your
database with ReadySet. So just to summarize, we're going to start off and give kind of some background information, explain what the problem is that ReadySet is trying to solve, how it solves it, and then Griffin's going to give a demonstration that kind of shows how to use ReadySet with a real-world application and kind of gives you an idea of some of the
performance benefits and features you can expect to see there. And finally, after that, we'll open it up for questions that any of you might have. All right, so first off, what is ReadySet? If you look at the ReadySet website, it says it's a caching system for MySQL and Postgres, which is true. When I first looked at the website, though, I saw that and thought, oh, maybe it's just a wrapper around Redis or Memcached, or a clone of some existing popular caching system. So, something like a cool, useful product, but maybe not anything that revolutionary. It turns out it's actually much more than that. It allows you to quickly and easily create caches for individual SQL queries on the fly, with no changes to your application needed, other than maybe a config change or two. And I don't want to spoil too much of the demonstration later, but on one of the queries that we've benchmarked, we see something like a 25,000x speedup. So that's pretty exciting, I think. But before we get into all that, let's discuss the problem. I think this quote encapsulates the problem we're trying to solve pretty well, so I'll just read it aloud. The
data access layer of a million dollar idea starts out as a single server relational
database. You're hardly worried about scale issues. You have an application to write. However, if your million dollar idea ends up being worth even 100,000, you'll likely find your
database struggling to keep up with the scale. And this is certainly a story that I'm personally familiar with, and I know a lot of people out there probably are as well. When you're starting a startup or a website and you're very early stage, you don't really have the time and resources to build something using the latest fancy NoSQL scalable fad of the day. And there's nothing wrong with using MySQL or Postgres, right? They're free. They're very popular. They're easy to use. And when you're just trying to get a simple initial version of something, a minimum viable product, a prototype, whatever you want to call it, you don't want to spend too much time dealing with really complex, sophisticated scaling systems. You just want to get something that works. But of course, if and when you do hit scaling limits in your
database, it can be a pretty tricky problem to solve. And, you know, it's a good problem to have, but it often comes at a pretty critical point in the life of a website or an organization. So if we're sort of looking at the problem that we're running into here, I just want to sort of frame it very briefly. When you're scaling out a production web application, I think the first thing that you start to really care about as you sort of hit the scaling limits of a single
database server is read
performance. You know, I think for most web applications, that's where the critical point for user experience lies. There's some pretty extensive research showing that loading a web page feels instantaneous at about 200 milliseconds. And within that 200 milliseconds, most of the time is spent reading
data out of your
database. And so if you need to scale up those
database reads, if you need to make those
database reads faster, there are three main high-level options. You could spend a lot of time optimizing your queries; that can be as simple as adding indexes, or something more involved. You could use read replicas to scale out query throughput, increasing the number of queries you can handle. Or you could build a custom caching solution, where you store frequently accessed
data that's not changing a lot in some sort of key value store that you can access super, super fast. All of these options come with tradeoffs, right? So our first option, optimizing queries, you know, this is something that takes a lot of expertise. You have to understand how SQL databases work. You know, you have this sort of notion, right, that like your
database should be able to just perform the query for you in the way that is the fastest. But every
database server on the market that I'm aware of, at least, needs a little bit of help to do this. You know, as I said earlier, that could be as simple as just adding indexes or as complex as completely changing the way that the query is written. And in the latter case, frequently you're fighting your ORM, right? We all sort of, anybody who's used an ORM has run into this problem where, you know, your ORM wants to issue queries in a particular way, but you know that a different way of writing the query that could return the same results would execute faster against the
database. And you have to fight this battle against your ORM or client library. And even indexes come at a cost, right? They increase the amount of time it takes to perform a write. They increase disk usage. They increase memory usage. Nothing here is free. The other big thing is that your application is changing, right? If you're spending a lot of time optimizing your queries, someday you're going to add a new feature and a new query is going to become important. Every application is changing all of the time. And if you're adding this extra time to create indexes or optimize queries to every single feature you have to write, that means you're shipping fewer features. The second option is scaling up with read replicas. This is a pretty standard way of scaling out the number of queries that you can handle. Both Postgres and MySQL, I'm pretty sure, fork a process per connection, so they have a hard upper limit on the number of connections they can handle. And so you can scale out throughput by running read replicas. But this also isn't free. Your read replicas have to connect to your primary
database in order to stream the replication log, which adds load to your primary
database. And you're adding like not insignificant operational overhead to running your
database in production. So this is still a lot of work. It's a lot of cost. And at the end of the day, you're only improving throughput, not latency, because it's the same
database engine executing the same query just on another server. And then that takes us to sort of the third option, which is building this custom caching solution. The idea here is you execute a query once, you store the results in something like Redis or Memcached. And then if the
data hasn't changed, you're able to read the
data out of Redis or Memcached. But this is a lot of work. This is code that you have to write. Previously, your application was just making SQL queries against MySQL or Postgres; now you have a totally different access pattern for accessing these caches. You have to deal with invalidating the caches on writes, and frequently that's totally manual. So you can introduce bugs where you forget to invalidate a cache on a particular write. And I've spent months of my life tracking down bugs that eventually turned out to be just that: we were forgetting to invalidate the cache on a write. And you're still adding operational overhead, because you have to run this extra service. And there are a lot of other little fringe problems that come with running these caches, you know, fallback and failover. Running these caches in a distributed scenario can be really tricky. And you have this problem where, because you're invalidating on writes, if you get a lot of writes, then you have to run the query against your primary
database again. And you have this like thundering herd problem where, you know, if a bunch of people request the same
data against an invalidated cache entry, you know, you could really put a lot of load on your upstream
database and cause a lot of production problems. But the idea behind ReadySet is that all three of these options kind of suck, and it would be awesome if we didn't have to do any of them at all. Instead of focusing on scaling out our database, we want to be focused on building features that our users want, making our customers happy. And all of this
database scaling issue is kind of a distraction. So this brings us to what I think is so exciting about ReadySet: it enables you to scale up without dealing with any of those headaches that Griffin discussed in the last few slides. This slide shows some of what I think are the most exciting features of ReadySet. Number one, it's plug and play. It speaks the same protocol as Postgres and MySQL. What this means is that, unlike with a custom caching solution, where you might have to integrate, say, a Redis client library into your application, and you might even need to speak multiple protocols so you can talk directly to the upstream
database if you need to fall back, if your cache has a problem. You literally just use the exact same code and client libraries you're already using. You know, we've implemented the entire, you know, wire protocol for these existing databases. And so you don't have to change any of the code in your application. All you have to do is change a line of config typically to point your application at ReadySet instead of the
database. Second off, it's very flexible. So we can turn caching on and off for individual queries at will. So that makes it really, really easy to very quickly iterate on the
performance impact of caching. You can run experiments. You can see what happens if I cache this query. What happens if I cache this other query? How does it affect
performance? How does it affect memory usage? It makes it very easy to quickly run these kinds of experiments, get results, and see what works. Additionally, ReadySet is very, very scalable. I hinted at this a little earlier, but when you're doing a read against a query that's been cached and the cache is available, it's really just a lookup into a hash map. So that's very quick and efficient; typically we see even sub-millisecond latencies there. And additionally, if the
performance benefits you're getting from a single node of ReadySet are not enough or you don't have enough memory on a single machine, we also support scaling out to multiple nodes. And so not only does that kind of give you even more headroom and breathing room in terms of
performance, but, you know, additionally, one of the cool things about that approach is that we still only act from the perspective of the upstream
database as a single read replica. So, you know, in the read replica slide that Griffin was showing earlier, if you have more replicas, they're all, you know, getting updates from the upstream
database. So having additional replicas puts more load on it, since each replica needs to retrieve updates and get
data replicated to them. But with ReadySet, it acts like a single replica. All of the scaling out in a multi-node setup happens internally to ReadySet. So you don't necessarily incur additional load on your
database just by adding more ReadySet nodes. Finally, this last point, negative latency. That might sound like a little bit of a buzzword, but let me explain what I mean here. Latency, right, it's the span of time between when you request
data or request a result and when the result comes in. And part of the big idea with ReadySet is that we're continually updating your cached
data on the fly as new
data comes in before you ask for it. So unlike in a custom caching solution where maybe, you know, you have to invalidate things periodically and then when you ask for the new
data, it's temporarily slower because it has to rerun the query. We're constantly trying to keep your cached results up to date in real time as new
data comes into the
database and is replicated to ReadySet. To show how that works, we have this little diagram here. What you can see is that when your application makes writes to the
database, that is, you know, inserts, updates, deletes, it goes directly to the
database, but then ReadySet acts kind of like a read replica and it just gets updates from the
database using the same protocol that a read replica would use. So, you know, you don't need any kind of fancy
database plugins or changes; it speaks the standard protocols for replication as well as querying. And then ReadySet uses those replicated updates to keep your caches up to date, so that when the application asks for a query result, it hopefully gets it straight from ReadySet with no additional cold-read lag or anything like that. Cool. So next up, I'm going to show you the process of getting up and running with ReadySet. We've worked really hard to make setting up ReadySet as simple as possible. There are really two steps to the process: run ReadySet pointed at your
database, then change configuration in your application to point it at ReadySet. As Nick said, we speak the wire protocol for Postgres and MySQL. So if your application is already talking to one of those databases, your application can just talk to ReadySet. And then it just works. You can cache queries, you can play around with which queries are cached and which aren't, and ideally see some pretty significant speedup. So I'm going to do a live demo using Cal.com. Cal.com is an open source application, a real-world production application. We didn't write it; we're not involved in writing this application. We picked it for a couple of reasons. One, it is a real-world application, so the idea here is that this is as realistic as possible of a scenario for running ReadySet. It's built on
Next.js and Prisma, so it's a stack that's hopefully relevant to some of the audience here. It's, as I said, mature. It's got a lot of queries: some are simple, some have gone through a lot of
optimization, some of which are very slow. Some of those queries ReadySet can't cache. As Nick said, you know, we can't cache everything because we're still sort of working to support all of SQL. But the idea is, you know, we don't have to cache everything. We only cache the queries that are the most important and sort of the most critical to the
performance of the application. We also picked Cal.com because it's pretty easy to set up. You know, it's a
Node.js application. You just run, you know, a
yarn command and things just work. So I'm going to switch over to doing that demo. So this is Cal.com. I've got it running in a web browser here, via a dev server on my local machine. If y'all are familiar with Calendly, this is basically like an open source version of Calendly. The idea is you can create event types that allow people to book time on your calendar, subject to your availability, and then people can book various different kinds of events on your calendar. Some of those events can be recurring. They can have meeting links associated with them. You can create group events with people. It's actually a pretty nice application. So this is Cal.com running just sort of out of the box, pointed at Postgres. You can see I just clicked on the bookings tab here, and to demonstrate ReadySet, I've generated about 10 million rows in this booking table. So it takes a little bit to load this page when we're just running against Postgres. It's an example of what happens when your application suddenly hits an unexpectedly high amount of load, and we can see things do indeed slow down. Yep. So, as I said, I'm running Cal.com locally. Let's just go through the process of pointing this at ReadySet. I'm also running ReadySet locally on my machine; it's running on port 5435. So all I have to do to switch to ReadySet is go to my config, which is just a .env file, and switch the port. And then I can run the dev server again, and it'll take a second to compile, because it's a dev server, so it's recompiling everything. And just to be clear, normally you'd probably have to change more than just the port; it's just that Griffin is running everything on a single machine, so it happens that the URL change to point to ReadySet is pretty minimal in this case. Yeah, that's correct.
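To make that concrete, the change is just the database connection URL in the application's .env file. This is a sketch: the variable name matches what Cal.com's Prisma setup uses, but the credentials, database name, and ports below are illustrative, and in a real deployment the host would change too, not just the port.

```
# Before: the app talks straight to Postgres
DATABASE_URL="postgresql://cal:password@localhost:5432/calendso"

# After: the app talks to ReadySet, which caches and proxies
# for that same Postgres database (ReadySet on port 5435 here)
DATABASE_URL="postgresql://cal:password@localhost:5435/calendso"
```

No application code changes: because ReadySet speaks the Postgres wire protocol, Prisma connects to it exactly as it would to Postgres.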
Um, but we can see, once Next.js compiles everything, that I can still click around and the application still works perfectly fine. I'm still getting the same results everywhere. But I haven't actually cached any queries yet, so I'm not seeing any speedup on this bookings page. There are a few paths here to explore what it looks like to cache individual queries. One, you can see I'm just logging queries here;
Prisma by default logs the queries it runs, so you can see there's a bunch of queries in the logs here. ReadySet also keeps track of all of the queries that it's not caching; when we're not caching a query, we proxy it to the upstream database. And I can get a list of those queries by connecting to ReadySet with a psql shell, like I've done over here, and then running SHOW PROXIED QUERIES. That gives me a list of all of the queries that ReadySet is proxying to the upstream
database, along with whether or not each query can be cached. And as you can see, this application is running quite a few queries. Just to be clear, when we say the word query here, it's useful to be precise about what we're talking about. Really, I think the easiest way to think about this is prepared statements. When Prisma runs a query, we've got these placeholders with the dollar sign here. And ReadySet also thinks about queries that way: a query is the same query modulo the particular values of the placeholders in a prepared statement. And when we cache a query, we're actually saying that we want to be able to cache individual result sets for values of those placeholders. So you can imagine a query to load
data for a particular user, filtering by a user ID. We cache that query, and we say cache this query for all values of the user ID, but we don't actually spend memory caching result sets for every single user. We only store the result sets for the users who are actually requesting that data, so for the users who are actually logging in. The idea here is, you know, you might have an application with thousands or even millions of users in your database, but only a few actually log in in a particular week. There's this sort of log-normal distribution of which users are logging in: you have your really active users, and then your users who basically never open the application. And for a user who never logs in, we don't want to spend memory, and we don't want to spend compute, maintaining the data in the cache for that user. So what we'll do is only store data in memory for the result sets that are actually being requested, and then as we hit memory pressure, we'll evict on a least-recently-used basis. If this sounds a bit abstract, I think the takeaway is that if you tell it to cache, say, select star from users where user ID equals 14, it's not just going to cache the result where user ID equals 14; it's actually intelligent enough to generalize from that and cache the same query for other user IDs as well. Yep. So, I did some playing around with this application ahead of time, and I found one query that performs particularly poorly and that ReadySet can help with quite a bit. So this is the query: it's got a couple of aggregates, a MIN and a COUNT, and it's filtering, as I said, by user ID. I'm not entirely familiar with Cal.com, but just from reading the query, it looks like it's getting the earliest booking per recurring event; you can have recurring events, and you might want to show the next booking for a particular recurring event. And we can see, if I run this query (I happen to know my user ID is 14, so I'm filtering by user ID 14) against Postgres, so uncached by ReadySet, it takes a fair amount of time. I mentioned earlier that most page loads have a budget of about 200 milliseconds for the page load to feel subjectively instantaneous to the user. And a particular page load might do tens of queries, so you really need each individual query to be pretty quick to fit that 200 millisecond budget, and this one query on its own is already blowing way past that by a factor of about three. So, if I want to cache this query.
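For reference, the ReadySet cache-management statements used in this demo look roughly like this, run from an ordinary psql session connected to ReadySet. The query text is the simple select-star example from above rather than the real Cal.com booking query, and the exact output and syntax may vary between ReadySet versions:

```sql
-- List the queries ReadySet is currently proxying to the upstream,
-- along with whether each one is supported for caching:
SHOW PROXIED QUERIES;

-- Cache a query by prepending CREATE CACHE FROM to it.
-- ReadySet generalizes over the literal (14), so result sets are
-- cached per user ID, populated lazily as each ID is actually read:
CREATE CACHE FROM
  SELECT * FROM users WHERE user_id = 14;

-- Inspect existing caches, and drop one if an experiment
-- doesn't pay off (use the name reported by SHOW CACHES):
SHOW CACHES;
DROP CACHE some_cache_name;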
All I have to do is take the query and prepend CREATE CACHE FROM; again, this is all just in a psql shell. And now ReadySet is going to be caching the result sets for that query, and, as Nick said, it's able to say, okay, we know that this user ID is 14 here, but we're going to actually cache result sets for any user ID, and then, as I said, only store the result sets for the user IDs that are actually read. So now, after I've created the cache, I run the query again. It'll take a second to precompute the result set the first time, you know, it's warming up the cache, but if I run the query again, we can see it's running pretty quickly. So we've gone from about 700 milliseconds to a fraction of a millisecond. And I think that kind of changes the game. Another thing I wanted to talk about here is a problem you run into with a lot of web applications: you tend to query more data than you need; there's this over-fetching problem. You might have a function that says, give me all of the information that you have about this particular user, and then every page load calls that function, because some pages need some data and some pages need other data. But that function might be making a query that never ends up getting used on a particular page. And so one of the steps you might take if you're trying to optimize this application would be to go through and add flags to your function, add a bunch of configuration, to make sure that you're only querying the data that you need. This is why people use GraphQL, but GraphQL is complex and comes with its own set of trade-offs. A much more compelling solution to this problem, in my opinion, is just to make querying data that you don't need so cheap that you don't care anymore. And at this sort of latency, if we're looking at a fraction of a millisecond, we could be doing tens of these queries on a page load and we're not going to get anywhere close to our 200 millisecond page load time budget. And then the other thing I wanted to talk about here is that with most caching solutions, if you were to do an insert into this table, you'd have to throw the results away and then recompute them. With ReadySet, if I do an insert into this table, that's going to update the result set of this cache. And then I rerun the query, and we can see this number used to be 100,008 and now it's 100,009. The number changed, but the latency has not gone up. That's because we're not recomputing the result set for the query; we're actually keeping the result set up to date in place, based on the
data that is flowing into the upstream
database. So the idea here is that no matter how many writes you're getting, your reads are still fast, and they're still as fast as you're seeing here. Now, all of these queries that I've been running are just one-off queries in the psql shell, which is not super realistic. In a real-world web application, you're not just going to be running one query at a time; you're going to be running thousands, or hundreds of thousands, of queries a second, potentially, if your application is getting a lot of load. To walk through what that looks like, I'm just going to run pgbench, which is a benchmarking tool distributed with Postgres. Just to establish a baseline, I'm going to run this against Postgres itself. This benchmark.sql file contains the same query that we've been running, and what we're doing here is running 32 concurrent clients, making 32 queries at a time. And we can see that when we run this against Postgres, our latency actually spikes quite a bit: previously we were looking at somewhere around 700 milliseconds, and now we're looking at three and a half to four seconds of latency, because a single Postgres database server doesn't scale super well. And our queries per second is, like, nine-ish. That's not good enough for a real-world production application that's getting a lot of load. So this is the scenario where you might have no choice but to scale out with read replicas. With ReadySet, I can run the same benchmark again by just changing the port; this is default connection settings, and I'm running all of this locally. And just to frame this: in the psql shell we were using the ad hoc query mode, whereas these benchmarks use prepared statements, because ReadySet, like all databases, does better with prepared statements, since we're able to parse the query only once. And we can see the latency actually gets even better than we were seeing in the psql shell for that reason, and the queries per second can scale up as high as around 200,000. I think 200,000 is usually good enough for most applications; you have to be doing pretty well for yourself for 200,000 queries per second to not be enough. But in the case that you are doing that well for yourself, as Nick said earlier, ReadySet can horizontally scale: we can run double the ReadySet cache servers, put no additional load on the upstream
database because we're only replicating
data out of the upstream
database once, and serve double the traffic: 400,000 queries per second with double the servers. And we've actually seen significantly better numbers than this even on single nodes, but this is just Griffin's development machine. Yeah, it's all running on my desktop. Alright, so you might see all that and say, well, there's no such thing as a free lunch; what do I have to give up? Is this too good to be true? So there are some small trade-offs that you have to make with ReadySet. I think in general they're often not a huge deal, but I'm going to talk about them a little bit so that you're aware of them. First off, and most notable, you do have to be aware that you're getting eventual consistency; you're not getting the same strongly consistent guarantees that you would get querying your upstream
database. What that means in practice is just that with ReadySet, the results you get from a cache may be very slightly out of date; there's a little bit of lag time for the
data to replicate through and update your cached
data. And that may be less bad than you might think. For a lot of queries that's fine; for some it's not. But as we saw, you can very easily enable and disable caching for individual queries as needed. There are particular queries where you know you really have to have the guarantee that everything is up to date. For example, with a bank account balance, you don't want eventual consistency, because you don't want the user to be able to spend the same money twice, maybe because they tried to spend it a second time before the balance updated in the cached results. In that case you would obviously not want that to happen. But at the same time, if our hypothetical bank has a dashboard that the user can look at to see their recent transactions, that might be completely fine to cache, because if they make a purchase and it doesn't show up on their phone app for a split second, no one's going to notice or care. That's not a big deal at all. Memory usage: obviously this is an issue with any caching solution. When you're caching
data, it has to go somewhere. So you know the more
data you cache, the more memory you use. But again, being able to turn caching on and off for individual queries makes that generally pretty easy to manage: you can decide which queries are worth caching and spending the memory on. And additionally, as we also mentioned, you can scale out to multiple nodes if needed, if you really want to cache more
data than will fit on a single node. That's also an option. Finally, and this may go without saying, it works best for read-heavy loads. When we looked at that diagram earlier, where we showed how the replication happens and how ReadySet keeps results up to date, you may have noticed that there was an arrow for writes that goes straight from the client to the database. Writes go directly to the database; they don't go through ReadySet. So if you're building some kind of crazy write-heavy application where you've got millions of IoT sensors dumping data every second, then ReadySet is just not going to help you with that. Maybe it will still help you with reads, if you have a read performance issue in that same application, but this is all about serving queries, not about ingesting
data. And finally, you know, we're still early stage. We're still finding and fixing bugs, although we've been making really great progress with that lately. And, you know, not all queries are supported because in order to know how to keep
data up to date in our caches, we have to actually execute the queries, and know how the different SQL functions and operators work. We haven't implemented support for all of those yet, but we're adding more all the time. And we're fully open source, so we always welcome
community feedback. If there's any feature that's not there that you'd really like to see, or any kind of issue you're having, please absolutely go on to GitHub, leave a comment, file an issue, and we'll be very happy to receive that, talk to you, and take a look at whatever it is that you're finding. Oh, we do have a question. Oh yeah: must ReadySet be installed locally? That's a great question. It definitely doesn't have to be installed locally; it can be installed on whatever kind of supported machine you want. But additionally, we are offering a
cloud version of ReadySet that we set up, administer, run, and support. So if you want to just play around with it, absolutely download the open source version. But if you want to run it in production for your company and you don't want to deal with installing and maintaining it yourself, we're more than happy to talk about cloud customers. In terms of deployment options, we have a Helm chart for running in Kubernetes. You can also deploy it on bare metal or on an EC2 instance. But yeah, if there's a particular way you want to deploy ReadySet and you want help doing that, I would say make a GitHub issue and we're going to be happy to help you. Yeah, absolutely. Thank you guys so much. Great. All right. Well, have a good one everyone. Thanks again.