Scaling Databases For Global Serverless Applications


This workshop discusses the challenges enterprises face when scaling the data tier to support multi-region deployments and serverless environments. Serverless edge functions and lightweight container orchestration enable applications and business logic to be easily deployed globally, often leaving the database as the latency and scaling bottleneck.


Join us to understand how PolyScale.ai solves these scaling challenges by intelligently caching database data at the edge, without sacrificing transactionality or consistency. Get hands-on with PolyScale for implementation, query observability, and global latency testing with edge functions.


Table of contents

  • Introduction to PolyScale.ai
  • Enterprise Data Gravity
  • Why data scaling is hard
  • Options for scaling the data tier
  • Database Observability
  • Cache Management AI
  • Hands on with PolyScale.ai

83 min
11 Feb, 2022


Video Summary and Transcription

PolyScale is a no-code database edge cache that aims to make caching and speeding up the data layer effortless for developers. It uses AI and machine learning to adjust caching behavior, selectively choosing what to cache and automatically setting cache duration. PolyScale supports multiple databases and provides detailed observability. It eliminates the need for caching at the application tier and offers significant performance gains.


1. Introduction to Polyscale and Scaling Data

Short description:

Welcome to this Node Congress talk on scaling databases for global serverless applications. Today, I'm going to introduce you to Polyscale and talk about the challenges in managing and scaling data. We'll also explore different options for scaling database systems and dig into how Polyscale works. Feel free to ask questions and get hands-on with the code.

Yeah, welcome, everybody. I can still see a few people coming in, but we'll get kicked off anyway. So welcome to this Node Congress talk on scaling databases for global serverless applications. Let's just show the agenda for today. Just a couple of housekeeping rules: everybody feel free to use the chat. You can also unmute; there are quite a lot of people on this, so stay muted if you're not talking, but please unmute yourself and ask questions. Let's make this as interactive as possible, because I think there's lots we can learn by sharing our questions.

So, yeah, feel free to use the chat, but obviously, unmute and introduce yourself and, by all means, jump in at any point if you have questions. Those are very welcome. So, yeah, hopefully everybody can hear me OK, and this is our agenda for today. A quick bit about myself: I'm the founder of PolyScale, and my background has really been in working with large data and database companies. I've been in the solution architecture and sales engineering space for many years. Of the last few companies I was at, Alfresco was an open source document management company, so we dealt with very large data repositories, specifically for document and web content management. DataSift was one of the early adopters of the Twitter firehose and ingested all of Twitter's data, and later on worked with Facebook and LinkedIn to ingest their data and serve insights in a privacy-safe way. From there, I moved to Elastic here in the Bay Area, in San Francisco, where I was working with some large enterprises on the Elastic solution and scaling it. And then, before founding PolyScale, I was at a company called Rockset, which is a cloud-based database for real-time analytics. So throughout those last few years, I've always been focused on managing large volumes of data and understanding how to scale it globally. Those are really the challenges I've been working on.

But today, we've got a great talk lined up. I'm going to introduce you to PolyScale, and we'll get hands-on with it for people who want to actually test it out. I'll talk a bit about the challenges, in the enterprise and in small businesses as well, around managing and scaling data. Why is that hard? Why is it hard to move data to different regions while keeping and maintaining your transactionality? We're also going to talk about your options: what do people do today to scale these database systems and data platforms? And then, as I mentioned, we'll dig into what PolyScale is and how it works. As I say, everyone's welcome to get hands-on; we've got some code to play with and get started with. So, yeah, any questions, feel free to chime in, unmute yourself, introduce yourself. Questions are very welcome.

2. Challenges of Scaling Data and Data Gravity

Short description:

Polyscale exists to address the challenges of scaling stateful cloud-native data. Scaling databases to different countries or independently presents difficulties, from CAP theorem considerations to maintaining transactionality and managing data sharding. Accessing data across continents in a low latency environment is also complex. Data gravity, the creation of data across multiple systems, poses challenges for enterprises. Applications and users require access to data from different locations, making data movement difficult. Location, data regulations, and global growth further complicate the situation. However, there are options for deploying web apps and replicating static content and business logic locally with companies like Netlify and Vercel. CDNs and containerization tools like Fly and Section provide flexibility in deploying to multiple regions.

So, really, the premise for why PolyScale exists is that scaling stateful cloud-native data is hard. Anyone who's been in a role where you've had to take a database, for example, and scale it to different countries, or even just scale it independently, will understand the challenges of doing that. And there's a whole bunch of reasons why that is hard, from CAP theorem considerations to maintaining transactionality: how do I shard my data? Where does it reside? And then how do I manage the laws of physics around accessing that data in a low-latency environment across continents, for example?

So, it's a great example that we've got people from all parts of the planet dialed in today. Making data fast for everyone is a hard and complex thing, and specifically here, we're also talking about cloud-native data. What happens when you have a data center outage, when one region goes down, or a component of your architecture goes down? How do you support those environments? That's what makes it hard: it's maintaining that state, and doing that in an unforgiving environment like the cloud.

So, you know, I talk a lot about this concept of enterprise data gravity. What is that, and how does it affect us? The reality is data is being created everywhere, and it can be across multiple disparate systems. Inside a single enterprise, there are likely many, many different data persistence layers: data lakes, databases, warehouses. And that stuff is being created all day, every day; it's just an ongoing thing. With AI and machine learning, sensor data, and IoT, that's accelerating massively. So what tends to happen within enterprises is that more and more applications want to use that data, and people are requesting access to it from different departments and different locations. And that gives us this inability to move the data; we find ourselves shackled to it once somebody is using it. Let's say we have a large transactional database, say Postgres, running from a data center in New York, and we now have large numbers of users actively using it. It becomes harder and harder to access that data from different locations and to move it. So location becomes critical, depending on where your users are. The other big consideration around data and usage is, obviously, data regulations: things like privacy, compliance, and PII data. What are you storing in which countries? Is it GDPR compliant? This is incredibly critical today, and it's only becoming more critical as these data systems progress. And then lastly, we've got to consider global growth. It could be a simple web application where we find we now have a growing audience spanning different countries or parts of countries. How do we make sure they can access that data, and how do we deal with that global growth? The same applies within a non-customer-facing application. Maybe it's something like a BI environment, where people are accessing internal intelligence data using tools like Qlik and Tableau. Same types of problems: we're putting new offices in different locations, people are working from home, working remotely. How do we support that growth? And that's really the essence of the problems that come with data gravity.

I think a lot of people on this talk will be very familiar with the fact that the application tier has a lot of agility. If I asked everyone on this call how you would deploy a web app, I think we'd have a lot of different options. And if I asked how to deploy a web app to different regions, I think there's a huge number of options as well. I tend to think about this as the content and the code: static content and business logic can easily be replicated locally. We have these amazing companies like Netlify and Vercel who can push my application logic and my static content everywhere, and I can run functions at the edge. And I can knit that together with a simple load balancing plan that will allow me to resolve to my closest application server. So it's really quite achievable now for any size of business or individual, even an indie hacker, to set up global load balancing. We can deploy our static content and our functions to CDNs, and we're also seeing new players pop up, like Fly and Section, who allow you to containerize your code and push it everywhere, into multiple regions. So as far as the business logic and your static content go, there really is a huge amount of flexibility, and that's a great space to be in. I'll just pause there. Any questions before we move on? Anything in the chat, or anyone want to chime in with a question? Good.

3. Challenges of Scaling Data and Compliance

Short description:

In terms of compliance, enterprises are defining policies around where data can reside. Products are making it easier to adhere to these policies. Deploying data-driven apps globally remains complex and expensive. Challenges include consistency, transactionality, latency, and scalability. People often turn to application level caching to make web applications fast in multiple regions.

I guess stuff's pretty straightforward at the moment, so hopefully you're all good. Take a look.

Hi. Hi. I put a question in the chat. So essentially the question is, in terms of compliance, what if you have a user in the EU, but the server and the product or service are in the US? How do we deal with the data store? Yeah, it's a good question, and I'll tell you a little later how PolyScale deals with that. But at a high level: enterprises are defining the policies around where data can reside, and there's a huge industry in auditing and compliance around what data is stored where, and you have to adhere to those policies. Now, products are making that easier and easier. In kind of a basic web application world, you may have a block list controlling the locations that data can be deployed to. It gets more complex when you've got global audiences who reside in certain countries and access data in other countries. For example, if I'm a resident of Germany and I'm using a service in North America, am I okay with storing my data outside of Germany, or must it be stored in Germany? These are things that you must build into your product from day one; they're absolutely critical. If you're storing customers' data, you have to start with the mentality of: what are the policies we need to adhere to? So yeah, I'll show you how we support that with PolyScale, but really, it's something you've got to build into your products from the early stages. And there are a lot of products out there now that provide frameworks you can integrate with to make that easier. So from a compliance perspective, you go to a single location and can easily audit what data is being stored where, and that's critical for SOC 2 compliance and other forms of compliance.

So, yeah, as I was saying, we've got a lot of agility in that business logic space and the content; the CDN market has made it trivial for us to deploy everywhere. But making data-driven apps fast globally still remains complex and expensive. It's one thing to make a data-driven app fast in a single location, but as soon as we start thinking about doing that everywhere, we really run into complexity and expense problems. The expense can be everything from developers' and architects' time, to using different types of databases and data systems, to your AWS and storage costs. And the complexities are typically ongoing. It's not simply a case of: let's start up a database, put a couple of read replicas in, and then forget about it. It's more a case of: what happens when they start lagging and get out of sync? What are the implications for my application tier? Do I need to start separating out my reads and my writes in my code layer? So there's complexity and expense across all parts of the business. So why is this hard? What are the global data challenges? One of the core ones is consistency and transactionality. The classic example: I'm running an e-commerce website that's being served from, let's say, the East Coast of the States, New York. If I go and buy a product, someone else on the planet shouldn't be able to buy that exact same product, because it may well be out of stock. Or, in simpler terms, wherever people are viewing stock levels, those should be accurate in real time; they should be consistent. So you've got to decide on one of the trade-offs here: are we using ACID-compliant systems or eventually consistent systems? And really, that depends on the use case and on your tolerances for different semantics. Then there's the classic one, which the CDN space has done an amazing job of addressing: latency, the classic speed-of-light problem. How do I serve a customer in a location far away from my data? And the other one, I've talked here a little bit about short-lived TCP connections, but really it could be anything your data platform is dependent on. In a more transactional database, Postgres, MySQL, whatever, TCP connections really matter, and that hits your scalability challenges. What if you're using things like functions that run at the edge, or AWS Lambda, things that create a high number of short-lived connections? How do I scale those, and how do I support them? These are some of the core challenges people face with scaling into different regions.
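To make that last point concrete, here's a minimal sketch, not from the talk, of the anti-pattern being described: a serverless function that opens a brand-new database connection on every invocation. The host and credentials are placeholders. At high concurrency, thousands of these short-lived TCP connections can exhaust a transactional database's connection limit.

```js
// Anti-pattern sketch: one fresh TCP connection per serverless invocation.
const mysql = require("mysql2/promise");

exports.handler = async (event) => {
  // A new TCP + auth handshake happens on every single invocation.
  const conn = await mysql.createConnection({
    host: "db.example.com", // placeholder origin database
    user: "app",
    password: process.env.DB_PASSWORD,
    database: "shop",
  });
  const [rows] = await conn.execute(
    "SELECT * FROM products WHERE id = ?",
    [event.id]
  );
  await conn.end(); // torn down immediately after a single query
  return rows;
};
```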

So, how do we address this today? Let's narrow this down to web applications. Say I need to make a web application fast in the US, and I need to make it fast in Europe as well. How would we do that? The core things that I've seen in my career, and that I see quite regularly, are that people turn to application-level caching. People are generally really comfortable writing code that will save that data: we can reuse it, we can hydrate it, we can evict it from the cache intelligently. We could use tools like Redis or Memcached that scale nicely.

4. Caching, Global Databases, and Polyglot Data Tier

Short description:

And we can also share state with these if we want to. So these are sort of the go-to areas for application caching. There's pros and cons of every approach. I've seen all of these work great in production. There is cost and complexity that comes with all of them. The classic one with application caching is you need to write that code. If your data systems are scaling multiple platforms, you may have different caching policies to adhere to. It can get complex quickly. We're seeing amazing global databases like Cockroach and Yugabyte. Migrating databases can be non-trivial. The proposition of migrating a database is typically a large process. The polyglot data tier is accurate for most businesses with multiple repositories. Take the slowest feature and break it out to scale independently. Use change data capture streams to populate other data sources. People are breaking up features and using the right tool for the job.

And we can also share state with these if we want to. So these are the go-to areas for application caching. Now, there are pros and cons to every approach, and I just want to be clear here: I've seen all of these work great in production, but I've also seen the cost and complexity that comes with all of them.
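To give a feel for what "writing that code" means in practice, here's a minimal cache-aside sketch using node-redis; the key scheme, TTL, and query are illustrative assumptions, not from the talk.

```js
// Cache-aside sketch: check Redis first, fall back to the database on a miss.
const { createClient } = require("redis");
const redis = createClient(); // assumes a local Redis instance

async function getProduct(db, id) {
  if (!redis.isOpen) await redis.connect();

  const key = `product:${id}`;           // illustrative key scheme
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached); // cache hit

  // Cache miss: query the origin database, then hydrate the cache.
  const [rows] = await db.execute("SELECT * FROM products WHERE id = ?", [id]);
  await redis.set(key, JSON.stringify(rows), { EX: 60 }); // a TTL you have to pick and tune
  return rows;
}
```

Even this toy version already embeds decisions (key naming, TTL, what happens on writes), which is exactly the ongoing complexity being described here.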

The classic one with application caching is that you need to write that code, right? That's typically not a quick thing, depending on how large or complex your web application is. And if your data systems span multiple platforms, let's say we have a real estate website and we're going out to various different data repositories to build data around the property history and the current mortgage rates, they may have different caching policies that you need to adhere to within your application. And then, on an ongoing basis, every feature that you add to your application potentially needs caching incorporated as well. So it's an ongoing thing that can be abstracted out nicely, but there is an overhead. And, you know, I'd love to hear chats and comments, but anyone who's actually written a caching layer for a reasonably sized application knows it can get complex really quickly. And then you obviously have the knock-on effect of communicating how that's working to the rest of the engineering team as well. So there are definitely pros and cons to all of these. We're also seeing, over the last few years, some amazing global databases pop up, like Cockroach and Yugabyte: databases focused on that global challenge. How do we efficiently shard? Where do we do our transactions? You can configure and define all of that behavior, which is incredible, and I'm a big fan of these systems. What is often a challenge in the enterprise is the proposition of: hey, go migrate to database X. We know that database X can do this for you; let's go and migrate off what you're currently using. That may well be an option if it's a brand-new project; no problem at all. But in larger enterprises, or even smaller businesses, the proposition of migrating a database is non-trivial. Are all the transactional semantics the same? Is my query language the same? Does it have the same features? Do I need SQL? It's typically a large process to go and migrate an entire database. But again, if that's right for your solution, then that's the right way to go: pick the right tool for the job.

Now, what I've seen a lot of over the last few years is the polyglot data tier, and I think this is accurate for most businesses now: they have multiple repositories and use various tools to do different things. What I'm talking about specifically here is: let's assume we have a large web application, and we now need to start making things faster. Typically, one way to approach this is to take your slowest feature. What's the one that's really at the top of the product manager's list around performance for your customers? Is there something that really bites them every day? You'll get that from your APM analysis and your customer satisfaction data. Is it a specific page? For example, I log into my unnamed cell phone provider account and things are incredibly slow. I just want to look at my bill for this month and last month: how many messages have I sent, and how much data have I used? And that stuff is all so slow, because it's being pulled from a remote location. Maybe the database is busy; maybe it's overloaded. Maybe I've done that in my lunch hour, so we're seeing spikes of load at that period. So it's a common practice to say: right, let's take just a single feature, break it out from our platform, and start scaling it independently. And that can be as extreme as: okay, let's go and use a brand-new data store to support that feature. Then you get into complexities like: well, how do we maintain state with that external data store? Maybe we need to use things like change data capture streams to populate other data sources. A great example here is: maybe I've got a transactional database, let's say something like MySQL, and I want to populate a search engine when things change. You can use change data capture to do that. But let's take an example here; it's a great way to think about breaking your monolith out into microservices. Maybe we want to take a specific function and run it at the edge, and maybe we want to start using something like a key-value store offered by one of the many CDN providers. So now I'm in a situation where I've got a transactional database and a key-value store; one runs at the edge and one's centralized. Again, that can be a great solution, but equally you're adding a lot of potential complexity into that situation. But we see this a lot: people are breaking up those features and using the right tool for the job. And as I say at the bottom, these are all valid.

5. Introduction to Polyscale

Short description:

Polyscale is a no-code database edge cache that distributes data and query compute closer to the user. It aims to make caching and speeding up the data layer effortless for developers. Polyscale integrates seamlessly with current database clients and sits at the TCP layer to talk to the native protocol of the database. It acts as a cache between the client and the origin database. Polyscale focuses on making caching decisions effortless for developers, abstracting away complexities and providing a plug-and-play experience. It can be set up in minutes and requires no additional client libraries.

These are all valid and useful today. The other one I haven't really talked about in this list is database read replicas, which is an obvious one. I've got a central location, and I want to start supporting customers, getting that latency down, in a different geographic region. The hyperscalers make that super easy today: I can go click a couple of buttons and deploy a read replica. So again, another good option, depending on where you need to scale and on the behavior.

So yeah, a quick introduction to PolyScale: where do we fit into this world, and what are we? It's a bit of a mouthful, but we're a no-code database edge cache that distributes data and query compute closer to the user. At a Node conference that may sound like a strange thing, but we're a no-code application. The idea, and the real driver behind the platform, is that as a developer, you don't need to do anything. You shouldn't have to think about implementing caching and speeding up your data layer, managing time-to-live values, working out which queries are expensive, and tracking how that changes over time. So our whole focus is what I would call plug and play. We want you to be able to plug PolyScale into your data tier with nothing changing within your environment. And what I mean by nothing changing is: you're not having to change your query language, you're not having to migrate your database, you're not having to scale anything differently. Your transactionality stays the same, and that's really our focus.

So what we do, if you're familiar with a CDN, is a similar concept: we actually cache database data at the edge, closer to the user. But as well as the data itself, we also move the compute. If I want to run a SQL query that's unique, it will actually execute against PolyScale's POPs, or points of presence, and return you the data from PolyScale if we have it in the cache. So we shift the query compute and the data closer to the user. Very similar to a CDN, we have global points of presence around the world, and our focus is making the data tier as low latency as possible while keeping the costs and complexity low. As I've gone through in the previous slides, there are numerous ways to scale your database, and caching may or may not be a good fit for you, but our focus is really on making the complexity as simple as possible. Plug and play is really our driver. So let's peel that onion back a bit: what is PolyScale, and why is it useful to me as a developer? We obsess about the developer and operator experiences; that's the top line up here. Everyone in our company has been in the position of having to build these systems, understand them, and support them. So we're absolutely fanatical about making this as effortless as possible. And the way we focus on this is to ask: what costs and complexities can we abstract away? What are the things that are tedious or hard for a developer or a DevOps engineer to do, and how can we take those away and make them easily achievable? So, first things first: PolyScale can integrate in minutes, literally. I'll show you, and this is something we're obviously going to do in this workshop: getting connected. All you have to do is update your database credentials: your hostname and your database username. Those are the two things you have to change. That's typically an environment variable, a simple config change, so you can literally be up and running in about 30 seconds, routing your data through PolyScale. And the way that works is that we're focused on no code. As I said, we don't want you to have to bring in new client libraries. It's not that that's not a good approach; it's just that we don't want to add another client library to your world. The world's busy enough as it is. The way we do that is that PolyScale is completely transparent to your current database clients. If you're using Prisma, or Mongoose, or whatever it may be, PolyScale is completely transparent, because PolyScale sits at the TCP layer (layers 3 and 4) and talks the native wire protocol of your database. So you're actually talking to a Postgres server or a MySQL server. And just to take a step back: what PolyScale is, is a cache. Your database client connects into PolyScale, and then PolyScale connects into your origin database. It's really, really simple to set up, which we'll show you. And then you get into that central column, which is: okay, I've now got my database data flowing through PolyScale, what do I cache? And again, if you've ever implemented a caching layer, that can get complex really quickly.
And there's the famous saying down there on the bottom left about caching and naming things, and we would definitely agree with that statement. So really, our focus is on making it effortless for a developer to think about what to cache. If you're coming at this with: I've got five microservices and I know exactly what I want to cache and for how long, then great: you can go and configure those.
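As a rough sketch of the two-value config change mentioned a moment ago (hostname and username), assuming a typical app that builds its connection options from config; the names and hosts are illustrative:

```js
// Before: the client connects straight to the origin database.
const direct = {
  host: "mydb.us-east-1.example.com", // illustrative origin host
  port: 3306,
  user: "appuser",
  password: process.env.DB_PASSWORD,
};

// After: identical client code, pointed at PolyScale instead.
// (The exact username convention is covered later in the workshop.)
const viaPolyScale = {
  host: "mysql.polyscale.global",
  port: 3306,
  user: "<cache-id>-appuser", // cache ID prefixed to the real username
  password: process.env.DB_PASSWORD,
};
```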

6. Caching Behavior, Invalidation, and Scaling

Short description:

PolyScale uses artificial intelligence and machine learning to adjust caching behavior based on inbound traffic. It selectively chooses what to cache and automatically sets cache duration. It inspects every query in real time and maintains transactionality for inserts, updates, and deletes. PolyScale invalidates data from the cache globally for manipulation queries and adapts to database changes. It offers full automation but allows users to override caching settings. Predictive warming accelerates loading for personalized views. PolyScale scales by placing points of presence globally and using connection pooling for short-lived functions.

The flip side of that is how do those change over time? So is there a more optimum TTL value based on, you know, cyclical, time-based fluctuations. So you could say, for example, in an e-commerce world, maybe there's a spike at, you know, midday UTC where lots of people come online in their lunch hour. You know, so how would my caching behavior change through that period, or should it change? And then you get into much more complex systems whereby maybe you have 5,000 unique SQL queries flowing through the platform every day. You know, where do I start there? Which ones should I cache? How long should I cache those for? And that's a really hard thing for a human to do.

And that's where we bring in artificial intelligence and machine learning. What PolyScale does is interrogate every single SQL query that comes through the platform, and it will adjust the caching behavior based on what it's seeing: the actual inbound behavior of that traffic. It automatically sets how long that data should live in the cache. What that means is you can literally turn on PolyScale, update your config file as I've mentioned, start sending your database traffic through, and PolyScale will start selectively choosing what to cache and accelerating your data. And it does that globally, in every point of presence. So we see that as being incredibly helpful.

So say I've got a Drupal system, or something with more complex SQL queries, WordPress or whatever, or an e-commerce platform. You can switch this thing on, and it will start caching your data selectively. Now, the immediate question here is: okay, well, what about invalidation? And what about my writes: my inserts, updates, and deletes, my manipulation queries? What happens to those? So PolyScale inspects every query in real time and works out: is this a select or a show? Is it a read query? And if it's not, then it sends it on to the origin database. So the transactionality of your database never changes; we're not changing anything in that behavior. All those inserts, updates, and deletes get sent through, your consistency stays the same, and your origin database stays the same.
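Conceptually, and this is a toy sketch of the decision just described rather than PolyScale's actual implementation, the routing behaves something like this:

```js
// Toy model of the read/write routing described above: reads are cacheable,
// anything that can mutate state passes straight through to the origin.
function routeQuery(sql) {
  const isRead = /^\s*(select|show)\b/i.test(sql);
  return isRead ? "cache" : "origin";
}

console.log(routeQuery("SELECT * FROM products"));        // "cache"
console.log(routeQuery("UPDATE users SET name = 'foo'")); // "origin" (and triggers invalidation)
```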

And the other cool thing about this, which we'll go into and demo, is invalidation. If PolyScale sees one of those manipulation queries, an insert, an update, or a delete, it will automatically invalidate that data from the cache in real time, globally. So if we send "update table X, set username equal to foo", it will go and blow that data away from the cache globally, automatically, for you. So the next time the next read comes in, it will pull that from the origin and populate the cache, and you always get the freshest, real-time data. We see that as a big piece of engineering that developers don't have to build: I don't have to worry about the invalidation side of things. The AI is also very clever in that it monitors for payload changes coming back from the database. So if we're seeing database changes going on, we adjust the caching time-to-live accordingly as well. If there's stuff going on in the database that we can't see, the AI is quite clever about how it manages that too. The classic scenario is something like a travel agency, where they may be collecting data from numerous different inputs and data sources, so there may be stuff going on in the database that we can't see. Talking about the future as well, we'll also have the ability to plug change data capture streams into PolyScale, if that's important for your caching semantics. So yeah, full automation. And just to be clear, you can also completely override the automation. You can globally switch it off and say: hey, I know what I want to set this to, and I can do that myself. If you want to go down that path, we will still recommend what the platform thinks you should set that caching value to. So that's a nice feature where we're helping you, but you can go in and override this stuff; it's not a complete black box that you don't have control over. And then finally, down here, one of the features we've got coming in the short term is what we're calling predictive warming. This is the classic personalization use case, whereby I'm logging into a system and I want to load up, again, my cell phone provider's view of my bill or my usage for the month, or I'm logging into something like iTunes and want my personalized view. We can accelerate that loading in the background: we look at clusterings of queries that typically run together, and we can execute those and make sure the cache is warmed for that particular user, which is really exciting. And then the final bit, which is really the crux of this talk: how do we scale? Great, you're giving me a cache for my database, but how does that scale? As I said before, we have points of presence globally, and we put those as close as possible to the application tier, wherever you're scaling to. That moves the query compute, and the data itself, physically closer to the application tier. Another feature we've got coming very soon is connection pooling. Consider, for example, that I want to run functions at the edge. Those are very short-lived, as we've already talked about, and there may be thousands of them running concurrently.
That's exactly the use case for connection pooling. Let's say I've got a Postgres instance running out of Paris, in France; let's say I'm now using PolyScale to serve that globally; and I'm also using my favorite function-as-a-service provider, so we're initiating lots of TCP connections.

7. Scaling Data and Integration with PolyScale

Short description:

Polyscale can support tens of thousands of concurrent queries without showing any degradation at all. We're not hitting indexes. We're not doing all the complex things that we need from our database. We don't have to do that. So we can be very fast and consistently fast as well. PolyScale is a disk-based cache that serves large payloads in a millisecond or less, regardless of size. In the BI use case, caching can be done automatically or based on specific rules per SQL query or table. This significantly improves performance in analytics and reporting scenarios. For fast-changing data in chat messages and customer responses, PolyScale can handle large volumes of automated responses and efficiently send data to APIs for OTP verifications, notifications, and message routing. It also enables the creation of logs and automates report generation.

So for example, I know that at the time of this presentation, Cloudflare's Workers platform is about to introduce TCP support. So you could be running lots of functions globally, connecting back to PolyScale at that data tier, and PolyScale manages the pooling there. You will have a small number of long-lived connections going back to your origin database, and then a very, very high number of ephemeral, short-lived connections hitting PolyScale.

And then the other big problem we see in scaling these data systems is how to shard. Many databases have great features around sharding, but you've got to decide what data lives where, and that's not always an easy decision. A use case that springs to mind is a large gaming company that does leaderboard management. That's a constantly changing thing, and it's constantly changing globally. You can obviously have regional leaderboards, but a global leaderboard is constantly changing, with thousands of updates every minute, and people want the latest data; it's not good enough for it to be distributed but inaccurate. So how do you shard that data? Do I store everything everywhere, or do I separate out those reads into a platform like PolyScale? And so we shard that data based on demand.

And the difference is that PolyScale is not a database. So you can scale terabytes of data to every PolyScale region incredibly cheaply and without performance degradation. PolyScale can support tens of thousands of concurrent queries without showing any degradation at all. We're not hitting indexes; we're not doing all the complex things that we need from our database. We don't have to do that, so we can be very fast, and consistently fast as well. Just to give you an idea: for any database query that's cached, PolyScale will serve it in a millisecond or less, at scale. And this has huge benefits, because what's actually happening is that you're splitting your reads out from your writes, which gives your origin database more capacity to manage those writes while PolyScale offloads the reads for you. So, yeah, I think that covers everything nicely on how we think about the world. I'll just pause; I've officially talked a lot, but any questions? Anything on people's minds you want to drop into the chat, or unmute and introduce yourself and we'll chat through it. Hi. Hi. How are you? Nice to meet you. And you, thanks for joining. Thank you. So I have a question, basically trying to see how you would integrate PolyScale for, say, an enterprise with the following objectives for the backend: handling large volumes of automated responses; handling data that needs to be sent to APIs that handle OTP verifications and notifications to customers, as well as channeling those messages to the right recipients; and lastly, the ability to create logs and automate report generation. Mm, yeah, okay, you touched on lots of good stuff there, and I can see there are a couple of good questions in the chat as well, which we'll come on to. So if I was to try and address that at a really high level: I think we talked about short-form responses that change quite frequently and are quite fresh, so responses to chat and customer messages, right through to potentially very large payloads like you see with business intelligence and reporting. So I think there are two areas there. If I take the BI one first: I've done a lot of work in my history with things like Tableau, building huge reports, and I've sat there and watched them take 50 seconds to render, because they're running all sorts of unoptimized database queries generated through whatever tool of choice you're using. And of course, once that's being cached by PolyScale, again, we'll serve it in a millisecond or less; it doesn't matter the size of the payload. What we're actually doing is we're a disk-based cache: we drop it onto disk. We do have L1 and L2 layers, but the core of it is a disk-based cache. So that payload can be very large, and it really doesn't matter to PolyScale serving it. So in that BI and analytics use case, if you're using a fat client like Tableau or whatever, the user just changes their hostname to connect to PolyScale rather than their source database, and nothing changes.
And then you can decide, as we'll go into in a moment when we get into the product: you can cache stuff fully automatically, you can set individual rules per SQL query, or you can set rules on a per-table basis. That works quite well in the BI world, where you can say: look, I know my products table, I'm going to cache that for 10 minutes. Or: I know that only updates once a day, so let's cache it for a few hours. So you can really boost performance very quickly in those BI and reporting worlds, and they're the ones that typically run expensive queries, whether they're hitting a warehouse or whatever it may be. And then coming back to the first part of your question, you're talking about fast-changing data that you're responding to customers with, maybe chat messages or...

8. Data Invalidation and API Scaling

Short description:

PolyScale can quickly invalidate and update data to ensure the latest information is always served. An example is in online booking, where caching can alleviate traffic spikes when booking slots become available. PolyScale also helps scale APIs, such as authenticating API calls by caching data in Redis. This eliminates the need for constant token validation and provides easy scalability.

And those are really, again, a case where, if PolyScale can see those inserts, updates, and deletes coming through, and how quickly things are changing, it will just blow that data away for you, so you're always serving the latest stuff.

Now, a great use case we've got at the moment, working with a customer, is similar to this: an online booking one. In the online booking world, in this particular example, there's a point in the day where everybody tries to book. There's an event where everything becomes available, and everyone tries to book at that specific time. You would think caching doesn't work very well in that kind of world, but what's actually happening is that when a booking slot is gone, you see that invalidation hit the cache, and then, once that's re-cached within the next couple of milliseconds, you get hits again. So you can actually alleviate a huge amount of the traffic from the stampede that hits the database. Even in high-write environments, because PolyScale is very fast at rehydrating and invalidating the cache, you can get significant benefits.

And then the last part you talked about, scaling APIs and so on: we see that a lot. I don't want to sound facetious here; I really love the JAMstack, it's incredible, but you've got to focus on how to scale your APIs in that situation. A simple use case is: okay, I need to authenticate my API call, and that typically goes back to a database. That's a classic one for Redis, right? Let's cache everything in Redis so that we're not validating those tokens every time somebody makes a call. And that's a great example for PolyScale: just plug it in, no code to write, scale your API without actually having to worry about how that gets invalidated. So yeah, hopefully that answers your question, and we can chat more on those a bit later on.

9. Handling Bugs, Costs, and Data Storage

Short description:

There are different options for handling bugs where the app has caused PolyScale to cache broken results. You can blow away the cache globally, let it expire and refresh, send an update query to remove specific data from the cache, or programmatically invalidate the whole cache. The cost calculation for PolyScale includes the number of concurrent connections and an egress cost for gigabytes transferred per month. In terms of data storage, you can use PolyScale's SaaS service or deploy it in your own network for compliance reasons. PolyScale stores the results of queries, not the entire database, and the data never leaves your network. You can encrypt the data on disk, and there are networking options for secure connections. PolyScale analyzes and recommends based on individual SQL queries in real time. There is a demo available at playground.polyscale.ai that showcases a public MySQL database and an application server running in different regions.

Just looking at... Sorry, go ahead. No, I was saying thank you. Yeah, it answered a lot of my questions. Fantastic, thank you. There's a great question here: what if there was a bug in the app and PolyScale cached the broken results? How do we handle this: truncate the whole cache, or invalidate by period? Yes, you've got a couple of options. As I'll show you in a second, you can go in and purge the cache globally, which may or may not be a heavy-handed approach; if that's something you realize you've done, you can just go and blow it away. The second option is you let it expire, and it will get refreshed. The third option is that, depending on the situation, you could just send an update query for that bit of data, and we blow just that bit of data away from the cache. Or fourthly, you can programmatically call an invalidation of the whole cache if you want to. So there are various ways, from the UI to the API to letting it time out, to invalidate that data.
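As an illustration of that fourth, programmatic option, a purge call might look something like the sketch below. To be clear, the endpoint URL and auth scheme here are invented placeholders; the talk doesn't show the actual API, so check PolyScale's docs for the real call.

```js
// Hypothetical sketch only: endpoint and auth are placeholders, not PolyScale's real API.
async function purgeWholeCache(cacheId, apiKey) {
  const res = await fetch(`https://api.example.com/caches/${cacheId}/purge`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`Purge failed: ${res.status}`);
}
```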

How are costs calculated? Yeah, we'll come on to that. At a high level, it's the number of concurrent connections; you just buy a plan, and there's a free developer plan. And, is it Ignis or Ignes? Hit the website now and look at the pricing: you'll see there's a free plan to get started, you can scale up from there, and there's an egress cost for gigabytes transferred per month.

Good. I'm just looking at one more question before we get into a demo, because I can't wait to play with the demo. "I have some questions about the cached data: who has access to the raw data in each node, and is the data stored encrypted?" Yeah, good question. "And the second question: for the recommendation feature, do you make the analysis based on the queries or on the raw data?" Yeah, good questions. So, just to give you some comfort around the data storage side of things, there are two areas. You can either just come to PolyScale and use our SaaS service, and we'll cache that data for you. Or you can actually deploy PolyScale into your own network, which large enterprises do so they can host it inside their own network; for compliance reasons that may be essential. What happens is that those points of presence inside an enterprise connect back to PolyScale's control plane, which is completely anonymized. So the cache management all happens centrally, but you host the data, and the data never leaves your network. The second point is to be really clear about what we actually store. We're not a read replica: we don't go and grab your database and replicate it somewhere. What we're actually storing is the results of queries. If I send "select star from products where name like Nike", I get a result set back, and that's what PolyScale actually stores: that result set. So as far as being able to interrogate that data in a malicious way goes, it's a very different proposition from maintaining a complete read replica of the data. Thirdly, yes, we can encrypt the data on disk. You can decide if you want to do that; you'll pay another millisecond or so of penalty for doing it, but it's completely valid if that's something you want to do. And then there are all the networking options around how we secure with SSL and allow lists and all that good stuff, which we can cover and which is on our docs site as well. So I would definitely urge you to dive in there; we can do everything from VPC peering and endpoints through to completely dedicated PolyScale. You can come to us and say, hey, I would like a dedicated PolyScale please, and we can deploy that for you very quickly. And then, yeah, we'll come on to the analysis stuff now, but the quick answer to that question is: we analyze based on every individual SQL query, and that's what we recommend on, and it's done in real time in a continuous manner. But yeah, let's show you a quick demo, if that's okay. What we've got is a demo site, playground.polyscale.ai, and, let's blow this up a bit, it has three steps to it. We have a completely public MySQL database; that's a read-only user, but be gentle with it, it's not a big instance. And we have an application server, which is this playground, running out of a US East region, and we've got the database running out of EU West; I believe that's in Paris. If I click this button here, it will run a random query for 20 seconds, and that's going direct, as you can see, direct to that database. And this is outside of our VPC; it's a real, legit web application, running in Heroku I think, nothing to do with PolyScale's environments. And if we scroll down, it's running the SQL query as fast as possible, sequentially, and it's just varying this department name.
So if you want to log into that database and have a look around, we've got this employees schema, and it has a bunch of tables. It's a public, open schema that's linked to down here; you can have a play. We're just varying this department, and that's what you can see in these log files here: we're changing this department, and you can see the average query took around 400 milliseconds between those two regions.

10. Connecting to Polyscale

Short description:

To connect to Polyscale, simply update the hostname and database username. Polyscale will automatically connect to the closest point of presence. Everything gets cached and served in one to two milliseconds, resulting in massive performance gains. With a single config change, you can improve global reads without development or deploying servers. Try the public Playground and explore the product to see what's happening under the covers.

Now, step two: we just literally update our hostname and database username, and we'll come on to how to connect to PolyScale in a second, but that's literally all we do. It will then automatically connect to the closest point of presence, because mysql.polyscale.global resolves to wherever the closest POP is. So if I connect PolyScale, what you'll see is everything gets cached (these blips are things getting cached, then being evicted), and then everything gets served in one to two milliseconds. And this measurement here includes the network latency between the execution and actually returning that data back to this web app; that's inclusive. And what you can see here is that direct, we did 48 queries, and through PolyScale, we did 17,000 queries. So that's cool; hopefully that helps you visualize what we do. With a single config change, no development, no writing code, no deploying servers, you can just blow out your reads globally, with massive performance gains. So have a play with the Playground; it's completely public. And what we'll do now is flip into the product, and I'll show you what's happening under the covers and, as a user, how do I do this? How do I get on board? So, yeah, let's dive in and have a look at the product.

11. Signing Up and Configuring PolyScale

Short description:

If you want to sign up and play with PolyScale, it's completely free. You can create a basic developer account with 10 concurrent connections. Once you log in, you'll see the dashboard with workspaces and pre-configured demo caches. You can also disable caching or set the default behavior of the cache. To integrate PolyScale, change the hostname and username, passing in your cache ID. Connection details are always available in the settings tab.

So, yeah, if anyone wants to sign up and have a play, and play along at home, go for it. It's completely free to sign up. As I say, you can see the pricing here, and we have a basic developer account with 10 concurrent connections; sign up for free. If you do that, and I'm just going to log in with my GitHub account, that will give us the dashboard. What you'll see is that we have the concept of workspaces, where you can invite team members and associate multiple caches, and I'm going to open my default one. You'll see you already have a demo cache, bootstrapped and configured, so there's already one set up. And the cool thing about this, just a nice hack: if you click this "run sample queries" button, it will take you back to the playground, but this time it loads your current dashboard into it. So all these stats will flow back to your dashboard, and you can see them there. So if you want to have a play, sign up for an account, click on that "run sample queries" button, and that will take you to a dedicated playground environment versus the shared one we were just playing with. But yeah, let's start from the beginning. Let's say I've got a database, and I want to get connected with PolyScale. I'm actually going to use our playground one, which is the RDS instance we've just been playing with, and if you want those credentials, they're always on the playground's homepage if you just want to play with the database. So I'm going to click "new cache". The scenario is: I've got a MySQL database in this case, and I want to connect my web app to it through PolyScale. So I'll click "new cache", and as you can see here, you give the cache a name and select the database type. As of today, we support MySQL, MariaDB, and Postgres; Microsoft SQL Server is coming next, and the list goes on after that. So I'm going to paste in my origin hostname, and that's running on 3306. I've got a couple of interesting options here. I can actually disable caching completely if I want to. The reason for doing this is that we have lots of people sign up and say: actually, I just want to put PolyScale in an observation mode. I want it to just look at my queries and surface what's actually happening in the database without caching anything right now. And that's a really nice option. I'm going to leave caching on in this case. You can also set the default behavior of the cache. What this is basically saying is: when I see a new query, should I go ahead and cache it? Should PolyScale manage it and do its thing? Or do I want to manually set that myself and do nothing by default? And as we've talked about already, you can configure custom rules if you want to. But yeah, let me show you how the AI works, and we'll leave this switched on. So once I've got that set up, the only thing you now need to do to integrate PolyScale is to change the hostname and the username. Now, one caveat here: this is for MySQL. We overload the database username, and you have to pass in your cache ID; that's the only thing you need to do. In Postgres, it works slightly differently, in that you pass your cache ID in as the application name parameter. And just to be clear on that, I'm just going to the docs site.
And if you go down to get connected, you can see how to get connected with MySQL and MariaDB, which is passing your cache ID and your database username separated by a hyphen. But if I'm in the Postgres world, I pass in an application name with my cache ID. So depending on which database you're using, you've got to pass your cache ID in one of two ways. And this information is always here: if you open up your cache, you've got the settings tab, and your connection details are always available there, the host, the port and the user string.
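To make those two styles concrete, here's a minimal sketch using the mysql2 and pg drivers. Every credential and cache ID below is a placeholder, and the Postgres edge hostname is an assumption; copy the real values from your cache's settings tab.

```javascript
// Minimal sketch (run as an ES module); all values are placeholders.
import mysql from "mysql2/promise";
import pg from "pg";

const CACHE_ID = "your-cache-id"; // placeholder: copy from the settings tab

// MySQL / MariaDB: the cache ID is prefixed to the database username.
const mysqlConn = await mysql.createConnection({
  host: "mysql.polyscale.global",
  port: 3306,
  user: `${CACHE_ID}-your_db_user`, // cache ID + hyphen + real username
  password: "your_db_password",
  database: "employees",
});

// Postgres: the username stays as-is; the cache ID travels in the
// application_name connection parameter instead.
const pgClient = new pg.Client({
  host: "your-postgres-edge-host", // placeholder hostname
  port: 5432,
  user: "your_db_user",
  password: "your_db_password",
  database: "your_db",
  application_name: CACHE_ID,
});
await pgClient.connect();
```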

12. Configuring PolyScale and Running Queries

Short description:

Polyscale allows you to update client applications effortlessly by changing config variables. You can connect to different databases and change the hostname, port, and username. PolyScale ensures security by encrypting data and not storing database usernames and passwords. By running queries and checking the observability tab, you can see the caching behavior and the cacheability of queries.

So yeah, the cool thing is, you just update your client applications to use that. Typically it's a config change. And what I'll do now is fire up a bit of code and show you how we can do this.

So I've got some code, and you're welcome to do this as well. I've actually set up a quick Node Congress repo here. So if you want to get connected and have a play with your instance, or just watch the demo, it's entirely up to you. And you can literally pick your ORM of choice, whether it's Prisma or pg, or, you know, I've got a Knex example here for MySQL. So let's just open this up and have a play, and I'll show you how we get this configured.

So what I've got here is that repo. It's super simple, and I've got my MySQL-with-Knex example. But as hopefully everyone's understanding by now, it doesn't matter what ORM you're using or how you connect; you've just got to change these config variables. So I set my host up as mysql.polyscale.global, which is what I've copied from here. MySQL for PolyScale is on port 3306, so if you're using a different port, be sure to set that. And then the important bit is the database username. What I'm doing here is passing in that cache ID, which is the unique identifier of our cache, along with a hyphen, and then the actual database username. So let's run through that example. Let's say my real database username, in this case connecting to that demo database, is polyscale. Now what I need to do is just put a hyphen in there, copy my cache ID, and paste that in front. And that's the only change I have to make: the hostname, the port, and the username.
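Roughly, that config change in Knex looks like the sketch below. The cache ID is a placeholder; the other values are the playground credentials the demo uses.

```javascript
// Sketch of the demo repo's Knex config; only the host, port and user
// change when you put PolyScale in front of the database.
import knex from "knex";

const CACHE_ID = "your-cache-id"; // placeholder: copy from the dashboard

const db = knex({
  client: "mysql2",
  connection: {
    host: "mysql.polyscale.global", // PolyScale, instead of the origin host
    port: 3306,
    user: `${CACHE_ID}-polyscale`, // cache ID + hyphen + real username
    password: "playground",
    database: "employees",
  },
});

// The query itself doesn't change at all.
const [rows] = await db.raw("SELECT COUNT(*) FROM employees");
console.log(rows);
await db.destroy();
```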

And then, as you can see, when we were setting up that cache, if you remember, we didn't ask for your database username and password, which is a nice security property: PolyScale can't see your database username and password. Everything stays SSL encrypted and just gets sent through to the origin database. So if I'm connecting as usual, I'll put in my database password, which is playground, and my database here is the employees one. And just to be clear, again, I know I keep repeating myself, but everything's here on the Playground homepage if you want to grab those credentials. It's also in the readme of this little demo repo. So you can see here we've got just the demo RDS instance that we've got running there. So once we've got that set up, that's kind of all we need to do. Now, before we get into anything more interesting, let's just run a quick query. I'm going to comment this out here, grab a query, and say select count(*) from salary, just for the fun of it. And let's run that code, and that will make a connection. Ooh, an error: that table doesn't exist. So there we go, I made a nice typo there. What I'll do is run it under nodemon and check the table name; I think we've got employees. Pretty sure that exists. There we go, great. So we ran a query, fantastic. Now, if you go back to where it gets interesting, you'll see the observability tab. And observability will show us every single query that runs through the platform. So we can see here that's our example that's come through. We can see we ran that once, it's in auto-caching mode, and PolyScale is currently telling us that it's highly cacheable. Now let's do something a bit more interesting.

13. Caching Behavior and Rules

Short description:

If I run a query multiple times, PolyScale will recognize it as a simplistic query and set a high time to live on the value to start caching the data. PolyScale also anonymizes SQL data to protect sensitive information. Users can set specific rules on queries, templates, or tables to manage caching behavior.

If I do where id equals one... and let's run that. Oh, we didn't have an id column. I really should look at this schema before I start playing with it. But what I'm going to demonstrate here is actually just to take this query here, which is a bit more interesting.

So this says select count(*) and avg(salary) from the salaries table. Rather than running that once, let's open this up. What I'm going to do is run that query in a loop, 30 times, and see what happens. So I'm going to save that; it connects and starts executing. You see 1.8 seconds, because we initiated a new connection. Now bear in mind, I'm on my home machine here, in the Bay Area, the West Coast, San Francisco area. So let's see what happened there. You can see that we made the connection, and we got query runtimes of around 1.6 seconds, one second, 1.2; really varied. And then look what happened a little bit later: suddenly we drop down to 36 milliseconds, 44 milliseconds. And that was PolyScale jumping in and saying, right, this is something we're seeing multiple times, it's a very simplistic query, we can set a high time to live on this value and start caching that data. And we'll see this in our analytics here. A couple of things you'll also notice in here: we anonymize the SQL data that comes through. So if I had a parameter in here, say a query filtering on a credit card number or a name or an ID, we would anonymize that completely and take the value out of the query. We never store any SQL, and the parameters are anonymized for PII, so you'll never see that in here. But hopefully there's enough in there that you can identify the queries.
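If you want to reproduce that at home, here's a rough sketch of the loop, using the same placeholder credentials as before. The first runs go to the origin; once PolyScale starts caching, the logged times should drop sharply.

```javascript
// Sketch: run the same aggregate query 30 times and log each round trip.
// Run as an ES module; the cache ID in the user string is a placeholder.
import mysql from "mysql2/promise";

const conn = await mysql.createConnection({
  host: "mysql.polyscale.global",
  port: 3306,
  user: "your-cache-id-polyscale", // cache ID + hyphen + username
  password: "playground",
  database: "employees",
});

for (let i = 0; i < 30; i++) {
  const start = Date.now();
  await conn.query("SELECT COUNT(*), AVG(salary) FROM salaries");
  console.log(`run ${i + 1}: ${Date.now() - start}ms`);
}

await conn.end();
```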

Now, if for example I wanted to set a specific rule on this query and say, actually, I don't want PolyScale to manage it, let's override that, I can come in here, set a rule on it, set it to manual, and create a query rule. You can see here we've got a couple of options. I can actually create a rule at the template level. And a template is effectively a similar SQL query where only the parameters have changed. So if I said select star from users where id equals one, then select star from users where id equals ten, it's effectively the same query, just with different parameter values. PolyScale will classify those into one template, so you'll only ever see one row in here (there's a toy illustration of the idea just below). So if I want to come in and apply a time to live specifically, I can do that, and I could set that rule up in here. And you can see, I hit the set recommended button, and right now PolyScale is recommending a 30-second time to live for the query we were just looking at, because it doesn't know any more yet; it's only seen one burst of that data come through. The other option is that I can even set rules on specific tables. And again, this is a great use case for things like really simplistic BI use cases, or even e-commerce use cases, where I say, look, I want my table to be cached for 10 minutes and I know that's what I want to do. So those are the very high-level options.
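The template idea is easy to picture: strip the literal parameters out of each statement and group on what's left. The sketch below is purely illustrative; PolyScale's actual classifier is more sophisticated than this.

```javascript
// Illustrative only: a naive way to reduce queries to templates by
// replacing literal values with placeholders.
function toTemplate(sql) {
  return sql
    .replace(/'[^']*'/g, "?") // string literals -> ?
    .replace(/\b\d+(\.\d+)?\b/g, "?"); // numeric literals -> ?
}

console.log(toTemplate("SELECT * FROM users WHERE id = 1"));
console.log(toTemplate("SELECT * FROM users WHERE id = 10"));
// Both print: SELECT * FROM users WHERE id = ?
// so they count as a single template in the analytics.
```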

14. Polyscale Observability and Use Cases

Short description:

Polyscale provides a detailed breakdown of observability, including tables accessed, query types, hit rates, and execution times. It allows users to filter and view cached and uncached queries, as well as different query categories. The platform saves significant database execution time, with one example showing a 15-minute run of 12,000 queries saving almost 80 minutes. Polyscale offers customizable cache behavior and the ability to initiate global purges and delete manual cache rules. It also supports scaling a single database and improving read query performance. That's a quick tour of Polyscale and its key use cases.

Where this gets more interesting is when you start getting into more complex systems. You can drill down into this details section. Actually, let me just log in as a different user and get you a different data set that will be a bit more interesting, with some data in there. I'm going to load up this Drupal test site that we've had running for a while. There's actually not much activity on here, but it's a live Drupal site that we just plugged in and had a play with. There we go. This is quite an interesting summary report, where you can see the number of queries that have run in the last seven days, the actual execution time, the database time, and then the possible savings if everything is cached, and what the efficiency is. Now, if we look at this data, I'm going to drop into this details section, and what you get then is the full breakdown of all of the observability that's happening. This is a good example where you can see these question marks in here where we've replaced the parameters. You can see what tables have been accessed, and whether this query is DQL, data query language, a read, a show or a select, versus a DML query, which is a manipulation: an insert, an update or a delete. And just to be clear, out of the box, again, we strive to hide this complexity: we hide the manipulation queries, so what you're seeing here are just the cacheable queries. I can come into this filter option and say, show me cached versus uncached, but specifically, do I want just the DQLs, just the shows and the selects? And I can turn that off and then see all the reads, inserts, updates, and deletes as well. And you can see these broad categories across the top. Don't be too overwhelmed by the minutiae here. But I can look at my queries: how many hits and misses have I had, and how many are distinct? That's the concept of a query template, where I may have 5,000 unique queries that are the same query with different parameters; that's what you're looking at here. And the reason that's a decimal, by the way, is because we're in the 15-minute timeframe, so we're chopping these up. You can see here this particular query is uncached; we've turned the cache off on this one. We can see our hit rates, and so on. You can see the total execution times versus the average execution times. And if we go to the public playground data that we were just running, we'll see some interesting stats. Let's let this load up. So this is just the last 15 minutes of queries running through the platform. And this is interesting: this is the playground query we talked about before, and you can see that department name being anonymized, and we can see we ran 12,000 of those. And if you start looking at the time saved, that actually saved just under 80 minutes of database execution time. That playground run we've all been doing on the site, where we ran 12,000 queries, looks like just one run in the last 15 minutes, but it saved almost 80 minutes of database execution time. You can see the actual time and the nominal time, and then the breakdowns of the average execution time as well.
So you can really go down into all the details of exactly what queries are happening, and as I said before, PolyScale will surface the ones that are highly cacheable versus the less cacheable ones. So hopefully that's useful. And just to wrap up here: on the Settings page, you can change your defaults for how the cache behaves. You can purge, going back to a question we had earlier; I can initiate a global purge if I want to, for whatever reason. If I have set up some manual rules in my cache, where I've said, well, let's cache these few queries manually, or I'm running stuff automatically, I can go and delete those rules, so you can reset how the rules are managed as queries come in. And then finally, I can delete this cache entirely if I want to blow everything away. So that's a quick tour, a high-level overview, of PolyScale and how it works. And a couple of use cases we really focus on: one is the regional one we've talked a lot about, but the other is when you just want to scale a single database, and read query performance is the key part of that. Those are really the focus areas of what we work on. So I'll pause there. That's pretty much all the material I wanted to show, and I'd love any questions, anything I haven't covered that you're interested in, or anything you've seen that sparks interest. Definitely unmute and give me a shout. Hello. Hi, Nishant.

15. Database Support and Roadmap

Short description:

Polyscale currently supports MySQL, Maria, and Postgres databases. Support for MongoDB and Elasticsearch will be added later this year. Microsoft SQL Server is expected to be supported by March.

Hi sir. Hi. I want to ask: does it only work with MySQL? I work with MongoDB, so how would it work with that? Yeah, not yet; Mongo is not supported today. So you can see we've got MySQL, Maria and Postgres; SQL Server is next, and likely Mongo after that. What we're focusing on at the moment is really the pure TCP-based databases that are hard to scale. That's really where our focus has been. And later this year we'll be adding support for HTTP with GraphQL, and moving into things like Elasticsearch and Mongo as well.

Yes, sir. Elasticsearch and Mongo are very easy to work with, so it would be great to see those. Right, exactly. So yeah, thank you. So right now it's just the three that are supported. We're expecting Microsoft SQL Server, just because we've got huge demand for it, around March. That's the current ETA.

QnA

Experiences with Redis and Polyscale

Short description:

I work on a large scale product that we have distributed over five regions globally. We use Mongo for rights and SQL server for reporting. We've gone through several iterations of using Redis, but it hasn't been successful or has added complexity. We're looking forward to Polyscale supporting SQL server and Mongo. Manual overrides can be made through the API.

Hello. Hi, Jean. Nice to meet you. Nice to meet you. Well, congratulations on the product that you guys have developed. It's amazing. I work on a large-scale product that we have distributed over five regions globally. In the best weeks of the year, we get up to 33 million hits. When I first joined the project a couple of years ago, one of the first things we did was move the infrastructure from on-site to fully managed on AWS. We're using Atlas for Mongo, and on Rackspace we're using SQL Server. So I'm really happy to hear that you are definitely looking into supporting SQL Server. Yes. And I would like to echo that Mongo would be amazing for us to have as well, because honestly, the way we have architected our product is basically Mongo for writes and then SQL Server for reporting. So that means we really don't need caching for SQL as much as we need it for Mongo, which we use, for example, to render landing pages, and those need to be rendered in 200 milliseconds or less. And we are not really hitting that mark. We've gone through several iterations of using Redis and things like that, and having something like this would definitely improve our lives significantly. So I'm really going to keep my eye on PolyScale for this; there are several of our team here looking at it, and our minds are a bit blown. That's great. That's great to hear. It's great for the company to be looking into a solution like this, if you guys implement Mongo and SQL Server. Definitely. So as I say, SQL Server is kind of imminent, work has begun on it, and Mongo will be shortly behind. We hear the same from a lot of people; we've definitely heard that. Thanks for sharing your use cases. I think, yeah, in certain places caching just makes so much sense, where you need super low latency and half a second isn't good enough, or five seconds isn't good enough. And it's great to hear that you're in that world where you do have those two different systems, that polyglot world, and based on your business success that will only scale: you need to get out to different regions and more people, and the data's increasing in size. So the problem's not going away. I don't know if it's something you want to talk about, Jean, but your experiences with Redis? Obviously Redis is a fantastic product. Did you find that it wasn't going to work, or have you just not attempted that project, or did it just become too complex? Where did you get to down the Redis path? We're still in the middle of it, actually. As I said, we've done a few iterations. There's even an old iteration right now which is breaking our heads, because it has a little bit of a memory leak on a node. That's because it's a legacy server that should be sunset soon. But yeah, adding Redis is not as straightforward as people might think when it comes to scalability. For example, we're using AWS with Fargate, and Fargate only allows you to expose port 80. So when we tried to add Redis internally, because that's one of the things we were trying to do, putting Redis right there in the Docker image so that the application could contact it directly and have all of our applications be read replicas of the master, that was unsuccessful because of that, or it just added complexity.
So those are the kinds of things you'll usually find where it would be much better to just hand it to a team like yours and let the AI take care of it. We would love something like that. Yeah, fantastic. That's great to hear, thank you for sharing that. And I just want to take one question quickly in the chat: is there any way to make manual overrides through code? Absolutely. If you hit the docs, we've got a full API; all the features are exposed through there, so you can do whatever you want to do (there's a hypothetical sketch of what that might look like below). That kind of goes against our no-code strategy, but there are situations where you're definitely going to need to, or want to: you know better than the machine, and you may want to invalidate at exactly the right point. Good, anyone else for a question? Is it Mabeko? You're unmuted there? Yeah, I have a question. Now, I'm sorry if I might have missed it.
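As promised above, here's a hypothetical sketch of what a programmatic override such as a purge might look like. The URL, route and token below are invented placeholders, not PolyScale's actual API; the real endpoints and auth scheme are in the docs.

```javascript
// Hypothetical sketch only: the endpoint, path and auth header are
// invented placeholders. Check the PolyScale API docs for the real API.
const response = await fetch(
  "https://api.polyscale.example/v1/caches/your-cache-id/purge", // placeholder
  {
    method: "POST",
    headers: { Authorization: "Bearer YOUR_API_TOKEN" }, // placeholder token
  }
);
console.log(response.status); // expect a 2xx on success
```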

Integration with Multiple Databases

Short description:

Polyscale allows you to have different databases in a workspace. Each cache is a different database, independent but under the same workspace. PolyScale is agnostic to whether the origin database is small or large: it acts as a pass-through cache, serving from the cache or letting queries hit the origin database. The platform is the same for everyone, but specific accommodations can be made. Polyscale eliminates the need for caching at the application tier, providing the benefit of not worrying about code overhead and complexities.

I heard you towards the end. You were talking about this being basically a service that you target when you have one big database, and that's one thing that I wanted to get clarity on. What happens when you have many different databases? Say you have hundreds of departments that some may have databases of their own, and others may have databases that they want to keep completely separate, do you then need different accounts that are completely separate for each one of them, or is there integration that can be done?

I think what you're describing here is that PolyScale has the concept of workspaces. I'll show you that really quickly; I'll sign in as a different user. A workspace is effectively an entity that can have one or more caches, so you could use a workspace for a department. Here I'm in my default workspace, and each workspace is associated with billing. You can set up one or more caches in that workspace, and I could come in and create a brand new one that could be a department, whatever. In there, I could create different caches, and then you can set permissions on your workspaces: if I go into the access settings, who do I invite into these workspaces and what are their permissions? So you can set that up for a large enterprise or different departments or whatever you need it to be. Does that answer your question?

Yeah, yeah, I think that addresses the question. So just one more thing with those workspaces: are you then allowed to have different databases in them? Could you have, say, different database types?

Yeah, absolutely. When you create a cache, each cache is a different database, right? It's independent, but you just select your database type, and in my demo account that I was in a minute ago, we had Postgres, MySQL, MariaDB. It doesn't matter; they're all under the same workspace. No problem. Okay, all right. Okay, thank you very much. Thank you. A couple more in the chat, let's have a look. The Node Congress repo appears to be private. Oh no, what a fail. Let's see if I can change that really quickly. Can I interrupt? Of course, yes, please do. Yes, I have one small question. I'm based in India. It was very nice interacting with you on the PolyScale application, and I have a very small question: does the size of the database, large versus small, have any impact on PolyScale's performance? So, if I understand your question, it's about performance with a small database versus a large database. Was that the...

Yes. Yes, I mean, with regards to the origin database, PolyScale is kind of agnostic; it doesn't matter, it's just a pass-through cache. Whatever load your origin database is dealing with today doesn't matter to PolyScale. We're either serving something from a cache or letting the query through to hit the origin database, so nothing changes with your origin database. You can run that on a $10-a-month or $30-a-month RDS account, right up to customers with $25k-a-month single instances. The origin database is nothing to do with PolyScale, other than that we're a pass-through back to it, if that makes sense.
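To picture what pass-through means in code, here's a toy sketch of the concept. This is just the idea, not PolyScale's implementation, which also handles TTLs, invalidation and SQL awareness.

```javascript
// Toy illustration of a pass-through cache in front of an origin database.
const cache = new Map();

async function passThroughQuery(sql, queryOrigin) {
  if (cache.has(sql)) {
    return cache.get(sql); // served from the cache; origin never sees it
  }
  const result = await queryOrigin(sql); // forwarded unchanged to the origin
  cache.set(sql, result); // cached for next time (real TTL logic omitted)
  return result;
}
```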

Yeah, thank you. Great. Another question in the chat: is there a student developer pack? No, nothing out of the box specific to students. Obviously we do free accounts, and we can definitely do nonprofit accounts, but no, the platform is the same for everyone and scales for everyone. Let us know if there's something specific you had in mind and we'll look to accommodate. Okay. Yeah, I do have one more question too. Yeah. I just thought about this, and I'm not quite sure if it's actually possible. What I'm getting is that you're basically trying to reduce the amount of integration you need to make calls to the database. So would it be possible to just completely eliminate something like Redis, or would it still need an integration alongside?

Yeah, I mean, we see use cases where people completely remove Redis. I don't want to talk about Redis specifically, because I love the product, it's great, but the specifics of running caching at the application tier can be removed. You're taking a step down: you're caching in the data tier rather than in the application tier, and then just think about the luxury of not having to worry about that code overhead and those complexities. That's really the benefit that we bring. I've just made the repo public, by the way, so feel free to have a play. But that's exactly right.
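To make that contrast concrete, here's a rough sketch under some assumptions: `conn` is any mysql2-style connection (pointed at PolyScale, as configured earlier), and the Redis keys and TTL are illustrative, not anyone's real code.

```javascript
// Before: application-tier caching. Every call site carries its own
// cache keys, serialization and TTL logic. Redis usage is illustrative.
import Redis from "ioredis";
const redis = new Redis(); // assumes a local Redis for the sketch

async function getUserAppTier(conn, id) {
  const key = `user:${id}`;
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit);
  const [rows] = await conn.query("SELECT * FROM users WHERE id = ?", [id]);
  await redis.set(key, JSON.stringify(rows), "EX", 60); // hand-tuned TTL
  return rows;
}

// After: data-tier caching. The call site is just the query; caching is
// transparent because conn points at PolyScale instead of the origin.
async function getUserDataTier(conn, id) {
  const [rows] = await conn.query("SELECT * FROM users WHERE id = ?", [id]);
  return rows;
}
```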

Repo Name, Hiring, and Remote Work

Short description:

We take away the complexity of writing and maintaining your own caching logic. The demo repository is the Node Congress 2022 repo under the PolyScale GitHub organization. We're currently hiring for various positions, including developer advocate, full stack React and TypeScript engineer, and C++ backend engineer. We're a remote-first company and changing how databases are scaled. Check our website for more information and career opportunities. We're not currently offering intern positions, but we may consider them in the future. We're completely distributed and welcome remote work. Thank you for your interest.

We take away that complexity of having to write your own logic and maintain it over time. Okay, yeah, all right. I think I understand now. Great.

So what's the name of the repo again? Let me bring up the slide. Just go to the PolyScale org on GitHub, and it's the Node Congress 2022 repo; yeah, that one there.

I have to drop the shameless plug that we are hiring, and we would love to speak to anyone if that's interesting. At the moment, we're hiring for a developer advocate, a full stack React and TypeScript engineer, and a C++ backend engineer focused on our proxy. You can read more about the company on the website; we're a small company and we're 100% remote. So yeah, we'd love to speak to anyone for whom this looks interesting. We're changing how databases are scaled; that's really the mission we're on. Have a look at the careers section on the website and you can read more about the roles.

Let's have a look, any other questions in the chat? Thanks for posting the link there. And any other questions from anyone? Sir, is there any intern opening as well? Sorry, Nishant, can you repeat that? An intern position, like a proper software engineering one? Oh, intern, yeah; not at the moment, probably later this year. We're just hiring full-time positions at the moment, but we'll definitely be looking to run an intern program within the next six months or so.

Okay, that's nice. So you said those positions are remote-first? That's right, so we're a remote-first company. We don't have offices, we are completely distributed. We have people across the globe in all sorts of great places. So yeah, we welcome that. We are a remote-first company.

Okay, all right, that'd be nice to look into. Great, fantastic. Glad it was helpful and yeah, okay.
