If you have some GraphQL data that you think would benefit from CDN caching at the edge, it’s actually really simple to get everything working well. This talk will walk you through the interplay between several tools:
* Automatic Persisted Queries with Apollo Link lets queries use GET while mutations still use POST
* Apollo Cache Control lets you specify cache control information in a fine-grained, schema-oriented way
* Apollo Engine generates small query IDs you can use in those GET requests to limit the cache key size, and sets the Cache-Control header for the CDN
Then, when we put it all together, you can see those results getting cached in your favorite CDN service, tada!!
Power Up your GraphQL Apps with CDNs
AI Generated Video Summary
This talk discusses how to grow GraphQL apps with CDNs by exploring HTTP caching concepts such as freshness and validation. It explains how CDNs cache content closer to end users, improving delivery speed. Persisted queries and Cache-Control headers are explored as a solution to the challenges of caching GraphQL. The talk also highlights the interplay between Automatic Persisted Queries, Apollo Cache Control, and Apollo Engine for efficient CDN caching.
1. Introduction
How to grow your GraphQL apps with CDNs. GraphQL and caching, two words that don't really go well together. Let me give you a little bit of an intro about me. My name is Naz. I am currently an engineering manager working at LinkedIn.
How to grow your GraphQL apps with CDNs. Faster GraphQL queries with caching and CDNs. This is what we're going to talk about today. GraphQL and caching, two words that don't really go well together. There's been a lot of talk in the community about how we're going to bring caching and GraphQL queries together. Well, before we jump into that, let me give you a little bit of an intro about me. My name is Naz. I am currently an engineering manager working at LinkedIn. Before LinkedIn, I worked as an engineering manager and individual contributor at Netflix. I'm currently running JavaScript Weekly with a group of amazing individuals on Twitter Spaces and also hosting career Q&As on LinkedIn Events. I'm also a career coach and a mentor on MentorCruise, mentoring and coaching a lot of engineers across the globe. If you want to learn more about me, visit my website, naz.dev.
2. Caching and CDNs
So, let's talk about caching. HTTP caching has two main concepts: freshness and validation. Freshness determines how long a resource can be kept in the cache, while validation checks whether the resource needs to be refetched. The Last-Modified and ETag headers are used for validation. CDNs are content delivery networks that cache content closer to end users, delivering it more quickly.
So, let's talk about caching. Before we learn about GraphQL and caching, let's talk about HTTP caching. What is HTTP caching, and how is it done? HTTP caching has two main concepts. One is freshness and two is validation.
Freshness means: as a browser, how long can I keep this resource in my cache? Freshness is a way for the server to give a resource to the client and instruct the client on how long it can keep that resource. In practice, this is done through the Cache-Control HTTP header. Cache-Control: max-age=60 means the browser can keep the resource for 60 seconds before it has to start re-requesting the resource from the server.
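As a rough illustration of the freshness rule, here is a minimal TypeScript sketch of how a cache might decide whether a stored response is still usable. The function names `parseMaxAge` and `isFresh` are made up for this example, and the parsing only handles a plain `max-age=N` directive; real caches consider more directives and headers than this.

```typescript
// Extract the max-age value (in seconds) from a Cache-Control header value.
// Returns null when there is no well-formed max-age directive.
function parseMaxAge(cacheControl: string): number | null {
  for (const directive of cacheControl.split(",")) {
    const parts = directive.trim().split("=");
    const name = (parts[0] ?? "").toLowerCase();
    const value = parts[1];
    if (name === "max-age" && value !== undefined && /^\d+$/.test(value)) {
      return Number(value);
    }
  }
  return null;
}

// A stored response counts as fresh while its age is below max-age.
// Assumption for this sketch: no max-age means "treat as stale".
function isFresh(storedAtMs: number, cacheControl: string, nowMs: number): boolean {
  const maxAge = parseMaxAge(cacheControl);
  if (maxAge === null) return false;
  const ageSeconds = (nowMs - storedAtMs) / 1000;
  return ageSeconds < maxAge;
}
```

With `max-age=60`, a response stored at time 0 is served from the cache at 59 seconds and has to be re-requested once 60 seconds have passed.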
Next we come to validation. Validation means that when those 60 seconds are up and the client decides to request the resource again, it asks the server: hey server, do I really need to refetch this? So there is a way for the server to know whether the client really needs the resource again or already has the latest, valid copy. If nothing has changed on that resource, there is no need for the server to resend it to the client. This is done through the Last-Modified and ETag headers on the server side. Last-Modified is a date and time, and ETag is a token that identifies the state of the resource. For example, the client sends the ETag back in an If-None-Match request header, and the server can answer 304 Not Modified if it still matches.
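The validation exchange can be sketched in a few lines of TypeScript. This is a simplified model, not a real HTTP server: the `respond` function and its types are hypothetical, and it compares a single ETag by exact string equality, whereas real servers also handle lists of ETags, weak validators, and If-Modified-Since.

```typescript
interface ConditionalRequest {
  ifNoneMatch?: string; // value of the If-None-Match request header, if present
}

interface ValidationResponse {
  status: 200 | 304;
  headers: Record<string, string>;
  body: string | null;
}

// Answer a revalidation request: 304 with no body when the client's ETag
// still matches the current one, otherwise 200 with the full resource.
function respond(
  req: ConditionalRequest,
  currentETag: string,
  body: string
): ValidationResponse {
  const headers: Record<string, string> = {
    ETag: currentETag,
    "Cache-Control": "max-age=60",
  };
  if (req.ifNoneMatch === currentETag) {
    return { status: 304, headers, body: null };
  }
  return { status: 200, headers, body };
}
```

The point of the 304 branch is that the server skips sending the body, so a revalidation of an unchanged resource costs only a small header exchange.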
These are very important headers, but can GraphQL actually use any of these mechanisms? Why are we saying they don't go together? They are super useful, and it seems we could just attach them as HTTP headers. Well, we'll see. Before we dig into that, let's talk about CDNs a little bit. If you're not familiar with what a CDN is, a CDN is a content delivery network, which caches content like images, videos, and web pages on proxy servers that are located closer to the end users than the origin servers.
A proxy server is a server that receives requests from clients and passes them along to other servers. Because the proxy servers are closer to the clients making the requests, a CDN is able to deliver the content more quickly and seamlessly. Let's explain this more simply. We can think of a CDN as a chain of grocery stores. Instead of having just one grocery store, one Walmart, the main branch that everyone in the area has to go to because it's the only place to shop, we can have small branches of Walmart in every neighborhood. So instead of going to the main branch to pick up their stuff, people can look for it in the smaller branch first. And if the thing they want is in stock at that smaller branch, awesome, they can pick it up from there.
3. CDN Caching and Persistent Queries
CDNs cache static content on proxy servers at the edge of the network, saving copies of requested content. GraphQL queries can have cache control headers, but they are typically sent as POST requests, which CDNs generally do not cache. Persisted queries provide a solution by using GET requests and shortened query IDs. This brings GraphQL closer to regular HTTP GET requests. The next tool is Apollo Cache Control; with a REST API, by contrast, a Cache-Control header can simply be returned from a specific endpoint.
It's much faster. If not, they can go to the main branch, and also ask it to stock those things in the smaller, closer branches so they can get them there next time. This is how CDN caching works. It's basically replicating the static content on proxy servers at the edge of the network. So when a user requests content from a website using a CDN, the CDN fetches the content from the origin server, the main server, and then saves a copy of the content for future requests. Cached content remains in the CDN cache as long as users continue to request it.
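The fetch-then-keep-a-copy behavior can be modeled with a small TypeScript sketch. The `EdgeCache` class is purely illustrative: it stores copies in an in-memory map keyed by URL and ignores expiry and eviction, which a real CDN node would of course handle.

```typescript
type OriginFetch = (url: string) => string;

// A toy edge node: serve from the local copy when there is one,
// otherwise go to the origin and keep a copy for future requests.
class EdgeCache {
  private store = new Map<string, string>();
  private fetchFromOrigin: OriginFetch;
  public originHits = 0; // how many times the origin had to be contacted

  constructor(fetchFromOrigin: OriginFetch) {
    this.fetchFromOrigin = fetchFromOrigin;
  }

  get(url: string): string {
    const cached = this.store.get(url);
    if (cached !== undefined) return cached; // cache hit: origin is not contacted
    this.originHits += 1; // cache miss: fetch from the origin
    const content = this.fetchFromOrigin(url);
    this.store.set(url, content);
    return content;
  }
}
```

Requesting the same URL twice contacts the origin only once; the second request is answered from the copy at the edge, like the neighborhood branch in the grocery store analogy.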
Well, what about GraphQL queries? Where are we going with this? Well, CDNs know how to cache resources when they have the headers we talked about attached to them. But can we use those headers with GraphQL queries? Yes, we can. We can set cache control headers on a GraphQL query, right? Well, except that the resources we usually deal with are query documents. That's still a resource, so we can set headers. In the example that you see here, one document is our resource, and we could undoubtedly attach the Cache-Control, Last-Modified, and ETag headers to it. But even though that is possible in theory, GraphQL uses POST, and in practice those headers don't help on a POST, because CDNs generally don't cache POST responses; you need to use GET. That's why we come to persisted queries as solution number one for attaching the headers we talked about to GraphQL queries.
A central principle in REST is that you use a URL to identify a piece of data, a resource, and you use the GET verb in your HTTP request to indicate that you're doing a data read, not a write. That tells our CDNs it's OK to cache the result, since the request isn't expected to modify anything on the back end. In contrast, historically, most GraphQL tools sent HTTP requests using POST. Instead of a URL, they used a complicated request body that contains a query and variables. As an added complication, some browsers have a relatively small URL size limit. That means you can't fit the entire query you're making, plus the variables, in a GET request.
So what can we do? Well, persisted queries come to our rescue. By combining Apollo Link, our modular network interface for the client, and Apollo Engine's automatic persisted queries feature, we can address both concerns at once. After setting up Engine, you can easily add the Apollo Link persisted queries link to your client code. So here is a code example of using a persisted query link. This will do two things for us. First, it sends queries over HTTP GET instead of POST, because CDNs need that GET request to understand that resources are not changing, while still using POST for mutations. And second, it uses a shortened persisted query ID in the GET URL, so that the cache key for CDNs is shorter and we don't hit the URL size limits. This brings GraphQL much closer to the regular HTTP GET requests that CDNs are designed to handle.
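To make those two effects concrete, here is a minimal TypeScript sketch of the kind of GET URL a persisted query produces. It is not Apollo Link's implementation. It is modeled on the request shape used by Apollo's automatic persisted queries, where the query text is replaced by its SHA-256 hash inside an `extensions` parameter; the `persistedQueryUrl` helper, the endpoint, and the query are hypothetical.

```typescript
import { createHash } from "crypto";

// Build a GET URL in which the query text is replaced by its SHA-256 hash.
// The hash acts as the short query ID, so the URL (and the CDN cache key)
// stays small no matter how long the query document is.
function persistedQueryUrl(
  endpoint: string,
  query: string,
  variables: Record<string, unknown>
): string {
  const sha256Hash = createHash("sha256").update(query).digest("hex");
  const extensions = { persistedQuery: { version: 1, sha256Hash } };
  const params = new URLSearchParams({
    variables: JSON.stringify(variables),
    extensions: JSON.stringify(extensions),
  });
  return `${endpoint}?${params.toString()}`;
}

// Example with a made-up endpoint and query.
const exampleUrl = persistedQueryUrl(
  "https://example.com/graphql",
  "query Post($id: ID!) { post(id: $id) { title votes } }",
  { id: "42" }
);
```

Because the same query and variables always produce the same URL, a CDN can treat it like any other cacheable GET resource. In the automatic flavor of this scheme, a server that does not recognize a hash asks the client to send the full query once, after which the hash alone is enough.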
Well, what else can we do besides persisted queries? Let's talk about Apollo Cache Control. What's that? With a REST API, we can simply return a Cache-Control header from a specific endpoint.
4. Cache Control and CDN Caching
With GraphQL, we constantly improve queries on the front end, adding and moving fields as needed. Apollo Cache Control ensures cache hints stay up to date as queries change over time. It allows specifying cache expiration at different levels while maintaining front-end query flexibility. Engine combines the cache hints into a Cache-Control header that CDNs understand. Cache control can also power Apollo Engine's own caching when no CDN is used. This talk highlighted the interplay between Automatic Persisted Queries, Apollo Cache Control, and Apollo Engine for efficient CDN caching.
Just like we talked about, until we write a new endpoint, it will remain the same, right? But with GraphQL, we'll constantly be improving queries on the front end. You're adding fields and moving fields. You have different versions of the UI with different needs. So how do you make sure your cache control hints stay up to date with the shape of the query, even as the data included in the result changes over time? That's what Apollo Cache Control is designed to solve.
This is a spec for how the GraphQL server should return cache hints on a per-field level. It comes with a reference implementation for JavaScript that shows how we can specify cache hints at different levels of specificity. So here we have a cache hint, a max age of 5 seconds, set for the whole schema with cache control. Or we could set it on a GraphQL type or field, as we did here. Or we can even set it on a single execution of a resolver. It doesn't need to be on the whole schema. This is super important because it allows our API to specify the expiration of different pieces of data, since we don't want everything to expire at the same time, while maintaining the freedom of the front-end code to specify whatever queries it needs.
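One way to picture how per-field hints could roll up into a single response-level policy is the TypeScript sketch below. It is an assumption-laden illustration rather than Engine's actual logic: the `CacheHint` shape, the `toCacheControlHeader` function, and the field paths are invented for this example, and the rule it applies (shortest max age wins, any private hint makes the whole response private, a max age of 0 means no header) is a conservative choice made for the sketch.

```typescript
type Scope = "PUBLIC" | "PRIVATE";

interface CacheHint {
  path: string;   // field path the hint applies to, e.g. "post.votes"
  maxAge: number; // how long the field may be cached, in seconds
  scope?: Scope;  // treated as PUBLIC when omitted
}

// Reduce per-field hints to one Cache-Control value for the whole response.
// Returns null when no header should be sent.
function toCacheControlHeader(hints: CacheHint[]): string | null {
  if (hints.length === 0) return null; // nothing known about cacheability
  let lowestMaxAge = Infinity;
  let scope: Scope = "PUBLIC";
  for (const hint of hints) {
    lowestMaxAge = Math.min(lowestMaxAge, hint.maxAge);
    if (hint.scope === "PRIVATE") scope = "PRIVATE";
  }
  if (lowestMaxAge <= 0) return null; // at least one field must not be cached
  return `max-age=${lowestMaxAge}, ${scope.toLowerCase()}`;
}

// Hypothetical hints for a query that selects a post and its vote count.
const header = toCacheControlHeader([
  { path: "post", maxAge: 240 },
  { path: "post.votes", maxAge: 30 },
]);
```

The design choice here is that the response can only be as cacheable as its least cacheable field, so adding a short-lived field to a query automatically shortens the lifetime of the cached result.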
At the end of the day, Engine combines all these hints into one convenient Cache-Control header. That's our winner, the one our CDN can understand. Just a note here: if you're not using a CDN, you can use cache control to power the caching feature of Apollo Engine too, so you don't specifically have to use a CDN.
So, to wrap up everything we talked about today: if you have some GraphQL data that you think would benefit from CDN caching at the edge, it's actually really simple to get everything working well. This is a great example of the interplay between several tools we've been working on for a while. First, Automatic Persisted Queries with Apollo Link lets queries use GET while mutations still use POST. Second, Apollo Cache Control lets you specify cache control information in a fine-grained, schema-oriented way. And third, Apollo Engine generates the smaller query IDs, so we can use those query IDs in our GET requests to limit the cache key size, and it sets the Cache-Control header for the CDN. I hope you really enjoyed this talk. If you have any questions, or if you want to connect with me, feel free to find all my handles at naz.dev. And I'm looking forward to chatting with you all on the Discord channel of the conference.