GraphQL won’t solve your performance problems, but @defer might help

@defer has been long discussed within the GraphQL working group, and while not part of the specification yet, it’s an exciting new addition that may help with your application’s user experience and mask performance problems.

From: GraphQL Galaxy 2022

Transcription
Hello everybody, my name is Lucas Ledbetter and I'm a Solutions Architect here at Apollo. I'm extremely excited to be here today, at least virtually, to talk about the @defer directive and what it can mean for your clients. To get the full picture, we'll need to cover a few background details before diving into what defer actually is. Lastly, I'll give a few examples showing why defer can be an incredibly powerful tool in your arsenal to help your clients deliver a better user experience.

To take a step back, you might be aware that GraphQL requires the entire body to be sent as a single response. This is great for most queries, as your UI expects all data to be there. You certainly wouldn't want to return only some of the data for a user on a user profile page, for example. But as your graph grows, you realize not all fields are built the same. Some fields require additional time to resolve for one reason or another. For many organizations, this is simply due to scale; for others, it could be tied to a third-party service that they don't have full control over, leaving them at the mercy of the other party's response times, or any number of other reasons. Regardless of why those fields take more time, requesting them causes the entire response to slow down. As clients begin to access those fields, they see the overall graph latency increasing as a result, which isn't ideal and means the user experience is impacted. The client should be able to partially render data as it comes in, but that is unfortunately not possible with a single request.

To solve this growing problem, companies came up with two different approaches. Both addressed some of the problems, but each came with trade-offs. The first solution we'll cover is query splitting: splitting the single query into multiple queries and requiring clients to orchestrate the responses.
This sounds a lot like one of the problems GraphQL was built to solve, but it addressed the weakest-link problem that plagued traditional GraphQL until performance problems could be mitigated or addressed on the server. In addition to having to handle the responses on the client, it also required the server to handle multiple unique queries from the clients, each with traditional HTTP and GraphQL overhead, and without being able to reuse already-fetched data, such as a parent type. Losing that additional context, especially for fields needing to make many calls to other services, could be quite expensive.

Prefetching became another popular way to address this problem. By requesting data ahead of time, it is possible to mitigate performance concerns by making assumptions about the user's intention; for example, grabbing the user's payment information prior to hitting the checkout page. This had the added benefit of still being a single request, and the user's experience on the client doesn't change much, if at all, when it works. But as might be obvious, servers are required to handle this additional load for an assumed interaction, even if the user never interacts with the returned data, making this mostly informed guesswork. Both of these solutions made progress towards the end goal of a good user experience, but never got there all the way.

That takes us to the @defer directive. As you might have guessed, defer is a possible solution to the ongoing problem of higher-latency fields while still having a single response. The first question you might ask yourself is: what is this defer directive thing? It's a great question. Defer is a directive used by clients to inform GraphQL servers that the response can be delivered incrementally, by specifying parts of a query that can return later, or in other words, by deferring parts of the response. As of today, this directive is not part of the official GraphQL specification.
However, it's already been used by a number of companies in production. It's currently under the working group with a publicly accessible RFC, which covers the ins and outs of the directive; I'll have a link to the RFC at the end of the presentation.

As a quick sidebar: we've all seen directives used in server-side applications, from formatting information to Apollo Federation notation and more. But the GraphQL specification also calls out the ability to use them in the query itself. In general, I'm a big proponent of executable directives, or client-side directives. They allow clients to provide additional context to the server for use in processing, not simply notation in the schema about expected behavior as with server-side directives. We've seen client-side directives in the past with things like @skip and @include, as defined in the specification, and @defer is much the same. So why do I bring this up? By making defer indicated by the client instead of the server, we let the clients inform the server of which fields can be responded to later and which ones cannot. The same query may be deferred for portions in one component, but returned as a single response in another component. That context is hard to convey without defer. Now, back to our regularly scheduled programming.

So how does this directive actually work? Is it magic? Well, it's not magic, but it might seem like it after working with it for the first time. We'll be covering this at a high level today for the sake of time, but there are excellent talks and posts covering the more technical aspects of the directive, as well as the RFC from the working group. Consider this basic schema for a fictional social media site, which has a query to return a list of users, as well as a User type with a list of friends. In this example, the company running this GraphQL schema knows that the friends field isn't as performant as it probably should be.
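The schema from the slides isn't reproduced in the transcript, but based on the description, a minimal sketch might look like this (the exact field names are assumptions):

```graphql
type Query {
  # Returns the list of users for the site
  users: [User!]
}

type User {
  id: ID!
  username: String!
  # Nullable field returning a list of non-null users.
  # The nullability matters for defer, as discussed below.
  # This is the slow field in the example.
  friends: [User!]
}
```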
I want to explicitly call out something here, however. The friends field, while returning a list of non-null users, is itself nullable. This is important, and we'll touch on why in a bit.

Now, given the query on the left to fetch the list of users along with their friends, we send the request and it takes a second for the entire response. This isn't great; as a client developer, we could render the user data partially before rendering the list of friends for each user. On a user profile, we probably render their username, contact information, and more, while the friends list loads when it arrives instead of blocking the entire rendering process.

So we've now modified the query to include the new @defer directive. You might first note that the directive is on an inline fragment. This is because the current RFC states it must be included only on fragments or inline fragments. Now, if we execute this query, we get back a multi-part response, shown on the left. The first response has a new field, but otherwise contains the same structure as a normal GraphQL response. Subsequent responses have a new format. While it's not critical to understand all of this, there are two parts I'd like to highlight. First, each response has the hasNext field, indicated in blue, which tells the client that more of the response is coming; once it's false, the entire response has been sent. Second, each subsequent response has the incremental field, which contains the delta and the path to the additional data. As you can see, the additional info here is inserted into the users array at index zero.

Now, to put it together, here's a short GIF of the results. You can see we get back an immediate response in about eight milliseconds with basic data, then a second later we get the remainder.
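The query and multi-part response shown on the slides aren't in the transcript; a sketch of what they might look like, following the response format in the current RFC (field and value names are illustrative assumptions):

```graphql
query GetUsers {
  users {
    id
    username
    ... @defer {
      friends {
        id
        username
      }
    }
  }
}
```

The first part of the response arrives immediately, with the deferred fragment omitted:

```json
{
  "data": {
    "users": [{ "id": "1", "username": "lucas" }]
  },
  "hasNext": true
}
```

A subsequent part then delivers the deferred data via the incremental field, with a path pointing at index zero of the users array:

```json
{
  "incremental": [
    {
      "path": ["users", 0],
      "data": {
        "friends": [{ "id": "2", "username": "sam" }]
      }
    }
  ],
  "hasNext": false
}
```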
If you're curious, the GIF is from Apollo Sandbox, which supports defer out of the box now, including a nice timeline to scroll through the various states of the response; it can be a helpful tool for understanding how your deferred response arrives.

So as you begin to think about defer, you might notice that it doesn't actually improve performance. You're still waiting for the response, after all. But to really dive into how it can help, you must first ask yourself: why do you care about performance? This is a question I ask a number of companies, and the answers vary. For some, it's about measuring how third parties are performing within their graph. For others, it's about the user experience. Most of the time, however, the reality is that users are the ones most impacted by poorly performing graphs. Many client teams measure against time to first byte, time to interactivity, payload size, or any number of other metrics. Defer specifically targets time to first byte, and can assist with time to interactivity, depending on how it's implemented.

So let's look at a few examples of defer in action and talk about how they can improve the user experience. We'll first look at a relatively straightforward way to leverage defer for partial rendering, or lazy loading. Going back to our previous schema, we have now added a few more fields to the fictional social media site. As mentioned earlier, we can render the user's profile information quickly before adding in the friends later. Now we'll see the two queries side by side. Note that we've added the avatar URL, username, and title to the base query in both, since we need those right away. Otherwise, we'll defer the friends list, same as before. Now, excuse my terrible mock-up; I'm most certainly not a UX designer. In the normal query, we're blocked by the slowest field, which in this case is the friends field, and we can't render any other data until it's finished.
But you can see right away that we get the profile information in the deferred request. Our time to first byte has dropped significantly, and probably even more importantly, time to interactivity has decreased as well. The user can see and engage with parts of the data before the full response is sent.

Going a step further, perhaps our company identified that the friends field slowed down based on the number of friends selected, for whatever reason. There's another potential path we could take. We've now added two new parameters to our friends field in the schema, shown on the left, to allow us to specify the number of friends to show and an offset. Using defer as well as GraphQL aliasing, we can now load the first 10 friends quickly and then fetch the remaining friends in the deferred statement. Traditionally, this would require multiple queries to fetch the other friends to render. Allowing the server to run these queries in parallel gives other performance benefits beyond simple UX improvements.

In fact, this is quite similar to the @stream directive, which is part of the same working group RFC as the @defer directive, but falls a bit out of scope for today's discussion. In summary, it uses much of the same patterns as defer to send large lists incrementally, with an initial list of items sent across; each subsequent chunk in the multi-part response then contains additional entries to the list until finished. There are a number of great talks and documents about stream, and I highly recommend taking a look at the RFC for that directive if sending large lists is a concern for your organization.

These sorts of optimizations are enabled by using defer. And so, as you go down the path of considering defer as a solution, consider how you might also leverage other aspects of GraphQL to layer features together.
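The aliasing-plus-offset pattern described above isn't shown in the transcript; a sketch of what such a query might look like (the argument names limit and offset, the aliases, and the page sizes are all assumptions):

```graphql
query GetUserProfile {
  user(id: "1") {
    username
    avatarUrl
    # First page of friends, returned in the initial response
    firstFriends: friends(limit: 10, offset: 0) {
      username
    }
    # The rest of the list, deferred so it doesn't block rendering
    ... @defer {
      remainingFriends: friends(limit: 90, offset: 10) {
        username
      }
    }
  }
}
```

Because both selections arrive over the same request, the server can resolve the two pages in parallel rather than waiting for a follow-up query from the client.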
You'll likely realize you can create solutions with deferred responses that weren't previously possible, and that can help with improving client performance. Note that it's also possible to have more than one defer, or even nested defers. In this example, we have a user which has a company profile associated with it. That company also lists its owner. In this case, it's quite possible that the user service communicates with the company service, which then goes back to the user service to resolve the owner information. These hops take time, and here we can defer each step to start returning data along the path, rendering data as it comes in. It's quite possible that the company name isn't immediately critical on a profile, and the owner information probably isn't important at all.

Now let's consider some other potential UX improvements. Fetching data is pretty straightforward; we can all envision how to do partial loading, since there were solutions prior to defer: we'd simply use multiple queries, or queries per component in React. But what about changing data? Mutations exist in GraphQL as well, and thankfully, defer supports mutations too. As an example, here's a sample mutation for a fictional payment service. Breaking this down, it returns a payment ID and a user, as well as a payment status interface. This interface indicates either a success or failure state. As a side note, I personally love codifying errors in schemas for things just like this, and we'll actually touch on errors in just a moment. Diving a bit deeper, for folks that work in e-commerce, you know that payments can take a while. Defer affords us the ability to return a transaction ID immediately, which we can use later, such as when the user refreshes during processing, and otherwise show a loading screen.
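The mutation from the slides isn't in the transcript; a sketch of what it might look like, given the description above (every name here, from submitPayment to the status types, is an assumption):

```graphql
mutation SubmitPayment($input: PaymentInput!) {
  submitPayment(input: $input) {
    # Returned immediately, so the client can show a loading screen
    # and recover state if the user refreshes mid-payment
    paymentId
    user {
      id
    }
    # The slow part: deferred until the payment settles
    ... @defer {
      status {
        ... on PaymentSuccess {
          billedAmount
        }
        ... on PaymentFailure {
          reason
        }
      }
    }
  }
}
```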
Once we've gotten a result from the payment status field, we can then render appropriately: either success, in which case we have the billed amount, or failure, in which case we have the message, or reason in this case. From a user's perspective, we can render the right result without subscriptions or polling to get an updated status. Instead, we use the existing HTTP connection to forward the information to our clients.

As much as I love talking about codifying errors in your schema (and shout-out to Sasha Solomon for her excellent talk on the subject), errors are a fact of life. As much as we strive to build error-free code, codify all the various error states in our schema, and write tests to avoid them, they happen. And as we start discussing performance, errors are a natural topic of discussion, since they are also a measurement of success and user experience. With that in mind, you might wonder: what happens when an error occurs during a deferred statement? Well, to see, we have this certainly-not-at-all-suspect query to return definitely-not-an-error in a deferred statement. I mean, who could have seen that coming?

In all seriousness, errors and defer are strange together. The best analogy I have is that deferred fields are much like JavaScript promises: you're expecting the result to come at some undefined time in the future, and you have to check to make sure that it hasn't rejected, or errored in this case. So for the client teams watching this, it requires you to think about partial error states more carefully. Previously, you might retry the whole request if you saw an error. But in the world of defer, you might have a success state initially, but an error state in a later part of the response. You should consider your retry strategy on a case-by-case basis. To demonstrate this, on the right we have a GIF of Apollo Sandbox once again. You'll notice we don't have any errors at first, and then one comes through, matching the error format we already saw.
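The errored payload shown in the Sandbox GIF isn't reproduced in the transcript; a sketch of what an error inside an incremental part might look like, following the current RFC's format (the message text and paths are illustrative assumptions):

```json
{
  "incremental": [
    {
      "path": ["users", 0],
      "data": { "friends": null },
      "errors": [
        {
          "message": "Failed to resolve friends",
          "path": ["users", 0, "friends"]
        }
      ]
    }
  ],
  "hasNext": false
}
```

The initial payload can arrive error-free, so a client has to check each incremental part for an errors key rather than inspecting the response once.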
Looking through the timeline, we can see the first response was excellent, no errors, but the second reply does have an error. And this extends to multiple defers: if two deferred statements error, you get multiple errors in the errors key, just as you would in a normal GraphQL response.

To briefly touch on some considerations you should keep in mind around errors as you use defer: we recommend codifying any expected failure states, such as "user not found", into your schema whenever possible. This creates a cleaner experience for your client teams when they get failure states in a deferred fragment, saving them from having to parse through errors. When you can't, however, expect everything and the kitchen sink. Be ready to handle errors within your client by checking throughout your deferred responses to see if you've run into errors. Lastly, if you have partial data, use that partial data, and don't kick a user off the page if one piece fails to load. As obvious as this sounds, defer is best used for lower-priority data, things that can wait to be returned, and as such, retrying or other mechanisms can help make sure the user experience is still excellent.

If you recall, earlier I explicitly called out that our social media site's friends field was nullable. Defer supports both nullable and non-null fields, but the behavior of errors differs depending on how you use defer and how the field is typed. This differs from traditional GraphQL, so it's important information to know. Our friends list was a nullable field with a non-null list of users. The query we used before is fine: the request would likely be all-or-nothing, where if we fail to get the friends for one reason or another, the field remains null, as you can see here. Great. Now, what happens if the friends field was non-null? In a normal query, we get back an empty data object, since a non-null field returned null. Now, how about deferred statements?
You might expect that the same would happen. However, we get back a partial response without the friends field, but including the id field. This is an important aspect to consider when dealing with defer in clients: the error behavior changes from traditional GraphQL, and you must be aware of this change when migrating to defer in your queries. The same also applies when deferring specific fields. For example, we have a query to get a user and their avatar URL, which comes from a service that isn't 100% reliable. This field needs a response, however, so we mark it as non-null so clients can expect it to exist. Same as before, the query on the left is not deferred, and as expected, we get back an empty data object. Now for the deferred query, we get back the partial object again. So, with all of this in mind, my recommendation is this: be careful with nullability, as defer introduces additional complexity when it comes to the response. Just be aware that this behavior exists as it currently stands in the RFC, and you'll be okay.

While we've talked about what defer is and how you can use it in principle, using it in practice is thankfully fairly straightforward. Apollo recently announced support for defer in both their React and Kotlin clients, with iOS support coming soon. For developers considering defer, this can be a massive help if you're currently using Apollo's clients, since defer can plug into existing applications without headache-inducing rewrites. And once you have a client that supports defer, you simply need to rewrite your queries with defer in mind. As mentioned earlier, Apollo Studio and Apollo Sandbox can be great ways to test defer, see how it works on your graph, and ensure it does what you expect. In fact, that's what you're seeing on the right; it's the nested defer example from earlier.
Apollo has also released a version of defer for organizations using a federated graph, and this is especially unique since it doesn't require any subgraph support beyond being able to resolve federated entities. I won't have time to dive in deep on this, but the short version is: if you're using Federation and have even a remote interest in defer and what it can afford your clients, check it out. The implementation is agnostic of library, so even if your subgraph library doesn't support defer out of the box, this can enable it, so long as you're using Apollo Router to serve your supergraph. To summarize how it works: the Router leverages the existing entity relationships and manages the asynchronous process of getting the data for you. The subgraphs don't need to handle sending multipart responses at all. If you don't currently have a federated environment, adding Federation and entity declarations is a simple way to utilize the Router to get defer without much code change on your server.

We've seen how great defer can be. It solves a lot of fairly complex problems in a declarative manner, playing to the strengths of GraphQL. At the same time, consider the experiences you drive currently and which use cases might benefit from being deferred. We've seen examples of both: one where defer was a drop-in fix for the team, and another where the schema needed to accommodate defer to provide a better client experience. Keeping those experiences in mind can ensure your schema remains expressive while still enabling defer as an option to improve user experience. Finally, as mentioned before, defer is driven by the client, and clients will be most aware of which use cases can benefit from it. We've seen that defer is best used for lower-priority data, so simply stuffing a query full of defers will provide little benefit.
Make sure defer is used carefully, and ensure it makes sense when you do use it. And while defer can improve the user experience and client performance, it isn't a panacea. It won't solve your server-side performance concerns and shouldn't be treated as if it will. Identifying and resolving server-side performance issues can also remove some of the need for defer, so doing both can help your graph seem, and be, more performant for clients and servers alike.

A few parting thoughts. First, defer is a powerful tool for the consumers of your graph and should be leveraged if possible; there are a menagerie of ways to improve your user experience using it. To that end, while the specification is new, there are a few libraries that already support it, and Apollo recently announced support for organizations using a federated graph, as mentioned previously. My last point is this: if you are interested in defer but it doesn't meet your requirements, or you feel like it can be improved, get involved. The GraphQL community is deep and wide, and the working groups can always use more feedback about specific use cases, concerns, and more. Get actively involved if you can.

Lastly, some resources: as mentioned, we have the link to the RFC, Sasha Solomon's "200 OK" talk about codifying errors in your schema, and docs for both Apollo's React client, which was used to demo defer earlier, and the Apollo Router's Federation defer support. With that, thank you. You can reach me at LLEdBetter on Twitter, and that's my email, lucas@apollographql.com.

Q&A

Hi, everyone. Nice to meet everyone. Thanks for joining today's talk.

Awesome. Thank you so much for being here. So just a reminder to everyone: you can join the Andromeda Q&A channel on Discord to ask your questions for Lucas here in this Q&A. But let's go ahead and get started by taking a look at the answers to the poll question.
So again, the question we had was: how do you currently handle slow fields within your graph? And it looks like we have a pretty overwhelmingly strong response for "just wait for the response". A little bit behind that is "split the query" and "prefetch when possible". But Lucas, what do you think about the results? Is that in line with what you expected?

Yeah, for sure. I think in general we see a lot of companies doing this, where it's not currently feasible or possible to orchestrate the responses to handle split queries, or to make assumptions about user interaction; we may not have data there. So in general, it's really just: we wait for those slow fields to show up and then render when we get them.

Yeah. And just to answer a question here from the audience: does defer replace things like query splitting or prefetching?

It doesn't necessarily replace them 100% of the time. There are certainly use cases for handling both of those situations. For example, you may not want to request the data all the time. There may be an expensive field that the user needs, but only at some point on the page, for example, if they scroll down. So split queries or prefetching might be a more appropriate solution there. But in general, defer can help with most of these.

And when you're talking about implementing something like defer, are there certain areas where you may want to try it out first? Is there a way to do more of a proof of concept, or implement it incrementally into a code base? Or do you typically see people going all in on it right at once?

Yeah, it's definitely an incremental process. We don't see people all of a sudden add a defer to every single query they make, for a number of reasons, primarily the fact that defer requires server support, and server support can vary across languages. I know we heard that GraphQL Yoga, for example, supports it now. On the Apollo side of things, we just introduced support for entity-based defer.
So if you're running Federation, this has just become available, and it may be something you're not aware exists. But as you continue to look at it, part of the process is that you need to measure what success means to you, whether that's time to first byte or when the user can start interacting with the page. That's something you have to measure to see how beneficial defer can be, because you may realize that it's not providing the benefit you expect and may not be worth the continued engineering investment for that specific use case. With all that being said, I definitely think it's worth taking a look and doing a proof of concept. If you haven't already looked, the Router is an easy way to do this through a Federation-like model, where you add entity declarations to your existing GraphQL server. That can be an easy way to get started and measure whether defer can be a good tool for you.

Yeah. And are there certain best practices or tools that you would use to measure or stress test, in order to determine what may be a good place to apply this first?

Yeah, I would definitely look at where it takes the most time to render on your site. This can vary based on the UI framework you're using, but understanding where most of your rendering time comes from is usually where you can start to drill down and find out why it's happening, and oftentimes it's related to your data service. React, for example, has a really good profiling tool, which you can use to understand how long it takes to render a specific component. That can be a good place to start to see how your data is being rendered within your UI, and to ask: can we delay this, and partially render this component until we get the rest of the data?

And you talked about performance problems in general and some of those issues.
Are you seeing any newer approaches or best practices for performance problems, aside from defer, that are of interest to you?

Yeah, there are things like the @stream directive, which didn't fall into this discussion but is definitely worth talking about, because stream solves a lot of the problems that defer may not fully address, things like sending full lists or arrays of data across. This is a really common problem where you have a gigantic array of, for example, statistical information, and you don't want to request all of it at once, which may slow down your endpoint. In that case, defer may not be appropriate given constraints like the ability to do offsets, as we talked about during the talk. So stream is one area that I think is really interesting, and it sits alongside defer, so once it becomes official, it will be interesting to follow. I also think there are other tools you can leverage to improve performance. I know a lot of people use subscriptions, or effectively live queries, and those are also good utilities to help with this kind of data responsiveness.

Awesome. Great. Thank you so much for pointing those out. It looks like we don't have any more questions from the audience, but is there anything else you'd like to add before we hop into a break?

The only thing I would add is, obviously, I'm going to plug Apollo Sandbox one more time. If you are looking to integrate with defer, it's a super helpful tool right off the bat. It doesn't require any integration with Apollo tooling at all; you just point it towards your GraphQL server and you get a nice little UI. But probably more importantly, in the context of defer, you also get that nice timeline, so you can see how your deferred response is arriving, things like: this first response takes 100 milliseconds, and then the remaining response takes three seconds.
And you can get a clear picture of how soon that first response can be sent. That's just part of our tooling that's available for free, so I definitely recommend taking a look at that, as well as Federation entity-based defer, because that can be an easy way to do this without any support from your existing GraphQL library. And if you have any questions or anything like that, always feel free to talk to me on Twitter; I'm always happy to help. I also believe I put my email in the slides as well.

Awesome. Great. And just to reiterate, where can people find you on Twitter, Lucas?

It's at LLEdBetter. My last name's a little bit long, but it should be on the slides. And then my email is just lucas@apollographql.com.

Perfect. Awesome. Well, thank you so much for being here, Lucas, for answering our questions, and for the super interesting talk. We really appreciate you being here today.

Yeah, thanks for having me.