Though GraphQL is declarative, resolvers operate field-by-field, layer-by-layer, often resulting in unnecessary work for your business logic even when using techniques such as DataLoader. In this talk, Benjie will introduce his vision for a new general-purpose GraphQL execution strategy whose holistic approach could lead to significant efficiency and scalability gains for all GraphQL APIs.
Step aside resolvers: a new approach to GraphQL execution
AI Generated Video Summary
GraphQL has made a huge impact in the way we build client applications, websites, and mobile apps. Despite the dominance of resolvers, the GraphQL specification does not mandate their use. Introducing Grafast, a new project that compiles GraphQL operations into execution and output plans, enabling advanced optimizations. In Grafast, instead of resolvers, we have plan resolvers that deal with future data. Grafast plan resolvers are short and efficient, and support all features of modern GraphQL.
1. Introduction to GraphQL
GraphQL has made a huge impact in the way we build client applications, websites, and mobile apps. It allows us to be more efficient, minimize round trips, reduce risks, increase productivity, and deliver data in the shape the client expects. This makes it easier and faster to build things that our users love.
Hello, everyone. My name is Benjie and I love GraphQL. I think GraphQL is awesome. It's made such a huge impact in the way that we build our client applications, our websites, mobile apps, and more. It's shown us how we can be more efficient and get rid of over-fetching and under-fetching, doing things in a way that is highly optimized to what the client needs. Minimizing round trips and having type safety reduces the risk of something going wrong. Handling partially successful results means that even if things go wrong, we can still render something useful to the user. Built-in documentation increases productivity. And of course, by delivering the data in the shape that the client expects, we can minimize the amount of wrangling that we need to do on the client side. All of this makes it much, much easier and faster to write our client applications, which makes it easier and faster to build things that our users love. GraphQL is amazing!
2. Challenges with Resolvers in GraphQL
I hate resolvers. The GraphQL language is declarative, and yet resolvers turn execution into a procedural approach. They rule out optimization strategies, require additional effort for optimization, and make servers do more than necessary. Despite the dominance of resolvers, the GraphQL specification does not mandate their use. We can execute operations in any way as long as the observable result is the same. Resolvers enforce the graph nature of GraphQL, but there are alternative ways to execute operations.
I hate resolvers. I've always hated resolvers. The GraphQL language is declarative, and yet resolvers were not built to leverage this awesome capability. Instead, they turn execution into a procedural, layer-by-layer, field-by-field, item-by-item approach. To my mind, resolvers are very much a minimum viable product approach to execution. They're simple to understand and to specify, but they punt things like solving the N+1 problem into user space, requiring schema designers to remember to use abstractions such as DataLoader to achieve acceptable performance. And they rule out entire classes of optimization strategies.
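The N+1 problem that resolvers push into user space is easiest to see with a round-trip counter. The sketch below uses a hypothetical in-memory `usersById` table in place of a database: a naive per-post `author` resolver makes one query per post, whereas a DataLoader-style batch function receives all the keys at once. It illustrates the batching idea only and does not use the `dataloader` package itself.

```typescript
// Hypothetical in-memory "database"; `roundTrips` counts simulated queries.
const usersById: Record<number, string> = { 1: "Alice", 2: "Bob", 3: "Carol" };
let roundTrips = 0;

// One query per key: what a naive per-item `author` resolver ends up doing.
function fetchUser(id: number): string {
  roundTrips++;
  return usersById[id];
}

// One query for many keys: the shape of a DataLoader batch function.
// Results must come back in the same order as the input keys.
function fetchUsers(ids: readonly number[]): string[] {
  roundTrips++;
  return ids.map((id) => usersById[id]);
}

const posts = [
  { title: "A", authorId: 1 },
  { title: "B", authorId: 2 },
  { title: "C", authorId: 1 },
  { title: "D", authorId: 3 },
];

// Naive: the resolver runs once per post, so N posts cost N extra queries.
roundTrips = 0;
const naiveAuthors = posts.map((p) => fetchUser(p.authorId));
const naiveTrips = roundTrips;

// Batched: collect the distinct keys, fetch them in one go, then map back.
roundTrips = 0;
const keys = Array.from(new Set(posts.map((p) => p.authorId)));
const loaded = fetchUsers(keys);
const byKey = new Map<number, string>();
keys.forEach((k, i) => byKey.set(k, loaded[i]));
const batchedAuthors = posts.map((p) => byKey.get(p.authorId)!);
const batchedTrips = roundTrips;

console.log(naiveTrips, batchedTrips); // 4 1
```

Both approaches produce the same authors; only the number of round trips differs.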
If you want to optimize what you're asking your business logic layer to do based on the incoming GraphQL query, for example to select only certain fields from a database, or to tell your remote API to include additional resources that are needed, you have to dabble with abstract syntax trees, look-ahead, or transpilation complexities. It's unpleasant and a lot of effort. Even people who put in this effort tend to do so only at a fairly superficial level, but the real efficiency gains would come from pushing this further. All of this means that GraphQL is making our servers do more than they should need to: burning more CPU cycles, making more network calls, using more energy, and putting more pressure on the environment. And it's not doing as much as it could to save us money on our server bills.
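As a rough illustration of the look-ahead described here, this sketch turns the fields requested under a `user` field into a column list for a SQL query. It assumes a simplified `FieldSelection` shape rather than the real graphql-js AST, and a hypothetical `users` table; real look-ahead code must also handle fragments, aliases, arguments, and directives, which is exactly where the complexity creeps in.

```typescript
// Simplified stand-in for a parsed selection set. Real code would walk the
// graphql-js AST and handle fragments, aliases, arguments and directives.
interface FieldSelection {
  name: string;
  selections?: FieldSelection[];
}

// Hypothetical mapping from GraphQL field names to `users` table columns.
const columnFor: Record<string, string> = {
  id: "id",
  username: "username",
  avatarUrl: "avatar_url",
  bio: "bio",
};

// Look ahead at the requested scalar fields and select only those columns.
function buildUserQuery(selections: FieldSelection[]): string {
  const columns = selections
    .filter((s) => !s.selections && Object.prototype.hasOwnProperty.call(columnFor, s.name))
    .map((s) => columnFor[s.name]);
  // Always fetch the primary key so nested fields can be loaded later.
  if (columns.indexOf("id") === -1) columns.unshift("id");
  return `select ${columns.join(", ")} from users where id = $1`;
}

// Roughly what `{ user(id: 1) { username avatarUrl } }` would ask for.
const sql = buildUserQuery([{ name: "username" }, { name: "avatarUrl" }]);
console.log(sql); // select id, username, avatar_url from users where id = $1
```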
In fact, my hatred for resolvers is why I joined the GraphQL Working Group in the first place, back in 2018. The GraphQL specification seemed to dictate that we must execute our operations using resolvers, and that seemed so unnecessary. GraphQL is a declarative language, so why must we stipulate that it be executed in a procedural manner? As I grew to learn the GraphQL specification, I realized, of course, that we don't stipulate that we must use resolvers at all. One paragraph right near the start of the spec, which I must admit I completely skipped over when I first read it, going straight to the execution section, states: "Conformance requirements expressed as algorithms can be fulfilled by an implementation of this specification in any way as long as the perceived result is equivalent." We stipulate that it must look like that's how it's executed, but that doesn't have to be what we actually do on the server side. So long as the observable result is the same, do whatever you want. But we still have resolvers.
Resolvers are still the dominant way of executing GraphQL. Even in projects that delve a little deeper into optimizing backend queries, using tools such as my graphql-parse-resolve-info module to look ahead and figure out which fields are being requested, we're still using resolvers. And the reasoning behind them is sound. The way they describe execution is correct. Without this definition, GraphQL could become more of a transfer format than an execution engine. Clients wouldn't be able to rely on the same assumptions, the assumptions that make things like normalized caching possible, because resolvers enforce the graph nature of GraphQL: we traverse from node to node, and the value of the node that we're on depends neither on where we came from nor on where we're going next. And yet, this doesn't have to be the way that we actually execute operations. I've been battling this problem on and off for almost six years.
3. Experiments and Constraints
I've tried many experiments over that time. I've delved deep into the GraphQL specification to understand exactly why resolvers are specified, what they're protecting us from, and why they are declared the way that they are. I've tried to work within the constraints of GraphQL.js to solve this problem, and I've been reasonably successful. But it's always irked me that it just doesn't feel as ergonomic as it should. The workarounds that I've come up with have been really clumsy and rigid. Admittedly, many of those solutions predate the support of my many sponsors, so I couldn't invest much time into researching them. But thankfully, a lot of people have found the software I write very helpful, and as my sponsorship has grown (thank you, sponsors!), the amount of time I can invest in this problem has increased.
4. Introduction to Grafast
About two and a half years ago, I set out to solve the problem of execution planning in GraphQL. Relational databases use SQL, a declarative language, and plan how to execute each query efficiently. GraphQL has resolvers and some specialized planners, but it lacks a powerful general-purpose query planning system. Introducing Grafast, a new project that compiles GraphQL operations into execution and output plans, enabling advanced optimizations. The execution plan is built from steps that represent future data; like DataLoader, each step accepts a list of inputs and returns a list of outputs. Batching is a core feature of Grafast.
About two and a half years ago, I set out, very part-time, to solve this problem once and for all. But before we get into that, let's talk about my inspiration.
Those of you who know me will know that I love relational databases. Well, one relational database in particular, really. Relational databases use SQL, which is a declarative language. It specifies to the server what data is required, and the server decides how to execute it. And it can choose many different ways of executing that query, for example factoring in the presence of indexes, or estimating how much data is expected so that it knows which types of operations to attempt. This is called query planning. Modern Postgres even has things such as genetic algorithms to help choose the best execution plan, and just-in-time compilation to compile expressions down to machine code so they can be executed more efficiently.
Now, GraphQL is declarative too, and yet we have resolvers; we don't have generic execution planning for GraphQL. But it's not fair to say that we have nothing. Loads of people have felt this pain, so now we have specialized planners such as the GraphQL-to-GraphQL planner found in Apollo Federation, or optimization of the GraphQL internals in graphql-jit. We also have ways of optimizing particular patterns, such as projections in Hot Chocolate or GraphQL-to-SQL transpilation with Hasura. But we don't have a powerful general-purpose query planning system that takes a holistic approach and allows advanced optimizations no matter what your underlying business logic may be, at least until now.
So after that really long introduction, I'm here today to say: step aside, resolvers. There's a new way to execute any GraphQL operation. The working title for this new project is Grafast, and it works by taking your GraphQL operation and compiling it into an execution plan and an output plan. The output plan is a straightforward mapping from the data that we retrieve during the execution phase to the GraphQL result that we want to output for the user, so we'll skip over that. What we mostly care about is the execution plan. We build the execution plan by mostly following the execution algorithm in the GraphQL specification. However, we do this before actually executing, so unlike in the spec, we're not dealing with concrete data. Instead, we're dealing with what we call steps, which represent this future data. When it comes time to execute, a step is quite similar to a DataLoader in that it accepts a list of inputs and returns a list of outputs. Batching is built into the very heart of Grafast.
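The following toy model, written only to illustrate the idea in this paragraph, separates planning from execution. `Step`, `InputStep`, `PropertyStep`, `FetchUsersStep` and `executePlan` are hypothetical names, not Grafast's API. During planning, variables prefixed with `$` hold steps rather than data; during execution, each step runs once for the whole batch, receiving a list of inputs and returning a list of outputs in the same order.

```typescript
// Toy model of plan-then-execute; every name here is hypothetical.
type Batch = unknown[];

abstract class Step {
  constructor(public readonly deps: Step[] = []) {}
  // Called once per batch: depValues[d][i] is dependency d's value for item i.
  abstract execute(depValues: Batch[]): Batch;
}

// Stands for the incoming items; its values are supplied at execution time.
class InputStep extends Step {
  execute(): Batch {
    throw new Error("InputStep values are supplied by executePlan");
  }
}

class PropertyStep extends Step {
  constructor(parent: Step, private readonly key: string) {
    super([parent]);
  }
  execute([parents]: Batch[]): Batch {
    return parents.map((p) => (p as Record<string, unknown>)[this.key]);
  }
}

let userFetches = 0;
class FetchUsersStep extends Step {
  constructor($id: Step) {
    super([$id]);
  }
  execute([ids]: Batch[]): Batch {
    userFetches++; // one simulated round trip for the whole batch
    const users: Record<number, string> = { 1: "Alice", 2: "Bob" };
    return ids.map((id) => users[id as number]);
  }
}

// Run each step exactly once, in plan order (dependencies come first).
function executePlan(input: InputStep, steps: Step[], items: Batch): Map<Step, Batch> {
  const results = new Map<Step, Batch>([[input, items]]);
  for (const step of steps) {
    if (results.has(step)) continue;
    results.set(step, step.execute(step.deps.map((d) => results.get(d)!)));
  }
  return results;
}

// Planning: no data exists yet, so these variables hold steps, not values.
const $post = new InputStep();
const $authorId = new PropertyStep($post, "authorId");
const $author = new FetchUsersStep($authorId);

// Execution: three posts flow through the plan as a single batch.
const results = executePlan($post, [$post, $authorId, $author], [
  { authorId: 1 },
  { authorId: 2 },
  { authorId: 1 },
]);
console.log(JSON.stringify(results.get($author)), userFetches); // ["Alice","Bob","Alice"] 1
```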
5. Grafast Plan Resolvers
In Grafast, instead of resolvers, we have plan resolvers that deal with future data. By walking through the selection sets and calling the plan resolvers, we can build a plan diagram. Grafast then deduplicates, optimizes, and finalizes the plan: deduplication simplifies the plan diagram, optimization improves performance, and finalization prepares the plan for execution. For example, with the Stripe payment API, steps can communicate and pass information to each other so that data is fetched more efficiently.
Whereas a field in a traditional GraphQL schema would have a resolver, in Grafast it has a plan resolver. The plan resolver has similar-looking code, as you might be able to see at a glance. But rather than dealing with concrete runtime data, it deals with future data, these steps, which we represent with a dollar symbol in the code on the right.
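To contrast the two styles without the slide, here is a sketch under stated assumptions: `PlanStep`, `getField` and `loadUser` are hypothetical stand-ins, not Grafast's real API, and steps merely describe themselves as strings. The point is the call count: the resolver runs once per post with concrete data, while the plan resolver runs once during planning and returns a step representing future data.

```typescript
type Post = { id: number; authorId: number };

// Traditional resolver: receives a concrete post and runs once per post.
let resolverCalls = 0;
const authorIdResolver = (post: Post): number => {
  resolverCalls++;
  return post.authorId; // real code might call loader.load(post.authorId)
};

// Hypothetical step type: stands for a value that will exist later.
interface PlanStep {
  describe(): string;
}
const getField = ($parent: PlanStep, key: string): PlanStep => ({
  describe: () => `${$parent.describe()}.${key}`,
});
const loadUser = ($id: PlanStep): PlanStep => ({
  describe: () => `loadUser(${$id.describe()})`,
});

// Plan resolver: receives a step ($post) and runs once while planning,
// returning another step rather than a concrete value.
let planCalls = 0;
const authorPlan = ($post: PlanStep): PlanStep => {
  planCalls++;
  return loadUser(getField($post, "authorId"));
};

const posts: Post[] = [
  { id: 1, authorId: 10 },
  { id: 2, authorId: 20 },
  { id: 3, authorId: 10 },
];
posts.forEach((post) => authorIdResolver(post));
const $author = authorPlan({ describe: () => "$post" });
console.log(resolverCalls, planCalls, $author.describe()); // 3 1 loadUser($post.authorId)
```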
By walking through the selection sets and calling the plan resolvers at each stage, we can build out a plan diagram, like this one, that expresses what needs to happen. This plan diagram is not the final thing that will be executed; it's just our first draft. Grafast will go through each step in the plan diagram and give it a chance to deduplicate itself, to optimize itself, and to finalize itself. These are the three main lifecycle methods that these steps may implement.
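Deduplication can be pictured as merging steps that would do the same work. In this minimal sketch, the `PlanNode` shape and the rule that steps with the same kind, parameter and dependencies are equivalent are assumptions made for illustration; later duplicates are dropped and their dependents are re-pointed at the surviving step.

```typescript
// Hypothetical draft-plan node: the kind of step, one parameter, and the
// ids of the steps it depends on. Nodes are listed in dependency order.
interface PlanNode {
  id: number;
  kind: string;
  param: string;
  deps: number[];
}

// Merge equivalent steps. Structural equality is just this sketch's rule.
function deduplicate(nodes: PlanNode[]): PlanNode[] {
  const replacedBy = new Map<number, number>();
  const firstWithKey = new Map<string, number>();
  const kept: PlanNode[] = [];
  for (const node of nodes) {
    // Point dependencies at the surviving copies of any merged steps.
    const deps = node.deps.map((d) => replacedBy.get(d) ?? d);
    const key = `${node.kind}|${node.param}|${deps.join(",")}`;
    const existing = firstWithKey.get(key);
    if (existing !== undefined) {
      replacedBy.set(node.id, existing);
    } else {
      firstWithKey.set(key, node.id);
      kept.push({ ...node, deps });
    }
  }
  return kept;
}

// Draft plan for `{ me { name } viewer: me { name } }`: both fields load the
// same user and read the same property, so half of the plan is redundant.
const draft: PlanNode[] = [
  { id: 0, kind: "input", param: "", deps: [] },
  { id: 1, kind: "loadUser", param: "currentUserId", deps: [0] },
  { id: 2, kind: "loadUser", param: "currentUserId", deps: [0] },
  { id: 3, kind: "property", param: "name", deps: [1] },
  { id: 4, kind: "property", param: "name", deps: [2] },
];
const finalPlan = deduplicate(draft);
console.log(JSON.stringify(finalPlan.map((n) => n.id))); // [0,1,3]
```

Because nodes are processed in dependency order, node 4 is only recognized as a duplicate after its dependency (node 2) has been merged into node 1.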
Deduplication allows similar steps to be amalgamated to simplify our plan diagrams. Optimization is the main phase, the most critical one for performance, and it allows steps of the same family to communicate with each other and pass information between one another. For example, if we're dealing with the Stripe payment API, we might have a GraphQL field that fetches the customer from Stripe. And then in our query, we might have a child field of that that fetches the subscriptions for the customer, and then various fields below that. When the step responsible for getting the subscriptions is being optimized, it could determine that its grandparent is pulling data from a Stripe customer and thus tell the customer step to fetch the subscriptions at the same time. Stripe has a feature called expanding for this. This would make the subscriptions fetching step redundant. So as the last action in Optimize, it can replace itself with a step that simply extracts the relevant data from the customer response. This way, we now only need one round trip to Stripe to get all the information that we require.
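The Stripe optimization can be sketched with a simulated backend. Everything here is hypothetical: `fakeRetrieveCustomer` and `fakeListSubscriptions` stand in for Stripe calls (this is not the Stripe SDK), and the step classes are not Grafast's. For brevity, the subscriptions step's direct parent is the customer step, whereas the talk's example reaches a grandparent. During optimization the subscriptions step asks the customer step to expand subscriptions, then replaces itself with a step that extracts them from the customer response, cutting two round trips to one.

```typescript
// Simulated Stripe-like backend; `requests` counts round trips. The `expand`
// list mimics Stripe's expanding feature, but this is not the Stripe SDK.
let requests = 0;
function fakeRetrieveCustomer(id: string, expand: string[]) {
  requests++;
  const customer: { id: string; subscriptions?: string[] } = { id };
  if (expand.indexOf("subscriptions") !== -1) customer.subscriptions = ["sub_1", "sub_2"];
  return customer;
}
function fakeListSubscriptions(_customerId: string): string[] {
  requests++;
  return ["sub_1", "sub_2"];
}

// Hypothetical step shape: optional optimize() may return a replacement.
interface Step {
  parent?: Step;
  optimize?(): Step;
  run(parentValue?: unknown): unknown;
}

class CustomerStep implements Step {
  readonly expand: string[] = [];
  constructor(private readonly customerId: string) {}
  run() {
    return fakeRetrieveCustomer(this.customerId, this.expand);
  }
}

class ExtractStep implements Step {
  constructor(public readonly parent: Step, private readonly key: string) {}
  run(value?: unknown) {
    return (value as Record<string, unknown>)[this.key];
  }
}

class SubscriptionsStep implements Step {
  constructor(public readonly parent: Step) {}
  // If the parent fetches the customer, ask it to expand subscriptions and
  // replace this step with a cheap extraction from the customer response.
  optimize(): Step {
    const parent = this.parent;
    if (!(parent instanceof CustomerStep)) return this;
    parent.expand.push("subscriptions");
    return new ExtractStep(parent, "subscriptions");
  }
  run(customer?: unknown) {
    return fakeListSubscriptions((customer as { id: string }).id);
  }
}

function execute(withOptimization: boolean) {
  requests = 0;
  const $customer = new CustomerStep("cus_123");
  let $subscriptions: Step = new SubscriptionsStep($customer);
  if (withOptimization && $subscriptions.optimize) {
    $subscriptions = $subscriptions.optimize();
  }
  const customer = $customer.run();
  const subscriptions = $subscriptions.run(customer);
  return { subscriptions, requests };
}

console.log(execute(false).requests, execute(true).requests); // 2 1
```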
6. Grafast Finalize Method and Plan Resolvers
The finalize method gives steps a chance to prepare for execution, such as compiling optimized functions or preparing final SQL or GraphQL queries. Steps can be implemented by users and have optional lifecycle methods. Grafast provides prebuilt steps for common concerns and is designed to pass additional information to the business logic layer for more efficient execution. Grafast plan resolvers are short and efficient, and support all features of modern GraphQL. Grafast is compatible with existing GraphQL schemas, and a release is planned for the first half of 2023 under the MIT license.
Steps are designed to be something that you can implement yourself, much like you would with a DataLoader callback. They're a little more powerful than DataLoader and have the optional lifecycle methods that we just discussed. But in the simplest case, all they need is an execute method that takes a list of values and returns a list of values, in the same way that a DataLoader callback does.
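A minimal sketch of the simplest kind of custom step, assuming a hypothetical `MinimalStep` interface with a synchronous `execute` (real batch callbacks are typically asynchronous, and Grafast's actual step base class may differ). Like a DataLoader batch function, it must return exactly one output per input, in the same order, and a runtime can check that contract.

```typescript
// Hypothetical minimal step contract: one call per batch, outputs in the same
// order and of the same length as the inputs (the DataLoader rule).
interface MinimalStep<TIn, TOut> {
  execute(values: readonly TIn[]): TOut[];
}

// A user-defined step: formats many prices (in cents) in one call.
class FormatPriceStep implements MinimalStep<number, string> {
  constructor(private readonly currency: string) {}
  execute(cents: readonly number[]): string[] {
    return cents.map((c) => `${(c / 100).toFixed(2)} ${this.currency}`);
  }
}

// The runtime can enforce the contract, much as DataLoader does.
function runStep<TIn, TOut>(step: MinimalStep<TIn, TOut>, values: readonly TIn[]): TOut[] {
  const out = step.execute(values);
  if (out.length !== values.length) {
    throw new Error(`step returned ${out.length} results for ${values.length} inputs`);
  }
  return out;
}

const prices = runStep(new FormatPriceStep("GBP"), [1999, 500, 12]);
console.log(JSON.stringify(prices)); // ["19.99 GBP","5.00 GBP","0.12 GBP"]
```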
We also have a number of optimized prebuilt steps that you can use to handle common concerns, including loadOne for batch loading remote resources, similar to DataLoader; each for mapping over lists; and access for extracting properties from objects. And we're building out more optimized steps for particular concerns, for example issuing HTTP requests, sending GraphQL queries, or talking to databases.
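To give a feel for what prebuilt steps such as `loadOne`, `each` and `access` do, here are toy analogues written as plain batch functions. Their names (`toyLoadOne`, `toyEach`, `toyAccess`) and signatures are invented for this sketch and do not match Grafast's real API. Note how `toyEach` flattens every list into one batch so that the loader still runs only once.

```typescript
type BatchFn<I, O> = (values: readonly I[]) => O[];

// Toy analogue of `access`: pull one property out of every object in a batch.
const toyAccess = <T, K extends keyof T>(key: K): BatchFn<T, T[K]> =>
  (values) => values.map((v) => v[key]);

// Toy analogue of `loadOne`: deduplicate keys, call the loader once, map back.
const toyLoadOne = <K, V>(load: (keys: K[]) => V[]): BatchFn<K, V> =>
  (keys) => {
    const unique = Array.from(new Set(keys));
    const loadedValues = load(unique);
    const byKey = new Map<K, V>();
    unique.forEach((k, i) => byKey.set(k, loadedValues[i]));
    return keys.map((k) => byKey.get(k) as V);
  };

// Toy analogue of `each`: apply a batch function to every item of every list
// as one flattened batch, then split the results back into the lists.
const toyEach = <I, O>(fn: BatchFn<I, O>): BatchFn<readonly I[], O[]> =>
  (lists) => {
    const flat: I[] = [];
    lists.forEach((list) => flat.push(...list));
    const mapped = fn(flat);
    let offset = 0;
    return lists.map((list) => {
      const slice = mapped.slice(offset, offset + list.length);
      offset += list.length;
      return slice;
    });
  };

let loaderCalls = 0;
const userNames: Record<number, string> = { 1: "Alice", 2: "Bob" };
const loadNames = toyLoadOne<number, string>((ids) => {
  loaderCalls++;
  return ids.map((id) => userNames[id]);
});

// Two parents, each with a list of friend objects.
const friendLists = [[{ id: 1 }, { id: 2 }], [{ id: 2 }]];
const friendIds = toyEach(toyAccess<{ id: number }, "id">("id"))(friendLists);
const friendNames = toyEach(loadNames)(friendIds);
console.log(JSON.stringify(friendNames), loaderCalls); // [["Alice","Bob"],["Bob"]] 1
```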
Ultimately, our intent is to use these steps to pass additional information to your business logic layer, whatever that is, so that it can execute its tasks more efficiently. Just as GraphQL helps eliminate over- and under-fetching on the client side, Grafast helps you eliminate it on the server side. For example, if your business logic is implemented with an ORM or something similar, you can use this additional information to perform selective eager loading and reduce database round trips. If your business logic sits behind an HTTP API, you could use this contextual information to dictate which parameters to pass, better controlling the data you retrieve and reducing server load and network traffic. And since Grafast is designed around the concept of batching, you never need to think about the N+1 problem again. It's solved for you, out of the box, by virtue of how the system works.
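For the HTTP API case, a sketch might look like this. During planning, child fields register the fields and related resources they need on a single hypothetical `FetchArticleStep`, which then builds one request URL. The `fields` and `include` parameter names are an assumption loosely modeled on JSON:API-style APIs, and `https://api.example.com` is a placeholder; your API's parameters will differ.

```typescript
// Hypothetical: child steps register what they need on the step that will
// call the HTTP API, instead of each field fetching separately.
class FetchArticleStep {
  private readonly fields = new Set<string>();
  private readonly includes = new Set<string>();

  needField(field: string): void {
    this.fields.add(field);
  }
  needRelation(relation: string): void {
    this.includes.add(relation);
  }

  // Finalize-style work: turn collected requirements into one request URL.
  // `fields` and `include` are assumed parameter names.
  buildUrl(baseUrl: string, id: string): string {
    const params: string[] = [];
    if (this.fields.size > 0) {
      params.push(`fields=${encodeURIComponent(Array.from(this.fields).sort().join(","))}`);
    }
    if (this.includes.size > 0) {
      params.push(`include=${encodeURIComponent(Array.from(this.includes).sort().join(","))}`);
    }
    const query = params.length > 0 ? `?${params.join("&")}` : "";
    return `${baseUrl}/articles/${encodeURIComponent(id)}${query}`;
  }
}

// A query selecting an article's title and its author.
const $article = new FetchArticleStep();
$article.needField("title");
$article.needRelation("author");
const url = $article.buildUrl("https://api.example.com", "42");
console.log(url); // https://api.example.com/articles/42?fields=title&include=author
```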
Just like GraphQL resolvers, Grafast plan resolvers are intended to be short and to express only what is necessary to communicate between GraphQL and your business logic. And despite their pleasant ergonomics, they unlock much greater efficiency and performance than resolvers can. Grafast has been built from the ground up to support all of the features of modern GraphQL: queries, mutations, subscriptions, polymorphism, and even cutting-edge features such as the @stream and @defer directives. And it's backwards compatible with existing GraphQL schemas, so you can use Grafast to execute requests against an existing resolver-based schema and then migrate to plan resolvers on a field-by-field basis. If you already use DataLoader, then migrating to Grafast plan resolvers should be very straightforward.
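One plausible way to run an existing resolver inside a batched executor, shown here as an assumption rather than as how Grafast implements compatibility, is to wrap the resolver so it is called once per item in the batch. The sketch keeps resolvers synchronous and uses hypothetical `Resolver`, `BatchFn` and `resolverAsBatch` names; a legacy field and a migrated field then run side by side on the same batch.

```typescript
// Hypothetical adapter: run an unmigrated, per-object resolver inside a
// batched executor by calling it once for each item in the batch.
type Resolver<P, R> = (parent: P, args: Record<string, unknown>) => R;
type BatchFn<P, R> = (parents: readonly P[]) => R[];

function resolverAsBatch<P, R>(
  resolve: Resolver<P, R>,
  args: Record<string, unknown> = {},
): BatchFn<P, R> {
  return (parents) => parents.map((parent) => resolve(parent, args));
}

interface User {
  firstName: string;
  lastName: string;
}

// Legacy field, not yet migrated: still an ordinary resolver.
const fullName: Resolver<User, string> = (user) => `${user.firstName} ${user.lastName}`;

// Migrated field: written against the whole batch from the start.
const initials: BatchFn<User, string> = (users) =>
  users.map((user) => user.firstName[0] + user.lastName[0]);

const batchOfUsers: User[] = [
  { firstName: "Ada", lastName: "Lovelace" },
  { firstName: "Alan", lastName: "Turing" },
];
const fullNames = resolverAsBatch(fullName)(batchOfUsers);
console.log(JSON.stringify(fullNames), JSON.stringify(initials(batchOfUsers)));
// ["Ada Lovelace","Alan Turing"] ["AL","AT"]
```

The wrapped resolver still does per-item work, so it gains no batching benefit; the efficiency comes only once a field is migrated to a batch-aware plan.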
We're hoping to release an open source version of Grafast in the first half of 2023 under the MIT license, the very same license that GraphQL.js uses. To be notified when it's ready, please enter your email address at grafast.org. If you'd like to help me continue to invest time in projects like this, please consider sponsoring me on GitHub Sponsors; you may even get early access. Feel free to reach out to me with any questions. Thank you for your time, and I hope you're as excited about the future of GraphQL as I am. Thank you.