As with any technology choice, choosing GraphQL as an API style involves tradeoffs. Some tell us GraphQL is here to replace everything else, others tell us using GraphQL is a mistake. In this talk, we'll explore why both these things are false, and how everything is context dependent.
It Depends — Examining GraphQL Myths and Assumptions
AI Generated Video Summary
Today's talk explores the nuances of GraphQL and how its effectiveness depends on the context. Caching is a polarizing subject in GraphQL, but there are tools available for caching. The trade-off for client-side flexibility in GraphQL affects performance predictability. GraphQL as a backend for frontend offers flexibility but limits true decoupling. It's important to consider the context and trade-offs when deciding whether to use GraphQL or REST. The speaker emphasizes the need for better conversations and understanding the nuances of GraphQL.
1. Introduction to GraphQL and Caching
Today's talk is called 'It Depends.' We'll explore the nuanced discussions around GraphQL and how its effectiveness depends on the context. Caching is a polarizing subject in GraphQL. Some say it's impossible to cache, while others say it's not a concern. Let's dive into it. GraphQL clients have powerful normalized caches, and backend resolvers can be cached too. However, the challenge lies in HTTP caching and the absence of shared HTTP caches. Shared caching allows responses to be reused across different clients and servers.
Hey, hello again, everyone! Thank you for coming to watch my talk. So, today my talk is called, It Depends. It may sound a little boring, and it might be, but we'll see. We'll kind of examine how there's certain discussions around GraphQL that these days we either hear like that's totally true or totally false. But there's many things about GraphQL that just depend a lot on the context you're in and have so much nuance. So, we'll just try to dive real deep into some of that stuff.
So, my name is Mark. I work at GitHub, and I'm from Montreal, Canada. So, It Depends. What do I mean by that? As I was saying earlier, with GraphQL, there are some topics we're going to see today that people either say you can't do with GraphQL, or that GraphQL is the best solution for. But in reality, something I've noticed, working with a lot of different people at different places on GraphQL, is that the context in which we make these decisions is way more important than GraphQL versus REST, or GraphQL versus gRPC, or these kinds of things.
And the first thing I want to start with is caching. I've talked about this before, so if you've seen a talk of mine on caching, there might be a little bit of a repeat here, but I think it's one of the most interesting subjects around GraphQL because it's such a polarized subject. On one end, you hear some people saying caching GraphQL is literally impossible. On the other end, people say it isn't a concern at all. So, let's dive into it. When we hear that GraphQL is hard to cache, sometimes it's a little hard to understand why. On the client side, just about any GraphQL client you look at has a powerful normalized cache with a lot of features. And on the backend side, well, we're just using regular programming languages. We can cache what we want; resolvers are just functions. And we've got tools like DataLoader that allow us to cache data loading, which is great. So it's hard to tell what is really hard about it when we have so many good tools to do it.
The reality is that when people say caching is hard, they are usually talking about a different kind of caching than what we just saw. They're talking about HTTP caching, and in particular, what they say is missing is shared HTTP caches. If you've never heard about this, it's actually quite a complex mechanism, but the problem itself is fairly basic to explain. When clients hit your GraphQL server, they get a response back. With a shared cache, you can cache these responses and, because it's shared, reuse them across different clients and across different servers.
2. Caching in Authenticated APIs
This kind of allows one client to take the cost for computing a response and other clients just getting it for free. This is great. We should always strive to have a cache like this if we can.
The problem is the spec itself for HTTP says if you've got an authorization header in there, shared caches shouldn't cache things. It may be a little weird if you just read the spec, but if you think about it, it makes a lot of sense because you don't want to be caching things that are specific to a user and be serving this in a shared way to other clients. The idea of a shared cache in authenticated API context really doesn't make that much sense.
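As a rough illustration of the rule described above (this sketch was not part of the talk), the decision a shared cache makes can be written as a small function. The names `parse_cache_control` and `shared_cache_may_store` are made up for this example, and the logic is a deliberate simplification of the HTTP caching rules: it only looks at `no-store`, `private`, and the `Authorization` request header, and assumes that a response to an authenticated request is only reusable by a shared cache when it explicitly opts in with `public`, `must-revalidate`, or `s-maxage`.

```python
def parse_cache_control(value: str) -> set[str]:
    """Return the lowercased directive names found in a Cache-Control value."""
    return {
        part.strip().split("=", 1)[0].lower()
        for part in value.split(",")
        if part.strip()
    }


def shared_cache_may_store(request_headers: dict[str, str],
                           response_headers: dict[str, str]) -> bool:
    """Simplified check: may a *shared* cache keep this response for reuse?

    Only a subset of the real HTTP caching rules is modelled here.
    """
    request = {name.lower(): value for name, value in request_headers.items()}
    response = {name.lower(): value for name, value in response_headers.items()}
    directives = parse_cache_control(response.get("cache-control", ""))

    # The response opted out of (shared) caching entirely.
    if "no-store" in directives or "private" in directives:
        return False

    # Authenticated request: reusable only if the response explicitly allows it.
    if "authorization" in request:
        return bool(directives & {"public", "must-revalidate", "s-maxage"})

    return True
```

With this sketch, a response to a request carrying a bearer token and `Cache-Control: max-age=60` is rejected, while the same response to an anonymous request is accepted, which is the behaviour the talk is pointing at.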
If we do have all these great tools to do caching where we can and you've got an authenticated API, does it really matter? Well, maybe it does a little bit, but it's not the end of the world, as we saw in the previous conversation where somebody says you literally shouldn't be using GraphQL because you lose caching. If your API is an authenticated API and deals with a lot of live data that you just can't afford to have stale, well, we aren't losing that much. We aren't losing that powerful shared cache, because it wouldn't have been useful to us anyway.
So, I think the first thing I want to point out here is how much every specific context informs whether GraphQL is a good choice or not so much of a good choice. But there's some truth to it as well, in the sense that HTTP caching by itself is all about conventions. We have years of experience with HTTP caching. Browsers speak it natively. A lot of clients speak it. With GraphQL, as you saw, we do need to build those smart clients with normalized caches. But these tools are built and are ready to be used, so it might not make a big difference for you.
3. Caching, Performance, and Query Optimization
The flexibility of GraphQL comes with a tradeoff in caching. However, this tradeoff is not unique to GraphQL and can be found in other API types as well. Performance is another consideration, and while GraphQL may not always be faster than REST in every use case, it offers advantages in handling evolving app requirements. Additionally, the efficiency of REST in large-grain hypermedia data transfer may not align with GraphQL's goals. The idea that one query is always faster is not universally true, as large queries can present optimization challenges on the server side.
The other part of GraphQL that's a bit harder to cache is the fact that it's just so flexible, right? Even if we had a shared cache with GraphQL, if clients were querying in various different ways, caching a response wouldn't be so great if not many clients use the same thing anyway. So, that's a cost that comes with flexibility. But it's not unique to GraphQL, right? You can write flexible REST APIs. You can write flexible gRPC APIs. The reality here is more of a tradeoff between writing something that is kind of a good fit for all clients, but heavily cacheable, or something super flexible where we miss some caching. So, the nuance is really here. And coming back to the authenticated API needing less of a shared cache, maybe a good example of that is some kind of social media platform. And if you think about it, if you think about a famous social media platform that invented an API layer like GraphQL, it starts making a lot of sense.
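To make the cost of flexibility concrete, here is a minimal sketch (not from the talk) of how a response cache might key GraphQL requests. The function `cache_key` and its whitespace normalization are assumptions for illustration only; real caches may normalize queries very differently. The point is simply that two clients selecting different fields produce different keys, so neither can reuse the other's cached response.

```python
import hashlib
import json


def cache_key(query: str, variables: dict) -> str:
    """Derive a cache key from the query text and its variables.

    Naive on purpose: whitespace is collapsed everywhere, including inside
    string literals, which a real implementation would need to avoid.
    """
    normalized_query = " ".join(query.split())
    payload = json.dumps(
        {"query": normalized_query, "variables": variables},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two clients asking for the same data with different formatting share a key.
ios_query = "{ viewer { name age } }"
android_query = "{\n  viewer {\n    name\n    age\n  }\n}"

# A third client selecting a different set of fields gets its own key,
# so a cached response for the first two is of no use to it.
web_query = "{ viewer { name } }"
```

The more ways clients are allowed to shape their requests, the more distinct keys exist and the lower the hit rate, regardless of whether the API style is GraphQL, REST, or gRPC.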
Let's move on to performance. Performance is another can of worms, and here's another thing we hear: people moving to GraphQL for performance reasons. But what does that mean? Is using GraphQL instantly a performance boost for our APIs? Not necessarily. Do we think that GraphQL is faster than REST in general? Sometimes you read that, yeah, it's faster for different reasons, but it depends on the context. Take a look at this really silly example. Let's say our API could only do one thing, and it's fetching the current user's name and age. Which do you think would be faster here: a GET request for a user that's very optimized in its own endpoint definition, or a GraphQL engine with the overhead of parsing, validating and executing a query like this? In this specific case, I would bet on the REST API, because the REST API knows ahead of time what the use case is and doesn't have that execution overhead.
But, of course, in the real world, your app probably doesn't have only one use case. As your app evolves, more and more use cases appear as well, and the one-size-fits-all REST API, for example, is not necessarily what's best for every single individual client. This is even a trade-off that REST itself recognizes. It's not in the business of being optimized for every single client. It wants to be efficient for large-grain hypermedia data transfer. So it's not optimal for what GraphQL wants to do, and that's totally fine.
So, one other thing we hear is that one query will benefit us a lot, because back when people had REST APIs, they had to make a lot of different requests to resources. Following links like hypermedia could be very slow. With HTTP/2 and even HTTP/3, that cost is becoming lower and lower. But you could still argue that one query could be faster in a lot of contexts. That's not always true either. When you're using GraphQL, maybe you're working on a large page, and your one query, which was faster, can become very, very, very large. And optimizing that query on the server side becomes harder and harder.
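The "silly example" with a single use case can be sketched as follows. This is a toy written for this transcript, not a real GraphQL engine: `get_current_user` stands in for the fixed endpoint, `execute_selection` stands in for a flexible engine that has to parse, validate, and execute whatever the client sends, and the user data and the flat `{ name age }` syntax are invented for the illustration.

```python
CURRENT_USER = {"name": "Ada", "age": 36}  # hypothetical data


def get_current_user() -> dict:
    """Fixed endpoint: the server knows the use case ahead of time."""
    return {"name": CURRENT_USER["name"], "age": CURRENT_USER["age"]}


# One resolver per field the flexible API exposes.
RESOLVERS = {
    "name": lambda user: user["name"],
    "age": lambda user: user["age"],
}


def execute_selection(query: str) -> dict:
    """Flexible endpoint: extra work happens on every single request."""
    # 1. Parse: extract field names from a flat selection like "{ name age }".
    body = query.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("expected a selection set wrapped in braces")
    fields = body[1:-1].split()

    # 2. Validate: every requested field must exist in the schema.
    unknown = [field for field in fields if field not in RESOLVERS]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")

    # 3. Execute: run one resolver per selected field.
    return {field: RESOLVERS[field](CURRENT_USER) for field in fields}
```

Both functions return the same data for the one known use case, but only the second can also answer `{ name }` or `{ age }` without a server change. That extra capability is what the parsing and validation steps pay for.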
4. Parallel Fetching and Trade-offs
Some people might think we're going in circles with GraphQL, but that's not true. We can still craft different use cases from the client side using declarative queries. With the new defer and stream directives, we've made trade-offs, but we can mitigate the downsides.
And if some of that query could be fetched in parallel, maybe you're now back to wishing you had different resources that you could fetch in parallel over HTTP/2. So what do we do then? Well, there was already talk earlier today about the new defer and stream directives. These are there for exactly that problem. When we have large declarative queries, it doesn't mean we want everything to be computed before we receive it. Maybe we want to fetch different things in parallel or get something first. So at this point, some people might be asking: aren't we just coming full circle here? Why aren't we defining resources? But that's not exactly true, because we still have the declarative query, and that's amazing. We still have the ability to craft different use cases from the client side. So here, with defer and stream and keeping GraphQL, we've made trade-offs, but making a trade-off doesn't mean that what we get for the thing we traded away is absolutely zero. We can still mitigate the downsides, and that's exactly what we do with things like defer.
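The idea behind defer can be sketched with a small generator. This is only a simulation written for this transcript: the function `execute_with_defer` and the payload shape (`data`, `path`, `hasNext`) are assumptions for illustration and do not reproduce the actual wire format of any GraphQL server. On the query side, the client would mark the slow part with something like `... @defer { friends { name } }`.

```python
from typing import Any, Callable, Iterator

Resolver = Callable[[], Any]


def execute_with_defer(immediate: dict[str, Resolver],
                       deferred: dict[str, Resolver]) -> Iterator[dict]:
    """Yield an initial payload first, then one payload per deferred field."""
    # Fields the client needs right away are resolved and sent first.
    yield {
        "data": {name: resolve() for name, resolve in immediate.items()},
        "hasNext": bool(deferred),
    }

    # Deferred fields are resolved afterwards and streamed as they complete.
    remaining = len(deferred)
    for name, resolve in deferred.items():
        remaining -= 1
        yield {"path": [name], "data": resolve(), "hasNext": remaining > 0}


# Hypothetical usage: the name is cheap, the friends list is slow.
payloads = list(execute_with_defer(
    {"name": lambda: "Ada"},
    {"friends": lambda: ["Grace", "Alan"]},
))
```

The client still sends one declarative query, but it no longer has to wait for the slowest field before rendering anything.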
5. Performance Trade-offs in GraphQL
Performance in GraphQL is less predictable due to the trade-off for client-side flexibility. It depends on the specific use case, but the bet is that the benefits of client decoupling and schema offering outweigh the potential overhead.
One thing that is true, unfortunately for GraphQL, is that performance is way less predictable than if you have well-defined use cases on the server. But again, that's the trade-off we make for the flexibility on the client side. So as far as performance goes, as you can see, it depends a lot on context. It depends on your specific use case. With GraphQL, we make a bet that we'll have different use cases that are hard to power through something like ad hoc endpoints on the server side, and that we want that client decoupling and a schema on offer. Even if it may come with some overhead, we've decided that the trade-off is worth it.
6. GraphQL as a Backend for Frontend
GraphQL as a backend for frontend offers flexibility in representation, but doesn't provide full autonomy between teams. Types and fields are still shared, limiting true decoupling. GraphQL strives for one server, while BFFs allow complete separation and technology choice. One graph offers benefits, but requires careful consideration to avoid one-size-fits-all schemas. Client-specific fields and different user representations can be a concern. The success of GraphQL is important to us at this GraphQL conference.
The last thing I want to talk about is the idea of GraphQL as a backend for frontend. If you're not familiar with the backend for frontend pattern, it's a pattern where teams define their own API for their own experience. So here you can see that team A owns an iOS app, and they also own a BFF, an API server that is tailored exactly to their needs. Same thing with the Android app.
So at first sight, this looks exactly like a thing that GraphQL could be great at. It offers flexibility for the iOS app to consume a different set of data than the Android app, kind of decoupling the server from having to define maybe an iOS endpoint and an Android endpoint. The reality is that the backend for frontend pattern is not only about representation, so not only about the response or the resources we're creating, but also about complete autonomy of teams. And it's very good to realize that while GraphQL can offer us that benefit in terms of representation, it doesn't offer full autonomy between teams. Types are still shared. Fields are still shared. And if you take a look at this query, for example (excuse the capital F on friends here), if we really wanted things to be different, you'd maybe have to do something like that, where you have a name, but you also have a name for Android, and you have an age and you have an age for this one other client. That would be truly decoupling these fields, but we can't necessarily ensure that. So that can be something that's annoying about GraphQL. It can also be amazing, because we do share that graph, which can offer consistency. But it's good to realize it's not an exact replacement for BFF, which is also a cultural pattern and a full autonomy pattern. With GraphQL we're in kind of an in-between zone, where clients can select a representation, but they still select it from a common base.
That complete separation has other implications as well, in the sense that with BFFs, you can write one in Go, write one in Ruby, write one in Java. With GraphQL, we're often striving for that one GraphQL server. So it might share the same rate limits, the same set of middlewares. Whereas with BFFs, you've got complete separation and you can choose any technologies you want. So I think it's good to realize that. That's not to say that BFFs are better than GraphQL, or the opposite, but that they imply very different things. And the idea of having one graph is very popular and makes a lot of sense: there are a lot of benefits to having one graph that all clients can integrate with, and having that one source of truth for our domain as one schema. But it also comes with the responsibility not to go back to the one-size-fits-all we were fearing with REST and define one schema that's the same for all clients. So we shouldn't hesitate here. Really, adding client-specific fields shouldn't be a concern with GraphQL, to try and avoid forcing everyone to reuse the same things. Another danger with the one graph idea is that while BFFs could each have their completely different representation of a user, for example, here, if we're not careful, we're still sharing the same user type, and maybe a user in the Android world is different, or on Xbox, a user is completely different. So we have to keep that in mind.
So as you can see, a lot of things depend with GraphQL. And what do I want to say with all this? And why do I think this is even important to think about? I think we all want GraphQL to succeed. We're at a GraphQL conference.
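The slide query mentioned above is not reproduced in this transcript, so what follows is only a hypothetical sketch of the same idea. The field names (`name`, `age`, `nameForAndroid`), the sample user, and the `select` helper are all invented here. It shows clients picking different representations from one shared set of fields, and a client-specific field as the way to decouple one client from the shared definition.

```python
USER = {"first": "Ada", "last": "Lovelace", "age": 36}  # hypothetical data

# One shared "User type": every client selects from this common base.
SHARED_USER_FIELDS = {
    "name": lambda user: f"{user['first']} {user['last']}",
    "age": lambda user: user["age"],
    # A client-specific field: Android gets its own definition of a name,
    # so a change to the shared "name" field no longer affects it.
    "nameForAndroid": lambda user: user["first"],
}


def select(fields: list[str]) -> dict:
    """Return only the fields a given client asked for."""
    return {field: SHARED_USER_FIELDS[field](USER) for field in fields}


ios_view = select(["name", "age"])        # uses the shared representation
android_view = select(["nameForAndroid"])  # uses its own field
```

Each client chooses its own selection, but any client that selects `name` depends on the same shared resolver. A BFF, by contrast, would give each team its own independent definition.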
7. The Nuances of GraphQL
For GraphQL to succeed, it's important to have better conversations and not recommend it where something else would be better. We need to consider the context, compare it to other options, and understand the trade-offs. Thank you for attending my talk. If you're interested in learning more, check out my book, Production-Ready GraphQL, and follow me on Twitter.
We're all GraphQL enthusiasts. I am. And one thing that's very important to me, for GraphQL to succeed, is we want developers using GraphQL to succeed as well. And one way to do that is not to recommend GraphQL where something else would be way better for the use case, and also not allowing somebody who maybe is a REST enthusiast to say that caching can never work with GraphQL. So it's important to realize all these nuances to have better conversations.
And yeah, I really don't want us to encourage somebody who has a perfect use case for the backend for frontend pattern with REST to just use GraphQL instead. So I think we want to be able to know when GraphQL is the sweet spot or not, and make decisions based on that. So here are a few questions I like to ask in every decision. First, in which context is this better or not? As you can see, the same API being public and unauthenticated versus private and authenticated changes the caching picture completely. "Compared to what?" is also a great question. Maybe this thing is better, but compared to what? Because when we choose something, we always choose it versus something else. We're leaving something else on the table, and that something else can impact our decision. So making a claim like "GraphQL is best" with no comparison doesn't help anyone and doesn't make any sense. And finally, what are the trade-offs? What are we leaving on the table by having such a flexible query language? And are there ways to mitigate what we're leaving behind, so it's not as bad as going fully to the other side?
Thank you very much. That was my talk. If you're interested in these trade-offs and want to know the best way to deal with them, I have this book called Production-Ready GraphQL. I have a 30% off coupon for GraphQL Galaxy if you want to use it. I have the stream called GraphQL FM that you can watch, where we meet with a lot of GraphQL experts almost every week. And follow me on Twitter. Thank you so much, everyone.
8. REST vs GraphQL: Suitable Use Cases
In certain cases, REST may be a more suitable choice than GraphQL. For example, a public API with static data, such as a list of countries and their population, can benefit from the shared caching capabilities of HTTP. While GraphQL can also handle this scenario, REST may be a better fit due to its existing conventions and grammar.
So, unfortunately, I had this wonderful question lined up saying, you know, "it depends" is such a judgment-based answer, so how can people learn to make these decisions? But at the end you dropped this, like, oh, I have a book to help you with that, so I'm very, very disappointed. So instead, I'm going to ask this question: what is a good example where REST is the appropriate choice and GraphQL isn't?
So I'll give you the example I always give, and it doesn't mean that GraphQL would be completely terrible in that context, but I think it's maybe a sweeter spot for REST. It's a public API with data that doesn't necessarily change between different authenticated users. My favorite example is a list of countries. If you've got a list of all the countries and their population and everything, it's data that's very stable and doesn't change often, it's data that's the same for everyone, and that's where a shared cache using the conventions of HTTP to their fullest potential is usually the sweet spot. Of course, you can do that with GraphQL as well, but REST is more of the sweet spot here.
Because you can leverage this wonderful pre-existing grammar to get you what you need.
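The country list example can be sketched as a tiny handler that leans on standard HTTP caching conventions. This is an illustration written for this transcript, not code from the talk: the `get_countries` function, the sample data, and the one-day `max-age` are assumptions. It marks the response as `public` so shared caches may store it, and uses an `ETag` so clients and caches can revalidate cheaply.

```python
import hashlib
import json
from typing import Optional

# Hypothetical, stable data that is the same for every caller.
COUNTRIES = [
    {"code": "CA", "name": "Canada"},
    {"code": "FR", "name": "France"},
]


def get_countries(if_none_match: Optional[str] = None,
                  max_age: int = 86400) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a cacheable list of countries."""
    body = json.dumps(COUNTRIES, sort_keys=True).encode("utf-8")
    # A validator derived from the content: same data, same ETag.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        "Cache-Control": f"public, max-age={max_age}",
        "ETag": etag,
    }

    # Conditional request: the caller already holds the current version.
    if if_none_match == etag:
        return 304, headers, b""

    headers["Content-Type"] = "application/json"
    return 200, headers, body
```

Because browsers, CDNs, and proxies already understand these headers, no custom client logic is needed to get caching here. That is the pre-existing grammar the answer above refers to.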
9. Concerns with Public GraphQL API
Having a public GraphQL API offers amazing possibilities for clients, but there are concerns. It's important to strike a balance between generic types and use case-specific fields. This nuance is challenging, but we focus on getting it right.
That's right.
Cool, well, we have a question from Rana: have you seen any concerns with having a public GraphQL API? I know GitHub was one of the first to provide one. I didn't fact-check this, but I trust our Q&A.
Yeah, I think GitHub was one of the first ones, at the very least. And yeah, it's definitely not an easy thing to provide. It offers amazing possibilities for our clients, and this is why we do it. But there are definitely concerns we need to take care of. The main one to me is that thing we covered in the talk, where you have to be careful not to provide a GraphQL API that's just similar to what we had with REST, with types that aren't very useful to any specific client, but generic enough to be usable by everyone. So there's a balance here. We do need to stay a bit on the generic side because we don't know who all of our clients are, but we also want to listen closely to use cases and provide use-case-specific fields at the same time. That nuance is very hard to get right, and it's something we try to focus on.
10. Caching Mechanisms and Future Plans
The caching mechanisms used in the public GraphQL API at GitHub are not anything magical or unique. The main approach is using a data loader to batch calls to external services and cache calls made during query execution. This ensures that queries don't overload the databases. While GraphQL is a powerful tool, it may not be the best choice for scenarios involving file handling, where HTTP and caching are more suitable. Despite the challenges of writing a book, the speaker is interested in exploring the nuances of GraphQL through dialogues in a future publication. The GitHub GraphQL roadmap is continuously improving, and feedback from users is highly valued.
Sounds like a really classic problem of just trying to find the right level of abstraction to best describe the data structures you have available. Absolutely. Same problems everywhere, right?
We have a question from Bastian. This is going to ask for an inside scoop, so I'll leave it up to you to answer or not. How many people are actually using the public GraphQL API at GitHub? And what kind of caching mechanisms do you use?
Yeah, I can't answer the numbers question, but I can answer the caching question. Actually, that was a good subject we talked about in the panel earlier. We don't do anything very, very magic for caching that you wouldn't see anywhere else. Our main way of caching is using a data loader approach. So first of all, batching calls to our external services, but also caching calls we've already made during the execution of a query. So being careful not to get stuck in weird cycles where we query for way too much data, or having enormous queries overload our databases.
Yeah, that's a good answer. And it shows that you don't have to do anything too complicated to run a GraphQL API, even at quite a large scale, I imagine.
Yeah, we definitely have a huge scale. And the beauty of it is that before the GraphQL API, we were running a big REST API. We were running a big web UI as well. So I think, thankfully, we rely a lot on what our database team has been doing for years, what our infrastructure team has been doing for years. So on the GraphQL side, really the most important thing is making sure we never forget about using a data loader. That's the biggest danger we try to check for.
Sounds good. All right. We have a question from Juan. Could you name some common, non-edge-case examples or scenarios where GraphQL will definitely not be the way to go? Well, I kind of already asked this question, so I may have robbed everyone, but I've written it as well.
Yeah, I think that's the basic one I have in mind. Another example is anything that deals with files. If you're dealing with files, for example, HTTP is so great for getting files and using caching there. Fetching a file through a GraphQL field, for example, works, but it's just less of a sweet spot than hitting an endpoint and getting a file back.
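Going back to the data loader approach described in the caching answer above, here is a minimal sketch of the two things it does: batching keys requested during the same tick of the event loop into one call, and caching results for the lifetime of a single query execution. This is not GitHub's implementation or the DataLoader library. The class `RequestScopedLoader` and the `batch_load_users` function are hypothetical, and the sketch assumes the batch function returns exactly one value per key, in the same order.

```python
import asyncio
from typing import Any, Awaitable, Callable, Hashable, Optional


class RequestScopedLoader:
    """Batches and caches loads for the duration of one query execution."""

    def __init__(self, batch_load: Callable[[list], Awaitable[list]]):
        self._batch_load = batch_load
        self._cache: dict[Hashable, asyncio.Future] = {}
        self._queue: list = []
        self._task: Optional[asyncio.Task] = None

    def load(self, key: Hashable) -> "asyncio.Future[Any]":
        # Caching: a key already requested in this execution reuses its future.
        if key in self._cache:
            return self._cache[key]

        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._cache[key] = future
        self._queue.append(key)

        # Batching: schedule a single dispatch for everything queued
        # before the event loop gets control back.
        if self._task is None:
            self._task = loop.create_task(self._dispatch())
        return future

    async def _dispatch(self) -> None:
        keys, self._queue = self._queue, []
        self._task = None
        try:
            values = await self._batch_load(keys)
            for key, value in zip(keys, values):
                self._cache[key].set_result(value)
        except Exception as error:
            for key in keys:
                if not self._cache[key].done():
                    self._cache[key].set_exception(error)


# Hypothetical usage: record how often the backing service is actually hit.
batch_calls: list = []


async def batch_load_users(ids: list) -> list:
    batch_calls.append(list(ids))
    return [{"id": user_id} for user_id in ids]


async def run_query() -> list:
    loader = RequestScopedLoader(batch_load_users)
    first = await asyncio.gather(loader.load(1), loader.load(2), loader.load(1))
    again = await loader.load(2)  # served from the per-request cache
    return [*first, again]
```

Creating one loader per incoming query keeps the cache scoped to that execution, which matches the "calls we've already made during the execution of a query" description and avoids serving one user's data to another.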
Amazing. Great. Are you going to be writing any more books now? Are you committed to just the one, or has the experience deterred you from writing more GraphQL literature?
It's the kind of thing where, after you've written one, you're like, I'm never going to do that again, but in the end, I think I really want to write another one. One subject that really interests me is the topic of my talk. So one thing I've been looking into is a book that has these dialogues I had in the talk: dialogues between different characters, exploring the nuances of all these topics. So keep an eye out for that.
And can you tell us what's coming up for the GitHub GraphQL roadmap, or is that entirely secret?
I can't say anything secret, for sure, but we're always improving it, so keep an eye out for it. And we really would like your feedback as well. If there's anything that you feel is missing in our GraphQL API, maybe features that are present in our UI and not in GraphQL, definitely do reach out. We're focusing on improving it.
Okay, perfect. I think I've managed to exhaust all the questions I have, and we have satisfied all the questions in Q&A, so I think we're...