GraphQL Performance and Monitoring


GraphQL abstracts the downstream API calls away from the frontend: all the frontend has to do is request the fields in a query that are required for rendering the component. The frontend is not aware that a corresponding field might result in an API call or heavy computation on the backend side. This abstraction hurts performance when the GraphQL schema is not structured properly. Let's take an example to understand this better:

Here's the query to get available rooms for a hotel:

hotel(id: $hotelId) {
  id
  rooms {
    id
    type
    name
  }
}

The frontend doesn't know that the rooms field inside the hotel query will fire another API call, and even the type field might be fetched from yet another API endpoint. These nested API calls worsen performance when there are more rooms. We can solve this effectively by structuring the schema well and using DataLoaders.


Transcript


Intro


Hi everyone. My name is Ankita Masand, and I'm an associate architect at Treebo. I work on Treebo's SaaS product called Hotel Superhero. Hotel Superhero is a cloud-based hotel property management system used by various hotel chains across the world. It provides a pool of features for creating and managing bookings, generating invoices and various reports, and configuring the prices of rooms in a hotel. We have been extensively using GraphQL for this application. Today I'll talk about GraphQL performance and monitoring.

[00:49] Here's a list of topics that we'll look at in this talk: performance implications of using GraphQL in the stack, designing GraphQL schemas, batching and caching at a request level using DataLoaders, lazily loading some fields and streaming responses in a GraphQL query using the new defer and stream directives, caching in GraphQL, and finally monitoring GraphQL queries using New Relic. Let's get going.


A Use-case of Hotel Property Management system


[01:14] Let's understand the implications of using GraphQL with a simple use case of a Hotel Property Management system. What you see on the screen is the representation of The Big Bang Hotel. It has three room types: room type one, two, and three. 1A, 1B, and 1C are the rooms for room type one, and the blank boxes indicate bookings for particular rooms. Booking 1 and Booking 2 are for room number 1A. A bigger box indicates that the booking extends for more days.

What would it take to render this UI using a frontend client? This is the view that a hotelier sees on a screen to understand and manage bookings for their hotel. Let's look at the HTTP calls that we would have to make from the frontend client to render this user interface and show the list of bookings in this hotel.

[02:07] We'll first make parallel calls to fetch hotel data like name and location, then make calls for fetching room types, rooms, and bookings. Once we get a list of bookings from the downstream services, we'll make calls to fetch bills, as every booking is associated with a bill. There are also some attachments in a booking that we have to show on that user interface. So, for every booking, we fetch its corresponding bill and attachments. Every booking is done for a particular user. We'll then call the users API to fetch more details about that user and also their preferences.

This looks okay. There are not too many calls, and it's something that we are used to. But these are the API calls only when there are three bookings in a hotel. If you look carefully, we were calling the bills API thrice, which means there are three bookings in the hotel. A hotel cannot afford a Property Management system for managing just three bookings. What would happen if there are hundreds of bookings in a hotel? We'd have to make a hundred API calls to fetch bills and attachments. There would be too many round trips from the frontend client to the server. How do we solve these issues?

[03:15] We were at the early stage of building this application, and it looked okay to experiment with GraphQL and see how it turned out. GraphQL would give the response in one API call, so there are no multiple round trips to the server. It is declarative in nature, and the client specifies the data that it needs, hence a considerable reduction in the response size. There is no repeated boilerplate code for different frontend applications. And strongest of all, it has a strong type system. The pitch went really well, and the frontend team got a chance to experiment with GraphQL and add it to the stack.


GraphQL query used for collectively fetching the response of all the API calls

[03:50] Let's look at the GraphQL query that we are using for collectively fetching the response of all the API calls. There's a query, hotel by ID. Inside hotel by ID, we are fetching room types. For every room type we are fetching rooms. We are also fetching bookings in the same query. And inside bookings we are fetching the bill for every booking, attachments, customers, and for every customer we are fetching their preferences.
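The query being described might look roughly like this (the field and type names are assumptions for illustration, not the actual Hotel Superhero schema):

```graphql
query HotelBookings($hotelId: ID!) {
  hotel(id: $hotelId) {
    id
    name
    roomTypes {
      id
      name
      rooms { id number }
    }
    bookings {
      id
      bill { id amount }
      attachments { id url }
      customers {
        id
        name
        preferences { id label }
      }
    }
  }
}
```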

Let's see how each of these queries would be resolved on the GraphQL server. For resolving hotel by ID query, we are making a call to the downstream service to fetch details of a hotel. For resolving room types, we are making a call to fetch room types from a downstream service. And for every room type, we are fetching rooms.

[04:40] This is the call that we make to fetch bookings. And for every booking we'll make a call to fetch its corresponding bill, attachments, and customers, and for every customer, we'll make a call to fetch their preferences. This looks cool because we don't have to write code on the frontend side to manage when to fire particular APIs. For example, bill is dependent on booking: only once we get the response of bookings can we fire the bills API. We also don't have to run maps on the frontend to map room types to rooms, rooms to bookings, and bookings to bills. GraphQL does all of that.


Performance implications of using GraphQL


[05:14] So, what are the performance implications of using GraphQL? After all the hard work, we felt that the performance of the application had not improved much for bigger bookings: bookings that extend for more days or have more customers, or hotels that have more room types and more rooms.

So, let's look at the list of API calls that we are making to the downstream services. Let's consider an example of 25 bookings; in that query we are fetching 25 bookings. We'll make one call to fetch the details of the hotel and one call to fetch room types. Let's say it returns six room types. For every room type, we'll fetch the corresponding rooms, so we'll make six calls to the /rooms API, and one call to fetch bookings. That API returns 25 bookings, so we'll make 25 calls to fetch bills and 25 calls to fetch attachments, one for every booking. Collectively, there are a hundred users in all of these bookings; one booking can have multiple users. So, we'll make a hundred calls to fetch user details and a hundred calls to fetch their preferences. In total, there are 259 calls that we have to make for fetching 25 bookings and showing them on the user interface. Yes, we have to make 259 calls to the downstream services for resolving just one GraphQL query.
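The per-endpoint counts above add up as follows; a quick sanity check of the 259 figure (the endpoint paths in the comments are illustrative):

```javascript
// Per-endpoint call counts for resolving one query with 25 bookings,
// 6 room types, and 100 users spread across those bookings.
const calls = {
  hotel: 1,        // e.g. /hotels/:id
  roomTypes: 1,    // e.g. /room-types
  rooms: 6,        // /rooms, once per room type
  bookings: 1,     // /bookings
  bills: 25,       // /bills/:id, once per booking
  attachments: 25, // /attachments, once per booking
  users: 100,      // /users/:id, once per user
  preferences: 100 // /preferences, once per user
};

const total = Object.values(calls).reduce((sum, n) => sum + n, 0);
console.log(total); // 259
```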

[06:33] What would happen if there are more room types and rooms in a hotel? The number of API calls would keep increasing based on the number of room types, rooms, bookings, and the number of users in these bookings.

How do we make this right? Let's analyze what we are actually doing in the query. We are fetching the entire response in one GraphQL query, including the users and their preferences. We are fetching room types inside the hotel object, which is all okay. But for every room type, we are calling the rooms API to fetch rooms, because this is the structure the user interface expects.

[07:09] We are doing this because we don't want to run loops on the frontend to build the entire map object as it is required by the user interface. We are fetching bills inside the booking object, which means we'll call the bills API for every booking. We don't have to display the user preferences on load, so why fetch them in the initial call and show a loading state to the user? This is unfair to the user, because they're not even concerned with seeing user preferences in the first go.

How do we solve these issues? Should a GraphQL schema map 1:1 to your user interface or to your domain models? If you look carefully, we are fetching room types inside the hotel object, because room types is a property of a hotel, which is all okay. But we are fetching rooms inside each of these room types. Rooms is not a property of a room type; rooms is a property of a hotel. The domain model considers rooms a property of a hotel, not of a room type. We are doing this because that's what the user interface we are trying to build expects.

[08:13] So, at which level in the query should we fetch rooms? We should try to design our GraphQL schema more like our domain model, because if you design it based on your user interface, you would satisfy just one use case of your frontend clients. A GraphQL schema should be designed in a robust way, so that it can be used by different frontend clients. The rooms field should be a property of the hotel, not of room types.

Let's move ahead. Here we are fetching preferences inside the user object, and preferences is a property of a user; that's exactly how the domain model expects it. So how do we solve this problem of multiple API calls for the user preferences field, even though it is mapped the right way?


Batching and Caching using DataLoader


[08:56] Here comes the interesting part of the talk: batching and caching using DataLoader. You would be able to understand the importance of DataLoaders only if you have seen multiple calls going to your downstream services for resolving one single GraphQL query.

DataLoader basically collects the API requests made within a single frame of execution. What you see on the screen: the number of API calls to the downstream services before using DataLoaders was 259; it is now reduced to just eight calls. How did this happen?

[09:28] So, DataLoader doesn't fire API calls to your downstream services immediately. It collects all the API requests within a single frame of execution, and then fires one batched API call. For example, for resolving every bill, we were calling the bills API 25 times. Here, this is a batched call: DataLoader is collecting all the bill IDs and will hit the downstream service only once. The same goes for attachments, users, and also for preferences. So, if you count, there are just eight calls that we have to make to the downstream services for fetching 25 bookings.

DataLoader is a great addition to the stack, and it is not very difficult to implement in a GraphQL server. It creates magic by reducing 259 calls to downstream services to eight, and this is a great performance booster for your applications.

[10:23] Let's look at the implementation of billsLoader and usersLoader. billsLoader takes in some keys, that is, some bill IDs, and then calls the batch bills function. We'll look at the implementation of the batch bills function in a moment. You have to return these DataLoaders from the context object so that they are available in all the resolvers. This is how the batch bills function looks: it collects all the bill IDs within a single frame of execution and then calls the downstream service only once. It returns an array of bills.

So, from your resolver, you call billsLoader.load with the first bill ID, the second bill ID, and so on; all these bills are different. But the GraphQL server won't call the bills API immediately. It collects all these bill IDs and then fires the batched call at once. The same goes for users: this is the batch function that we can use for users. These are all different users, but we call the downstream API only once. This is how batching really helps in boosting the performance of the application.
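A minimal sketch of the batching pattern being described. It hand-rolls a DataLoader-style batcher instead of using the `dataloader` npm package so the snippet is self-contained; the bills endpoint and field names are illustrative:

```javascript
// A tiny DataLoader-style batcher: load() calls made within one tick of the
// event loop are collected and dispatched as a single batch call.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = [];
  }
  load(key) {
    return new Promise((resolve, reject) => {
      if (this.queue.length === 0) {
        // Schedule the batch dispatch at the end of this frame of execution.
        process.nextTick(() => this.dispatch());
      }
      this.queue.push({ key, resolve, reject });
    });
  }
  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const results = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(results[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

// Stand-in for the downstream bills service: one batched HTTP call.
let downstreamCalls = 0;
async function batchBills(billIds) {
  downstreamCalls += 1; // e.g. GET /bills?ids=1,2,3
  return billIds.map((id) => ({ id, amount: 100 }));
}

const billsLoader = new TinyLoader(batchBills);

// Three load() calls from three booking resolvers, one downstream call.
Promise.all([
  billsLoader.load(1),
  billsLoader.load(2),
  billsLoader.load(3),
]).then((bills) => {
  console.log(bills.length, downstreamCalls); // 3 bills, 1 downstream call
});
```

The real `dataloader` package adds per-request caching and error handling on top of this, but the core idea is the same: defer dispatch to the end of the current frame of execution, then fire one batched request.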

[11:32] Let's see how DataLoaders help in caching at a request level. If you're calling the same key twice in a DataLoader within the same request, the subsequent responses would be served from the cache. Let's understand this with the help of an example.

Here's a GraphQL query that we are using to fetch JavaScript conferences. We are fetching the name, location, and speakers of each conference, their talks, and the profile of every speaker. Bob is speaking at four conferences, which means that without DataLoaders his profile is going to be fetched four times, even though it has the same user profile ID. But if you're using DataLoaders to fetch this user profile, the subsequent calls for his profile would be served from the cache.

[12:16] Let's look at the implementation. This is the userProfile loader's batch function. It is going to collect the keys and then hit the downstream service. If you see userProfile.load, this is the same user ID: the user ID of Bob. The first call will hit the downstream service. These are all the same API calls, so it is going to call the downstream service only once. Perfect. Looks like we are able to solve the multiple-API-calls disaster using batching and caching, with 259 calls reduced to eight.
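The request-level caching behaviour can be sketched as a plain memoized loader so it runs standalone (the real DataLoader combines this cache with the batching shown earlier; all names here are illustrative):

```javascript
// Per-request caching: the first load(key) hits the downstream service,
// subsequent load(key) calls for the same key are served from a cache Map.
let profileFetches = 0;
async function fetchUserProfile(userId) {
  profileFetches += 1; // e.g. GET /user-profiles/:id
  return { id: userId, name: "Bob" };
}

const profileCache = new Map();
function userProfileLoader(userId) {
  if (!profileCache.has(userId)) {
    // Cache the promise itself, so concurrent calls share one fetch.
    profileCache.set(userId, fetchUserProfile(userId));
  }
  return profileCache.get(userId);
}

// Bob speaks at four conferences, so his profile is requested four times,
// but the downstream service is hit only once.
Promise.all([
  userProfileLoader("bob"),
  userProfileLoader("bob"),
  userProfileLoader("bob"),
  userProfileLoader("bob"),
]).then(() => console.log(profileFetches)); // 1
```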

Can we do more? Can we improve further? If you have noticed carefully, the preferences field in the user object is not on the critical path of user interface interactivity. Can we defer the execution of that user preferences resolver? Yes, we can. We can use the defer directive to defer the execution of the user preferences resolver.

[13:16] Let's see how we would do it. All we have to do is add the defer directive to the preferences field in the query. In the response, preferences would be null initially, until the GraphQL server resolves it on the backend. And on the same HTTP connection, we would get the response of preferences as patches. What you see here is the response for bookings object one, customer one, and these are the preferences of that particular customer. Preferences would be loaded asynchronously on the user interface, and it wouldn't block the main view.
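Under the defer proposal, the directive is placed on a fragment wrapping the deferred fields. A sketch (field names assumed; exact support depends on your server and client versions):

```graphql
query Bookings($hotelId: ID!) {
  hotel(id: $hotelId) {
    bookings {
      id
      customers {
        id
        name
        ... @defer {
          preferences { id label }
        }
      }
    }
  }
}
```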

What we have been thinking until now is that a request gets resolved to a response: we send a request over an HTTP connection to a GraphQL server, and the GraphQL server resolves all the fields in the query and sends back the response. It is easier to understand defer and stream if you imagine it this way: we send a request to a GraphQL server and are subscribed to an observable. The same HTTP connection is still open, and the GraphQL server keeps sending responses as patches as the fields finish executing.

[14:20] Let's also see how we would use the stream directive. Let's say you have a hundred bookings for a particular hotel. You don't want to fetch all hundred bookings in the same query; you won't even be able to fit those bookings in the viewport of the user. So, in this query, we are telling GraphQL: please send me 10 bookings and stream the rest of the bookings as patches on the same HTTP connection. We'll get 10 bookings in the initial response, and the subsequent bookings will be served as patches.
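A sketch of the streamed query; the `initialCount` argument comes from the stream proposal, and the field names are assumptions:

```graphql
query Bookings($hotelId: ID!) {
  hotel(id: $hotelId) {
    bookings @stream(initialCount: 10) {
      id
      status
    }
  }
}
```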

This all works well for a single request. But there's one important thing that we have missed: the hotel's room types and room numbers don't change very often. Can we do something about this?

[15:02] Let's look at what resolver-level caching is. We can cache static data that does not depend on the identity of the user in Redis or Memcached for some fixed TTL. The roomTypes and rooms fields of a hotel are static and would be the same for all users. This is a good example of resolver-level caching.

Let's see how we would do that. When a frontend client requests the hotel bookings query from the GraphQL server, while resolving room types, we first check if room types is present in the Redis cache. If it is present, we serve it directly from the cache. If it is not present, we hit the downstream service, store the response in Redis, and then send it back. The same goes for the rooms resolver: we check if rooms is present in the Redis cache; if it is present, we serve it directly from the cache, otherwise we hit the downstream service, store the response in Redis, and then send it back to the resolver.
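A sketch of the resolver-level cache just described. A Map with an expiry stands in for Redis so the snippet is self-contained; in production this would be a Redis client setting a TTL on the key, and all the names here are illustrative:

```javascript
// Resolver-level caching sketch. The Map emulates Redis SET key value EX ttl.
const roomTypesCache = new Map(); // key -> { value, expiresAt }
const TTL_MS = 60_000;

let roomTypeServiceCalls = 0;
async function fetchRoomTypes(hotelId) {
  roomTypeServiceCalls += 1; // e.g. GET /hotels/:id/room-types
  return [{ id: "rt1", name: "Deluxe" }, { id: "rt2", name: "Suite" }];
}

async function roomTypesResolver(hotel) {
  const key = `roomTypes:${hotel.id}`;
  const hit = roomTypesCache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value; // served from cache, no downstream call
  }
  const value = await fetchRoomTypes(hotel.id);
  roomTypesCache.set(key, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}

(async () => {
  await roomTypesResolver({ id: "h1" }); // miss: hits the downstream service
  await roomTypesResolver({ id: "h1" }); // hit: served from the cache
  console.log(roomTypeServiceCalls); // 1
})();
```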

[15:55] In this case, even though the room types and rooms fields don't change often, we are still hitting the GraphQL server. Is it possible to get this response from the nearest CDN server, so that we don't even have to hit the GraphQL server? Yes, we are talking about GET requests here. In GraphQL, as you may have seen, we always make POST requests. This is because we send the query string to GraphQL as a request payload in a POST request; it doesn't fit well in a cacheable GET request. So, the first milestone to hit here is to convert POST requests to GET requests. How do we do that? We can do it using Automatic Persisted Queries, and it is very simple to implement.

With Automatic Persisted Queries, we send the hash of the query string in a GET HTTP request to the Apollo server or GraphQL server. The server figures out that this particular hash corresponds to this particular query string, resolves that query, and sends the response back in the same response format that the client would expect. So, here we accomplished the goal of sending GET HTTP requests from the frontend client using GraphQL. Now what's next? Once we are able to send GET requests, we can take advantage of all the cache-control headers that are available. We can specify the max age for which a particular field in a query is valid, so that it is okay to serve this field from the nearest CDN server instead of hitting the GraphQL servers. So, we can take advantage of all the cache-control headers, and this would improve performance even further.


Caching in GraphQL - Recap


[17:38] Let's recap what we learned about caching. We can do request-level caching using DataLoaders. We can do resolver-level caching to store the response of downstream services in Redis or Memcached. We can use Automatic Persisted Queries to send the hash of the query string to GraphQL, and we can use cache-control headers on different fields in the query.

Here's a question for thought: now that we are using GET requests, can GraphQL support the 304 Not Modified HTTP response status code?

I'll move to the next topic, which is monitoring GraphQL requests using New Relic. This is really interesting, and it is a great addition to the stack. What you see in the screenshot is the top 20 most time-consuming GraphQL queries. You can also see the top 20 GraphQL queries by slowest average response time.

[18:30] You can look at the P95 and P99 times of a particular GraphQL query using this powerful tool.

This screenshot shows the calls that you're making to external services: the percentage of calls that you're making to your different microservices.

This shows the trace details of a GraphQL query. Whenever there's a new requirement or a new feature on the frontend client, all we think is that we have to add this particular field to a query and that's it: we get the response and render it on the UI. We don't even think that this particular field is going to make an API call to a downstream service or is going to hit the database, and that it would impact the performance of this particular query. By looking at the trace details, we come to know that this particular field takes this much time to resolve, or this particular field hits the backend servers, or this particular field is hitting your database. So, by looking at the trace details of a particular query, it is very easy to understand what is consuming time in this GraphQL query.

[19:44] Let's recap all that we learned in this talk. The clients might face a performance hit for bigger and more nested queries in GraphQL. We should keep an eye on how many API requests or database queries we are making in a GraphQL query. We should design the GraphQL schema more like our domain and not map it to the frontend user interface.

DataLoaders are used for batching multiple API calls to the same endpoint in one request, and thus reduce the load on your downstream services, the way we saw for the bills and users APIs. DataLoaders also help in caching the same API request and calling the downstream service only once for the same cache key. We saw that Bob was speaking at four conferences, but we were loading his profile only once because his user profile ID is the same. We can use the defer directive to defer the response of some of the fields in a query that are not in the critical path of user interactivity, take more time to resolve, or have a bigger response size; we saw that we can defer the execution of the preferences resolver in the user object. We can use the stream directive to stream the response of a list field in a query. We can use Redis or Memcached to cache the response of downstream services at a resolver level; we saw that we don't have to hit backend services for static data like the room types and rooms of a hotel, and can get it directly from Redis or Memcached.

[21:07] Finally, we can use Automatic Persisted Queries and make GET requests from the frontend client to GraphQL servers. Here we send the hash of the query in a GET request to the GraphQL server. Using this, we can take advantage of all the cache-control headers by specifying the TTL for some of the fields in that query, and then we can fetch these from the nearest CDN server. Finally, New Relic makes our life easy by showing the trace details of a particular GraphQL query. It is really a good addition to the stack. Thank you so much for listening. Have a great day.


Questions


[21:41] Mettin Parzinski: So, looking at the poll results. Just as a reminder, the question was if GraphQL supports the 304 response status code, and 50% says no, and between yes and not sure it's divided 25 and 25. Well, let us know.

[22:04] Ankita Masand: Yeah. So, it's the 304 response status code. Let me explain what the 304 response status code is. It's the Not Modified response status code. It means that a particular entity that you're requesting from the server has not been modified since your last fetch. Let's consider the example of a REST API. You are fetching some particular entity, and you already have, let's say, version one of that entity on your client side, and you're requesting it again. This entity has not been modified, so the server will return you a 304 response status code, which means: please use the entity that you already have. That saves the time to download and parse that particular object. We can easily use this in REST and take advantage of it. My question was, can we support the 304 response status code in GraphQL? Currently, the answer is no, because in GraphQL we don't fetch just one particular entity; it's a set of many different entities that we fetch in one particular query. So, sending a 304 response code doesn't make sense, because we would be saying that the query as a whole has not changed, which may not be true. So, we can take advantage of cache-control headers, but currently GraphQL sends 200 even if all the fields in the query have not changed.

[23:39] Mettin Parzinski: Okay. Then a follow up question. Does it support 418 as status code?

[23:49] Ankita Masand: 418... It's the "I'm a teapot" status code, which means the server refuses the attempt to brew coffee with a teapot. I would guess it's not supported, but I'm not quite sure; I've not tried it.

[24:10] Mettin Parzinski: I think we should implement that. So, if anyone has some time to implement this, it's an important status code that we need in GraphQL.

Ankita Masand: I'm with you. Yes.

[24:22] Mettin Parzinski: All right. Let's jump into the questions. If you still have any questions, there's still some time to type them while we're talking. So, be sure to jump on the Discord channel, Milky Way Q&A. First question, can the response size of a GraphQL query affect the performance of the application? And if so, how?

[24:42] Ankita Masand: Yeah. So, response size actually affects the performance of the application. Let's consider an example. You are fetching hundreds of objects in a list and these objects are nested. We have seen in our own application that you can end up transporting around two to five MBs of data over the network, because the array is quite big, there are too many objects, and it is quite nested. The downloading time is also higher because the payload is quite big. So, yeah, it affects the performance of the application. To overcome this problem, instead of getting the entire response from GraphQL and having too many filters on the client side, we could try having filters on the server side, so that we send fewer fields and a filtered response to the client. That way we are downloading less and smaller content.

[25:51] Mettin Parzinski: I'm just wrapping my head around it. So, yeah, that makes a lot of sense. But what about things like Pagination of course can also help, but then again, if your object is just too big, you can apply…

[26:06] Ankita Masand: Pagination, yeah. Pagination is a good idea. Like I said in the talk as well: if you have hundreds of bookings, don't fetch all hundred of them; you won't even be able to fit them in the viewport of the user. Say you are fetching 25 bookings. In some cases these 25 bookings are nested objects, because in GraphQL you have the flexibility to query field inside field, inside field... so there can be very nested fields. Yeah, that could affect the performance of your application. Like I said, if you look at it in New Relic, you'd be able to understand that these particular nested queries are also affecting the performance.

[26:51] Mettin Parzinski: I muted myself, and now welcome to the digital world. It's like this is the new thing, meeting remotely. So, performance is of course one of the most important topics to make a good web app. There are a lot of things that make a good web app, and pagination is one thing you can do, but is there any other thing you can think of that can help improve performance with GraphQL?

[27:29] Ankita Masand: Yeah, actually, like deferring some particular fields that you don't need in the viewport of the user. I saw a talk by Uri that the defer and stream directives are quite powerful and are supported by GraphQL Yoga. With these, we don't fetch whatever's not required on the user interface on the first load. So, that's quite a good performance booster that you can give to your application. But more than that, in our application, what we have seen is: just fetch fewer fields. On the frontend side, we think, let's fetch five to six more fields and we'd be done with the task at hand. But we don't realize that these five to six fields are going to call your downstream services. That's when you understand that the cost of the query is hitting two or three database queries, hitting two or three downstream services; that's when you realize that it's going to take more time.

[28:33] Mettin Parzinski: Yeah. Yeah. It's sometimes... If I understand correctly, you add two or three more fields to your query and you think it's just two or three more fields, but that can actually be a really heavy query. So, basically the advice would be to always just keep monitoring. When you are editing a query, keep monitoring, keep an eye on what's happening. Test before, test after and make sure you're not slowing down your application significantly.

Ankita Masand: Yes. Yes. Absolutely.

[29:05] Mettin Parzinski: All right. Awesome. So, okay. That's good stuff. And going back a little bit to the UI end. So, we are saying just fetch what's in the viewport, but are you also using something like skeleton screens in the UI?

[29:27] Ankita Masand: Yes. So, until you are fetching your data, you can take advantage of... we are using Suspense in React, where you can show the skeleton of your UI till you get the data. You get improved perceived performance on the UI side. We can do a lot of stuff on the client side as well to gain improvements in the perceived performance for the user. The user feels that something is loading; there's no blank screen, and you don't just show a loader to the user. There's something there on the screen, and they're patient enough: okay, the data will come now.

[30:04] Mettin Parzinski: Yeah. I like that term, perceived performance. In my opinion, it's actually more important than actual performance, because if it's fast but it doesn't feel fast, then it doesn't matter. And something like a skeleton screen can really help the user feel like, "Okay, something is happening." I remember actually when I was still in school and learning about performance and perceived performance, the example they gave was about an airport that had a really short walk from when you get out of the plane to the conveyor belt where you get your luggage, and people were complaining that they had to wait 20 minutes for their luggage. What they did is make a longer route: it could be a two-minute walk, but they made it 20 minutes. So, people are not waiting 20 minutes; they're just walking around the airport for 20 minutes. And then everyone was like, "Oh my God, I don't have to wait. My luggage is already here." I love that example. That's perceived performance in the real world for you. So, that's all the questions we have at the moment. Ankita, thanks a lot for joining us here today. We're going to be taking a short break, and if you have any more questions, Ankita will be on her speaker chat, so click the link in the timeline below. Ankita, thanks a lot for joining us and sharing your knowledge!

Ankita Masand: Thank you.

Ankita Masand: Thank you so much.

Mettin Parzinski: Bye bye.

Ankita Masand: Bye bye.
