Hard GraphQL Problems at Shopify

At Shopify scale, we solve some pretty hard problems. In this workshop, five different speakers will outline some of the challenges we’ve faced, and how we’ve overcome them.

Table of contents:
1 - The infamous "N+1" problem: Jonathan Baker - Let's talk about what it is, why it is a problem, and how Shopify handles it at scale across several GraphQL APIs.
2 - Contextualizing GraphQL APIs: Alex Ackerman - How and why we decided to use directives. I’ll share what directives are, which directives are available out of the box, and how to create custom directives.
3 - Faster GraphQL queries for mobile clients: Theo Ben Hassen - As your mobile app grows, so will your GraphQL queries. In this talk, I will go over diverse strategies to make your queries faster and more effective.
4 - Building tomorrow’s product today: Greg MacWilliam - How Shopify adopts future features in today’s code.
5 - Managing large APIs effectively: Rebecca Friedman - We have thousands of developers at Shopify. Let’s take a look at how we’re ensuring the quality and consistency of our GraphQL APIs with so many contributors.

164 min
01 Dec, 2021

AI Generated Video Summary

The workshop covered various topics related to GraphQL at Shopify, including the N+1 problem, batch loading with lazy resolvers, contextualizing GraphQL APIs with directives, optimizing GraphQL for mobile, managing large GraphQL APIs, and handling deprecated fields and nullability. The speakers shared insights and solutions to these challenges, such as using Active Record preloading, the GraphQL Batch library, and custom directives. They also discussed the importance of measuring performance, pagination design, and parallel querying for mobile optimization. Overall, the workshop provided valuable information on improving GraphQL performance, API design, and versioning strategies.

1. Introduction to Workshop

Short description:

Thank you all for joining us for the workshop. Over the next three hours, five different speakers will discuss the problems they've tackled with GraphQL at Shopify. Each session will be 20 minutes long, followed by a short break and an hour-long Q&A session. Feel free to ask questions in the chat throughout the workshop.

So, once again, thank you to everyone who's been able to join us here. I truly appreciate your time. We all do, all our speakers, all of Shopify. Thank you very much for being there and joining us for the workshop.

So this is what the next three hours will look like. It will be a bit different from what we are used to, but I hope it's good different. Hopefully you can share that feedback with us at the end, but we are going to have five different sessions by five different speakers. All of them are engineers here at Shopify. And all of these sessions will be talking about hard problems that we've had to tackle with GraphQL here at Shopify. So we hope that will shed some light, and you're going to get some insight into how the teams looked at the problems and the various methods they used to tackle them.

So each session, or each talk, will be 20 minutes long. At the tail end of that, we are going to have a short 10-minute break, and after that we are going to have one hour of office hours, or an AMA session, if you wish. So if you have questions, just keep dropping them here in the chat, and we'll make sure we take note of them. During the last hour, the office hours, we are going to answer your questions. But also, if you have a very specific question about any of the sessions as they roll: they are pre-recorded, but our speakers are here live, so you can ask your questions in the chat and they're going to take care of that. I hope that sounds fine and makes sense.

2. Introduction to Jonathan's Session

Short description:

Our first session is by Jonathan, a staff developer at Shopify, focusing on customer storefront applications. Jonathan is here to clarify any questions during the pre-recorded session. Let's get started!

So without further ado, our very first session is by Jonathan. Jonathan is here. Jonathan, I don't know if you want to take just a moment to say hi to the people who are joining us before we get to the session?

Hello, everyone. I'm Jonathan. As you will see in the talk, I'm a staff developer here at Shopify, focusing on our customer storefront applications. From a GraphQL standpoint, that's the Storefront API, which is one of our two public GraphQL APIs. Happy to be here.

Fantastic. Tell us one thing about you that we don't know. Or, to bring it closer to home: what's your favorite current read, blog, or newsletter right now?

Shopify's engineering blog.

All right, thank you so much, Jonathan. So Jonathan is our first speaker. As I mentioned, all our sessions are pre-recorded, but feel free to drop your questions as the session runs. Jonathan is here to help clarify anything if we get to that. So let me see. I will share my screen here, and share sound as well. All right, fantastic. Give me a thumbs up if you're able to hear the sound.

3. Jonathan Baker on the N+1 Problem

Short description:

Hello, everyone. My name is Jonathan Baker. I'm a staff developer here at Shopify, and I've focused on features for building custom storefronts, like the Storefront API. Today I will be talking about one of the most common performance-related concerns that anyone with a GraphQL-powered API will eventually run into: the infamous N+1 problem. GraphQL at Shopify started all the way back in 2015, when we had our first related commit. Since then we have accumulated over six GraphQL schemas for both internal and external APIs. Those schemas comprise over 5,000 types and mutations, and more than 1.7 million merchant storefronts and 60,000 partner applications are powered by them. In total, our GraphQL endpoints handle an average of one million queries per minute. You can see GraphQL is very important at Shopify, and we've become heavily invested in it. What is the N+1 problem? In a nutshell, it is what happens when you need to present a list of objects and each object has an attribute stored in another model. These are usually completely separate from the parent model and in a different database. Let's take a look at an example schema and see how these problems can easily manifest themselves. So how do we fix this? Well, Rails has another bit of magic up its sleeve called Active Record preloading. By adding another modifier to the end of our articles query, Rails will automatically load all the unique authors related to the result set. And because it knows ahead of time who those authors are, it can do so efficiently in only one query. Now, there's another difference in the logs beyond just the number of SQL queries: the response time. You may be saying eight milliseconds doesn't seem like a big difference. However, to put that in perspective, let's assume our GraphQL endpoint is not multi-threaded. It can only handle one request at a time. This is actually pretty common. That 8 milliseconds is the difference between handling 4,600 requests per minute and over 12,000 requests per minute. That's a huge deal. Well, what happens when we make a query where we don't care about the author? Hm, that's weird. We're still requesting authors even though we're not using them.

Hello, everyone. My name is Jonathan Baker. I'm a staff developer here at Shopify, and I've focused on features for building custom storefronts, like the Storefront API. Today I will be talking about one of the most common performance-related concerns that anyone with a GraphQL-powered API will eventually run into: the infamous N+1 problem.

However, before we begin, I thought it would be a great idea to give you a bird's-eye view of what GraphQL looks like at Shopify. GraphQL at Shopify started all the way back in 2015, when we had our first related commit. Since then we have accumulated over six GraphQL schemas for both internal and external APIs. Those schemas comprise over 5,000 types and mutations, and more than 1.7 million merchant storefronts and 60,000 partner applications are powered by them. In total, our GraphQL endpoints handle an average of one million queries per minute. You can see GraphQL is very important at Shopify, and we've become heavily invested in it. At this scale, performance is of the utmost importance, so let's talk about one of the biggest causes of slowdowns: the N+1 problem.

What is it? In a nutshell, it is what happens when you need to present a list of objects and each object has an attribute stored in another model. These are usually completely separate from the parent model and in a different database. Let's take a look at an example schema and see how these problems can easily manifest themselves. As you can see, we have a fairly straightforward schema here. We have an article type, which has a title and an author, and then we have an author type, which just has a name. Finally, at our query root, we have an articles field, which provides us with a list of articles, optionally limited to the number that we request. Now, behind the scenes, like I said, articles and authors are stored separately in the database, and they're linked by their corresponding IDs. This is a pretty common technique known as database normalization.

So what happens behind the scenes when we query for a single article? You may notice a few things from these application logs. First, we're running a Ruby on Rails-powered application, which is the standard at Shopify. Because of that, we have a powerful object-relational mapping framework, also known as an ORM, called Active Record. Now, that detail is not that important in this talk, since N+1s can rear their ugly heads regardless of what language or framework you use. But I will try to explain the nuances here specifically for those not familiar with Rails or Ruby. Second, you can see two distinct SQL queries being performed here. The first one fetches a set of articles, in this case only one, denoted by the limit clause. The second one fetches an author record by its ID, and that ID came from the article's author ID column. This is a little bit of Rails magic behind the scenes. That all makes sense and seems relatively harmless, though.

So what happens when we ask for more than one article? This is where we get our first glimpse of the problem at hand. Let's take a look at what has changed. Our articles query has not changed that much, except for the limit clause now asking for five instead of one. However, as you can see, for every article being resolved, we're now performing a separate SQL query to fetch its author. This is why it's called an N+1, or, for some of you, a 1+N: one query for the list of parent objects, and then N queries, in this case five, to fetch each child object. As you can imagine, if we increase the number of articles requested, the number of author queries also increases.

Also, as a side note, you may notice the word cache appearing next to some of the author queries. This is one of the things that comes with Rails and Active Record. By default, a query cache is maintained, which prevents unnecessary trips to the database when asking for a specific record by ID. This feature prevents some of the pain of the N+1 from being felt, but it is not always available. For example, if every article had a unique author, the cache would never have a chance to be used. It also doesn't really help us in scenarios where the child record is not stored in the database. It might be stored somewhere else, like a local file or a remote HTTP server.

So how do we fix this? How do we make sure as few database queries are being made as possible? Before we can talk about solutions, we need to understand how the GraphQL query is being resolved behind the scenes. Again, we're using Ruby, and for our GraphQL logic, we're using the GraphQL Ruby open source library, which we use at Shopify and contribute to from time to time. Most GraphQL implementations work based on the concept of resolvers. A resolver is a function that knows how to take a parent object and return a certain field or attribute of that parent. By default, the GraphQL Ruby implementation simply calls the method named the same as the field on that object, and because we have an Active Record object that lines up perfectly with that schema, it all just works. However, at the root, we do not have a parent object, so those resolvers must be implemented manually, as you see here.

Following the GraphQL query step by step, we first visit the articles resolver, which takes the limit argument that we give it and returns a set of article model objects. From there, each article object is given to the article type implementation, and its fields are resolved one by one. The title simply defaults to the title property of the model object. When it comes time to resolve the author field, a bit of Rails magic happens behind the scenes to automatically query the database for the related model. Then that result is passed to the author type, and so forth. It is this magical, automatic query to fetch the related author that seems to be the root of our problem.

So how do we fix this? Well, Rails has another bit of magic up its sleeve called Active Record preloading. By adding another modifier to the end of our articles query, Rails will automatically load all the unique authors related to the result set. And because it knows ahead of time who those authors are, it can do so efficiently in only one query. If we take a look at the logs, we can see that there are now only two trips to the database being performed: one for all the articles, and one for all the related authors.

Now, there's another difference in the logs beyond just the number of SQL queries: the response time. You may be saying eight milliseconds doesn't seem like a big difference. However, to put that in perspective, let's assume our GraphQL endpoint is not multi-threaded. It can only handle one request at a time. This is actually pretty common. That 8 milliseconds is the difference between handling 4,600 requests per minute and over 12,000 requests per minute. That's a huge deal.

So we're done, right? Well, what happens when we make a query where we don't care about the author? Hm, that's weird. We're still requesting authors even though we're not using them. To understand why, we have to go back to how resolvers work.
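
The slides and logs from the talk are not part of this transcript, so here is a rough plain-Ruby sketch of the two query patterns described in this section. It does not use Rails or Active Record: `FakeDB`, its methods, and the sample data are hypothetical stand-ins that only log the SQL a real ORM might issue, and there is no query cache in this simulation.

```ruby
# Hypothetical in-memory stand-in for a database that logs every query it runs.
class FakeDB
  Article = Struct.new(:id, :title, :author_id)
  Author  = Struct.new(:id, :name)

  attr_reader :log

  def initialize
    @articles = (1..5).map { |i| Article.new(i, "Article #{i}", (i % 2) + 1) }
    @authors  = { 1 => Author.new(1, "Ada"), 2 => Author.new(2, "Grace") }
    @log = []
  end

  # One query for the list of parent objects.
  def articles(limit)
    @log << "SELECT * FROM articles LIMIT #{limit}"
    @articles.first(limit)
  end

  # One query per call: the source of the N in N+1.
  def author(id)
    @log << "SELECT * FROM authors WHERE id = #{id}"
    @authors.fetch(id)
  end

  # One query for many authors at once, similar to what preloading issues.
  def authors(ids)
    @log << "SELECT * FROM authors WHERE id IN (#{ids.join(', ')})"
    @authors.values_at(*ids)
  end
end

# Naive resolution: fetch each article's author individually.
def resolve_naive(db, limit)
  db.articles(limit).map do |article|
    { title: article.title, author: db.author(article.author_id).name }
  end
end

# Preloaded resolution: collect the unique author IDs, then fetch them together.
def resolve_preloaded(db, limit)
  articles = db.articles(limit)
  by_id = db.authors(articles.map(&:author_id).uniq).to_h { |a| [a.id, a] }
  articles.map { |article| { title: article.title, author: by_id.fetch(article.author_id).name } }
end

naive_db = FakeDB.new
naive_result = resolve_naive(naive_db, 5)

preloaded_db = FakeDB.new
preloaded_result = resolve_preloaded(preloaded_db, 5)

puts "naive: #{naive_db.log.size} queries, preloaded: #{preloaded_db.log.size} queries"
```

With five articles, the naive path logs six queries (one plus five) and the preloaded path logs two, while both return the same data. As a rough check on the throughput figures quoted above: a single-threaded endpoint serving 4,600 requests per minute spends about 13 ms on each request (60,000 / 4,600), and one serving 12,000 spends 5 ms, which matches the 8 ms gap.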

4. Batch Loading with Lazy Resolvers

Short description:

By default, we start at the top of the graph and work our way down. The GraphQL Ruby library has a cool feature where you can look ahead to see if certain fields have been requested, and then based on that do things conditionally, like preload child objects. GraphQL Ruby's lookahead solves the problem of knowing when to preload but makes our resolvers too dependent on each other. There is a better way called batch loading. Batch loading utilizes lazy resolver execution, where instead of returning a value, it returns a promise. This promise's execution is delayed until the entire query has been resolved. It can reference shared state tied to the lifetime of the query.

By default, we start at the top of the graph and work our way down. When we're in the resolver for articles, we have no idea what fields are being requested below us. So we have to greedily assume the worst-case scenario. The result is a potentially wasted SQL query, which has its own effect on the overall request time.

Remember five seconds ago when I said we don't have any idea of what fields are being requested below us? Well, I kind of lied. The GraphQL Ruby library has a cool feature where you can look ahead to see if certain fields have been requested, and then based on that do things conditionally, like preload child objects. By checking whether the author field has been selected, we can conditionally add preloading to our query. Awesome. Now we're only loading what we need.

This of course works, but there is a slight code design or philosophical issue at hand here. Should this resolver really care about what other resolvers do? You can easily get into a mess where resolvers are referencing fields deep below them. You can see here that we are loading 10 levels deep, not to mention a circular dependency.

So let's recap our options. We have Active Record preloading, which is pretty magical and easy to add. However, it is greedy by default, since we don't know if it's required downstream. It's also important to know that this does not help us if the child object is not stored in the database. GraphQL Ruby's lookahead solves the problem of knowing when to preload, but makes our resolvers too dependent on each other.

There's got to be a better way, right? Spoiler alert: there is, and it is something that we call batch loading. Batch loading utilizes yet another awesome feature of the Ruby library: lazy resolver execution. The idea is rather simple. Instead of returning a value, the resolver returns a promise. This promise's execution is delayed until the entire query has been resolved, just in time before the result is ready to be returned to the client. Another key feature of these promises is that they can reference shared state tied to the lifetime of the query. So what does this mean for our sample app? Let's take a look at how we might use a lazy resolver to solve it.
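
For reference, the conditional preloading recapped earlier in this section can be sketched in plain Ruby. This is not the GraphQL Ruby lookahead API: `ArticleQuery`, `resolve_articles`, and the `selected_fields` parameter are hypothetical names, and the set of selected fields simply plays the role that a lookahead object would play.

```ruby
require "set"

# Hypothetical query builder standing in for an ORM relation; it only records
# which associations were asked to be preloaded.
class ArticleQuery
  attr_reader :limit, :preloads

  def initialize(limit)
    @limit = limit
    @preloads = []
  end

  def preload(association)
    @preloads << association
    self
  end
end

# The resolver receives the set of child fields the client selected and only
# adds the preload when :author is among them.
def resolve_articles(limit:, selected_fields:)
  query = ArticleQuery.new(limit)
  query = query.preload(:author) if selected_fields.include?(:author)
  query
end

with_author    = resolve_articles(limit: 5, selected_fields: Set[:title, :author])
without_author = resolve_articles(limit: 5, selected_fields: Set[:title])

puts with_author.preloads.inspect     # [:author]
puts without_author.preloads.inspect  # []
```

The sketch also makes the design concern visible: the articles resolver has to know the name of a field that belongs to a different type.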

5. Batch Loading and Preventing N+1s

Short description:

In this part, we explored the AuthorLazyLoader class and how it helps in batch loading related resources. We also discussed the GraphQL Batch library and its benefits in solving relationship loading issues. To prevent N+1s, Shopify employs techniques like unit testing and staging environments. It's important to optimize round-trip requests and have tooling to recognize unoptimized resolvers. N+1s can occur in other APIs as well, not just GraphQL. Thank you, Jonathan, for the insightful talk. We encourage the audience to ask questions in the chat for the upcoming Q&A session. Next, we have Alex, an engineering manager at Shopify, discussing contextualizing GraphQL for pricing in different countries.

OK, we have a lot going on here, so let's go through it all. On the left, we have our article type and author resolver. On the right, we have the new class called AuthorLazyLoader. That's a mouthful. Instead of fetching the author immediately, the author resolver returns an instance of this class as our promise. The lazy loader is initialized with two things: a context and an article. The context is the shared state that I mentioned earlier. It is available to all resolvers, and its life cycle is tied to the execution of the query. In this context, the lazy loader stores a hash, or dictionary, containing two sets of IDs: IDs that we have been requested to load, and IDs that we have already loaded.

Each time an instance of this lazy loader is created and returned, we're building up that list of author IDs that we need to fetch. When the time comes for the promise to be fulfilled, the Ruby library calls the author method, which we talked about when we defined our schema. You can see that in the bottom left. In this method, we check our shared state to see if we have already loaded the specific author that we're being asked for. If we don't have it, we go ahead and query the database for all the authors that we have gathered up so far. Finally, we return the author that was found.
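
The slide code is not in the transcript, so the following is only a plain-Ruby approximation of the lazy loader described above. The class name `AuthorLazyLoader` and its `author` method come from the talk; `AuthorStore`, `where_ids`, the context keys, and the two-phase driver at the bottom are assumptions made for illustration, and the GraphQL Ruby wiring that would call `author` is left out.

```ruby
require "set"

Author  = Struct.new(:id, :name)
Article = Struct.new(:id, :title, :author_id)

# Hypothetical data source that counts how many batched queries it receives.
class AuthorStore
  attr_reader :query_count

  def initialize(authors)
    @authors = authors.to_h { |a| [a.id, a] }
    @query_count = 0
  end

  def where_ids(ids)
    @query_count += 1
    @authors.values_at(*ids).compact
  end
end

# A promise-like object: creating it only records the author ID; the actual
# load is deferred until #author is called.
class AuthorLazyLoader
  def initialize(context, article)
    @author_id = article.author_id
    @store = context.fetch(:store)
    # Shared, per-query state: IDs still to load and authors already loaded.
    @state = context[:author_loader] ||= { pending: Set.new, loaded: {} }
    @state[:pending] << @author_id unless @state[:loaded].key?(@author_id)
  end

  def author
    unless @state[:loaded].key?(@author_id)
      # Load everything gathered so far in a single batched query.
      @store.where_ids(@state[:pending].to_a).each { |a| @state[:loaded][a.id] = a }
      @state[:pending].clear
    end
    @state[:loaded][@author_id]
  end
end

store = AuthorStore.new([Author.new(1, "Ada"), Author.new(2, "Grace")])
context = { store: store }
articles = [Article.new(1, "A", 1), Article.new(2, "B", 2), Article.new(3, "C", 1)]

# Phase 1: resolvers return promises; nothing is queried yet.
promises = articles.map { |article| AuthorLazyLoader.new(context, article) }
queries_before = store.query_count

# Phase 2: the promises are fulfilled; the first call triggers one batched load.
names = promises.map { |promise| promise.author.name }
puts "#{names.inspect} loaded with #{store.query_count} query"
```

If no promise is ever created, because the client did not select the author field, the store is never touched at all.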

Now, let's take a look at the logs and see what happens. This looks identical to when we used Rails preloading, which is good. However, it really shines when we perform the query without asking for the author. Awesome. Now the author records are not loaded at all, and the articles resolver is none the wiser. What's even cooler about this approach is that it makes loading related resources from other databases, Redis, external services, or even a file system just as easy. You can batch the requests to be as optimized as possible. Now, at Shopify, we think batch loading in GraphQL is so cool that we built an open source library around it, aptly named GraphQL Batch. This library takes what I showed you earlier to a much more refined, reusable, and powerful level, and it is the de facto standard within Shopify for solving any sort of relationship loading headache within our schemas. With this library, we can very easily create a generic, reusable batch loader to handle many different situations. Here are a few examples of batch loaders that we have used in the past at Shopify: one for loading Active Record relationships, and another for fetching from the Rails cache. Using the ActiveRecordLoader class, our author field resolver could look something like what you see on the right.
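
To illustrate what a generic, reusable loader might look like, here is a small plain-Ruby sketch in which the batch function is passed in as a block. It is not the GraphQL Batch API: the `BatchLoader` class, its `load` method, and the fake remote cache are all hypothetical, and the lambdas stand in for the promises a real library would return.

```ruby
# Hypothetical generic loader: the batch function is supplied as a block, so
# the same class can front a database, a cache, or a remote service.
class BatchLoader
  def initialize(&batch_fn)
    @batch_fn = batch_fn # receives an array of keys, returns { key => value }
    @pending = []
    @cache = {}
  end

  # Queue a key and return a lambda that resolves it later.
  def load(key)
    @pending << key unless @cache.key?(key) || @pending.include?(key)
    -> { fulfill; @cache[key] }
  end

  private

  # Run the batch function once for every key queued so far.
  def fulfill
    return if @pending.empty?

    @cache.merge!(@batch_fn.call(@pending.dup))
    @pending.clear
  end
end

# Example: batching lookups against a stand-in for a remote cache.
remote_cache = { "article:1" => "cached body 1", "article:2" => "cached body 2" }
round_trips = 0

loader = BatchLoader.new do |keys|
  round_trips += 1
  keys.to_h { |key| [key, remote_cache[key]] }
end

lazy_values = ["article:1", "article:2", "article:1"].map { |key| loader.load(key) }
values = lazy_values.map(&:call)
puts "#{values.size} values in #{round_trips} round trip"
```

Because the loader knows nothing about where the data lives, swapping the block is all it takes to batch against a different backend.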

Now, we have taken a look at what N+1s are and how we solved for them with a few approaches, but how do we make sure that they don't happen again? How do we stop extremely unperformant queries from shipping to production and ruining everything? Well, the answer is much harder than it seems. At Shopify, we have a few techniques for helping to prevent N+1s from causing headaches for us and our merchants.

The first of them is called unit testing. You may have heard of it. Every type and mutation in every schema we have has a matching unit test for testing resolver business logic. A requirement that we enforce via linting in our CI environment is that every schema type must also have a matching test case that checks for preloads. In our sample schema, we would have a preload test for the article type. Now, this test simply calls a single line, a helper method provided by our shared internal testing library. The implementation of this helper iterates over every single field the type has and looks for any unbatched database, Redis, or other known unoptimized code path that is taken within the resolver's execution. Now, this required test is not foolproof, unfortunately. Because it is automated, there are certain fields that we cannot test, like ones with required arguments, for example. Those types of scenarios are handled with dedicated test cases, which are hopefully added through our manual code review procedures.

Beyond the unit testing that we can do in static environments, we also leverage production-like staging environments, where much larger GraphQL changes can be tested and monitored for regressions. Using database-level instrumentation and metrics, we can see spikes in performance and then hopefully trace them back to the individual query and request, and hopefully find the suspect code path.
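
Shopify's internal test helper is not shown in the talk, so the following is only a guess at the general idea: resolve every field against parent lists of two different sizes and flag the fields whose query count grows with the list. `QueryCounter`, `ARTICLE_FIELDS`, and `unbatched_fields` are hypothetical names, and the resolvers are simplified lambdas rather than real GraphQL field resolvers.

```ruby
# Hypothetical query counter standing in for database instrumentation.
class QueryCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  def query!
    @count += 1
  end
end

# Hypothetical field resolvers for an article type, keyed by field name. Each
# takes the list of parent records and the counter.
ARTICLE_FIELDS = {
  title: ->(articles, _db) { articles.map { |a| a[:title] } },
  # Unbatched: one lookup per article.
  author_unbatched: ->(articles, db) { articles.map { |a| db.query!; "author #{a[:author_id]}" } },
  # Batched: one lookup for the whole list.
  author_batched: ->(articles, db) { db.query!; articles.map { |a| "author #{a[:author_id]}" } }
}.freeze

# Resolve every field against lists of two sizes and report the fields whose
# query count grows with the number of parents.
def unbatched_fields(fields)
  fields.select do |_name, resolver|
    counts = [2, 10].map do |n|
      db = QueryCounter.new
      resolver.call(Array.new(n) { |i| { title: "Article #{i}", author_id: i } }, db)
      db.count
    end
    counts.last > counts.first
  end.keys
end

offenders = unbatched_fields(ARTICLE_FIELDS)
puts offenders.inspect
```

A check like this cannot exercise fields that need arguments it does not know how to supply, which is the limitation mentioned above.
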
To wrap up, we've learned that N+1s can severely impact query execution performance, which at scale can mean the difference between handling 1,000 requests per minute and 1,000,000 requests per minute. You should leverage several techniques to optimize and limit the number of round-trip requests to your database, or to minimize them if they are in excess. Having tooling in place to recognize unoptimized resolvers is also critical to ensuring performance is maintained over time. Finally, know that while we focused on GraphQL today, and our examples were written in Ruby and Rails, N+1s can happen in other places too. In traditional REST APIs, or anything with the concept of a model relationship, this can happen. That is all for today. Thank you.

Thank you so much, Jonathan. That was amazing. Again, I'd just love to reiterate for our guests, and thank you for joining us, for those who joined us later: feel free to drop your questions in the chat. We are going to take a few questions, and then we will get to our next speaker. We are going to take note of them, and in the third hour we will have a proper AMA with all the speakers, including one more speaker, who is going to give a lightning talk next week and has been able to join us. So keep your questions coming. If you have anything that needs to be clarified, drop it in the chat.

So without further ado, we will move on to our next speaker, who happens to be Alex. Alex, do you mind going off mute and saying hi to everyone as I get your talk ready?

Hi, everyone. My name is Alex. I'm an engineering manager at Shopify on the pricing team. We've been working most recently on contextualizing the Storefront API in GraphQL to show pricing for different countries, so that's what today's talk will be about. And I worked on this with Lana, who's also here with us today to help answer questions and such.

Fantastic. Lana, do you want to say hi too?

Hi, everyone. I'm Lana. As Alex mentioned, we were working together on contextualizing GraphQL first off from... And yeah, I'm based in Ottawa, so reach out if you want some coffee or have any questions about directives. Yeah, so I'll pass this to Alex.

6. Contextualizing GraphQL APIs with Directives

Short description:

Hello, everyone. My name is Alex. I'm an engineering manager at Shopify. Today, I'm super excited to speak with you about a challenge we faced at Shopify's scale: contextualizing GraphQL APIs. We'll discuss the background, the problem we faced, and the four different approaches we considered: headers, arguments, the viewer approach, and directives. Ultimately, we chose directives. I'll explain what a directive is, provide examples of custom and built-in directives, and discuss how we can create our own. At Shopify, we have the Storefront API, a GraphQL API that powers custom storefronts. As requirements changed, we introduced multi-currency and international pricing, leading to the need for a more contextualized GraphQL API. We explored using headers and arguments but found limitations in their ability to support API versioning and ensure a consistent customer experience. The viewer approach showed promise but fell short. That's when we discovered directives as the solution. Now, let's dive into the details of how directives work and their benefits.

Alright, thank you so much, Alex and Lana. If you have questions, our speakers are here. But for now, I'll just share my screen again and get the session up.

Hello, everyone. My name is Alex. I'm an engineering manager at Shopify. And today, I'm super excited to speak with you about a challenge we faced at Shopify's scale: contextualizing GraphQL APIs. So today, what I'm going to talk about is first the background and the problem that we were trying to solve, the four different approaches that we used to try to solve this problem, and ultimately the one we landed on, directives. I'll speak with you about what a directive is, examples of custom and built-in directives, and how we can create our own.

So before I jump into it: all good GraphQL talks start with a query. I'm going to give you a little bit of background on how we got here. So at Shopify, we have several APIs, and one of those APIs is called the Storefront API. It's a GraphQL API that powers custom storefronts. And one of the most basic queries you're going to make when you're showing a storefront is showing products, such as a t-shirt; their variants, such as a size or color of that product; and then their price, what the customer is going to pay for it. And this is a very basic query. Here we're looking at that, and we're returning the product, the variants, and the price of $7.00 USD. No problems just yet.

But, as with all great software platforms, our requirements change over time. So Shopify introduced multi-currency, where customers could purchase products in different currencies based on what they want to pay in. We evolved our GraphQL API to include a new field called presentment prices. Here you can see we have the product, its variants, and the presentment prices field, which takes an argument of presentment currencies. So in this case, I'm fetching the prices, but for Australian dollars. This works as you would expect. We're returned a response that has the same amount, but converted to Australian dollars. This is really useful if you can imagine a custom storefront that has a currency selector where you can select what currency the customer wants to purchase in. Again, it's relatively simple, so we haven't had too many problems just yet.

And, as you can imagine, our requirements changed yet again. Shopify introduced a new feature called international pricing. Under this model, we're no longer showing prices based on a currency, but rather based on where in the world someone is located. For example, you could have a customer in France who's paying 10 euros, but a customer in Italy who's also paying in euros might spend 15 euros on the same product. And this could be based on different factors that the merchant wants to control, like margin, or any kind of cost center that they need to account for. So we're starting to see this problem where we have a GraphQL API that started simple, but is introducing more and more requirements, and our experiences are becoming more and more contextualized to where the buyer is, what the buyer needs to see, and so on. So we needed an API to support this. A little bit more context here: it wasn't as simple as us needing to just show prices. We need to show price; we need to show compare-at price, which is something like MSRP; we need to show price ranges, what the minimum and maximum prices are for a product; and other fields that need to be aware of where the buyer is located. We also knew that we were moving towards a model where, again, our experiences are becoming more and more contextually aware. For example, we want to contextualize a storefront based on a customer's preferred retail location so that we can show information such as how much is available in stock for them to pick up in store. So now you get the problem we needed to solve: a more contextualized GraphQL API based on the buyer experience. I'm going to dive into the different approaches that we used to try to solve this problem.

So first we have headers. Here you're looking at headers that we could pass to contextualize the API. We knew that we were already passing an access token header for API clients to get access to the fields they are requesting, so we thought they could go ahead and pass another header with the country as its value. It would be a pretty light lift, because API clients are already sending these headers, and we'd be able to fetch this value and return the correct response based on that. The pros here: all of the fields are going to resolve to the correct country price with that currency, and there are no big changes in the API schema, so existing clients could pick it up. But on the other side, one of the reasons why we love GraphQL so much is that it's a typed system, and there's no way for us to proactively validate that the country code being sent in the header is an actual valid one that we support. It also doesn't support our API versioning and deprecation system. We introduce new API versions every three months, and there would be no easy way for API clients to know whether a new piece of context has been introduced or one has been taken away as we develop new features. So it becomes very hard for us to manage over time. For that reason, headers did not seem like the play.

The next approach we took was arguments. This is what I would say is the simplest solution that could possibly work. We could add a country code argument to each field that needs it. So here you can see we're fetching the product and its variants again, but in this case we have a price with a country argument of France, a compare-at price with a country argument of France, and a price range with a country argument of France. The pros here are that it's super explicit. There's definitely no confusion about which fields are going to return which values. We also benefit from it being in our schema definition and having those validations, so we know that we can only send a country code argument to these fields. We also have the API versioning support that I mentioned before: we're able to evolve this over time as we introduce new arguments or remove them. But the cons here are that it does open a possibility for API clients to add different values to different fields. We really wanted to make sure that we only powered a consistent customer experience, and it seems a little icky that you could show a price for France but a compare-at price for Italy. Plus, as we introduce more fields and more context, such as buy online, pick up in store, you're going to have to add those arguments to every field that needs them. Here you can see an example of how that might not work so well. You have the price for France, a compare-at price for the US, and a price range for Italy.
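
The inconsistency risk with per-field arguments can be shown with a small plain-Ruby simulation. Nothing here is the Storefront API: `PRICES_BY_COUNTRY` and `resolve_with_country` are hypothetical, the price amounts reuse the figures mentioned in the talk, and the compare-at amounts are made up for the example.

```ruby
# Hypothetical price table keyed by country code. The compare-at amounts are
# invented for this example.
PRICES_BY_COUNTRY = {
  "FR" => { price: "10.00 EUR", compare_at_price: "12.00 EUR" },
  "IT" => { price: "15.00 EUR", compare_at_price: "18.00 EUR" },
  "US" => { price: "7.00 USD", compare_at_price: "9.00 USD" }
}.freeze

# Each field takes its own country argument, mirroring the per-field approach.
# Unknown country codes raise KeyError, a stand-in for schema validation.
def resolve_with_country(field, country:)
  PRICES_BY_COUNTRY.fetch(country).fetch(field)
end

# Nothing stops a client from mixing countries within one response.
mixed = {
  price: resolve_with_country(:price, country: "FR"),
  compare_at_price: resolve_with_country(:compare_at_price, country: "US")
}
puts mixed.inspect
```

Each call is valid on its own, yet the combined response pairs a euro price with a US dollar compare-at price.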

The next option we considered was a viewer approach. This is something GitHub uses in its GraphQL API for showing authenticated user information. In this approach, we have a top-level field called viewer that takes a single country argument, here France, and then the rest of the fields are resolved underneath it: product, variants, and the prices. This is trending in the right direction. Now, when we resolve this response, we're returning a consistent experience that has prices only in euros, only for France.

7. Directives for Contextual GraphQL Queries

Short description:

In this way, country is passed once and available to all fields. We landed on the solution of introducing a new directive called @inContext. It sets the country once and is supported within our schema. Clients can easily add this directive to their queries without complete rewrites.

In this way, country is passed once, it's available to all fields, and again, it's supported in our schema, so we have all of the benefits of adding and removing arguments over time. The con here is that maintaining another root field is a little bit cumbersome and duplicates a lot of existing behavior in our API. On top of that, we've only been talking about queries so far, but any mutations that need to return these contextual responses would also have to be updated so that they return a viewer with this information as well.

We're getting really close here, and we're starting to see a pattern of what we want, and this is how we landed on directives. This is the better way. So the solution we landed on is introducing a new directive called @inContext. Here you can see we're querying products in France with an @inContext directive that takes a single country argument of France. Similar to the viewer, all of the other fields sit below it: we have the product, its variants, and their prices, and they resolve to a single country price and a single currency. So the pros here are that the country is set once and is available to all fields. It's supported within our schema, and it's also a lot easier for clients to pick up right away. They don't have to do complete rewrites of their queries. They only have to add this @inContext directive and benefit from there.

8. Using Directives in GraphQL

Short description:

Directives provide a way to describe alternate runtime execution and type validation behavior in a GraphQL document. They can be used to change the behavior at runtime and are categorized as runtime directives and schema directives. Runtime directives, such as include and skip, modify the behavior of the GraphQL executor. Schema directives, like deprecated, annotate the schema and provide information to clients and tools. Custom directives can be used for various purposes, such as translations, theming, feature flagging, access control, and transformations. The GraphQL Ruby gem, which Shopify uses, offers a transform directive for string field manipulation. Building custom directives is straightforward, and they can be attached to the schema with a resolve method defining their behavior. In summary, we explored the problem of contextualizing GraphQL APIs, the different approaches we considered, and how directives emerged as a solution. Directives provide flexibility in modifying GraphQL behavior, and they are a valuable tool to consider when needed.

Now that we've seen directives are our winner, you might be asking, what is a directive? So let's jump into that. Directives provide a way to describe alternate runtime execution and type validation behavior in a GraphQL document. What this means is that a directive describes some behavior that you want to have, and the GraphQL executor updates the result based on this additional direction or information. GraphQL implementations should provide the skip and include directives, so there are already two built-in directives that you may have seen or used. Let's jump into how to use those to give you an idea of what a directive looks like. Here we have a product query that has an include price argument that is a Boolean. We're fetching the product, the variants, an ID, and we're optionally including this price field if that include price argument is true. What this looks like when it's resolved is that if the include price value is false, then price is not returned in our response. However, if include price is true, the value is returned in our response. So in this way, the include directive is giving a direction to the GraphQL executor on how it should respond to the query. We also have the built-in skip directive, which is the same as include but with the opposite behavior. These are what we call runtime directives, meaning they change the behavior at runtime. We also have another type of directive called schema directives. One schema directive you might be familiar with is the deprecated directive. Here you see an example of a product that has an old field that is labeled as deprecated. It provides a reason, use new field instead, and then we have a new field as well.
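To make the runtime behavior of include and skip concrete, here is a minimal plain-Ruby sketch. It is not how any GraphQL executor is actually implemented and it does not use the GraphQL Ruby gem; `Field`, `select_fields`, and the `includePrice` variable name are hypothetical names chosen for illustration.

```ruby
# Hypothetical model of a selected field: its name, an optional directive
# (:include or :skip), and the name of the Boolean variable the directive reads.
Field = Struct.new(:name, :directive, :variable)

# Returns the names of the fields that would appear in the response.
def select_fields(fields, variables)
  kept = fields.select do |field|
    flag = variables[field.variable]
    case field.directive
    when :include then flag == true # keep the field only when the flag is true
    when :skip    then flag != true # drop the field when the flag is true
    else true                       # no directive: always keep the field
    end
  end
  kept.map(&:name)
end

fields = [
  Field.new("id", nil, nil),
  Field.new("price", :include, "includePrice"),
]

puts select_fields(fields, { "includePrice" => false }).inspect
puts select_fields(fields, { "includePrice" => true }).inspect
```

The point of the sketch is only that the directive gives the executor an extra instruction per field; the field definitions themselves do not change.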
What the schema directive is doing is annotating the schema to give us a little bit of information about how to use it. This is really helpful because we have tooling such as GraphiQL where, whenever you have a deprecated directive on a field, a little yellow underline shows up along with a tooltip that says, hey, don't use this. So these are the types of directives that we have. So how might you use custom directives, and what could you use them for? A couple of examples could be translations or theming, feature flagging or access control, and transformations. A gem we use at Shopify called GraphQL Ruby, for example, provides a transform directive that allows you to transform a string field to be uppercase or lowercase and so on. So it's giving additional directions on how the GraphQL executor should resolve those fields. Now that we have an idea of what a directive is, why don't we go ahead and build one? As an example, let's look at the requirements for this directive. We have a query for a product page, and now we want to introduce a theme directive that currently has two variations, a dark variation and a light variation. When we resolve this product page query, we're going to return different background and foreground colors and a different image URL based on this theme. So when I pass it the dark variation, the directive is going to make sure that the resolvers respond with dark colors and the dark placeholder image. And when I pass in the light variation, it's going to respond with lighter colors and the light placeholder image. At Shopify, many of our applications use Ruby and Ruby on Rails, and we also use a gem called GraphQL Ruby, which we contribute to from time to time, to power our GraphQL schema definitions. So the examples I'm going to walk through now are in Ruby using the GraphQL Ruby gem, but the approach is applicable to a lot of the other languages and frameworks that you have.
So here we're defining a directive called theme, which inherits from the directive class provided to us by GraphQL Ruby. It takes an argument, in our case one of the theme variation enum values, and it's required. It's also given a location. We haven't touched on this yet, but directives are defined at different locations. For example, the @inContext directive that we introduced to solve our problem is defined at the query level, because that's the topmost level the directive applies to. Another example of a location is the field level, which is where the include and skip directives are defined. There, the directive is applied to individual fields so that only those fields are affected, rather than the entire query, which is what our theme directive wants to affect. And then once we've defined our directive, we attach it to the schema by just giving it the directive declaration. In GraphQL Ruby, you also have to define a resolve method, which tells us how the directive is going to behave when it's resolved. The resolve method takes three arguments: an object, the arguments to the directive, and context. What we're doing in this directive is first fetching the variation of the theme that we want to use. We find the theme based on this variation name, and then we set the value in context so that it can be used by any resolver that needs it. Then in our types, here we have our product type with the background color, foreground color, and image fields. And then we also have our image type with the URL field. In GraphQL Ruby, in order to resolve these fields, all you have to do is define a method that corresponds to that field name. So to use our theme directive, what we have to do is first fetch the theme from the context that's been provided to us. And then we find the theme for that product and return the correct background color that's contextualized by this directive.
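The flow just described (read the directive argument, look up the theme, store it in context, and have field methods read from context) can be sketched in plain Ruby. This is a stand-in for illustration only: it does not use the GraphQL Ruby gem, and `THEMES`, `ThemeDirective`, `ProductResolver`, `run_query`, and the color and file values are all hypothetical.

```ruby
# Hypothetical theme data keyed by the variation name.
THEMES = {
  "DARK"  => { background: "#000000", foreground: "#FFFFFF", image: "dark-placeholder.png" },
  "LIGHT" => { background: "#FFFFFF", foreground: "#000000", image: "light-placeholder.png" },
}.freeze

module ThemeDirective
  # Mirrors the resolve step: read the variation argument, find the theme,
  # and store it in the per-query context before the fields are resolved.
  def self.resolve(arguments, context)
    context[:theme] = THEMES.fetch(arguments[:variation])
    yield
  end
end

# Each method corresponds to a field name and reads the theme from context.
class ProductResolver
  def initialize(context)
    @context = context
  end

  def background_color
    @context[:theme][:background]
  end

  def foreground_color
    @context[:theme][:foreground]
  end

  def image_url
    @context[:theme][:image]
  end
end

# Simulates resolving a product page query under the given theme variation.
def run_query(variation)
  context = {}
  ThemeDirective.resolve({ variation: variation }, context) do
    product = ProductResolver.new(context)
    {
      backgroundColor: product.background_color,
      foregroundColor: product.foreground_color,
      image: { url: product.image_url },
    }
  end
end

puts run_query("DARK").inspect
puts run_query("LIGHT").inspect
```

Because the theme is written to context once, every field method sees the same value, which is the property the talk was after when it rejected per-field arguments.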
For the URL, we can do the same thing. Because the value is in context, we can resolve it here too. We fetch the theme from context and then we resolve the image based on it. And this works a lot like how the @inContext directive does. We find the country code argument, we set the value in context, and then any field that needs to use that context can do so. So in summary, what did we learn? First, we introduced the problem of contextualizing our GraphQL APIs as they evolve over time, and we learned that when we're solving a difficult problem, we should try different approaches and weigh the pros and cons. It's something I encourage you all to do, because directives didn't stick out as our first option. In fact, it was the very last option we explored, and it was good that we looked into these different options to land on the one that fit our requirements at the time. We also learned that directives worked in our case because they gave us a way to change the behavior at runtime. We built a custom directive and learned how easy it is to do. It's another tool in our toolkit that we should consider when the need arises. Thank you so much for listening. I hope you found this talk helpful, and I look forward to your comments and questions.

Thank you so much, Alex and Lanna, we appreciate you. I think the next person on stage is Theo. I'm trying to keep it going so that we have enough time at the tail end for questions during the AMA hour. Theo, let me see if you're on. If you are, please unmute and turn your video on. And I'd love you to just say hi to the people who've joined us as I get your talk ready.

Hi Marcy, hi everyone on the stream. Yeah, glad to be here. My name is Theo, I'm a dev manager at Shopify. I lead a mobile team here working on the main product, and I'm based in Ottawa.

9. Optimizing GraphQL for Mobile

Short description:

Today, we will cover GraphQL performance on mobile, measuring and interpreting performance, pagination design, querying strategies for new fields, and parallel querying. Optimizing for mobile is crucial due to slower and unreliable networks, scalability and faster shipping, and user scrutiny on loading times.

Fantastic, good stuff. So let's just jump right in. I'll share my screen again. Hi everyone, my name is Theo. I'm a dev manager at Shopify. I lead a team of iOS and Android developers. And today I wanna go through some of the GraphQL querying strategies we've discovered through the years at Shopify. The first thing I want to show you is query management in Shopify Mobile. GraphQL is a fantastic language for mobile clients, specifically because of type safety and performance. It has a fantastic synergy with said clients. So if you haven't tried it already, I encourage you to do so.

What are we gonna cover today? So, GraphQL performance on mobile, aka why we're doing this and some of the intended outcomes we're trying to achieve through querying strategies. We're also going to go over how to measure performance accurately and interpret the results, how to design with pagination in mind, and querying strategies for new fields, such as when you're rolling out a feature. And we'll wrap up the talk with parallel querying.

So why optimize on mobile? It seems very obvious, but the first reason is to optimize for slower and unreliable networks. We don't have the luxury of always having a stable network, and there may be sudden drops when people go into the subway, for example. So we need to leverage every bit of the querying layer, whether it's through cache or just performance, to make sure we always have something to display, or at least try to come close to that ideal. It also allows us to scale and ship faster. If you establish a strong pattern right from the start in your querying layer, you don't have to worry too much about new fields to load on every screen, so this should result in scaling and shipping faster on your app. Also, there's usually bigger user scrutiny on loading times on mobile, just because users are consuming content at a faster rate because of the screen size, and they end up scrolling, searching, and interacting more. So this is also important in terms of UX.

10. Measuring TTI and Querying Strategies

Short description:

Now let's discuss TTI (Time to Interactable) and its importance in measuring user interaction on mobile apps. TTI combines querying time and rendering time, and establishing thresholds for each product is crucial. Factors like loading times, network performance, rendering time, device model, and view lifecycle impact TTI. Pagination is an effective querying strategy for infinite lists, ensuring good performance and scalability. Safely rolling out new fields in a mobile app requires considering multiple live versions and avoiding missing data issues.

Now that we've talked about why it is important to improve the querying layer, let's talk about how we're gonna measure our gains. What we use at Shopify is called TTI, which stands for Time to Interactable. It measures the time from the moment we start loading the screen to the first time the user can interact with it. So essentially it's querying time plus rendering time. We have some strategies on mobile to make it a little bit smoother and disguise some of the loading times, but for the sake of the engineering approach we're gonna take today, we're excluding some of those UX possibilities. We're just gonna talk purely about the time from when you instantiate your screen, view controller, or fragment to when it is actually rendered and interactable.

So when we're talking about TTI, we need to establish some thresholds, and they're different for every product. You might not expect the same thing depending on the content you want to display in your app. People might have a better tolerance for loading times when it comes to larger content, such as videos and images. For text, it might be a little bit different. So this is something to keep in mind as well: through the examples we're going to go through, you need to interpret them in terms of your own product and app.

We also want to build accurate instrumentation, and there are a lot of factors that can make your TTI wrong when it comes to interpretation. So we're going to go through this. First, talking about thresholds, this is a simple model where we score from one to zero based on how much time it took to load the page and make it interactable. We get one if it's under one second, we get 0.5 if it's between one second and two seconds, and zero above two seconds. The average of those is the query score we get. So this is how you measure pretty much every screen. Again, the thresholds are very different for each product, so keep in mind that the highest score should represent your ideal. I advise you to be very aggressive. Usually, the more aggressive you are, the more nuanced the query score is gonna be and the bigger the room for improvement you're gonna see. Again, here we have only three thresholds, but you can have more in your product.
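The threshold model above can be written down as a small scoring function. This is a sketch under the thresholds quoted in the talk (under one second, one to two seconds, above two seconds); the function names `tti_score` and `query_score` and the sample timings are hypothetical.

```ruby
# Scores a single screen load from its time to interactable, in seconds.
def tti_score(seconds)
  if seconds < 1.0
    1.0 # under one second: the ideal
  elsif seconds <= 2.0
    0.5 # between one and two seconds
  else
    0.0 # above two seconds
  end
end

# The query score for a screen is the average of its per-load scores.
def query_score(samples)
  samples.sum { |seconds| tti_score(seconds) } / samples.size
end

# Hypothetical measurements for one screen, in seconds.
samples = [0.8, 1.5, 2.4, 0.6]
puts query_score(samples)
```

Adding more thresholds, as suggested in the talk, would only mean adding more branches to `tti_score`; the averaging step stays the same.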

What about performance context? We talked about cellular networks. This is important to take into consideration when measuring TTI. We also need to think about the type of performance we're looking for on every type of network, ranging from 3G to WiFi, as it will impact how you interpret your TTI score. Then there's rendering time versus query time. This is also a big thing. You will probably improve your querying layer as time passes, but it's also important to keep an eye on the rendering time and differentiate the two within the average TTI total. Then again, device model. Not every device is as performant, and that will directly impact rendering time for some, especially if you're displaying complex UI and animations. Then view lifecycle as well. When exactly are you firing the start event and the end event? How often are you doing it? Implementation might vary between clients, between the way you measure it on the web, iOS, and Android. So this is also something to take into consideration when implementing the instrumentation.

Let's jump into our first querying strategy, which is pagination. It's well-known, but we're going to cover it really quick. So this is the order list, which is one of the most important screens in the Shopify mobile app. I've worked on this screen for a few years, and we are displaying an infinite list of orders that merchants need to interact with, whether it's to fulfill or manage them. In this example, we have our first page of 25 orders. Again, it's an infinite list as you scroll down. When you come close to 25, this is when we're going to trigger page two. This gives us very good performance, as we don't have to worry about how many orders on each shop we're gonna be able to support. So this is what it looks like from a GraphQL perspective. You have two parameters on the orders endpoint: the page size, which is gonna be 25, and the page after which you wanna load. That second one is an optional string; it's a page ID. It's gonna be null the first time you call it because you start from page zero. After that, you pass in the page ID you got back in order to load the second page. We also have the field hasNextPage under page info, which tells us if there's a next page to load, to avoid getting the screen frozen when you're at the very last page. It's as simple as that. So I really encourage you to push this to the backend side to make it scalable when you're dealing with an infinite list. Or even a finite list; it will always make the whole implementation faster. And you don't have to worry so much about scalability if, let's say, your business model changes, and now instead of a fixed list of 50 items you have an infinite number, or a way bigger number that you can't really support on mobile clients. So again, an infinite list with great performance can be achieved through pagination.
And if you do it right the first time, you get scalability pretty much forever, unless each item's fields get too large, and then you'll need to rethink either your UX or your query.
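The paging flow described above (a page size, an optional page ID that is null on the first call, and a hasNextPage flag) can be sketched against a fake in-memory backend. Everything here is hypothetical and for illustration only: `ORDERS`, `fetch_orders`, `load_all_pages`, and the cursor format are not Shopify's API.

```ruby
# Fake backend data standing in for a shop's orders.
ORDERS = (1..60).map { |n| "order-#{n}" }.freeze

# Returns one page of orders. `after` is nil for the first page; afterwards it
# is the end cursor returned by the previous page (here simply an index).
def fetch_orders(first:, after: nil)
  start = after.nil? ? 0 : after.to_i
  nodes = ORDERS[start, first] || []
  end_index = start + nodes.size
  {
    nodes: nodes,
    page_info: { has_next_page: end_index < ORDERS.size, end_cursor: end_index.to_s },
  }
end

# Keeps requesting pages until has_next_page is false. A real app would request
# the next page when the user scrolls near the end, not in a tight loop.
def load_all_pages(page_size)
  loaded = []
  cursor = nil
  loop do
    page = fetch_orders(first: page_size, after: cursor)
    loaded.concat(page[:nodes])
    break unless page[:page_info][:has_next_page]

    cursor = page[:page_info][:end_cursor]
  end
  loaded
end

puts load_all_pages(25).size
```

Checking `has_next_page` before asking for another page is what prevents the client from requesting past the last page.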

So, querying strategies for new fields. Here we're gonna talk about safely rolling out new fields in the mobile app. We're gonna take a look specifically at two different approaches and compare the pros and cons. You can roll out new fields through the @include and @skip directives, and we're gonna take a deeper look at what the GraphQL query looks like in this case. You can also duplicate queries, and yeah, let's compare the impact of each approach. Something to keep in mind when you're dealing with mobile clients is that you have multiple versions live at the same time in production. In this example, we're gonna add a new field called total price in version 1.1 and remove it in version 2.0. The problem here is that if version 1.1 is still living in production by the time we remove total price, we're likely to have either a crash, a case of missing data, or other unexpected issues. If a field's missing when you're querying, from a GraphQL perspective, the query is likely to fail.

11. Optimizing GraphQL Queries

Short description:

To optimize GraphQL queries, you can use the skip/include approach or duplicate queries. The skip/include approach is easy to implement and scale, but unsafe for work in progress. Duplicating queries is safer for work in progress and immune to field changes, but requires more effort and can become complex. Parallel querying can significantly improve performance by splitting fragments into separate queries and running them in parallel. This approach scales server-side and makes mobile clients faster. However, it requires initial investment and a new querying layer on the client side.

Otherwise you'll have to put extensive fail-safes in place from a mobile client perspective. So this is the total price we're rolling out here. I've taken as an example the field you see on the right, which is gonna be the total price of each order. What we wanna do in the first approach is use the @include directive. We'll have a flag, a Boolean called price enabled, as a parameter in our query, and this is our fragment right now for every row you see: for every order, the price is included if the Boolean flag is true. If the Boolean flag is false, then the price will pretty much be null. This makes your field nullable, and it's very easy to add when you wanna flag something and prevent it from being loaded in your GraphQL query.

The second approach you can take is duplicating the whole fragment. So here I have renamed the base fragment and called it order details with price. I include price directly, without any type of flag here. The flagging is gonna happen whenever you're querying. So I created a second query for that, which calls this new fragment. And at execution time, you'll pretty much select each query based on the proper flag. So what are the pros and cons of each approach? Skip/include, we've seen it. It's super easy to implement. It's pretty much a one-line type of job. On the duplicated query side, it's a little bit harder. You have to duplicate your query. You may also have to set up a more involved flagging system. So yeah, bear in mind that we took an example of a very simple query, but as it grows, it might be more difficult to duplicate. Skip/include has a big con: it's unsafe for work in progress. If the field gets renamed, moved, or removed altogether, again, you're looking at a crash because of versioning. Duplicated queries are safe for WIP because technically, in production, if you've done your work with your local feature flag, you never query for the field. So we know it's never in production unless we're ready. On the skip/include side, you have an easy rollout. Same thing for duplicated queries; I think if you've done your work right with your local flag, it shouldn't be too hard. Skip/include is easy to scale. You can have multiple skips and includes within the same query and rely on different flags. For duplicated queries, it's a little bit harder. It's exponential, because you'll need a query for every combination of features, meaning that if you have two flags, for example, instead of two queries you'll need four: one where both flags are true, one where only flag one is true, one where only flag two is true, and one where all flags are false.
So this makes it a little bit hard, and yeah, if you have three or four of these, it becomes even more complicated. So this might work only when you're shipping one feature at a time.
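To see why duplicated queries grow exponentially, here is a small sketch that enumerates every flag combination a client would need a separate query for. The function name `query_variants` and the flag names are hypothetical.

```ruby
# Returns one hash per query variant, mapping each flag to true or false.
# With n flags there are 2**n variants.
def query_variants(flags)
  [true, false].repeated_permutation(flags.size).map do |values|
    flags.zip(values).to_h
  end
end

two_flags = query_variants(%i[total_price new_badge])
puts two_flags.size
two_flags.each { |variant| puts variant.inspect }

# Each extra in-flight feature doubles the number of queries to maintain.
puts query_variants(%i[total_price new_badge tags]).size
```

With the skip/include approach, the same features would instead need one query with one Boolean variable per flag.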

Skip/include pretty much breaks with any field change, as we mentioned. If you move a field, rename it, or delete it from a backend perspective, then yeah, it might cause a crash or just missing data from a client perspective, depending on how you're handling errors. Duplicated queries, on the other side, are immune to renames and deletions. So if you have very large teams, like we have at Shopify, this might also be the best path forward to make sure that your client is not gonna be impacted in production.

Now that we've talked about rollouts and pagination, we can have a look at parallel querying. We're gonna keep our same example, but this time we're gonna be using the detail screen for each order. In the Shopify mobile app, if you've ever used it, this screen is very big. We load a ton of information: the timeline, the payment card, the list of line items, which again can be very large, the list of fulfillments, the list of returns, and so on, just to mention a few of them. The issue we had with this implementation is that because the query kept getting bigger and bigger, we were facing major loading times, and we were already witnessing that on WiFi. So we were getting worried that the app at this point might be breaking down under certain scenarios with a slower network. We didn't believe it was acceptable to make our merchants wait five seconds on each load, especially because this is one of the most used features in the app. So this is the initial query we had. Again, it's not completely accurate; we have way more fragments than that. Each fragment was pretty large, and we were loading many different things at the same time. The approach we took from this initial query is to split every fragment into its own query when possible. So instead of the example we have here, which is five different fragments within the same query, we went for five queries with one fragment each. Then we run them in parallel. So it's all async. We don't chain them. And then we assemble the results once we've collected enough to display. It looks like this, pretty much. Again, every fragment's been broken out into its own query, and the result is that, if we consider that we started from a two-second total loading time, our new loading time is the highest among the queries. That also gives you a perspective on which query is the longest. And you can maybe start thinking about partial rendering so that you can go even lower in terms of time to interactable.
Again, time to interactable is not the completion of all the queries but the point when the user can start interacting. So this might offer you a different perspective as well. Here we go directly from two seconds to 800 milliseconds, which is a major gain, but you can go even lower by splitting each fragment further into multiple queries.
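The effect of splitting one large query into several parallel ones can be sketched with threads and simulated latencies. This only illustrates that the total wait approaches the slowest query rather than the sum; `FRAGMENT_LATENCIES`, `fetch_fragment`, and `load_in_parallel` are hypothetical, and the latencies are made-up numbers, not measurements from the talk.

```ruby
# Hypothetical per-fragment network latencies, in seconds.
FRAGMENT_LATENCIES = {
  timeline: 0.10,
  payment: 0.04,
  line_items: 0.08,
  fulfillments: 0.06,
  returns: 0.02,
}.freeze

# Stands in for one network request that returns one fragment's data.
def fetch_fragment(name, latency)
  sleep(latency)
  [name, "#{name} data"]
end

# Fires one request per fragment without chaining them, then assembles the
# results once every thread has finished.
def load_in_parallel(latencies)
  threads = latencies.map do |name, latency|
    Thread.new { fetch_fragment(name, latency) }
  end
  threads.map(&:value).to_h
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
result = load_in_parallel(FRAGMENT_LATENCIES)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts result.keys.inspect
puts "sequential would take about #{FRAGMENT_LATENCIES.values.sum.round(2)}s"
puts "parallel took about #{elapsed.round(2)}s"
```

A partial-rendering variant would display each fragment as soon as its own thread finishes instead of waiting for all of them, which is how time to interactable can drop below the slowest query.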

To summarize this approach, you scale server-side so that mobile clients don't have to, and this results in making mobile clients faster. Parallel querying moves the load server-side. We believe it's a better approach just because scaling server-side is easier than scaling on the client side. The problem with this approach is mostly that it requires investment at the start, when you're transitioning to it. On the client side, you'll need a whole new querying layer. It will likely require extensive tests and probably a way to mock the chain of possible querying events, such as untimely failures and partial successes. It might be a little bit hard, but it's definitely worth it. It also helps to balance and distribute the performance drops on each endpoint.

12. Efficient Query Distribution and TTI

Short description:

Instead of blocking the entire thread with a gigantic query, distributing the efforts can balance server loading. Syncing with the backend first ensures no performance issues. TTI is determined by query loading time and rendering time.

So instead of pretty much blocking an entire thread on the backend side with a gigantic query, you distribute your efforts. So this might actually make the server load a little bit more balanced on the backend side. Again, it really depends on each endpoint that you're querying. So if you go down this road, the way we've done it at Shopify, at least on the mobile team, is to sync with backend first to make sure that this won't cause any performance issues on the backend side. And your new TTI, like we mentioned, is the highest query loading time plus the rendering time. So it gets shaved quite a lot, especially because query loading time in our case was way, way bigger than rendering time.

13. Greg MacWilliam on Building Tomorrow's API Product

Short description:

Thank you for listening to me and feel free to reach out if you have any questions or want to discuss parallel querying. Greg is our next speaker, and he will talk about building tomorrow's API product today. At Shopify, schema releases are on a quarterly schedule, while features are developed on longer time horizons. Meta-fields are custom key-value pairs that can be attached to major core resource types. Adding a new parent type creates challenges in representing future values in schema releases. Strategies like unimplemented placeholder types and simple nullability can help address these challenges.

That's pretty much it. Thank you for listening to me, and feel free to reach out if you have any questions or if you want to discuss parallel querying, mostly because it's a fairly new approach that we've been trying and we are likely to refine it. I encourage you to comment on it, and yeah, thanks again.

I see claps all around. Thank you so much, Theo. That was amazing. I'd just love to reiterate, if you have any questions, feel free to drop them in the chat. We are going to have a proper AMA at the tail end, during the last hour of this workshop, and all our speakers will be there to answer your questions. And if you'd like, we can also bring you on stage if you'd love to ask your question in person. But I think, let's move on. Two more talks, and if I'm not wrong, Greg is our next speaker. And before I load up your talk, Greg, feel free to go off mute, say hi to everyone, and then we'll kick off.

Hi, everyone. I'm Greg MacWilliam, I'm on the content and data extensions platform team, working on bringing extensible data to Shopify core, and I look forward to sharing some thoughts and notes with you.

Fantastic, all right, thank you, Greg. Greg is still around, and he'll be there even though the talk is pre-recorded, so feel free to engage him in the chat. But let me share my screen now and kick us off.

Hello, GraphQL Galaxy. My name is Greg MacWilliam. I'm a staff engineer with the content and data extensions platform team, and I'm here today to talk about building tomorrow's API product today. So we're probably familiar with the pattern of release early, release often. We do this all the time. Code enters the main branch, and when we're building APIs, that translates into a new schema release. So we deploy to main, that updates the application, updates your schema, and we have a cutover to a new schema to go with the new application code. We might do this multiple times a day, day after day, so we do this twice on Wednesday, once on Thursday. We don't deploy code on Friday because, you know, weekend. At Shopify, we're more along the lines of release late, release right. What I mean by that is our schema releases are on a set schedule. We do quarterly schema releases. Once a quarter, we'll have a new officially supported public schema that comes out. By comparison, our features are in active development on very long time horizons. So we might actually start putting a new feature onto the main branch at, say, the beginning of 2021, and that feature might take until, you know, the fourth quarter of 2021 before we actually release it. And when we do release it, nothing says that we actually make it available via the API. It could just be released as an admin-only feature, so you can use it inside of our admin screens and inside of our Liquid APIs, but we don't actually provide it through a public GraphQL API for potentially additional quarters, until we get everything ironed out and are ready to support API access to it. So this creates an interesting situation where we have application data that is moving at a different pace than our schema. So let's talk about that. And to do that, let's look at meta-fields.

So what are metafields? Metafields are custom key-value pairs. They can be attached to many major core resource types such as a shop, a product, and an order. And metafields allow native types to be extended with custom attributes defined by the merchant. So if you're a merchant selling hockey sticks, this is how you would go and put a flex rating on your hockey stick products: set a metafield for it. So, metafields belong to a finite collection of native parent types in any given schema release. What happens if we then add a new parent type? So we're seeing an example schema here. It has a key and a value for a metafield. And then this parent resource, we're saying, is a shop, a product, a product variant, or an order. And this is what we've actually released to the public. Now, behind the scenes, we are doing internal development and we update our application to now allow customer to be a metafield's target. So this creates an interesting situation. Our database now holds new parent values, but those values have no representation in public schemas. So our database holds product and order references as parents. Now, all of a sudden, our database may also hold customer references. However, older schemas don't actually have the customer type included in the metafield parent resource. So if we were to try to take one of these newer values and represent it in an older schema, we end up with one of these runtime object errors where customer is not a possible type for metafield parent resource. So this is tricky. We need strategies for representing future values in schema releases. An unimplemented placeholder type is one option. So, with an unimplemented placeholder, we build something called unimplemented resource. It has nothing but an ID, because we know we will always have an ID field on new objects that we make available. And so this unimplemented resource is simply a placeholder, which says new values have a way to be represented in older schemas. 
You won't be able to interact with them in a meaningful way. However, they will not produce errors. This is one option. There is a simpler one. Simple nullability can limit the impact of these potential errors. So, before, we were looking at parent resource as a not-null field. It was guaranteed to be present. And so just making this nullable doesn't necessarily clear up the error that we saw before. However, it does localize it, and localizing it is much more useful for long-term development. So actually, let's talk about nullability, because this is a pretty important one for planning out schemas and supporting them long term. For me, a not-null field does not necessarily promise that a value exists as much as it promises that the value will resolve without error. Case in point, with what we were just looking at: we might know that a metafield always has an owner type, but we may not necessarily be able to resolve it without error in all schemas, at which point nullability helps. So, null bubbling was really responsible for some of the conflict that we had.
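As a rough illustration of the placeholder idea, here is a minimal Python sketch. It is not Shopify's implementation: the version labels, the `PARENT_TYPES_BY_VERSION` table, and the `resolve_parent_type` helper are all hypothetical names made up for this example. It only shows the fallback decision: if a stored parent type is unknown to the schema version being served, represent it as the placeholder instead of raising a runtime type error.

```python
from dataclasses import dataclass

# Hypothetical table: which parent types each released schema version knows about.
PARENT_TYPES_BY_VERSION = {
    "older": {"Shop", "Product", "ProductVariant", "Order"},
    "newer": {"Shop", "Product", "ProductVariant", "Order", "Customer"},
}


@dataclass
class Record:
    """A stored parent reference: its application type and its ID."""
    type_name: str
    id: str


def resolve_parent_type(record: Record, version: str) -> str:
    """Return the GraphQL type name used to represent `record` in `version`."""
    if record.type_name in PARENT_TYPES_BY_VERSION[version]:
        return record.type_name
    # The stored value is newer than this schema. Fall back to the
    # placeholder (which exposes only `id`) instead of failing.
    return "UnimplementedResource"


customer = Record("Customer", "gid://shopify/Customer/1")
print(resolve_parent_type(customer, "older"))
print(resolve_parent_type(customer, "newer"))
```

Under these assumptions, the same database row is served as `UnimplementedResource` by the older schema and as `Customer` by the newer one, so clients on the older schema still get an ID without an error.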

14. Null Bubbling and Nullable Fields

Short description:

Null bubbling occurs when a low-level field violates its non-null contract, causing its parent object to be invalidated. This invalidation chain can bubble up to the root, resulting in the loss of the entire payload. To address this, we can strategically place nullable fields to localize errors within the graph. Not-null values are suitable for leaf values guaranteed by a database constraint, while nullable fields provide flexibility for future schema development.

What is null bubbling? Null bubbling says a low-level field violating its non-null contract invalidates its parent object. So in this example that we're looking at here on the money type, let's say that the amount decimal field resolves as null for whatever reason. So the parent is invalidated into another null value, which may then violate the non-null contract of its parent, and this invalidation chain may bubble all the way up to the root. So that one amount decimal that turned out as null could go up and invalidate its parent, and that invalidates its parent. And we go up and up and we get something that looks sort of like this. Null bubbling may invalidate all resulting data in response to isolated field errors. So we had that one error on the amount field at a fairly low position inside of the document tree. And yet what we end up seeing in the resulting data is that we lose the entire payload, because that one low-level error bubbled all the way up the tree. Everything was promised as being not null, and then everything was invalidated as a result of that. So this isn't ideal. And so what we would like to do is halt the bubble strategically. Bubbling stops upon the first nullable scope that's encountered. That means strategic placement of nullable fields allows resolution errors to be localized within the graph. So in this case, if that amount decimal were to be null, we could at least contain it like this. By allowing null fields, errors are localized without blocking resolution of the remaining document. So here we've seen that same error occur on the amount field. However, instead of invalidating the entire document, the only thing that we lose is the price range field, which had that null value bubble up a couple of layers to go and invalidate it. But it still allowed us to resolve the parent product and the root query that we're after. This is definitely better. 
So in general, not null values are most appropriate for leaf values that are guaranteed by a database constraint. Hard coded namespace constructs. These would be objects that we're building inside of a resolver that we know we're going to be able to produce. And then codependent objects where one successfully resolved guarantees that the other will also resolve. Otherwise, nullable fields provide flexibility for future schema development by leaving nullable holes in places where we're not sure what's gonna happen here. It gives us a lot of ability to innovate in the future and kind of work around these flexible contracts as opposed to being locked in by rigid guarantees for data.
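The bubbling behavior described above can be simulated in a few lines of Python. This is a simplified sketch rather than a real GraphQL executor: it ignores lists and the accompanying `errors` entries, and the field names (`priceRange`, `minPrice`, `amount`) and the `build_schema` helper are illustrative assumptions. Each field is described only by whether it is nullable and which child fields it has.

```python
class NonNullViolation(Exception):
    """Raised when a field declared non-null resolves to null."""


def field(nullable, **children):
    """Describe a field: its nullability and (optionally) its child fields."""
    return {"nullable": nullable, "fields": children or None}


def complete(value, spec):
    """Complete `value` against `spec`, bubbling nulls up to the nearest nullable field."""
    if value is not None and spec["fields"]:
        try:
            value = {name: complete(value.get(name), child)
                     for name, child in spec["fields"].items()}
        except NonNullViolation:
            value = None  # a non-null child was null, so this object becomes null
    if value is None and not spec["nullable"]:
        raise NonNullViolation()  # keep bubbling to the parent
    return value


def build_schema(price_range_nullable):
    money = field(False, amount=field(False), currencyCode=field(False))
    price_range = field(price_range_nullable, minPrice=money)
    product = field(False, title=field(False), priceRange=price_range)
    return field(True, product=product)  # the root data is always nullable


raw = {"product": {"title": "Hockey stick",
                   "priceRange": {"minPrice": {"amount": None, "currencyCode": "CAD"}}}}

print(complete(raw, build_schema(price_range_nullable=False)))
print(complete(raw, build_schema(price_range_nullable=True)))
```

With every field not-null, the single null `amount` wipes out the whole result. Making only `priceRange` nullable stops the bubble there, so the product and its title still resolve.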

15. Scoping Metafield Values

Short description:

Metafield values can be represented as strings or structured formats. By implementing a dedicated scope, we can maintain the singular intent of the value field while allowing for future growth and additional access paths.

Let's also talk about scoping. So continuing with the metafields example for a minute here, metafield values are strings. However, additional accessors may represent these values in structured formats. So looking at our metafield type, it has a key, it has a value, the value is a string, and that string may be hello world, or it could be the GID of a Shopify product reference. And so in the case that this is holding a reference, we actually build this reference object type here. And this is the thing that allows us to get a rich selection off of the metafield reference itself. So instead of just getting a GID string when we fetch a metafield, and then having to go and make a second request to get the actual object that it references, we're able to get that at once. We can get the GID as the value, and then we can also make a rich selection off of the shop, product, product variant, or order that's being referenced. So that's useful. As new structured value types are implemented, though, these access patterns may become confusing. So this is a scenario that we're kind of noodling on right now. None of this is final, but we are thinking about how we implement metafield values as arrays. And so we know that we'd need some kind of array connection that we would give you, and then that might be confused with what value is as a plain value. And then, of course, reference is still floating around. In general, there's a lot happening here. And we're starting to lose track of, you know, just how do we actually go and get the value off of this as we add these additional access paths. And so a dedicated scope could help here. The singular intent of the value field is maintained while opening up value implementation paths. So here, the metafield value is a singular field. We always know how to get a metafield value because there's only one value. And then that value object is a dedicated scope where we can go and get specific implementations. 
And the scope is kind of structured in a way that it anticipates future growth. And so in general, giving scopes room to grow is useful for future development.

16. Rebecca Friedman on APIs and Infrastructure

Short description:

Let's consider two mutations as a simple example. The second mutation has room to grow, allowing for structured errors and additional artifacts. Anticipating future region types, the one-of pattern is a versatile solution that works today. Versioning applications independently from schemas requires constant planning and consideration. Nullable fields offer future flexibility, and intermediary object scopes provide space to grow. Thank you all and enjoy GraphQL Galaxy!

Let's consider these two mutations as a really simple example of this. Which one of these has room to grow? So in the first one, market create one has an input that it receives, and then it returns the market object that was built. Market create two here receives that same input, and instead of returning the market object directly, it returns a market create payload, and that payload offers the market object as a field. So the second one might seem more verbose and just needlessly heavy on schema. And yet the advantage of the second one is that it has room to grow. We might end up in the future realizing that we want to give out errors that are specific to this transaction, and this gives us a place to give structured errors. We might want to give out additional artifacts that are built in conjunction with the creation of a market, and we again have a place to do that. If we were just giving out the market directly, then we sort of lose the ability to go and make the results of this transaction smarter over time. So this kind of goes into also thinking about how we do inputs in the long term. How do we structure our code in a way that we can receive inputs that grow with our application? And so, we were talking about markets in that last example, so let's keep talking about markets. What are markets? A market configures settings for selling internationally. It configures currency, language, domain, et cetera, and a shop can have many markets and strategically configure the settings per region target. So each market may target one or more geographic regions together, and market regions are countries, but this may eventually vary by grain, to extend down to provinces and zip codes, or upwards to things that are larger than a country, like blocs. So what we need to do is anticipate the submission of future region types, but we don't have or need input unions, because we have one-of patterns. 
So what we're trying to do is, we want to say that today we are receiving country codes, and a market region is only a country. It will only receive country codes, but we know in the future that we might want to submit province codes. We might want to submit state codes. We might want to submit zip codes. And so we want to design a structure for input that will anticipate scaling with this need. So this one-of pattern is great because we say we're going to set up one of these inputs. We can put as many unique input keys as we want on it. And then we just write a validation rule that says this input will respond to exactly one of these types of input. So today we can be creating markets that contain nothing but countries, but in the future a country plus a province plus a zip code could compose a market targeting strategy. So why do we like the one-of pattern? It is more versatile than the input unions proposal. Members may include input objects, enum values, scalars, or lists. The real beauty of it is it works today using plain GraphQL structures. Today, we can enforce this pattern using server-side validations. And in the future, if the one-of spec is adopted, then this would become a GraphQL-level validation and we could even remove the server-side validations that do this type of thing. You can read the spec, check it out. We use it a lot. It is one of our go-tos for designing inputs that anticipate future growth.
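The server-side validation behind the one-of pattern can be sketched in plain Python. This is only an illustration of the rule "exactly one key is set": the key names (`countryCode`, `provinceCode`, `zipCode`) and the `validate_one_of` helper are assumptions for the example, not Shopify's actual input fields or code.

```python
# Hypothetical keys for a market region input. Today only countryCode would be
# offered; the other keys stand in for region types that might be added later.
REGION_KEYS = ("countryCode", "provinceCode", "zipCode")


def validate_one_of(input_obj, keys=REGION_KEYS):
    """Return the single key that is set, or raise if zero or several are set."""
    unknown = sorted(set(input_obj) - set(keys))
    if unknown:
        raise ValueError(f"unknown keys: {unknown}")
    provided = [key for key in keys if input_obj.get(key) is not None]
    if len(provided) != 1:
        raise ValueError(f"expected exactly one of {list(keys)}, got {provided}")
    return provided[0]


# A market may target several regions; each region input sets exactly one key.
regions = [{"countryCode": "CA"}, {"provinceCode": "ON"}, {"zipCode": "90210"}]
print([validate_one_of(region) for region in regions])
```

Because each new region type is just another optional key, adding one later does not change the shape of existing inputs. Only the list of accepted keys grows.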

So all said here, all this is in the mission of making commerce better for everyone, and that everyone includes ourselves, product engineers. So we want to go and build the best experience for other developers and for ourselves coming in to innovate, and make this flexible and scalable for future use cases that have not yet been conceived. So to recap, versioning applications independently from schemas is hard. It takes constant planning and many watchful eyes. We have very large review panels and a lot of consolidation of opinion before we go and commit to a new direction. These APIs need a lot of love to be steered in a direction that will be valid for years to come. Nullable fields are more appropriate, more often than we would expect, and they offer future flexibility when incorporating new features. We err on the side of making things nullable because we don't necessarily know what the field will need to do in the future. And then adding intermediary object scopes provides space to grow in both input and output contexts.

So thank you all. Enjoy GraphQL Galaxy, and we'll see you later at the panel session. Awesome stuff. Thank you, Greg. That was fantastic. I see a lot of comments, a few questions in there. We are going to get to that during the AMA. We only have one more of the five topics remaining, so let me move to that. Last but definitely not least, Rebecca is going to be our speaker here. Rebecca, I don't know if you're on. Feel free to unmute, say hi to everyone, and then we'll kick off your recording. Cool. Hi, I'm Rebecca Friedman. I'm a dev manager at Shopify. I manage our APIs and infrastructure group, so focused on both push and pull APIs. I'll talk more about what our team does in the video. And I'm based in Toronto, Canada, and actually someone from my team is also here. So Guy, why don't you introduce yourself? Yeah, sure. My name is Guilherme Piera. I work with Rebecca on the API patterns team. I don't have a talk in this workshop, but I have a lightning talk on December 9th. Yeah, I invite you to have a look at it. I'll be talking about other uses of GraphQL besides HTTP APIs. Awesome. Thank you. Thank you, Guy.

17. Rebecca Friedman on Managing Large GraphQL APIs

Short description:

Hello everyone. My name is Rebecca Friedman and I'm a dev manager at Shopify. Today, I'm gonna talk to you about some of the work that our API patterns team has done to make it easier to manage large GraphQL APIs. We identify and establish best practices for GraphQL API design, but we also own some foundational code that empowers developers to design and build those APIs. We review the GraphQL schema changes and assess the quality of the new types and fields being added. We also review the schema implementation, ensuring no missing tests and assessing the overall API design. To streamline the review process, we built a GitHub app called GraphQL Schema Bot that leverages GitHub Actions to identify and flag common GraphQL issues in schema-changing pull requests. This allows us to focus on the overall design and not get distracted by manual tasks. Our app performs a series of actions, such as linting the schema, creating GitHub issues for updated descriptions, and integrating a linter as a check in our CI pipeline.

Thank you, Rebecca. Let me share my screen just now. And this will be the final of the five talks, and then we'll kick off our AMA.

Hello everyone. My name is Rebecca Friedman and I'm a dev manager at Shopify. I currently manage our APIs and infrastructure group. And today I'm gonna talk to you about some of the work that our API patterns team has done to make it easier to manage large GraphQL APIs. So I'll take you back in time and talk through some of the hurdles we've faced and how we've overcome them.

We mentioned before that there are over 5,000 types, enums, inputs, unions, and scalars in our schemas. That's a lot. And with all the developers at Shopify, and there are many of us, that means that the APIs are evolving quickly and that different pieces of the API are evolving simultaneously. So the mission of our API patterns team is to advance API best practices, make them more user-friendly, and evolve foundational infrastructure for API creators. So we do several things. We identify and establish best practices for GraphQL API design, but we also own some foundational code that empowers developers to design and build those APIs. We have our own layer of abstraction on top of the GraphQL Ruby gem that provides convenience methods for common code patterns like defining and validating global IDs.

So how are we involved in API design and development? Well, we don't usually build APIs ourselves but we sit at the intersection of a lot of different teams who build APIs. We aren't gatekeepers of the APIs, we're stewards. So we trust you to build the API yourself but we're like hovering in the background to make sure that your API is well designed and consistent with our existing APIs. We're also proactively looking for trends in how APIs are being developed and the types of design decisions that teams need to make. And then we use those to establish and formalize best practices. From the start, we've been providing feedback to developers through technical design like doc reviews but also through code reviews.

Our team has a dedicated ATC person each day who is responsible for providing support to internal teams. ATC is Air Traffic Controller and our ATC will escalate or triage issues that come up and review PRs and then keep an eye on operational dashboards. When we reflected on what specifically we were reviewing in PRs, we noticed that there were multiple distinct things. So, first, the schema design. We reviewed the GraphQL schema changes and we use those to assess the quality of the new types and fields being added. So, are they consistent with other types already in the schema? Do the field names make sense? And then second, we review the schema implementation. So that's the Ruby resolver code and the code structure of any new concerns or modules added. We make sure that there are no missing unit or other integration tests. And then third, the whole feature holistically. So, we're assessing the overall API design. Is this a good solution to whatever problem is being solved? What is the experience that a developer will have interacting with this API?

Here's a screenshot from a brainstorming session where we listed the things we were looking for during schema design reviews. So you can see lots of interesting things here. We're concerned about input structure and field structure, the names of mutations and the names of types and the quality of descriptions for everything, if fields have the right types, and especially if they're using custom scalars when they should be. If people understand polymorphism and if they're using unions and interfaces correctly, and the ever-present question of how are people using nullability? So reviewing PRs seemed like a great way to make sure that the APIs were evolving in a direction we were happy with, but there were a few issues with it.

So you can see from our brainstorming session that we had all these mental checklists that we were running through. So these were all manual, repeated tasks we were performing, which is an indication that there could be an opportunity for automation somewhere. And then the quality of the reviews was inconsistent across the team, sometimes just due to the tenure of the reviewer on the team, right? So someone who'd been on the team a while had stronger opinions or could see hints in PRs that pointed to underlying issues. And then PR review was only part of the ATC role. So if you were flooded with other high-priority work, the PRs were relegated to a quick check that was more like, are you doing something terrible here? If not, it's probably fine to ship. The next problem was that our team didn't scale at the same rate that Shopify scaled. So more developers meant more PRs for each ATC to review, and the volume was growing out of control. I think there was also this self-imposed pressure to work off the whole queue of open PRs during your ATC shift, and at the same time to not miss anything important in all those reviews. So that made the ATC rotation stressful. There's also always the bus factor, which is, what if someone gets hit by a bus? There was so much tribal knowledge on our team about how to design good APIs, and we were disseminating it through PR review. That tribal knowledge grew out of years of being stakeholders for other teams. And then as folks would leave our team or get hit by a bus, which is the same thing, clearly, we would likely see a drop in PR review quality. And then the last point, I think the tipping point, was that our team was unhappy because we knew that we were missing bigger design concerns because we were using our time to chase down these more basic issues.

So here's what we did. We built a GitHub app that leverages GitHub Actions to identify and flag common GraphQL issues in schema-changing pull requests. So that's a bit of a mouthful, but basically any PR that adds new functionality to the API, or changes existing functionality, produces schema changes. And we commit these schema changes in the PR along with our code changes. So there's a clear opportunity there to leverage the schema as a place to review the whole design. Oh, and our app is called GraphQL Schema Bot. So whenever a PR is opened, we perform a series of actions. We lint the schema. And then if there are any descriptions being updated, we create a GitHub issue for the technical writers so that they can review them. We calculate some stats about the PR. What schemas are they changing? Because we have a few. Are they making incompatible API changes? How many changes are they making? Stuff like that. And then we integrate the linter as a check in our CI pipeline. So we have two checks. The first one is, if there are any incompatible changes being made, we require the PR author to manually acknowledge this. So they would have to hit that approve changes button in the screenshot. So that makes them aware, one last time, of the impact that their change could have on the developer community. And then the second one is, if there are lint violations, we annotate the PR. What we're doing is abstracting away all the tasks that a computer can do better than us. And we're filtering out the noise so that when we do review PRs, we can focus on the overall design and not get distracted. So here's a list of all our lint rules.

18. Analyzing GraphQL Linter and Design Tutorial

Short description:

The GraphQL Linter at Shopify has 16 rules, with five added in 2021. It aims to direct developers towards the best GraphQL API design, but allows for extenuating circumstances and limits false positives. The rules ensure consistency and encourage critical thinking about schema design. Examples include banning UI terms in field names, grouping fields with common prefixes into separate types, and defining mutation return fields with error handling. The GraphQL Design Tutorial, published in 2018 and regularly updated, provides 23 design rules and emphasizes creating sub-objects for closely related fields. The tutorial also discusses the structure of mutation payloads and the importance of error handling.

It's a pretty long list, but I'll point out a few interesting things. Firstly, there are 16 rules here. In 2021, we added five of these. So our linter doesn't change that much. And I think that's partially because there are no controversial rules in it. We have three levels of lint violations. We have warnings, failures, and then notices. So we hardly ever fail PRs when we're linting. Most of the time, we're just adding suggestions. And you can see that here. I've indicated if something is a suggestion or a requirement. The only schema designs we require are: you need to have descriptions that are full sentences, we have a specific format we expect for mutation return fields, and you have to use uppercase letters for enum values. So we use lint failures sparingly, because even though the lint rules are meant to direct you towards the best GraphQL API design, there can always be extenuating circumstances. So we're trying to limit false positives. And we also don't want to hit people with too many errors, because they don't really like that. And also something else about the rules: they accomplish different goals, right? So some of them are just to create consistency in our APIs. We need the uppercase enum values because we want enum values to all look the same. And then there are other rules that are meant to make you stop and think critically about your schema design.

Okay, so here's a simple example, which is the banned user interface terms lint rule. So we all know that naming things is hard. So we created a deny list of UI terms like button, popover, toaster, modal. And we make sure that they don't appear in type, field, or input argument names. So let's take a look at our product type here. Most of the fields look great. The issue here is that we don't actually want a field named URL for modal. And that's because the front end might want to display this thing in some other kind of UI design in the future, right? And then if it's all of a sudden a URL for a button, are we going to change the name to URL for button? Probably not. So it's not really the back end's concern whether this URL is displayed in a modal or a button or whatever. And there's our helpful lint warning. We tried to be a little bit smart with this one as well. So we have a configurable deny list of terms like button, but then we also have an allow list in case there's a valid reason that a team needs to use a term from our deny list in their schema design.
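A rule like this is straightforward to sketch. The Python below is a guess at the general shape, not the actual linter: the deny list contains only the four terms mentioned in the talk, and the word-splitting regex, the `banned_terms` function, and the example field names are assumptions for illustration.

```python
import re

# Only the terms named in the talk; a real deny list would be configurable.
DENY_LIST = {"button", "popover", "toaster", "modal"}


def words(name):
    """Split a camelCase, PascalCase, or snake_case name into lowercase words."""
    pattern = r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+"
    return [word.lower() for word in re.findall(pattern, name)]


def banned_terms(name, allow_list=frozenset()):
    """Return the denied UI terms found in `name`, unless the name is allow-listed."""
    if name in allow_list:
        return []
    return [word for word in words(name) if word in DENY_LIST]


for field_name in ["title", "onlineStoreUrl", "urlForModal"]:
    print(field_name, banned_terms(field_name))
```

Matching on whole words rather than substrings keeps false positives down, and the allow list covers the cases where a denied term is genuinely part of the domain.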

Okay, so that was a pretty trivial example, but let's talk about a more interesting one. This is the common field prefix lint rule. So in the bad example, which is the one on the left, we have three fields in our shop type that share the time zone prefix. So we have time zone abbreviation, time zone offset, time zone offset minutes. And that's a pretty clear indication that these fields could be grouped together into a type of their own. So that's what we do in the good example. We make a time zone type. So the shop has one field called time zone of type time zone, and the time zone type has abbreviation, offset, and offset minutes in it. And this lint rule is really just to nudge you and make you think, are we missing an object type here? Should there be a time zone object? And it's a notice because I think this one probably has the most false positives. So this lint rule is actually straight out of our publicly available GraphQL design tutorial. That's the link. And our design tutorial was published in 2018, after the team had gained some experience and formed some pretty strong opinions about designing GraphQL APIs. So it's grown and evolved over time, including amazing translations from folks in the community. And then the most recent update was made about a month and a half ago. So we're still actively iterating on it. The design tutorial presents a list of 23 design rules, and then it uses the example of building a commerce-related API to work through them, because that's what we know about. So let's look at how the design tutorial articulates the sentiment about sub-objects. I highlighted some parts in green. It's really just reiterating what we said. So you should create a new type that groups all the closely related fields, and your type does not need to have a direct model equivalent. 
And I like that because that's a really great extension of rule number two, which is to never expose implementation details in your API design. Okay, since there are two different lint violation failures related to mutation return fields, I think it's worth digging into those too. Assume that a GraphQL schema has a product create mutation, which I'm not showing. Here's the schema for the associated payload. So you can see here that our payload returns two fields, a product name and an array of errors. So it doesn't really make sense to have top-level errors for user failures, so we include them as part of the response. And then we require that mutations return an errors field, which internally we actually call user errors, whose type implements an error interface. And then the errors field type has to be a non-nullable list of non-nullable objects. And the error interface has to implement a non-nullable string for the end-user error message, and then a list of non-nullable strings for the error field path. And then the errors field type also has to return a code field that's a nullable enum. So here's an example response. So on the left you can see a response for a successful mutation. It's returning an empty list for errors and it's returning the newly created product's name for the product name field. And then an unsuccessful mutation would return one or more error objects and then null for the product name. You can see that our error code here is invalid, which isn't the most useful, but you get the idea.
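Stepping back to the common field prefix rule described above, a minimal sketch of how such a notice could be detected is shown here. It is an assumption-laden illustration: the camelCase field names, the `common_prefix_notices` helper, and the threshold of three fields are made up for the example and are not taken from the actual linter.

```python
import re
from collections import defaultdict


def first_word(name):
    """Return the leading lowercase word of a camelCase field name."""
    match = re.match(r"[a-z0-9]+", name)
    return match.group(0) if match else ""


def common_prefix_notices(field_names, min_fields=3):
    """Group multi-word fields by first word and report the large groups."""
    groups = defaultdict(list)
    for name in field_names:
        prefix = first_word(name)
        if prefix and prefix != name:  # skip single-word fields like "name"
            groups[prefix].append(name)
    return {prefix: names for prefix, names in groups.items()
            if len(names) >= min_fields}


shop_fields = ["name", "currencyCode", "timezoneAbbreviation",
               "timezoneOffset", "timezoneOffsetMinutes"]
print(common_prefix_notices(shop_fields))
```

A heuristic this simple will flag fields that merely happen to share a first word, which fits with the rule being a notice rather than a failure.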

19. Automating API Review and Measuring Quality

Short description:

The GraphQL schema bot has been a valuable tool in enforcing API design rules and providing automated schema linting. By moving the linting process earlier in the development process, teams can validate their API design as they build it. Shopify has stopped reviewing PRs altogether due to the increasing number of PRs and the support load. Instead, they provide real-time support via Slack and Discourse and review PRs when explicitly tagged. The team also meets weekly to share learnings and identify trends in API development. Automating the implementation design review with RuboCop is the next step, and measuring API quality is an ongoing challenge.

Okay, so going back to this, if we look at it again we can see on line two, the return type of product name is listed as non-nullable. And that would actually cause a lint violation failure, since we require all mutation return types to be nullable. So we're very strict about our mutation return types. And we can look at the corresponding rules in the GraphQL design tutorial that inspired this lint rule. So you can see rule number 22: mutations should provide user business-level errors via a user errors field on the mutation payload. The top-level query errors entry is reserved for client and server-level errors. And then rule number 23 at the bottom: most payload fields for a mutation should be nullable, unless there is really a value to return in every possible user, sorry, in every possible error case. That one's a bit of a tongue twister. So our design tutorial is great, but it's also quite long. So having the automated schema linting is a really nice way to remind folks about some of the ideas there.
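The two mutation payload checks can be approximated with simple string checks on SDL-style type strings. This is a hedged sketch only: the `lint_payload` function, the `UserError` type name, and the message wording are hypothetical, and the sketch does not check the error interface itself.

```python
def lint_payload(fields, errors_field="userErrors"):
    """Lint a mutation payload given as {field name: SDL type string}."""
    violations = []
    errors_type = fields.get(errors_field)
    if errors_type is None:
        violations.append(f"missing `{errors_field}` field")
    elif not (errors_type.startswith("[") and errors_type.endswith("!]!")):
        # Must be a non-nullable list of non-nullable objects, e.g. [UserError!]!
        violations.append(f"`{errors_field}` must be a non-null list of non-null objects")
    for name, type_string in fields.items():
        if name != errors_field and type_string.endswith("!"):
            # Other return fields must be nullable so they can be null on failure.
            violations.append(f"`{name}` should be nullable")
    return violations


bad_payload = {"productName": "String!", "userErrors": "[UserError!]!"}
good_payload = {"productName": "String", "userErrors": "[UserError!]!"}
print(lint_payload(bad_payload))
print(lint_payload(good_payload))
```

The non-nullable `productName` is flagged because an unsuccessful mutation needs to return null for it alongside the populated errors list.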

So we built our GraphQL schema bot to be extensible and we've made a few changes over time. I mentioned before that the list of lint rules has expanded. We also extracted the core linting functionality into an internal gem. So that gem contains a command line interface that folks can run from their dev environment to perform linting locally. We realized that we were adding friction to the development process by being this last-minute barrier to shipping PRs. And folks don't appreciate hearing that their API design should change when they feel like they're ready to ship already. So by moving the ability to lint their schemas earlier in the development process, they can validate their API design as they're building their API, or they can build an API first, which is even better. So here you can see, they're exactly the same lint violations that we just looked at, but they're in CLI formatting. So I'm really thrilled with how solid our linting tool is and how much better it's made our team's ATC rotation. But as Shopify continued to grow, the number of PRs continued to grow as well. So even without the distraction of commenting on all the small things, we still couldn't handle the support load. So in June of this year, we did something really crazy. We stopped reviewing PRs altogether. So to contextualize this for you, we were trying to review hundreds of PRs a week as a part-time job for one person. So the numbers just didn't make sense for us. So we're seeing fewer PRs with our no-review approach, thank God, but we're still keeping a pulse on the GraphQL APIs that are being developed, just through other ways. So we're still working with teams one-on-one in the technical design phase. That's before they start actually building their APIs or any of the underlying functionality. We provide real-time support via Slack and Discourse. 
So we try to leverage Discourse as much as possible so that the conversations are documented and easily queryable when people are looking for answers. And we very happily review PRs when teams explicitly tag us for a review. So when we meet as a team weekly, we provide a space for everyone to share back learnings from the discussions they've had with folks on other teams. So this helps us identify trends in how teams are building APIs. It helps us identify new problem spaces. And then it's also a great mechanism for context sharing. Looking back at this whole journey, I think we probably waited too long to start automating. We were concerned about the quality of the APIs if we stepped back, but we didn't realize that the quality of our PR reviews might be dropping anyway. And we recently uncovered another area we can automate, and we're actively working on it now. We're using RuboCop to detect issues in the Ruby code that are invisible to the schema. So I mentioned earlier that we were focused on reviewing schema design, implementation design, and feature design when we did the code reviews. We automated the schema design review with our app, the GraphQL schema bot. So now we're automating the implementation design review with RuboCop, which is cool. I also talked a bit about being concerned about drops in API quality, but to be honest, figuring out how to measure the quality of our APIs is a whole new problem. So I'd love to talk about this more. I can clearly talk about it for a while too. So let me know if you have any questions, and thanks for listening.

QnA

AMA Session and Handling Directives

Short description:

Thank you to all our speakers for a fantastic session. We will now take a 10-minute break before starting the AMA. If you have any questions, feel free to drop them in the chat or raise your hand on Zoom. Rebecca will be moderating the AMA. One question from the chat was about handling multiple contexts with the context directive. The speaker explained that they chose to show a single experience in the Storefront API and used fields in the Admin API to handle different management views. Defaults for directives can be handled in the schema or in the code. The downsides of using directives include initial lack of library support and potential caching bugs. However, debugging has been relatively smooth due to upfront handling of edge cases.

All right, thank you so much, Rebecca. Thank you to all our speakers, that was fantastic. To all the participants who are remaining, thank you for hanging out with us. We still have one hour to go, we are going to take a 10 minute break. Let's have like a water break, a breather, stretch, stand up from your seat, take a walk for five or so minutes, and then let's meet back here at the top of the hour, I think in the next exactly 10 minutes. And then we are going to kick off our AMA for one hour. So the next hour is dedicated to a full-on AMA. All our speakers are here, including Gui, who's going to give a lightning talk next week during the conference itself on the 9th and 10th. So again, if you have questions, feel free to drop them on chat. Otherwise, you can also raise your hand right here on Zoom. I'll keep an eye on that, and we will just prompt you to ask your question live to any of the speakers who is present here. So let me see, are we ready? So Jonathan, Alex, Lana, Theo, Greg, Rebecca, Guy, and I'll be handing over this next one hour to Rebecca. She just gave a final talk. She's also the one who's going to moderate the AMA. So Rebecca, over to you. And yeah, if anyone has questions, put them on chat, we are also going to, you know, just be looking forward to any raised hand and we will call you on stage to just ask your question live. Over to you, Rebecca. Cool, thanks. So we're going to try not to jump around too much with topics because we talked about so many like very different things. I think it makes sense to start with directives because I know there was a lot of interest in that. So Alexander, I'm going to start with your first question from the chat, which was, for the context directive, how do you handle the case where multiple contexts are needed at the same time? For example, a management overview of prices in different countries? And this is for Alex or Lana if she's that. Yeah. I'll just hop into it. And I totally love this question. 
So the reason why we landed on directives is precisely because we didn't want to expose this behavior. One of our requirements in the Storefront API was that we only wanted to show a single experience. So we didn't want to open up a possibility where you could query different countries and price values for a management view. But it's also important to note that one reason why we love GraphQL is it's not a one-size-fits-all solution. So at Shopify, we also have another GraphQL API called the Admin API. And this is where we do all of our management behavior. And we recently released a new field to handle contextual pricing called Contextual Pricing. And instead of using directives, we are using fields here precisely for that reason, where we want to use aliases so that you can say, hey, give me all of the prices for the country of France. And then in another alias, give me all of the prices for the country of Italy. And so again, we evaluated our solution based on the use cases. So for the use case of showing everything at once, we wanted to go with a top-level query directive. For the use case of showing different management views, we used fields. And so again, we didn't have that one size fits all need. So we used different tools. Does that answer it or anything I can clarify? Nope. Yeah, absolutely clear. Cool. OK. So some other questions about directives. I don't think you really talked about how defaults are handled. So that could be interesting. Obviously, I think Greg talked a little bit about backwards compatibility. Teo as well, this is a big theme for us. So how do you make sure you don't break anything backwards compatibility wise? Yeah. So two ways to handle defaults come to mind for directives. The first thing is that, again, one of the reasons why we chose to use a directive in our case is that it's defined in the schema. So you can define defaults and you can be very explicit even if it's required, for example, and you didn't want to provide a default. 
Because we're using GraphQL, we can make those distinctions and we can enforce that API clients behave in that way. But then, of course, let's say you didn't do that or you did want to fall back to a default behavior, such as if you're not including the directive at all. The second way you would handle defaults is in the code. So in our case, if no country is provided, in the code we simply fall back to the default behavior and we have a way to handle it. So I think it's a two-pronged approach: first, you can make sure your schema is explicit about defaults, or you can make your code also have a fallback for this as well. Cool, that's very helpful. Thank you. The last question about directives that I think is top of mind: you showed us the four approaches and you talked about how directives were the clear winner there. What are the downsides of using directives, and how does that manifest itself, I guess also when debugging things? Yeah, I think that there are a few ways to answer this. First, one of the downsides we had when using directives is that it wasn't fully supported by the library at first. So it's not necessarily supported by all tools, but one of the benefits of us building this is we got to contribute to the GraphQL Ruby gem so that we could kind of create the world we wanted to see. So building in that support was a challenge, but one that we hope others will benefit from. Another downside is that, for example, you could experience bugs if you have caching. So you do have to make sure that you're thinking about any kind of argument that's coming in, and that you're aware of how any caching layer gets invalidated. And then the third thing, in terms of debugging, as you asked: we didn't have too many challenges actually, because we defined it in our schema. Again, we're getting a valid country code argument, we're making it required. So we're really handling all of those edge cases up front.
So we don't really have a case where we're creating invalid state. So on the debugging side, it's been pretty good. It's actually been just getting all the support to work in all of our different libraries has been the biggest challenge and downside.
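The two points in this answer, falling back to a default in code and keeping caches aware of the directive argument, can be sketched as follows. This is a minimal illustration in Python, not Shopify's implementation: `DEFAULT_COUNTRY`, `VALID_COUNTRIES`, and both function names are hypothetical, and the dict passed in stands in for the arguments of the context directive.

```python
DEFAULT_COUNTRY = "CA"                # assumed shop default, purely illustrative
VALID_COUNTRIES = {"CA", "FR", "IT"}  # stand-in for a schema enum of country codes


def resolve_country(directive_args):
    """Pick the country for a request: explicit directive argument, else the default."""
    if directive_args is None:
        # The directive was not included in the query at all: fall back in code.
        return DEFAULT_COUNTRY
    country = directive_args.get("country")
    if country not in VALID_COUNTRIES:
        # A real schema would reject this during validation, before any resolver runs.
        raise ValueError(f"invalid country code: {country!r}")
    return country


def cache_key(query_text, directive_args):
    """Include the resolved context in the cache key so contexts never share entries."""
    return (query_text, resolve_country(directive_args))


print(cache_key("{ products { price } }", None))
print(cache_key("{ products { price } }", {"country": "FR"}))
```

Keying the cache on the resolved country rather than on the raw arguments means a query without the directive and a query that names the default country share one entry, while every other context gets its own.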

Theo on Splitting Queries and Directives

Short description:

The complexity of splitting queries into multiple queries manually, without automation tools, is due to the need to identify struggling endpoints and rely on performance measurements. Although there is a popular GraphQL Flutter library, the best library for Flutter application development depends on specific needs. If already in production, building a custom library may be the recommended approach. Additionally, allowing directives to be applied sooner in the GraphQL language spec could help address the pain points of schema changes and field validation for mobile clients.

But also, as I said before, with the previous question is it's not a one size fits all approach. So it worked for this case, but it won't work for all cases. Cool. Thanks.

Before we move on, Jonathan or Greg, do you have any... I feel like you both have very strong opinions about directives. Is there anything you want to add or we're good to keep rolling? I don't have anything specific about directives. Okay. Cool. I'll come back to you when we talk about nullability because I know you'll both have a lot more there. Okay.

So let's jump over to Theo. So just a refresher for everyone: Theo was talking about fast-performing queries for mobile clients. Theo, this question is also from Alexander. Is the split from fragments in a single query into multiple queries something that's being done by hand, or did you create a tool to automate this? Very great question. Thanks for the questions, Alexander. Yeah. It's done by hand. I would say it's mostly for a few reasons. First one being that we need to be able to identify which endpoint is struggling on the back-end side. So that is also a good candidate for a separate query, which is not something you can necessarily find out with a client-side tool. And it's mostly relying on performance measurements as well, like where you're standing compared to the thresholds. Again, I think it's a little bit too complex to encapsulate into a tool at the moment. But this would be really good, I think, as we build with bigger languages on mobile, like Kotlin, Swift, and React Native. I think this would become a very neat feature for GraphQL developers. So TL;DR: too complex at the moment, but yeah, looking forward to this. Feel free to ping me, I'd say, if you wanna work on it together. Cool. Yeah, I had somewhat expected that it would be too complex, but I mean, it looks like you have a great team there, so who knows what you cooked up.
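For readers who have not seen the pattern, here is a minimal sketch of what a hand-split screen can look like on the client, written in Python for brevity even though the talk is about mobile clients. The `fetch` function and the query names `orderSummary` and `orderTimeline` are hypothetical stand-ins; a real client would send two GraphQL requests instead of sleeping.

```python
import asyncio


async def fetch(query_name, delay_seconds):
    """Stand-in for a network call; a real client would send a GraphQL request here."""
    await asyncio.sleep(delay_seconds)
    return {query_name: "ok"}


async def load_screen():
    # The slow part of the screen is split into its own query so the fast part
    # is not held back by it; both requests are in flight at the same time.
    results = await asyncio.gather(
        fetch("orderSummary", 0.01),
        fetch("orderTimeline", 0.05),
    )
    merged = {}
    for result in results:
        merged.update(result)
    return merged


screen_data = asyncio.run(load_screen())
print(screen_data)
```

Deciding which part of the selection deserves its own query is the step the answer describes as manual, since it depends on back-end measurements and thresholds that a client-side tool cannot see.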

The next question for Theo is from Santosh. Sorry if I'm mispronouncing it. Which might be the best GraphQL library for Flutter application development? Honestly, no idea. I haven't done much Flutter. Though, between the time you asked the question and now, I looked a little bit into it. There seems to be a GraphQL Flutter library, which is quite popular. Yeah, just search on the Flutter portal for graphql_flutter. You'll find it. Looks like it has a good rating. I wanna say the answer to this question probably depends on your needs. If you're just trying out GraphQL, then yeah, go with something simple, so you can develop an opinion on it. But if this is for something you already have in production and you're looking to switch, I think the best answer is probably the one you're gonna build. Mostly because the ones I've seen, again, with my limited time, are not generating models. So maybe this is something you'd like to do. It seems like injecting GraphQL queries through string variables is easily breakable. So yeah, if this is something you're looking to bring to a bigger scope, then definitely invest in it and probably come up with your own library. Cool, awesome.

We're gonna jump into another question from Alexander, with all the questions today. Also for Theo, in the discussion about skip and include versus duplicated queries: would the breaks with any field change be resolved if directives could be executed before a field is validated? This would allow skip to ignore the field and not break if it doesn't exist in the schema. Alexander, is there anything you wanna say to elaborate on that? I feel like it all just sounded weird coming out of my mouth, but I understand what you're trying to say. No, I think you did fine. My interpretation of why the directives didn't work in the scenario was that directives, if I remember correctly, are applied after all the fields are validated. So you can't apply a skip directive to a field that isn't in the server-side schema, which means that you can't use it to guard against schema changes. So I guess, could this maybe be a proposal to the GraphQL language spec, to find some way to allow directives to be applied sooner so that you can solve this problem, so that the client can say, I don't care if it's not there anymore, just ignore it and pretend I requested this without the field, like you would do if the schema was valid? Yeah, I feel that's a good question. Yeah, it's definitely the dream for mobile clients. I feel like this would bring a whole lot of value to mobile, not so much for web because you can always directly fix it, but it's good to make sure, okay, this field is immune to changes. If it doesn't exist, gets moved or anything, it falls into tech debt rather than firefighting. So yeah, this is, I think, the biggest pain point for mobile with GraphQL at the moment, especially in large applications like we experience at Shopify.
So yeah, I'm hoping Skip and Include can become more than just introducing new ability and improving performances but also guarding about potential changes. Again, I don't think this is what the creators of GraphQL had in mind. Maybe it's not the right fit as well. That's a good question. I feel like we would benefit if not Skip and Include changing, then we could benefit from something in the lines of, okay, like I'm trying to query this field but please before executing it and resolving the query take it aside if it doesn't exist anymore. I think this would be very valuable to us.
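The behavior wished for here, "take the field aside if it doesn't exist anymore", is not something skip and include provide today. As a rough illustration of the idea only, the sketch below does the filtering on the client in Python. It assumes the client already has the set of field names the server currently exposes (for example from introspection), and both function names are hypothetical.

```python
def drop_unknown_fields(requested_fields, schema_fields):
    """Split a flat selection into fields the server knows and fields it no longer has."""
    kept = [name for name in requested_fields if name in schema_fields]
    dropped = [name for name in requested_fields if name not in schema_fields]
    return kept, dropped


def build_query(type_name, fields):
    """Render a single-level selection set as a GraphQL query string."""
    return f"{{ {type_name} {{ {' '.join(fields)} }} }}"


# Illustrative data: the client still asks for a field the server has removed.
kept, dropped = drop_unknown_fields(
    ["id", "name", "legacyNote"],
    {"id", "name", "total"},
)
print(build_query("order", kept))
print("ignored fields:", dropped)
```

The dropped list is what turns a production break into tech debt: the query still validates, and the client can log which fields it had to leave out.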

Shipping Changes and Field Availability

Short description:

We're about to ship changes and introduce new fields in the orders screen, which is heavily used. While it would be valuable to have a feature that predicts query costs, it's unlikely to happen due to current usage. We're working on building a single frontend for different API configurations, which requires determining if a field is available and if the feature is enabled.

It's always stressful when we're shipping changes. Right now that's what we're about to do. With one of the features, we'll organize a beta for merchants on the new feature. And we're about to introduce new fields right in between BFCM and holiday season in the orders screen, which is probably the most used screen, I think around 1,500,000,000 per month at this point. So yeah, it's nerve-wracking and it would be way better if we could benefit from such an improvement. To answer your question: unlikely to happen, I think, because of the way skip and include are being used at the moment, but we can always dream and push for it. Yeah, it would be a feature that I would love to see as well. We are trying to build a single frontend for different configurations of a GraphQL API deployment. So one of the problems that we have to solve is to figure out: will the field be there, is the feature enabled on the server in this installment of the API? It's a feature I would like to see as well.

Query Cost Prediction and Deprecated Fields

Short description:

At Shopify, we use requested and actual query costs to predict and determine the complexity and performance of queries. We calculate the potential cost based on the number of fields and their complexity. To monitor query performance, we track it in production and check reports regularly. Parallel queries should be revisited based on performance tracking, rolling out new fields, and changing thresholds. We have an API health dashboard for partners to detect potential breaking changes and provide alerts and timeframes for adaptation. The GraphQL schema bot helps detect breaking changes, and we provide change logs and proactive alerts to partners. Deprecated fields are marked but not removed to facilitate easy migration between schema versions.

Okay, cool. A couple more general questions, Theo, about your talk. So you talked about your API implementation phase, you talked about the query cost. What are you using to predict the queries' costs? So specifically at Shopify, we have two things that are called requested query cost and actual query cost. So requested is the complexity; it's simple math, pretty much based on the number of fields you're querying and the complexity. So if you're, for example, asking for a page of 50 items where each item has three or four fields, you do 50 times three or four, so that gives you an idea of what the potential cost could be at the maximum if everything you're requesting was returned, and the actual query cost is what it costs in reality. So when we run this in our local GraphQL environment, we can already predict: is this a good candidate, for example, for a parallel query? So that's what we use internally, but I'd be curious to know as well what folks attending this call right now, or later on if you wanna reach out, are using to determine that. Yeah, so if anyone has anything they're using that they wanna share in the chat, that's cool. I'm actually gonna give Gui a second to speak here because, Gui, I feel like you wrote an engineering blog post about this that's also publicly available. Yeah, exactly, I published a blog post on the Shopify engineering blog exactly on how we calculate query cost. Yeah, so basically, the TL;DR is, we have a predicted query cost and an actual query cost. For example, if you ask for a page of 50 records, but it returns only two, the cost goes from 50 to two. There are also some different ways we treat scalars and also mutations and everything. And at the end we have a linear correlation between the query cost, the actual query cost and the server response time. I have a chart in this blog post at the end. So we can correlate. So we can have a very, yeah, a fairly linear correlation between them. So yeah.
Have a look at this blog post; it tells how we calculate it and how it ends up correlating with the server response time. Cool. And I just shared a link in the chat to the blog post so everyone can read it and get more insights. Last question for Theo about parallel queries: how often should they be revisited for a particular screen? So if you have strong instrumentation and you're tracking the performance of your query in production, then it becomes easier. I'd say, yeah, probably check the report once per day for something critical, just to get an idea of when the performance is dipping. Mostly you can align that with rolling out new fields and measuring the performance, but ideally that would be done before rolling out something new in production, just anticipating a potential drop and seeing if there's room for an improvement as well. From time to time, you may feel like this is no longer in line with your thresholds, or you're changing the thresholds because you have more aggressive goals; that for sure is what's happening at Shopify Mobile. We went from a satisfactory load being two seconds to now aiming for 0.5. So we have to redefine that and, yeah, that also means revisiting often what we're doing and monitoring. We have a whole dashboard with the most important screens ranked by how many times they're being viewed, and we have a score that is pretty much based on daily data, I think, and an average of three months, per client, iOS and Android. So yeah, keeping an eye out for it and, as far as I know, it comes up at least once per week on the table for particular screens. Cool, that's awesome. Okay, I think we're gonna move away from mobile for a little bit. We have two questions in the chat from Dominic. This first one is for anyone who may have had experience with this. Has anyone used GraphCDN? Does anyone have thoughts about it or experience with it?
Okay, sounds like maybe none of us have ever played with it, but I feel that everyone is about to click on the link that you shared and go check it out and form some opinions. So, post in the chat if you have any comments about it. This next question, also from Dominic, is actually quite heavy, so I'm gonna give it to Gui to start off with, but Jonathan, I think you'll also have a lot to say as well. So, the question is: do you ever remove deprecated queries? How do you approach evolving the schema, like query name V2 or V3, a pattern like that? And in the context of removing queries, how do you monitor the usage? So, this is a lot. Yeah, it's a lot. Yeah, so for removing deprecated fields: until now, or maybe until the next few days, we don't have versioning for return types on fields, for example. So sometimes we'll have, like, money V2, money V3, and eventually we need to remove the original money. So that's how it works now. Deprecating this is not a big deal because of the way our versioning system works. So we have new versions every quarter. And so probably the unstable version won't have the money field anymore, and it goes to a release candidate. And at this time, the release candidate is 2022.01. And when 2022.01 is released, it automatically becomes the new stable release. Requests for old deprecated versions, for example, if you do a request for 2019.04, it actually returns a response on 2021.01, I think, like, the oldest available version. Even if you're behind the schedule, this window of available versions is always moving. So eventually it can break if you don't have a look. But now we can actually version returned fields with the new GraphQL Ruby gem version. So we won't have a V2 or V3, hopefully, for the next applications. Then, how do we make sure we don't break? So there are a lot of moving pieces and different teams working on this.
First, we have the GraphQL schema bot that Rebecca showed us. So basically it detects when we have a breaking change, and removing a field is a breaking change. So what it means is the developer gets alerted about this; if it's on purpose, that's fine, you can proceed. And when the change goes to the release candidate, any partner developing apps for Shopify has an API health dashboard that detects if the current queries they are making will break in the future. So now the current release candidate version doesn't have this field anymore, so it detects that your app will break in a future version, so you start getting some alerts about this, and you are also given a timeframe, for example: this is upcoming in the next version. So if you want to get the latest stable version, maybe you have like three months to adapt, or if you're always using the oldest available you have some more time. So you have a clear dashboard with clear action items to keep up to date, and also the partner advocacy team works on developer changelogs for every release version, so you can keep an eye on those changelogs for each new version. So either way, you can check the changelogs, but we are proactively alerting partners about upcoming breaking changes. Cool, yeah. So there was a lot in the question, there was a lot in that answer. Jonathan, are there areas you wanna dive more into? Sure, so this is actually something that I personally am focusing on a lot recently with the Storefront API. Traditionally, we have been very, I don't know what the word is, but easygoing with applications, where we will mark a field deprecated but we won't remove it, even though we do have versioned APIs, simply because we want people to have an easy time migrating between schema versions. In an ideal world, that'd be great because people would see deprecated fields and they would stop using them.
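Going back to the requested and actual query costs discussed earlier in this section, the arithmetic from the answer can be written down directly. This is a deliberately naive sketch in Python that assumes one point per field per item; the answer notes that scalars and mutations are treated differently in practice, and those details are in the blog post, not here. Both function names are hypothetical.

```python
def requested_cost(page_size, fields_per_item):
    """Worst-case cost: every requested item comes back with every requested field."""
    return page_size * fields_per_item


def actual_cost(items_returned, fields_per_item):
    """Cost of what was really returned, which can be far below the requested cost."""
    return items_returned * fields_per_item


# A page of 50 items with 3 fields each, of which only 2 items exist.
print("requested:", requested_cost(50, 3))
print("actual:", actual_cost(2, 3))
```

The gap between the two numbers is what makes the requested cost useful as an upper bound when judging whether a query is a candidate for splitting.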

Handling Deprecated Fields and Nullability

Short description:

We're planning to be more aggressive in removing deprecated fields due to the large code base we have. The V2 and V3 naming issues were caused by a limitation in our versioning process, which is now fixed. We track usage of specific fields through API health reports and internal dashboards to inform our decisions. Deprecated fields can be challenging to deal with, and we're exploring ways to improve the process. It's important to consider that one size does not fit all when it comes to versioning APIs. We communicate upcoming changes to developers and provide reminders closer to the dates. Input unions are not preferred over the current one of solution. Nullability has been a topic of discussion, and we have experience with it in distributed graph architectures.

In reality, that's not the case, unfortunately. We have a lot of developers who just frankly ignore them. And so I think over time, our code base has gotten so large, as we said, like we have 5,000 types and so many schemas, it's just not feasible for us to keep around these deprecated fields forever. So because we have versioned APIs and versioned schemas like Gui said, we're gonna start pretty soon being a little bit more aggressive about removing deprecated fields for that reason.
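Since versioned schemas come up here, the moving window of supported versions described earlier (a request for an expired version is served by the oldest one still available) can be sketched as below. This is an assumption-laden illustration in Python: the list of supported versions is invented for the example, the version strings follow the year.month form used in the talk, and `resolve_version` is a hypothetical name.

```python
# Illustrative window of supported versions, oldest first. The real window moves
# forward every quarter; these values are made up for the example.
SUPPORTED_VERSIONS = ["2021.01", "2021.04", "2021.07", "2021.10"]


def resolve_version(requested):
    """Map a requested API version onto one the server can actually serve."""
    if requested in SUPPORTED_VERSIONS:
        return requested
    oldest = SUPPORTED_VERSIONS[0]
    # Zero-padded "YYYY.MM" strings sort chronologically, so string comparison works.
    if requested < oldest:
        # Expired version: serve the oldest one that is still available.
        return oldest
    raise ValueError(f"unknown API version: {requested}")


print(resolve_version("2019.04"))
print(resolve_version("2021.07"))
```

This is why a client that never updates its version eventually hits removed fields: the version it names stays the same, but the schema it is served does not.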

Around the V2 and V3 naming things, that was, as Gui said, because of a limitation that we had in our versioning process, which is now fixed. So hopefully those will never happen again, because frankly they're ugly and they're very confusing. Like, which do I use? Price V2, V3, but I'm using the January release of the schema, but I'm using V2; it's just a mess, right? We all love to have nice APIs that are named nicely, and we're very OCD about things being consistent. So dropping V2s and V3s is something we definitely wanna do.

There's one question you had about monitoring usage, which we touched on a little bit, which is our API health report that partners see. But also internally, we have dashboards that we can query and say, how many clients are hitting this specific field? We use that a lot to inform our decisions around when we can deprecate a field. If it is deprecated, how aggressive can we be about removing it in a future version? If this field is super, super popular, even though it's deprecated, maybe we ask ourselves why it's so popular. Like, is the new field not doing what it's supposed to, or is it not better than what we had anticipated? So I don't know if there are any open source tools for doing this. I know we have an open source gem called GraphQL Metrics, which tracks more things around resolver execution times and whatnot, but internally, we have tools where, for every query that comes in, we basically enumerate over every field that's requested and do like a plus one, plus one. And that all goes into a dashboard behind the scenes that we can use to figure out when things can go away. But yeah, deprecated fields, honestly, they're great, but they can also be quite the pain to deal with. So we're looking for ways to improve that. Love to hear other people's thoughts on this. I know there have been talks in the GraphQL spec as well around arguments, because I believe arguments right now, you can't mark them as deprecated, which is just built into the spec, which is interesting. So we've had to have some workarounds for those as well. Yeah, I think there's a lot of work to be done in this space in general, around versioning APIs and breaking changes and how to make them less painful, especially for mobile clients, like Theo said, because on mobile, once the app is out there, you can't technically force them to update. I mean, you can, but it's not the best UX, right? So, this is something we all have to think about. So yeah, definitely some room for innovation in this space.
Yeah.
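The "plus one, plus one" counting described above is simple enough to sketch. The Python below is an illustration, not Shopify's internal tooling: it assumes each incoming query has already been parsed into a list of `Type.field` strings, and the field names and the deprecated set are made up for the example.

```python
from collections import Counter


def count_field_usage(queries):
    """Count how often each field is requested across a batch of parsed queries."""
    usage = Counter()
    for requested_fields in queries:
        for field_path in requested_fields:
            usage[field_path] += 1  # the "plus one" per requested field
    return usage


def deprecated_still_in_use(usage, deprecated_fields):
    """Return the deprecated fields that clients are still requesting, with counts."""
    return {name: usage[name] for name in deprecated_fields if usage[name] > 0}


# Illustrative input: three queries, already reduced to the fields they select.
parsed_queries = [
    ["Product.price", "Product.title"],
    ["Product.priceV2", "Product.title"],
    ["Product.price"],
]
usage = count_field_usage(parsed_queries)
print(deprecated_still_in_use(usage, {"Product.price", "Product.oldHandle"}))
```

Keeping the count per application as well as per field would give the kind of data an API health report needs for targeted messaging.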

I'll add too, we obviously say, oh, we hate when fields are like money v2, money v3, or money v7, because it feels pretty dirty. At the same time, it was a really big lift for us to implement an API versioning system, right? So if your API is much smaller, I don't think having a couple of v2 fields is the worst thing. I will say too, obviously you have to follow through on the process: make a v2 field, get all your clients onto that v2 field, then get rid of your old field or change your old field, and then migrate them back to the original money field, right? So I think it's always a long process to get rid of those v2s and v3s in your code base. But I think you can do it as an easier lift without implementing a full-on versioning system. I don't know, those are my thoughts. Does everyone agree or disagree? Is it just bad and we shouldn't let anyone do it? Yeah. You've got to work within the constraints of your system, I guess. Yeah, I think Alex said this perfectly earlier. It's not a one-size-fits-all solution. We're a little bit lucky because the GraphQL Ruby gem that we have supports a lot of things that we utilize to do versioning. And some of that has been a collaborative process, but if you're running a Node backend or something, the tools that you have available might be different. So this is just what has worked for us, but one size does not fit all. Oh, okay. We're going to hop over to Greg. Could I interrupt with a follow-up question? Go for it. Do you track which applications are requesting the deprecated fields, so that you could send out maybe targeted messaging, like, hey, your application is still using an old field? Yeah. So the last link that I shared in the chat is for our API health report. So for that, it's per partner. For us, the partner is a third-party app developer.
So we're tracking everything per app, so an app developer can go in and say, oh, I'm using these fields, and these fields are going to be removed or change in functionality like three versions from now. And then as we get closer, we put a lot of thought into the communication plan for this, which I don't have an exact link for, but you can read about it on the shopify.dev website. As we get closer to the dates, we do send out email blasts just to remind people that it's coming. Because you can tell someone, hey, this field is going away in a year, and they don't think it's something that they should urgently fix. So they need that 30-day nudge or whatever it is, just to refresh their memory. Jonathan, did you wanna add something? Sorry. Nope. You said exactly what I was going to say. Okay, cool. Okay, Greg, if input unions existed today, would you use input unions instead of your oneOf? Interesting question. I guess I would still say no. Input unions have generated, I feel like, a lot of enthusiasm. And they're always kind of the missing solution that we're always one step away from, and this problem will be solved by the input union. And I've heard that so many times, only to go and find that really this oneOf thing just goes and solves it, and it solves it better than anything that the input union could do, because the input union still wouldn't necessarily be able to do anything beyond: this is a set of input objects that I put together and can receive. And the thing that we have today, that we can do with incredibly simple server-side validation, can let us input scalars or lists or anything else. And it accommodates so many more use cases and it's not even very difficult to do. So yeah, I'm sternly on the side that we don't necessarily need this spec, and I hope it doesn't land. Cool, thanks for sharing your somewhat shaded views of input unions.
This is... Feel free to provide the counter argument there. No, I don't feel like I can. You've basically convinced me. I'm on team oneOf. So Greg, this is a question for you, but we'll open it up after as a question for everyone. When have you been trapped by non-null constraints? Let's dive into nullability. Oh gosh, nullability. So I have a long history with nullability, and it started from my background before Shopify, when I was working on distributed graph architectures.
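The "incredibly simple server-side validation" Greg mentions for oneOf-style inputs can be pictured roughly as follows. This is a minimal sketch, not Shopify's implementation: the function name `validate_one_of`, the `OneOfError` class, and the example keys are all hypothetical, and the input is assumed to arrive as a plain mapping where an omitted argument shows up as `None`.

```python
from typing import Any, Mapping, Tuple


class OneOfError(ValueError):
    """Raised when a oneOf-style input does not carry exactly one value."""


def validate_one_of(input_obj: Mapping[str, Any]) -> Tuple[str, Any]:
    """Return the single (key, value) pair that was provided.

    Every field of the input object is nullable in the schema; the server
    enforces that exactly one of them is non-null. The value can be a
    scalar, a list, or a nested object, which is the flexibility the
    speaker contrasts with input unions.
    """
    provided = [(key, value) for key, value in input_obj.items() if value is not None]
    if len(provided) != 1:
        raise OneOfError(
            f"expected exactly one of {sorted(input_obj)}, got {len(provided)}"
        )
    return provided[0]


# Hypothetical lookup input: the caller may identify a product by id OR by handle.
key, value = validate_one_of({"id": "gid://example/1", "handle": None})
```

Because the check only counts non-null entries, the same helper works whether the chosen member is a scalar, a list, or an input object.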

Federation and Nullability Challenges

Short description:

Federation and nullability pose challenges in GraphQL architectures. While a monolithic GraphQL architecture reduces symmetry problems, the issues of object synchronization and failed fulfillment still persist.

So doing basically federated graphs. And federation is interesting because you go and you have... You can't assume that an object always exists in singularity because it's actually a composite of many objects coming from many places and so all of a sudden you get out of sync with this object exists in one place but it doesn't in the other place and then we've guaranteed that you're gonna get the object except that it came from two places. And then one of them couldn't fulfill and then the entire thing explodes. And it sort of was a hard introduction to the fact that wow, nullability is really kind of an overstated promise in a lot of cases and there's a lot of reasons why these things can fail. So actually this is one of the things I found that was sort of refreshingly simple about the only thing refreshingly simple about coming to Shopify was coming in and saying, wow we've got a monolith that goes and does a GraphQL architecture. So we're only talking to one service and that actually diminishes a lot of the symmetry problems that you have in something like a federated architecture. And yet the same sorts of problems still exist. And I tried to kind of touch on that in my talk a little bit.
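The "entire thing explodes" behaviour described above comes from how GraphQL handles a null in a non-null position: the null propagates up to the nearest nullable parent. Here is a simplified sketch of that rule. The schema format (plain dicts with `non_null` and `fields` keys) and the function names are made up for illustration, lists are not handled, and error reporting is omitted.

```python
class _NullBubble(Exception):
    """Signals that a non-null field resolved to null."""


def _complete(data, fields):
    result = {}
    for name, spec in fields.items():
        value = data.get(name)
        if value is not None and "fields" in spec:
            try:
                value = _complete(value, spec["fields"])
            except _NullBubble:
                # A child violated its non-null promise, so this object becomes null.
                value = None
        if value is None and spec.get("non_null"):
            # This field promised a value; pass the failure up to the parent.
            raise _NullBubble(name)
        result[name] = value
    return result


def execute(data, fields):
    """Apply null propagation; a failure at the root yields None for the whole result."""
    try:
        return _complete(data, fields)
    except _NullBubble:
        return None


# Hypothetical schema: article is nullable, but its author is declared non-null.
schema = {
    "article": {
        "non_null": False,
        "fields": {
            "title": {"non_null": True},
            "author": {"non_null": True, "fields": {"name": {"non_null": True}}},
        },
    }
}

# One service returned the article, but the service that owns authors could not fulfil.
partial = {"article": {"title": "Hello", "author": None}}
print(execute(partial, schema))  # {'article': None}
```

A single missing author wipes out the whole article, including the title that resolved fine, which is why a non-null guarantee is expensive to get wrong when an object is composed from several sources.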

Nullability and Shopify's Versioning System

Short description:

Nullability in GraphQL can be a challenging topic, but it's important to strike a balance. While non-null fields make things easier for clients, there are cases where nullability is necessary. Shopify's versioning system allows for smoother transitions and breaking changes can be opt-in for clients. Common sense plays a role in determining nullability, and if a field has a database constraint, it should be non-null. Shopify's implementation also features a robust versioning system that simplifies API changes and generates different schemas for different versions.

Metafields has been a really interesting one to look at in the context of GraphQL, because metadata, these custom data extensions, are kind of fundamentally at odds with what GraphQL is trying to do. GraphQL is strongly typed, rigid structures. We know what we're presenting to you. And with these meta extensions that we're building, we're really building something that's orthogonal to what GraphQL is providing you. We're giving you flexible structures where you can kind of define your own thing and design your own schema and then get them through this one static interface. And so we found that nullability has become a very important thing there. Sometimes it's working directly against us and we wish there's some fields that we can poll. And there's other places where it's working sort of by accident for us. Giving nullable references, which was sort of a design of the system originally, is also now accommodating the fact that we might not always be able to fulfill the connection. And we're lucking into the fact that this thing was nullable upfront. So we're definitely being more proactive about thinking, in two years, what is this field needing to do, and can we actually guarantee data for it? I would love to hear other developers' thoughts on this though. I'm like waiting for Jonathan to unmute. Nullability has been a problem in the past. It's one of the reasons why we have some of the v2 fields that we do. For example, I think in our Storefront API, I think it was articles. Every article has an author, and author was non-nullable. Turns out that that's not always the case. We actually now have author v2, which returns the exact same object model as before, just nullable. This is definitely a problem that we've run into before. I'd love to say that you should always return non-null, or I'm sorry, always nullable, and everything would just work. But that's really a pain for the clients to deal with. So it's really a balancing act.
It's interesting, over on the Metafields side, we're actually looking at Jonathan's team with eager anticipation of what you guys have in the works, because you really are unlocking kind of a critical thing for us. Right now we've got this idea that if we wanna use the current API and build upon it, we're gonna have to get into a really awkward schema contract, where, you know, we require this one field, and then maybe a secondary input, which is a customization of the primary input. You know, we have to do these patterns because this thing is null and we can't unnull it. And you know, if it was just... Well, what you guys are working on is actually a really huge opportunity for us. And it actually gives us a lot of reason to potentially commit to stranger API signatures, like argument signatures, because the work that you guys are doing gives us the ability to transition the API in a very methodical and smooth direction. So maybe we're gonna have a weird state for maybe one API release, but two releases from now, if we can adjust the nullability of arguments, we could transition this thing into a much cleaner and more succinct interface in the long term. That's the goal. To give more context on what Greg is talking about, I think Guy mentioned earlier that there were some things that we could not effectively version within our strategy. One of them was what type a field returns, another is an argument name, what the argument takes, like input objects, and nullability. Changing the nullability of a field is something we could not version, for technical reasons behind the scenes. We are correcting all of that, so basically we're making our versioning system that much more powerful, to allow us to make these breaking changes in a nicer, more fluid way. That author v2 would not have to be a thing.
It would literally just be: if you request version A, it's non-null; you request version B in the future, it's suddenly nullable. It's still a breaking change, let's be clear, these are breaking changes. But because we have versioning in place, it's not an immediate breaking change, it's an opt-in breaking change on the side of the client. You're opting in to this new version, which means all the breaking changes that come with it, but we at least now have the opportunity to do those breaking changes in a nice way. I will say also, a few of the speakers, Yi, I think someone else also talked about API versioning and the fact that we have this unstable version, I think Greg did as well. These changes go to our unstable version before they actually get slated for a release candidate. So when your features or new fields are in the unstable version, that's a really good time to try to nail down nullability. So kind of what I'm hearing is that maybe there's this idea where things should either mostly be nullable and then we'll correct them to non-null, or are you saying that things should be non-null if we think they should be non-null, and if that doesn't work, we'll change them in the future? What's the pattern here? I guess this is directed at me. I think the answer is a lot of it comes down to common sense. A product should always have a price, right? So that should just be non-null. I think what this change allows us to do is, in the situation where some random edge case comes up, we're not hamstrung into, okay, now we have to come up with a new field and change the semantics of what this field means. We can actually make it nullable in a future release, and things still make semantic sense. That was the biggest thing: we have these field names, and we take such time and care to name them the way we want to name them.
And now we're like, okay, now we've got to find a new name just because we screwed up nullability, right? And that's why v2 exists, because we still want to call it author, just v2. So I think Greg put this pretty well in his talk: common sense. If there's a database constraint that says it should be non-null, we should probably make the field non-null. And if we're getting a null somewhere in the resolver logic, we've got a problem, right? So yeah, it's hard. I would still lean towards non-null where possible, just because it makes it so much easier on the clients. They don't have to have an if statement that says, if this doesn't exist. They don't have to have a separate state, like a whole state in the UI for what happens if this is null. It just makes things easier for everyone all around. Cool, okay. I think follow-up questions can definitely go in the chat, or we can connect offline, because this is a whole big topic. I'm gonna ask another question, maybe for Guy, let's see. Maybe you could just mention some of the things that are unique about Shopify's GraphQL implementation. Okay. Let me think. I think our versioning system is something very big. Although it was missing things like the return type for fields, that was more a limitation, like a low-level limitation in the gem. But I think the versioning system is really good. And not only does it generate different schemas for different versions. Basically, every time we do an API change, something substantial, we have a YAML file with all the API changes. And we can say, okay, this API change takes place on the stable version. So when you declare a field, we can say, okay, this field depends on this API change. So when you are ready to release from unstable to the RC version, you only need to change this YAML file. You don't need to go through all your new fields or new changes and mutations and make code changes.
It's more like a configuration change. So it's very easy to release, even though we have all this versioning. And this kind of API change is also used by other teams, for example, to detect when a feature is going from unstable to release candidate. That means it's time to write a developer changelog post. So it has some implications on communications as well.
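The idea of fields that depend on named API changes, with a configuration file deciding which version each change ships in, can be sketched like this. It is only an illustration of the description above: the dictionaries stand in for the YAML file, and the change names, field names, and version labels are invented for the example.

```python
# Versions ordered from oldest to newest; "unstable" always comes last.
VERSIONS = ["2021-10", "2022-01", "unstable"]

# Stand-in for the configuration file: API change name -> first version that includes it.
API_CHANGES = {
    "add_markets": "unstable",
}

# Field declarations: each field optionally names the API change it depends on.
FIELDS = {
    "Article.title": None,
    "Article.author": None,
    "Shop.markets": "add_markets",
}


def visible_fields(version, api_changes=API_CHANGES):
    """Return the fields that exist in the schema generated for `version`."""
    rank = VERSIONS.index(version)
    visible = []
    for field, change in FIELDS.items():
        if change is None or VERSIONS.index(api_changes[change]) <= rank:
            visible.append(field)
    return sorted(visible)


print(visible_fields("2022-01"))   # ['Article.author', 'Article.title']
print(visible_fields("unstable"))  # ['Article.author', 'Article.title', 'Shop.markets']

# "Releasing" the feature is a configuration edit, not a change to the field declarations.
released = dict(API_CHANGES, add_markets="2022-01")
print(visible_fields("2022-01", released))  # ['Article.author', 'Article.title', 'Shop.markets']
```

Keeping the version assignment in one place also gives other tooling a single source to watch, for instance to notice when a change moves out of unstable.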

Performance and Caching in GraphQL

Short description:

N+1s are not the only performance concerns in GraphQL. The overhead of parsing, validating, and analyzing queries can also impact performance. Complex schemas and computationally heavy analyzers offer opportunities for optimization. Caching can be challenging, especially with directives like inContext. Caching static validation and considering the trade-off between accuracy and speed are ongoing considerations. Alexander, a lead front end engineer, shared his experience with API building efforts and his lightning talk on PHP.

And also, when we put some API change in our release candidate version, it starts triggering warnings for partners. So the versioning system has a lot of implications for communications, for our code base, for data. So I think that is very, very unique, yeah. Cool, I'm gonna kick it over to Greg actually to talk about something else. I don't know if it's unique or not. I don't know what goes on in other companies.

Greg, you talked about us having public and then internal schemas. Do you want to talk about maybe what goes into our private schemas that aren't public, or why we have both? Yeah, sure. So earlier this year I was working on the markets feature, which is now in early access. It's now being rolled out slowly to merchants. We're kind of adding weight onto it. And what's interesting about this is it's a full model, and we actually need to go and do a complete backfill. It takes a whole lot of configuration that you had kind of littered around your shop before, and then we rewrite all of that configuration into this new data model, which represents the markets architecture. And we're removing pieces from different parts of the admin and consolidating them around this new portal that does everything that involves selling globally, all in one place. And so the reason that we wouldn't necessarily release a public API on it immediately is because there's still a lot of pieces that are in the works. And some of the most insightful feedback that we get is really when we do release the feature to the public and start having some of our merchants using the thing. We actually get some immediate pointers as to what we need to adjust, this needs to go here, just some things that we might've missed, or just opportunities that we'd like to really put into the final MVP that we put out. So all that said, just having everything run on the unstable version that we use internally allows us to release one of these features and get people using it inside of the admin context. And then once we've gotten the thing to the point that this is a public go-live feature and we're going to make it headless, then all of a sudden it can be minted into one of the schema releases, and external consumers can start actually controlling it through the API. And then that also opens it up to the app ecosystem as well, which is kind of a key juncture.
As long as we're developing internally, app developers can't necessarily innovate on top of this feature. And as soon as the API is available, anybody can write an app that goes and starts building on top of this to manage your shop for you, or whatever else. Cool. Thanks for explaining that. We're kind of getting short on time. We didn't talk about N+1s or performance that much in this AMA portion, except from the mobile perspective. So, Jonathan, what about N+1s? Are they the only cause of performance problems? Are there other causes? What else? Yeah. TLDR, they are not the only performance concerns. I think they're the most obvious, because when they do happen, they're pretty drastic. Obviously, if you're requesting 50 products, for example, and a field on product has a really expensive iteration or computation behind the scenes, it's pretty linear, I guess linear is the right word. It's pretty obvious that this is the performance problem here. But there are definitely some other cases where performance can be a little bit slow. One of the unique things about GraphQL is that because there's a schema, because there's type safety and things in place, there's a whole bunch of overhead that happens on every single query. So when our back end receives a query, it has to parse it, it has to lex it, it has to validate it. And then there's certain analysis things that we have on top of it, like query cost computation, all those sorts of things. So if your schema is particularly complex, or you have an analyzer that does something extremely computationally heavy on every query, that's a huge opportunity for optimization there. Or else, it doesn't really have to be an N+1. A field can be just plain super expensive to compute. Alex mentioned this earlier with directives and caching.
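To make the per-query overhead just described concrete, here is a small sketch of memoizing the static work (parsing and validating) by query text, so that repeated queries skip it. This is an assumption about one possible optimization, not a description of Shopify's pipeline; `_parse_and_validate` is a toy stand-in for a real GraphQL parser and validator, and `STATIC_WORK` exists only to count how often the expensive step runs.

```python
from functools import lru_cache

STATIC_WORK = {"runs": 0}  # counts how often the expensive static step executes


def _parse_and_validate(query: str) -> tuple:
    """Toy stand-in for lexing, parsing, and validating a query document."""
    STATIC_WORK["runs"] += 1
    tokens = tuple(query.replace("{", " { ").replace("}", " } ").split())
    if tokens.count("{") != tokens.count("}"):
        raise ValueError("unbalanced braces")
    return tokens


@lru_cache(maxsize=1024)
def prepare(query: str) -> tuple:
    """Return the validated document, reusing earlier work for identical query text."""
    return _parse_and_validate(query)


query = "{ shop { name } }"
prepare(query)
prepare(query)  # served from the cache; no second parse or validation
print(STATIC_WORK["runs"])  # 1
```

The result only depends on the query text and the schema, so a cache like this would have to be invalidated whenever the schema (or the requested API version) changes.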
One of the things with the inContext directive that we had to be super conscientious of is that, because it's at the top of the query, doing Russian doll style caching of sub-queries in that query becomes kind of impossible, because the top level's now changing. So one of the things we were considering with the inContext directive was, for buy online, pick up in store, we want to be able to contextualize and say, location inventory results should be sorted based on distance to your location, right? Because that's a very nice user feature to have. Well, the problem is, if in that directive we're now all of a sudden accepting two very precise floating point values, the entire query basically becomes cached to that one single location, down to 10 decimal places. So what we decided there was, instead of taking discrete locations, we would take an ID of one of our known locations, which is a merchant storefront. Which would then say, okay, you can give us your preferred location, and then we would sort everything based off that. That's a huge performance win in terms of cacheability. So the N+1s are definitely not the only thing that can really hurt you, but they're one of the more obvious ones. Does that answer your question? Yeah, no, I think that's maybe... we didn't really talk about caching too much outside of Alex mentioning it during her talk. I think on our side, we're exploring, maybe we can cache some of the static validation too, right? Not just the execution of the GraphQL query, but all these things he mentioned that happen before. Some of that might be cacheable. Theo, I'm sure you're trying to cache as much as possible on mobile clients. Yeah. It's also tricky because in some parts of the app merchants value accuracy over speed, for example when fulfilling items, they just wanna make sure they have the most up-to-date list.
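The cacheability point in the location example above can be illustrated with a toy cache key. This is a sketch under assumptions: the `cache_key` helper, the context field names, and the location ID format are all hypothetical, and a real cache would key on more than the query text and directive arguments.

```python
import hashlib


def cache_key(query: str, context: dict) -> str:
    """Build a cache key from the query text plus the top-level context arguments."""
    parts = "&".join(f"{name}={context[name]}" for name in sorted(context))
    return hashlib.sha256(f"{query}|{parts}".encode("utf-8")).hexdigest()


QUERY = "{ product { storeAvailability { location { name } } } }"

# Two buyers standing a few metres apart: precise coordinates never repeat,
# so every buyer gets their own cache entry.
buyer_a = cache_key(QUERY, {"latitude": 45.4215296, "longitude": -75.6971931})
buyer_b = cache_key(QUERY, {"latitude": 45.4215301, "longitude": -75.6971944})
print(buyer_a == buyer_b)  # False

# The same two buyers choosing the same known store: one shared cache entry.
store_a = cache_key(QUERY, {"preferredLocationId": "gid://example/Location/1"})
store_b = cache_key(QUERY, {"preferredLocationId": "gid://example/Location/1"})
print(store_a == store_b)  # True
```

The number of distinct keys drops from one per buyer position to one per store location, which is where the cache hit rate comes from.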
In case an order has been edited, this mostly happens between people who are kind of behind the desk performing order edits and folks that are in the warehouse fulfilling orders. So in this case, we, for example, disabled the cache so that you always have the most up-to-date information. On other parts of the app, especially in dealing with media and videos, this is something we can't really afford, so we cache as much as possible. Cool, yeah. Dominik, I see your question in the chat just about performance tuning and documentation, sorry, document caching. I think we kind of have hinted at that a little bit, but we're kind of short on time, so I'm actually gonna kick it back to Marcia to wrap us up. All right, fantastic. Great work Rebecca, that was great moderating. And to everyone who has participated, Dominik, Alexander, we truly appreciate you. Alexander, I see your camera is on and you've been coming off mute. I don't know if you want to say hi, just tell us what you do, if you're comfortable doing that, and then I'll kick it off. Yeah, sure. So I'm Alexander, my title is lead front end engineer with OpenSocial, but I've also been leading our API building efforts. I've actually done a small lightning talk that will be broadcast next week, and I'll be on the panel for non-JavaScript languages. So talking about PHP, which has its own interesting challenges. All right, fantastic, thank you for that, thank you for participating. Dominik and everyone else who has stuck on for the past three hours. That's a long time, but you've hung out with us.

Conclusion and Gratitude

Short description:

Thank you Jonathan, Alex, Lana, Athel, Greg, Rebecca, and Andy for addressing the challenging GraphQL problems we face at Shopify and sharing how the team is tackling them. It's important to discuss problems, not just highlights. The recording will be shared by GitNation and the pre-recorded talks will be uploaded to Shopify's YouTube channel. Thank you to all participants and the team at GitNation for accepting this unique workshop format. We had a great time and appreciate everyone's involvement.

We truly appreciate you. Thank you very much. Jonathan, Alex, and Lana, Athel, Greg, Rebecca, definitely Andy, thank you so much for helping us to surface some of the hard GraphQL problems that we experience here at Shopify and how the team is tackling those problems. I think it's not often that people talk about problems, people just talk about the highlights. So this was really cool. Thank you so much for putting all that together.

This recording will be shared and distributed later by GitNation, and our team here at Shopify will also be uploading the five pre-recorded talks on Shopify's YouTube channel. So you can look out for that there. And to everyone who's participated, once again, thank you so much. Lara and your team at GitNation, thank you for accepting this not-so-normal way of doing the workshop. We had fun with it. We hope our participants did as well. Yeah, thank you so much for all your time, everyone. We truly appreciate it.
