GraphQL for Everyone - Danielle Man


Data is knowledge and knowledge is power. One of the greatest powers we have as developers is the ability to access and manipulate raw data with ease. But it takes a lot of context to know how to write a SQL query, use an API, or make a curl request. A lot of our energy in the GraphQL community is spent moving the specification forward and improving the developer tools around it, but we don't spend much time talking about what GraphQL can do to help people in our organizations beyond our developers: our designers, product managers, business leaders, customer success engineers, etc. In this talk, I will share the outcomes of some research we did at Apollo on GraphQL accessibility, and my vision for how GraphQL can connect humans to the data that impacts them much more effectively, thereby giving them the ability to answer their own questions.



Transcription


Hey everyone, my name is Danielle and I'm an engineering manager at Apollo, where my team and I are responsible for building devtools, specifically ones that help people query and use GraphQL APIs. Today I'm really excited to be sharing with you some of the ideas that inspire our work, centered around how you can use GraphQL to connect people in your organizations, beyond just your developers, to data that would empower and enable them to do their jobs more effectively. This talk is going to be a little bit different from others because instead of talking about how to build a GraphQL API and the many interesting technical challenges there, I want to talk about how to fully leverage and consume a GraphQL API. I believe that GraphQL can be useful to people in your organization way beyond just the developers who are using it to query their data. I believe that you can build a unified graph for your data for everyone to use and that it will empower people in your organization like you've never seen before.
Data accessibility is a really hard problem, and it's really hard to access data from all of our systems these days because we store it in all sorts of different places. Different databases, different microservices, everything has been optimized for a different type of data. Everything is queried in a slightly different way. And it's hard to figure all these systems out sometimes, even as developers. But there are a lot of people who could do their jobs more effectively if they could just plug into the data in our systems. And for product development, we've solved this situation of having many services that are all a little bit different by introducing a new layer with GraphQL and using it to create a singular API. And I believe that this new layer that we've introduced for APIs with GraphQL can also be used to solve the more general problem of data access in our organizations.
I believe that GraphQL can be the standard way that we model and query our business data for almost all use cases. So with our time today, I want to walk you all through how to think about using GraphQL in this way as we pose this question: can GraphQL be the way that we create a universal access point for our data?
And to get into this topic, I want to start by walking through a SQL query together and comparing SQL to GraphQL a little bit. So this is a query that I've written many times over myself, and it's an analytics question: for each account, how many users have I seen in the last 30 days? And if I break this query apart and look at the different elements of it, there are some distinct things that stand out. The select here lets me control what I'm asking for given a platter of options, which we have the exact same ability to do with GraphQL. The where here is a conditional selection. I only want to select users if I've seen them in the last 30 days. With GraphQL, we have nothing specifically in the language to express a filter like this, but we can still filter our data using arguments. The join here lets us select data across multiple tables. With GraphQL, you actually build your join logic into your schema. So the query writer doesn't have to know anything about how to join data to benefit from being able to query joined data. I actually think that the GraphQL experience here is much better for the data browser than the SQL experience because you're not kind of reconstructing your business logic around joins. And then the last thing that I want to point out for now is this ability to group and count. This idea that we have aggregation and array functions that we can apply to our queries is something that I really miss in GraphQL.
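The slide's query is not reproduced in this transcript, so the following is only a sketch of the kind of query being described, run against an in-memory SQLite database. The table names, column names, and sample rows are assumptions made for illustration, and the GraphQL string at the end is a hypothetical schema shape, not one shown in the talk.

```python
import sqlite3

# Hypothetical tables and rows; the talk does not show its actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    account_id INTEGER REFERENCES accounts(id),
    last_seen TEXT
);
INSERT INTO accounts VALUES (1, 'acme'), (2, 'globex');
INSERT INTO users VALUES
    (1, 1, date('now', '-2 days')),
    (2, 1, date('now', '-10 days')),
    (3, 1, date('now', '-90 days')),
    (4, 2, date('now', '-1 days'));
""")

# "For each account, how many users have I seen in the last 30 days?"
QUERY = """
SELECT a.name, COUNT(u.id) AS active_users      -- SELECT: choose what to ask for
FROM accounts AS a
JOIN users AS u ON u.account_id = a.id          -- JOIN: the writer must know the relationship
WHERE u.last_seen >= date('now', '-30 days')    -- WHERE: conditional selection
GROUP BY a.name                                 -- GROUP BY + COUNT: aggregation
ORDER BY a.name
"""
rows = conn.execute(QUERY).fetchall()
print(rows)

# A hypothetical GraphQL counterpart: the filter becomes an argument, and the
# join and the count have to be built into the schema ahead of time.
GRAPHQL_EQUIVALENT = """
{
  accounts {
    name
    activeUserCount(lastSeenWithinDays: 30)
  }
}
"""
```

The comparison is meant to show where each SQL clause ends up: selection stays with the query writer, while joins and aggregates move into the schema.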
If you want to query something that's computed, you have to build those computed fields into your schema, which means you have to anticipate the needs of the people querying. You can do that for applications like building layouts and clients, but you can't exhaustively anticipate every need that anyone is going to have when they're just casually browsing your data.
So coming back to our question of whether GraphQL can be the way to create a universal access point for our data, I think the main concerns with taking this approach are going to break down into three categories that I want to walk through together. Number one, can I optimize my queries enough for it to make sense? GraphQL is built on top of anything, or it could be built on top of anything. So it's going to be important to consider that we may be wanting to query very large swaths of data. Number two, can I express what I want to express? I think this one comes down to what I showed you with the SQL query, where GraphQL is just kind of missing computation elements in the language itself. And then number three, can I see things the way that I want to see them? GraphQL was designed to be used by developers, so it's not the most accessible thing to people from the data world who are very technical, but are used to working with data in table formats and doing things like applying Excel formulas and being technical in that way. So let's walk through these together and talk about whether or not these hurdles can be overcome with GraphQL.
So, this first question: can I optimize my GraphQL queries enough for it to make sense? The thing that really comes to mind with this one for me is, can we provide a mapping of our queries to an implementation that is performant? GraphQL is adding this layer of processing in your stack. So the best thing we can try to do is make that layer as thin as possible and avoid adding extra processing at the GraphQL layer.
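One way to picture a thin layer is a function that turns a requested selection into a single SQL statement rather than resolving each field separately. The sketch below is a hand-rolled illustration of that idea and not the API of any real library; the `Selection` type, the `to_sql` helper, the table names, and the convention that every parent table has an `id` column are all assumptions, and it handles only one level of nesting.

```python
from dataclasses import dataclass, field


@dataclass
class Selection:
    """A hypothetical, already-parsed selection: a table plus requested columns."""
    table: str
    columns: list[str]
    # Child selections keyed by the foreign-key column on the child table.
    children: dict[str, "Selection"] = field(default_factory=dict)


def to_sql(sel: Selection) -> str:
    """Build one SQL statement covering the selection and its direct children."""
    select = [f"{sel.table}.{col}" for col in sel.columns]
    joins = []
    for foreign_key, child in sel.children.items():
        select += [f"{child.table}.{col}" for col in child.columns]
        # Assumes the parent table's primary key is named "id".
        joins.append(
            f"LEFT JOIN {child.table} ON {child.table}.{foreign_key} = {sel.table}.id"
        )
    return " ".join([f"SELECT {', '.join(select)}", f"FROM {sel.table}", *joins])


# Roughly what a query like `{ accounts { name users { email } } }` might carry.
selection = Selection(
    table="accounts",
    columns=["name"],
    children={"account_id": Selection(table="users", columns=["email"])},
)
print(to_sql(selection))
```

Because only the requested columns and relationships appear in the statement, the database does the heavy lifting and the GraphQL layer mostly reshapes rows into the response.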
And ideally, we can take the GraphQL queries that come in and map them directly to database queries, which is guaranteed to be the most performant outcome. And there are a lot of tools out there that do this or that help you do this, specifically with SQL. There are even companies that build GraphQL on top of SQL as a service. And in all of those tools, what you're trying to do is take your GraphQL query and identify the precise SQL query that needs to be made to fetch the data that was asked for. And what I have on this slide is an example of one of those libraries, called JoinMonster, that does this. There's also a great blog post on the topic of GraphQL to SQL specifically, by Marc-Andre Giroux, that I've linked here in the slide.
Something that's a little bit closer to our hearts at Apollo, though, is how do you translate GraphQL queries to Druid? Druid is a time series database that's designed to help you query analytics data over large swaths of time. And we do use Druid for some stuff that we do at Apollo. And we actually built a GraphQL to Druid translator a couple years ago to support some of our product needs. The goal at the time was to build a flexible API for querying stats data. We basically generated a portion of our GraphQL schema from Druid. So, you would run GraphQL queries and they would be transformed to Druid queries.
So, I wanted to show you all what that looks like, actually, in an example. So here I've got a query for data on a service. And the part of this schema that's generated from Druid is the stats part of the schema. And each field that I can query under stats actually corresponds to a table in Druid. So, if I want to query the query stats from Druid, I can go in there, and that's from the query stats table. And I can ask for the total request count for this service over the last five minutes, which equates to the last 300 seconds.
And we can see that we've gotten 2,285 requests for this service in the last five minutes. Now, the really interesting part of this schema is that I can add this field into my query under group by. And if I select fields under the group by, I will actually start segmenting the data in that query. So, now my 2,285 requests are going to be split out into the number of requests for each query that got made. And each of the things that I can group by is effectively a column in Druid. So, I can group by client name and segment this query further. And the more things you group by, the more parameters you have, the more results you'll have from your query. So, this is just kind of directly mapping to Druid.
So, this, I would say, worked extremely well on the flexibility side of things, because you could just make Druid queries, which is not something I would have otherwise been able to do without needing to know how to connect to the Druid database. The flexible queries also let us iterate really quickly with our feature work because we didn't have to know what our precise end goals were in order to get started. And this generated schema was nice to keep up to date because every time we added a new column to a table, that would just get pulled into the schema automatically.
On the flip side, though, I think the jury is still out as to whether or not this actually covered our product needs. It was a big pain point that our return data was not in the shape needed for our client layouts, because we still had to do a lot of computation in our frontends to get our data into the shapes that we needed. And I think the bigger anti-pattern with this is actually that it was confusing that if you added fields under group by, you were actually going to directly affect how the query was executed and the data that you got back.
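To make the group by behavior concrete, here is a rough sketch of how selected fields might be translated into a Druid native query. It is not Apollo's translator, which the talk does not show; the `to_druid_query` helper and the table, metric, and dimension names are assumptions. The query keys themselves (`queryType`, `dataSource`, `intervals`, `granularity`, `aggregations`, `dimensions`) follow Druid's native JSON query format.

```python
from datetime import datetime, timedelta, timezone


def to_druid_query(table, metric, group_by, from_seconds_ago, now):
    """Translate a stats selection into a Druid native query (as a dict)."""
    start = now - timedelta(seconds=from_seconds_ago)
    query = {
        # Selecting fields under "group by" changes the query type itself,
        # which is why the shape of the returned data changes as well.
        "queryType": "groupBy" if group_by else "timeseries",
        "dataSource": table,
        "granularity": "all",
        "intervals": [f"{start.isoformat()}/{now.isoformat()}"],
        "aggregations": [{"type": "longSum", "name": metric, "fieldName": metric}],
    }
    if group_by:
        query["dimensions"] = list(group_by)
    return query


now = datetime(2021, 7, 2, 12, 0, 0, tzinfo=timezone.utc)

# Total request count over the last 300 seconds: one aggregated result.
total = to_druid_query("query_stats", "total_request_count", [], 300, now)

# Same metric, segmented by two hypothetical columns: one result per combination.
segmented = to_druid_query(
    "query_stats", "total_request_count", ["query_name", "client_name"], 300, now
)
print(total["queryType"], segmented["queryType"], segmented["dimensions"])
```

Seen this way, the confusion described above is understandable: the selection set is doing the job an argument would normally do.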
That behavior ended up being kind of unintuitive to a lot of our teammates, who would have expected something like that to be put into an argument instead. So, I think there are pros and cons to an approach like this. But the point is that it's possible to translate GraphQL directly into other languages, especially complex database languages.
And before we move on from the query optimization topic, I just want to highlight that API concerns are going to be different from analytics concerns. GraphQL schemas are typically built to be APIs. So, it's common to have things for pagination and other types of limitations built in. But if you're trying to do data analytics, request latency is not going to be as important as being able to stream large data results back and scan really large arrays of data. So, these two worlds actually have kind of competing goals in some ways. And that's something you're just going to have to reckon with.
So, question number two. Can I express what I want to express? I think the interesting thing about this one is that despite the language not having counting and aggregation functions built in, GraphQL does have this concept called directives, which can be applied to both queries and schema. And you can basically define logic and functions in directives. And if you apply them to your queries, that will give the server some indication about how the query should be executed. And there are a lot of interesting things out there that people have done with directives, including things around authentication, skipping and including fields, and deferring fields. But what I wanted to show you today, in the spirit of what we're talking about with query flexibility, is a project called GraphQL Lodash. Lodash is a utility library in JavaScript. And it implements a very large number of functions to transform arrays of objects. These functions include filter, count, min, max, sort, reverse, all sorts of things. And GraphQL Lodash is a Node package.
And you can add it to your server. And what it will do is provide support for applying Lodash functions to your queries through directives. So, you can transform the results of your queries.
And to really show you all what's going on with GraphQL Lodash, I thought we would jump into another example. And my favorite graph to query is the GitHub graph. So, I thought we could try and ask the GitHub graph an analytics question, which is: what are the top voted issues in the Apollo Server repository? So, here I have started us with a query for the Apollo Server repository. I've asked for its list of issues. And on each issue, we can actually query the reactions that people have had to that issue. So, I thought a good way to proxy votes would be people providing a thumbs up reaction on issues. So, if we look at our data, you can see that we have thumbs ups here. We have eyes. And what I want to do is kind of transform this result into the answer to our question.
So, the first thing I'm going to do is map this edges array to try and count the number of thumbs up reactions that we have. So, in edges, I'm going to say let's map this to node.content. And if I rerun this query, you'll see now that array is much simpler. It's not an array of objects. It's just an array of strings. And actually, instead of mapping, if I count by node.content, we'll get an actual count of the number of reactions. So, now we know how many thumbs ups we have, how many eyes we have as reactions. But we don't know what our issues are yet. So, let's ask for the titles of these issues. And I don't want in my results a kind of object of reactions. I want a single number that represents votes. So, I'm going to get edges.THUMBS_UP here. And I'm going to alias this field reactions to votes because we've decided that a thumbs up is a vote.
And now that I basically have the array of data that I want, I want to transform this into something that's a little bit easier to scan. So, I don't really want this kind of node object as a middle layer between my array and the title and votes of my issues. So, for my edges array, I'm going to map to just my node. And what I really want to do is sort this array in descending order so I can see the top voted issue. So, I'm going to sort by votes for my array. And it looks like sort by is ascending by default, which makes sense. So, I'm going to reverse this array. And it looks like I have some issues here that don't have any votes at all, which also makes sense. So, I'm going to filter only for issues with votes. And now we've got the answer to our question. The top voted issue here is an Apollo Server Fastify playground issue. And if we wanted to look into this more, I could even get the URL and follow it. But we have to get back to our presentation.
So, a few things I want to point out from this example. The ability to aggregate, group, and generally apply transforms to query results is a really powerful thing, in my opinion. This is what lets you ask questions of your data and get answers within the context of your tool without having to take your data out of that tool and move it to another tool. This is also what enables people to kind of use your schema beyond what you may have currently imagined and built into your schema through computed fields. Oops. On the flip side, though, GraphQL Lodash is not particularly intuitive. It's taken me several hours of fiddling to feel comfortable with it. And even now, I'm not an expert. And the even bigger, more important thing to point out about this example is that transforming our data like this is actually breaking a principle of GraphQL, which says that your responses need to be congruent to the queries that you sent.
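The sequence of transforms in the demo (map, count by, get, sort by, reverse, filter) can be replayed outside of GraphQL to see what each step does to the data. The sketch below applies the same steps in plain Python to a made-up response; the issue titles and reaction counts are invented sample data, and the response shape is an assumption based on how the demo describes it. It does not use GraphQL Lodash itself.

```python
from collections import Counter

# Invented sample data, shaped like the nested response described in the demo.
response = {"repository": {"issues": {"edges": [
    {"node": {"title": "Issue A", "reactions": {"edges": [
        {"node": {"content": "THUMBS_UP"}},
        {"node": {"content": "EYES"}},
    ]}}},
    {"node": {"title": "Issue B", "reactions": {"edges": [
        {"node": {"content": "THUMBS_UP"}},
        {"node": {"content": "THUMBS_UP"}},
        {"node": {"content": "THUMBS_UP"}},
    ]}}},
    {"node": {"title": "Issue C", "reactions": {"edges": []}}},
]}}}


def top_voted(resp):
    # map: drop the intermediate "node" wrapper around each issue
    issues = [edge["node"] for edge in resp["repository"]["issues"]["edges"]]
    rows = []
    for issue in issues:
        # countBy node.content: tally reactions by type
        counts = Counter(e["node"]["content"] for e in issue["reactions"]["edges"])
        # get THUMBS_UP and expose it under the alias "votes"
        rows.append({"title": issue["title"], "votes": counts.get("THUMBS_UP", 0)})
    rows = sorted(rows, key=lambda row: row["votes"])  # sortBy votes (ascending)
    rows.reverse()                                     # reverse to descending
    return [row for row in rows if row["votes"]]       # filter out zero votes


print(top_voted(response))
```

Note that the result no longer mirrors the shape of the original query, which is exactly the congruence concern raised above.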
So, for use cases like this, where you're writing queries for analyzing data within the console, I don't think that's a big deal, because you're not taking that experience outside of the kind of single window. But if we were to try and actually take this query and put it into our code, that's where we're going to get into trouble and things are going to get iffy, because our other developer tools, like code generation, are going to rely on us staying spec compliant. So, this is something that I love. But I would not recommend using it for kind of application development. If you need computed fields there, you should build them into your schema.
And finally, our third question: how do I see things the way that I want to see them? As I mentioned earlier, what motivates me most about GraphQL is this opportunity that I see to make data more accessible. And I think the last big issue we have to cover is whether or not GraphQL itself is going to be accessible enough for use by people who aren't developers. It's really hard to stare at a blank editor and get started with the query when you don't even know what the language is. So, I could give a whole separate talk on the query building side of this and the data discovery aspect of data browsing. And here I have a picture of GraphiQL's Explorer on the left and Studio's Explorer on the right, both of which have thought quite a bit about how to actually help you write queries without needing to know exactly what to type. But unfortunately today we don't have enough time to go into the query building side of things.
So, instead I want to focus on sharing some thoughts with you on working with the responses from our queries. And when we talk about API responses, at least GraphQL ones, we're talking about working with JSON data. JSON is a beautiful format for developers and APIs because you can express complex objects, it's human readable, and it's basically universally accepted and usable within our code.
But the problem with JSON is that it's a very developer-centric thing. It's not very common to work with outside of the developer world. And usually when we're talking about data sets, we're talking about tables and CSVs and loading things into Excel. And to turn JSON into tables, it usually takes code to do that because it's not necessarily a given transformation. And if you're not comfortable writing code, then you're kind of stuck.
So, my last demo here is pretty quick. But I just want to show you all that there's more to GraphQL response browsing than just scrolling long arrays of JSON data. And I want to encourage you to always be expecting more from your tools. So, if we go back to our GitHub example, I want to just show you quickly table mode, which is the idea that you could kind of generate a table as best you can from JSON results and give people some tools to interact with their data in a way that's not JSON. So, with table mode here, we can sort our columns by title alphabetically. We can sort our votes, which would have saved us from even having to add this sort by directive. I can also download this data to a CSV if I wanted to and move it to another tool. So, by building accessibility into our tools like this, we're enabling people to go beyond just what you could do with code.
So, I see a lot of pros to making our tools more accessible in this way. Table mode is much easier to scan data from, even for developer use cases. And something like this is just naturally going to feel more familiar and welcoming to everyone else. And I don't see a ton of downsides to building things into our tools like this, other than kind of the eventuality that we don't want to overload our tools with too many things and make them too busy for any one use case. But beyond even working with data in your editor, I've seen people build integrations between GraphQL and other tools that are already familiar in their workflows, like Tableau.
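As a small illustration of the JSON-to-table step behind something like table mode, the sketch below flattens a list of JSON objects into columns and writes them out as CSV. It is not how the Explorer implements table mode, which the talk does not describe; the `flatten` and `to_csv` helpers, the dotted column naming, and the sample records are assumptions.

```python
import csv
import io


def flatten(obj, prefix=""):
    """Flatten nested objects into one row, using dotted column names."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


def to_csv(records):
    """Render a list of JSON-like objects as CSV text."""
    rows = [flatten(record) for record in records]
    # Collect columns in first-seen order so the table is stable.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing values are written as empty cells
    return buffer.getvalue()


# Invented sample records standing in for a query result.
records = [
    {"title": "Issue B", "votes": 3, "author": {"login": "x"}},
    {"title": "Issue A", "votes": 1},
]
print(to_csv(records))
```

Even this simple version has to make choices, such as how to name nested fields and what to do with missing values, which is the sense in which the transformation is not a given.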
And I find those kinds of integrations and that kind of thinking really inspiring.
So, as we wrap up, I want to leave you all with this thought. GraphQL can be impactful to your organization way beyond helping your developers be more productive. I've talked with product managers who use the Graph to put queries into their product specs to kickstart projects, and designers who like to browse the Graph to figure out what data they can even add to mockups. I've taught our customer success team how to use the Graph to run admin mutations that don't yet exist in our admin app. And I aspire to one day maybe even teach our sales team how to use the Graph to look up information on behalf of their accounts. If our tools become accessible enough and our schemas are well designed and well built, maybe we won't even need a lot of our integrations and admin apps in the future because everybody could just use the Graph. So, I encourage you all to think about how to design your schema with flexibility so it can be used beyond the ways you've currently imagined. I encourage you to continue to always expect more from your tools, especially when it comes to making them more accessible to different groups of people. And most of all, I encourage you to share your Graph with your entire organization and to do the work to make your Graph work for everybody.
If you're interested in trying out what I've been showing, that's the tool that my team builds, called the Explorer in Apollo Studio, and it's free to use. Thank you all so much for tuning in and listening. If you have any questions, please don't hesitate to ping me on the conference Discord or reach out on Twitter or ask in the Q&A. My DMs are open and I look forward to seeing you on the internet.
Hi, Armin. Hey. So, great talk. And without further ado, I think we should jump right into the audience questions. And the first question is from Radha. Oh, sorry. I'm looking at the wrong questions. It's from Nikon.
Is there any extra implementation needed to use the @_ functions or do they come with Apollo Server?
Yeah, that's a great question. So, that is a set of directives that come with a package called GraphQL Lodash. And GraphQL Lodash is not packaged directly with Apollo Server, but you can absolutely use these two things together. GraphQL Lodash is just its own npm package. But the tool that I was showing you all to write those queries is what we call the Explorer. It's in Apollo Studio. And if you query through the Explorer, the Explorer actually extends the schema that you're using with those directives automatically. So, you can do kind of front-end queries with GraphQL Lodash using the Explorer. But if you're using another query tool, you would need to add that to your server.
Okay. Thanks for the question, Nikon. Next question is from TheWorstDev. That's a great nickname. Are there any noticeable performance issues with GraphQL Lodash?
That is a good question. So, most of my usage of GraphQL Lodash has actually been on the front-end in the Explorer. And so, the ways in which it has been slow have been when you're querying a large amount of data that you then transform on the front-end. And the slowness there, I would not attribute to GraphQL Lodash. It's mostly just large amounts of data coming over the wire. But I imagine if you put GraphQL Lodash on your server, it would be much different, and better performance-wise. But the challenge that you'll have there is that if you use GraphQL Lodash and you provide that to your clients, then you are going to be breaking the spec in other ways. So, you want to be specific about where you use it and why you're choosing to use it.
Okay. And then TheWorstDev, who is now hopefully the best dev, has a follow-up question also. What other types of visualizations make sense? Would something like charts ever be in Explorer?
I have a dream that charts would one day be in the Explorer, but they're not right now. But you can imagine all sorts of things. Like, if you get an array of data back and it's all numbers, why wouldn't we give you a chart? Why wouldn't we have ways that you could transform your results to see them more visually? So, yes, having charts in the Explorer is like a pipe dream of mine. To actually bring that to fruition and make it practical for everyone in kind of a generic use case, I think, could be a little bit of a challenging problem, but not one that's not doable.
You got to put the bar really high, right, for yourself and make it work. But maybe TheWorstDev can just help you out.
That's true.
I was also thinking while watching the talk, is this way of working something you've cooked up in your own brain, or is it something that you're doing at Apollo or maybe at one of your previous employers?
Yeah, that's a great question. A lot of the inspiration for the ideas in this talk around making GraphQL more accessible to folks who are just getting started, and to folks from the query writing perspective, comes from my own experience of consuming GraphQL APIs for the last four years to build apps. I've always been a front-end developer, so I've always come from the perspective of writing queries, not building schema. And my experience building schema is much newer than my experience of learning GraphQL through the query world. What we've done at Apollo that's related to this talk is we've done some user research on how people get started with GraphQL and write queries in general as they progress through their GraphQL journey. And when we did that user research about a year ago, we learned that people wanted their tools to be more accessible, especially when they were getting started, because GraphQL itself is like a language. It's like code. You have to learn how to write it.
And there are a lot of people who could benefit from seeing data if they could only write queries, but they get intimidated by looking at kind of a blank editor that tells them to write some code to do something. So a lot of what we've done has been informed by some research that we did do at Apollo to make GraphQL more accessible, to make the data in your APIs a little bit more discoverable. But yeah, a lot of it is kind of cooked up in my own brain. I will say yes to that.
You could have just said yes then.
Sorry.
No, it's okay. I know it was a long answer. We're looking for the long answer. We want to have your opinion and that's why we have you here speaking. So I'm just kidding with you. We have a question from Juan. Thanks for the talk. That was great. Do you think GraphQL could actually be an accessible solution or would you prefer to use another tool to access the data?
So, like, using GraphQL over another tool like Tableau or something to access data? I will interpret it that way. I do think GraphQL can become the way that you access data generically. That's kind of the picture that I was trying to paint with the talk. And I have seen people use GraphQL and integrate it into a tool like Tableau. And I found that to be really inspirational and interesting. I think to get to that point, we need to design APIs in a way where they can be used that way. I don't think GraphQL out of the box can be used as kind of a generic data querying tool. I think you have to build it. You have to use GraphQL to build your API into a generic querying tool, because of those kind of bullet points that I was talking about, with performance and flexible schema design and so on being kind of key to using GraphQL to make your data more generally accessible. But I do think it can be used. I think the fact that the GraphQL specification is consistent and strongly typed and already so adopted in the developer world, it all leads to the signs that it can be used that way.
Okay.
Then we have one more question from, oh, Mette. Oh, that's funny. How do you see us getting from where we are now to working in your proposed way and how would we implement this in a company?
Yeah. Well, so my talk is kind of trying to paint a vision for what the GraphQL API could be used to do, but doesn't necessarily prescribe how you get there, because I think that's always going to kind of depend on how your company does things and what's right for your company. But I think what I wanted you all to take away is that you should think about your API potentially being used this way so that you can design your schema in a way where it can be used more flexibly. It can be more directly translated to database queries and be more performant. And if we get there more and more, then more and more people in your companies are going to be able to use it, and then there will be like a natural draw. So I think the way to get there, it's not like a prescriptive formula. It's a mental model and a way of thinking that you have to adopt and kind of bring to your companies.
Okay. Then one more question. I think that's the last question we have time for. That's from, I'm going to do my best to pronounce this, Sneha Sif. So, with GraphQL flexibility, do you see a concern with GraphQL Lodash with clients writing costly queries, or are there workarounds possible?
Well, so, yes, I do see a concern with clients writing costly queries, and that's where I think the line item where you kind of do your best to translate your schema into database queries comes in. Because if you have your GraphQL queries and your schema translating more directly into your database's language, then the kind of costliness of what the client could be asking for will be offloaded onto the database, which is, I think, the best case scenario, because that's what databases are designed to do.
I see GraphQL Lodash as kind of being a bridge to getting you more schema flexibility without it already being there. Like, I wouldn't necessarily recommend that anyone start relying on GraphQL Lodash for actual development. If you need your app to do something and you're writing queries into an actual app, you should design your schema to have what your app needs. If you're just trying to provide your schema as a way for general data accessibility and like one-off queries here and there through something like GraphiQL or the Explorer, I think GraphQL Lodash can be helpful for that because it brings you flexibility that you maybe didn't account for. So yeah, I do see issues with it being costly. That's why this is more of an idea, though, and not kind of like a prescriptive way that I would recommend you do things.
Okay. I think that's all the time we have. We had one more question, which was for me, and I'm just going to share it. It's from Josh and he wants my background. So I have a lovely donut shaped earth and that's how the earth is. That's how you see it if you're far up in the galaxy. So just Google donut shaped earth and you'll find it, Josh. Danielle, thanks a lot for joining us. You're going to stick with us for a little bit for the panel discussion coming up right now. But first we're going to go to the results. No, we're going to go do something else. So we'll see you in a bit. Thank you.
Thanks for having me.
33 min
02 Jul, 2021
