Building a Highly Scalable Cloud API Gateway


One of the benefits of GraphQL is that it enables a single entry point into any number of back-end services or databases. More and more companies are adopting cloud technologies – leading to more jobs, more money, and more opportunities in cloud computing. When GraphQL is integrated with a cloud back end, enabling secure and direct access to dozens of databases and managed services, the possibilities are endless. The problem is often that building out these implementations from scratch, and getting them right, is hard. In this talk, I’ll show how you can build cloud-based GraphQL back ends connecting to multiple databases (SQL and NoSQL), serverless functions, machine learning services, and microservices using TypeScript, AppSync, and AWS CDK – and do so in fewer lines of code than you’d expect. We’ll also look at how subscriptions, security, caching, and authentication are all handled, enabling you to build APIs that can connect to tens of millions of clients at once to offer true real-time applications at scale. By the end of the talk, you should feel comfortable knowing that you can become a cloud engineer using an existing GraphQL skillset.


Hello, everyone. Welcome to my talk. I'm going to be talking about building a highly scalable cloud API gateway with GraphQL. My name is Nader Dabit. I'm a senior developer advocate on the AWS web and mobile team, where I focus on lowering the barrier to entry to cloud computing. My team works on a lot of different technologies. We work on web, we work on mobile, we work on back-end, infrastructure as code, all kinds of stuff. But really, my team is focused on the intersection of web, mobile, and cloud computing. So kind of like the intersection of front-end and cloud, full stack cloud, you could call it. And in particular, one of the things that I'm really interested in is this idea of a full stack cloud and full stack serverless. So I think the talk that I'm going to be giving today kind of goes very closely along with that idea, because I'm able to kind of use my existing front-end skill set as a front-end developer to build out these scalable cloud applications using GraphQL. So I have a couple of books, but the most recent one and the one that might pertain the most to this talk is Full Stack Serverless from O'Reilly. So if you're interested in building these cloud applications with React and GraphQL or just with GraphQL in any front-end framework, definitely check it out. Everything there is built with GraphQL and AWS. So this talk is going to be broken up into three main parts. So it's a 20-minute talk, not a lot of time. So I'm going to kind of briefly go over some concepts, and then I'm going to build a live coding demo that kind of builds the things that I'm talking about, because for me, I like watching code. I learn a lot when I see code. And the idea that I'm going to be building is using infrastructure as code in TypeScript. So it's actually a really fun topic in my opinion. So I'm going to be kind of going over what are the challenges of building a custom GraphQL server. 
I'm going to talk about AWS AppSync, and then I'm going to do that live coding demo. So what are some of the challenges for building a GraphQL API from scratch? It's kind of broken up, in my opinion, into four main parts. The first and, in my opinion, most important is security. When you build out your GraphQL API, you not only kind of have to make it work, you have to take into consideration a lot of different things and a lot of different scenarios around authentication, authorization, and fine-grained access control. Most of the time, your API is going to need multiple types of authorization scenarios. So if you think of something like Twitter, you think of something like Instagram, Facebook, all of the more popular modern applications that you probably interact with today typically have a combination of public and private access. So how do you actually implement this public and private access and if you're doing this from scratch, it's typically a lot of work. You have to think about things like encryption. You have to think about how the user information is stored, all of this stuff. And then there's GraphQL-specific scenarios like malicious queries, query depth, and things like that. And then the typical security roles and issues that you deal with within the API surface, like DDoS attacks and things like that. The next main area is scalability. So you built this API. It works. But what happens when you get that 10,000 user bump at one time or something goes viral and you get a 10x or a 100x or maybe even a 1,000x number of visitors? You want your API to scale. So how do you actually provision your infrastructure and do so in a cost-effective manner to where your app scales? And then you have to think about GraphQL-specific stuff like subscriptions. So one of the things that we've worked on very hard and something we've really focused on the last few years is making subscriptions scale. 
So we have customers that have scaled our APIs to tens of millions of connected devices for a single API endpoint. This was a very hard challenge, and it's typically something that is just hard to do in general. And then the next thing that I would say is something that you typically need to take into consideration is this idea of developer velocity. So when you're building your API, what happens when you need to add a new feature, when you need to version something, when you need to modify an existing data source or maybe even add a new data source? What happens when your API starts to become complex? Is this going to slow your team down and therefore kind of slow the development of your entire app? And then finally, there's cost. And when I talk about cost, I'm not only talking about monetary costs, I'm also talking about opportunity costs and developer hours and the things that you consider, for instance, if you're a startup or just a company in general, and you have competing companies that are doing the same thing that you're doing. If you need to build something and test it out, and you don't know it's going to work or not, let's say you spend three months, six months building this thing, that's an opportunity cost that you have to really consider, because if this thing doesn't work out, you've spent a lot of money, you've spent a lot of time, and that time could have possibly been spent building something else had you known that this thing was not going to work out. So how do you take into consideration all of these things, and how do you do so kind of in an effective manner across all of these different areas? This is kind of where I think that AWS AppSync, the service that we've been working on for the last few years, really shines across all of these different areas. And AppSync is a managed GraphQL service from AWS. 
And I really like managed services because when you are dealing with a managed service, this team is working on just this one problem for years. And all those people are specialized in solving this one problem. So when you're buying into a managed service, you're typically buying into years of work, a lot of money spent, and a lot of edge cases solved for just dealing with this one problem. So if you can find something that kind of fits the challenge that you're trying to solve in a managed service way from a team that you can trust, it's often a good approach for doing something without having to build it yourself from scratch and kind of reinventing the wheel. Anyway, so AppSync allows you to kind of build out these APIs, anything that you need, mapped through GraphQL, you can do with AppSync. So you start off with the new AppSync API, you define your schema. Here in your schema, you define your types, of course, your queries, your mutations, and your subscriptions. From there, you configure the different authentication and authorization types. So you can have a base type, and the base type could be public, it could be private, it could be using an OIDC provider, it could be whatever. But you can have additional authorization types as well. So most APIs, like I mentioned, have multiple authorization scenarios. So most apps that go to production have multiple auth types. So you typically have some type of public access along with some type of private access. After you've configured your auth types, you configure your data sources. Now, the cool thing about AppSync is that you're mapping your API requests directly into these really highly scalable databases and data sources. So you can map your GraphQL API requests into something like DynamoDB directly, first-class access to DynamoDB, serverless Aurora for SQL, Amazon Elasticsearch. You can even map your GraphQL API requests to something like MongoDB that's outside of AWS. 
Because the GraphQL resolvers are just functions, you have complete flexibility over all the business logic that you would need. So you have not only this managed service that kind of handles the scale, but you have all the flexibility that you would typically get from building it from scratch. From there, you would write your resolver logic. So you can either choose from pre-written resolvers that we have for different popular scenarios, or you can write this completely from scratch. And you can do so in pretty much any language that you would like to work with that's supported by AWS Lambda. So you can write this in Python, TypeScript. You can write this in .NET. You can write this in Go, whatever you like. JavaScript as well, of course. And then from there, you deploy your API and you iterate. And there's a couple of different ways to deploy your API. And I'll go over the two main ones that I work with, and the one that I'm going to kind of show off is CDK. So AWS Amplify and AWS CDK are kind of the two that I've worked with a lot. But you can really deploy AppSync with anything that supports CloudFormation, which is the infrastructure-as-code service for AWS. So if you'd like to use the Serverless Framework, SAM, anything like that also works. If you're working with AWS Amplify, you get a few things that you don't get from other CloudFormation providers. You get GraphQL code generation. So we'll introspect your GraphQL schema and generate all the different GraphQL operations for the platform you're on. So if you're on iOS, we'll generate those in Swift. If you're on Android, if you're on web, if you're on Flutter, depending on the platform you're on, we'll generate the code for you for that platform. It also has a GraphQL transform library built in. And this is a library of directives that allow you to decorate your schema and add additional authorization logic. So from here, you can define connections. So one-to-many, many-to-one relationships. 
You can define authorization rules. You can map your operations directly through a Lambda function using the function directive, all kinds of stuff. There's a few other things that come out of the box with Amplify, like offline access using Amplify Datastore. But essentially, you just get additional helpers that kind of come with the framework if you're using Amplify. But the choice that I'm going to be using today is AWS CDK. And AWS CDK is nice because it allows you to kind of write your infrastructure as code using a familiar programming language. And for me, that's TypeScript. So this is the example that I'm going to be working with today. And you can actually configure CDK to work with Amplify client libraries. So there's a way to deploy your API using an output that will create a configurable file that you can then use with your Amplify client side project. And that's what we're going to take a look at as well in just a moment. So the demo that I'd like to do now is I want to build out a cloud API gateway from scratch using CDK and AppSync. So to do that, I'm going to go ahead and jump into my text editor and we're going to start writing some code. So from here, I'm going to go ahead and make a new directory called API gateway. And I'm going to change into that directory. From here, I'm going to initialize a new CDK project. So I'm going to say CDK init. I'm going to set the language to TypeScript. And from here, I'm going to go ahead and open up this project in my text editor. And in this file in our lib folder where we have this stack name, this is kind of where we're going to be writing our CDK code. And when you're working with CDK, you're working with different AWS services. So you can install the constructs and classes for those different services directly in your project. So for me, I'm going to be using AWS Lambda. So I can say AWS Lambda. I'm going to be using AppSync. And then I'm also going to be using AWS DynamoDB or Amazon DynamoDB. 
And once you've installed these constructs and classes, you can start using them. So I'm going to go ahead and build our API here using these constructs and classes. So the first thing I'm going to do is I'm going to import Lambda, DynamoDB, and AppSync. And then using these constructs, we can start writing our API. So what I'd like to do next is go ahead and see here. Go ahead and create a new AppSync API. And here we're going to go ahead and say we want a new AppSync.graphql API, and then you pass in your configuration. So this will build out the API from scratch. So I'm going to say I want this to be cloud API GQL Galaxy. We then define where this code comes from. So in this example, I'm saying in a folder called GraphQL and a file called schema.graphql. So we can go ahead and create that folder here and then create that file here. We'll come back to that in just a moment. We then define our authorization config. We're just setting that as API key for now. And then that's basically going to give us public access. For our schema, I'm going to go ahead and create a schema that's going to allow us to interact with two different data sources within a single, you know, within our API. The two data sources that we're going to be working with are a DynamoDB database, and then the second one is going to be the Unsplash API. So we want to be able to fetch images from the Unsplash API. To do that, we're going to have an image type that has some metadata about the image. We're going to have a URL type that has metadata about the URLs that are going to come back for each image. So that's the Unsplash data types. We then have a post type and a post input type for creating items in DynamoDB for a blog. So the post type is going to be kind of for a blog. And then we have our operations for queries and mutations. So we have a query of list posts that scans the database and brings everything back. We have search images that takes in a query and returns an array of images. 
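A minimal sketch of the schema being described here, written as a TypeScript string constant with matching types so the shapes can be reused in resolver code. In the talk the schema lives in `graphql/schema.graphql`. The talk names only the types and operations, so every field inside `Image`, `Urls`, `Post`, and `PostInput` is an assumption, as are the TypeScript interface names. The `@aws_subscribe` directive is how AppSync ties a subscription to a mutation.

```typescript
// Sketch only: type and operation names follow the talk; the fields inside
// each type are assumed for illustration.
const typeDefs = /* GraphQL */ `
  type Urls {
    full: String
    regular: String
    small: String
    thumb: String
  }

  type Image {
    id: ID!
    description: String
    urls: Urls
  }

  type Post {
    id: ID!
    title: String!
    content: String
  }

  input PostInput {
    id: ID!
    title: String!
    content: String
  }

  type Query {
    listPosts: [Post]
    searchImages(query: String!): [Image]
  }

  type Mutation {
    createPost(post: PostInput!): Post
  }

  type Subscription {
    onCreatePost: Post @aws_subscribe(mutations: ["createPost"])
  }
`;

// Hypothetical TypeScript mirrors of the GraphQL types above.
interface ImageUrls {
  full?: string;
  regular?: string;
  small?: string;
  thumb?: string;
}

interface UnsplashImage {
  id: string;
  description?: string | null;
  urls?: ImageUrls;
}

interface BlogPost {
  id: string;
  title: string;
  content?: string | null;
}

// A value that satisfies the assumed Post shape.
const examplePost: BlogPost = { id: "1", title: "Hello GraphQL Galaxy" };
```

Keeping the TypeScript interfaces next to the schema is only a convenience for this sketch; a code generator could produce them from the schema file instead.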
And then we have a mutation for create post that takes in a post and returns, you know, the post. And then finally, we have a basic subscription for on create post. So after we've defined our schema, we come back here. We might want to go ahead and define our DynamoDB table. So here, the DynamoDB table is where our posts are going to go. So we're just basically saying we want a new DynamoDB table. We'll call this Cloud DDB post table, something like that. And then from there, we can now add our Lambda function, which is going to map our GraphQL requests from the API into DynamoDB. So what I'm going to do is we'll go ahead and create a Lambda function. And here we're saying we want a new Lambda function. We're giving it a name of like, you know, some Lambda function. Maybe I'll call this GQL Galaxy function. We're setting some basic things like our runtime, where our handler function is going to be in a folder called Lambda functions, our memory size, and a couple of environment variables. So the API key is where we're going to need to place our API key for the Unsplash API. I've already configured this in a separate project, but this would be something like, you know, x dash xxx, whatever. And then finally, we create a data source from the API that we created earlier using the Lambda function that we just made there, that we just defined there. So we're saying we want a Lambda data source to be created by calling API dot add Lambda data source. And then the last thing we want to do is give some permissions. So we're going to say we want to enable the DynamoDB table to be accessed from our Lambda function. And then we add a new environment variable for the actual table that we created for us to be able to access it in the Lambda function. And then the last thing we want to do is create our resolvers. So for the resolvers, let's see here. We have three resolvers because in our schema, we have three operations, lists, posts, search images, and create posts. 
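The stack described so far might look roughly like the sketch below. It assumes the AWS CDK v1 packages (`@aws-cdk/*`) that were current when this talk was given; later CDK versions changed some of these signatures. The construct IDs, the `lambda-fns` folder spelling, the environment variable names, the `id` partition key, and the billing mode are all assumptions made for illustration, not the speaker's exact code.

```typescript
// Assumes AWS CDK v1 (@aws-cdk/* packages). IDs, names, and the table key
// are illustrative guesses; only the overall structure follows the talk.
import * as cdk from '@aws-cdk/core';
import * as appsync from '@aws-cdk/aws-appsync';
import * as ddb from '@aws-cdk/aws-dynamodb';
import * as lambda from '@aws-cdk/aws-lambda';

export class ApiGatewayStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The AppSync API, with its schema file and API key (public) access.
    const api = new appsync.GraphqlApi(this, 'Api', {
      name: 'cloud-api-gql-galaxy',
      schema: appsync.Schema.fromAsset('graphql/schema.graphql'),
      authorizationConfig: {
        defaultAuthorization: {
          authorizationType: appsync.AuthorizationType.API_KEY,
        },
      },
    });

    // The DynamoDB table that stores blog posts.
    const postTable = new ddb.Table(this, 'CloudDDBPostTable', {
      billingMode: ddb.BillingMode.PAY_PER_REQUEST,
      partitionKey: { name: 'id', type: ddb.AttributeType.STRING },
    });

    // The Lambda function that resolves every GraphQL operation.
    const apiFunction = new lambda.Function(this, 'GQLGalaxyFunction', {
      runtime: lambda.Runtime.NODEJS_12_X,
      handler: 'main.handler',
      code: lambda.Code.fromAsset('lambda-fns'),
      memorySize: 1024,
      environment: {
        UNSPLASH_API_KEY: 'x-xxx', // placeholder, as in the talk
      },
    });

    // Register the function as an AppSync data source.
    const lambdaDs = api.addLambdaDataSource('lambdaDatasource', apiFunction);

    // Let the function reach the table, and tell it the table name.
    postTable.grantFullAccess(apiFunction);
    apiFunction.addEnvironment('POST_TABLE', postTable.tableName);

    // One resolver per operation in the schema.
    lambdaDs.createResolver({ typeName: 'Query', fieldName: 'listPosts' });
    lambdaDs.createResolver({ typeName: 'Query', fieldName: 'searchImages' });
    lambdaDs.createResolver({ typeName: 'Mutation', fieldName: 'createPost' });
  }
}
```

Routing all three operations to a single Lambda data source keeps the stack short; the function itself then decides what to do based on the field name it receives.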
So here we're going to map our GraphQL operations into those different fields. So we're saying we want a list post query, a create post mutation, and a search images query. And this 66 lines of code, and if you really take out the comments and stuff like that, we're talking about maybe 55, 60 lines of code. This is kind of deployed our entire backend. The only thing we would need to do at this point would be to kind of create our Lambda function code. So I'm going to create a couple of these files just to kind of show you what they look like. But in general, I think the main file that you'll learn from is this main.ts, and then maybe the searchimages.ts. So the main function is going to be the main entry point. So this is going to be where we kind of receive the event that is coming in from the GraphQL request here in the event. And we're going to basically have a few different functions that we can operate against. So create posts, list posts, search images, and then the post type itself. And in the event, we have the field name, and the field name is basically going to be something like create posts, list posts, search images, whatever. And then we can get the argument that we want to use for the query that we want to get. So we can call it whatever, and then we can get the arguments out of the event by calling event.arguments. So anything that was passed into the operation is available here. So we can call create posts, we can call search images, and we have that argument information here available as the event.arguments. And then the last thing we'll look at is kind of this function for creating search. So basically, we're going to have our main query here, where we have our... Let's see here. We have our query coming in as an argument. We define the main query endpoint, which is this Unsplash API, setting the query as the query that comes in as an argument, and the API key that comes off of the environment variable. And here we're just calling Axios. 
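The entry point and the search function described above could be sketched as follows. This is a dependency-free approximation, not the speaker's code: the talk keeps these in separate files (`main.ts` and a search images file), talks to DynamoDB, and calls `axios.get`. Here an in-memory `Map` stands in for the table and the HTTP client is passed in as a parameter, so `postTable`, `HttpGet`, `buildSearchUrl`, and `makeHandler` are hypothetical names. The event shape (`info.fieldName` and `arguments`) is what AppSync sends to a direct Lambda resolver, and the URL uses Unsplash's photo search endpoint with the key sent as `client_id`.

```typescript
interface Post { id: string; title: string; content?: string }
interface UnsplashImage { id: string; urls: { full: string } }

// The parts of the AppSync direct Lambda resolver event used here.
interface AppSyncEvent {
  info: { fieldName: string };
  arguments: { post?: Post; query?: string };
}

// Hypothetical stand-in for the DynamoDB table so the sketch runs anywhere.
const postTable = new Map<string, Post>();

async function createPost(post: Post): Promise<Post> {
  postTable.set(post.id, post);
  return post;
}

async function listPosts(): Promise<Post[]> {
  return Array.from(postTable.values());
}

// Builds the Unsplash search URL from the query argument and the API key.
function buildSearchUrl(query: string, apiKey: string): string {
  return (
    "https://api.unsplash.com/search/photos" +
    "?query=" + encodeURIComponent(query) +
    "&client_id=" + encodeURIComponent(apiKey)
  );
}

// Injected HTTP GET; the talk uses axios.get, whose response carries `data`.
type HttpGet = (url: string) => Promise<{ data: { results: UnsplashImage[] } }>;

async function searchImages(
  query: string,
  apiKey: string,
  get: HttpGet
): Promise<UnsplashImage[]> {
  const response = await get(buildSearchUrl(query, apiKey));
  return response.data.results;
}

// Main entry point: route on the GraphQL field name from the event.
function makeHandler(apiKey: string, get: HttpGet) {
  return async function handler(
    event: AppSyncEvent
  ): Promise<Post | Post[] | UnsplashImage[] | null> {
    switch (event.info.fieldName) {
      case "createPost":
        if (!event.arguments.post) {
          throw new Error("createPost requires a post argument");
        }
        return createPost(event.arguments.post);
      case "listPosts":
        return listPosts();
      case "searchImages":
        return searchImages(event.arguments.query || "", apiKey, get);
      default:
        return null;
    }
  };
}
```

In a deployed function the API key would come from an environment variable and `get` would be the real HTTP client. Injecting both also makes the handler easy to exercise locally with a hand-built event.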
So we call Axios.get, passing in the main query, and we return the response data results array. And then that's it. We could then deploy this if we want to by running npm run build and cdk deploy. And this would deploy our API. And what we would end up getting would be basically this AppSync endpoint. And we have in the dashboard a way to kind of interact with this endpoint. So we can go to our query editor, and we can do like a list post query for listing our posts. And right now we're going to get an empty array. So let's go ahead and create a new post. And we should be able to list posts now. And this is going to our DynamoDB table. And then we also have our search images function, which is interacting with... I'm sorry, with Unsplash. So we could say we want to search for cats. And we would get like the images back, and we should be able to test these out. And there should be, you know, cats. That looks like a dog, but there's a cat too. Let's try one more and then we'll be done. So I'm going to go to full, see if there's cats here. There we go. All right. So that's it. That was a lot to take in. So I hope you learned a lot. If you'd like to learn more, go to the Amplify docs, go to the AppSync docs, or go to the cdk docs. They're all listed here, but you can just search those. Easy to find. So thank you for watching. I hope you learned a lot and I hope you enjoy the rest of this event. Great talk. I have a couple more questions. You, if you don't mind listening to them, you can run off, we can't stop you. We'll see. We'll see. If there's something I want to listen to and answer, I might stay. We'll see. No worries. So you heavily relied on the Amazon CDK, which is the cloud development kit, I believe. Do you sacrifice any flexibility on your scalable API by tying it so intimately with an Amazon backend, Amazon infrastructure? Yeah. 
So I think that because the stack that I use is using, actually the service itself is using AWS AppSync, then anytime you use a managed service, you're going to be having some trade-offs, and some of those trade-offs are going to be good and some of them are going to be bad. So some of the good things are that you will be able to have this infrastructure that's highly scalable deployed in just a few minutes, and you can get up and running very quickly. You don't have to write it all from scratch. The trade-offs are that you are limited by the APIs and the functionality and the features that AppSync provides. So if there's something that you need that we don't have, then you basically are not going to be able to get that functionality. So I would say whenever a customer is looking at using AppSync, so for instance, one of our earliest customers and the most successful helping us scale up, especially our GraphQL subscriptions was Ticketmaster, and we kind of looked at what they needed out of the service, and it worked out really well for them. And since then, they've helped us scale up, especially our real-time capabilities, to tens of millions of connected clients per single API endpoint. So we typically tell customers, let's look at what you need out of your API, out of your service, and see if this is going to work. If it is going to work, that's great. If not, then you should probably use something else, maybe build your own custom solution. Yeah, that's a good answer. Can't fault it. Let's talk about testing. So we had a little developer testing in your talk, but testing Lambda is often seen as difficult, at least in community convention, and testing GraphQL is still a little less mapped out compared to other kinds of interaction. What are your suggestions for testing GraphQL Lambdas? 
So using the app that I just built, actually a really easy way to start testing right away would be to use the SAM CLI, which is kind of the Serverless Application Model framework that AWS provides. They have a really nice CLI for testing serverless functions. As far as testing AppSync APIs themselves, using the service itself, there's really not a really good story for actual local testing right now. So most of the time, customers have a testing environment for their features, and then whenever they're happy with where they are, they'll just merge that into their production environment. But there is no local testing strategy right now for AppSync APIs built with CDK. But using the approach that I took, it actually is a lot easier to test locally than if you were building AppSync APIs directly in the console or something, because you're able to test out those resolvers locally using the SAM CLI, and the SAM CLI is a really, really solid way to test serverless functions, in my opinion. Is using serverless offline a naive way of trying to test this, or is that too complex for it to handle? Testing serverless offline? Well, no, testing these Lambda functions and GraphQL responses using the serverless offline package. Oh, you know what? I have not tried that. Like from the Serverless Framework? Yeah, yeah. They also have a way of running Lambdas locally. This is me how it feels like putting together a testing surface before. Right. It's probably very similar to the SAM CLI. I would say it's probably very similar, where you can basically invoke a function passing in an event, and the event has whatever information that you would expect in that function, and then you would be able to log it out to your terminal and things like that. Yeah, I think both of those are probably good options. Okay, well, let's have a question from Juan. 
This seems to be targeted to start, to avoid spending so much time on something that might not work, but can it be migrated to another system afterwards? So I guess it's talking about the portability of the solution. So there's two parts to the solution that I just showed. I would say the two main parts are the database that we deployed, which is kind of, in my example, DynamoDB, and then there's the actual business model and the actual business logic and the schema that's part of GraphQL, I would say, land. And if you wanted to kind of take what we just built and then migrate it to another solution, you would have to kind of, you know, you could port over your schema and you could port over your, you know, even those functions probably and kind of have that set up, but you would have to rewrite a significant amount of your app. You would have to kind of just rebuild that back in. But I would say, yes, it is portable. Like, you could basically port the server to a custom implementation, maybe living on AWS because DynamoDB, I would say the biggest lock-in with what I just went through is kind of DynamoDB, which is an AWS-specific database. So if you wanted to kind of port this over to something like Google or, you know, some other custom database, then I think the main challenge that you would have would be to kind of, you know, pull that data out of that database and then transform it into whatever it needs to look like for your next database and then place it there and do so, especially if you're in production, in a way that does not kind of, you know, give any opportunity to mess up your data. Mm-hmm. Yeah, persistence is always the hardest part of these kind of services. I really like that in many ways GraphQL is almost like a pure function of a resolver. You put something in, you take something out. It's... You know, a really interesting thing, though, about DynamoDB, and one of the things that a lot of customers, like, do is they have... 
there's a way to trigger a function for any update in DynamoDB. So basically what customers a lot of times do is they have basically another database where they store kind of a copy of this database, and they might need it for certain reasons, like they want to have a different type of data access pattern. So, for instance, if you need to query the same data using something like Elasticsearch, which gives you a lot more powerful querying capabilities around stuff like geolocation and things like that, you can automatically kind of have a replica of that data in another database, and it actually happens automatically if you set up the trigger. A trigger would basically... anytime an update happens, so anytime someone places or updates or deletes anything, you have that piece of data sent in a trigger into, like, a function, and then you could take that data and then write to another database. So if you wanted to kind of have a copy in a SQL database, even, you know, you would be able to query the same data from not only a NoSQL database, but also a SQL database, Elasticsearch, or wherever you'd like it to be. Like it's like an event-driven persistence layer. Exactly. Cool. A question from Dude RSM. Hey, Nader, how would you describe the learning curve when trying to use AppSync to create a GraphQL API? I think that in the past it was actually a fairly steep learning curve. I think that a lot of the tooling and things that are coming out now is making it a lot easier. So for instance, there used to be a resolver layer that you had to use called VTL, Velocity Template Language, which is a completely new language that is very hard to test. But now with the release of direct Lambda resolvers, basically what I use, you can just write all of your business logic in JavaScript. So the learning curve for AppSync is really just understanding how to write the deployment layer that you're using. 
So in my example, I use CDK and we wrote an entire API in like a few dozen lines of code. But you could also use something like the Serverless Framework, or you could use something like the Amplify CLI, which kind of writes that for you. And it just asks you a bunch of questions and prompts you. And then you kind of answer yes, no, and you put in your schema. And then we write all of that for you. We will write all of those resolvers for you. We'll write all of the business logic for create, read, update, delete operations. And then you can kind of go in and kind of modify those. So I would say the learning curve has gotten better. And then look out at the beginning of next year, the first quarter of 2021, we're releasing a really, really massive new update to the AppSync service. It's going to make it even easier. We're seeing like a really high demand for certain things now that we have scaled up the number of customers that we have. So we're kind of being able to prioritize these things a little better. We have a really interesting release that's coming out the first quarter of next year. That's going to lower the learning curve even more. So I would say right now, since we released that new support for direct Lambda resolvers about a month or two ago, that significantly lowered the learning curve. And then it's going to be lowered again in the near future. Well, you know how to sell. You've left us on a cliffhanger. Oh, yeah. Well, keep an eye out. Follow me on Twitter. If you're not already, you'll probably see me talking about it and doing a lot of demos and stuff around it. I know that we will. All right. This completes our questions for yourself this lovely evening. So put your emoji claps together for Nader. Gosh, I always forget. I'm so sorry. No, you got it right the first time. Nader. Yes. I knew. I doubted myself. I shouldn't have. And, yeah, thank you very much for your wonderful talk. 
And we will have, I think you have a speaker room chat planned afterwards, right? I do. I do. And thank you for having me. And it was really good to talk to you. I know I met you in London a while back. I'll hopefully get to see you again in person because I've heard that you're a fun person to go party with. So hopefully I can do that with you one night. That was amazing. You're always invited. Thank you. Sounds good. Later, everybody.
31 min
02 Jul, 2021
