At Unity, we use GraphQL federation to expose a wide range of business functionality across the organization in a single GraphQL schema. With an ever-growing number of services, this presents challenges for authentication and authorization across the board. I explore how we implemented GraphQL auth at the gateway level, the key design decisions behind it, and the wide-reaching benefits this can have.
GraphQL Authentication and Authorization at Scale
AI Generated Video Summary
This talk discusses the implementation of GraphQL Authentication and Authorization at scale at Unity. The speaker explains how they use GraphQL Federation to expose business functionality through a centralized schema and the challenges they faced in handling auth at scale. They describe how they simplified configuration and scaling using Mercurius and implemented hooks and an Auth plugin. The implementation at Unity involves a Unity Auth endpoint and a central Unity Auth directive definition. The talk also covers the implementation of AuthPolicyHandler and AuthDirective for downstream services and showcases different access levels. The Mercurius Auth plugin provides a scalable approach to authentication and ongoing improvements include adding support for a filter schema.
1. Introduction to GraphQL at Unity
Hi everyone, and welcome to my talk on GraphQL Authentication and Authorization at Scale. I'll discuss GraphQL at Unity, the problem we wanted to solve with Auth at scale, the design, and the solution. At Unity, we use GraphQL Federation to expose business functionality to clients through a centralized GraphQL schema. We are actively working on improving our self-service options and automating the process. Our tech stack includes Node.js, TypeScript, and the Mercurius GraphQL server.
Hi everyone, and welcome to my talk on GraphQL Authentication and Authorization at Scale. My name's Jonny Green and I'm a Senior Software Engineer at Unity Technologies and also an open-source developer.
So before we get into it, I'd like to just quickly discuss the agenda. Just to really set the scene for you all and just really provide a lot of context for the solution and design that we'll talk about coming up. So first of all, we'll talk about GraphQL at Unity. I'll introduce our team and basically some of the things that we do, as well as talking about our tech stack. Just to really provide you all with an idea of how we work and how we implement GraphQL.
Next up, I'll discuss the problem we wanted to solve with Auth, especially at scale. So I'll discuss, yeah, basically the problems we encountered and also what benefits we're looking to have as well. Next up, I'll talk about the design. So I'll discuss the actual details of the design, as well as how this solves our original problem and how it gives us the benefits that we're looking to have. And then finally, I'll show you all the solution. So this will include the implementation, but also a short, brief example to give you an idea of exactly how we implemented this at Unity.
So GraphQL at Unity. So I work in the live platform team, where our primary aim is to expose business functionality to clients. And this is all through a centralized GraphQL schema. So we use GraphQL Federation under the hood, where we have a gateway, and then, behind this, we have several services that expose different parts of the business and its functionality. Clients talk to the centralized schema, which they treat as the hard contract that we expose, and basically, yeah, just get the bits of business functionality that they need to access.
We also are actively working on improving our self-service options. So as we get more and more requests from clients, we want them to be able to do the work themselves as well. So if they want to expose a new bit of business functionality, we want to say to them, here's some instructions, and you can go and implement it yourself in a GraphQL compliant way with all the benefits that we serve as well. So for instance, we do a lot of caching under the hood, so we can tell them how to take advantage of all these benefits and tooling that we've developed over the past year or so. We're also looking to automate a lot of this. So a lot of this is fairly generic stuff and standardized by convention, so this enables us to look into, can we generate all this code and can we make lives for new developers a lot easier by just saying, if you want to spin up a new service, just run this command, and you're good to go. And you've got all the service set up you need. It's all hooked in, and it can be deployed as well.
So I thought I'd also talk about our tech stack. So we use Node.js and TypeScript under the hood, and it's done very well for us. And with that, we also use the Mercurius GraphQL server. And this is for all our services and also our gateway as well.
2. Handling Auth at Scale with GraphQL Federation
We chose Mercurius because of its fantastic open-source community and the functionality it provides. With GraphQL Federation, we can expose business functionality at scale and take advantage of federation features. However, handling auth at scale was a challenge. We wanted to simplify the implementation and ensure a consistent auth model across all services. We also wanted to be closer to the auth mechanisms provided by Unity and have a single policy definition for all services.
So the reason why we chose Mercurius is just it's got a fantastic open-source community, and we really bought into that, and we really like it because of that. Also Mercurius provides all the functionality that we need and more. So we just thought it was a bit of a no-brainer for us to choose Mercurius, and it's been fantastic so far.
We also use GraphQL Federation. So as I mentioned before, we're exposing business functionality at scale, so through a central gateway with lots of services. Now, lots of these services provide functionality that can be related to other bits of functionality. So we're not only federating all the individual graphs into one big graph, we're also taking advantage of federation features. For example, if we've got a user with a certain set of details that comes from another service, we can provide that through GraphQL Federation.
So I'd like to discuss a problem next, and that is handling auth at scale. So when we started out, all our services were running auth. So that means all the services had to implement the auth mechanism. They had to define their auth policy definition, and also then just define all the fields they wish to protect. And that's quite a lot of stuff for a service to be doing when we really want to scale this out long-term. So we want to take a lot of this responsibility away from the service, and really make sure that the implementation is really simple, so that all the services need to worry about is the business logic they're exposing.
So with that, we've also got multiple services and also multiple teams. So lots of new contributors outside of our team that are contributing new and more and more services to our federated system. And this comes with its own problems. So multiple teams, they may use different tech stacks, they may use different auth models. And for the clients, they don't necessarily know how the fields are protected or what fields are protected. So we also want to solve this as well. So we want to ensure that we've got a consistent auth model across all services that they all adhere to. And we also want to present all this information to clients. So if a service protects a certain field in a certain way, we want to tell clients about this in order to tell them how to handle all the errors and basically how to integrate with our system. In terms of the benefits we're looking for, so we also want to be closer to the auth mechanisms. So because the services are currently running auth, they're quite far away from the Unity auth endpoint that they interact with. So we want to be closer to that, which will give us better performance, but also allow us to optimize, for example, how many requests we make to this endpoint and how we request this endpoint. So we want to be closer to the auth mechanisms that Unity provides. We also want a single policy. So we essentially want to control the centralized policy and basically tell services how to access it. So all the services all use the same policy definition, and as long as they adhere to that, we're good to go.
3. Simplifying Configuration and Scaling
They can protect the fields in exactly the same way, in exactly the way they want. It just makes integration really simple because all they need to worry about is configuration. From the gateway point of view, we need to tell the gateway how to find the service policies, register the policies, and apply policies upon a GraphQL request. As long as services provide a GraphQL schema, they're good to go. We needed to add features to Mercurius to keep the approach simple and scalable.
They can protect the fields in exactly the same way, in exactly the way they want. And it just makes integration really simple because all they then need to worry about is just configuration and that's it. They can then focus on the business functionalities and not worry about how do we interact with the auth endpoint. It's already happened.
And one other benefit that we want to have concerns any requests from the gateway to the services. We want those requests to be already authenticated and already sufficiently authorized. As it stands, with the services running auth at the service level, we don't have this. So we want to provide this as well.
So next up is the design. So the next logical step is to take auth to the gateway level and this has several benefits. So you can see here that all the services now need to do is just define the policies that they care about. So for instance, if they want to protect a certain field and make it authenticated only, they just need to add a GraphQL directive to that field and they're good to go. Similarly, if they want authorization, they just add a GraphQL directive with slightly different config to say, I want this X, Y, and Z permission. And again, they're good to go. So from the service point of view, it's just configuration.
From the gateway point of view, we need to tell the gateway how to find the service policies. We need to tell the gateway how to register the policies, and we also need to tell the gateway how to apply policies upon a GraphQL request. Now with GraphQL directives, we can do this. So not only is this really simple for services, it's also really simple for the gateway as well, because all it needs to do is just look through the federated schema that it's generated. And depending on the auth directive that it finds, it can create a policy accordingly.
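The "find and register" step described here can be sketched without any dependencies. This is only an illustration: the `SchemaModel` shape is a deliberately simplified, hypothetical stand-in for the real graphql-js schema AST, and `collectPolicies`, the type names, and the example fields are all assumptions made for this sketch.

```typescript
// Hypothetical, simplified model of a federated schema:
// type name -> field name -> directives applied to that field.
// A real gateway would walk the graphql-js schema AST instead.
interface AuthDirective {
  name: string; // directive name, e.g. "auth"
  args: Record<string, string>; // directive arguments, e.g. { requires: "ADMIN" }
}

interface FieldModel {
  directives: AuthDirective[];
}

type SchemaModel = Record<string, Record<string, FieldModel>>;

// Policy registry keyed by "Type.field".
type PolicyMap = Map<string, AuthDirective>;

// Walk every type and field once, registering a policy
// wherever the named auth directive appears.
function collectPolicies(schema: SchemaModel, directiveName: string): PolicyMap {
  const policies: PolicyMap = new Map();
  for (const [typeName, fields] of Object.entries(schema)) {
    for (const [fieldName, field] of Object.entries(fields)) {
      const directive = field.directives.find((d) => d.name === directiveName);
      if (directive) {
        policies.set(`${typeName}.${fieldName}`, directive);
      }
    }
  }
  return policies;
}

// Example schema (assumed fields, for illustration only).
const federatedSchema: SchemaModel = {
  Query: {
    me: { directives: [{ name: "auth", args: { requires: "USER" } }] },
    health: { directives: [] },
  },
  Post: {
    author: { directives: [{ name: "auth", args: { requires: "ADMIN" } }] },
  },
};

const policies = collectPolicies(federatedSchema, "auth");
console.log(Array.from(policies.keys()));
```

Because the policies live in the schema itself, the gateway needs no per-service code: any service that publishes the directive in its schema is picked up by the same walk.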
So how does it solve the problem? So as you can see, we've moved a lot of the work to the gateway level, and you can see that from a service point of view, all they need to worry about is configuration. Not only that, the configuration is also all in the GraphQL schema. So it means that services can be implemented with whatever tech stack they choose to use. As long as they're providing a GraphQL schema, they're good to go. And that's really, for us, that's really powerful, because it allows us to truly scale with all the services that we want. And they could write it in Rust, and it doesn't matter. As long as they apply the same auth policy definition to their service, to their GraphQL service, then, yeah, that's all they need to worry about.
So what do we need to do? So before we could implement this, we needed to add some features to Mercurius in order to keep the approach as simple as possible. And to, yeah, to really, really make this approach scale.
4. Implementing Mercurius Hooks and Auth Plugin
We added Mercurius hooks and implemented an Auth plugin to enhance the Mercurius server with auth policy handlers. We defined our own policy handler that interacted with the Unity auth endpoint and registered the necessary configs to protect the required fields in the services. All services use a centralized policy definition.
So one of the first things that we looked into was adding Mercurius hooks. So these are essentially lifecycle events within Mercurius that indicate the important points within the GraphQL request, and also within the system. For instance, when a gateway refreshes its schema, we want to know about that, we want to be able to provide functionality at that event. So we needed to implement that, but also we needed to implement an auth plugin. So this would be an opt-in plugin that would use the Fastify plugin system to essentially enhance the Mercurius server with the ability to add auth policy handlers, look for certain auth directives, and also add user information to the context to enable auth policy handlers to get as much information as they need. And then from a Unity point of view, we need to define our own policy handler that interacted with the Unity auth endpoint. And also we need to define all the policies at the service level. So, we need to register these configs and basically protect all the fields we need to protect in the services. And these would all use a centralized policy definition. So they would all use the same GraphQL directive. And as long as they adhere to that directive, as I said before, they're good to go.
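To make the lifecycle-hook idea concrete, here is a small dependency-free sketch of a hook registry in which an auth plugin re-registers its policies whenever the gateway schema is refreshed. Mercurius exposes its hooks through an `addHook` call; the `HookRegistry` class, the `onSchemaRefresh` event name, and the synchronous handlers below are hypothetical simplifications, not the Mercurius implementation.

```typescript
// Hypothetical stand-in for a server's lifecycle hooks.
type HookName = "onSchemaRefresh";
type HookHandler = (schemaVersion: string) => void;

class HookRegistry {
  private handlers = new Map<HookName, HookHandler[]>();

  // Register a handler for a named lifecycle event.
  addHook(name: HookName, handler: HookHandler): void {
    const existing = this.handlers.get(name) ?? [];
    this.handlers.set(name, [...existing, handler]);
  }

  // Run all handlers for an event, in registration order.
  run(name: HookName, schemaVersion: string): void {
    for (const handler of this.handlers.get(name) ?? []) {
      handler(schemaVersion);
    }
  }
}

const hooks = new HookRegistry();
const policyRegistrations: string[] = [];

// The auth plugin rebuilds its policies each time the gateway swaps in a new schema.
hooks.addHook("onSchemaRefresh", (schemaVersion) => {
  policyRegistrations.push(`policies registered for schema ${schemaVersion}`);
});

// Simulate two gateway schema refreshes.
hooks.run("onSchemaRefresh", "v1");
hooks.run("onSchemaRefresh", "v2");
console.log(policyRegistrations);
```

The point of the hook is that the plugin never has to poll: newly protected fields in a refreshed schema are picked up as part of the refresh itself.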
5. Implementing Mercurius Auth at Unity
With Mercurius Auth, we traverse the GraphQL schema at the gateway, looking for auth directives. We define policies based on the directive arguments and protect fields and types. The auth policy handler is configured through GraphQL directives. The implementation at Unity involves a Unity auth endpoint and a central Unity auth directive definition. Services automatically pick up the central directive. The implementation is simple due to the Mercurius Auth plugin and hooks. A similar approach can be found on GitHub. The demo has a federated system with a gateway, user, and post service.
So the solution. So, with Mercurius Auth, from a high level, essentially what we're doing is traversing the GraphQL schema, and this is all at the gateway. So we traverse the GraphQL schema and look for auth directives. And depending on the directive arguments that are passed to it, we would then define a policy according to that position within the GraphQL schema.
And from this, we would just keep traversing the whole GraphQL schema, look for all the protected fields and types, and from that build up an auth schema. We would then traverse this schema and wrap all the relevant field resolvers with the auth policy handler, such that this auth policy handler would be configured with the config that we registered through the GraphQL directives.
Now, obviously we don't want to traverse the GraphQL schema at every request. So a lot of this work is done at registration time. So by the time a GraphQL request comes in, the field resolvers are already wrapped and all that's run is just the auth policy handler with the correct configuration passed to it. You can find out more about this on GitHub, at mercurius-js/auth. So if you want to have a look at the code or discuss it further, you can check it out there.
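A minimal sketch of the "wrap once at registration time" idea follows. It is not the plugin's code: resolvers are reduced to functions of a context object, and `protectResolvers`, `RequestContext`, the error message, and the example fields are all hypothetical names chosen for illustration.

```typescript
// Hypothetical minimal types; real resolvers also receive parent, args and info.
interface RequestContext {
  roles: string[];
}
type Resolver = (context: RequestContext) => unknown;
type Policy = (requiredRole: string, context: RequestContext) => boolean;

// Registration time: wrap only the fields that have a registered policy.
// The schema is not traversed again when requests come in.
function protectResolvers(
  resolvers: Record<string, Resolver>,
  requiredRoles: Record<string, string>,
  policy: Policy
): Record<string, Resolver> {
  const wrapped: Record<string, Resolver> = {};
  for (const [fieldName, resolver] of Object.entries(resolvers)) {
    const requiredRole = requiredRoles[fieldName];
    wrapped[fieldName] =
      requiredRole === undefined
        ? resolver // unprotected field: left as-is
        : (context) => {
            // Request time: only the policy check and the original resolver run.
            if (!policy(requiredRole, context)) {
              throw new Error(`Auth policy check failed on ${fieldName}`);
            }
            return resolver(context);
          };
  }
  return wrapped;
}

const resolvers: Record<string, Resolver> = {
  posts: () => ["first post"],
  version: () => "1.0.0",
};
const requiredRoles: Record<string, string> = { posts: "ADMIN" };
const hasRole: Policy = (requiredRole, context) => context.roles.includes(requiredRole);

const protectedResolvers = protectResolvers(resolvers, requiredRoles, hasRole);
console.log(protectedResolvers.posts({ roles: ["ADMIN"] }));
```

The per-request cost is one extra function call on protected fields, which is why doing the traversal up front matters at scale.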
So next up is usage at Unity. So as I mentioned before, we need to also implement some Unity-specific things. So we already had out of the box a Unity auth endpoint. So our Unity policy handler that we wrote basically just interacted with that endpoint, and this would all be at the gateway level. And we also defined a central Unity auth directive definition, and that would be controlled by us. So if we needed to evolve the schema, the services would then automatically pick up the central auth directive in a schema evolution compliant way. And as long as they adhere to this central directive definition, they're good to go. And then, yeah, as I mentioned before, we just needed to then define the Unity auth directive usages such that we're protecting the fields that needed to be protected in the correct way. And that's really it. It was actually a fairly simple implementation in the end, just because we did a lot of the work within the Mercurius Auth plugin and also in handling the Mercurius hooks. So as a simple example, I thought I'd provide you with this demo that's very similar to what we did at Unity, just to give you an idea of how we did this. You can find this on GitHub, at jonnydgreen/graphql-galaxy-demo. And yeah, if you want to check out the code, you can check it out there. So what have we got? So we've got a federated system. So we've got a gateway and then behind that sit a user and post service.
6. Implementing AuthPolicyHandler and AuthDirective
In this demo, we define AuthPolicyHandler and AuthDirective for downstream services. The GraphQL schema for the user service has a centralized AuthDirective definition. Similarly, the post service protects sensitive fields with the admin role. The gateway registration involves the auth context, policy handler, and auth directive AST for configuration and role matching.
So as part of this demo, we will define AuthPolicyHandler and also an AuthDirective, which the downstream services both use and then the gateway automatically picks up these AuthDirective definitions. So a GraphQL talk wouldn't be complete without a GraphQL schema.
So here we have the schema for the user service. You can see here we've got the AuthDirective definition here and this would be the same across all services. And this is something, this is controlled by us. So this is completely centralized. And so whenever we evolve it, we evolve it in the correct way, such that when services pick up the evolution, they've got it immediately and they just need to comply with that directive definition.
Similarly for the post service, so we've also got the auth definition and we've also got the post service usages as well. So we can see we're protecting the author field with the admin role. And we've also got the post field that we're also protecting with the admin role. These are kind of maybe more sensitive fields that we want to protect with a different role such as admin users. So you can see here, these are the ones we're looking to protect.
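The slides with the schemas are not part of this transcript, so the following is only a sketch of what the two service schemas could look like, written as TypeScript template strings. The `@auth` directive shape, the `Role` values, and every type and field other than the protected author and posts fields mentioned in the talk are assumptions.

```typescript
// Central directive definition, shared verbatim by every service (assumed shape).
const authDirectiveDefinition = `
  directive @auth(requires: Role = ADMIN) on OBJECT | FIELD_DEFINITION

  enum Role {
    ADMIN
    USER
  }
`;

// User service: hypothetical fields, protected for any authenticated user role.
const userServiceSchema = `
  ${authDirectiveDefinition}

  extend type Query {
    me: User @auth(requires: USER)
  }

  type User @key(fields: "id") {
    id: ID!
    name: String
  }
`;

// Post service: the author and posts fields are restricted to the admin role.
const postServiceSchema = `
  ${authDirectiveDefinition}

  type Post @key(fields: "pid") {
    pid: ID!
    title: String
    author: User @auth(requires: ADMIN)
  }

  extend type User @key(fields: "id") {
    id: ID! @external
    posts: [Post] @auth(requires: ADMIN)
  }
`;

console.log(userServiceSchema, postServiceSchema);
```

Sharing one definition string across services mirrors the centralized directive described above: a service only chooses where to apply `@auth`, never how it is defined.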
And then finally, we've got the gateway registration. So you can see this is as simple as it gets. So I've redacted all the other setup, which is fairly standard Mercurius and Fastify setup. This is specific to Mercurius Auth. So as mentioned before, we've got several things we need to do. So we've got the auth context, which looks at the request headers and just applies an identity to the context that says, this is the user from the headers. And then we've got the policy handler. So here you'll see it's very similar to a GraphQL resolver. So the final four parameters are just the parent information, the arg information, context and info. So you've got everything you need from that point of view. And then we also provide the auth directive AST to say, this is the configuration applied to this field. So here what we're doing, we're getting the roles using the identity and we're making sure that the auth directive matches up with the roles on the user. And yeah, so we can do whatever we want in this policy and it's just completely up to you.
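The registration slide is not reproduced here, so below is a hedged, dependency-free sketch of the two pieces described: an auth context built from request headers and a policy handler that compares the directive's required role with the user's roles. The local interfaces are minimal stand-ins for the graphql-js directive AST and the Mercurius context; the `x-user` header, the `rolesByIdentity` lookup, and the synchronous handler are assumptions (a real handler would typically be async, since it calls an auth endpoint).

```typescript
// Minimal local stand-ins so the sketch runs without dependencies.
interface DirectiveArgument {
  name: { value: string };
  value: { value: string };
}
interface AuthDirectiveAST {
  arguments: DirectiveArgument[];
}
interface AuthContext {
  identity: string | undefined;
}
interface PolicyContext {
  auth: AuthContext;
}

// Hypothetical role lookup; at Unity this step would call the central auth endpoint.
const rolesByIdentity: Record<string, string[]> = {
  alice: ["USER", "ADMIN"],
  bob: ["USER"],
};

// Builds the auth part of the context from request headers.
// The "x-user" header name is an assumption for this sketch.
function authContext(headers: Record<string, string | undefined>): AuthContext {
  return { identity: headers["x-user"] };
}

// Parameter order as described in the talk: the directive AST first,
// then the resolver-style parent, args, context and info.
function applyPolicy(
  authDirectiveAST: AuthDirectiveAST,
  _parent: unknown,
  _args: unknown,
  context: PolicyContext,
  _info: unknown
): boolean {
  const requires = authDirectiveAST.arguments.find((arg) => arg.name.value === "requires");
  if (!requires || !context.auth.identity) {
    return false; // no configured role or no identity: deny
  }
  const roles = rolesByIdentity[context.auth.identity] ?? [];
  return roles.includes(requires.value.value);
}

// With the mercurius-auth plugin these would be passed as options, roughly:
// app.register(mercuriusAuth, { authContext, applyPolicy, authDirective: "auth" })

// Directive node equivalent to @auth(requires: ADMIN).
const adminOnly: AuthDirectiveAST = {
  arguments: [{ name: { value: "requires" }, value: { value: "ADMIN" } }],
};

const aliceAllowed = applyPolicy(adminOnly, null, {}, { auth: authContext({ "x-user": "alice" }) }, null);
const bobAllowed = applyPolicy(adminOnly, null, {}, { auth: authContext({ "x-user": "bob" }) }, null);
const anonymousAllowed = applyPolicy(adminOnly, null, {}, { auth: authContext({}) }, null);
console.log({ aliceAllowed, bobAllowed, anonymousAllowed });
```

Keeping the role lookup behind the policy handler means the gateway is the only component that talks to the auth mechanism, which is the "closer to auth" benefit discussed earlier.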
7. Auth Policy and Summary
The final four parameters provide all the necessary information for the policy. The auth directive AST is used to configure the field's access. We can customize the policy to match the roles on the user. Examples of different access levels are demonstrated, including full access, partial access, and no access. In summary, the Mercurius Auth plugin provides a scalable approach to authentication. Ongoing improvements include adding support for a filter schema. Thank you for listening to my talk and I hope to see you at the rest of the conference.
We've left the policy completely flexible. So you can just call whatever auth mechanism you want with all the information you need from the parameters.
And here we're just saying, let's look for the auth directive within the gateway. Now here, I thought I'd just give some example requests. So first we've got full access to fields: a request with the user and admin roles, and we can see we've got the data as normal. And then here we've got partial access. So we can see we've got a request with just a user role and we can see we no longer have access to posts and we can see here, it's set to null with the associated error. And then finally, we've got a situation with no access to fields. So this is an unauthenticated request. We can see we've got no data, and we've got both errors telling us what to do.
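The three outcomes can be modelled with a small, dependency-free sketch. This is not how a GraphQL executor works internally; `execute`, the field names, and the error text are hypothetical, and the sketch only mirrors the response shape described: denied fields come back as null alongside an error, while permitted fields resolve normally.

```typescript
interface FieldSpec {
  requiredRole: string;
  resolve: () => unknown;
}
interface ExecutionResult {
  data: Record<string, unknown>;
  errors: string[];
}

// Resolve each requested field independently, so one denied field
// does not prevent the others from returning data.
function execute(fields: Record<string, FieldSpec>, roles: string[]): ExecutionResult {
  const result: ExecutionResult = { data: {}, errors: [] };
  for (const [name, field] of Object.entries(fields)) {
    if (roles.includes(field.requiredRole)) {
      result.data[name] = field.resolve();
    } else {
      result.data[name] = null;
      result.errors.push(`Not authorized to access ${name}`);
    }
  }
  return result;
}

// Assumed example fields: one needs the user role, one needs the admin role.
const fields: Record<string, FieldSpec> = {
  me: { requiredRole: "USER", resolve: () => ({ name: "Alice" }) },
  posts: { requiredRole: "ADMIN", resolve: () => ["first post"] },
};

const fullAccess = execute(fields, ["USER", "ADMIN"]); // user and admin roles
const partialAccess = execute(fields, ["USER"]); // user role only
const noAccess = execute(fields, []); // unauthenticated request
console.log(JSON.stringify({ fullAccess, partialAccess, noAccess }, null, 2));
```

Returning per-field errors rather than rejecting the whole request lets a client render what it is allowed to see and handle the rest explicitly.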
So in summary, we talked about the Mercurius Auth plugin. We talked about how we define the central auth definition and basically, as long as downstream services adhere to this recognised schema, this is a scalable approach to auth. And we really believe this and we've really been seeing the benefits within our system. In terms of improvements, we're adding lots and lots of features all the time to the Mercurius Auth plugin. So one of the things that is going on at the moment is just adding support for a filter schema. So you only see the parts of the schema you have auth to. And that's it. Thank you so much for listening to my talk. It's been really great talking to you and personally, I'm really excited about this work and I just hope you took something out of it. So thank you for listening and I hope to see you around at the rest of the conference.