You've heard of schema-first and code-first GraphQL development. You've seen tools that autogenerate GraphQL schemas from Swagger and SQL. But there's another way to build GraphQL APIs that's flexible, maintainable, and evolvable — just writing GraphQL SDL with directives!
Directive-driven GraphQL Development
AI Generated Video Summary
Lenny Burdett discusses Directive-Driven GraphQL, an approach he has prototyped for building GraphQL APIs. The approach involves declaratively adding, removing, and reshaping parts of the schema, and offers advantages over imperative systems. The prototype integrates GraphQL with gRPC, allowing for easy editing and reshaping of the schema, with a gRPC DSL expressed as directives determining the behavior of the API. The directive-driven approach also supports Apollo Federation, and future work includes support for GraphQL unions and real-time data subscriptions.
1. Introduction to Directive-Driven GraphQL
Hi, my name is Lenny Burdett and I'm a solutions architect at Apollo. Today at GraphQL Galaxy, I'll be discussing Directive-Driven GraphQL, a prototype I've been working on. I'll cover the different approaches to building GraphQL APIs, focusing on the data-driven and directive-driven methods. These approaches offer unique advantages and challenges, and I'll share my insights based on my experience working with large GraphQL platforms.
Hi, my name is Lenny Burdett and I'm a solutions architect at Apollo. I'm excited to be here today at GraphQL Galaxy to talk about a prototype I've been working on, based on a concept I'm calling Directive-Driven GraphQL. I'll start out with a short discussion on the various approaches to building GraphQL APIs and how one might choose between them. Then I'll give a quick demo of my prototype for building GraphQL APIs on top of GRPC APIs. Then I'll go deep on some design choices in the prototype that illustrate the power of the directive-driven approach.
So I propose that there are four distinct approaches to building a GraphQL API. The first two are well-known terms in the GraphQL community: schema first and code first. With schema first, you write your schema in GraphQL files and then you write your resolvers in code. With code first, you write your resolvers and then derive your schema from them, probably using metaprogramming or reflection. There's plenty of literature about choosing between these approaches, so I'm not going to spend any time on that discussion. Instead, I want to spend time on the remaining two approaches, which don't have well-known names as far as I know. If they do, please let me know. For this talk, I'll refer to them as data-driven and directive-driven. With a data-driven approach, you generate both your schema and resolvers from a description of your data source. Some examples of this approach are PostGraphile for Postgres and GraphQL Mesh for a variety of data sources. The directive-driven approach, which I'll define in a minute, is still pretty novel. I don't know of many examples in the GraphQL community yet. I first encountered this approach when a colleague of mine at Square built a really neat directive-driven implementation for Elasticsearch.
I work with a bunch of companies building really large GraphQL platforms. Choosing a development approach is one of the earliest decisions they need to make. Like any engineering decision, it's always a matter of balancing trade-offs, and having a set of values helps guide your decision-making process. Here are some of the values that I hold, based on my personal experiences and on working with customers, that I'll use to evaluate the GraphQL development approaches in this talk. If you have different values, you'll probably make different decisions, and of course that's entirely okay. First, I always want to prioritize good API design and provide a collaborative, agile schema design process. Ideally, my API strikes a balance between solving specific use cases and being generalized enough to support new and unforeseen use cases. This is a challenge, especially at scale with dozens of teams and clients, so I want to avoid anything that gets in the way of good schema design. Second, I need the ability to gradually evolve my API as client and business needs change.
2. Directive-Driven GraphQL
Evolving the API includes being able to add types and fields, and to remove unused parts of the schema to keep things manageable. GraphQL is a great abstraction layer on top of implementation details, and declarative systems are preferable to imperative systems. A directive-driven system involves two different three-letter acronyms working together: SDL and a DSL. Lenny has been working on a prototype of a directive-driven approach for building a GraphQL API on top of gRPC APIs; gRPC is an open source RPC framework that originally came from Google.
This includes being able to add types and fields, and also being able to remove unused parts of my schema to keep things manageable. Third, I don't want to be locked into any particular implementation or data source under the hood. Any successful system is eventually going to need a rewrite, or some kind of migration. You might start with Postgres, but after you add a few million users, you might realize that you'd be better off with a combination of DynamoDB and Elasticsearch.
GraphQL is a great abstraction layer on top of these implementation details. I don't want my choices today to restrict what's possible tomorrow. This one's a big Apollo value and one I hold as well: we believe that declarative systems are preferable to imperative systems. Since you're watching a GraphQL talk, you probably agree that GraphQL's declarative query language is a big improvement over writing imperative code to coordinate a bunch of REST API calls. Declarative systems are usually easier to understand, support great tools for static analysis, and are better for collaboration.
Apollo Federation's declarative composition model is one example of our investment in this idea. And lastly, the quicker and cheaper it is to build a system, the faster we can get to market and the less tech debt will accrue along the way. Unsurprisingly, I believe that a directive-driven approach to GraphQL fits these values the best. So let's dive into that idea and hopefully you'll agree.
A directive-driven system involves two different three-letter acronyms working together. First, you define your API using the schema definition language, or SDL, and you define the behavior of your API using some domain-specific language, or DSL. With GraphQL directives, these can go together in the same file. To run a directive-driven API, you pass both the SDL and DSL to some black box implementation. It uses the SDL to provide the GraphQL API, and uses the DSL to determine how it fetches data from your data sources.
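As an illustrative sketch of what "SDL plus DSL in one file" could look like (the @fetch directive and its arguments here are placeholders I've invented for illustration, not the prototype's actual DSL):

```graphql
# The SDL below defines the API shape; the directive carries the
# DSL that tells the black box runtime how to resolve the field.
type Query {
  # Hypothetical directive: resolve this field by calling a gRPC method
  movie(id: ID!): Movie @fetch(rpc: "Movies.GetMovie")
}

type Movie {
  id: ID!
  title: String!
}
```

The key idea is that both artifacts live in one GraphQL file, so the schema and its resolution behavior can be reviewed and evolved together.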
I've been working on a prototype of a directive-driven approach for building a GraphQL API on top of gRPC APIs. I've definitely got a long way to go, but I'm excited to show off what I have here today. Before I dive into the prototype, I want to briefly touch on gRPC for anyone not familiar with it. gRPC is an open source RPC framework that originally came from Google. I happened to use it for many years when I worked at Square. I think it's really great for east-west communication between services. It's super fast and efficient. However, I always found it lacking when using it in web and mobile clients. It has a lot of the same downsides that REST APIs have. My dissatisfaction with gRPC for end-user clients is what drove me to learn GraphQL in the first place. The wire format for gRPC is called protocol buffers.
3. GraphQL and gRPC Integration
GraphQL uses JSON, while gRPC uses Protobuf. Building a gRPC API is similar to the schema-first approach. My prototype has three components: a boilerplate generator, a validator, and a runtime. The boilerplate generator helps you get started by generating a GraphQL schema with directives that declare resolver behavior. The resulting API is similar to the output of a data-driven approach, but it's just a starting point that can be customized to fit your clients' needs.
GraphQL uses JSON, while gRPC uses Protobuf. Building a gRPC API is similar to the schema-first approach. First you define your messages and methods in the Protobuf interface definition language, and then separately you write your method implementations in code. gRPC supports code generation in many languages, providing the same end-to-end type safety you would get with a GraphQL approach.
My prototype has three components: a boilerplate generator, a validator, and a runtime, which is the black box that I mentioned earlier. The boilerplate generator is there just to help you get started. Given a Protobuf service definition, it generates a GraphQL schema with a bunch of directives that declare the behavior of all the resolvers. The result, though, is not a great GraphQL API. It's actually really similar to the output of a data-driven approach like GraphQL Mesh. It maps one-to-one with the gRPC API, so it's really just an RPC API that happens to be written in GraphQL. It doesn't model a data graph, it doesn't conform to GraphQL idioms and best practices, some types and fields have weird names, it has a bunch of extra types that aren't even necessary, and instead of modeling the relationships between your graph of data, it usually has just a bunch of foreign key fields. But the idea is that this is just a starting point, and you'll rewrite it to make the API that your clients actually need. You don't even have to use it if you'd prefer to start with a blank slate. Either way, all the work is going to go into this one GraphQL file.
4. Editing and Reshaping the Schema
While editing my schema, I realized the difficulty of mapping GraphQL types to gRPC messages, so I created a validator to catch mistakes. I reshape the schema to fit client needs, renaming types and fields, removing indirection, and adding documentation. This approach showcases the power of the declarative programming model. The schema is proven correct through static analysis, and the API can be run with a single command. The implementation is surprisingly simple: a single resolver that runs on every field.
While I was editing this schema using my DSL, I realized pretty quickly that it was difficult to keep a mental map from GraphQL types and fields to gRPC messages and methods. So I wrote a validator that warns me if I mistype a field name or use the wrong output type for a field.
What you're seeing here is me editing and reshaping my generated schema into one that's more idiomatic and better fits the needs of my client apps. I'm able to rename types and fields, remove unnecessary layers of indirection, build relationships between types provided by different RPCs, add documentation, and much more. To me, this is actually the most interesting part of the prototype and something I haven't seen anyone else do. It showcases the power of the declarative programming model.
I'm able to prove the schema is correct just through static analysis. I wish I had a little more time to get into it, but it's basically a depth-first graph traversal algorithm that walks the GraphQL schema and compares each field's arguments and return types with the relevant gRPC methods, messages, and fields. Once you've reshaped the schema to fit your clients' use cases, you can run this API with just one simple command. Here I'm running my movies GraphQL API on top of a gRPC API, and after I execute an operation, you can look at the gRPC server's logs to see which RPC methods were called to fulfill the fields in this operation. It works pretty well, and you might be surprised how simple the implementation is. It's basically just one single resolver that runs on every field, but it knows what to do based on the directives applied to that field. Right now it's about 300 lines of TypeScript.
5. gRPC DSL and Directives
The gRPC DSL determines the behavior of both the runtime and validator components. I ended up with only four directives. One obvious difference between gRPC and GraphQL is that gRPC does not differentiate between queries and mutations.
The gRPC DSL determines the behavior of both the runtime and validator components. It was a really fun challenge figuring out how to express resolver behavior in a set of declarative directives. I ended up with only four directives.
This first one is really just for configuring the gRPC client and includes the ability to do things like forward headers for authentication and things like that. I'll spend the next few minutes showing you what you can do with the other three directives to easily build a great GraphQL API.
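A configuration directive along these lines might be sketched as follows (the directive name, its arguments, and the header name are all hypothetical; the talk only says the first directive configures the gRPC client and can forward headers for authentication):

```graphql
# Hypothetical client configuration: where the gRPC service lives
# and which incoming request headers to forward for authentication
schema @grpc(address: "localhost:50051", forwardHeaders: ["authorization"]) {
  query: Query
}
```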
One obvious difference between gRPC and GraphQL is that gRPC does not differentiate between queries and mutations. A gRPC service is just a flat list of methods. The first design choice I made was that it's up to you to decide which methods are queries and which methods have side effects and should be mutations. All you have to do is add root fields on the query and mutation types and add the fetch directive to declare which RPC method to use as its resolver.
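Given a flat gRPC service, that split might be declared like this (the @fetch directive and its `rpc` argument are my guesses at the prototype's DSL; the method names are made up):

```graphql
type Query {
  # A read-only RPC exposed as a query
  movie(id: ID!): Movie @fetch(rpc: "Movies.GetMovie")
}

type Mutation {
  # An RPC with side effects exposed as a mutation
  rateMovie(id: ID!, score: Int!): Movie @fetch(rpc: "Movies.RateMovie")
}
```

Because the split is declared rather than inferred, you stay in control of which methods are safe to treat as reads.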
6. Request and Response Wrapper Messages
It's standard practice to have unique request and response wrapper messages for each method in a gRPC API. I made a decision to have the input messages map directly to GraphQL field arguments, which removes the need for the request wrapper type. For response types, you have the choice on whether to include it or not. This makes it much easier to construct an idiomatic GraphQL API.
It's standard practice to have unique request and response wrapper messages for each method in a gRPC API, like this get movie request and get movie response. Sometimes you want to include these wrapper types in your GraphQL API, but sometimes you don't. I made a decision to have the input messages map directly to GraphQL field arguments, which removes the need for the request wrapper type. For response types, you have the choice on whether to include it or not. If you want to return just the movie that's embedded in the response message, you can use this argument to dig it out of the response and avoid this extra layer. This makes it much easier to construct an idiomatic GraphQL API.
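Using the GetMovie example, the unwrapping might be expressed like this (the `dig` argument name is hypothetical; the talk only describes an argument that digs the movie out of the response message):

```graphql
type Query {
  # GetMovieRequest's fields map directly to the field's arguments,
  # so no request wrapper type is needed in the schema.
  # The hypothetical `dig` argument unwraps GetMovieResponse.movie
  # so the field returns Movie instead of a wrapper type.
  movie(id: ID!): Movie @fetch(rpc: "Movies.GetMovie", dig: "movie")
}
```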
7. Fixing Client-Facing API
GraphQL allows us to fix the client-facing API without changing the underlying API. By deprecating and renaming fields in the GraphQL schema, we can improve the API without touching the underlying implementation.
When making changes to RPC-style APIs, it's common to version the methods and messages with namespaces or naming conventions. GraphQL doesn't have namespaces or a versioning mechanism, and I didn't want to add one to my DSL. So instead, I made the choice not to enforce any agreement between gRPC message names and GraphQL type names. The validator only enforces that field names, argument names, return types, and argument types match. This allows us to keep the GraphQL API consistent as the underlying API changes, which is an important feature for avoiding lock-in and reducing churn in your client applications.
I also wanted the ability to fix the client-facing API without having to change the underlying API. GraphQL is a great abstraction layer for making these kinds of fixes, and the deprecation workflow is a good example. Imagine we have a date field in our gRPC API, but we realize the name is too vague for our client applications. Instead, we want to call it premieredOn, which better expresses the intent of the field. We can fix this just in the GraphQL schema by deprecating the date field using the built-in @deprecated directive and using the rename directive to add a resolver for the new field that actually uses the same gRPC field under the hood. And now we have declaratively improved our API without having to touch the underlying API.
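The deprecation workflow just described might look something like this in the schema (the @rename directive's argument name is a guess; @deprecated is GraphQL's built-in directive):

```graphql
type Movie {
  # Old field, kept working for existing clients
  date: String @deprecated(reason: "Use premieredOn instead.")
  # New field, resolved from the same underlying gRPC `date` field
  premieredOn: String @rename(from: "date")
}
```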
8. Modeling Graphs in GraphQL and GRPC
One of the biggest differences between gRPC and GraphQL is that GraphQL models a graph of data while gRPC is basically just functions on a remote server. Instead of an ID field for the movie's director, I can add a field that returns a complex type to better represent the relationship. We're not restricted to adding a fetch directive only to the root query and mutation fields. The most common strategy to deal with the resulting N+1 query problem is the data loader pattern, which caches and batches requests to make fewer network calls.
One of the biggest differences between gRPC and GraphQL is that GraphQL models a graph of data while gRPC is basically just functions on a remote server. You can't quite model a graph using protocol buffers, mainly because gRPC lacks the ability to subselect fields. If you have a recursive graph or bidirectional relationships, you won't be able to model that effectively.
Instead, it's common for gRPC APIs to include foreign keys in responses, requiring that clients resolve the relationship themselves with another network request. If I have an RPC method for fetching the related type, I can declaratively model this relationship in my GraphQL API. Instead of an ID field for the movie's director, I can add a field that returns a complex type to better represent the relationship, and I can add a fetch directive to declare that this field's resolver calls this RPC method. We're not restricted to adding a fetch directive only to the root query and mutation fields; we can add them to any field in any layer of our GraphQL schema. These directive arguments declare how the director ID field on the parent object maps to the ID field in the request message of the RPC.
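A sketch of that relationship field (the `input` mapping syntax and all names here are invented for illustration; the talk only says directive arguments map the parent's foreign key to the RPC request):

```graphql
type Movie {
  id: ID!
  title: String!
  # Replaces a raw directorId foreign key with a real relationship.
  # The hypothetical `input` argument declares how the parent's
  # director_id maps to the id field of the GetPerson request message.
  director: Person @fetch(rpc: "People.GetPerson", input: "id: $source.director_id")
}

type Person {
  id: ID!
  name: String!
}
```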
Of course, you only have to dip your toe into GraphQL to run into the N+1 query problem. In this operation, if my GraphQL API returns ten movies, then I'll end up making ten separate calls to fetch the director relationship for each movie, which is not ideal. The most common strategy to deal with this is the data loader pattern, which caches and batches requests to make fewer network calls. So I added an optional data loader argument that allows you to configure a data loader in this resolver. You can specify cache keys, how they're used to create the batch request, and then how to match the results back to each cache key. This does require that your gRPC API supports a batch-get method, but it probably should already, because if you're not doing this in your GraphQL API, you're probably forcing your client applications to handle this behavior instead.
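The data loader configuration described above might be sketched like this (the `dataloader` argument's shape and all field names are hypothetical; the talk only describes configuring cache keys, batch request construction, and result matching):

```graphql
type Movie {
  director: Person
    @fetch(
      rpc: "People.BatchGetPeople"
      # Hypothetical config: take director_id from each parent movie
      # as the cache key, collect the keys into the batch request's
      # ids field, and match each Person in the response back to its
      # cache key via the Person's id field.
      dataloader: { key: "$source.director_id", listArgument: "ids", responseKey: "id" }
    )
}
```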
9. Support for Apollo Federation and Future Work
The final feature of my DSL is support for Apollo Federation, which allows organizations to build a unified graph distributed across different teams. This approach enables teams to efficiently expose a GraphQL API on top of preexisting APIs, compose their APIs together, and reduce type coupling across domains and teams. Additionally, the directive-driven approach provides flexibility to combine different behaviors by adding different directives in the same file. There is still more work to be done, including support for GraphQL unions, translation between gRPC and GraphQL errors, and utilizing gRPC streaming RPCs for real-time data subscriptions. I'm open to feedback and suggestions for use cases and features that I haven't addressed in the DSL so far.
The final feature of my DSL I want to talk about is support for Apollo Federation. I see this directive driven approach working well for organizations that adopt Apollo Federation to build a unified graph distributed across many different teams. Each team that owns a single complex domain, whether that's payments, inventory, ratings and reviews, can use this approach to efficiently expose a GraphQL API on top of any preexisting APIs, just by writing a schema file. And then, they can compose their APIs together to build a single companywide API using Federation.
The two semantics I wanted to support are the ability to return references to entities and the ability to expose entities as entry points into the graph, which allows joining data across subgraphs. In this example, instead of resolving the relationship for the person type with another fetch directive that makes a call to an API owned by a different team (part of a different organization, maybe in a different time zone), I can wrap the director foreign key in a really small type that does model the relationship, but using only data that I know about in this particular domain. If I add the Federation directives and identify this type as a keyed entity, Apollo Federation can use this data to fetch additional data for this type from other subgraphs. This is a useful practice in a distributed organization because it reduces type coupling across domains and teams, and Apollo Federation handles the data composition at the API layer instead of requiring that my team build a bunch of synchronous requests to another team's services, which would tie my services' uptime to theirs. And lastly, I decided to allow adding the fetch directive directly to types, not just to fields. This is equivalent to the reference resolver hook in Apollo Federation. By providing a keyed entity this way, Apollo Federation can join data from this type with references and other data from other subgraphs.
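Put together, those Federation pieces might be sketched like this (@key is Apollo Federation's real entity directive; the @fetch directive and its arguments are hypothetical stand-ins for the prototype's DSL):

```graphql
type Movie {
  # A small type that models the relationship using only the
  # director foreign key this domain already knows about
  director: Person
}

# @key marks Person as a Federation entity, so other subgraphs can
# contribute fields to it. Putting the fetch directive on the type
# itself acts like Federation's reference resolver for incoming
# entity references.
type Person @key(fields: "id") @fetch(rpc: "People.GetPerson", dig: "person") {
  id: ID!
}
```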
In addition to supporting federation, I think what this points out is another benefit of the directive-driven approach, the ability to combine different behaviors just by adding different directives in the same file. Apollo has a built-in cache control directive that we could easily add here. We're also working on directives for operation cost, authorization, and many other behaviors that would all play nicely in the system. There's definitely a lot more to do in my prototype before it's ready for production. Some of the things on my to-do list include support for GraphQL unions, where Protobuf has this oneof keyword and it's not quite the same, but I'm pretty sure there's a way we can translate between them with the DSL. gRPC errors and GraphQL errors actually have more in common than you might think, but I haven't spent much time trying to figure out how to translate between them either. I think there's some really powerful ideas in this approach, including the ability to use gRPC streaming RPCs to power subscriptions for real-time data, but there's a lot of work that I have to do to figure out how to do that. And I'm sure I've missed a lot. I would love to know what use cases and features that you would want that I haven't addressed in this DSL so far.
To wrap things up, I want to revisit the values that I talked about earlier and score each of the approaches. The two traditional approaches, schema first and code first, are great. They're not going away anytime soon. They allow for a lot of flexibility in schema design and the ability to evolve your API and tie it to any data source under the hood. But using my rubric, they fail because they're really expensive. Writing all those resolvers is a ton of upfront investment and results in a lot of code that you have to maintain for a long time. Anecdotally, most companies I work with choose schema first.
10. Collaboration and Future Work
Collaborating on schema in GraphQL files is easier. The data-driven approach lacks control over schema design and API evolution. The directive-driven approach is compelling but has few implementations. We could build DSLs to support various data sources and simplify building GraphQL APIs. Check out the prototype on GitHub and join Apollo if you're interested in declarative GraphQL systems.
They find it easier to collaborate on schema in GraphQL files. Losing the ability to design and collaborate on schema is one of the reasons why I'm hesitant to recommend the data-driven approach. If you're generating your GraphQL schema from another data source, it's difficult or maybe impossible to design your schema for client needs and evolve your schema and data source separately.
I don't mean to denigrate the engineering work involved in the data-driven solutions. Some of the implementations are very cool. But as a long-term strategy, I have my doubts. Some of Apollo's customers have tried data-driven approaches, but ended up tearing them out when they see how little control they have over schema design and API evolution.
So, it's no surprise that in my very biased table here, the directive-driven approach checks all my boxes. But hopefully, I've successfully made the argument that it's a pretty compelling approach. The obvious downside is that there's not many directive-driven implementations that exist yet. We could build DSLs that support REST, SQL, SOAP, Thrift, arbitrary serverless functions, and maybe many other data sources. We could even put them all in the same black box implementation, which would make it really easy to build GraphQL APIs on top of the variety of systems that our team or company already uses.
If directive-driven GraphQL development interests you, please check out my prototype here on GitHub. And if you'd like to work on building out the future of declarative GraphQL systems, Apollo is hiring.