Apps are hard enough to build without having to worry about layers and layers that sit between your users and the database. In this talk we examine trends in serverless computing, and their impact on modern databases and API layers.
The Diminishing API Layer
AI Generated Video Summary
Hi, everyone. My name is Tejas and I work on Slash GraphQL at Dgraph Labs. Today, I want to talk about the diminishing API layer and how we as a community moved all that logic out of databases and then one day suddenly decided to move it all back in. 2020 has been a difficult year for everybody, but one good thing that's happened to me is that my partner and I have just had a baby girl. Reading The Very Hungry Caterpillar to her gave me a lot of free time to think about philosophical questions about data and its relationship to logic and code.
Hi, everyone. My name is Tejas and I work on Slash GraphQL at Dgraph Labs. You can find me on Twitter at @tdinkr.
Today, I want to talk a little bit about something that's been interesting to me for a while, and I'm calling this talk "The Diminishing API Layer". However, I also had an alternate title for this talk, and that was "There and Back Again", or how we as a community moved all that logic out of databases and then one day suddenly decided to move it all back in.
So 2020 has been a pretty difficult year for everybody. There's been a global pandemic and things have been very hard. But one good thing that's happened to me is that my partner and I have just had a baby girl. And so as I read The Very Hungry Caterpillar for probably the 4000th time this year, it gave me a lot of free time to get inside my head and think about philosophical questions that really may not have any actual practical implications. And one thing I've been thinking a lot about these days is: what is data? What do we actually mean when we talk about data, and how is it different from logic or actual code?
2. Code and Data Coupling
Clojure has the concept of homoiconicity, which allows code to be expressed as data. Domain-specific languages have also blurred the line between data and code. This raises the question of whether code should be stored in databases. Early examples of tight coupling between code and data, like Oracle Forms, led to difficulties in testing, deploying, and managing systems. This led to the era of boilerplate, where APIs were decoupled from databases. Unit tests would mock databases, and code would be developed against interfaces like IRepository.
Clojure has this concept of homoiconicity, right? Homoiconicity is the property of a programming language that lets you express your code as data. So in Clojure, you can actually write a macro, which accepts actual code, and you can process it with the same sort of data structures that you would process your regular data with. And even outside of programming languages like Clojure, domain-specific languages have become very popular over the last, I don't know, 10 or 15 years.
Domain-specific languages are simply purpose-built languages where you have various rules, which are code, written in some format, and then they're interpreted by whoever's parsing that domain-specific language. So once again, we see here as well that the distinction between data and code blurs significantly. This leads me to the next question: if code is data, then is there a difference between your code data and your data data? And if not, should your code actually be in your database?
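To make the "code is data" idea concrete, here is a minimal sketch in Python rather than Clojure. The rule format (nested lists), the `evaluate` interpreter, and the `rename` helper are all hypothetical names invented for illustration; the point is only that a rule stored as ordinary data can be both executed and rewritten with ordinary data-structure tools.

```python
import operator

# A tiny rule "language": each expression is plain data (nested lists),
# so ordinary list operations can inspect or rewrite it before it runs.
OPS = {"+": operator.add, "*": operator.mul, ">": operator.gt}


def evaluate(expr, env):
    """Interpret an expression tree stored as ordinary Python data."""
    if isinstance(expr, str):       # a variable name, looked up in env
        return env[expr]
    if not isinstance(expr, list):  # a literal number
        return expr
    op, *args = expr
    left, right = (evaluate(arg, env) for arg in args)
    return OPS[op](left, right)


def rename(expr, old, new):
    """A macro-like rewrite: transform the code with normal data tools."""
    if isinstance(expr, list):
        return [rename(part, old, new) for part in expr]
    return new if expr == old else expr


rule = [">", ["*", "price", "quantity"], 100]  # code, stored as data
renamed_rule = rename(rule, "quantity", "qty")
```

Because `rule` is just a list, it could equally be serialized and stored in a database row, which is exactly the question the talk goes on to ask.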
The first time I ever came across something like this, where you're very closely coupling your code and your data, was something I worked on very early in my career, in the late 2000s: Oracle Forms. Oracle Forms is a great example of a very tight coupling between your data and your code. It's not so popular today for various reasons. But back then, Oracle would give you the database. It would give you the language that you would build these tools with. Your inputs would typically be drag-and-drop pages that you'd built. Then you'd operate on them with PL/SQL, shove the result into your database, and finally query it out with views and SQL.
So in essence, your code and your data were together one single unit that would be deployed to various places. And Oracle Forms wasn't the only one doing this by any means. There were many others; FoxPro, for example, comes to mind. And even in systems that had a separate code layer, you would still very often rely on very database-specific features, like triggers and procedures. And this did work. You would have very large systems built out with these tools, and they were reasonably easy when you were first building them. But people realized pretty quickly that this was very tough to test. It was hard to deploy, because you'd often just be rewriting a stored procedure in your database. It was tough to manage. It was impossible to version. There was just so much stuff that needed to happen. And I feel this sort of tight coupling led to everyone going in the exact opposite direction for a long time. We entered what I'm calling the era of boilerplate. Right?
So in the era of boilerplate, you'd have a bunch of APIs, and, without calling out any specific language, maybe that would be like a Data Bean factory factory implementation. These APIs would be very decoupled from your actual database. In fact, you would go so far as to be very proud of the fact that your unit tests had your database and stuff mocked. Instead of writing against an actual database, your code would be developed and deployed against the IRepository interface, which you expect the repository to implement.
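The style described here, coding against a repository interface and mocking the database in unit tests, can be sketched as follows. This is an illustration under stated assumptions: the method names on `IRepository`, the `InMemoryRepository` test double, and `register_user` are hypothetical and do not come from any particular framework.

```python
from typing import Optional, Protocol


class IRepository(Protocol):
    """The interface application code is written against (hypothetical shape)."""

    def save(self, key: str, value: dict) -> None: ...

    def find(self, key: str) -> Optional[dict]: ...


class InMemoryRepository:
    """Test double: satisfies IRepository without touching a real database."""

    def __init__(self) -> None:
        self._rows = {}

    def save(self, key: str, value: dict) -> None:
        self._rows[key] = value

    def find(self, key: str) -> Optional[dict]:
        return self._rows.get(key)


def register_user(repo: IRepository, email: str) -> dict:
    """Application logic; it never learns which database sits behind repo."""
    if repo.find(email) is not None:
        raise ValueError("user already exists")
    user = {"email": email}
    repo.save(email, user)
    return user
```

A unit test passes an `InMemoryRepository` to `register_user`, so the real database-backed implementation of the interface is never exercised by that test.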
3. Code and API Complexity
Instead of writing against an actual database, your code would be developed and deployed against the IRepository interface. Your APIs would have their own endpoints, but there was no convention to determine the API actions or inputs. Dealing with new APIs required understanding their intricacies. Some stored procedures remained, leading to a tangled mess of APIs, layers, and indirection. Managing changes became difficult, especially with microservices.
Instead of writing against an actual database, your code would be developed and deployed against the IRepository interface, which you expect the repository to implement. And that implementation never even gets tested in the case of unit tests. Your APIs would be very, you know, SOAP-y in the beginning, which meant that every single action that you could perform in the system had its own endpoint. But there was no real convention that let you figure out: okay, fine, this is the action I want to perform, so this is what the API probably looks like. There was no way to figure out either where the API was or what the API inputs would look like.
Around that time, of course, things like WSDL became very popular. But it still was not necessarily always meant for human consumption. If you were dealing with a new API, you would need to sit down and get into the nitty-gritty of how that API worked. You did get rid of a lot of those stored procedures, because now the database was considered a little bit untrustworthy even, and your application code was the blessed thing that was working. But, you know, a few of those really ugly stored procedures that you just couldn't get rid of, yeah, they still stuck around. This actually did work for a while, but I think at some point it became a real tangled mess. You'd have so many APIs and so many layers and layers of indirection between, in essence, your API and your database, and so many systems in between, that things became very confusing and very difficult to manage any time you wanted to make a change. And this problem just got a lot worse once you started looking at things like microservices.
4. Rails and REST
When I first came across Ruby on Rails and REST, I was impressed by how the code could generate the database and vice versa. Rails made it easy to believe that the code and database are a single unit, with tight coupling and database integration. However, debugging Rails can be difficult due to the numerous layers of magic. Despite this, Rails popularized REST and made it easy to reason about APIs, although the predictability of the API itself was not guaranteed.
Yeah, so every single API would have its own format, and as you added more APIs, it was more and more difficult to debug what was actually going on. The next thing I saw that was really, really impressive was when I first came across Ruby on Rails and REST in, I think it was around 2012. It was really impressive, because for the first time ever, I could see a system in which your code generates your database, and then your database generates your code back, and your API.
What I mean by that is, with Rails you would start by defining a migration, and you say, you know, I'm creating this table and these are the fields that are in it. Based on that, your database would be brought completely up to date, and it would even check and say, okay, this field was missing the last time your migrations ran, and it'll create columns and stuff like that. But then, and this is where the magic of Rails really came in, when your Rails server actually started, it would read back from your database, and it would generate all the fields, all the methods to get those fields, all your setters and getters, based on the actual fields in your database. And your API would also respond based on those fields. And I think, once again, Rails kind of led you to believe that your code and your database are a single unit, right?
Your tests were expected to cover the database. Rails famously, in unit tests, actually loads up data from the database, and you make assertions against that, as opposed to all this mocking and stuff like that. The conventional wisdom there is that you want to mock stuff that's outside of your service, but your database is part of your service. And I think this tight coupling was really fantastic. For someone who was new, Rails was magic. And I think Rails is magic, and I think that's both the best and the worst part about it. With a simple scaffold, you get a working API, you get a working server, and you get a working database. And this relies on a lot of metaprogramming magic from Ruby. Ask any Ruby programmer and they'll tell you that Rails is fantastic when things are working. There's no faster way to get anything done. But when things aren't working, you have to go through so many layers of magic in order to debug what's actually going on. So debugging Rails tends to be a little bit difficult. There are just so many of these magic layers that Rails adds, and with every layer that you add, you're adding that much more complexity between your API and your database.
But at least one thing that Rails did make very, very popular was REST, and it made these APIs at least very easy to reason about. So in REST, or at least the Rails flavor of REST, every single API presented itself as CRUD: a create, read, update, or delete operation on some resource. And quite often, or almost always, well, no, that's not true. Quite often these resources mapped one-to-one to database entities. So as a result, you as a programmer could just make a few changes and you'd have an API as well as a database, both of which are in sync with each other, tied together through some magic Rails glue. And I believe REST made the question of where an API endpoint was more predictable, but it didn't necessarily make the what of the API more predictable or easier to use. What I mean by that is that if you wanted to fetch a particular resource, you could pretty easily guess the endpoint where that API would be.
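The round trip described above (code defines the table, then the running application reads the schema back and generates accessors) can be imitated in a few lines. This is a Python analogy using the standard `sqlite3` module, not how Rails itself is implemented; `migrate`, `model_from_table`, and the `products` table are assumptions made for the example.

```python
import sqlite3


def migrate(conn):
    """'Migration': the code declares the table and its fields."""
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
    )


def model_from_table(conn, table):
    """Read the schema back and build a class with one property per column."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

    def make_property(column):
        return property(
            lambda self: self._values.get(column),
            lambda self, value: self._values.__setitem__(column, value),
        )

    def init(self, **values):
        self._values = dict(values)

    attrs = {"__init__": init, "columns": columns}
    attrs.update({column: make_property(column) for column in columns})
    return type(table.capitalize(), (), attrs)


conn = sqlite3.connect(":memory:")
migrate(conn)
Product = model_from_table(conn, "products")
book = Product(name="Caterpillar", price=7.5)
book.price = 6.0  # setter generated from the database column
```

Adding a column in the migration is enough for the generated class to gain a matching getter and setter, which is the "single unit" feeling the talk attributes to Rails.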
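The predictability of the "where" can be shown with a small sketch that derives the conventional routes from a resource name. The action names follow the usual Rails resource conventions; the `rest_routes` helper itself is hypothetical.

```python
def rest_routes(resource):
    """Return the conventional CRUD routes for a plural resource name."""
    collection, member = f"/{resource}", f"/{resource}/:id"
    return [
        ("GET", collection, "index"),    # read: list the resources
        ("POST", collection, "create"),  # create a new resource
        ("GET", member, "show"),         # read: fetch one resource
        ("PATCH", member, "update"),     # update one resource
        ("DELETE", member, "destroy"),   # delete one resource
    ]
```

Calling `rest_routes("products")` tells you where every endpoint lives, but nothing in it says which fields a response contains, which is the gap discussed next.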
5. Code and Boilerplate
Even with all the magic from Rails, about 70% of the code ends up being boilerplate. Writing and managing code for typical scenarios like creating records and handling nested entities should be automated.
And in fact, that resource would even link you to other connected resources, but you wouldn't necessarily know what fields to expect, and you couldn't easily control what data you want to fetch, which nested entities you want to fetch, or pass limits. As a result, even with all this magic that you have from Rails, I would say maybe about 70% of all your code ended up being boilerplate just tying all this together. You have very typical situations, like: I have to write this one record and I want to write three sub-records under it. This is not something that's very complex, and it's not the kind of code that anyone wants to spend their day writing. But a good chunk of your code would be handling these sorts of scenarios, where it's just, okay, write this and all the nested entities, or remove these six entities and replace them with these three entities, all things which ideally should be automated for you. There's no reason that these should be things that you're writing in actual code.
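As an illustration of the kind of boilerplate meant here, the following sketch writes one record and its nested sub-records by hand. The tables, columns, and the `create_product_with_reviews` function are assumptions chosen for the example, using Python's standard `sqlite3` module.

```python
import sqlite3


def create_product_with_reviews(conn, name, reviews):
    """Insert one product plus its nested reviews in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute("INSERT INTO products (name) VALUES (?)", (name,))
        product_id = cursor.lastrowid
        conn.executemany(
            "INSERT INTO reviews (product_id, rating, comment) VALUES (?, ?, ?)",
            [(product_id, r["rating"], r["comment"]) for r in reviews],
        )
    return product_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE reviews ("
    "id INTEGER PRIMARY KEY, product_id INTEGER, rating INTEGER, comment TEXT)"
)
```

Nothing here is difficult, but a function like this tends to be rewritten for every parent and child pair in an application, which is why the talk argues it should be generated.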
6. Backend, Boilerplate, and GraphQL
Programmers hate boilerplate. The rise of backend as a service involves a single system that combines data and logic. Firebase, Hasura, and Slash GraphQL aim to generate boilerplate code automatically, allowing developers to focus on building rich apps and iterating quickly. GraphQL fits nicely as it acts as a glue between data and APIs, answering the question of what is needed to generate a functional API. GraphQL's tight coupling with the schema allows for expressing relationships between types. For example, in Slash GraphQL, CRUD operations are generated for types like products, customers, and reviews.
And as we all know, programmers really hate boilerplate. I believe this led in a very significant way to the rise of what is now called backend as a service, right. So what is a backend as a service? It's basically a single system which contains both your data and your logic or code, whatever you would like to call it, typically deployed in the cloud, which handles the scaling, maintenance, and operations of it. And basically it provides a no-code or low-code experience to you, the developer.
I believe Firebase, Hasura, and Slash GraphQL, the latter of which is what I work on, definitely fall into this category. We're trying to make it very easy for people to build out code. The first step of this is: how can we generate your boilerplate code for you? That's 70% of your code. How do you just generate that automatically and not let developers worry about it? And how can you let developers instead focus on what they want to do, which is building out their super rich app, and allow them to iterate quickly on that, right? So adding new types or adding new concepts into your backend should not be something that takes hours and requires downtime. It should be deployed in a matter of seconds. You want to be in a position where you can iterate on a small change very quickly and get your features out to your users. And we believe a good backend as a service will actually do all these things for you. It will be very tightly integrated between your code and your data store, and we believe this results in much more elegant code.
So I think, as a result, GraphQL actually fits very nicely here. GraphQL was traditionally developed as a higher-level API language. It was primarily used for browsers to fetch data from your backend, but I think it's quickly becoming more and more popular amongst the database crowd as the glue that sits between your data and your API. I think in many ways, FaunaDB, Hasura, Slash GraphQL, we're all trying to answer a very similar question: what is the minimum that you would actually need in order to generate a fully functional API for users? And I think the reason GraphQL fits in so well here is that GraphQL comes tightly coupled with a schema, which adds types. And of course, these are not just simple types, where you say, okay, I have these entities and each one of them has these fields under it; the schema also allows you to beautifully express the relationships between these types.
So let me actually take an example of where we generate various CRUD operations for you from your types. This example comes from Dgraph, or Slash GraphQL, just because it's the one I'm most familiar with. Over here I have three simple types: product, customer, and review. Each of these has a few simple fields. For example, review has ratings, which are numbers, and comments, which are strings. But they also have relationships with each other. A product has many reviews and a customer has made many reviews. And so if you actually input this into Slash GraphQL, you'll get at least the following queries. There were actually too many, so I couldn't even list them all on one page. Let's just look at the product, then. You have two queries, one to query products and one to get a product by ID. And you have all your update, add, and delete operations on the product as well.
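The slide itself is not part of the transcript, so the following is only a sketch of the idea: a schema with three related types, and the per-type operations the talk lists (a query, a get-by-ID, and add, update, and delete mutations). The field names in `SCHEMA` and the operation-naming pattern in `crud_operations` are assumptions for illustration, not a statement of what Slash GraphQL actually emits.

```python
import re

# Hypothetical schema in GraphQL SDL; field names are assumed for illustration.
SCHEMA = """
type Product {
  id: ID!
  name: String!
  reviews: [Review]
}

type Customer {
  username: String!
  reviews: [Review]
}

type Review {
  id: ID!
  rating: Int
  comment: String
  about: Product
  by: Customer
}
"""


def type_names(schema):
    """Find every object type declared at the start of a line."""
    return re.findall(r"^type\s+(\w+)", schema, flags=re.MULTILINE)


def crud_operations(type_name):
    """List the operations a generator could derive for one type."""
    return {
        "queries": [f"get{type_name}", f"query{type_name}"],
        "mutations": [
            f"add{type_name}",
            f"update{type_name}",
            f"delete{type_name}",
        ],
    }


api = {name: crud_operations(name) for name in type_names(SCHEMA)}
```

Three type declarations yield fifteen operations in this sketch, which matches the talk's remark that the generated list did not fit on one slide.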
7. Directives in GraphQL
Directives in GraphQL have become one of the most powerful tools, allowing for preprocessing and post-processing of requests and responses. They can intercept requests before processing and even short-circuit the processing if necessary. Two innovative uses of directives include the @auth directive, which provides authorization on records, and the @length directive, which allows for validation of field values. Another example is the @lambda directive, which is used for complex code that is difficult to express purely in terms of GraphQL.
And a good system won't just generate these as top-level operations. You could create a product and create the reviews under that product in a single query through a nested update. And I believe most good backend-as-a-service systems will actually provide this for you. I think this really covers that top 70% of the code that you would spend writing boilerplate. But this begs the question: what about that last remaining 30%, right? Does GraphQL provide anything for this, anything for backends to leverage?
So for me, the answer is actually yes. I don't know if this was the original intention when this was introduced into GraphQL, but I think directives have very quickly become one of the most powerful tools of GraphQL. Directives, for those who aren't aware, are annotations that you can put in various places in GraphQL. There are directives that you can put on your schema. There are directives that you can put into queries. And at a very high level, how these directives work is that they're a decorator pattern around the actual operation that happens in GraphQL. So what is a decorator pattern? Basically, the directive intercepts the request before it's processed, and it can do some preprocessing, and it can intercept the response on the way back and do some post-processing of the response. In fact, if the directive deems it necessary, you don't even need to do the actual processing. You can short-circuit it directly at the directive. So over here, I wanted to walk through two or three innovative uses of directives that I found in the wild. We'll just take a look and see how they are used and what you can extend them for.
The first comes from Dgraph and Slash GraphQL. It's a directive called @auth, right? The @auth directive is a way to provide authorization on your records. In this example, I have a to-do. And this to-do says that if you're trying to query the to-do list, which is on line four, then the user who is trying to query it must match the to-do's owner. This simple rule is applied as a filter before any queries fetching to-dos are made. As a result, users are only able to query the to-do items for which they are the owner. In Dgraph, this is actually implemented as a pre-filter. So in a sense, if you don't have access, the to-do is filtered out even before it can be read.
Let's take another example, this one from Apollo: the @length directive. Over here, I have a very simple to-do type. It just has an ID and a title, but the title is marked as saying the length of this title should have a maximum of 42, presumably characters in this case. You can see how this would work: you could potentially have multiple different fields and different kinds of validation on them, like maybe the length of something, and something else should be an email, and maybe your password and password confirmation have to match and be the exact same bytes. And you could very easily write a validator that walks through all of these validation rules and makes sure that they all pass before any data is entered into your data store. And the last example I wanted to show is from Dgraph and Slash GraphQL again. What we found is that some code is just so complex that it's difficult to express purely in terms of GraphQL, so we've added the @lambda directive.
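The decorator pattern described above, with a pre-processing hook, a post-processing hook, and the option to short-circuit, can be sketched with an ordinary Python decorator. This is an analogy, not a GraphQL server implementation; `directive`, `reject_empty`, `add_length`, and `echo` are hypothetical names.

```python
from functools import wraps


def directive(before=None, after=None):
    """Wrap a resolver with optional pre- and post-processing hooks."""

    def decorate(resolver):
        @wraps(resolver)
        def wrapped(request):
            if before is not None:
                request, short_circuit = before(request)
                if short_circuit is not None:
                    return short_circuit  # skip the resolver entirely
            response = resolver(request)
            return after(response) if after is not None else response

        return wrapped

    return decorate


def reject_empty(request):
    """Pre-processing: trim the input, short-circuit when nothing is left."""
    text = request.strip()
    return text, ({"error": "empty request"} if not text else None)


def add_length(response):
    """Post-processing: annotate the response on the way back."""
    return {**response, "length": len(response["echo"])}


@directive(before=reject_empty, after=add_length)
def echo(request):
    """The 'actual operation' that the directive wraps."""
    return {"echo": request}
```

Here `echo("  hi ")` passes through both hooks, while `echo("   ")` never reaches the resolver because the pre-processing hook answers on its behalf.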
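The pre-filter behaviour can be sketched outside of GraphQL as follows: the ownership rule runs first, and the caller's own filter only ever sees records that survived it. The in-memory `TODOS` list and the function names are assumptions for illustration and say nothing about how Dgraph implements the rule internally.

```python
TODOS = [
    {"id": 1, "owner": "alice", "text": "feed the baby"},
    {"id": 2, "owner": "bob", "text": "write slides"},
    {"id": 3, "owner": "alice", "text": "read the caterpillar book"},
]


def owner_rule(user):
    """Authorization rule: the querying user must be the to-do's owner."""
    return lambda todo: todo["owner"] == user


def query_todos(user, predicate=lambda todo: True):
    """Apply the auth rule first, then the caller's own filter."""
    visible = filter(owner_rule(user), TODOS)  # pre-filter on ownership
    return [todo for todo in visible if predicate(todo)]
```

Because the rule is applied before the user's predicate, a query such as `query_todos("bob", lambda t: "baby" in t["text"])` returns nothing rather than leaking another user's record.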
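The validator described for the length example, one that walks a set of per-field rules before anything is written, might look like the sketch below. The rule table format, the rule names (`max_length`, `pattern`, `matches`), and the simplified email pattern are all assumptions made for this example.

```python
import re

# Hypothetical rule table: field name -> list of (rule kind, argument).
RULES = {
    "title": [("max_length", 42)],
    "email": [("pattern", r"[^@\s]+@[^@\s]+\.[^@\s]+")],
    "password_confirmation": [("matches", "password")],
}


def validate(record, rules=RULES):
    """Walk every rule and collect the failures; an empty list means valid."""
    errors = []
    for field, checks in rules.items():
        value = record.get(field, "")
        for kind, arg in checks:
            if kind == "max_length" and len(value) > arg:
                errors.append(f"{field}: longer than {arg} characters")
            elif kind == "pattern" and not re.fullmatch(arg, value):
                errors.append(f"{field}: invalid format")
            elif kind == "matches" and value != record.get(arg, ""):
                errors.append(f"{field}: does not match {arg}")
    return errors
```

A caller would run `validate(record)` and only write to the data store when the returned list is empty.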
8. GraphQL and the @lambda directive
Yeah. And so in conclusion, I feel like in many ways we've come full circle. While we started out with your database and your code very tightly coupled, we moved through an era where we were like, no, let's separate this as much as we can. And then once again, we started viewing our backend as a single unit of both your data store and your logic. In many cases, in fact, you even talk about these great pairings together: you have Firebase and Cloud Functions, you have AWS Lambda and DynamoDB, or Slash GraphQL and Dgraph lambdas. You'll continuously have these pairings where you start thinking about your data and your logic as a single unit. And I think databases and database adapters, really backends, are starting to embrace this. And we're moving away from, or well, many people are moving away from serverless code, where serverless code was just, hey, you've written a function, don't worry about the server, we'll scale that, to a truly no-code or minimal-code experience with a backend as a service. And a lot of the trends that we've discussed here are some of the reasons that we came up with Slash GraphQL at Dgraph, just as a side note. And at least I believe that GraphQL is really perfect for this. Though it started as an API language, we're quickly realizing it can be adapted to more use cases. Its type system lets you easily express what your data store looks like, and you can easily generate APIs from that based on the schema. And for that, it becomes a great fit for at least getting us 80 or 90% of the way. And with that, I would like to end my talk. Thank you all for attending. I'm happy to take any questions.
Hey, good to see you. Hey, thanks man. We got some feedback that they heard some snoring. Can you tell us what that was all about? Yeah, I'm so sorry, my daughter was asleep nearby and I didn't notice that it was actually audible. She's become a bit quieter now. Okay, good, good. Well, a baby is asleep, that's fine by us, right? That's the reality of working at home. I actually... Yeah, I was just saying in Discord, I hope that's not really reflective of the quality of my talk.
9. Working from Home and Directives
No, no, no. If that's the thing that people are upset about, then the content is good, right? Like I said, it's the reality of working at home. But now I would like to go to the questions. And I have a question. Where can I put my directives? As per the GraphQL spec, you can really put directives almost anywhere, to be honest. So, you can actually put them on types, fields, fragments, operations. There's also currently a spec that even allows you to add directives at the schema level. We actually caught a security vulnerability in our own app just by introspecting the schema and what directives are actually there. In our GraphQL API, for every endpoint that needs login or certain permissions, we have a directive called at needs login. And we just specify at needs login, needs permission, XYZ or whatever, some roles that you actually need. We were able to find a vulnerability before it was even exploited. Thanks. We have another great question from the audience from Juan Segebre. Can you put directives on a mutation's parameters, for example, the skip directive? I don't know if you can put directives on a parameter. I don't think you're allowed to do it on the parameter. However, on the mutation itself, on the operation, you can put the directive there.
No, no, no. If that's the thing that people are upset about, then the content is good, right? Like I said, it's the reality of working at home. I was looking at my agenda when I had this conversation with you and I had a time slot for my groceries being delivered. So, luckily they are already here, but I might have had to step out to accept my groceries at the front door. So, that's the reality of working from home. Also, the glorious life of a conference MC and speaker. There are disturbances.
But now I would like to go to the questions. And I have a question. Where can I put my directives? Cool. That's actually a great question. Let me just think about it. So, as per the GraphQL spec, you can really put directives almost anywhere, to be honest. So, you can actually put them on types, fields, fragments, operations. There's also currently a spec that even allows you to add directives at the schema level. I think this hasn't been approved yet. It's still in the working group. So, even at the high level in the schema, you could put directives. Actually, a pretty fun thing here is that while I was writing this talk, and I didn't get a chance to add a slide for this, we actually caught a security vulnerability in our own app just by introspecting the schema and what directives are actually there. In our GraphQL API, for every endpoint that needs login or certain permissions, we have a directive, which I didn't cover in this talk, called at needs login. And we just specify at needs login, needs permission, XYZ or whatever, some roles that you actually need. And we were just reading the schema one day, looking at this needs login, that needs login, and thought: hey, this operation is a mutation, so doesn't it need a login, and doesn't it need a specific role for you to do that? And so just by introspecting the schema, we were able to find a vulnerability before it was even exploited. Actually, it's one of the reasons I love directives so much: they're so short, and they sit right there on your schema.
Okay. Thanks. We have another great question from the audience from Juan Segebre. Can you put directives on a mutation's parameters, for example, the skip directive? I don't know if you can put directives on a parameter. I don't think you're allowed to do it on the parameter. However, on the mutation itself, on the operation, you can put the directive there.
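The schema audit described in this answer can be sketched as a simple scan over operations and the directives attached to them. Everything here is hypothetical: the `MUTATIONS` table stands in for data that would come from reading the real schema, and the directive names `needsLogin` and `needsPermission` are invented spellings, since the transcript only gives the spoken form.

```python
# Hypothetical result of reading a schema: mutation name -> attached directives.
MUTATIONS = {
    "addTodo": ["needsLogin"],
    "deleteUser": ["needsLogin", "needsPermission"],
    "resetAllData": [],
}


def unprotected(operations, required="needsLogin"):
    """Return the operations that are missing the required directive."""
    return sorted(
        name
        for name, directives in operations.items()
        if required not in directives
    )
```

Running `unprotected(MUTATIONS)` flags the mutation with no login directive, which is the kind of gap the talk says was spotted just by reading the schema.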
10. Skip Directive and Conclusion
I think the skip directive is one of three standard directives, but it's only allowed on the mutation itself. I want to thank you for making me feel young again and it's been great being here. You can connect with me on Twitter at @tdinkr and check out Slash GraphQL at dgraph.io.
I think specifically this question is about the skip directive. Skip is, I believe, one of three standard directives. I think these are include, skip, and deprecated. I'm not actually sure if you can... I think it's not allowed on a parameter. It's only allowed on the mutation itself. Okay. I hope Juan is happy with that answer. Sorry, I have to cough. I'm going to mute.
So I wanted to say it was nice having this blast from the past and looking back at how we used to do things. And I felt like I was 14 again in school in my first days as a programmer when I was starting out maybe with I think it was SQL the first time I was touching databases. And so I really want to thank you for making me feel young again. That's really nice. I don't know if I should be feeling old but thank you. It was lovely to have the opportunity. Yeah, it's been great being over here.
All right, awesome. So you are going to go to your speaker's room right now on Spatial Chat if people want to discuss this further with you. So thanks a lot for coming, and then people can talk to you on the Spatial Chat, or you can maybe plug your social media. Sure, I'm available on Twitter at @tdinkr, and also, if you guys haven't checked it out already, please do check out Slash GraphQL. In my opinion it's the fastest way to get a GraphQL endpoint up and running, and you can check it out at dgraph.io. Awesome. Well, you heard it here first, folks. Let's go there and thank you.