So, we've done lots of great things together. We worked together at MailOnline, at Threads, and when I did the Ajv library Jason also helped a lot. Currently, I've founded SimpleX Chat, a messaging platform that is the only one of its kind that doesn't have user identifiers, but this is not what the talk is about. So, I'll hand over to Jason to introduce himself.
Thanks, Evgeniy. Obviously, you know by now I'm Jason Green, I'm director of technology at Threads Styling. Threads is a fashion tech company pioneering the world of personalized luxury shopping through chat and social media. I also previously worked with Evgeniy as a principal engineer at MailOnline. I've been a long-time user of data validation with JSON Schema, in particular using Ajv, which I've witnessed grow and mature so much over the years. I'm an early investor in SimpleX Chat as well.
So as we all know, JSON is a widely used format that's generally considered to be flexible and easy to work with. However, it's important to be aware of some of the potential problems and challenges that it has. JSON is particularly wasteful.
2. Challenges with JSON and Importance of Performance
Parsing JSON can be wasteful as you need to parse the entire data before checking its validity. JSON has security issues and can exhaust the call stack with deep structures or be used in DDoS attacks. Performance and reliability are important depending on the situation, especially when they affect user experience and satisfaction. Fastify is a library that tackles serialization by defining inputs and outputs in JSON Schema, increasing speed and improving data structure handling.
Now, it's not something you're gonna notice in your day-to-day debugging when you're working with it, but parsing JSON can be a very wasteful process, as you need to parse the entire piece of data before you can understand or even begin to check if it's valid or not. Because of the potentially complex and nested nature of JSON, it can be particularly time-consuming to then go on and validate.
So before we get concerned about performance and reliability, it's important to think about when performance and reliability are actually important. It does seem like an obvious statement. Most people wouldn't go out of their way to argue that they're not important, but they're not going to be important in every situation. It really depends on various factors. Obviously, a slow app is better than no app at all. So if you have an application that's delivering value, you may have much bigger issues to face before worrying about performance and reliability. Particularly in the early stages of app development, you're going to be much more concerned with time to market. If your app isn't even available yet, that's obviously a big issue. You're going to be concerned about budget, the overall user experience of your application, and of course, what your users' needs are and what's most pressing to them. However, it is going to be an issue when performance is affecting user experience and satisfaction. That can risk you losing users, and those people who leave your application or site because it didn't load fast enough may not come back, which is what we refer to as high bounce rates.
Even worse, if reliability is your issue and your customers are losing their work or their data is becoming corrupted, that's a big issue that, in the best case, can result in some apologies. In the worst case, you may actually end up having to pay for it in some way through compensation or discounts to keep people happy.
So there is actually a solution to part of this problem, which is tackled by a library called Fastify, which is a replacement for your Express router. It tackles the serialization part of the problem, which is to say that by defining the inputs and outputs and their shape in JSON Schema, this library is able to serialize the responses more quickly, and it can get quite a good increase in speed because it knows the structure of the data it's supposed to be returning. In this way, it can take a lot of what would normally be loops and turn them into straight property access.
So if we talk about schemas, for a long time JSON Schema was the only way to define the format of the data, or the type of the data, or whatever you call it. It started in 2009, and since 2020 there is an alternative specification, JSON Type Definition (JTD), that was created to address its shortcomings.
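The serialization idea described above can be illustrated with a minimal sketch. This is not Fastify's actual implementation; the `UserResponse` shape and both function names are hypothetical, chosen only to show how knowing the shape in advance replaces a generic walk over keys with direct property access.

```typescript
// Hypothetical response shape, used for illustration only.
interface UserResponse {
  id: number;
  name: string;
  active: boolean;
}

// Generic serialization: keys and value types are discovered at runtime.
function serializeGeneric(value: unknown): string {
  return JSON.stringify(value);
}

// Shape-aware serialization: the structure is known up front, so each
// property is read directly and no loop over keys is needed.
function serializeUser(user: UserResponse): string {
  return (
    '{"id":' + user.id +
    ',"name":' + JSON.stringify(user.name) + // still escape string content
    ',"active":' + (user.active ? "true" : "false") +
    "}"
  );
}

const user: UserResponse = { id: 7, name: 'Ada "L"', active: true };
const generic = serializeGeneric(user);
const specialized = serializeUser(user);
```

Both functions produce the same JSON text for this value; a schema-driven library would generate something like `serializeUser` from the schema instead of having it written by hand.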
3. JTD vs JSON Schema: Pros, Cons, and Debugging
JTD is better than JSON Schema for most API use cases. In cases where JTD is worse, consider writing code instead of using a schema. JSON Schema can lead to debugging issues due to its interpretation of schemas, causing confusion and errors.
You can spend quite some time comparing the pros and cons of the two specifications. You can watch it later and pause. Fundamentally, JTD is defined by its simplicity, and one of its important qualities is that it supports discriminated unions. At the same time, it has limitations on what it supports, but it's extremely well aligned with data types, unlike JSON Schema.
JSON Schema, on the other hand, has extremely wide adoption today. It is part of the OpenAPI specification, but at the same time it doesn't support discriminated unions and it isn't aligned with data types. It's a long comparison. It's a non-trivial trade-off, and I know it first-hand from the Ajv library, which supports both of those specifications. So to skip to the recommendation I can give, JTD is really much, much better for the absolute majority of the use cases in an API. So if you are building an API, you should be using JTD, full stop. And for the cases when JTD is actually worse than JSON Schema, it's a big question whether you should be using a schema at all. You should probably be writing code and not using a schema.
So I'll demonstrate with some examples, in a somewhat funny way. So the headline is JSON Type Definition: logic, versus JSON Schema: what? If you remember those internet memes with "what?". What is that? So look at this schema and what it defines. The majority of software engineers who come across this schema would say that this thing is obviously an object. What else can it be? And it has a property, and the property obviously must be a string. And that's what JTD treats this schema as. JSON Schema has an interesting view: if the data happens to be an object and it has this property, then the property must be a string, and any other data will be valid in JSON Schema, meaning any number, any string, any array, or any object that doesn't have the property foo will all be valid. It has cost various engineers millions of hours of debugging, and in the Ajv library probably 50% of the questions I have been answering are about this kind of gotcha, right?
Or, for example, this: it's a typo, right? Clearly "properties" is misspelled here, and JTD responds correctly: this schema is just invalid. What else can it be? Well, JSON Schema has a view. JSON Schema believes this is a valid schema. It just has some property that we don't know the meaning of, and any data is valid according to this schema. Again, millions of hours spent debugging and fixing errors like this in JSON Schema.
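The two readings of a schema with a single string property `foo` can be sketched as two hand-written checks. This is an illustration of the semantics described above, not code from Ajv; the function names are hypothetical, and the JTD version assumes the default behaviour where listed properties are required and additional properties are rejected.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isPlainObject(data: Json): data is { [key: string]: Json } {
  return typeof data === "object" && data !== null && !Array.isArray(data);
}

function hasOwn(obj: { [key: string]: Json }, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// JSON Schema reading of { properties: { foo: { type: "string" } } }:
// the keyword only applies to objects that happen to contain "foo".
function jsonSchemaReading(data: Json): boolean {
  if (!isPlainObject(data)) return true; // numbers, strings, arrays all pass
  if (!hasOwn(data, "foo")) return true; // objects without "foo" pass
  return typeof data.foo === "string";
}

// JTD reading of { properties: { foo: { type: "string" } } }:
// the data must be an object with exactly the property "foo", a string.
function jtdReading(data: Json): boolean {
  if (!isPlainObject(data)) return false;
  const keys = Object.keys(data);
  return keys.length === 1 && keys[0] === "foo" && typeof data.foo === "string";
}

const samples: Json[] = [42, "text", [1, 2], { bar: 1 }, { foo: 1 }, { foo: "ok" }];
const acceptedByJsonSchema = samples.filter(jsonSchemaReading).length;
const acceptedByJtd = samples.filter(jtdReading).length;
```

Of the six samples, only `{ foo: 1 }` is rejected under the JSON Schema reading, while only `{ foo: "ok" }` is accepted under the JTD reading.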
4. JTD Structure and JSON Schema Flexibility
JTD has a more strict structure for arrays, requiring objects as elements with specific properties. JSON schema, on the other hand, allows for more flexibility, leading to debugging issues. JSON schema is error-prone and often requires additional annotation or the use of strict mode. JTD is simpler and often a better choice for your code.
Or, for example, for arrays, right? JTD has this structure for an array. This data must be an array, obviously it must have objects as elements, these objects must have the property foo, and this property must be a string, so lots of hard requirements on the data shape, right?
So if you have a similar schema in JSON Schema, the only difference is the keyword. Well, what does it really mean? One can guess: the data doesn't have to be an array. And the data doesn't have to have all elements be objects. And if they are objects, they don't have to have the property foo. So a lot of different data types can be valid according to this schema, which again causes a lot of debugging, right? And then there are cases like this, right?
So what if we put square brackets here? I thought that's an array, so why not put square brackets? Well, JTD just says it's an invalid schema. In JSON Schema, this is unfortunately a valid thing that only validates the first item and ignores all other items. It's an exceptionally common support question in Ajv, and it's millions of hours spent debugging bugs like that. So fundamentally JSON Schema is an exceptionally error-prone specification that requires lots of additional annotation to express what you actually want to express. Ajv found a solution that turned out to be extremely popular, called strict mode. It effectively makes all those cases mistakes, which is an extension of, or deviation from, the JSON Schema specification, and people use it. But fundamentally it just means that JTD is simpler and in many cases better for your code.
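The three array cases above can be sketched as hand-written checks. These functions are hypothetical and only mimic the semantics described in the talk; they are not Ajv code. The tuple case assumes the array form of `items` as it worked in JSON Schema draft-07 and earlier.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isPlainObject(data: Json): data is { [key: string]: Json } {
  return typeof data === "object" && data !== null && !Array.isArray(data);
}

// Item check, JTD style: an object with exactly one property "foo", a string.
function itemOkJtd(item: Json): boolean {
  if (!isPlainObject(item)) return false;
  const keys = Object.keys(item);
  return keys.length === 1 && keys[0] === "foo" && typeof item.foo === "string";
}

// Item check, JSON Schema style: only objects that contain "foo" are constrained.
function itemOkJsonSchema(item: Json): boolean {
  if (!isPlainObject(item)) return true;
  if (!Object.prototype.hasOwnProperty.call(item, "foo")) return true;
  return typeof item.foo === "string";
}

// JTD: { elements: { properties: { foo: { type: "string" } } } }
function jtdElements(data: Json): boolean {
  return Array.isArray(data) && data.every(itemOkJtd);
}

// JSON Schema: { items: { properties: { foo: { type: "string" } } } }
function jsonSchemaItems(data: Json): boolean {
  if (!Array.isArray(data)) return true; // non-arrays are not constrained
  return data.every(itemOkJsonSchema);
}

// JSON Schema (draft-07 and earlier): { items: [ { properties: { ... } } ] }
// The square brackets turn it into a tuple check of the first item only.
function jsonSchemaItemsTuple(data: Json): boolean {
  if (!Array.isArray(data)) return true;
  const first = data[0];
  return first === undefined ? true : itemOkJsonSchema(first);
}

const mixed: Json = [{ foo: "a" }, { foo: 1 }];
const results = {
  jtd: jtdElements(mixed),
  items: jsonSchemaItems(mixed),
  tuple: jsonSchemaItemsTuple(mixed),
};
```

For the `mixed` array, the JTD check and the plain `items` check both reject the second element, while the tuple form accepts the data because it never looks past the first item.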
5. Validating Data with JTD and TypeScript
Jason will show you how to do some magic with JTD and TypeScript for data validation and parsing. With JTD, the process is more intuitive and straightforward. Let's dive into a nested object example: a Mario Kart character. We'll define the schema for the character's data, including properties like name, surname, weight, createdAt date, and an array of weapons. Using the JTDSchemaType utility, we can ensure that the created schema is valid for this data type.
But over to Jason, who will show you how to do some magic with JTD and TypeScript and actually do some data validation and parsing.
All right. Time to look at some code. So just a remark on a few of the points Evgeniy made there. Having worked with a lot of JSON Schema, we ended up building a lot of things around these decisions, and it takes a long time to work all those things out. With JTD I found it a lot more intuitive and a lot simpler and more straightforward.
So let's have a look. Now there's obviously a lot of different types of data that you can validate. I'm going to jump straight into a nested object with I think enough complexity that it's a good example for us to have a look at how we build a schema for it. So I've been playing a lot of Mario Kart with my family lately. Surprisingly my wife is quite good at it as well and we're both just jumping in and playing a lot of Mario Kart. So I couldn't come up with any other example, but a Mario Kart character.
So here we have our character; this is our data. Well, this is our type, our interface. It has a name and an optional surname. We all know Mario Kart characters. They have a weight. The heavier guys are the best, obviously. You have a createdAt date, just to give us an example here. And we obviously have an array of weapons. Each weapon has an ID, a name of the weapon, which is an enum, and a damage counter for how much damage this weapon will do. I'm actually going to just clear out all of this, because I want to show the process of writing the schema more than anything else. We have this very interesting utility type which Ajv comes with, called JTDSchemaType. When we use this type, passing in our Mario Kart character, it will only allow us to create a schema that is valid for this particular data type. I actually don't know how to write a schema at all yet, so I can go straight to my type-ahead and trigger the, I forgot the name, but trigger the possible properties that I can have here. And we have a properties value, so we're going to need that to start with.
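The type described above might look roughly like the following sketch. The slide itself is not part of the transcript, so the exact identifiers are assumptions: the property names follow what is said in the talk, the nullable weight follows the later remark about `nullable`, and of the weapon names only "red shell" is mentioned, so the other two values are hypothetical.

```typescript
// Only "red shell" is named in the talk; the other values are placeholders.
type WeaponName = "banana" | "green shell" | "red shell";

interface Weapon {
  id: number;
  name: WeaponName; // an ad hoc union rather than a TypeScript `enum`
  damage: number;
}

interface MarioKartCharacter {
  name: string;
  surname?: string; // optional property
  weight: number | null; // nullable, as discussed later in the talk
  createdAt: Date;
  weapons: Weapon[];
}

// A sample value that satisfies the interface (illustrative data only).
const character: MarioKartCharacter = {
  name: "Bowser",
  weight: 200,
  createdAt: new Date("2023-01-01T00:00:00Z"),
  weapons: [{ id: 1, name: "red shell", damage: 2.5 }],
};
```

A schema typed against this interface can then only describe data of this shape, which is what drives the type-ahead suggestions in the walkthrough.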
6. Defining Object Properties and Arrays
When defining an object, you specify the properties it should have. Start by defining the type, such as string or date. Optional properties are supported, and you can also define numbers with different types. The schema ensures data validity, allowing for nullable values. Arrays can be defined by specifying the elements and nested properties.
That's because when we're defining an object, this is how you do it with the properties, property. That sounds awful, but you get the idea. If I go to trigger again, now we can start putting in the actual different properties that our Mario Kart character, you know, needs to have defined.
You'll notice that surname's missing from here, and we'll get to that in a second. We're just going to start off by putting in, defining a couple of these. So we need a type obviously, and the type of this one is it can only be, well, it can be one of two things actually, but in our case, this is string. In this case, you'll see that when I go to put the createdAt type, it's only giving me the option of date. This time it's no longer allowing me to put in a string, and that's because it knows that from this type of the Mario Kart character, the createdAt is a date. Finally, we have weapons.
I'm going to get to defining that in a second, because I want to first jump to how we define the surname. So surname is an optional property. And if I look at my predictions again, we have optional properties. And in here, as soon as I start typing S, we get surname as well. So it just makes everything very intuitive. And then from this point, it's exactly the same as any other property that you're defining. I also forgot we have the weight of the character, and this is a number. And as you can see, there's quite a bit more variety in the different types of numbers that we can support in JTD. Given this is weight, I think uint8 will probably make sense. Just a positive integer there. And have I got that right? Oh yes, so this is the best part about this. It's not valid. It's not correct. This weight is not fulfilling the needs of this type here in the schema. And that's because I've made weight so that it can be a null value as well. So here we have to pass nullable true, and then that's going to be valid as well.
Weapons. So when we want to define an array, I think you might remember from the example Evgeniy showed before, it is elements, and from that point on you're once again just defining more of the same schema. It's a nested value now, so in this case we also want properties, because this is an array of objects, and to define an object in JTD you need properties. And we can once again go about putting in the rest of these values: ID, with a type that is a number.
7. Enum Type for Name
In this case, we have a new type called name, which is an ad hoc enum. We pass in all the possible values as an array, and the type prediction is based on the defined type.
I'm going to go with int32. And name. So the name in this case is another new type that we've come across. We've made this an enum. This is actually more of an ad hoc enum rather than using the enum keyword in TypeScript. And so in this case, we just need to pass in all the possible values for this enum, and that is in the form of an array. As you can see already, it starts predicting what the possible values are. It knows this from the type that we've defined, which is really useful. And red shell.
8. Serialization and Parsing with MarioKart Schema
Finally, we have damage, which is another number. We can serialize data using a type-aligned serializer for the MarioKart schema, which is 10 times faster than using a regular serializer. We can also parse data using a parser generated from the schema, which returns the expected type or undefined if the data is invalid. This approach of parsing JSON directly to the application type and serializing a specific type improves performance and reliability.
Finally, we have damage, which is another number. We're going to make this a float32, so we can have decimal points as well. And that is the full schema now defined.
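Putting the steps of the walkthrough together, the finished schema might look like the sketch below. In the talk the object is typed with Ajv's `JTDSchemaType` applied to the character type; here it is a plain `as const` object so that the snippet stands on its own. The `timestamp` type for `createdAt` and the two extra enum values are assumptions, since the slide is not part of the transcript.

```typescript
// In the talk this constant would be annotated with Ajv's JTDSchemaType for
// the character type; it is left as a plain object here to stay self-contained.
const marioKartCharacterSchema = {
  properties: {
    name: { type: "string" },
    weight: { type: "uint8", nullable: true }, // weight may be null
    createdAt: { type: "timestamp" }, // assumed JTD type for a Date
    weapons: {
      elements: {
        properties: {
          id: { type: "int32" },
          // Only "red shell" is named in the talk; the rest are placeholders.
          name: { enum: ["banana", "green shell", "red shell"] },
          damage: { type: "float32" },
        },
      },
    },
  },
  optionalProperties: {
    surname: { type: "string" },
  },
} as const;

// Top-level property names that the schema requires and allows.
const requiredKeys = Object.keys(marioKartCharacterSchema.properties);
const optionalKeys = Object.keys(marioKartCharacterSchema.optionalProperties);
```

Required properties live under `properties`, optional ones under `optionalProperties`, arrays use `elements`, and the nested weapon object is just more of the same schema.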
9. Improved Security and Type Alignment
The parser fails on the first invalid character in the JSON string, improving security and performance. It has been used in large scale applications, including Wastestream. For the Storymaker project, JSON schemas were generated from types for type alignment. JTD and Ajv make your job easier, secure your API, and save you time compared to alternatives.
If somebody sends a property that's not allowed, or of a wrong type, the parser will not parse the whole data structure, which can be huge. It will simply fail on the first invalid character in the JSON string, which gives you much better defense against any kind of attack, and you can never end up with properties you don't expect in your data that may result in prototype pollution or any other issue.
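The fail-fast behaviour described above can be illustrated with a tiny hand-written parser for a single fixed shape, an array of uint8 numbers. This is a simplified sketch and not how Ajv's generated parsers are implemented; `parseUint8Array` and `ParseResult` are hypothetical names.

```typescript
type ParseResult =
  | { ok: true; value: number[] }
  | { ok: false; position: number };

// Parses JSON text that must match the JTD shape { elements: { type: "uint8" } }.
// It stops at the first character that cannot belong to valid data.
function parseUint8Array(json: string): ParseResult {
  let pos = 0;
  const skipWhitespace = (): void => {
    while (pos < json.length && " \t\n\r".indexOf(json.charAt(pos)) >= 0) pos++;
  };
  const fail = (at: number): ParseResult => ({ ok: false, position: at });

  skipWhitespace();
  if (json.charAt(pos) !== "[") return fail(pos);
  pos++;
  const value: number[] = [];
  skipWhitespace();
  if (json.charAt(pos) === "]") {
    pos++; // empty array
  } else {
    for (;;) {
      skipWhitespace();
      const start = pos;
      while (pos < json.length && json.charAt(pos) >= "0" && json.charAt(pos) <= "9") pos++;
      const digits = json.slice(start, pos);
      // Reject non-numbers, leading zeros, and values outside 0..255.
      if (digits === "" || (digits.length > 1 && digits.charAt(0) === "0")) return fail(start);
      if (Number(digits) > 255) return fail(start);
      value.push(Number(digits));
      skipWhitespace();
      const ch = json.charAt(pos);
      if (ch === ",") { pos++; continue; }
      if (ch === "]") { pos++; break; }
      return fail(pos); // e.g. a "." or a quote: stop immediately
    }
  }
  skipWhitespace();
  return pos === json.length ? { ok: true, value } : fail(pos);
}

const good = parseUint8Array("[1, 2, 3]");
const bad = parseUint8Array('[1, "x", 3]');
```

For the second input the parser stops at position 4, the opening quote of `"x"`, without reading the rest of the text. A generic approach would first build the whole structure with `JSON.parse` and only then validate it.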
So this has really, really improved security and performance. And by now I know it's been used in large scale applications, even Wastestream where it's actually currently used in their offices to return the results. It was my previous company when I was leading the engineering team. They were kind to allow us. They use this approach as well, so we did it when I was here.
So, you know, one more note I had was that for the Storymaker project, the state of the individual stories that you create is all managed on the server, and there's a lot of back and forth of big pieces of JSON. We did all the validation with JSON Schema. However, because we wanted it to be type aligned, what we ended up doing was kind of similar, in a roundabout way, in that we generated JSON schemas from our types. And we didn't have any JSON schemas that were doing anything you couldn't do in the TypeScript type anyway, because we didn't want that. We didn't want to have to second-guess things and have an inconsistency there. So yeah, I think it's kind of the right way to go. It may seem more restrictive in what you can do with it, but I agree with your statement before: those things you probably shouldn't be doing in a schema anyway.
So yes, JTD is fun. Ajv makes it exceptionally powerful. So use them both. That'll make your job easier, your API more secure, and you'll save yourself lots of hours lost on debugging compared to alternatives. So that's our recommendation from this talk. Thank you.