ArkType is a new runtime validator for TypeScript and the first library with the goal of making type syntax available 1:1 in JS with no compilation step.
It uses a carefully optimized static parser so that with each character you type, you'll see a list of completions, a clear ParseError, or your inferred type. At runtime, a simple definition like "string|number[]" will be transformed into a TypeNode that can be used to validate or transform inputs, compared to other TypeNodes, or combined with other definitions to form new TypeNodes.
This talk will cover the process of building ArkType, with a focus on the type-level parser and runtime type system, and demo some of the most exciting features like scopes, index signatures and generics.
ArkType: Bringing TypeScript to Runtime
Video Summary and Transcription
This talk discusses the concept of runtime validation in TypeScript and how it bridges the gap between TypeScript's expressiveness and runtime capabilities. The speaker explains the evolution from top-down parsing to the shift-reduce parser that made runtime validation possible. The benefits of runtime validation in terms of flexibility, scalability, and efficiency are highlighted. The integration of validation and the type system is emphasized, along with the enhanced validation capabilities and new features offered by ArkType.
1. Introduction to Runtime Validation in TypeScript
Hey, everyone, my name's David. I'm here to talk about runtime validation in TypeScript. There are great solutions out there, but there's a gap between TypeScript's expressiveness and runtime capabilities. I asked myself how we could express a TypeScript type for runtime use. The answer is simple: leverage the same structures as JavaScript. With some adjustments, we can achieve a one-to-one parallel between TypeScript and runtime validation.
Hey, everyone, my name's David. I'm very lucky to be here today and have the chance to talk to you about one of my absolute favorite topics in the TypeScript ecosystem, of which there are many. But as many of you may know, one of them is perhaps the nearest and dearest to my heart, which is runtime validation. So this is something that's been discussed very frequently by the community in the past and solved many times by some fantastic engineers. So there are some great solutions to this out there.
But when I was looking at this problem, I couldn't help but feel there was this gap between the expressiveness and power of TypeScript and its type system and its syntax versus what was available at runtime through some combination of builder methods or various things like that. So essentially, a couple of years ago I asked myself this very dangerous question of what is the closest we could get to expressing a TypeScript type like this in a way that we can use at runtime. So remarkably the answer is pretty simple. And I don't think there's actually as much ambiguity as there is when you're answering most design problems like this. So luckily TypeScript leveraged a lot of the same structures for its object literals, tuple literals, etc. as are built into JavaScript.
So we can do the same thing. We can say name is a string. Sure. OK. So we have to embed this. We've got a device. We've got a nested object here. Platform. So this will be a little tricky because we're already in a string. So we probably have to do some kind of nested quotes or something like that, so that we know we're still in a string literal, since Android and iOS aren't keywords. And there are just a couple of ways you could go about this one. But let's just go with this. So this is the closest I think you could basically get to a one-to-one here. If you compare these two things, look at this. You know, we don't have an as const here, but essentially these two have a very strong parallel, right? So the question is, is this structure something that we could theoretically use for runtime validation in a way that captures the essence of what makes TypeScript syntax so powerful and extends that for some of the core needs of a runtime validator?
All right, so fast forward, give or take a few months. Basically what I'm facing here is we need some way to take that original structure that looks just like a TypeScript type but infer back out the original TypeScript type without all that runtime embedded syntax that's designed to fit within JavaScript. So essentially, after some iteration, I came up with this initial solution.
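To make the parallel concrete, here is a minimal sketch in plain TypeScript with no library involved. Only `name`, `device` and `platform` come from the talk; the names `User`, `userDefinition` and `exampleUser` are placeholders chosen for illustration.

```typescript
// The static type we would like to be able to check at runtime.
type User = {
  name: string;
  device: {
    platform: "android" | "ios";
  };
};

// The same shape written as an ordinary JavaScript value. Keywords become
// strings, and string literals need nested quotes so that they are not
// mistaken for keywords.
const userDefinition = {
  name: "string",
  device: {
    platform: "'android'|'ios'",
  },
} as const;
// `as const` keeps the literal strings in this standalone sketch. A library
// function with a generic parameter can capture them without it.

// A value that satisfies the static type.
const exampleUser: User = { name: "David", device: { platform: "ios" } };
```

The two declarations have the same keys and nesting, so the remaining work is turning the strings in the value back into the types they spell out.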
2. Evolution of Top-Down Parsing
You can see that I called it parse type, which eventually evolved into ArkType. It was the beginning of my iterate on types, type iterate. It had some inherent limitations. I added cyclic inference capabilities. I added function parsing for some reason. This top-down approach had little control over precedence and other issues. Eventually, I realized it's not going to work for a fundamentally scalable solution.
You can see that I called it parse type, which eventually, as you can probably guess, evolved into ArkType. And it was just a simple process of a couple of years and some iteration. So as you can imagine, this is the beginning of my iterate on types, type iterate. And it has, kind of, been a theme since then. But there's a few intermediate stages and you'll be able to see a little bit about how some of this evolved.
So this is my initial stab at things. It's got some fairly complex types in it. It's this top-down parser approach doing a lot of pattern matching. It's a pretty... I'll say familiar. I mean, this is a little crazy stuff still. But in terms of what had been done within TypeScript in the past for parsing, it is kind of this, like, hey, does this match this template expression? If it does, then infer this part of the syntax. Otherwise, do the same thing. So somewhat straightforward. But I would find that it had some kind of inherent limitations.
So it's got some nice error messages. Impressively, one of the first things I added was this, kind of, cyclic inference capability. So it's able to do that. So that's cool. I added function parsing for some reason, which is useless for runtime validation. So I think I just thought it was cool or something. I'm not sure why that was there. This precedence issue would kind of continue to be a thorn in my side. Because I had very little control with this top-down method in terms of ensuring, for example, that the array operator had higher precedence than the union operator. And that was really the most straightforward manifestation of this issue. As I went on, I would discover others, like trying to represent string literals like this, yes or no. Well, you need to make sure that it's not interpreted as the string literal containing that union operator. So this top-down approach, you know, I'm going to continue and try to work around it for a while. But eventually I'm just going to figure out, you know, it's just not going to work. This case was the specific one that made me realize, okay, there's no way that I'm actually going to be able to use this approach at all if I want a fundamentally scalable solution.
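The limitation is easier to see in code. The following is a hedged reconstruction of the general top-down technique, not the original parse type source: every name in it (`TopDownParse`, `IsEqual`, the supported keywords) is an assumption made for illustration.

```typescript
// Hypothetical top-down parser: try each template pattern against the whole
// definition, infer the pieces, and recurse on them.
type TopDownParse<Def extends string> =
  Def extends `'${infer Text}'` ? Text
  : Def extends `${infer Left}|${infer Right}` ? TopDownParse<Left> | TopDownParse<Right>
  : Def extends `${infer Item}[]` ? TopDownParse<Item>[]
  : Def extends "string" ? string
  : Def extends "number" ? number
  : Def extends "boolean" ? boolean
  : never;

// Standard helper that is `true` only when A and B are the same type.
type IsEqual<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2) ? true : false;

// Works, but only because the union pattern happens to be tried before the
// array pattern. Swapping those two lines would yield (string | number)[].
const simpleCase: IsEqual<TopDownParse<"string|number[]">, string | number[]> = true;

// Breaks: the whole definition starts and ends with a quote, so it is read
// as ONE string literal that contains the union operator.
const quotedCase: IsEqual<TopDownParse<"'yes'|'no'">, "yes'|'no"> = true;
```

Reordering the patterns only moves the problem: if the union pattern is tried first, a literal such as `'yes|no'` gets split in the middle instead. Precedence here is a side effect of pattern order rather than something the parser controls.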
3. Shift-Reduce Parser and Improved Efficiency
The idea of a shift-reduce parser is a fundamental breakthrough that made ArkType possible. Instead of a top-down approach, we keep track of a state with various branches and data to parse a string. TypeScript handles this efficiently, allowing us to represent complex syntax and write types close to imperative runtime code.
Both in terms of performance and in terms of capabilities of parsing more complex syntax.
Okay. So where do we go from here? How do we solve this problem? This ridiculous type with the union operator and two string literals, and how are we going to do this? Pattern matching is kind of at its limit here in terms of what we're going to get. So where do we go?
All right. So back to present day. We're back in our editor. It was that problem that really led me to the first sort of fundamental breakthrough that made ArkType possible. It's the idea of a shift-reduce parser. So it's pretty different from the top-down approach that we had before. Let's see if I can find this.
And basically what you have is, instead of having this top-down approach where we are matching against various expressions across the whole string, this is, you know, there's a lot going on here. But the core of what you really need to understand is that we actually are keeping track of a state that has various branches and has groups and various other data that we need in order to parse a string. And it's just a simple algorithm where we're looping, checking if we have something to operate on. So, for example, maybe it's string or number or the literal five. Or we don't, and we need to get something to operate on. So we keep looping like this. If we don't have anything to operate on, we get something to operate on. And if we do, we figure out what's operating on it. So it's actually really straightforward. And it lends itself really well to this problem.
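The loop described here can be sketched at runtime as follows. This is a simplified illustration under stated assumptions, not ArkType's implementation: the `AstNode` shape, the `ScanState` fields and the function names are hypothetical, and only keywords, quoted literals, `[]` and `|` are handled.

```typescript
type AstNode =
  | { kind: "keyword"; keyword: string }
  | { kind: "literal"; text: string }
  | { kind: "array"; element: AstNode }
  | { kind: "union"; branches: AstNode[] };

interface ScanState {
  root: AstNode | undefined; // the operand currently being built, if any
  branches: AstNode[];       // union branches that are already complete
  rest: string;              // the part of the definition not yet scanned
}

// Nothing to operate on yet: read a quoted literal or a keyword.
function shiftOperand(state: ScanState): void {
  if (state.rest.startsWith("'")) {
    const close = state.rest.indexOf("'", 1);
    if (close === -1) throw new Error("Unterminated string literal");
    state.root = { kind: "literal", text: state.rest.slice(1, close) };
    state.rest = state.rest.slice(close + 1);
    return;
  }
  const keyword = /^[A-Za-z]+/.exec(state.rest)?.[0];
  if (keyword === undefined) throw new Error(`Expected an operand before '${state.rest}'`);
  state.root = { kind: "keyword", keyword };
  state.rest = state.rest.slice(keyword.length);
}

// We have an operand: figure out what is operating on it.
function shiftOperator(state: ScanState, operand: AstNode): void {
  if (state.rest.startsWith("[]")) {
    // Applied immediately to the current operand, so [] binds tighter than |.
    state.root = { kind: "array", element: operand };
    state.rest = state.rest.slice(2);
  } else if (state.rest.startsWith("|")) {
    // The current operand is a finished branch; start looking for the next.
    state.branches.push(operand);
    state.root = undefined;
    state.rest = state.rest.slice(1);
  } else {
    throw new Error(`Unexpected character '${state.rest[0]}'`);
  }
}

function parseDefinition(definition: string): AstNode {
  const state: ScanState = { root: undefined, branches: [], rest: definition };
  while (state.rest.length > 0) {
    if (state.root === undefined) shiftOperand(state);
    else shiftOperator(state, state.root);
  }
  if (state.root === undefined) throw new Error("Expected an operand at the end of the definition");
  return state.branches.length === 0
    ? state.root
    : { kind: "union", branches: [...state.branches, state.root] };
}
```

Because a quoted literal is consumed as a single operand before any operator is looked at, `parseDefinition("'yes'|'no'")` produces a union of two literals, which is the case the pattern-matching approach could not disambiguate.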
TypeScript handles this incredibly efficiently, as we'll get into a little bit later. Basically, using this method both allowed me to represent much more complex syntax than I would have otherwise been able to, and also allowed TypeScript to do it much more efficiently, which was really surprising to me, because the logic is not trivial. This is, in my opinion, kind of one of the more beautiful parts of the type-level implementation here. You can kind of look and see this parallel between the runtime implementation of the parser and the static implementation of the parser. And certainly, you know, in this case, I definitely went a little bit out of my way to kind of create that parallel, but it really kind of demonstrates, in my view, that with some of the right patterns in mind, types can be written that really are quite close to imperative runtime code in terms of their expressiveness, in terms of their capabilities. Of course, eventually you're going to hit the limits of TypeScript's type system a lot faster than you're going to hit the limits of your language in general. But for something like this, it works remarkably well. You can kind of, again, see some of the patterns here where we're checking the remaining part of the string, we're kind of just getting the next token, and then based on that, making some decisions about what to do with the state and how to continue.
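A type-level version of that loop can be sketched with recursive conditional types. As before, this is an illustration under assumptions rather than ArkType's source: `Keywords`, `Unset`, `ParseError`, `ScanToken`, `ShiftOperand`, `ShiftOperator`, `ParseLoop`, `ParseDef` and `Same` are all hypothetical names, and the grammar is limited to keywords, quoted literals, `[]` and `|`.

```typescript
// Keywords this sketch understands, mapped to the types they stand for.
type Keywords = { string: string; number: number; boolean: boolean };

// Marker meaning "no operand has been shifted yet".
interface Unset { readonly unset: true }

// Returned in place of a type when the definition is invalid.
interface ParseError<Message extends string> { readonly error: Message }

// Consume characters one at a time until an operator character or the end.
type ScanToken<Rest extends string, Scanned extends string = ""> =
  Rest extends `${infer Char}${infer Next}`
    ? Char extends "|" | "["
      ? [Scanned, Rest]
      : ScanToken<Next, `${Scanned}${Char}`>
    : [Scanned, ""];

// No operand yet: read a quoted literal or a keyword.
type ShiftOperand<Union, Rest extends string> =
  Rest extends `'${infer Text}'${infer Next}`
    ? ParseLoop<Text, Union, Next>
    : ScanToken<Rest> extends [infer Token extends string, infer Next extends string]
      ? Token extends keyof Keywords
        ? ParseLoop<Keywords[Token], Union, Next>
        : ParseError<`'${Token}' is unresolvable`>
      : never;

// Operand in hand: decide what operates on it.
type ShiftOperator<Root, Union, Rest extends string> =
  Rest extends `[]${infer Next}` ? ParseLoop<Root[], Union, Next>
  : Rest extends `|${infer Next}` ? ParseLoop<Unset, Union | Root, Next>
  : ParseError<`Unexpected input '${Rest}'`>;

// The loop: finish when the input is empty, otherwise shift.
type ParseLoop<Root, Union, Rest extends string> =
  Rest extends ""
    ? ([Root] extends [Unset] ? ParseError<"Expected an operand"> : Union | Root)
  : [Root] extends [Unset] ? ShiftOperand<Union, Rest>
  : ShiftOperator<Root, Union, Rest>;

type ParseDef<Def extends string> = ParseLoop<Unset, never, Def>;

// Standard helper that is `true` only when A and B are the same type.
type Same<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2) ? true : false;

// Each assignment only compiles if the parsed type is exactly the expected one.
const arrayBindsTighter: Same<ParseDef<"string|number[]">, string | number[]> = true;
const literalsStaySeparate: Same<ParseDef<"'yes'|'no'">, "yes" | "no"> = true;
const danglingOperator: Same<ParseDef<"string|">, ParseError<"Expected an operand">> = true;
```

The three type parameters of `ParseLoop` play the role of the state fields: the current operand, the finished union branches, and the unscanned remainder. Each step looks at the remainder, takes the next token, and decides how to continue.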
4. Benefits of Runtime Validation in TypeScript
This is a remarkably effective way to parse strings in TypeScript. It provides flexibility to handle complex cases and can scale as needed. It's efficient and has the potential to revolutionize type authoring for runtime validation.
So, again, this is really a remarkably effective way to parse strings in TypeScript. Everyone should totally use it. If you're parsing something more than trivial, this is absolutely the way to go. Maintaining the current state, getting the next characters, and it gives you as much flexibility as you need to handle these potentially very complex cases where you're maybe having to disambiguate between various operators or things like that, things you'd never be able to do with a top-down approach. So it really can scale as far as you need it to, which was amazing for me to find. I thought that I had kind of hit the end of the road in terms of a lot of the utility of the project when I ran into that last problem, thinking that this kind of implementation would just be out of reach in terms of performance for the compiler and everything. But to the contrary, it's actually much more efficient. And this was the breakthrough that I had, where I was like, wow, this really has a lot of potential. Like, if I can parse arbitrary syntax, and it's really performant to do that, and then I can infer that one to one as types, who really knows where this thing could go? This could feel amazing to author our types for runtime, get live validation and everything. So I was really, really excited.
5. TypeScript Type Safety and Performance
If you're a grizzled veteran of the TypeScript ecosystem, you might have some alarm bells going off in your head when you see types like this. Libraries with similar types often crash TypeScript servers. To address this, we built a framework called attest that benchmarks the number of instantiations contributed by any expression. We ensure accurate and performant inference, even for complex types. The type-level parser capabilities are advanced, providing an IDE-like editing experience. Strings are now type safe within TypeScript's type system.
So if you're a grizzled veteran of the TypeScript ecosystem, and see a type that looks like this one, my guess would be you probably have some alarm bells going off in your head. Because despite my enthusiasm, your experience might be that when libraries have types like this, it's very frequent that they will, for example, crash your TypeScript server. That's maybe the most frequent reaction I get when someone sees ArkType for the first time, by a factor of three or four. But rest assured, the problem itself kind of necessitated that we come up with some better solutions for type-level testing, including performance testing.
So we built a framework called attest that can benchmark, at a type level, the number of instantiations contributed by any expression. So you can see that we have this very granular metric for every type of expression in ArkType. We can ensure that it's inferred performantly and accurately, even for very complex types like this one. You know, we know that it's going to do exactly what you expect and it's going to do it very quickly. So you don't notice any delay whatsoever.
And in terms of functional assertions, for example, we could check out our divisibility tests over here. A little bit more space. You can see the types of assertions we're making here at a type level. We have all sorts of syntactic errors, semantic errors. At this point, the capabilities of the type-level parser are really, really advanced and essentially on par with a kind of native IDE editing experience. So, for example, let's say you want to define some type. Say it's a number. Of course you get autocomplete for everything in your scope, including all the keywords. And then maybe it's divisible by two. It's not going to affect the inference. That's a runtime constraint. But if I were to write, oh, you see we're missing an operand there. Oh, zero. You get a specific error for that. Let's say I try to divide by unknown. Essentially we can narrow down exactly what the problem is whenever anything goes wrong like this. So the idea of strings not being type safe, that's also out the window. Again, this is really on par with the native IDE experience just from within TypeScript's type system. So that's a really, really exciting point to be at with all this. And I've just been so excited to share it with you all for so long.
But even with all this implemented, there was one thing that was nagging at me as I was getting ready to release this quite a while ago. And it was just something about the way the types composed together and reduced together that didn't feel quite right.
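As a rough illustration of how a string definition can produce a specific message for each of the mistakes shown in the demo (a missing operand, a zero divisor, a non-numeric divisor), here is a small type-level sketch. The `ValidateDivisibility` and `DefinitionError` names and the message wording are invented for this example and are not ArkType's actual errors.

```typescript
// Hypothetical wrapper so a failure is distinguishable from a valid definition.
interface DefinitionError<Message extends string> { readonly message: Message }

// Returns the definition unchanged when it is valid, or a specific error.
type ValidateDivisibility<Def extends string> =
  Def extends `${infer Base}%${infer Divisor}`
    ? Base extends "number"
      ? Divisor extends "" ? DefinitionError<"% must be followed by a divisor">
        : Divisor extends "0" ? DefinitionError<"% cannot be followed by 0">
        : Divisor extends `${bigint}` ? Def
        : DefinitionError<`'${Divisor}' is not an integer literal`>
      : DefinitionError<`% can only follow number (was '${Base}')`>
    : Def;

// Standard helper that is `true` only when A and B are the same type.
type SameType<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2) ? true : false;

// Each assignment only compiles if validation produced the expected result.
const validDivisor: SameType<ValidateDivisibility<"number%2">, "number%2"> = true;
const missingOperand: SameType<
  ValidateDivisibility<"number%">,
  DefinitionError<"% must be followed by a divisor">
> = true;
const zeroDivisor: SameType<
  ValidateDivisibility<"number%0">,
  DefinitionError<"% cannot be followed by 0">
> = true;
const unknownDivisor: SameType<
  ValidateDivisibility<"number%unknown">,
  DefinitionError<"'unknown' is not an integer literal">
> = true;
```

In a real library the result of a check like this would be used as the parameter type of the defining function, so that an invalid string is reported in the editor at the point where it is written.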
6. Enhancing Type Comparisons and Runtime Constraints
I wanted the ability to compare arbitrary types together, reduce intersections and unions, and have a runtime type system. ArkType now treats runtime constraints like divisors, ranges, and regex with the same rigor as comparisons between other types.
I couldn't put my finger on it because I hadn't really worked much with type systems in the past. But what I eventually figured out I was missing, and what I wanted, was the ability to compare arbitrary types together, the ability to reduce intersections and unions as completely as possible, to go beyond shallow validation capabilities and truly have a runtime type system. And you can see some of that, after very significant iteration, has finally come to fruition, where in addition to the kind of standard constraints that TypeScript covers, it treats runtime constraints like divisors, ranges, regex, etc., with the same rigor that you treat comparisons between numbers and strings and objects and things like that.
7. Integration of Validation and Type System
It's all integrated together into a single type safe system that you can use, for example, to do more than just define validators. It's essentially a full type system as well at this point on top of all the validation capabilities.
It's all integrated together into a single type safe system that you can use, for example, to do more than just define validators. You can compare them to one another. You can say, you know, does this numeric type extend some other numeric type that maybe is an integer, or is divisible only by one. And it will, because it's more specific than that. So it's essentially a full type system as well at this point on top of all the validation capabilities, and really this was an amazing problem to work on. It gave me a much deeper understanding of the kind of work that the TypeScript team does, and really there was just a lot of beauty in having a chance to work with these kinds of types.
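To show what comparing and reducing runtime constraints can mean in practice, here is a deliberately small model of a constrained number type. It is an assumption-laden sketch, not ArkType's internal representation: `NumericConstraints`, `intersectNumeric` and `extendsNumeric` are hypothetical names, divisors are assumed to be positive integers, and the intersection does not check whether any multiple of the divisor actually falls inside the range.

```typescript
// Hypothetical, simplified model of a constrained number type.
interface NumericConstraints {
  divisor?: number; // value must be an integer multiple of this
  min?: number;     // inclusive lower bound
  max?: number;     // inclusive upper bound
}

const greatestCommonDivisor = (a: number, b: number): number =>
  b === 0 ? a : greatestCommonDivisor(b, a % b);

const leastCommonMultiple = (a: number, b: number): number =>
  Math.abs(a * b) / greatestCommonDivisor(a, b);

// Pick whichever side is defined, or combine them when both are.
function mergeConstraint(
  l: number | undefined,
  r: number | undefined,
  combine: (a: number, b: number) => number,
): number | undefined {
  if (l === undefined) return r;
  if (r === undefined) return l;
  return combine(l, r);
}

// Intersection: a value must satisfy both sides. Returns null when the
// ranges cannot overlap, which corresponds to `never`.
function intersectNumeric(l: NumericConstraints, r: NumericConstraints): NumericConstraints | null {
  const divisor = mergeConstraint(l.divisor, r.divisor, leastCommonMultiple);
  const min = mergeConstraint(l.min, r.min, Math.max);
  const max = mergeConstraint(l.max, r.max, Math.min);
  if (min !== undefined && max !== undefined && min > max) return null;
  const result: NumericConstraints = {};
  if (divisor !== undefined) result.divisor = divisor;
  if (min !== undefined) result.min = min;
  if (max !== undefined) result.max = max;
  return result;
}

// `l` extends `r` when every constraint on `r` is already implied by `l`.
function extendsNumeric(l: NumericConstraints, r: NumericConstraints): boolean {
  const divisorOk =
    r.divisor === undefined || (l.divisor !== undefined && l.divisor % r.divisor === 0);
  const minOk = r.min === undefined || (l.min !== undefined && l.min >= r.min);
  const maxOk = r.max === undefined || (l.max !== undefined && l.max <= r.max);
  return divisorOk && minOk && maxOk;
}
```

With this model, `extendsNumeric({ divisor: 2 }, { divisor: 1 })` is true because every even number is an integer, while the reverse is false, and intersecting "divisible by 2" with "divisible by 3" reduces to "divisible by 6".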
8. Validation Capabilities and New Features
ArkType offers significant improvements in validation capabilities, such as automatic discrimination of unions and optimized runtime validation processes. It outperforms other validators in terms of performance, especially in cases where it can leverage its deep type system knowledge. The upcoming release of ArkType introduces new features like generic inference capabilities and the ability to define custom keywords within a scope. These features enhance the flexibility and power of runtime validation in TypeScript.
In general, to find the, you know, simplest possible representation that allows types to be composed together, compared against one another, and reduced fully was really such an amazing problem to work on, and it led to a lot of really significant new capabilities within validation as well.
By having this type information, for example, we can do things like discriminate unions automatically. So for example, in ArkType, you can just define a union like this directly, and essentially we will, within our type system, identify the optimal discriminant, or set of discriminants, in case, you know, it requires multiple checks to figure out which branch you're on, and we'll implicitly, you know, optimize the actual runtime validation process around leveraging those checks so that most unions are, without you even having to think about it, going to be checked in constant time, whereas other existing validators without that type system information are either just going to be checking it linearly, or they're going to require you to, you know, opt in and manually specify it through a syntax like this.
Zod, I think, actually, specifically recently was having some issues maintaining their discriminated union types, and, you know, even defining it explicitly, it's really expensive at a type level. You can see there's a massive disparity of about 20 times fewer instantiations for the union here, and of course it's just a much more straightforward type to define and read as well. You mouse over it, you see exactly what the union is, versus we go here, and I mean, I don't use Zod constantly, but it's a lot harder to tell what's going on exactly without being able to, you know, see many of the parameters, and maybe you have to pull it out with z.infer. But there are some pretty significant disparities, and largely those come out of the two major areas we've discussed: the ability to use the runtime type system to optimize the validation process, so we don't need to rely on user-provided information like this, as well as the optimizations in TypeScript's type system to ensure that this can be parsed much more efficiently.
So, in terms of runtime performance in general, ArkType has been very heavily optimized, and for the sort of base case of checking simple props on an object, it's going to be basically identical, performance-wise, to some of the top runtime validators that are out there, like Typia and TypeBox. But where it really shines is these cases where it can leverage its deep type system knowledge, for example, to implicitly discriminate a large union. It could very easily result in speeds 20 or 30 times faster than even those most performant validators in cases where, again, it can, through multiple steps, identify which checks it needs to make sequentially in order to determine which branch of a union it's on, and often we'll be able to check that union in constant time without you having to even think about it. Whereas again, the alternatives are sort of generally to not have that option at all, or to have to build in that logic manually and maintain it as your types change. And as a point of reference, those base cases I was talking about for existing performant validators are already about 400 times faster than Zod for these kinds of scenarios. So, certainly, if you need to optimize around performance, there's going to be some really big gains to be had in that area as well.
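The idea of finding a discriminant from type information alone can be sketched like this. It is a simplified illustration with hypothetical names (`PropSchema`, `ObjectSchema`, `findDiscriminant`, `compileUnion`), handling only a single discriminant key with string literal values, whereas the talk describes identifying an optimal discriminant or a set of them.

```typescript
// Hypothetical, simplified representation of an object type's properties.
type PropSchema = { literal: string } | { typeOf: "string" | "number" | "boolean" };
type ObjectSchema = Record<string, PropSchema>;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function matchesProp(schema: PropSchema, value: unknown): boolean {
  return "literal" in schema ? value === schema.literal : typeof value === schema.typeOf;
}

function matchesObject(schema: ObjectSchema, data: Record<string, unknown>): boolean {
  return Object.entries(schema).every(([key, prop]) => matchesProp(prop, data[key]));
}

// Find a key whose value is a distinct literal on every branch of the union.
function findDiscriminant(branches: ObjectSchema[]): string | undefined {
  const [first] = branches;
  if (first === undefined) return undefined;
  return Object.keys(first).find((key) => {
    const seen = new Set<string>();
    for (const branch of branches) {
      const prop = branch[key];
      if (prop === undefined || !("literal" in prop) || seen.has(prop.literal)) return false;
      seen.add(prop.literal);
    }
    return true;
  });
}

// Build a validator for the union, using the discriminant when one exists.
function compileUnion(branches: ObjectSchema[]): (data: unknown) => boolean {
  const key = findDiscriminant(branches);
  if (key === undefined) {
    // No discriminant: fall back to trying each branch in turn.
    return (data) => {
      if (!isRecord(data)) return false;
      const record = data;
      return branches.some((branch) => matchesObject(branch, record));
    };
  }
  const byLiteral = new Map<unknown, ObjectSchema>();
  for (const branch of branches) {
    const prop = branch[key];
    if (prop !== undefined && "literal" in prop) byLiteral.set(prop.literal, branch);
  }
  // One map lookup selects the only branch that could possibly match.
  return (data) => {
    if (!isRecord(data)) return false;
    const branch = byLiteral.get(data[key]);
    return branch !== undefined && matchesObject(branch, data);
  };
}
```

The caller only lists the branches. The discriminant is derived from them, so there is nothing to specify by hand and nothing to keep in sync when a branch is added.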
So I know we're sort of running out of time here. I just wanted to demo a couple of my favorite features from the upcoming release, which hopefully, I guess, is the current release now that you're seeing this video. It's been hard getting to this point, there's so many things that I wanted to cover. The scope definitely grew quite a bit beyond what I had anticipated in terms of the improvements for beta. But hopefully, if you're seeing this, and I was able to wrap everything up the way that I intended, you can try all these things out now.
But regardless, I wanted to show off some of these generic inference capabilities, which I feel are very cool. So there's a new feature in beta that lets you define these generic types. You can define a signature like this and then reference a type parameter and then instantiate it later on, and you just get this one-to-one inference. Again, it's just like TypeScript. And then, it will work for runtime validation as well. You can define them within a scope, which is essentially a way to bind keywords, your own custom keywords, to whatever types you want.
So, this one is probably the most kind of mind-blowing type within TypeScript's type system that's in here. You can see we've got this alternate generic that takes A and B and it calls itself, and then swaps A and B. This is just a classic TypeScript recursive generic, but this can actually be inferred within ArkType and you can see that, in fact, you get these alternating inputs. If we instantiate it later and pass it off and on instead of 0 and 1, you get the expected result and those two are toggling back and forth. So, the fact that all this ends up actually being possible within TypeScript was really incredible for me to discover. Again, props to the TypeScript team for creating such an amazing parser, an amazing tool that would support, probably unintentionally, this kind of level of depth just within its own type system. I think it's incredible.
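For readers who have not seen the "classic TypeScript recursive generic" being referred to, here is a plain TypeScript analogue. It does not use ArkType's scope or generic syntax, and the `Alternating` interface, its `value` and `next` properties and the `alternating` helper are names invented for this sketch.

```typescript
// A generic that refers to itself with its two parameters swapped.
interface Alternating<A, B> {
  readonly value: A;
  readonly next: Alternating<B, A>;
}

// Runtime counterpart: `next` is a getter so the structure is built lazily
// instead of recursing forever.
function alternating<A, B>(current: A, other: B): Alternating<A, B> {
  return {
    value: current,
    get next() {
      return alternating(other, current);
    },
  };
}

// Instantiating with 0 and 1, then with "off" and "on".
const bits = alternating(0 as const, 1 as const);
const toggle = alternating("off" as const, "on" as const);

// The inferred types alternate at each step, matching the runtime values.
const firstFlip: "on" = toggle.next.value;
const secondFlip: "off" = toggle.next.next.value;
```

What the talk demonstrates goes a step further: the same kind of definition is written as strings inside a scope and still infers to an alternating type like this one.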
9. ArkType's Morphs and Transformation Handling
The last feature I want to show is morphs, which handle transformations in ArkType. They are built into the type system and allow granular handling of transformations at the property level. This avoids the function coloring problem and provides a visual representation of the transformations. It's a unique and useful feature.
So, the last feature that I just wanted to show off quickly actually does exist in alpha, but it's a really major part of what ArkType does, in addition to scopes, which I barely had a chance to go over. Unfortunately, 20 minutes is a very short amount of time and there's so much here, but of course I'm so excited to follow up with you all. Morphs are how ArkType handles transformations. What's really unique about them is they're actually built into the type system. So, instead of just at a high level, shallowly kind of marking everything as an effect where it can't be combined with types anymore, they're handled granularly, so that only at the particular property where this transformation occurs is it considered a morph that can't be combined with another morph. In general, it just avoids the function coloring problem: more broadly, the type can still be used as a type, intersecting with other types, union with other types, etc. And you get this really nice visual representation of exactly what kinds of transformations are going to occur. So, I think that's a really nice feature as well, and I hope you guys enjoy it.
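A minimal sketch of what "granular" means here: the transformation is attached to one property, while the other properties stay plain validators. This is a hypothetical model for illustration, not ArkType's morph API, and `Validator`, `MorphProperty`, `applyDefinition`, `morphedKeys` and `formDefinition` are invented names.

```typescript
type Validator = (value: unknown) => boolean;

// A morph validates its input and then transforms it.
interface MorphProperty {
  validateInput: Validator;
  transform: (value: unknown) => unknown;
}

type PropertyDefinition = Validator | MorphProperty;

type MorphResult =
  | { ok: true; data: Record<string, unknown> }
  | { ok: false; problems: string[] };

// Validate every property, transforming only the ones defined as morphs.
function applyDefinition(
  definition: Record<string, PropertyDefinition>,
  input: Record<string, unknown>,
): MorphResult {
  const data: Record<string, unknown> = {};
  const problems: string[] = [];
  for (const [key, property] of Object.entries(definition)) {
    const value = input[key];
    const validate = typeof property === "function" ? property : property.validateInput;
    if (!validate(value)) {
      problems.push(`${key} is invalid`);
      continue;
    }
    data[key] = typeof property === "function" ? value : property.transform(value);
  }
  return problems.length === 0 ? { ok: true, data } : { ok: false, problems };
}

// Report exactly which properties will be transformed.
function morphedKeys(definition: Record<string, PropertyDefinition>): string[] {
  return Object.keys(definition).filter((key) => typeof definition[key] !== "function");
}

// `name` is only validated; `age` is parsed from a numeric string to a number.
const formDefinition: Record<string, PropertyDefinition> = {
  name: (value) => typeof value === "string",
  age: {
    validateInput: (value) => typeof value === "string" && /^\d+$/.test(value),
    transform: (value) => Number(value),
  },
};
```

Because only `age` is marked as a morph, `name` remains an ordinary validated property that could still be compared or combined with other types in a fuller model.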