Parse, Don’t Validate

Rate this content
Bookmark
Most JavaScript applications use JSON.parse to create any object first, and then validate and narrow the data type to the expected one. This approach has performance and security problems, as even if the data is invalid, the whole JSON string needs to be parsed first before the data is validated, instead of failing at JSON parsing stage (e.g., if number is passed instead of string in some property).

Many languages support parsing of JSON strings directly into the expected types, but it is not natively supported in JavaScript or TypeScript.

In this talk we will show how the powers of TypeScript combined with the new specification JSON Type Definition (RFC 8927) and Ajv library can be used to parse your data directly into the expected application-defined type faster than JSON.parse, and also how to serialize data of a known type approximately 10 times faster than JSON.serialize.

 

FAQ

Simplex Chat is a unique messaging platform founded by Evgeniy, which is distinctive for not having user identifiers.

The primary challenges include security, reliability, and data validation, which are consistent issues across various projects.

JSON Schema is a way to define the format and structure of data, widely adopted since 2009. JTD (JSON Type Definition) is a simpler alternative started in 2020, supporting discriminated unions but with some limitations compared to JSON Schema.

JSON can be wasteful as it requires parsing the entire data piece before validation, it's time-consuming due to its complex, nested nature, and it has security issues like potential for call stack exhaustion and DDoS attacks.

Fastify enhances performance by efficiently handling serialization of responses, understanding the structure of the data it returns, which minimizes loops and accelerates property access.

AJV is a library used for data validation with JSON Schema and has significantly grown, with 350 million downloads monthly as of 2015. It helps manage data correctness and security in JavaScript applications.

For API development, JTD (JSON Type Definition) is recommended for most use cases because of its simplicity and alignment with data types, providing clearer and more efficient data handling.

Type alignment ensures that data parsing and serialization are directly linked to specific data types, which improves performance, security, and reliability by ensuring data is correctly typed and structured.

Evgeny Poberezkin
Evgeny Poberezkin
Jason Green
Jason Green
26 min
17 Apr, 2023

Comments

Sign in or register to post your comment.

Video Summary and Transcription

Hello. We're going to talk today about JavaScript and how to ensure data correctness. JSON can be wasteful and has security issues, but Fastify tackles these challenges. JTD is better than JSON Schema for most API use cases, as it has a more strict structure and avoids debugging issues. Jason demonstrates how to validate data with JTD and TypeScript, ensuring data validity and improving performance. The approach of parsing JSON directly to the application type and serializing a specific type improves security and reliability.

Available in Español: Analizar, no validar

1. Introduction to JavaScript and Data Correctness

Short description:

Hello. We're going to talk today about JavaScript and how to ensure data correctness. We've worked together on various projects, including MailOnline and Threads. I'll hand over to Jason to introduce himself. Jason is the director of technology at Threads Styling and has extensive experience with data validation using JSON Schema. We've encountered common problems in our projects, such as security, reliability, and data validation. We'll discuss an alternative approach to validation that involves parsing and carrying the proof of validity. JSON, while flexible, has its challenges and can be wasteful.

Hello. I'm Evgeniy and this is Jason. We're going to talk today about JavaScript and how ensure data correctness but before we'll give a brief introduction.

So, we've done lots of great things together. We worked together at MailOnline, at Threads, and when I did java library Jason also helped a lot. So, currently I founded a Simplex chat, which is a messaging platform that is the only one of a kind that doesn't have user identifiers, but this is not what the talk is about.

So, I'll hand over to Jason to introduce himself. Thanks, Evgeniy. Obviously, you know by now I'm Jason Green, I'm director of technology at Threads Styling. Threads is a fashion tech company pioneering the world of personalized luxury shopping through chat and social media. I also previously worked with Evgeniy as a principal engineer at the MailOnline. I've been a long time user of data validation with JSON Schema and in particular using AJV, which I've witnessed grow and mature so much over the years. I'm an early investor in simplex chat as well.

Yeah, AJV growth has been indeed crazy so it's got from really nothing in 2015 when it started and now it has 350 million downloads every month with every JavaScript application probably everyone using that. So why do we want to talk about what we talk about, right? In all these projects we've done and we've done some really cool things, right? We've done a content creator at MailOnline when we've been quite mature and nevertheless we've built a very complex in-browser application and with hundreds of thousands of lines of JavaScript code that allowed editing. The whole MailOnline website is managed by that. And then when I was using engineering threads and Jason also joined threads, we built StoryMaker. Mostly Jason built it, I'm just basking in the glory. So it was a content management system for Instagram, which we definitely learned a lot of things from the previous project. And in all of those projects we did, we have been invariably hitting the same problems of security, reliability, data validation, whatever project we do, the problems are invariably the same. So I've done a lot of Haskell, and this parse.don't.validate.maxim belongs to Alexis Kane, one of the best and most genius Haskell engineers out there, who proposes the approach to parsing as an alternative to validation. So rather than just check that your data is correct, you push the proof and carry the proof of validity around. So not just correct, not just check that your data is correct, but also obtain some proof as if it's in some different type and use it across your application. So I'm gonna hand over to Jason. And in this class with JavaScript, what we'll learn is that you should really not be using native JSON in JavaScript. You should be doing some other things. Jason, over to you.

So as we all know, JSON is a widely used format that's generally considered to be flexible and easy to work with. However, it's important to be aware of some of the potential problems and challenges that it has. JSON is particularly wasteful.

2. Challenges with JSON and Importance of Performance

Short description:

Passing JSON can be wasteful as you need to pass the entire data before checking its validity. JSON has security issues and can exhaust the call stack with deep structures or be used in DDoS attacks. Performance and reliability are important depending on the situation, especially when it affects user experience and satisfaction. Fastify is a library that tackles serialization by defining inputs and outputs in JSON Schema, increasing speed and improving data structure handling.

Now, it's not something you're gonna notice in your day-to-day debugging when you're working with it, but passing JSON can be a very wasteful process, as you need to pass the entire piece of data before you can understand or even begin to check if it's valid or not. Because of the potentially complex and nested nature of JSON, it can be particularly time-consuming to then go on and validate.

Many of us who started in JavaScript have come to love working with type script, but then you go and throw a big large blob of unstructured JSON into the mix, and suddenly, you're back to square one. None of your types matter, and everything is unknown again. It also has some security issues. These are issues that I actually wasn't very aware of despite working with it for a long time, until looking into it. If you have very, very deep structures, they can actually exhaust your call stack. This can be just because of the data itself, or it can be a deliberate attack with very deeply nested structures being sent to your APIs. You can also suffer from very large blobs of data being sent to your APIs in the form of a DDoS attack. Once again, before you can even understand if it's valid or not, your API will have to dutifully pass those blobs, which once again is very wasteful. It is even possible to do prototype pollution attacks via JSON as well.

So before we are concerned about performance and reliability, it's important to think about when performance and reliability is actually important. It does seem like an obvious statement. You know, most people wouldn't go out of their way to make an argument that it's not important, but it's not going to be important for every situation. It really depends on various factors. Obviously, a slow app is better than no app at all. So if you have an application that's delivering value, you may have much bigger issues that you need to face before worrying about performance and reliability. Particularly in the early stages of app development, you're going to be much more concerned with time to market. If your app isn't even available yet, that's obviously a big issue. You're going to be concerned about budget, the overall user experience of your application, and of course, what are your users' needs and what's most pressing to them. However, it is going to be an issue when the performance is affecting user experience and satisfaction. That can risk you losing users and those people who go away from your application or site because it didn't load fast enough, they may not come back, which is obviously what we refer to as high bounce rates.

Even worse, if reliability is your issue and your customers are losing their work or their data is becoming corrupted, that's a big issue that, in the best case, can result in some apologies. In the worst case, you may actually end up having to pay for it in some way through compensation or discounts to keep people happy. So there is actually a solution to part of this problem, which is tackled by a library called Fastify, which is a replacement for your Express router. It tackles the serialization part of the problem, which is to say that by defining the inputs and outputs and the shape of them in JSON Schema, this library is able to more quickly serialize the responses and it can get quite good increase in speed because it's focused on ... because it knows the structure of the data it's supposed to be returning. In this way, it can take a lot that would normally be loops and turn them into straight property access. So if you talk about schemas, for a long time JSON Schema was the only way to define the format of the data or the type of the data or whatever you call it. It started from 2009 and since 2020 there is an alternative specification that was created to address the shortcomings.

Check out more articles and videos

We constantly think of articles and videos that might spark Git people interest / skill us up or help building a stellar career

Scaling Up with Remix and Micro Frontends
Remix Conf Europe 2022Remix Conf Europe 2022
23 min
Scaling Up with Remix and Micro Frontends
Top Content
Do you have a large product built by many teams? Are you struggling to release often? Did your frontend turn into a massive unmaintainable monolith? If, like me, you’ve answered yes to any of those questions, this talk is for you! I’ll show you exactly how you can build a micro frontend architecture with Remix to solve those challenges.
Full Stack Components
Remix Conf Europe 2022Remix Conf Europe 2022
37 min
Full Stack Components
Top Content
Remix is a web framework that gives you the simple mental model of a Multi-Page App (MPA) but the power and capabilities of a Single-Page App (SPA). One of the big challenges of SPAs is network management resulting in a great deal of indirection and buggy code. This is especially noticeable in application state which Remix completely eliminates, but it's also an issue in individual components that communicate with a single-purpose backend endpoint (like a combobox search for example).
In this talk, Kent will demonstrate how Remix enables you to build complex UI components that are connected to a backend in the simplest and most powerful way you've ever seen. Leaving you time to chill with your family or whatever else you do for fun.
TypeScript and React: Secrets of a Happy Marriage
React Advanced Conference 2022React Advanced Conference 2022
21 min
TypeScript and React: Secrets of a Happy Marriage
Top Content
TypeScript and React are inseparable. What's the secret to their successful union? Quite a lot of surprisingly strange code. Learn why useRef always feels weird, how to wrangle generics in custom hooks, and how union types can transform your components.
Making JavaScript on WebAssembly Fast
JSNation Live 2021JSNation Live 2021
29 min
Making JavaScript on WebAssembly Fast
Top Content
JavaScript in the browser runs many times faster than it did two decades ago. And that happened because the browser vendors spent that time working on intensive performance optimizations in their JavaScript engines.Because of this optimization work, JavaScript is now running in many places besides the browser. But there are still some environments where the JS engines can’t apply those optimizations in the right way to make things fast.We’re working to solve this, beginning a whole new wave of JavaScript optimization work. We’re improving JavaScript performance for entirely different environments, where different rules apply. And this is possible because of WebAssembly. In this talk, I'll explain how this all works and what's coming next.
Debugging JS
React Summit 2023React Summit 2023
24 min
Debugging JS
Top Content
As developers, we spend much of our time debugging apps - often code we didn't even write. Sadly, few developers have ever been taught how to approach debugging - it's something most of us learn through painful experience.  The good news is you _can_ learn how to debug effectively, and there's several key techniques and tools you can use for debugging JS and React apps.
React's Most Useful Types
React Day Berlin 2023React Day Berlin 2023
21 min
React's Most Useful Types
Top Content
We don't think of React as shipping its own types. But React's types are a core part of the framework - overseen by the React team, and co-ordinated with React's major releases.In this live coding talk, we'll look at all the types you've been missing out on. How do you get the props type from a component? How do you know what ref a component takes? Should you use React.FC? And what's the deal with JSX.Element?You'll walk away with a bunch of exciting ideas to take to your React applications, and hopefully a new appreciation for the wonders of React and TypeScript working together.

Workshops on related topic

React, TypeScript, and TDD
React Advanced Conference 2021React Advanced Conference 2021
174 min
React, TypeScript, and TDD
Top Content
Featured WorkshopFree
Paul Everitt
Paul Everitt
ReactJS is wildly popular and thus wildly supported. TypeScript is increasingly popular, and thus increasingly supported.

The two together? Not as much. Given that they both change quickly, it's hard to find accurate learning materials.

React+TypeScript, with JetBrains IDEs? That three-part combination is the topic of this series. We'll show a little about a lot. Meaning, the key steps to getting productive, in the IDE, for React projects using TypeScript. Along the way we'll show test-driven development and emphasize tips-and-tricks in the IDE.
Best Practices and Advanced TypeScript Tips for React Developers
React Advanced Conference 2022React Advanced Conference 2022
148 min
Best Practices and Advanced TypeScript Tips for React Developers
Top Content
Featured Workshop
Maurice de Beijer
Maurice de Beijer
Are you a React developer trying to get the most benefits from TypeScript? Then this is the workshop for you.In this interactive workshop, we will start at the basics and examine the pros and cons of different ways you can declare React components using TypeScript. After that we will move to more advanced concepts where we will go beyond the strict setting of TypeScript. You will learn when to use types like any, unknown and never. We will explore the use of type predicates, guards and exhaustive checking. You will learn about the built-in mapped types as well as how to create your own new type map utilities. And we will start programming in the TypeScript type system using conditional types and type inferring.
Using CodeMirror to Build a JavaScript Editor with Linting and AutoComplete
React Day Berlin 2022React Day Berlin 2022
86 min
Using CodeMirror to Build a JavaScript Editor with Linting and AutoComplete
Top Content
WorkshopFree
Hussien Khayoon
Kahvi Patel
2 authors
Using a library might seem easy at first glance, but how do you choose the right library? How do you upgrade an existing one? And how do you wade through the documentation to find what you want?
In this workshop, we’ll discuss all these finer points while going through a general example of building a code editor using CodeMirror in React. All while sharing some of the nuances our team learned about using this library and some problems we encountered.
Deep TypeScript Tips & Tricks
Node Congress 2024Node Congress 2024
83 min
Deep TypeScript Tips & Tricks
Top Content
Workshop
Josh Goldberg
Josh Goldberg
TypeScript has a powerful type system with all sorts of fancy features for representing wild and wacky JavaScript states. But the syntax to do so isn't always straightforward, and the error messages aren't always precise in telling you what's wrong. Let's dive into how many of TypeScript's more powerful features really work, what kinds of real-world problems they solve, and how to wrestle the type system into submission so you can write truly excellent TypeScript code.
Testing Web Applications Using Cypress
TestJS Summit - January, 2021TestJS Summit - January, 2021
173 min
Testing Web Applications Using Cypress
WorkshopFree
Gleb Bahmutov
Gleb Bahmutov
This workshop will teach you the basics of writing useful end-to-end tests using Cypress Test Runner.
We will cover writing tests, covering every application feature, structuring tests, intercepting network requests, and setting up the backend data.
Anyone who knows JavaScript programming language and has NPM installed would be able to follow along.
Build a powerful DataGrid in few hours with Ag Grid
React Summit US 2023React Summit US 2023
96 min
Build a powerful DataGrid in few hours with Ag Grid
WorkshopFree
Mike Ryan
Mike Ryan
Does your React app need to efficiently display lots (and lots) of data in a grid? Do your users want to be able to search, sort, filter, and edit data? AG Grid is the best JavaScript grid in the world and is packed with features, highly performant, and extensible. In this workshop, you’ll learn how to get started with AG Grid, how we can enable sorting and filtering of data in the grid, cell rendering, and more. You will walk away from this free 3-hour workshop equipped with the knowledge for implementing AG Grid into your React application.
We all know that rolling our own grid solution is not easy, and let's be honest, is not something that we should be working on. We are focused on building a product and driving forward innovation. In this workshop, you'll see just how easy it is to get started with AG Grid.
Prerequisites: Basic React and JavaScript
Workshop level: Beginner