Static Analysis in JavaScript: What’s Easy and What’s Hard


We all use static analysis tools like ESLint every day to ensure the quality of our code. How do they work, and what is it about JavaScript that often makes writing a proper rule non-trivial?

23 min
05 Jun, 2023

Video Summary and Transcription

Static analysis in JavaScript involves analyzing source code without executing it, producing metrics, problems, or warnings. Data flow analysis aims to determine the values of data in a program. Rule implementation in JavaScript can be straightforward or require extensive consideration of various cases and parameters. JavaScript's dynamic nature and uncertainty make static analysis challenging, but it can greatly improve code quality.


1. Introduction to Static Analysis in JavaScript

Short description:

Hello, everybody. My name is Elena Vilchik and I'm going to talk to you about static analysis in JavaScript. I work at Sonar, writing analyzers for JavaScript and other languages. A static code analyzer is a program that analyzes source code without executing it, producing metrics, problems, or warnings. It is different from dynamic analysis, which executes code. There are different levels of static analysis, including text-based, token-based, syntax tree, and semantic analysis. Control flow analysis is a less commonly used model.

Hello, everybody. My name is Elena Vilchik and I'm going to talk to you about static analysis in JavaScript: what's easy and what's hard there. A bit about myself: I work at Sonar. We build a platform for continuous code quality and security detection. For more than eight years I've been writing analyzers for JavaScript and many other languages, and when it comes to clean code, some people call me a pain in the neck.

Before we jump into what's easy and what's hard, I first want to tell you what static code analysis is, as not everybody might be aware of it. A static code analyzer is a program which, as you might have guessed, takes the source code, the text files of the program. For some languages it also takes something else, such as precompiled files, for example bytecode for Java, to get semantic information produced by the compiler. And without actually executing the source code (it may imitate execution, but it never actually runs the code), it produces metrics, problems, warnings, findings, whatever you call them. You might also wonder: okay, static analysis, I get it, then what is dynamic analysis, and are those two competing approaches to the same thing? In fact, no. Dynamic analysis is something you use every day; it actually executes the code, and examples known to everybody are code coverage and unit tests. Both kinds of analysis are big friends and helpers to every developer in everyday life.

I'm going to go through the levels of static analysis, levels in terms of the models used for writing rules (a term familiar to everybody). The first level is text-based, where you just get the source file and try to infer something from it. This can be, for example, the number of lines in the file or the presence of a license header. To get those things, you don't need anything but the text of the source code. The next level is tokens. You split the text into words and know some metadata about those tokens, such as whether each one is a keyword or an identifier, and you can already write some rules on this level. For example, for a string literal token, you can tell whether it uses the right quotes, single or double, whatever you configured. The next level is the syntax tree, or abstract syntax tree, AST for short. This is the most commonly used level, where we represent the source code in tree format. Here is the example, a bit simplified of course for brevity, of the code we just had before. We have a function with the name foo and the parameter p, and it has a body. The if statement has a condition with an equality operator whose operands are p and true, and then a function call. For those who didn't know it, I really recommend having a look at the AST Explorer website, a great site which displays the AST representation of whatever source code you put there. It's even nice for investigating new language features, to see what the thing you just entered actually is, which kind of syntax of the language.
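A token-level check like the quote rule described above could be sketched as follows. This is a toy with a naive regex tokenizer for string literals only, not ESLint's actual quotes rule:

```javascript
// Toy token-level check: find string literal tokens and flag those
// that do not start with the configured quote character.
function checkQuotes(source, preferred = '"') {
  const findings = [];
  // naive tokenizer for string literals; ignores template literals
  const literal = /'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"/g;
  let match;
  while ((match = literal.exec(source)) !== null) {
    if (match[0][0] !== preferred) {
      findings.push({ index: match.index, text: match[0] });
    }
  }
  return findings;
}
```

Running it on `const a = 'x';` with double quotes configured yields one finding; a source already using double quotes yields none.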
Then there is the level of semantics. With semantics we're talking about variables, their declarations and usages. On this level, for example, you know that here is the parameter p: it is declared here and used here. Then there is the function foo, which is declared and referenced on the last line, and the variable p, which is not the same as the parameter p even though they have the same name: they are declared in different scopes. Scope is another notion of this level; here the variable is written and declared, and here it is read. And then we come to more advanced models, which are usually not as widely used as the previous ones. The most common of them is control flow analysis, which is, for example, present in the core of ESLint.
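The scope situation described above, in runnable form (a hedged illustration of what the semantic level tracks, not analyzer output):

```javascript
// Two declarations named p, in different scopes: semantic analysis
// tells them apart even though the text is identical.
function foo(p) {          // parameter p: declared and read inside foo only
  return p === true ? "yes" : "no";
}
const p = foo;             // a distinct variable p in the outer scope,
console.log(p(true));      // written above and read here; prints "yes"
```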

2. Understanding Data Flow Analysis

Short description:

On this level, we are aware of the order of execution of instructions, expressions, and statements. Data flow analysis aims to determine the values of data in a program. TypeScript compiler performs dataflow analysis to check variable types based on control flow. Each level builds upon the previous one, with control flow analysis being a prerequisite for dataflow analysis.

On this level we are aware of the order of execution of instructions, expressions and statements. In our example with the if statement, inside the foo function we have the condition p equals true, and depending on whether it's true, we're going to alert one thing or, if it's not, alert another; we can exit as well. The last model I wanted to talk about is data flow analysis.

On this level we want to know the values, or the maximum we can learn about the values, of the data in the program. Of course we cannot know everything, because nobody knows everything until the program is actually executed, but you can know something about the values. For example, in the then-block of the if p equals true, we know that p is actually the value true; in the else-block we know that it is not true. Outside of this if, we don't know anything about p's value. You can also think of the TypeScript compiler as a dataflow analysis, because that's what it does: it looks at the control flow of the program and checks, depending on the different statements and expressions, what the limits on the type of a variable are. Here the distinction between value and type is pretty fuzzy, because of the way TypeScript defines it. And as you might have noticed, every level is based on the previous one: to be able to build dataflow analysis, you necessarily need control flow analysis, and so on.
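As a runnable illustration of the facts described above (the function name is mine, not from the talk):

```javascript
// What a dataflow analysis can conclude at each point of the if/else.
function branchFacts(p) {
  let fact;
  if (p === true) {
    fact = "p is exactly true here";      // fact in the then-branch
  } else {
    fact = "p is anything but true here"; // fact in the else-branch
  }
  // after the merge point, the analysis again knows nothing extra about p
  return fact;
}
```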

3. Exploring Rule Implementation in JavaScript

Short description:

So what's easy about JavaScript is its dynamic nature and the abundance of weird behaviors, which provide plenty of ideas for rules, especially for newcomers. The implementation of most rules on the first levels, such as text, tokens, AST, and semantics, is straightforward. For example, the no-new-symbol rule from ESLint has a short and clear implementation. On the other hand, the no-unused-vars rule requires extensive consideration of various cases and parameters, demonstrating the Pareto principle in rule implementation. Tuning a rule with different options and special cases is crucial for achieving optimal results. This includes considering language features, development practices, and the use of frameworks and libraries.

So now I'll start with what's easy. What's easy, I would say, is that JavaScript is such a dynamic language, with so many weird behaviors, that it gives us plenty of ideas for rules, especially for people who are new to the language. Not many people might expect that x == null is true when x is undefined, that x === null is false in that same case, and that plus performs concatenation when there is at least one string operand. Another thing about easiness is that most rules are implemented on the first levels, on text, tokens, AST and semantics, and the implementation is pretty straightforward.
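The surprising behaviors mentioned above, as runnable checks:

```javascript
const x = undefined;
console.log(x == null);  // true: loose equality treats undefined like null
console.log(x === null); // false: strict equality does not
console.log(1 + "2");    // "12": one string operand turns + into concatenation
```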

I took the example of the no-new-symbol rule from ESLint. This rule reports every time you write new Symbol(), which produces a runtime exception, as you should call the Symbol built-in without new. You can see that the implementation is super short. First there is a block of metadata for the rule, with a description and a message. And then there is the actual implementation of the rule, which is just 20 lines or even less. From the global scope we get all variables named Symbol, because if we're using the built-in Symbol, it comes from the global scope. Then we check that definitions is empty, because otherwise it would mean the symbol was declared by the user, and we only care about the built-in Symbol. Then we iterate over all its references, and for each one, if it is a new expression, we report it to the user. So, yeah, that's all super clear and the way you would want it to look.
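The logic described above can be sketched roughly like this. The object shapes mirror ESLint's scope and AST objects, but this is a hand-written approximation, not the rule's actual source:

```javascript
// Approximation of no-new-symbol's core check: find the global variable
// named Symbol, make sure it is the built-in (no user definitions),
// and report every reference used as the callee of `new`.
function checkNoNewSymbol(globalScope, report) {
  const variable = globalScope.variables.find((v) => v.name === "Symbol");
  // defs.length > 0 means the user declared their own Symbol: not our target
  if (!variable || variable.defs.length > 0) return;
  for (const ref of variable.references) {
    const node = ref.identifier.parent;
    if (node && node.type === "NewExpression" && node.callee === ref.identifier) {
      report(node); // `new Symbol(...)` would throw at runtime
    }
  }
}
```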

Another example I wanted to show you is the no-unused-vars rule. If you think about how you would implement it, you might say: okay, I take each variable in the file, and when it has only one declaration and zero usages, it is an unused variable and can be removed. Now let's have a look at how the rule is actually implemented. You might see that the rule is pretty big: I'm scrolling and scrolling and scrolling, and I finally reach the bottom at 700 lines of code. So, as you might have guessed, this is not only about variables with just declarations and no usages. There are many things the developers of this rule had to consider. It has a lot of parameters, about rest siblings, about destructuring, caught errors, args, and about having some exceptions. And I believe there are many more cases they had to consider to arrive at a good implementation. That's where I will bring up the Pareto principle, which holds for many, many things in our lives. Here it is also true: for many rules, the basic intuitive implementation, which is super small in terms of work, brings you 80% of the result. But you need to spend 80% of your time tuning the rule, with different options and special cases, which produces the remaining 20% of the rule's results. Yet this is essential to make a good rule. And you might tune the rule for many reasons; I listed three. First, language features: ones that are old, or on the contrary future ones, not yet released as part of official ECMAScript, or just recent ones you didn't think about when you implemented the rule.
There are also development practices which are pretty common but which, for example, I would not use. I would not write an if with its statement on a single line; I would write two lines, and the one-statement-per-line rule would not report my code. But many people do write it on one line, and we need to exclude that case so as not to annoy them. Also, there are many frameworks and libraries that, when you use them, require you to do things that make rules complain.
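A few of the special cases mentioned above, in code. The option names in the comments come from ESLint's no-unused-vars documentation; the snippets themselves are my own examples:

```javascript
// Cases a naive "declared but never read" check must special-case:
const { dropped, ...rest } = { dropped: 1, kept: 2 }; // ignoreRestSiblings:
// `dropped` is "unused" on purpose; it strips a key out of `rest`

function handler(_event, payload) {  // args "after-used": unused params
  return payload;                    // before the last used one are skipped
}

try {
  JSON.parse("{");
} catch (err) {                      // caughtErrors: `err` may be
  /* intentionally ignored */        // deliberately unused
}
```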

4. Challenges in JavaScript Static Analysis

Short description:

Other rules, for example about function size, like the number of statements or lines, can be given exclusions to remove noise. The absence of types in JavaScript makes static analysis challenging. The array-callback-return rule is an example of a rule that is disabled by default. It detects misuse of the map function, but it reports on any usage, regardless of whether the object is actually array-like. This limitation of the rule's implementation led to its default disablement. Different strategies and heuristics can be considered for this rule, each with its trade-offs in terms of true positives and false positives.

For example, rules about function size, like the number of statements or lines, will report React functional components, as those act as a kind of container for other functions and will be pretty big. We need to exclude them to remove that noise.

Another thing which is hard in JavaScript static analysis is the absence of types. As an example I took the rule called array-callback-return. You might never have heard of it, because it's disabled by default in the recommended profile. When you have an array and you want to iterate over it, for example here where I'm just summing up all the values, you can use the map function, and it will work; many people do this, and I observe it again and again. In fact, this is a misuse of the map function, and other maintainers of the code might see it and say: okay, what are you mapping, and to what? You're not returning anything; you should return a value to be able to use map. If you don't want to map anything, just use forEach. And if you check how this rule is implemented, it is supposed to report only on arrays and array-like structures, but it will report on whatever is there, just by checking the name of the function. This is a pretty dumb implementation, I would say, and this is a keep-it-simple strategy we use often in our company: if it works, why not? But as it was not good enough for most users, the ESLint maintainers disabled the rule by default. This is very sad, as a rule with really great value has to be disabled because of this limitation. You can think of other strategies for this rule. You can check what the object is assigned: if it's a variable assigned an array literal, you report it. And that's where another thing comes in: you need to choose the best heuristic. Note that in the case they chose, reporting every map usage, the rule has a lot of true positives, so it reports all the cases you want, but it also reports many false positives; there is a pretty linear dependence here.
If, on the contrary, you report only on variables which are assigned an array literal, like in my example, you will report only a really small fraction of the true positives, but you will have basically zero false positives, because you are sure you will never report on anything but arrays.
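The map misuse and its fixes, as a runnable example:

```javascript
const values = [1, 2, 3];

let sum = 0;
values.map((v) => { sum += v; });         // misuse: the produced array is thrown away
let sum2 = 0;
values.forEach((v) => { sum2 += v; });    // iteration for side effects: use forEach
const doubled = values.map((v) => v * 2); // map is for building a new array
// both sums end up as 6; doubled is [2, 4, 6]
```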

5. Static Analysis: Challenges and Techniques

Short description:

JavaScript's dynamic nature and uncertainty make it challenging to detect errors. One example is the no-extra-arguments rule, which reports when a function is called with more arguments than it expects. Implementing this rule becomes more complex when dealing with exported or imported functions, callbacks, or functions with unknown parameters. Advanced techniques, such as dead store detection, require implementing live variable analysis and control flow graphs. Writing static analysis rules can be difficult, as evidenced by the nearly three million occurrences of eslint-disable on GitHub. Understanding static analysis and using static analysis tools can greatly improve code quality.

Another thing about JavaScript is that it's dynamic: you never know, you can never be sure about anything. An example is the no-extra-arguments rule from my company's plugin. This rule reports when you call a function with more arguments than it declares. When it is just a local function, this works perfectly, but that is probably not where you would make this mistake, because the function declaration is right next to the call. So I put an example where the function is exported, or imported from another module or file, or where you have a callback and you don't know how many parameters it has; that's where you can easily make this mistake, but the rule implementation is not able to know that. And at the end I wanted to talk about advanced techniques. When we were talking about models, we were talking about big things which you use for many rules, like the control flow graph or data flow analysis. Here I'm talking about literally a technique which is useful for basically one rule. The rule I took as an example is dead store detection. It is not really well known, I think, in the JavaScript community, but I believe it's a really cool one. This rule is supposed to detect, as I'll show in this example, when you write a value, for example here you assign x equals zero, and the next thing you do is assign x equals 10, which means the first assignment is never used: the zero is a dead store. In some cases you can just drop the line and say, okay, I don't actually need this assignment, but often it means something else, that you really got your algorithm wrong.
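The dead store from the example above, in two lines (my reconstruction of the slide):

```javascript
let x = 0;      // dead store: overwritten below before any read
x = 10;         // this is the value that is actually used
console.log(x); // prints 10; the initial 0 was never needed
```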
To implement dead store detection, you need to implement live variable analysis, which calculates the variables that are live at each point in the program; everything which is not live is dead. That's basically a copy-paste from Wikipedia: a variable is live at some point if it holds a value that may be needed in the future. So in this example we see that the zero will never be needed in the future, because it will be overwritten just after. To see how to detect a dead store, let's have a look at this small piece of code. I have here several assignments and reads, and I represented it as a control flow graph, with a branch on the if, then two branches doing different things, and then a merge. Here we don't care about the actual operations; to report on this rule, we just need to know about writes and reads of the variables. In this case we consider only the variable x. In order to find dead stores, we need to find those writes which have no path in the control flow graph that reads the value afterwards. In other words, if we can find a path in the graph which reads the value, it is not a dead store. If we take the first write, the one at the top, it has a flow to this read. If we take the write on the left, we see that it doesn't have any flow in the graph which reads it: it has only one flow, and that flow writes the variable again, so we can say that this is a dead store. And the next write here will be read by the next instruction in the same block. This is just an intuitive implementation of the rule for this particular case; in order to generalize it into a generic implementation, here is what you need to implement.
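The live-variable computation described above can be sketched as a small backward fixed-point iteration. This is a toy model with invented block and instruction shapes, not code from any real analyzer:

```javascript
// Toy live-variable analysis over a control flow graph. Each block is a
// list of { op: "write" | "read", v } instructions; succs[b] lists the
// successor block indices of block b. A write is a dead store if the
// variable is not live immediately after it.
function findDeadStores(blocks, succs) {
  const liveIn = blocks.map(() => new Set());
  let changed = true;
  while (changed) { // iterate to a fixed point (backward analysis)
    changed = false;
    for (let b = blocks.length - 1; b >= 0; b--) {
      // live-out of b is the union of live-in of its successors
      const live = new Set();
      for (const s of succs[b]) for (const v of liveIn[s]) live.add(v);
      // walk the block backwards: a write kills liveness, a read creates it
      for (let i = blocks[b].length - 1; i >= 0; i--) {
        const { op, v } = blocks[b][i];
        if (op === "write") live.delete(v);
        else live.add(v);
      }
      if (live.size !== liveIn[b].size || [...live].some((v) => !liveIn[b].has(v))) {
        liveIn[b] = live;
        changed = true;
      }
    }
  }
  // second pass: report every write whose variable is not live just after it
  const dead = [];
  for (let b = 0; b < blocks.length; b++) {
    const live = new Set();
    for (const s of succs[b]) for (const v of liveIn[s]) live.add(v);
    for (let i = blocks[b].length - 1; i >= 0; i--) {
      const { op, v } = blocks[b][i];
      if (op === "write") {
        if (!live.has(v)) dead.push([b, i]);
        live.delete(v);
      } else {
        live.add(v);
      }
    }
  }
  return dead; // list of [blockIndex, instructionIndex] dead stores
}
```

For the talk's branching example (one write before the if, a write on one branch, a read on the other), only the write on the branch with no subsequent read comes back as dead.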
This is a screenshot from Wikipedia, where you see that you need some formulas with sets. Every time I needed to implement dead store detection, it took me a whole day to load it into my head, and then I forgot it within a week. I wanted to finish with a screenshot from GitHub showing the number of eslint-disable comments in code: almost three million of them, and I believe many more in private code. Which means that, yeah, it's not so easy to write static analysis rules; people need to disable them because they are not happy with the results. And these are just the false positives you see here; we can never say how many false negatives there are, because nobody is able to see them. As a takeaway from my talk, I wanted to tell you: you now know how static analysis works and what an AST, an abstract syntax tree, is, so feel free to contribute and write custom rules if you need some. Those rules can be easy to write, so don't be scared, just do it. On the other hand, if you like challenges, if you like something hard to do, static analysis is also the place to be for you, so have a look at it, especially at the more advanced techniques. And, of course, use static analysis tools; they will really help your code quality. And that's it for me.
