Automated Performance Regression Testing with Reassure


As developers we love to dive into performance metrics and benchmarks and to compare one solution to another. Whether we enjoy it or not, we’re often required to fix performance issues in our React and React Native apps. But this process is not sustainable and is prone to regressions, especially as the app and team grow. What’s worse, those issues are often discovered by your users, making their experience miserable. In my talk I’ll introduce you to Reassure, a performance regression testing library for React and React Native, which happens to be a missing piece in our automated testing and performance suites, spotting problems before they hit production.



Hi, today I'm going to talk about performance monitoring and how to make it happen in your React and React Native codebases with Reassure. My name is Michał Pierzchała, I'm Head of Technology at Callstack, responsible for our R&D and open source efforts. I'm also a core contributor to a bunch of libraries, currently maintaining the React Native CLI and the React Native Testing Library.
Let's start with some inspiration, shall we? Anyone heard of entropy? Not really this one. The real-world entropy, described by physics like this. Or how Stephen Hawking framed it: you may see a cup of tea fall off a table and break into pieces on the floor, but you will never see the cup gather itself back together and jump back on the table. The increase of disorder, or entropy, is what distinguishes the past from the future, giving a direction to time. Or in other words, things will fall apart eventually when unattended. But let's not get too depressed or comfortable with things just turning into chaos, naturally. Because we can and do fight back against it. We can exert effort to create useful types of energy and order, resilient enough to withstand the unrelenting pull of entropy, by expending this energy.
When developing software, we kind of feel entropy is a thing. That's why we usually put in some extra effort and follow some kind of development cycle. For example, we start with adding a new feature. During development, we sprinkle it with a bunch of tests. When done, we send it to QA. QA approves it and promotes our code to a production release. And we're back to adding another feature. But that's quite a simplified version of what we usually do. Let's complicate it a little bit. Because among other things, we don't take into account that bugs may suddenly appear. Now, our circle becomes rather a graph, but that's okay, because we know what to do.
We need to identify the root cause, add a regression test so it never breaks again, send it to QA once again, ship it, and we're back to adding new features. So we're happy with our workflow. It works pretty well. We're adding feature after feature. Our app release process is so well designed that even adding 10 new developers doesn't slow us down.
And then we take a look at our app reviews to check what folks think. And a wild one-star review appears. And then another one comes in. And they just keep on coming. And we start to realize that our perfect workflow based on science, our experience, and best practices, which was supposed to prevent our app from falling apart, is not resilient to a particular kind of bug: performance regressions. Our codebase doesn't have the tools to fight these. We know how to fix the issues once spotted, but we have no way to spot them before they hit our users.
So how was it once again? Performance will fall apart eventually when unattended. So if I don't do anything to optimize my app while adding new code and just let time go by, it will certainly get slower. And we don't know when it will happen. Maybe tomorrow. Maybe in a week. Or in a year. If only there were an established way of catching at least some of the regressions early in the development process, before our users notice. Wait a minute. There is. If we start treating performance issues as bugs, we don't even need to break out of our development workflow. Regression tests run in a remote environment on every code change, so we just need to find a way to fit performance tests there. Right?
But before we go on a hunt for the best tool, let's take a step back and think about impact and what's worth testing. As with any test coverage, there's a healthy ratio that we strive for to provide us the best value for the lowest amount of effort. We want to make sure to target regressions which are most likely to hit our users.
And apparently we're developing a React Native app. By the way, did you know there's a font named Impact? You've probably seen it in hits like memes. Anyway, take a look at the typical performance issues Callstack developers are dealing with daily: slow lists and images, SVGs, React context misuse, re-renders, slow TTI (time to interactive), just to name a few. However, if we look at this list from the point of view of where each issue originates, we'll notice that the vast majority of these come from the JavaScript side. Now let's check the relative frequency. And what emerges is pretty telling. We estimate that most of the time our developers spend fixing performance issues, around 80%, goes to issues originating in the JavaScript realm, especially from React misuse. Only the rest is bridge communication overhead and native code, like image rendering or database operations, working inefficiently.
But I'm not a fan of reinventing the wheel, really. So I've done my googling for a React performance testing library, and I found this package. It looks promising. Let's see what's inside. It's not very popular, but that's okay. The last release was nine months ago. That's okay-ish. What else? It monkey-patches React. That's not okay. It uses React internals as well. Well, that's a bummer. It's not a good fit for our use case and doesn't really look like a solid foundation to build on.
But what do we actually need from such a library? Well, ideally, it should integrate with the existing ecosystem of libraries we're using. It should measure render times and counts reliably, have a CI runner, generate readable and parsable reports, provide helpful insights for code review, and, looking at the library we googled, have a stable design. And since there's nothing like this out there, we need a new library. And I'd like to introduce you to Reassure, a performance regression testing companion for React and React Native apps.
It's developed at Callstack in partnership with Entain, one of the world's largest sports betting and gaming groups. Reassure builds on top of your existing setup and sprinkles it with an unobtrusive performance measurement API. It's designed to be run in a remote server environment as a part of your continuous integration suite. To increase the stability of results and decrease flakiness, Reassure will run your tests once for the current branch and once more for the base branch. A delightful developer experience is at the core of our engineering design. That's why Reassure integrates with GitHub to enhance the code review process. Currently, we leverage Danger.js as our bot backend, but in the future, we'd like to prepare a plug-and-play GitHub Action.
Now, let's see what it does. Reassure runs Jest through Node with special flags to increase stability. The measure render function we provide runs the React Profiler to handle measurements reliably, allowing us to avoid monkey-patching React. After the first run is completed, we switch to the base branch and run the tests again. Once both test runs are completed, the tool compares the results and presents a summary showing statistically categorized results that you can act upon.
Let's go back to our example. Notice how we created a new file with the .perf-test.tsx extension that reuses our regular React Testing Library component test in a scenario function. The scenario is then used by the measurePerformance method from Reassure, which renders our Counter component, in this case 20 times. Under the hood, the React Profiler measures render counts and durations for us, which we then write down to the file system. And that's usually all you have to write. Copy and paste your existing tests, adjust, and enjoy.
Running benchmarks is not a piece of cake even in non-JS environments, but it's particularly tricky with Node.js. The key is embracing stability and avoiding flakiness.
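To make the measurement loop described above more concrete (run a scenario a fixed number of times, record a duration and a render count per run, then write the result down), here is a minimal TypeScript sketch. It is not Reassure's implementation or API: `MeasurementEntry`, `Scenario`, `measureScenario`, and `toResultLine` are hypothetical names, the scenario reports its own render count instead of relying on the React Profiler, and `Date.now` is only a coarse stand-in clock.

```typescript
// Hypothetical sketch: these names and shapes are illustrative, not Reassure's API.
interface MeasurementEntry {
  name: string;
  runs: number;
  durations: number[]; // one duration per run, in the clock's units
  counts: number[]; // one render count per run
}

// A scenario returns how many renders it triggered. In the talk, the React
// Profiler supplies that number; here the scenario reports it directly.
type Scenario = () => number;

function measureScenario(
  name: string,
  scenario: Scenario,
  runs: number = 10, // the talk mentions 10 runs as a good baseline
  now: () => number = Date.now, // injectable clock (assumption), eases testing
): MeasurementEntry {
  const durations: number[] = [];
  const counts: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = now();
    const renderCount = scenario();
    durations.push(now() - start);
    counts.push(renderCount);
  }
  return { name, runs, durations, counts };
}

// Serialize one entry as a single JSON line, ready to append to a results file.
function toResultLine(entry: MeasurementEntry): string {
  return JSON.stringify(entry);
}

// Usage: a stand-in scenario that always "renders" twice, measured 20 times.
const entry = measureScenario("Counter", () => 2, 20);
console.log(toResultLine(entry));
```

Keeping the raw per-run values, rather than only an average, is what lets a later comparison step discard outliers and reason about variance.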
Operating in a JavaScript VM, we need to take JIT, garbage collection, and module resolution caching into account. And there's the cost of the concurrency that our test runner embraces to speed up execution. We need to pick what to average and what to percentile, and a lot more. Take statistical analysis, for example. To make sure our measurement results make sense mathematically, running them once or twice is not enough. Taking other things into account, we've figured 10 times is a good baseline. Then, to determine the probability of the result being statistically significant, we need to calculate the z-score, which needs the mean value (or average), divergence, and standard deviation. This gave me flashbacks from college, so I'm not going to dive any deeper here.
Now, module resolution caching is something that's great for Node.js apps, but it bit us when developing the library. As it turned out, repeated executions of the same component often included runs that were as much as 10 times slower. As you can imagine, averaging that would make the results unreliable. So we drop the slowest run, as it's most likely caused by the lack of cache.
Having all of that data, we can present the render duration times as statistically significant or meaningless. Apart from render times, another useful metric that may easily degrade is render counts, which we get for free from the React Profiler. All of this information is stored in a JSON format for further analysis and in Markdown for readability. We use the Markdown output as a source for the GitHub commenting bot powered by Danger.js. This is by far our favorite and recommended usage of Reassure, as it enriches the code review process while allowing us to alleviate the instability of the CI we're using.
Let me share what we've learned so far when using this library. You need to cover the most important user scenarios. Even if you don't have performance issues now, you'll spot them if they appear. Test whole screens or even screen sequences.
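Returning to the statistics mentioned earlier (dropping the slowest run, then using the mean, standard deviation, and z-score to decide whether a change in render duration is significant), the idea can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not Reassure's actual code: the helper names are hypothetical, the z-score is computed as a standard two-sample z-test on the difference of means, and the 1.96 threshold (roughly 95% confidence, two-sided) is an assumed default.

```typescript
// Hypothetical helper names; a sketch of the statistics, not Reassure's code.

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Sample standard deviation (divides by n - 1).
function standardDeviation(values: number[]): number {
  const m = mean(values);
  const variance =
    values.reduce((sum, v) => sum + (v - m) ** 2, 0) / (values.length - 1);
  return Math.sqrt(variance);
}

// Drop the single slowest run, the one most likely to have paid for a cold cache.
function dropSlowest(durations: number[]): number[] {
  const slowestIndex = durations.indexOf(Math.max(...durations));
  return durations.filter((_, i) => i !== slowestIndex);
}

// Two-sample z-score for the difference between current and baseline means.
function zScore(baseline: number[], current: number[]): number {
  const standardError = Math.sqrt(
    standardDeviation(baseline) ** 2 / baseline.length +
      standardDeviation(current) ** 2 / current.length,
  );
  return (mean(current) - mean(baseline)) / standardError;
}

// Assumption: |z| above the threshold counts as statistically significant.
// Identical, zero-variance inputs yield NaN, which compares as not significant.
function isSignificant(
  baseline: number[],
  current: number[],
  threshold: number = 1.96,
): boolean {
  const z = zScore(dropSlowest(baseline), dropSlowest(current));
  return Math.abs(z) > threshold;
}

// Usage: each sample contains one cold-cache outlier that gets dropped first.
const baseline = [10, 11, 9, 10, 100];
const current = [20, 21, 19, 20, 200];
console.log(isSignificant(baseline, current)); // true
```

Dropping the outlier before averaging matters here: with the 100 ms and 200 ms runs left in, the standard deviations would balloon and the same doubling of render time would no longer register as significant.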
Component-level tests are possible, but often require more test runs. You can reuse your React Native Testing Library and React Testing Library tests if you have them. And all of the established Testing Library practices apply. So let your tests resemble user behavior and avoid mocking anything other than I/O. Due to their qualities, it seems that front-end performance tests resemble end-to-end tests for our apps. It makes sense to treat them as such in our testing trophies or pyramids. Remember that performance is not a goal. It's a path. Walk it with confidence.
Heading towards the end: Reassure wouldn't exist if it wasn't for my fellow colleagues, Maciej Jastrzębski, who is the brain behind the library, and an awesome dev team of Tomasz and Jakub. Make sure to follow their work. Reassure is open source. The QR code will redirect you to the repo. Give it a star if you like it, and let me know if you have any issues adopting it. I would be happy to help. And that's all, folks. You can find me on Twitter or GitHub under this handle. Have a great conference and thank you.
16 min
24 Oct, 2022
