I See What is Going on: Visual Testing for Your Components


Many modern web frameworks use components as their building blocks.

In this talk, I will show component testing using Cypress - and I mean "show" literally. You will see the component running inside a real browser, you will interact with the component like a real user, and you can debug the component using the real browser's DevTools. Finally, I will go through the current open source and commercial options for visually testing the component's styles and rendering.


Hi everyone. Thanks for inviting me. I'm Gleb Bahmutov, VP of Engineering at Cypress.io, and I'm going to tell you how to visually test your React components. The title of my talk is "I See What is Going on", and through my presentation, you will see what's going on. Before we start, just a quick slide. Our planet is still in imminent danger despite COVID. If we don't change our climate policy, we're going to go extinct. So the time to act is yesterday, really. You can change your life, but you should also join an organization, because we cannot do it alone. We have to work together.
Okay, so let's take a React application. In this case, it's a Sudoku. I really like this app because it looks nice, it plays nice, and it has a very good style. It even has responsive styling, so you can see how it changes from desktop to tablet to phone. This is a React application, and under the hood it's built from React components. If we look at the source code, we can see individual files. Even by name, you can kind of tell that, for example, the footer is the footer component, the header is the header component, and so on. If we have React DevTools installed in our browser, we can hover over the list of components on the left and see where each component is presented in the DOM on the right. We can look at the source code. It's a typical React application. The index.js file has an app component that it renders. The app component imports the game component and surrounds it with a context provider where all the data is stored. The app component also imports the application CSS file with all the styles. The game component is where most of the logic is contained. It imports all the other components, like header, game section, and status, and initializes the game. Then it renders the header, the game section, status, and footer, and it passes different handlers as props to those child components.
This is a very standard React application architecture. So let's look at individual components. A component takes some inputs, produces DOM, and reacts to user events. Let's look at what a component expects. I'm going to talk about the Numbers component. It shows the digits from one to nine that I can enter into the Sudoku grid. You can see the component in the bottom right corner. You can highlight a specific number, because that's the number you're about to enter. The highlighted number comes from the context, so you don't have to pass it through all the components. Instead, the Numbers component just grabs it from the Sudoku context and uses it to highlight a number. But the Numbers component also accepts props. In this case, the parent component passes an onClickNumber property. Whenever a user clicks a number, that property is invoked with the number the user selected. So we can really think of our Numbers component as a unit of code to which we feed different properties, different context values, and user events like clicks. In response, the Numbers component generates a different DOM and outputs property calls. This view of components as units of code is not limited to this presentation. I've laid out my philosophy of components as units in other talks as well, which you can check out.
So let's write component tests. I'm going to install Cypress, which is an open source, free to use, MIT-licensed test runner. I'm also going to install cypress-react-unit-test, which is an adapter for Cypress that allows you to mount React components directly. I'm going to add cypress-react-unit-test to the Cypress support file and to the plugins file. This will allow Cypress to bundle specs the same way your application bundles its code.
Because this is still an experimental feature, I have to set experimentalComponentTesting to true in the Cypress config file and tell Cypress that my component tests live in the source folder, next to the source files. So here's a Numbers spec that will test the Numbers component. I import the mount function from cypress-react-unit-test, and I import my Numbers component from the Numbers file. And then here's where the magic happens. I mount my Numbers component by calling mount with it. After it's mounted, which is an asynchronous command, I take each digit from one to nine and write a cy.contains command, because I can use any Cypress command to interact with my application. And it is a real, full application. The Numbers component is mounted, scaffolded, and runs as a mini web application inside Cypress, as you can see in this screenshot. But the numbers don't look right. They look nothing like the component in the real application, and that's because we don't have styles. We only mounted Numbers. In our application, the app CSS is imported by the app component. Because we don't have the app component and are working with just Numbers right now, we import the app CSS ourselves from the spec file, and it gets included in the scaffolded app. Now our Numbers component renders more accurately, but not perfectly yet, because as you can see, it's all spread out. That is because our component and our styles assume a certain DOM structure. So in order to render the Numbers component accurately, we have to surround it with a div with a container class and a section with class status. In this case, the mounted markup is exactly what our CSS expects. And now I can see those numbers on the screen inside a real browser, the way they look in the real application. Great. What about all the props and clicks?
When I mount Numbers from my test, I can pass an onClickNumber property, and I can create a stub so that whenever Numbers is interacted with, like right here in the test, I actually get the click back. In the Cypress command log, I can see that those stubs were actually invoked on user click. Excellent. The last piece of input to my component is the context provider, where data like the selected number is fetched from. So in this case, when mounting, I surround the Numbers component with the Sudoku context provider and set the selected number to four. Then I can see that my number in the DOM actually has the class I expect it to have. So this test confirms that the component is working as expected.
But now comes the crux of my talk. What if I change the CSS, or selectors, or the DOM structure, or the layout parameters, just a little bit? My application will probably still work, because I didn't break the logic. But does it look right? For the Numbers component, it's easy to say: it's just numbers, the selected number should be blue, and they should be in a grid. But what about bigger components? They have a lot of nice, unique styles. And if I interact with them and set different context properties, they look different because of the modes that are set. Will changing CSS for one component suddenly affect some other component in another part of the game? And what about the entire application? It looks really, really nice on desktop. Does it still look as nice on desktop after a change? Does it still look good on tablet? And does it still look good on mobile? Do you manually go through your application every time you change a little bit of CSS? You probably cannot. So the trick here is to understand that you only have to do it manually, as a human being, once.
When you work on an application and design with CSS, you look at the application in your browser and say, yes, this is what I want. Computers cannot do that automatically. So instead, when you're happy with the result, you save a screenshot of your application and say, this is what I want my application to look like. Computers, on the other hand, are really good at another task. Instead of saying this looks right, they are very good at saying this looks exactly the same, pixel by pixel, as it looked before. So we substitute the question "does this look good or correct?" with "does it look the same?" And this is what computers can do.
So in our component test, I install one more free and open source plugin called cypress-image-snapshot. I add this plugin to my support file and to my plugins file. Take the test we had before, where we set one number as selected and confirm that the number in the DOM has the selected class. After that, we do one more thing. We now have the matchImageSnapshot command that comes from this plugin. The first time you run the test, this command saves that section of the rendered DOM into an image. It saves it in a Cypress snapshots folder named after the test. You should commit this image file into your source repository, right next to the source file, because this image tells you how this component should look from now on.
Now let's make a change. Let's say someone goes into a CSS file and changes the padding from 12 pixels to 10 pixels. How will this affect all our components? Well, you run the test, and the matchImageSnapshot command now generates an image that looks slightly different. It saves the difference into a diff output folder as an image of its own. This image has three columns. On the left, you see the baseline image. That's the one you looked at before and stored in your source repository.
On the right, you see the current image. And in the middle, you see the pixel by pixel difference, with red marking pixels that are very different and yellow marking pixels that are slightly different. Now you can tell, as a human being, what has changed visually. And you can say, yes, this is what I wanted, or, oh no, I don't want this.
It's easy to say, just compare all components pixel by pixel and that's the end of the story. Unfortunately, it's not. You have to generate precisely the same image snapshot every time you run the test. In our game, we have a timer component. As you can see, it just counts the seconds from the start of the game. How do we take a snapshot of a timer? We can probably take a snapshot of the timer at 00:00, right? Our snapshot command might be fast enough to capture it as soon as the timer displays 00:00. But what if you want to wait for 10 minutes? What if you take a snapshot of the timer and sometimes it lands on one second and sometimes on the next? What if we catch the transition? We'll create an image with different pixels, and it'll be a very flaky test. So what do we have to do to generate precisely the same image, pixel by pixel? We have to control the data. In this case, we have to stop the clock. Cypress includes this command in its API. It's called cy.clock. Once we freeze the clock, it freezes all the intervals, all the timers, everything. Then we can mount the application, confirm that the status time displays 00:00, and take a snapshot. And this is the image we get. Then, in addition to cy.clock, which freezes the clock, we can fast forward the clock by, in this case, 700 seconds. Once we fast forward, it still stays frozen. So we fast forward the clock instantly, the application updates itself because it thinks 700 seconds have passed, and then we can take an accurate snapshot from constant data, and it generates the same timer snapshot every time.
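Why a frozen clock makes the timer snapshot reproducible can be shown with a small plain-JavaScript sketch. This is not how cy.clock is implemented. The names createFakeClock, tick, and elapsedSeconds are hypothetical, and the seconds-counting logic is only an assumption about how a game timer might work.

```javascript
// Hypothetical fake clock: time only moves when the test calls tick().
function createFakeClock(startMs = 0) {
  let nowMs = startMs;
  return {
    now: () => nowMs,
    tick: (ms) => {
      nowMs += ms;
    },
  };
}

// Assumed timer logic: whole seconds elapsed since the game started.
function elapsedSeconds(clock, startedAtMs) {
  return Math.floor((clock.now() - startedAtMs) / 1000);
}

const clock = createFakeClock();
const startedAt = clock.now();

// With the clock frozen, the reading is 0 no matter how slow the test run is.
const atStart = elapsedSeconds(clock, startedAt);

// Fast forward 700 seconds instantly; the clock stays frozen afterwards.
clock.tick(700 * 1000);
const afterTick = elapsedSeconds(clock, startedAt);

console.log(atStart, afterTick);
```

Because the value the timer renders depends only on explicit tick calls, a screenshot taken after each step is built from the same data on every run.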
But that's not everything, right? We've looked at small components. Why not take a snapshot of the entire game board? After all, our game is a tree of components, and we have already written tests for the small components. Why not write a test for the app component? With a single snapshot, you would confirm that the entire board, with all data, all components, and all UI elements, looks the same. But it's not that easy, because every time you click on new game, it creates a new game by definition. It generates a different board, and you cannot compare the two boards. They will always have pixel differences because of the different numbers. So how do we mock the board generation? Well, we have to look at our source code. In the game.js file, we can see that getUniqueSudoku is imported from another module and then used to generate the initial array of numbers and the solved array of numbers. So I went to DevTools, and in one iteration of the game, I grabbed those variables and saved them as two JSON files in the Cypress fixtures folder. Then inside my test, I import that module with a wildcard as an object. On that object, I can use cy.stub, the same approach I used to stub click handlers. I tell it to stub the getUniqueSudoku method on that ES6 module and always return the same arrays. Now freeze the clock and take the snapshot. From now on, every time I run the game, it generates exactly the same board. I can build on that. If I have the same board, I can play a move, confirm it was made, and then take another snapshot, and now there's a snapshot of a board with a single move. Even better, Cypress has a time-traveling debugger. So when working with a component test, I can hover over each command and see what happened during the test. What did I click? How did the board change? How does it look now? I can see everything that's going on during my React component test.
Now, I talked about component tests.
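The board-mocking step described in this part of the talk can be illustrated in plain JavaScript. This is a rough sketch of the idea, not Cypress's cy.stub. The sudokuModule object stands in for the wildcard-imported module, stubReturn is a hypothetical helper, the return shape of getUniqueSudoku (a pair of arrays) is an assumption, and the short arrays stand in for the two saved JSON fixture files.

```javascript
// Stand-in for the module object that exports getUniqueSudoku.
const sudokuModule = {
  getUniqueSudoku() {
    // Assumed behavior: a different random board on every call.
    const cells = Array.from({ length: 81 }, () =>
      String(Math.floor(Math.random() * 10))
    );
    return [cells, cells];
  },
};

// Hypothetical helper: make a method always return the same value and
// hand back a function that restores the original implementation.
function stubReturn(object, method, value) {
  const original = object[method];
  object[method] = () => value;
  return () => {
    object[method] = original;
  };
}

// Hypothetical fixture data standing in for the saved JSON files.
const initArray = ['0', '0', '7', '4'];
const solvedArray = ['6', '2', '7', '4'];

const restore = stubReturn(sudokuModule, 'getUniqueSudoku', [
  initArray,
  solvedArray,
]);

const firstGame = sudokuModule.getUniqueSudoku();
const secondGame = sudokuModule.getUniqueSudoku();
console.log(firstGame[0] === secondGame[0]); // true: every "new game" gets the identical board

restore();
```

With the generator pinned to fixed data, the only remaining sources of change in a full-board screenshot are the ones the test controls, such as the clock and the moves it plays.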
I showed how to set up visual snapshot tests, and I talked about how to control your data so you get the same images, pixel by pixel. Now let's talk about how it works locally and on CI. So, the first problem. If I run Cypress in interactive mode, see the results, and look at the screenshots I saved, I can see that their resolution is twice as large as when I run Cypress in headless mode on a Mac, because of pixel density. So the first trick when I work with snapshots locally is that I disable them. I don't take them in interactive mode, because their resolution would be twice as large as in headless mode, even on the same Mac. Instead, I skip them. I can see where I skipped, and every time I want to add a new screenshot image, I just run cypress run headlessly. If I have an updated snapshot and I really want to save a new image, I run Cypress headlessly and set an environment variable that tells the plugin to update the snapshots and not fail on differences. Good. This is what I do locally, right? But then I push my code and my snapshot images to a continuous integration server, and guess what? There is a pixel by pixel difference, even for the timer. On the left, you can see the output of a headless screenshot on Mac. On the right, you see the output of a headless screenshot on Linux. In the middle, there are slight differences in font rendering, in rasterization, in anti-aliasing. I cannot recreate the same content pixel by pixel on Mac, Windows, and Linux. So the trick here is to use exactly the same environment, exactly the same operating system with exactly the same libraries, fonts, and browser versions, everything, locally and on CI. So in my package.json, instead of just saying yarn cypress run, I set up a command that runs Cypress tests using Docker. We have a Docker image called cypress/included, so I don't have to install anything.
So every time I want to update screenshots or add new ones, I run that command, which starts the Docker container and runs everything. When I run things on CI, I run them in a container that matches exactly that image. cypress/included is just built on top of cypress/browsers, so I know that all operating system dependencies, all fonts, everything is the same. The rendering should be the same. On pull requests, I don't fail on image differences. I just let all the images be generated, and then I use a little script to post a GitHub status check on the snapshots. In this case, there was one visual difference. And once updated, no visual snapshot diffs. So everything is good.
So in summary: cypress-react-unit-test is great for component testing, and you get visual snapshots with cypress-image-snapshot. We talked about mocking the data in the flow. And to be honest, I love cypress-image-snapshot, but it takes a lot of effort to get everything working. So if you can, consider a third-party paid service. Thank you very much. You can find the slides online, and you can find the example in a repository. Thank you.
Thank you, Gleb. I'm going to mop my head because my brain just exploded. What a great talk. Would you mind coming up and answering a few questions live, please?
Absolutely, Bruce. Thanks for watching, everyone.
It was amazing. Thank you. We have a few questions for you. We like to call this the five minutes ruthless interrogation. If I do make you cry, just tell me to stop. First question, which we have from Dennis, or Dennis 1975, with similar questions from other people. In which situations would you use Cypress over another component rendering package like Storybook, for example?
So we love Storybook at Cypress. We use it ourselves. It allows you to mount a React component and design it, feed it different inputs, maybe take a screenshot and compare it with how it looked before. With Cypress, we always want to say: mount a component and then click on a button like a user would.
Interact, see what the component does. Maybe it makes a network request. So if you want to interact with a component, you probably want to give cypress-react-unit-test a try, because your component is mounted and becomes a mini web application. That's why you would use it.
Another question. Will we be able to use real WebGL with Cypress?
So that's not what Cypress controls. When generating a screenshot, we believe it should include the WebGL part. But I haven't experimented with it, so I cannot confirm. After you generate an image, it doesn't matter what generated the image. Was it the DOM? Was it a canvas? Was it WebGL? At that point, it's just pixels, and after that, you can compare them. I think the trouble with WebGL is that you don't know when the rendering has stopped, so you don't know when to take a screenshot. So when you render a WebGL scene, you have to maybe set a property or expose some kind of observable attribute, so Cypress knows to wait for it and then take a screenshot. If you can do that, then I think the image comparison should work just fine.
Thank you. A question from somebody called Metin. How pleased was your team when Microsoft announced that it was moving to the Chromium engine?
We were very happy. The new Microsoft Edge browser is built on Chromium, which means Cypress can open that browser and control it using the same approach, the same flags, the same hooks that we use to control regular Chrome and Electron. So that really allows us to say we support multiple browsers. And the fun part is that we did not do it ourselves. It was a user pull request that implemented it. We just polished it up. It was a complete user contribution to an open source project. So we were really blown away by this particular contribution, and props to the Microsoft team for moving forward with the browser.
Props to the Microsoft team and props to the user contribution, the beauty of open source there.
Another question is, what can we do to avoid flakiness, and how can we detect it automatically?
Excellent question. In Cypress, we fight really hard to make individual commands as flake-free as possible. Cypress has a built-in retry mechanism that will wait for a button to be enabled and visible before clicking. So we take care of that. But there are other sources of flake. For example, when you test the full application, which makes requests to the server, gets responses back, and refreshes the UI, there are so many things that can temporarily go down and come back up. Maybe the server was busy and did not respond. Maybe the network went down. Maybe the browser had a hiccup and did not render. In that case, out of 100 big end-to-end tests, you might have one that fails occasionally. You rerun it and it passes. So you don't want to block the pipeline, and yet you have to do something. We are doing two things to solve it. First, in our dashboard, we now have analytics that show you all the historical information about each test, so you can decide: is this test reliably failing? Should it be failing? Is it flaky? Or should I just ignore it? We're also adding a complementary feature, test retries. It's already available in Cypress as a plugin: you can designate a specific test and say, if it fails, rerun it up to, let's say, two times. Or you can enable it globally. We are adding it as a core feature, so just watch our release notes. It's coming soon. You'll be able to control it globally or per test. I think that will solve all this outside-of-your-control flakiness that is so frustrating.
I'll unmute myself. Thank you. And now we've tempted you into talking about what may be coming up in the future. A question from Iraldea. No, a question from Davulca. What roadmap do you have for the next releases?
So we have lots of features coming up. We have a link in our documentation to the roadmap.
The big one is solidifying what we released behind experimental flags. We already have an experimental flag for shadow DOM support, which was a huge feature. We have a new experimental flag coming out on Monday with a window.fetch polyfill. This is a temporary measure before we release full network stubbing, which will allow you to do anything you want from your test: stub static resources, control how you stub GraphQL, all those things. So look at our documentation. We have a roadmap, we keep it up to date, and you'll find out about all the features that are coming up.
Up-to-date documentation. These are words that I love to hear but seldom find. Thank you, Gleb. Last question. As you can tell, of the four MCs, four of us are avid cooks. What is your favorite comfort food if, for example, a build breaks or GitHub goes down?
Does beer count as food? Because I love beer.
Does beer count as food? I'm an Englishman. Beer is food. Absolutely. Totally. I mentioned Iraldea had a question. It is a nitty-gritty question. They say: we found that a Docker run can be fairly slow on a commit hook. Is there any way to get confidence on a per-commit basis rather than just per PR?
Look at our documentation. You can define which tests you want to run. It is probably more involved than just saying run all the tests, so it's up to you what gives you confidence. Cypress allows you to specify which test files to run using the spec parameter. So maybe you want to run just a few sanity tests on every commit, and only if they pass, run the full set of tests. I think that will speed up your testing and yet give you full confidence. You probably should run the full set of tests on pull requests and on the master branch, or main branch, as we renamed our branches. But for smaller commits, you probably want to run at least a small subset of tests.
Nice. Another question about the future. Will there be support for React Native in the future?
We don't have any concrete plans. As you can see, even React web application support is still an experimental feature. You will probably be able to run a lot of tests while the component is rendered in the browser. Once it's rendered in a mobile application, we don't have any plans. It's a very different platform. So we have no plans to dominate mobile testing yet.
Gotcha. Ten people upvoted this question. How do you see the difference between Cypress and Puppeteer plus Jest?
Well, we would love for everyone to write more Puppeteer tests, more Jest tests, and more Cypress tests. We are really not competing to take away Puppeteer's market share. We think that 90% of developers are not writing any end-to-end tests. So Puppeteer can take 45%, and we'll take 45%. I think the difference is that with Cypress, you have a built-in tool specifically targeted at end-to-end testing, with a time-traveling debugger, cross-browser support, and all the nice assertions built in. You don't have to install anything else. We already installed everything, configured it, and tested it up the wazoo. And we have all the tools and all the niceties for running it on CI, for observing the results, for debugging failed tests. With Puppeteer and Jest, you do have some parts of it, but it is not a single system. It becomes a combination of two disparate systems that you have to maintain.
Sorry, I was muted. Lastly, this question has had 10 upvotes as well. Somebody asked why you use jQuery. I'll give a bit of background to this. Recently I was on a project that was using jQuery, and I loved it. It was lovely to use a little library that did what I wanted rather than make me do what it wanted. I know that jQuery isn't necessarily the cool thing, but I think the HTTP Archive's Web Almanac showed that it's on 70% of websites or something like that. So why are you still using it?
Well, there is nothing wrong with jQuery.
It allows you to find and work with elements on a page in a nice, battle-tested, reliable manner. We can argue about every other technology, but for finding an element on a page, jQuery is perfect. Now you might say, isn't it heavy, or doesn't it add overhead? Well, Cypress bundles so many things in order to control the browser and run the tests. A single copy of jQuery in a desktop download that you do just once will not add any noticeable weight. It's a nice library, and if we are not happy with some parts, we rewrite them, but the API stays the same. So jQuery is just a great tool, and we'll keep using it.
Yeah, I mean, like you said, it's been battle-tested for as long as... It's been going since I didn't have white hair. It runs on all browsers. And what I love about it is that it does what it's told. It doesn't demand that I architect in a certain way or change the way I work. Gleb, it's been fabulous to talk to you. I'm going to wave my magic whisk and transport you back to the speaker's room, because I know that there's going to be a Zoom call where paid ticket holders will be able to interrogate you further. Hopefully I haven't given you too much of a horrible time. Thank you very much for your knowledge and for answering our questions.
Thank you, Bruce. Thanks, everyone.
35 min
18 Jun, 2021
