1. Learning from the Past
Thank you. Thank you very much. The answer to how I manage that with four kids is magic, and that magic is my wife.
All right. So, the topic is fighting test flakiness with time machines. I have a question for you right at the beginning. Would you like to travel back in time? Raise your hands. Who would like to travel back in time? All right. What would you like to do if you could travel back in time? Change the project that you're working on? That's a very software development answer. Who else? Anyone else who raised their hand? You can shout it out. Buy some Bitcoins? Buy some Bitcoins, all right. Those are great answers.
For me, actually, I would go back in my own life and apply the knowledge that I have today. So yeah, maybe buy some Bitcoins, or, you know, like when I'm arguing with my brother and only come up with a clever comeback three months later. I'd want to go back and use it. The point that I'm trying to make is that we learn from the past. We are now at TestJS Summit and you might ask, what does this time traveling have to do with anything? It's that we learn from the past. If we can time travel and apply the knowledge we have today, we can do better. A lot of testing is actually the process of learning from the past. For example, if you write a bug report, you document the things that happened, you look into the past, so you can try to fix them. With test automation, we use all kinds of different data and traces of the test run. We, again, look into the past and apply that knowledge. If you are doing SRE, you have probably conducted a post-mortem meeting after an incident. And all of that is basically learning from the past.
All right, so now the question: how do we learn from the past? If you think about it, we only have a couple of options. We can use our own memories and try to recall what has happened. We can talk to someone who witnessed something of interest. We can try again, trying to recreate the past.
2. Using Memory and Communication
Or we can use a time machine. And we do actually have time machines and I'm going to be talking about them. And all of these ways of traveling into the past are good, but they have some flaws.
So let's go to the first one: using your memory. If you're writing a bug report, for example. I recently asked on LinkedIn how people write bug reports, and these are the different answers that came in about what a bug report should contain. And it's a lot, and I haven't even included everything: descriptions, steps to reproduce, screenshots, videos, desired behavior, et cetera. The biggest problem is that even with all of this information, it might still not be enough.
Which leads me to the other thing: talking to someone. This especially happens in a team where reproducing, testing, and fixing a bug is distributed among different people, and there's a communication bridge we need to build. And communication is hard. It's no easy task even in real life, let alone when you're trying to convey complex technical information.
This often leads to an issue which many of us have already had: "Could not replicate, moving to backlog." Has anyone experienced this? Yeah? Okay, then it's not just me. And it can be so annoying for all parties involved. Oftentimes you can have amazingly smart people and it would still happen. Quite a few hands raised here. You are all smart people and it has happened to you. And by no fault of your own: you might lack information, you might lack clear communication, you might lack technical knowledge. There's always something that can get in the way of finding the issue, fixing the issue, and learning from the past. Learning what happened.
So, again, talking to someone is overall a great way to learn from the past, but it also has some flaws. So, let's move on to the next point.
3. Learning from Debugging and Time Machines
Trying again is another way of learning from the past. Debugging and trying to recreate the past can be challenging. You can use local development, debugger, or print statements. Time machines in test automation are exciting and can reveal hidden subtleties. They can show snapshots that provide useful information for fixing tests.
Trying again. That's another way we can learn from the past. AKA debugging. AKA trying to recreate the past. Now, we've all done debugging. We know how difficult it can be. You can use local development, and you can use a debugger or print statements.
By the way, print statements or debugger? Print statements? Raise your hand. Okay. Good. Debugger? Okay. Oh! We got some fancy people. All right. Everything you do is in hopes of recreating the issue, trying to replicate it. Maybe you try again and again and again; in the context of test automation, that might mean rerunning the test. Walmart had a nice demonstration of how you can burn in a test, running it repeatedly to try to reproduce a flaky test.
And this brings me to the final way we can learn from the past, and that's time machines. I personally find time machines the most exciting, especially in the context of test automation. Time machines can reveal subtleties that would otherwise be hidden from us. For example, time machines can show us DOM snapshots. This is a screenshot of a Cypress test that is failing. We have an eCommerce demo application where we want to buy this fancy sneaker. In the test, we basically click the Add to Cart button and want to see a message like "successfully added to cart." Now, this test is failing, but we can use the time machine, travel back in time, and look at the click event. And what we can see in this short video is that when we clicked on the button, there was this counter showing the number zero. So we were actually trying to purchase zero sneakers. And this kind of information, this snapshot from the point where the click was made, can be really, really useful. So this is the test. We can now change that test, use the information from time traveling, and fix it. We can take the input and assert that it should not have a value of zero.
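A minimal sketch of that fix, assuming hypothetical selectors ([data-cy=quantity], [data-cy=add-to-cart]) and message text for the demo shop:

```javascript
// Hypothetical Cypress spec: assert the counter before clicking,
// so the test fails with a clear message instead of ordering zero sneakers.
it('adds a sneaker to the cart', () => {
  cy.visit('/product/fancy-sneaker');
  // Guard against the bug the DOM snapshot revealed: the counter
  // must not read zero at the moment we click.
  cy.get('[data-cy=quantity]').should('not.have.value', '0');
  cy.get('[data-cy=add-to-cart]').click();
  cy.contains('successfully added to cart');
});
```

This runs inside the Cypress runner, so the selectors and message text would need to match the real application.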
4. Troubleshooting Test Flakiness
The problem with API calls in Cypress tests is their asynchronous nature. We often click too early, resulting in unstable tests. Cypress provides a snapshot of the request and response time, allowing us to identify issues. Playwright's trace viewer helps with action observability and narrowing down test problems. Console outputs and test definition in Playwright's Time Machine are also useful for troubleshooting. These options emphasize the importance of addressing test flakiness.
And then we would continue on and click on this. We can also take a look at network requests, right? This is, again, the same Cypress test. The usual problem with these API calls is that they're asynchronous, right? We don't really wait for the response. They don't block us from doing stuff on the page. So maybe we clicked too early. That's actually exactly what happened in this test.
So what Cypress does with these XHR requests is that it takes a snapshot at request time and a snapshot at response time. Here we are zooming in on the check availability endpoint, the endpoint that checks whether the sneakers are available. And then, at some point, we click on the button. Now, in the response snapshot, you can see the message "failed to add product to the cart." This demonstrates that we clicked on the Add to Cart button before the response from the check availability endpoint came in. And this, again, is information from the time machine, from the timeline, that we can use to make our test more stable. For example, in Cypress, we can use intercept to catch a certain API call, wait for its response, and only then move on to the next step of the test.
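A sketch of that stabilization, assuming a hypothetical POST /check-availability endpoint and the same hypothetical selectors:

```javascript
// Hypothetical Cypress spec: wait for the availability response
// before clicking, so the quantity state is no longer zero.
it('waits for availability before adding to cart', () => {
  // Register the intercept before the request can fire
  cy.intercept('POST', '/check-availability').as('availability');
  cy.visit('/product/fancy-sneaker');
  // Block until the intercepted response has arrived
  cy.wait('@availability');
  cy.get('[data-cy=add-to-cart]').click();
  cy.contains('successfully added to cart');
});
```

As with any cy.wait on an alias, the intercept has to be registered before the request fires, which is why it comes before cy.visit.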
Here's another thing. Here's Playwright. This is the trace viewer, which provides us with action observability. I'm not sure how well you can see it, but I'm pointing to the lower part over here. This is all the information, all the logs, of what happens when you try to click on something. You can see that we are checking whether the element we're trying to click on is available, not disabled, not covered by anything else, et cetera. Our test is actually failing on a visibility check: we want to see a message appear and it's not there. So we might look back at the commands that came before, so we can rule out a problem with the click event or something like that. And, again, this helps us narrow down the problem, and we can see what's going on with our test, what's wrong. Another example: we have the console outputs. In this case, the test flakiness may be pointing to a real issue, and that's important. The real issue can be intermittent, but if we have the console logs from our application, that can help us a lot. Another example in Playwright's time machine is the test definition, which helps specifically if we're dealing with more complex scenarios. And yeah, so we've got all of these options, which leads me to reiterate a point that I made earlier.
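For reference, a minimal playwright.config.js sketch that turns on the trace recordings the viewer consumes. The trace option and the show-trace command are standard Playwright; everything else in the config is assumed:

```javascript
// playwright.config.js
module.exports = {
  retries: 1, // retry once so a failing test gets a second, traced run
  use: {
    // Record a trace only when a test is retried after a failure,
    // keeping overhead low on passing runs.
    trace: 'on-first-retry',
  },
};
```

A recorded trace can then be opened in the trace viewer with npx playwright show-trace path/to/trace.zip.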
5. Test Flakiness and App Flakiness
The overwhelming majority of people believe that if their app and server were completely flake-free, their tests would become less flaky. Cory House sums it up well: adding automated tests to a flaky app leads to flaky tests. Test flakiness and app flakiness are both important topics that need to be addressed.
Time machines are cool. All right. But there's one problem that I have been thinking about a lot, especially in relation to test flakiness and time machines. I recently asked people on LinkedIn the following question. If your app and server were completely flake-free, what do you think would happen to your tests? Would they be less flaky, as flaky as before, more flaky, or something else? The overwhelming majority said that their tests would become less flaky. And perhaps no one has put it better than Cory House over here: adding automated tests to a flaky app leads to flaky tests. And I totally agree. So we talk a lot about test flakiness. We don't talk that often about app flakiness. And we need to talk about it.
6. Introducing Replay.io
Our users can run into the same problems as our tests due to network or CPU issues. We often lack information, struggle with replication and debugging, and ignore the app when using time machines. The solution I propose is Replay.io, a time-traveling debugger that creates recordings of your application for analysis in the dev tools. Recordings can be created manually using the browser or CLI.
So let's go back to the solution from before, where I would intercept the API call. Something does not sit right with this test. Our users don't wait for API endpoints. And they can easily run into the same problem as our tests do, because they may have a slower network, or may have a slower CPU, or whatever. Which is not good.
So with all of the ways we can learn from the past, we can always find some footgun. We might not have enough information, we're not able to replicate, debugging is hard and frustrating, and even with time machines, we're ignoring the app. So what's the solution then? Well, I think I might have one. And I want to show it to you.
So three months ago, I joined a company called Replay.io. You might have heard about them. And I believe it solves these problems. And you might call me biased. I know, because I work for Replay. But I became a fan before I joined the company. So I think... Well, you be the judge of that. All right.
So let me explain how it works. The principle is kind of simple. We have built a time-traveling debugger. And the idea is that you create a recording of your application. And then you open that recording in the dev tools. And you can create a recording in many ways. You can do that manually. So we have our browser that you can just open, hit the record button, and then start interacting with your application. We have a CLI that will open the browser for you. So you can, again, create that recording manually. And close the browser. And you have a recording available to you.
7. Using Replay Chromium for Recording and Debugging
You can use Replay Chromium with Cypress using npx cypress run --browser replay-chromium, and with Playwright using npx playwright test. The recording allows you to play back the interactions and view the executed code. It's a collaborative tool that facilitates communication between QA engineers and developers.
Or, since it is a browser, you can actually use it with Cypress. npx cypress run --browser replay-chromium is how you would run that. And then npx playwright test would be the setup for Replay Chromium with Playwright. By the way, these are all in our docs, so you can check those out. We also support WebdriverIO, Puppeteer, or even Selenium. So, yeah.
That's the recording part, right? So what do we do then with the recording? Actually let me show you the debugging part of the experience. I'm going to close my presentation. Switch to mirroring the display. And hopefully we can move on. All right. This seems to be working. All right. So here I have a recording of the same application that I've been showing to you. We have this e-commerce store. And down below here we have a timeline. And I can play this like a video. Let me play that from beginning. Right? So you open the app and you can see me sort of iterating on the counter, hitting one, two, three, et cetera. That's where the recording ends. On the left side over here, you see that we have this click event. So it's not only a video. It's actually a recording of the interaction. Now what we can do here is to click on this blue button over here. We can jump to code which will switch me to this DevTools panel. And we can see the code that has been executed when we clicked on that plus button over here in the application. Now this is a collaborative tool. So if you are not on the test automation side, maybe you are a QA engineer on an exploratory session and you want to hand that over to the developer so that they will deal with that, you can sort of just go ahead and add a comment.
8. Replay Debugging and Sharing
This way they cannot tell you they are not able to reproduce it. You have the recording, the proof, and the code inside. The jump to code takes you to the triggered component. The gutter shows how many times the code was hit. Clicking the plus symbol shows instances of the code being called. Printing variables provides insight into their changes over time. The debugger allows for backtracking and debugging. Replay allows you to record the bug once and add print statements as needed. You can share the recording with others.
You can write something like "this is not the right image," or "hello," or whatever. This way they cannot tell you they are not able to reproduce it. Because you have that recording, you have the proof, and you actually have the code inside here.
So what that means is that if you look over here, the jump to code has taken me into the component that was actually triggered when I was clicking on that plus button. Now here on the left, there's this gutter, and this number 10 that you see over here is actually telling me how many times this line of code was actually hit. So that's already useful information. It sort of works like code coverage, where you can see which lines of code were hit and which ones were not.
And what you can do is click on this plus symbol, and on the console you can now see all of the instances where that line of code was called. So there's this default text, QuantityControl, and line number nine. But what I can do to get some insight is to print out some variable, for example this quantity over here. And now I can see the quantity variable that has been passed into this setQuantity function, and I can see how it changed over time. And I can rewind back or go forward.
Now, some of you are using the debugger. Well, you can use a debugger in here, not only print statements. I added a breakpoint by clicking on this number nine, and now I can go back and debug this, right? So let's go to the timeline. Let's look at this Scopes panel over here. Wait, actually, let's jump into the breakpoint. We got quantity two, so we can move forward, or we can move backward. Wait, let me wait for this to load. Now we got quantity three. Let's move back to quantity two. If you have used breakpoints and a debugger, you might have accidentally clicked one more time than was needed. I guess everyone has felt that. If you have the recording and you're debugging the recording, you can just move back and forth.
Now, usually when you debug, you modify your code. You add a console log somewhere, or add that debugger statement, and then you run it, try to replicate the issue, and hope that you can get to it. And you need to do that several times. With Replay, if you have the recording, you capture the bug once, and then you can add those print statements and whatever else anytime you need. Also, if you want to share it, you can click the share button and send it to whoever needs to fix it.
9. Replay Browser and Test Comparison
I want to show you how to create recordings automatically with our replay browser. Let's compare a failing test and a passing test. The failing test calls the add to cart function, which sends a quantity of zero to the server. By disabling the add to cart button when the quantity is zero, we can fix the test issues without modification. Replay solves the not enough information problem.
All right. One more thing I want to show you. I mentioned that the recording, you do that with our replay browser. And you can hook up the browser to your test suite. So if you are using Playwright or Cypress, you can just create your recordings automatically.
Let's now move on to this recording that I made with the Cypress test. In this one, I ran my test and it failed the first time, then it failed the second time, and then finally it passed. So I can now compare my failing test and my passing test. If I go to the failing test and take a look at this click, again I have this jump-to-code function, so I can see what my Cypress test was clicking on and what part of the code was executed as Cypress did that. And you can see here that it has called this add to cart function, which calls the add to cart API. And in the body, we're sending the quantity that's over here. Now, the quantity is a state that's zero by default. And the state is mutated over here in setQuantity. We are setting it after we get the response from check availability.

So let's add a couple of print statements here. I'm actually out of time, so I need to make it quick. I'll add a "sent to server" label like this, so we can see what has been sent to the server. Right? We've been sending the number zero. If I print out this data, the quantity, and print out the availability, I can see what has come from the server. And this is a timeline, so I can actually see in what order this happened. We're sending the number zero. I'm not sure why this is not loading, but at least you can see we're live. If we take a look at the whole test, I can see the difference between the passing instance and the failing instance, which is now loading. So here we have the failed attempt, another failed attempt, and then finally the passing attempt, where we are finally sending the right information, ordering one sneaker and not zero sneakers. The time is blinking at me, so I need to finish this talk. Basically, my point here is that something is happening in the code. We have some asynchronous operations that are not playing well with our test.
If we were to disable this add to cart button while we have the quantity set to zero, we would actually fix all of the issues with our tests because Cypress and Playwright, they check whether the button we want to click on is actually actionable. So we don't have to wait for the intercept. We don't have to assert that the number is not zero. The test would magically just pass without any modification.
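A runnable sketch of that idea, with a plain object standing in for the real DOM button so the example stays self-contained (the function name is made up):

```javascript
// Keep "Add to Cart" unclickable while the quantity is zero.
// Cypress and Playwright both check actionability before clicking,
// so a disabled button makes the test wait until the quantity is valid.
function updateAddToCartState(button, quantity) {
  button.disabled = quantity === 0;
  return button;
}

const button = { disabled: false };

updateAddToCartState(button, 0);
console.log(button.disabled); // true: the click is blocked

updateAddToCartState(button, 1);
console.log(button.disabled); // false: the button is actionable again
```

In a real app the same condition would drive the button's disabled attribute, for example disabled={quantity === 0} in a React component.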
Yeah, back to my presentation. I have two final slides, where I want to demonstrate that Replay basically solves the not-enough-information problem: if you can replicate it once, you have enough information.
10. Debugging and Test Flakiness
Also the debugging: if you just record whatever happened, you can add a print statement or a debugger. We have Redux and React DevTools, Redux state, et cetera. And most importantly, we're not ignoring the app we are testing, which I think is a big mistake we make when doing end-to-end testing and dealing with test flakiness.
All right, that's my talk. Thank you very much. Come talk to us. We have a booth. And thank you. I know there were a bunch of questions. We definitely won't be able to get to all of them. But what we will do is afterwards people can find you over at the replay booth. Yeah, definitely go check it out. I'm going to go to the most upvoted question. We'll just do that one and then we'll call it.
Is it slower to run tests in the Replay browser than in real browsers? And if so, how much? I don't know. I haven't measured it. I don't think the difference, if there is one, is very significant. Essentially, what we have is a fork of Chromium, so I don't think the difference is that big. But I could not tell you how big it is. And that kind of makes sense as well, because one question that I've been very curious about is specifically with flaky tests: first you have to find a flaky test. The fact that they're hard to find is interesting. And how much time has it maybe saved? Have you seen it in the wild saving a lot of time, maybe on a project where Replay helped you find that flaky test? One thing that I left out of my presentation was another poll that I did on LinkedIn, where I asked: do you spend more time debugging or writing your tests? And debugging tests was the overwhelming majority. Recently we have been looking at flakes of a client of ours, trying to find the reasons for those flakes. Now, a very funny thing happened. We found a flaky test which should actually have been failing all the time, because it was giving a false positive, but it was failing only intermittently. The problem was that there was an accessibility check that was checking the site. But then the marketing team came in and added a modal, "Hey, do you want a new experience," or whatever. And that modal was not accessible.
11. Test Flakiness and Real-world Scenarios
The accessibility check can fail based on whether it captures the modal window or not, even with millisecond differences. Test flakiness is a common problem because we run tests so quickly. Slowing down the network and processor can simulate the real-world scenarios of users with slow connections and CPUs.
So the accessibility check would fail based on whether it caught the modal window or not. And it should probably be failing all the time if you want to be accessibility compliant. That's crazy as well, because it's probably millisecond differences that decide whether the test passes or fails. Which is crazy. I mean, all flakes kind of are. That's the problem: we're running our tests super fast, which might feel unrealistic. But if you slow everything down, if you slow down the network or the processor, I think what happens on CI is just the same thing, happening faster, that may be happening to some user with a very slow network and a very slow CPU. That makes sense.
12. Testing Approach with Replay Browser
Would you recommend testing the whole suite over replay browser on every CI run? It would make sense, especially at the beginning when implementing and trying to get rid of flakes. Another approach is to run all tests and if any are flaky or fail, rerun with just that subset to catch and debug the problematic ones.
I'm going to go back to the audience questions. This kind of makes sense, especially when we talk about speed, because the person asked: would you recommend running the whole suite in the Replay browser on every CI run? I'm guessing that might take time, but I'm curious whether you would have that running. I think it would make sense, especially at the beginning, when you are implementing it and want to get rid of flakes. If your flakiness rate is really high, then you can do that. But there's another way you can approach it. You can run all of your tests, and if any of those were flaky or failed, you can rerun with just that subset. So you catch the problematic ones and then debug them with the DevTools. I think that might be a good approach as well.
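One way that two-pass idea could look on the command line with Playwright. The --last-failed flag exists in recent Playwright versions, while the replay-chromium project name is an assumption, so check your runner's docs and Replay's setup guide before copying:

```shell
# First pass: run the whole suite in the usual browser
npx playwright test

# Second pass: rerun only the tests that just failed, this time
# through a project configured for the Replay browser (assumed name)
npx playwright test --last-failed --project replay-chromium
```

The same idea works with any runner that can replay a failed subset; the key is that only the problematic tests pay the recording overhead.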