Detox: The Unobtainable Test Stability (or is it?)


In this talk, we'll discuss how Wix is using Detox internally, how we manage configuration, how we fight flakiness, and some best practices we've developed over the ~3 years of building and using Detox in our CI process. We'll also discuss our endless striving for "0 manual QA", which always seems in reach, if we only overcome that one last technical hurdle.



Transcription


Hello, everyone. I'll start with the motivation for this talk. Detox is a grey box testing solution for mobile applications. It manages synchronization between the test code and the app so that users don't have to do it manually. Even with that burden removed, and despite abundant documentation and guides, developers and testers can still trip over bad usage patterns and misconfiguration, and suffer from poor test stability. We feel this pain every day internally at Wix. So, in this talk, I aim to explain how Wix uses Detox internally, how we manage configuration, how we fight flakiness, and some of the best practices we've developed over almost three years of building and using Detox in our CI process. We will also discuss the endless drive towards zero manual QA, which always seems within reach if we only overcome that one last technical hurdle.
So, hi, I'm Rotem. I'm a software engineer at Wix. For the past four years, I've been working on Detox, since its inception. For the past two and a half years, I led the team behind Detox, and recently I left that role to join Wix's server infra group. In the picture, you can see me with two of my favorite side projects, especially now, during the third full lockdown in Israel.
So, to put everything in the right context, we'll start at the beginning. Detox was built at Wix to solve a growing problem with flaky end-to-end tests, especially in our then-new React Native application. The main idea was to follow the approach successfully executed by Espresso and EarlGrey, two grey box testing projects created by Google for Android and iOS respectively.
As opposed to black box testing, where the tester needs to decide how long to wait for the app to finish what it's doing before sending the next action or expectation, the grey box approach runs a synchronization mechanism inside the app under test to detect whether the process is busy or idle. It then interacts with the app only when it is considered idle, meaning it has no more events to handle, no more network requests, no more animations, no more transitions. It's actually doing nothing. This approach guarantees that any action or expectation on the app happens only after the app has finished processing everything, and that nothing will change until the next user action.
Before we talk about how we incorporate end-to-end testing at Wix, let's discuss how Wix's mobile app is architected. From a bird's-eye view, we build the app from four types of parts.
The first part is what we call the engine. It's the backbone of the application: an entity that includes all the native code and the API registry for all the modules built on top of it, so they can communicate with one another and with the engine. The infrastructure libraries used by the engine, most of which are actually open source, and the engine itself are usually written in languages native to the platform: Objective-C and Swift for iOS, Java and Kotlin for Android. All of this is wrapped with a unified JavaScript API.
The second type is our UI library, aka React Native UI Lib. This is an open source project. It is the project all the UI is built on top of, and it includes a lot of JavaScript and native components.
The third type is the module. A module is a single product. We have a lot of them in the app: a blog, a store manager, a CRM, chat. Each module exposes its own API to consumers, and it also consumes the engine's API.
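
Returning to the synchronization idea described earlier in this section, the principle can be illustrated with a small TypeScript sketch. This is not Detox's actual implementation: the `BusyTracker` class, its methods, and the `performWhenIdle` helper are hypothetical names invented for illustration. The sketch assumes the app reports the start and end of every piece of work, and the test side acts only once nothing is pending.

```typescript
// Hypothetical sketch of idle-based synchronization; not Detox's real internals.
class BusyTracker {
  private pending = 0;
  private waiters: Array<() => void> = [];

  // Called by the app whenever work starts (network request, animation, timer).
  begin(): void {
    this.pending += 1;
  }

  // Called when that work finishes; wakes all waiters once nothing is pending.
  end(): void {
    if (this.pending === 0) {
      throw new Error("end() called without a matching begin()");
    }
    this.pending -= 1;
    if (this.pending === 0) {
      const toWake = this.waiters;
      this.waiters = [];
      toWake.forEach((wake) => wake());
    }
  }

  isIdle(): boolean {
    return this.pending === 0;
  }

  // Resolves immediately when idle, otherwise when the last pending work ends.
  whenIdle(): Promise<void> {
    if (this.isIdle()) {
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }
}

// The test side waits for idleness instead of sleeping for a guessed duration.
async function performWhenIdle<T>(tracker: BusyTracker, action: () => T): Promise<T> {
  await tracker.whenIdle();
  return action();
}
```

The point of the design is that the wait time is derived from what the app is actually doing, which is the difference from the black box approach of picking a sleep duration up front.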
A module also consumes the UI lib, of course, to build its own UI, and it can consume other modules. The module is the actual product implementation, so most of the screens and the business logic are defined inside a module.
The fourth type is other libraries, whether open or closed source. These are somewhat disconnected from the release process, because they are inherently just transitive dependencies of the parts mentioned above. New releases of those are picked up manually, mostly through PRs to the engine, the UI lib, or a module. As for testing, they are tested and treated separately, with their own builds and their own test suites.
Each of those parts has its own independent CI process. When everything looks good to us, a new version can be published for everyone to use. The final stage is to take all of those parts, together with the configuration files, and build one big binary that includes everything. This is the app released to Google Play and the App Store.
Now that we have an overview of how everything is laid out, let's discuss testing. I will not discuss unit tests. We have a lot of them in all the projects and they run all the time, but they are very easy and very cheap to run. On the end-to-end side, we differentiate between a few types of tests, split them into a few test suites, and execute them at different times and different stages of the development cycle. Here are the four types of end-to-end tests that we have.
The first is production end-to-end. These run on a fully functional app with very minimal mocking. Wix uses a lot of experiments and A/B tests all around, and with this type of test we want to make sure that we get the same features on every test execution, because they can differ between runs of the app.
So, we developed an experiment override mechanism that lets us configure a predefined experiments blob at the beginning of every test. The app is launched with these experiments set, ensuring that we never get different behavior while the app is under test. Production end-to-end tests run whenever a module owner wants to GA, that is, publish their work into the full app.
Next are mocked end-to-end tests. These are tests with mocked server endpoints. They can run on a specific module or on the entire application; we usually run them on a single module. They do not interact with production environments. We control the mock server's outputs and test the actual module behavior in predefined states. The big upside is the ability to control all the inputs to the test: we have consistent inputs from the local mock server, which guarantees consistent test output. These tests are usually more stable and a bit faster than production end-to-end, and they run in CI on every push to the module's codebase.
We also incorporate screenshot testing, mostly in our shared UI library, the UI lib. Detox doesn't have any sort of screenshot comparison mechanism; it just knows how to take screenshots on demand. It can take both device-level and element-level screenshots, to be compared using external libraries. We use Applitools to get smart comparisons that avoid false positives on slight pixel variations. It's worth mentioning that slight pixel variations are not an uncommon issue. They can appear when comparing screenshots taken with two different graphics cards or drivers, for instance. So, if you take a baseline in your local dev environment and then run the test on the CI machine, you'll get different screenshots. The test can easily fail if this is not handled properly.
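
To make the pixel-variation problem concrete, here is a minimal TypeScript sketch of a tolerance-based comparison. It is a deliberately naive stand-in, not how Applitools works, and it is not part of Detox. The `RawImage` shape, the option names, and both functions are assumptions made for illustration; decoding a screenshot file into raw RGBA bytes is left out.

```typescript
// Hypothetical raw image shape: RGBA bytes, 4 per pixel, row by row.
interface RawImage {
  width: number;
  height: number;
  data: Uint8Array;
}

interface DiffOptions {
  channelTolerance: number; // max per-channel difference still treated as equal (0-255)
  maxDiffRatio: number;     // max fraction of differing pixels still treated as a match
}

// Counts pixels where at least one channel differs by more than the tolerance.
function countDifferentPixels(a: RawImage, b: RawImage, channelTolerance: number): number {
  if (a.width !== b.width || a.height !== b.height) {
    throw new Error("Images must have identical dimensions");
  }
  let different = 0;
  for (let i = 0; i < a.data.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a.data[i + c] - b.data[i + c]) > channelTolerance) {
        different += 1;
        break; // count each pixel at most once
      }
    }
  }
  return different;
}

// A screenshot matches the baseline when few enough pixels differ.
function imagesMatch(a: RawImage, b: RawImage, options: DiffOptions): boolean {
  const totalPixels = a.width * a.height;
  const different = countDifferentPixels(a, b, options.channelTolerance);
  return different / totalPixels <= options.maxDiffRatio;
}
```

With a tolerance of zero, the rendering noise between two machines fails the comparison; a small per-channel tolerance absorbs that noise while a real visual change still fails.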
Component testing is another type of testing that we do. Some of the modules test actual React components and their states. These tests load components with their own state and then switch state and props programmatically throughout the test. This is done on top of Detox with Compot, a React component testing library. These run in CI on every push to the module's codebase, just like the mocked end-to-end tests.
So, how do we choose what to test? Wix app modules are written 100% in JavaScript or TypeScript. This means that most of the module business logic runs the same code on both platforms. Most of the time, a bug in a module manifests on both platforms similarly, so a bug that manifests on only one platform is usually an issue with an infrastructure library. This means that module developers can get pretty good coverage by running their module's tests on only one platform. This does not mean, however, that the whole app can be tested on one platform. All of Wix's infrastructure libraries run tests on both iOS and Android.
So, for us to trust end-to-end tests, they must be very stable, but they must also provide good insight into what happened and why a test failed. Over the years, we have tried to improve this in two ways. The first is developing a series of tools to help us figure things out: view hierarchy dumps, trace logs, videos, test timelines, graphs. We also try to make it easier for users to understand why tests hang when they do, which is usually caused by something that keeps the app from becoming idle, whether it's an animation, a network request, or something similar. The second is education. It was mostly directed internally at the Wix engineering team, but some of it also reached our documentation. So, I want to go over the second one a bit.
Two things are very important to have when we want stable end-to-end tests. The first is test isolation. This is probably one of the most important tips that I can give you. Make sure that every test starts fresh, and that it does not depend on the execution of a previous test. The second is input consistency. An app might have multiple inputs that can change its behavior when running: varying responses from servers, or experiments and A/B tests that send the app down different code paths, causing it to behave differently between runs. You must make sure that these are handled and configured in your tests, and that they are identical across all iterations of those tests, whether they run locally or in CI. At Wix, we developed an experiment override mechanism that helps us predefine those experiments, ensuring consistent behavior while the app is under test.
So, regarding configuration. As Detox becomes more and more mature, we get more features and tighter integration with Jest, usually in order to leverage features available in Jest or cool new features in Detox. The configuration becomes harder to handle. Although the documentation itself is pretty good, it's very hard to keep track of the config. This is actually all the Detox configuration files that we have in one of our internal projects.
As mentioned before, the Wix app is built from many independently developed modules. Each has its own Git project and build configuration in CI, and each was required to add the Detox configuration files to its own repo. Most of these were similar config files. During the last six months, we came to the realization that it makes zero sense to keep things this way, so we went ahead and unified the configuration for everyone. We created a library that helps us share everything.
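
A shared configuration library of this kind could look roughly like the TypeScript sketch below. Wix's internal library is not shown in the talk, so everything here is an assumption: the simplified config shape (only loosely modeled on a Detox configuration), the `createModuleConfig` function name, and the placeholder paths. The idea is that one package owns the defaults and each module overrides only what is specific to it.

```typescript
// Hypothetical, simplified config shape; real Detox configs have more fields.
interface DeviceConfig {
  type: string;       // kind of simulator or emulator to run on
  binaryPath: string; // where the built app lives (placeholder paths below)
  build?: string;     // optional command that builds the app
}

interface SharedConfig {
  testRunner: string;
  configurations: Record<string, DeviceConfig>;
}

interface Overrides {
  testRunner?: string;
  configurations?: Record<string, Partial<DeviceConfig>>;
}

// Defaults owned by the shared library, identical for every module.
const baseConfig: SharedConfig = {
  testRunner: "jest",
  configurations: {
    "ios.sim.release": { type: "ios.simulator", binaryPath: "ios/build/App.app" },
    "android.emu.release": { type: "android.emulator", binaryPath: "android/app/build/app-release.apk" },
  },
};

// Builds a module's config from the defaults plus its overrides.
// Overrides for configuration names not present in the base are ignored.
function createModuleConfig(overrides: Overrides = {}): SharedConfig {
  const configurations: Record<string, DeviceConfig> = {};
  for (const [name, device] of Object.entries(baseConfig.configurations)) {
    configurations[name] = { ...device, ...(overrides.configurations?.[name] ?? {}) };
  }
  return {
    testRunner: overrides.testRunner ?? baseConfig.testRunner,
    configurations,
  };
}
```

A module would then keep a one-line config file that calls `createModuleConfig` with, say, its own `binaryPath`, instead of carrying a full copy of every configuration file.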
These configuration files are an extremely important, yet fragile, part of Detox. They are easy to misconfigure, and whenever they are misconfigured, you get subpar results. For those reasons, we have decided to supply some sort of basic shareable configuration right out of the box with the open source Detox, and we hope to do this sometime in the near future. This is something we already do internally.
CI for mobile is one of those things for which it is still pretty hard to find a solution that works right out of the box. We've tried all the major SaaS CI solutions. Running iOS on CircleCI, Travis, or Bitrise is pretty easy; the big problem was with Android emulators. The real insight here is that Android emulators run better on Mac VMs than on the Linux ones on offer. Macs are guaranteed to support nested virtualization, which is required for running x86 emulators: we basically run a VM, the Android x86 emulator, inside another VM, the macOS one. But those still lack hardware acceleration, resulting in poor emulator performance, although it works.
So, the current solution for our in-house CI is VMware ESXi on Mac Pros in MacStadium, with macOS VMs. This works for both iOS and Android, as VMware supports nested virtualization. But the performance of those machines is still pretty bad. Compilation is slow, and test parallelization is somewhat limited: we cannot run a lot of simulators or emulators on the same machine to decrease the time it takes to run the suite.
So, our next-gen setup is bare metal Mac minis that run everything on the host OS. These have much better performance, at least two times faster than the VMs, on an Intel-based Mac mini. Hopefully, we can go even faster with M1-based Mac minis. On the Android emulator front, we recently released support for GenyCloud SaaS devices.
And we orchestrate all of those with Detox right out of the box. By using GenyCloud emulators, we gain four key benefits. The first is that we drop all prerequisites for the CI machines, as these emulators run remotely on GenyCloud infrastructure. The second is that we get better performance per individual emulator. In our case, it cut a 35-minute suite with two workers running local Android emulators down to 15 minutes with two workers running GenyCloud emulators. The third, and maybe the biggest benefit, is the ability to scale infinitely and run as many workers as we want in parallel. And with that, all the SaaS CI options we talked about become valid again, as they can be used for builds and test orchestration while all the device emulation is offloaded to GenyCloud. The main caveat with GenyCloud is that it is a paid service, even for open source projects.
So, the holy grail for the release process is to have it all automated and fast. As we continue to solve our problems, we get a bit closer to that goal with each step. The main issue we're facing today is test execution speed in CI, especially for elaborate production end-to-end suites. The absurdity is that it is faster for our QA engineers to go over the test suite manually than to wait 150 minutes for everything to finish. So, they actually do it. It will not stay this way for long, though. With just a bit more work, we will get much faster test suite execution with potentially infinite scale. Thank you.
Thanks for having me, Ali.
What did you think about the poll? I know Detox was at the top there, and I kind of expected Detox to be number one. What do you think about the "other" bit of it?
Okay. So, I guess two things. One is that I wonder what the other is, of course.
And the second is that I know Detox is probably the first choice when it comes to React Native applications, but when it comes to native applications or applications written in other frameworks, I can see the point of using something that is not Detox. So, yeah, it's not a total surprise, I guess. But the "other" option is very interesting, indeed.
That actually makes a lot of sense, the point about Detox being a really framework-specific tool. And from your talk, we saw that. I've got a bunch of questions from a bunch of people. I'm going to mangle their names, so I'm not going to focus on their names; I'm going to focus on the questions we've had for you during your talk. Don't forget, if you've got questions for Rotem, you can still go to the talk Q&A channel in Discord and ask your question. The first one was: I started working with Detox recently. It's pretty easy to get started. But what about more difficult flows, for example, using device features like the camera, disabling the internet, locking and unlocking the device? There are very few materials on Detox compared with other tools. Where and how can I best learn Detox? That was a big question. So, let's start small. How do I disable and enable things on the device, like locking and unlocking the device, with Detox?
Yeah. Okay. So, let's see. On iOS, Detox still supports only simulators; it doesn't run on real devices yet. We have a plan, and what we did with Detox 17 and Detox 18 was prepare ourselves to run on real devices on iOS. Now, that doesn't solve the issue of disconnecting the network and disabling things like that. And actually, I'm not really sure what has this kind of ability. As far as I understand, the only way would be to mock the network layer in some way. On Android, it's a bit more achievable. But yeah, we didn't really take this into account while developing Detox, so we don't really use those.
But if you think this is a super feature that you really want and that it's very important, then of course, let's have a discussion in our GitHub issues and see how we can make it happen.
That actually makes a really good point. I remember from my testing days, we didn't always try to test interaction with the device. The app was the most important thing. The app was the object of our tests, right? What happened with the phone while we were using the app wasn't really the point. So, how about materials? Where can people go to find advanced Detox materials?
So, I think that, I mean, there's a lot of documentation, and if you go through it, you will find pretty good gems over there, I'm pretty sure. There is a lot covered in the documentation. There are also things that I wrote previously on Wix's Medium that I think are a bit different from what's in the documentation. You can go and read those as well. Just this week, at the beginning of the week, I published a blog post about how we make our tests super stable internally at Wix. So, what are the ground rules for making super stable tests? I go over all kinds of features that help us achieve that, and all kinds of artifacts that we collect during tests to make sure we understand what's going on, especially when tests fail. These are, I think, good materials as well. And you mentioned documentation about, what was that? About device and network, like mocking the network or a bad network, which is something we don't really have. What were the other things, Alex?
So, I think the examples mentioned specifically were device features like the camera, disabling the internet, and locking and unlocking the device. So, camera is something that could help. To be honest, I'd be curious about the camera one, yeah.
I don't see an issue specific to the camera.
If you have a camera module in your app, you can just open it. If you run an emulator, you can make sure that the emulator is connected to the camera, or to a mock camera like the one the Android emulator provides. Genymotion does the same. If it's an actual device, it can open the camera as well. So, if you have a camera module in your app and you open it, it should just work. There is nothing we do that prevents it from working, as far as I understand.
So, how do you deal with things like, for example, my phone has three cameras. It's a new iPhone, so it has three cameras. I have to select which camera I want in every app that uses the camera, right? If I use Lightroom, for example.
Yeah, there are two kinds of things you might want to do, I guess. One is going to the camera app and taking a picture. Detox usually concentrates just on the application that you let it work with. So, it's your application, not other system applications. You don't actually want to test those system applications; you want to test your own application. So, if you want to use pictures that were taken with other camera apps, you can go and fetch those, again, in your app, if there is a way for you to go to the gallery and get them. The other is if you have a specific camera module, like React Native Camera Kit or something similar, that opens the camera as a view in your application. With this, I don't think we have any limitation. I think it just works.
Okay, that's really good. We've got a bunch of interesting questions. One of them asks: what are the best practices for debugging Detox test failures? There's a comment at the end saying that tests fail very often in their pipeline, while running locally is fine. So, how do you deal with debugging test failures? What's your best practice for that?
Okay.
So, I think there is an exact answer to this in the latest blog post that I just published this week. Again: use all the tools that we provide, all those artifact collection tools, and understand the output of the tests themselves, like what Detox outputs to the log. And there are a few ground rules for making sure that you have a base level of good testing setup. This is also written over there.
Okay, that's cool.
I would say, in a nutshell, and this is something I ended the talk with: have consistent inputs so you can have consistent outputs. That's true for any test, not only end-to-end tests. You don't want to have a different time on the clock for every test that you run. You don't want to have different inputs. That's a problem. And the second one would be to have isolated tests. It will be much, much easier for you to debug issues when they arise, because you know that your test began on a specific screen, and not on a screen that you don't really know how you got to, like a residue of a previous test.
That makes a lot of sense. There's a question about device farms that support Detox. Someone says: we have to run Detox tests on emulators in our pipeline. Are there any solutions for device farms that support Detox? Running on simulators doesn't provide the same level of confidence as running on real devices. That's a good point.
This one is a long answer as well, so I'll split it again into iOS and Android. For iOS, we still support only simulators, but we're working on support for real devices. This would involve wrapping, or at least adding, XCUITest to the mix. It would be what launches Detox inside the iOS application.
So, for iOS, we don't really do that yet. We have a sort of plan with a big device farm provider, but it's still not something we can be sure will happen, though we really do want it to happen. And for Android, Detox does support devices: you can run Detox tests on every device connected to your computer. I know that it's not the same as running on a device farm, and that is not really supported. There is no device farm that supports Detox out of the box. But we recently added native support for launching Genymotion emulators. These are different from the stock Android emulators. They are also x86, but they run on designated machines in Genymotion Cloud.
So, I know you said that it's not the same as running on real devices, and I agree to some extent, though we didn't really find a lot of issues specific to devices. Most of the issues we find are actually issues with the app, or with specific Android SDK levels. But we now run on Genymotion Cloud. We started doing it internally at Wix, and this dropped our end-to-end test time. Just taking one of the suites as an example: it took 35 minutes, and we dropped it to 15 minutes with the same worker count. So, compared with the two emulators that ran on that machine, with two workers in Genymotion Cloud the same test suite took only 15 minutes. So, it's something like a 60% improvement in test time. I know this is not exactly the answer you wanted to hear, but I just had to talk about this.
To be honest, it actually makes a lot of sense. For example, we were running tests on a lot of PandaBoards instead of rigging up actual devices in racks. We tried that for a while, but it wasn't efficient. PandaBoards gave us 99% of the same deal. So, it was good. I've had so many complicated questions for you. I feel bad about picking the complicated ones.
So, I've got a simple one, your last question of the day, and then you're done with me. It's a really simple yes or no question. Can Detox be used with Expo React Native?
Okay. Yes, but. We were contacted by Expo a few times in the past. For iOS, I think it should just work with the current setup. There isn't anything that should be an issue, as far as I understand. With Android, it is actually up to Expo to also provide a pre-built test APK, like they do with the production APK. The test APK is the APK that actually has Detox inside it. It knows how to connect to the production app and ask it questions, and how to connect to the Node service that we run. We did speak with Expo in the past, I think about two years ago, and it was done, but it's something that needs to be maintained every time we have a new version. We don't really have breaking changes on Android, at least not in the past year, I think. So, whenever someone sets it up in Expo, in the library itself, yes? Not in a project. Whenever someone from the Expo team sets it up, it should just work.
Thank you. That actually makes a lot of sense. Thank you, Rotem. I've had a blast with your talk. The Q&A session was really engaging. Thank you so much, Rotem.
Thank you so much. Hard questions. Thank you for those.
Yeah, really hard questions. Thank you. Thank you so much.
36 min
15 Jun, 2021
