In this talk, we'll discuss how Wix is using Detox internally, how we manage configuration, how we fight flakiness, and some best practices we've developed over the ~3 years of building and using Detox in our CI process. We'll also discuss our endless striving for "0 manual QA", which always seems in reach, if we only overcome that one last technical hurdle.
Detox: The Unobtainable Test Stability (or is it?)
AI Generated Video Summary
Detox is a grey box testing solution for mobile applications that manages sync between test code and the app, eliminating the need for manual synchronization. It follows the grey box testing approach used by Espresso and Earl Grey. Wix's mobile app architecture consists of four types of parts, each with its own independent CI process. Test isolation and input consistency are important for stable end-to-end testing. Android emulators perform better on Mac VMs with nested virtualization, but our next-gen setup includes Bare Metal Mac minis for improved performance. Detox is primarily used for React Native applications and has limitations, but there are learning resources available. Detox supports camera functionality and provides solutions for debugging test fails and running tests on device farms. Currently, Detox supports simulators for iOS and devices connected to the computer for Android, with work underway to support real devices. The test APK for Detox connects to the production app and Node service, and Expo is responsible for providing a test APK for Android.
1. Introduction to Detox and Motivation
Detox is a grey box testing solution for mobile applications that manages sync between test code and the app, eliminating the need for manual synchronization. In this talk, I'll explain how Wix uses Detox internally, including configuration management, fighting flakiness, and best practices developed over three years. We'll also discuss the drive towards zero manual QA.
Hello, everyone. I'll start a bit with the motivation for this talk. Detox is a grey box testing solution for mobile applications. It manages sync between test code and the app, so users don't have to do it manually. Despite that, and despite the abundant documentation and guides, developers and testers can still get tripped up by bad usage patterns and misconfiguration, and suffer from poor test stability. And we feel this pain every day internally at Wix.
So in this talk, I aim to explain how Wix is using Detox internally, how we manage configuration, how we fight flakiness, and some of the best practices that we've developed over almost three years of building and using Detox in our CI process. We'll also discuss the endless drive towards zero manual QA, which always seems in reach if we only overcome that one last technical hurdle.
So hi, I'm Rotem. I'm a software engineer working at Wix. In the past four years, I've been working on Detox since its inception. In the past two and a half years, I led the team behind Detox. And recently, I left this role in order to join Wix's server infra group. In the picture, you can see me with two of my favorite side projects, especially now during the third full lockdown in Israel.
2. Detox: Solving Flaky End-to-End Tests
Detox was built at Wix to solve the problem of flaky end-to-end tests in React Native applications. It follows the grey box testing approach used by Espresso and Earl Grey, which involves running a synchronization mechanism inside the app to detect its busyness and idleness. This ensures that actions and expectations only occur when the app is idle and nothing will change until the next user action.
So in order for us to get everything into the right context, we'll start at the beginning. Detox was built at Wix in order to solve a growing problem with flaky end-to-end tests, especially on our then-new React Native application. The main idea was to follow the approach successfully executed by Espresso and Earl Grey. These are two grey box testing projects created by Google for Android and iOS respectively. In black box testing, the tester needs to decide how much time to wait for the app to finish what it's doing before sending the next action or expectation. The grey box approach, by contrast, runs a synchronization mechanism inside the app under test to detect the busyness and idleness of the process. It then only interacts with the app when it is considered idle, meaning it has no more events to handle, no more network requests, no more animations, no more transitions; it's actually doing nothing. This approach guarantees that any action or expectation on the app will only happen when the app has finished processing everything, and nothing will change until the next user action.
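The idle-detection idea described above can be illustrated with a small sketch. This is not Detox's, Espresso's, or Earl Grey's actual implementation; the `IdleTracker` class, its method names, and the resource labels are all hypothetical, and a real synchronization layer hooks into the platform's run loop, network stack, and animation system rather than relying on manual calls.

```typescript
// Hypothetical sketch of grey box synchronization; not Detox's real internals.
class IdleTracker {
  private busy = new Set<string>();
  private waiters: Array<() => void> = [];

  // Called by instrumented app code when work starts (network call, animation, ...).
  markBusy(resource: string): void {
    this.busy.add(resource);
  }

  // Called when that work finishes; releases waiters once nothing is left.
  markIdle(resource: string): void {
    this.busy.delete(resource);
    if (this.busy.size === 0) {
      const ready = this.waiters;
      this.waiters = [];
      ready.forEach((release) => release());
    }
  }

  isIdle(): boolean {
    return this.busy.size === 0;
  }

  // Resolves immediately if idle, otherwise when the last busy resource clears.
  whenIdle(): Promise<void> {
    if (this.isIdle()) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }
}

// The test side only acts once the app reports idle, instead of sleeping
// for a guessed amount of time as a black box test would.
async function performWhenIdle<T>(tracker: IdleTracker, action: () => T): Promise<T> {
  await tracker.whenIdle();
  return action();
}
```

The point of the design is that the waiting time is decided by the app's real state, not by the test author's estimate.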
3. Wix Mobile App Architecture and End-to-End Testing
In Wix's mobile app architecture, there are four types of parts: the engine, the UI library (React Native UILib), modules, and other libraries. Each part has its own independent CI process, and when everything looks good, a new version can be published. End-to-end tests are split into different test suites and executed at different stages of the development cycle. The first type of end-to-end tests is production end-to-end, which ensures consistent features and tests on every test execution.
Now that we have a slight overview of how everything is laid out, let's discuss testing. I will not discuss unit tests: we have a lot of them in all the projects and they run all the time, but they are easy and very cheap to run. On the end-to-end side, we differentiate between a few types of tests, split those into a few test suites, and execute them at different times and different stages of the development cycle. Here are the four types of end-to-end tests that we have. The first one is production end-to-end. Those run on a fully functional app with very minimal mocking. Wix uses a lot of experiments and A/B tests all around, and with this type of test we want to make sure that we get the same features on every test execution, because they can differ between runs of the app. So we developed an experiment override mechanism that lets us configure a predefined experiments blob at the beginning of every test. The app is launched with these experiments in effect, ensuring that we never get different behavior while the app is under test.
4. End-to-End Testing and Test Selection
Production end-to-end tests run whenever a module owner wants to GA, that is, publish their work into the full app. Next are mocked end-to-end tests. These run with mocked server endpoints, and they can run on a specific module or on the entire application; we usually run them on a single module. They do not interact with production environments. The test controls the mock server, so we can control the mock server's outputs and test the actual module behavior in predefined states. The big upside is the ability to control all the inputs to the test: consistent inputs from the local mock server guarantee consistent test output. These tests are usually more stable and a bit faster than production end-to-end, and they run on CI on every push to the module's code base.
We also incorporate screenshot testing, mostly in our shared UI library, the UI lib. Detox doesn't have any screenshot comparison mechanism; it just knows how to take screenshots on demand. It can take both device-level and element-level screenshots, to be compared with external libraries. We use Applitools in order to get smart comparisons and avoid false positives on slight pixel variations. It's worth mentioning here that slight pixel variations are not an uncommon issue. They can happen when comparing screenshots taken with two different graphics cards or drivers, for instance. So if you take a baseline in your local dev environment and then run the test on the CI machine, you'll get different screenshots, and a naive comparison can easily fail.
Component testing is another type of testing that we do. Some of the modules test actual React component states. These tests load components with their own state and then switch state and props programmatically throughout the test. This is done on top of Detox with Compot, our React Native component testing library. These run on CI on every push to the module's code base, just like our mocked end-to-end tests.
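To make the "control all the inputs" point concrete, here is a minimal sketch of a fixture table that a local mock server could serve. It is not Wix's mock server; the routes, payloads, and the `respond` helper are invented for illustration, and the function could be mounted behind any HTTP server.

```typescript
// Hypothetical fixture table for a local mock server; paths and payloads are made up.
type MockResponse = { status: number; body: unknown };

const fixtures = new Map<string, MockResponse>([
  ["GET /api/sites", { status: 200, body: [{ id: "site-1", name: "Demo Site" }] }],
  ["GET /api/profile", { status: 200, body: { userId: "user-1", plan: "free" } }],
]);

// Resolve a request against the fixtures. Unknown routes fail loudly so a test
// never silently falls through to a real backend.
function respond(method: string, path: string): MockResponse {
  const hit = fixtures.get(`${method.toUpperCase()} ${path}`);
  return hit ?? { status: 501, body: { error: `no fixture for ${method} ${path}` } };
}
```

Because every response comes from a fixed table, two runs of the same test see exactly the same server behavior, locally and in CI.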
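As an illustration of why a tolerance matters, the sketch below compares two RGBA pixel buffers and ignores small per-channel differences. It is a deliberately simple stand-in, not the algorithm used by any particular visual-testing service; the function names and default thresholds are assumptions.

```typescript
// Sketch of a tolerance-based comparison of two RGBA pixel buffers.
// This is NOT how any particular visual-testing service works.
function mismatchRatio(a: Uint8Array, b: Uint8Array, channelTolerance: number): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("buffers must be RGBA and have identical dimensions");
  }
  let mismatched = 0;
  for (let i = 0; i < a.length; i += 4) {
    let differs = false;
    for (let c = 0; c < 4; c++) {
      // A pixel counts as different only if some channel is off by more than the tolerance.
      if (Math.abs(a[i + c] - b[i + c]) > channelTolerance) differs = true;
    }
    if (differs) mismatched++;
  }
  const pixels = a.length / 4;
  return pixels === 0 ? 0 : mismatched / pixels;
}

// Thresholds are illustrative defaults, to be tuned per project.
function screenshotsMatch(
  a: Uint8Array,
  b: Uint8Array,
  channelTolerance = 3,
  maxMismatchRatio = 0.001,
): boolean {
  return mismatchRatio(a, b, channelTolerance) <= maxMismatchRatio;
}
```

A byte-for-byte comparison would flag the off-by-one channel values that different GPUs or drivers produce, while a tolerance-based check still catches a pixel that genuinely changed.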
5. Stable End-to-End Testing and Configuration
Two important aspects for stable end-to-end testing are test isolation and input consistency. Test isolation ensures that each test starts fresh and is not dependent on previous test execution. Input consistency handles the various inputs and configurations that may affect the app's behavior during different test runs. Detox's configuration files are crucial but fragile, and misconfiguration can lead to subpar results. To address this, we plan to provide a basic shareable configuration with Detox. CI for Mobile remains a challenge, but we have tested it on CircleCI, Travis, and Bitrise with good results.
The second is education, and the latter was mostly directed internally at the Wix engineering team, but some of it also reached our documentation. So I want to go a bit over the second one. Two things are very important for us when we want stable end-to-end tests. The first is test isolation. This is probably one of the most important tips that I can give you. Make sure that every test starts fresh, and make sure it does not depend on the execution of a previous test.
The second one is input consistency. An app might have multiple inputs that change its behavior, such as varying responses from servers. Some applications have experiments and A/B tests that execute different code paths in the app, causing it to behave differently during different runs. So you must make sure that these are handled and configured in your tests, and that they're identical throughout all the iterations of those tests, whether locally or in CI. At Wix, we developed an experiment override mechanism that helps us predefine those experiments and ensure consistent behavior while the app is under test.
So regarding configuration, as Detox becomes more and more mature, we get more features and tighter integration with Jest, which we usually do in order to leverage features available in Jest or new cool features in Detox. That configuration becomes harder to handle. Although the documentation itself is pretty good, it's very hard to keep track of the config. This is actually all the configuration files that we have for Detox in one of our internal projects. As mentioned before, the Wix app is built from many independently developed modules. Each has its own git project and build configuration in CI, and each of these was required to add the Detox configuration files into its own repo. Most of these were similar config files, and during the last six months we came to the realization that it makes zero sense to keep them this way. So we went ahead and unified the configuration for everyone. We created a library that helps us share everything. These configuration files are an extremely important yet fragile part of Detox. It is easy to misconfigure, and whenever it's misconfigured, you get subpar results. For those reasons, we have decided to supply some sort of basic shareable configuration right out of the box with the open source Detox, and we hope to do this sometime in the near future. This is something we already do internally.
CI for mobile is one of those things where it's still pretty hard to nail a right-out-of-the-box solution. We've tried all the major SaaS CI solutions. Running iOS E2E on either CircleCI, Travis, or Bitrise is pretty easy.
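The two rules above, fresh starts and pinned inputs, can be combined in one small helper. This is a sketch under assumptions and not Wix's experiment override mechanism: the experiment names and the `experimentsOverride` launch argument key are invented, and the app would need its own code to read that argument. The `newInstance`, `delete`, and `launchArgs` fields mirror options accepted by Detox's `device.launchApp`.

```typescript
// Hypothetical helper: experiment names and the "experimentsOverride" key are
// invented for illustration; the app under test would need to read them.
type Experiments = Record<string, string | boolean>;

interface LaunchParams {
  newInstance: boolean; // start a fresh app process
  delete: boolean; // reinstall the app so no state leaks between tests
  launchArgs: Record<string, string>;
}

const PINNED_EXPERIMENTS: Experiments = {
  "specs.newCheckoutFlow": true,
  "specs.dashboardVariant": "B",
};

function freshLaunchParams(overrides: Experiments = {}): LaunchParams {
  const experiments: Experiments = { ...PINNED_EXPERIMENTS, ...overrides };
  // Sort keys so the serialized blob is identical across runs and machines.
  const sorted: Experiments = {};
  for (const key of Object.keys(experiments).sort()) {
    sorted[key] = experiments[key];
  }
  return {
    newInstance: true,
    delete: true,
    launchArgs: { experimentsOverride: JSON.stringify(sorted) },
  };
}
```

In a Detox suite this would typically be passed as `await device.launchApp(freshLaunchParams())` from a `beforeEach` hook. Reinstalling on every test costs time, so whether `delete: true` is worth it is a per-project trade-off.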
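The shared-configuration idea mentioned above could look roughly like the sketch below: one base config owned by a shared library, with each module supplying only what differs. This is not the library Wix built. The shape is loosely modeled on a Detox-style configuration, and the field names, device names, and paths are assumptions that should be checked against the Detox documentation for the installed version.

```typescript
// Hypothetical shape loosely modeled on a Detox-style config; verify field
// names against the Detox docs for the version in use.
interface DeviceConfig {
  type: string;
  binaryPath: string;
  build?: string;
  device: { type?: string; avdName?: string };
}

interface SharedConfig {
  testRunner: string;
  configurations: Record<string, DeviceConfig>;
}

// Owned by the shared library; modules never copy this file.
const baseConfig: SharedConfig = {
  testRunner: "jest",
  configurations: {
    "ios.sim.release": {
      type: "ios.simulator",
      binaryPath: "<set per module>",
      device: { type: "iPhone 11" },
    },
    "android.emu.release": {
      type: "android.emulator",
      binaryPath: "<set per module>",
      device: { avdName: "Pixel_API_29" },
    },
  },
};

// Each module passes only its differences; everything else comes from the base.
function withModuleOverrides(
  base: SharedConfig,
  overrides: Record<string, Partial<DeviceConfig>>,
): SharedConfig {
  const configurations: Record<string, DeviceConfig> = {};
  for (const [name, cfg] of Object.entries(base.configurations)) {
    const o: Partial<DeviceConfig> = overrides[name] ?? {};
    configurations[name] = { ...cfg, ...o, device: { ...cfg.device, ...(o.device ?? {}) } };
  }
  return { ...base, configurations };
}
```

A module's config file then shrinks to a single call such as `withModuleOverrides(baseConfig, { "ios.sim.release": { binaryPath: "ios/build/MyModule.app" } })`, which leaves far less room for per-repo misconfiguration.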
6. Optimizing Android Emulators and CI
Android emulators perform better on Mac VMs with nested virtualization, but still lack hardware acceleration. Our current CI solution uses VMware ESXi on Mac Pros with macOS VMs, but performance is slow. Our next-gen setup includes bare metal Mac minis for improved performance. We recently released support for Genycloud SaaS devices, which offer better performance, scalability, and the ability to offload device emulation. However, Genycloud is a paid service. Our main challenge is test execution speed in CI, especially for elaborate end-to-end tests. We aim to automate and speed up the release process.
And the big problem was with Android emulators. The real insight here is that Android emulators run better on Mac VMs than on the Linux ones on offer. On Macs they are guaranteed to run with nested virtualization, which is required for running x86 emulators: we basically run the Android x86 emulator, itself a VM, inside a macOS VM, so it requires nested virtualization. But those VMs still lack hardware acceleration, resulting in poor emulator performance, although it works.
So the current solution for our in-house CI includes VMware ESXi on Mac Pros at MacStadium with macOS VMs. These work for both iOS and Android, as VMware supports nested virtualization. But performance on those machines is still pretty bad. Compilation is slow and test parallelization is somewhat limited: we cannot run a lot of simulators or emulators on the same machine, which limits how much we can decrease the time a run takes.
So our next-gen setup includes bare metal Mac minis that run everything on the host OS. These have much better performance, at least two times faster than the VMs, on an Intel-based Mac mini. Hopefully we can do it even faster with Mac minis with the M1 chip.
On the Android emulator front, we recently released support for Genycloud SaaS devices, and we orchestrate all of those with Detox right out of the box. By using Genycloud emulators, we gain four key benefits. The first is that we drop all prerequisites for the CI machines, as these emulators run remotely on Genycloud infrastructure. The second is that we get better performance per individual emulator. In our case, a 35-minute suite with two workers running Android emulators dropped to 15 minutes with two workers running Genycloud emulators. The third, and maybe the biggest benefit, is the ability to scale infinitely and run as many workers as we want in parallel. So all the SaaS CI options that we talked about now become valid, as they can be used for builds and test orchestration while offloading all the device emulation to Genycloud. The main caveat with Genycloud is the fact that it is a paid service, even for open source projects.
So the holy grail for the release process is to have it all automated and fast. As we continue to solve our problems, we get just a bit closer to the goal with each step. The main issue we're facing today is test execution speed in CI, especially for elaborate production end-to-end suites. The absurd part is that it is faster for our QA engineers to go over the test suite manually than to wait 150 minutes for everything to finish. So they actually do it. This will not stay this way for long, though. With just a bit more work, we'll get much faster test execution with potentially infinite scale.
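A rough way to reason about what more parallel workers buy is to estimate wall-clock time from per-file durations. The sketch below uses a greedy "longest file first, onto the least-loaded worker" heuristic. It is an illustration only, not how Jest or Detox actually schedule tests, and the durations in the usage note are invented to add up to the 150 minutes mentioned in the talk.

```typescript
// Sketch: estimate suite wall-clock time when test files are spread over N workers.
// Greedy heuristic for illustration; not how Jest or Detox schedule work.
function estimateWallClockMinutes(fileMinutes: number[], workers: number): number {
  if (!Number.isInteger(workers) || workers < 1) {
    throw new Error("workers must be a positive integer");
  }
  const loads = new Array<number>(workers).fill(0);
  // Place the longest files first, each onto the currently least-loaded worker.
  for (const minutes of [...fileMinutes].sort((x, y) => y - x)) {
    const idx = loads.indexOf(Math.min(...loads));
    loads[idx] += minutes;
  }
  return Math.max(...loads);
}
```

With hypothetical file durations of `[30, 30, 30, 20, 20, 20]` minutes, one worker needs 150 minutes, two workers need 80, and six workers need 30. Adding workers beyond that does not help, because the run can never be shorter than its longest single file, so splitting large test files matters as much as adding devices.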
7. Detox: Limitations and Learning Resources
Detox is a framework-specific tool primarily used for React Native applications. While it may not be the best choice for native applications or those written in different frameworks, it still has its advantages. When it comes to more complex flows, such as using device features or disabling internet, Detox has limitations, especially on iOS. However, there is documentation available to help users learn and navigate the tool.
Thank you. Thanks for having me, Alex.
What did you think about the poll? I know Detox was at the top there, and I kind of expected Detox to be number one. What do you think about the "other" bit of it?
Okay, so I guess two things. One is that I wonder what the "other" is, of course. The second is that I know that Detox is probably the first choice when it comes to React Native applications, but when it comes to native applications or applications written in different frameworks, then I see a point in using something that is not Detox. So yeah, it's not a total surprise, I guess, but the "other" is very interesting indeed.
That actually makes a lot of sense. The point about Detox being a really framework-specific tool, you could see that from your talk. I've got a bunch of questions from a bunch of people. I'm gonna mangle their names, so I'm not gonna focus on their names. I'm gonna just focus on the questions we've had for you during your talk. Don't forget, if you've got questions for Rotem, you can still go to the talk Q&A channel on Discord and ask a question. The first one was: I started working with Detox recently. It's pretty easy to start via the getting started guide, but what about more difficult flows, for example, using device features like camera, disabling internet, blocking and unblocking the device? There are very few materials regarding Detox compared with other tools. Where and how can I best learn Detox? That was a big question. So let's start small. How do I disable and enable things on the device, like blocking and unblocking the device, with Detox?
Yeah. Okay, so let's see. Regarding iOS, Detox still supports only simulators, and it doesn't really run on devices yet. We have a plan, and what we did with Detox 17 and Detox 18 is prepare ourselves to run on real devices on iOS. Now, that doesn't solve the issue of disconnecting the network and disabling things like that. And actually, I'm not really sure what has this kind of ability. As far as I understand, the only way would be to mock the network layer in some way. On Android it's a bit more achievable, but we didn't really take this into account while developing Detox. So we don't really use those, but if you think that this is a super feature that you really want and think it's very important, then of course, let's have a discussion in our GitHub issues and let's see how we can make this happen.
That actually makes a really good point. I remember from my testing days, we didn't always try to test interaction with the device. The app was the most important thing. The app was the object of our test, right? What happened with the phone while we were using the app wasn't really the purpose of it. So how about materials? Where can people go to find advanced Detox materials?
So I think that, I mean, there's a lot of documentation. And if you go, you'll find pretty good gems over there.
8. Test Stability and Camera Functionality
There is a lot covered in the documentation, but there are also additional resources available on Wix's Medium. A recent blog post explains how to make tests super stable at Wix, including the collection of artifacts during testing. Regarding device and network functionalities, such as camera, Detox does not have any specific issues. If you have a camera module in your app, it should work seamlessly. Detox focuses on testing your own application, not system applications.
I'm pretty sure that there is a lot covered in the documentation. There are also things that I wrote previously on Wix's Medium that I think are a bit different from what's in the documentation. You can go and read those as well. I just recently published, at the beginning of this week, a blog post about how we make our tests super stable internally at Wix. So what are the ground rules for making super stable tests? And I go over all kinds of features that help us achieve that.
That includes all kinds of artifact collection: artifacts that we collect during tests to make sure that we understand what's going on during testing, especially when tests fail. And these are, I think, good materials as well. And you also asked about documentation of, what was that? About device and network, like mocking the network or a bad network, which is something that we don't really have. What were the other things, Alex?
So I think the examples mentioned specifically were device features like camera, disabling internet, and blocking and unblocking the device.
So camera is something that...
I'd be curious about the camera one, yeah.
I don't see any issues specific to camera. If you have a camera module in your app, you can just open it. If you run an emulator, then you can make sure that your emulator is connected to the camera or to a mock camera that the Android emulator provides. If it's Genymotion, it does the same. If it's an actual device, then it can open the camera as well. So if you have a camera module in your app and you open it, it should just work. There is nothing that we disable that would prevent it from working, as far as I understand.
I understand. So how do you deal with things like, for example, my phone has three cameras. It's a new iPhone. It has three cameras. I have to select which camera I want from every app that uses the camera, right? If I use light, for example.
Oh, yeah. There are two types of things here, I guess. One is going to the camera app and taking a picture. Detox usually just concentrates on the application that you let it work with. So it's your application, not other system applications. You don't wanna actually test those system applications. You wanna test your own application.
9. Camera Functionality and Debugging Test Fails
Detox supports fetching pictures taken with other cameras or camera apps, as well as using specific camera modules like React Native camera kits. When debugging test fails, it is important to have consistent inputs and isolated tests. This ensures consistent outputs and makes it easier to identify and debug issues. When it comes to running Detox tests on device farms, there are solutions available for both iOS and Android.
So if you wanna use pictures that were taken with other camera apps, then you can go and fetch those, again, if there is a way in your app to go to the gallery and get them. And the other one would be if you have a specific camera module, like React Native camera kits or something similar, that opens the camera as a view in your application. And with this, I don't think we have any limitation. I think it just works.
Okay, that's really good. That's really good. We've got a bunch of interesting questions. One of them says: what are the best practices for debugging Detox test fails? There's a comment at the end that says it fails very often in our pipeline, whereas running locally is fine. So how do you deal with debugging test fails? What's your best practice for that?
Okay. So I think there is an exact answer to this in the latest blog post that I just published this week. Again, it's about using all the tools that we provide, all those artifact collection tools, and understanding the output of the tests themselves, like what Detox outputs to the log. And there are a few ground rules for making sure that you have a base level of good testing setup, and this is also written over there. But in a nutshell, and this is something that I ended the talk with: have consistent inputs, so you can have consistent outputs. That holds for any test, not only end-to-end tests. You don't wanna have a different time on the clock for every test that you run. You don't wanna have different inputs. That's a problem. And the second one would be: have isolated tests. It will be much, much easier for you to debug issues when they arise, because you know that your test began on a specific screen and not on a screen that you don't really know how you got to, like a residue of a previous test.
That makes a lot of sense. That makes a lot of sense.
There's a question around device farms that support Detox. Someone says: we have to run Detox tests on emulators in our pipeline. Are there any solutions for device farms that support Detox? Running on simulators doesn't provide the same level of confidence as running on real devices.
Right. So this one is a long answer as well. So I'll split it again for iOS and Android.
10. Detox Support for iOS and Android
Detox currently supports simulators for iOS, but work is underway to support real devices. For Android, Detox supports devices connected to the computer, but there is no out-of-the-box support for device farms. However, Detox recently added native support for launching Genymotion emulators, resulting in a significant improvement in test time. While running tests on actual devices has its advantages, running on Genymotion Cloud has proven to be efficient and comparable to real devices. Detox can be used with Expo React Native, and Expo is responsible for providing a test APK for Android.
So for iOS, we still support only simulators, but we're working on support for real devices. This would be done by wrapping with, or at least adding, XCUITest to the mix, so that it would be what launches Detox inside the iOS application. So for iOS, we don't really do that yet. We do have sort of a plan with a big device farm provider, but we can't be sure yet that it will happen, though we really do want it to happen.
And for Android, it does support devices: you can just run Detox tests on every device connected to your computer. I know that it's not the same as running on a device farm, and this is not really supported. There is no device farm that supports Detox out of the box, but we recently added native support for launching Genymotion emulators. These are different from Android emulators. They're also x86, but they run on designated machines on Genymotion Cloud.
So I know that you said that it's not the same as running on real devices, and I agree to some extent, though we didn't really find a lot of issues specific to devices. Most of the issues we find are actually issues with the app, or with specific SDK levels on Android. But we now run on Genymotion Cloud. We started doing it internally at Wix, and this dropped our end-to-end test times. Just taking an example, one of the suites took 35 minutes, and we dropped it to 15 minutes with the same worker count. It was two emulators running on the CI machine, and with Genymotion we were able to run with two workers in Genymotion Cloud in only 15 minutes for the same test suite. So it's like a 60% improvement in test time. I know this is not exactly the answer you wanted to hear, but I just had to talk about this.
To be honest, it actually makes a lot of sense. For example, we were running tests on a lot of PandaBoards instead of rigging up actual devices in racks. We tried that for a while but it wasn't efficient, and PandaBoards gave us 99% the same deal, so it was good.
I've had so many complicated questions for you. I feel bad about picking the complicated ones. So I've got a simple one, your last question of the day, and then you're done with me. I've got a really simple yes or no question. Can Detox be used with Expo React Native?
Okay, yes. We were contacted by Expo a few times in the past. For iOS, I think it should just work with the current setup. There isn't something that should be an issue, as far as I understand. And with Android, it is actually up to Expo to also provide a test APK, a pre-built test APK, like they do with the production APK.
11. Detox Test APK and Expo Integration
The test APK contains Detox and connects to the production app and Node service. Expo was involved in the setup, which requires maintenance for new versions. Setting up in the Expo library should work. The talk was engaging with a Q&A session. Thank you Rotem for the insightful talk and challenging questions.
So the test APK is the APK that actually has Detox inside it. It knows how to connect to the production app and ask it questions, and how to connect to the Node service that we run. We did speak with Expo in the past, I think about two years ago, and it was done, but it's something that needs to be maintained every time we have a new version. That said, we don't really have breaking changes on Android, at least not in the past year, I think.
So whenever someone sets it up in Expo, in the library itself, not in the project, whenever someone from the Expo team sets it up, it should just work.
Thank you, thank you. That actually makes a lot of sense, the picture makes a lot of sense. Thank you Rotem, I've had a blast with your talk. The Q&A session was really engaging. Thank you so much, Rotem. Thank you so much. Hard questions, thank you for those. Yeah, really hard questions, really hard questions. Thank you, thank you so much. Thanks. Thanks. Thanks. Thanks.