Testing in Production



Hi everyone, I'm Talia, and today we're going to talk about how to enable testing in production. We'll cover what testing in production is, how to set it up, and the common pitfalls people run into. This is my contact information, my Twitter and my email, in case you have questions later.
A little bit about me: I'm a developer advocate at Split. Before I joined Split, I was a test engineer, and I worked in QA, automation, and testing for a while. Being a test engineer was really difficult for me, because most of my problems revolved around staging. Staging is a dummy environment, and it isn't the same as production. I'm sure most of you have dealt with these problems too if you've worked with any sort of test environment, QA environment, or anything that's not production. These are the things that made it really hard for me to do my job.
The first problem was data mismatch. The data in staging doesn't match production, which means test results don't always match either. I used to work really hard to make sure I tested every single product requirement. I would go through the documentation with the product owner, work with my developers to fix all the bugs, make sure my end-to-end tests were passing, and then sign off on the feature. And as soon as it launched to production, there would be a bug. It's such a horrible feeling when there's all this pressure on you to make sure your feature works, and you've only been able to verify it in a dummy environment.
The next thing that happened to me, closely related to data mismatch, was configuration drift. Let's say you get paged one night because there's an incident in your app. You look at the logs and identify the problem, but to fix it you have to update a specific configuration in production. So you make the change in production and go back to sleep. Although you fixed the issue, you've just widened the divide between your staging and production environments. That divide is called configuration drift, and staging environments often differ from production precisely because changes made during incident management never make it back to staging. I felt like, what's the point of testing in staging if it's not going to give me the same results as production?
The next problem I had was that staging was really slow. Performance was really bad, and when you write tests against staging, you often have to add waits and sleeps because things take longer to load. For example: click on a button, wait 10 seconds for something to happen, perform this action, wait another 10 seconds. Your users are not going to wait 10 seconds for something to appear; in tech time, that's crazy talk. That's not how my users are going to interact with my features in production, so why make it different in staging?
Nobody cares if staging is down. This was another thing I had to deal with. I would be assigned to test hotfix tickets, which were critical bug fixes that needed to be released to production immediately. I would log into staging to test one, and staging would be down. So I'd ping the DevOps guy, but the DevOps guy would say I needed to open an IT ticket, and then the IT ticket had to be escalated by my manager.
Meanwhile, all I'm trying to do is test this ticket for our product, and nobody seems to care. It's not a priority for anybody. Nobody's going to get a call in the middle of Thanksgiving dinner because staging is down. I was so fed up with a really bad staging environment, a really bad testing experience, and being blamed when things didn't work, and I thought there had to be a better way to test software. My end users are not going to log into staging to use my application. They're going to log into production.
So I did a ton of homework and researched what other companies are doing, and this is what I learned. First, it's the norm for companies to use staging environments, especially companies that still follow waterfall. Second, most companies use more than one staging environment: staging, pre-prod, beta. And third, big-name companies like Google, Facebook, Netflix, and Twitter are all testing in production. When I read that, I thought: what is testing in production? How is that even possible?
Testing in production means testing your features in the environment they will live in, not in a dummy environment like staging. I thought, wow, this is perfect; this is going to solve all of my problems. I also learned that testing in prod doesn't mean you only test in prod. You still use staging for things like GDPR- and SOX-related data and privacy issues. That seemed perfect to me: what I can't test in production, I test in staging, but the critical user flows I can run in production. So how do I do this? What are the steps to get there? The answer was feature flags. A feature flag is basically a way to separate code deployment from feature release.
The idea is that you deploy your code to production behind a feature flag, test it in prod, and then release the feature with the click of a button once it's bug free. Here's roughly what it looks like. Our developers create a feature flag from the UI and target all of our internal teammates. That means that while the flag is off, only the users targeted in the flag have access to the feature. Here you can see devs, testers, product, and design: they're the only ones with access to the new feature while the flag is off, because they're the only ones targeted. The real end users on the right can't see anything related to the feature, because they're not targeted in the flag.
While the flag is off, you go in and test everything: all the functionality, the design, every requirement. If there's a bug, it has no impact on your end users, because they don't have access to the feature. You send the bug back to your development team, they fix it, you test it again, and that cycle continues until you have a bug-free feature in production. Once you know your feature works in production, you turn on the flag already knowing that the feature works and that you didn't break anything existing. Now your users are happy and they're dancing, because they have a perfect feature.
I thought this was all wonderful. Feature flagging is great, but how do you automate it? You can't manually test every feature every time you release, and feature flags add complexity. There are two options for automation, and I'm going to go through both.
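The targeting flow described above can be sketched with a tiny in-memory flag. Everything here (the FeatureFlag class, the user ids, the "on"/"off" treatment strings) is a hypothetical stand-in for a real flag service, not Split's SDK. It shows the states the talk describes: off but targeted at internal users, released to everyone, and turned back off with a kill switch, all without redeploying code.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical in-memory flag, for illustration only (not the Split SDK)."""
    name: str
    released: bool = False                      # False: code is deployed, feature not released
    targeted: set = field(default_factory=set)  # user ids that see the feature while it is off
    killed: bool = False                        # kill switch: forces "off" for everyone

    def treatment(self, user_id: str) -> str:
        """Return "on" or "off" for this user."""
        if self.killed:
            return "off"
        if self.released or user_id in self.targeted:
            return "on"
        return "off"


flag = FeatureFlag("new_checkout", targeted={"dev_alice", "qa_bob", "test_bot_1"})

# While the flag is off, only targeted internal users reach the new code path.
print(flag.treatment("qa_bob"))        # on
print(flag.treatment("customer_123"))  # off

# Releasing is a configuration change, not a new deployment.
flag.released = True
print(flag.treatment("customer_123"))  # on

# If something goes wrong, flip the kill switch instead of redeploying.
flag.killed = True
print(flag.treatment("customer_123"))  # off
```

The kill switch is checked first on purpose, so it wins over both targeting and release.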
The first option is to target your test users and automate the flows with them. When you target your internal users, you also create an automation robot: a test user that's used only to run tests in production. Every time this test user logs in, it has access to the new feature. What's great about this option is that the tests keep running even after you turn the feature flag on, with no additional configuration. The downside is increased fragility: if someone removes that user from the targeting list (the allow list in the flag configuration), your tests will fail. So if you add that user, you have to make sure no one removes it from the configuration.
The second option is to override your feature flags with a custom feature flag abstraction. This means that for each feature you have three tests. In the first test, you simulate the flag being on: for the duration of the test, if anything asks whether the flag is on, you answer yes, and the test runs that way. In the second test, you simulate the flag being off, so any request asking whether the flag is on gets a no. In the third test, you validate that you can go through the entire flow regardless of whether the flag is on or off. With this approach you're very explicit in the test, so the tests become much more self-documenting and descriptive. Whenever a test runs with feature flags, the system under test fakes out every variant in the experiment, and because the variants are faked, you reduce the complexity of the different scenarios, which means faster tests.
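The second option, a custom flag abstraction that fakes the flag's answer for the duration of each test, might look like the sketch below. FlagClient, its override context manager, and the checkout function are hypothetical names invented for illustration. Your flag provider's SDK may offer its own testing hooks, but none are assumed here. The three test functions mirror the three tests described: flag on, flag off, and the flow completing either way.

```python
from contextlib import contextmanager


class FlagClient:
    """Hypothetical wrapper around a flag service that tests can override."""

    def __init__(self, lookup):
        self._lookup = lookup   # callable (flag_name, user_id) -> bool; the "real" answer
        self._overrides = {}    # flag_name -> forced value, set only inside a test

    def is_on(self, flag_name: str, user_id: str) -> bool:
        if flag_name in self._overrides:
            return self._overrides[flag_name]
        return self._lookup(flag_name, user_id)

    @contextmanager
    def override(self, flag_name: str, value: bool):
        """Force a flag's state for the duration of a `with` block, then restore it."""
        had_previous = flag_name in self._overrides
        previous = self._overrides.get(flag_name)
        self._overrides[flag_name] = value
        try:
            yield
        finally:
            if had_previous:
                self._overrides[flag_name] = previous
            else:
                del self._overrides[flag_name]


def checkout(flags: FlagClient, user_id: str) -> str:
    """Hypothetical system under test with one code path per flag state."""
    if flags.is_on("new_checkout", user_id):
        return "order placed (new checkout)"
    return "order placed (legacy checkout)"


flags = FlagClient(lookup=lambda name, user: False)  # pretend the live flag is off


def test_new_checkout_when_flag_on():
    with flags.override("new_checkout", True):
        assert checkout(flags, "test_bot_1") == "order placed (new checkout)"


def test_legacy_checkout_when_flag_off():
    with flags.override("new_checkout", False):
        assert checkout(flags, "test_bot_1") == "order placed (legacy checkout)"


def test_checkout_completes_either_way():
    for state in (True, False):
        with flags.override("new_checkout", state):
            assert checkout(flags, "test_bot_1").startswith("order placed")
```

The test functions run under pytest or can be called directly. Because the override lives in the client, the live flag configuration is never touched. That avoids the first option's dependency on someone leaving the test user in the allow list, at the cost of not exercising the real targeting rules.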
So basically, you're setting the state of the flag for the duration of the test.
When you run tests in production, you also want to make sure they only interact with other testing entities. A lot of people fear this: "I don't want to affect real users in production." So you add a backend flag, something like is_test_user = true or is_test = true, that clearly identifies all of your testing objects in production. That way you can separate real data from test data in your dashboards. Say you're using Datadog or Looker or whatever: you can build a dashboard that puts all the business data coming from test users in one bucket and all the data coming from real users in another. My stakeholders look at the real data, while I, as a tester, and my engineering team look at the test data to see which bugs were caught and what needs to be updated in the tests. Those two need to be separated, and this is how you separate them.
What's also great is that the tests look for specific elements with these test attributes in production. If a test doesn't find that test element in production, it fails and you get alerted. This can be something like an ARIA label or a data attribute, anything that lets you say "this is a test thing" and "this is a real thing."
There are exceptions, though. If your software is integrated with a third party, it can be tricky to test.
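The test-entity marker described above can be sketched as a boolean on every event, plus a partition step that feeds two separate dashboards. The Event class and the is_test field are hypothetical; in practice the marker lives in your backend data model, and the bucketing happens in whatever tool (Datadog, Looker, or similar) builds the dashboards. The require_test_entity guard is an extra illustration of "tests only interact with other testing entities," not something the talk prescribes.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user_id: str
    name: str
    is_test: bool  # hypothetical backend marker set on every entity a test creates


def partition_events(events):
    """Split a stream of events into (real, test) buckets for separate dashboards."""
    real = [e for e in events if not e.is_test]
    test = [e for e in events if e.is_test]
    return real, test


def require_test_entity(event: Event) -> Event:
    """Guard for test code: refuse to act on anything that is not marked as test data."""
    if not event.is_test:
        raise ValueError(f"refusing to touch real user {event.user_id!r} from a test")
    return event


events = [
    Event("customer_123", "purchase", is_test=False),
    Event("test_bot_1", "purchase", is_test=True),
    Event("customer_456", "signup", is_test=False),
]
real, test = partition_events(events)
print(len(real), len(test))  # 2 1
```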
In that case, you can add a unique header to the API requests you send to the third party that says, in effect: "Any request you get with this header is a test, and I want you to treat it differently." Sometimes you have to make exceptions like this when you're testing in prod, such as sending an email confirmation to a specific internal address rather than to the end user. Those changes are worth it when you're testing in a live environment.
A question I get a lot is: how do you know what to test in production? There are two places to start. The first is your product manager. Ask them: what are the most important business flows in our product? Which features give us the most business value, the most revenue? What's the most important thing in our product that has to work all the time? The second is your data analyst or data scientist. Ask them: what are people doing the most? Keep in mind that these are two separate questions. What are people doing the most that, if it breaks, will cause a lot of problems in production? Between those two lists, you should have a really good idea of where to start and which flows to test in prod.
Besides feature flags, there are some other things you need. First, an automation framework. You don't want to run every test manually; you want the process automated, because you need to know when something fails, and you need to know right away. The speed of automation makes that easy. I think that's pretty self-explanatory. You also need a job scheduler to run your tests on a recurring schedule, and I'll go through a couple of my recommendations. You can have two different sets of tests.
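The third-party exception above can be sketched as two small helpers: one that labels outbound requests from test users with a dedicated header, and one that reroutes confirmation emails. The header name X-Test-Request and the inbox address are hypothetical. The header only helps if the third party has agreed to recognize it and treat those requests differently.

```python
TEST_HEADER = "X-Test-Request"       # hypothetical header name agreed with the third party
TEST_INBOX = "qa-inbox@example.com"  # hypothetical internal mailbox for test emails


def third_party_headers(user: dict) -> dict:
    """Headers for an outbound third-party call; test traffic is labeled explicitly."""
    headers = {"Content-Type": "application/json"}
    if user.get("is_test", False):
        headers[TEST_HEADER] = "true"
    return headers


def confirmation_recipient(user: dict) -> str:
    """Send confirmation emails for test users to an internal inbox, never to a real address."""
    if user.get("is_test", False):
        return TEST_INBOX
    return user["email"]


bot = {"id": "test_bot_1", "email": "bot@example.com", "is_test": True}
customer = {"id": "customer_123", "email": "jane@example.com"}

print(third_party_headers(bot))          # {'Content-Type': 'application/json', 'X-Test-Request': 'true'}
print(confirmation_recipient(customer))  # jane@example.com
```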
Your most important tests run every hour, because they're business critical, and your less critical tests run nightly. The next thing you need is an alerting tool, integrated with your job scheduler, that tells you when a test fails: "Hey, this test failed, go figure out what's going on."
These are the tools I've used and recommend for testing in prod. For feature flagging, obviously, I recommend Split. There are other tools, and I'm happy to talk about those. For an automation framework, my absolute favorite is Robot Framework, a keyword-driven automation library; if you haven't checked it out, you should. There's also Cypress, a JavaScript testing framework. I've used a few other JavaScript tools, like Puppeteer, but I've had problems with it in the past, so I wouldn't recommend it. There's also Protractor for Angular applications. But my favorite is Robot; it works with most applications. For your job scheduler there's Jenkins, CircleCI, and Travis, and I don't have a preference among them. For alerting there's PagerDuty and Slack, and you can customize these: use PagerDuty for business-critical alerts, and Slack for warnings or things that look a little weird in the test.
Okay, so we went through the how. That was the entire testing-in-production process from A to Z: how to set it up with feature flags, how to make sure your tests don't interact with real end users, and how to separate that data. And I thought, this makes total sense to me, but if it's so simple, why isn't everybody doing it? Why isn't everybody testing in production? The truth is that people are scared.
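The two test tiers and the alert routing described above can be sketched as plain data plus one routing function. The cron-style schedule strings are ordinary five-field cron expressions you might adapt for a scheduler such as Jenkins. The channel names "pagerduty" and "slack" are hypothetical labels; the actual delivery happens through each tool's own integration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    CRITICAL = "critical"  # business-critical flows
    NIGHTLY = "nightly"    # less critical flows


# Cron-style schedules you might configure in your job scheduler.
SCHEDULES = {
    Tier.CRITICAL: "0 * * * *",  # at minute 0 of every hour
    Tier.NIGHTLY: "0 2 * * *",   # once a day at 02:00
}

# Hypothetical channel names; real routing lives in your alerting tool.
ALERT_CHANNEL = {
    Tier.CRITICAL: "pagerduty",
    Tier.NIGHTLY: "slack",
}


@dataclass
class CheckResult:
    name: str
    tier: Tier
    passed: bool


def route_alert(result: CheckResult) -> Optional[str]:
    """Return where a failure should be reported, or None if the check passed."""
    if result.passed:
        return None
    return ALERT_CHANNEL[result.tier]


print(route_alert(CheckResult("checkout_flow", Tier.CRITICAL, passed=False)))  # pagerduty
print(route_alert(CheckResult("profile_avatar", Tier.NIGHTLY, passed=False)))  # slack
print(route_alert(CheckResult("login_flow", Tier.CRITICAL, passed=True)))      # None
```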
Companies don't test in production because of fear and a lack of trust in their systems, and for the same reason they refuse to invest in the tools and process changes that would build that trust. They're too afraid of the risks. But there are a few things you can do to mitigate the risks of testing in prod.
The first, which we already talked about, is using feature flags. Target your internal teammates and test with them, which is also called dogfooding. Then turn on the feature already knowing that it works and that you didn't break anything existing. Next, you can do a canary release, which is a percentage rollout: you release your feature to a small subset of users before releasing it to your entire user base. If something goes wrong, would you rather 100% of your users hit the issue, or 1%? You can also start with an A/A test, where users both in and out of the feature flag get the same experience, and you check that the data coming in is the same for both groups. This builds your confidence in the feature flagging system. And obviously, start small. Don't start by testing your most complex flow in prod; start with something simple.
The outcome of testing in prod is that you can release faster, because you just press a button and your feature is released. You don't have to go through an entire release cycle, because your code is already deployed; you've separated deployment from release. You also get increased developer velocity, because your developers spend more time building new features and less time fixing bugs. All of this leads to increased confidence and a happier team.
If I haven't convinced you that this is a good idea, think of the last feature your team deployed. Is it working? Right now? In production? How do you know?
Your users haven't reported anything to you, so you don't know. Testing in production is the only way to know that your features are working in production right now. Often, shifting your company's testing culture is the hardest part of this process, and getting over that fear is a really big part of it. So my suggestion is to start using feature flags: go to split.io, sign up for a free developer account, and see whether feature flags work for you. In case you haven't been paying attention at all for the past 20 minutes, I want you to take away two things. First, nobody cares if your features are working in staging; we care whether they work in production. Second, the only way to know whether they work in production is to test them in production. Thank you so much for listening. I'm here for questions, and you can follow me on Twitter or send me an email.
Thank you so much for that talk, and thanks for taking the time to chat with us. We have some questions from our audience. Are you ready to jump in?
Yeah, I'm ready. Let's do it.
All right. RDM asked: Talia, how do you and your team deal with critical bugs that impact the whole site or page?
You mean after we've done the whole testing-in-production process and we still have a bug in production? If that happens and you're using a canary release, it'll only affect a small percentage of your users. But more importantly, if you use feature flags to test the feature in production ahead of time, you're not going to have these big production issues anyway, because you've tested in the environment the feature will live in. You won't have those surprises. And if you do, feature flags have something called a kill switch: with the click of a button, you turn the feature off, and you don't have to redeploy any code.
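A canary release, which both the talk and this answer rely on, needs a stable way to pick the small subset of users who get the feature first. One common approach, assumed here and not claimed to be how Split does it, is to hash the flag name and user id into a bucket from 0 to 99 and expose the feature to users whose bucket falls below the rollout percentage. The flag and user names are hypothetical.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in 0..99 for this flag (same inputs, same bucket)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_canary(flag_name: str, user_id: str, percent: int) -> bool:
    """True if the user's bucket falls inside the current rollout percentage."""
    return bucket(flag_name, user_id) < percent


users = [f"user_{i}" for i in range(10_000)]
one_percent = {u for u in users if in_canary("new_checkout", u, 1)}
ten_percent = {u for u in users if in_canary("new_checkout", u, 10)}

# Expect roughly 1% and 10% of users; widening the rollout only adds users.
print(len(one_percent), len(ten_percent))
print(one_percent <= ten_percent)  # True
```

Because buckets are deterministic, a user who was in the 1% canary stays in as the rollout grows to 10% and then 100%, so nobody flips back and forth between experiences.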
You don't have to revert anything in GitHub. You just press a button and the feature is off, so the damage is very minimal when you're using feature flags.
That's awesome. I can think of all the ways I would use that. The application I work on, for example, has a bunch of bugs that only show up for specific accounts, and they're super hard to test locally because you'd have to replicate all of the conditions. That seems super useful.
To Cren asked: In my experience, environments are as much about company structure as they are about tech. Do you agree? If so, what kind of organizational changes do you think would be necessary to facilitate this approach?
That's a really good question. In terms of the organization, there are a couple of things that need to be in place. First, your team needs a solid automation framework. I talked about this a little in my talk: you need a solid testing practice with automation in place. You can't just start testing in production without any automation set up, and that's a big part of a company's testing culture. Second, people need to want this to happen. If people on your team are really against testing in production and don't understand the value, use those two things I suggested at the end. Use examples from your past: "Remember when we tested this thing in staging and it worked perfectly, and as soon as we launched to production there was this issue?" Or think of times when your staging environment was down and you had to test something. And if you haven't gone to split.io yet, you can create a free developer account and start using our SDKs.
It's super useful, and I also have a ton of tutorials on there. So that's where I would start. I will also say that there are always going to be people who say testing in production will never work and you have to use staging. Usually it's the people who have been at the company for 20 years and don't really like change. Ignore those few people. For the most part, this is a really innovative practice, and if it's done correctly, the benefits are endless.
Yeah, for sure. It's always difficult to drive organizational change and change people's minds, especially if they've been burned by the thing you're trying to do.
Yeah, testing in production is much different now than it was years ago, because we have the tools that let us do it safely. We're not just throwing code into production and seeing what happens. It's very secure and very planned.
Yeah, you can do it now. Tom asks: do you remove flags after a feature is released and working?
Yes. I actually wrote a whole blog post on when to sunset and deprecate a feature flag. Basically, the use case you're using a flag for tells you when to remove it. For testing in production, once a feature is completely released to 100% of your population and you know it's working, you can remove the flag. You don't want old feature flags lingering in your code base.
Right, of course. William asked a question that I think is really interesting. I work with a lot of libraries that we have to update across products, so William's question jumped out at me. He asked: how do you manage dependencies between feature flags?
When you target your automation bots inside your feature flags, you just make sure you target the same bot in every flag you need. Say you have user flow one and user flow two: they're two different features, but they depend on each other to work. I would target my test user in both feature flags, so that when that user runs the automation for both flags, you'll know if anything fails, because it's the same user running the tests.
Nice, that's great. Yousef said: great talk. I agree, it was great. Is this practice usually paired with trunk-based development?
Yes, it is.
OK. Thomas asked: we're evaluating feature flag services now, and we're looking for one that supports multiple dimensions for describing users. For example, we'd like to change flags for specific users, across all free-tier versus pro-tier customers, or even globally across every user. Does Split support this?
Sorry, can you repeat that?
Yeah, it was a long question. Thomas is evaluating feature flag services, and they're looking for one that supports multiple dimensions for describing users. For example, they'd like to change flags for specific users, across all free-tier versus pro-tier customers, or even globally across every user. They're asking if Split supports this.
Yes. With Split, you can segment your user base into different categories: free users in one segment, paid users in another. Then you can add dynamic configurations, so that one set of users sees one thing and another set sees something else, and you configure it however you like. As long as you create the segments of users you need, that's totally possible in Split.
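The last two answers can be sketched together: segment-based configuration (free versus pro, with a global default) and a check that one shared automation bot is targeted in every flag of a dependent flow. The Flag class, segment names, config keys, and flags_missing_bot helper are hypothetical and only illustrate the idea. Split exposes segments and dynamic configurations through its own UI and SDKs, whose API is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Flag:
    """Hypothetical flag: per-segment configs, a global default, and an allow list."""
    name: str
    by_segment: dict = field(default_factory=dict)  # segment name -> dynamic config
    default: dict = field(default_factory=dict)     # config for everyone else
    allow_list: set = field(default_factory=set)    # user ids targeted explicitly

    def config_for(self, segment: str) -> dict:
        """Config for a segment, falling back to the global default."""
        return self.by_segment.get(segment, self.default)


def flags_missing_bot(bot_id: str, dependent_flags: list) -> list:
    """Names of dependent flags that do not target the shared automation bot."""
    return [f.name for f in dependent_flags if bot_id not in f.allow_list]


upsell = Flag(
    "upsell_banner",
    by_segment={"free": {"show_banner": True}, "pro": {"show_banner": False}},
    default={"show_banner": False},
)
print(upsell.config_for("free"))  # {'show_banner': True}

flow_one = Flag("user_flow_one", allow_list={"test_bot_1"})
flow_two = Flag("user_flow_two", allow_list=set())
print(flags_missing_bot("test_bot_1", [flow_one, flow_two]))  # ['user_flow_two']
```

A check like flags_missing_bot could run before the scheduled tests, so a missing target shows up as a configuration error instead of a confusing test failure.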
If you haven't logged in to split.io, I'd really suggest creating a free account. We have many different SDKs you can use, and I'm happy to answer questions about the different tutorials.
That's awesome. Thank you so much, Talia, for joining us, and thanks for that awesome talk.
Thank you. Bye.
Bye.
29 min
24 Jun, 2021
