1. Introduction to Testing in Production
Today we're going to talk about how to enable tests in production, including what testing in production is, how to set it up, and common pitfalls. As a former test engineer, I faced challenges with staging environments and data mismatch. The data in staging doesn't always match production, leading to test results that don't align. Configuration drift also creates a divide between staging and production, making testing in staging less reliable. Additionally, staging environments often have slow performance, which doesn't accurately reflect user interactions in production.
Hi, everyone, I'm Talia and today we're going to talk about how to enable tests in production. We're going to talk about what testing in production is, how to set it up and common pitfalls that people usually run into. This is my contact information, my Twitter and my email, in case you guys have questions later.
But a little bit about me, I'm a developer advocate at Split and I used to be a test engineer and I worked in QA and automation and testing for a while before I joined Split. Being a test engineer was really difficult for me because most of the problems that I had revolved around staging and using this dummy environment and staging isn't the same as production. So I would have so many problems and these are some of the problems that I dealt with that I'm sure most of you have dealt with too. If you've dealt with any sort of test environment, any sort of QA environment, anything that's not production. These are some of the things that made it really hard for me to do my job.
So the first problem was data mismatch. The data in staging doesn't match production, which means test results don't always match. So I used to work really hard on making sure I tested every single product requirement, and I would go through the documentation with the product owner and I worked with my developers to fix all the bugs, make sure my end-to-end tests were passing, and then I would sign off on the feature. And as soon as it launched to production, there would be a bug. And it's such a horrible feeling when there's all this pressure on you to make sure that your feature works in a dummy environment. And then the next thing that happened to me, along with data mismatch, was something called configuration drift. Let's say that you get paged one night because there's an incident for your app, and you look at the logs and you identify the problem, but in order to fix it, you have to update a specific configuration in production. So you make the change in production and you go back to sleep. And although you fixed the issue, you've just created an even bigger divide between your staging and your production environments. This divide is called configuration drift. And many times staging environments are not the same as production because of changes made during incident management, which just adds to the configuration drift. And I felt like, what's the point of testing in staging if it's not gonna give me the same results as production?
The next problem I had was that staging was really slow. There was just really bad performance. And a lot of times when you're writing tests in staging, you have to add waits and sleeps because things take longer to load. For example: click on a button, wait 10 seconds for something to happen, perform this action, wait another 10 seconds for something to happen. Your user is not gonna wait 10 seconds for something to appear. In tech time, that's crazy talk. So that's not how my users are gonna interact with my features in production. So why make that different in staging? Nobody cares if staging is down.
2. Testing Challenges and the Solution: Feature Flags
I faced challenges with a bad staging environment and a bad testing experience. Testing in production means testing features in the environment they will live in, not in a dummy environment like staging. Big companies like Google, Facebook, Netflix, and Twitter are testing in production. Feature flags separate code deployment from feature release, so a feature can be released with the click of a button once it is bug-free.
Another thing that I had to deal with is that I would be assigned to test different issues, different hotfix tickets, and these were just critical bug fixes that needed to get released to production immediately. So I would log into staging to test it, but staging would be down. So I have to ping the DevOps guy. But the DevOps guy says, you need to open an IT ticket. And then the IT ticket has to get escalated by my manager. And meanwhile, all I'm trying to do is test this ticket for our product, and nobody seems to care. It's not a priority for anybody. Nobody's going to get a call in the middle of Thanksgiving dinner if staging is down.
And I was so fed up with dealing with a really bad staging environment and a really bad testing experience and being blamed when things didn't work. And I thought there has to be a better way to test software. My end users are not going to log into staging to use my application. They're going to log into production. So I did a ton of homework and I researched what other companies are doing. And this is what companies are doing. It's the norm for companies to use staging environments, especially companies that are still waterfall.
The next thing is that most companies use more than one staging environment. So staging, preprod, beta, most companies have more than one. And big name companies like Google, Facebook, Netflix, Twitter, they're all testing in production. And when I read that, I thought, what is testing in production? Like how is that possible? What do you mean, testing in production? So testing in production means testing your features in the environment that your features will live in, not using a dummy environment like staging, and I thought, wow, this is so perfect. This is going to solve all of my problems. And I also learned that testing in prod doesn't mean you only test in prod, so you're still going to use staging for GDPR and SOX-related data and privacy issues, and I thought, this is perfect, because what I can't test in production, I would just test in staging, but those critical user flows, I can run those in production. And I thought this is great. Like, how do I do this? What are the steps to get there? And the answer was feature flags. And a feature flag is basically just a way to separate code deployment from feature release. And the idea here is you deploy your code to production behind a feature flag, test it in prod, and then release the feature with the click of a button as soon as it's bug free.
So, how does it work? This is kind of what it looks like. Our developers would create a feature flag from the UI, and then target all of our internal teammates. And what that means is that while the flag is off, only the users who are targeted inside of the feature flag will be able to access the feature. So, here you can see devs, testers, product, design. Only they are going to have access to this new feature while the feature flag is off, because they're the only ones who are targeted.
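As a rough illustration of the targeting just described, here is a minimal Python sketch. It is not Split's SDK: the `FeatureFlag` class, the `new_checkout` flag name, and the example email addresses are all hypothetical, and a real flagging service would keep this state remotely rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical in-memory flag, for illustration only."""
    name: str
    enabled: bool = False                           # "on" releases the feature to everyone
    allow_list: set = field(default_factory=set)    # users targeted while the flag is off

    def is_on_for(self, user_id: str) -> bool:
        # Targeted users see the feature even while the flag is off.
        return self.enabled or user_id in self.allow_list


# Devs and testers are targeted; real end users are not.
flag = FeatureFlag("new_checkout", allow_list={"dev@example.com", "tester@example.com"})


def render_checkout(user_id: str) -> str:
    # The new code is already deployed; the flag decides who can reach it.
    return "new checkout" if flag.is_on_for(user_id) else "old checkout"
```

With this shape, releasing the feature is just setting `flag.enabled = True`; no new deployment is involved.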
3. Testing Features and Turning on Flags
Testing everything while the feature flag is off ensures that any bugs found won't impact end users. The development team can fix the bugs and retest until the feature is bug-free in production. Turning on the flag after testing ensures that the feature is working without breaking anything existing.
These people on the right, these real end users, they can't see anything related to the feature because they're not targeted in the feature flag. And so, while the feature flag is off, you go in and you test everything. So, you test all of your functionality, you test your design, you go through all the requirements, make sure everything works. If there's a bug, it has no impact on your end users, because, again, they don't have access to it, they're not targeted. So, when there's a bug, you send it back to your development team, they fix it, you test it again, and that process will continue until you have a bug free feature in production. And then, once you know your feature is working in production, you can turn on the flag, already knowing that your features are working in production 100%, and you didn't break anything that was existing. Now your users are happy and they're dancing because they have a perfect feature.
4. Automating Feature Flag Testing
Feature flagging is a great process, but how do you automate it? There are two options: target test users and automate flows with them, or override feature flags and create a custom feature flag abstraction. Targeting test users requires creating an automation robot that runs tests in production. The downside is increased fragility if the user is removed from the configuration. Overriding feature flags involves simulating the flag on and off in separate tests, ensuring the entire flow works regardless of the flag state. Tests in production should only interact with testing entities, separating real data from test data using a backend flagging system.
And I thought, this is all wonderful. This is such a great process. Feature flagging is great. But how do you automate it? You can't manually test every feature every time you release. And with feature flags, you have this added complexity. So how do you automate it?
And there's two options here for automation, and I'm gonna go through both. So the first option is that you target your test users and automate the flows with them. So what that means is, when you target your users, you also create an automation robot, just a test user that's gonna be used to run these tests in production. So every time this test user logs in, they have access to this new feature. And what's great about this option is that the tests will continue to run even when you turn the feature flag on, you won't have to do any additional configuration. The only downside to this approach is that there is increased fragility, because if someone removes that user from the targeting list or from the allow list in the feature flag configuration, then your tests are gonna fail. So you just have to make sure that if you add that user, no one is gonna remove it from the configuration.
The next option is to override your feature flags and make a custom feature flag abstraction. So basically what this means is that for each feature, you have three tests. In the first test, you simulate the feature flag on, and for the duration of this test, if the test comes in and says, hey, is the feature flag on? You say yes. And then you run the test that way. In the second test, you simulate the feature flag off. And if any requests come in from the test asking if the feature flag is on, you say no and you run the test that way. And then in the last test, you want to validate that you can go through the entire flow, regardless of whether the flag is on or off. And so with this approach, you're very explicit in the test and the test becomes much more self-documenting and descriptive. So whenever any test runs using feature flags, the system under test is going to fake out all the variants in the experiment, and because it's fake, you're going to reduce the complexity of the different scenarios, which means faster tests. So basically what you're doing is you're setting the state of the flag for the duration of the test.
And then when you run your tests in production, you want to make sure that your tests only interact with other testing entities, right? This is something a lot of people fear: I don't want to affect real people and real users in production. So what you do is you have a back-end flagging system, something like is test user equals true, is test equals true, something that clearly identifies all of your testing objects in production, and that way you separate real data from test data in your data dashboard. So let's say you're using Datadog or Looker or whatever. When your data comes in, you can create a dashboard in Datadog or Looker or whatever you're using and say, all of the data that's coming in for test users is going to be in this bucket, and all the data that's coming in for real users is going to be in this bucket. And that's how you can differentiate between real data and test data. So my stakeholders are going to look at the real data, while I as a tester and my engineering team are going to look at the test data and see if there are any bugs that were caught, see what needs to be updated in the tests. Those two need to be separated, and this is how you separate them. And what's great about this is that the tests are looking for specific elements with these specific test attributes in production.
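One way the override approach could look, as a sketch under stated assumptions: `FlagClient`, its `override` helper, the `checkout` function, and the `new_checkout` flag are hypothetical names invented here, and `_ask_service` is only a stand-in for whatever call the real flagging service provides.

```python
from contextlib import contextmanager


class FlagClient:
    """Hypothetical flag abstraction that the application queries for flag state."""

    def __init__(self):
        self._overrides = {}  # flag name -> forced state, set only by tests

    def is_on(self, flag_name):
        if flag_name in self._overrides:
            return self._overrides[flag_name]
        return self._ask_service(flag_name)

    def _ask_service(self, flag_name):
        # Stand-in for a call to the real flagging service (assumed, not shown).
        return False

    @contextmanager
    def override(self, flag_name, state):
        """Force a flag on or off for the duration of one test, then undo it."""
        self._overrides[flag_name] = state
        try:
            yield self
        finally:
            self._overrides.pop(flag_name, None)


flags = FlagClient()


def checkout(total):
    # Hypothetical system under test: both paths must charge the same amount.
    flow = "new" if flags.is_on("new_checkout") else "old"
    return {"flow": flow, "charged": total}


def test_flag_on():
    with flags.override("new_checkout", True):
        assert checkout(20)["flow"] == "new"


def test_flag_off():
    with flags.override("new_checkout", False):
        assert checkout(20)["flow"] == "old"


def test_flow_completes_in_either_state():
    for state in (True, False):
        with flags.override("new_checkout", state):
            assert checkout(20)["charged"] == 20
```

Each test states the flag state it depends on, which is what makes it self-documenting, and the override is removed when the test ends so it cannot leak into the next one.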
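To make the bucketing concrete, here is a small Python sketch. The `is_test_user` field mirrors the marker mentioned in the talk; the `Event` type and the `split_by_origin` helper are assumptions made for illustration, and in practice this filtering would more likely be a dashboard query in Datadog or Looker than application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical analytics event emitted by the application."""
    user_id: str
    action: str
    is_test_user: bool = False  # set to True for automation robots


def split_by_origin(events):
    """Return (real, test) so each audience only sees its own bucket."""
    real = [e for e in events if not e.is_test_user]
    test = [e for e in events if e.is_test_user]
    return real, test


events = [
    Event("customer-1", "purchase"),
    Event("qa-bot", "purchase", is_test_user=True),
    Event("customer-2", "signup"),
]
real_events, test_events = split_by_origin(events)
```

Stakeholder dashboards would read only `real_events`, while the testing team would watch `test_events` for failures.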
5. Testing Elements and Tools
When testing in production, it's important to identify test elements and distinguish them from real elements. Exceptions may be necessary when integrating with third-party software. To determine what to test in production, consult with product managers to identify critical business flows and data analysts to understand user behavior. In addition to feature flags, automation frameworks, job schedulers, and alerting tools are essential for efficient and effective testing in production.
So if the test doesn't find that test thing in production, it's going to fail, and you're going to get alerted. So this can be something like an ARIA label or a data attribute, just something that you can say, this is a test thing and this is a real thing.
There are exceptions though. If your software is integrated with a third party, it can be tricky to test. You can create a unique header in the API request that you send to the third party and say, hey, any request that you get with this header is a test and I want you to treat it in this other way. So sometimes you have to make exceptions when you're testing in prod, maybe send an email confirmation to a specific place rather than to the end user. Sometimes you have to make those changes, but it's worth it when you're testing in a live environment.
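A sketch of those two exceptions in Python. The header name `X-Test-Request` and the QA inbox address are hypothetical; the actual header would have to be agreed with the third party, and nothing here reflects a specific vendor's API.

```python
TEST_HEADER = "X-Test-Request"      # hypothetical header name agreed with the third party
QA_INBOX = "qa-inbox@example.com"   # hypothetical mailbox for test confirmations


def build_headers(is_test: bool) -> dict:
    """Headers for an outgoing third-party request; test traffic is marked."""
    headers = {"Content-Type": "application/json"}
    if is_test:
        headers[TEST_HEADER] = "true"
    return headers


def confirmation_recipient(user_email: str, is_test: bool) -> str:
    """Route confirmation emails for test runs to a QA inbox, not an end user."""
    return QA_INBOX if is_test else user_email
```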
A question I get a lot of times is, how do you know what to test in production? There's two places to start. The first one is to go to your product person, go to your product manager and ask them, what are the most important business flows in our product? So, which features give us the most business value? What gives us the most revenue? What's the most important thing for our product that we need to make sure that this works all the time? The next place is to go to your data analyst, your data scientist and figure out what are people doing the most? And keep in mind these are two separate things. So, what are people doing the most, that if it breaks, you know, you're going to have a lot of issues. You're going to have a lot of problems in production. So, between those two lists, you should have a really good idea of where to start and which flows to test in prod.
And then besides feature flags, there's some other dependencies that you need. So, you're going to need an automation framework. You don't want to manually run every test. You want to have that process automated because you need to know when something fails and you need to know right away. And the speed of automation makes that really easy. I think that's pretty self-explanatory. You also need a job scheduler. And I'll go through a couple of my recommendations. But you need a job scheduler to run your tests incrementally. And you can have two different sets of tests. So, your most important tests run every hour because they're business critical. And you can have nightly tests that run every night because they're less critical. The next thing you need is an alerting tool to alert you when your tests fail. Just an alerting tool that can be integrated with your job scheduler that says, you know, hey, this test failed, go figure out what's going on.
So these are the recommended tools that I've used for testing in prod. So, for feature flagging, obviously, I recommend Split. There are other tools, and I'm happy to talk about those.
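The scheduler-plus-alerting loop can be sketched in a few lines of Python. This is only an illustration: `run_suite`, the `SCHEDULE` mapping, and the suite names are hypothetical, the `alert` callback stands in for an integration with a real alerting tool, and the actual timing would come from the job scheduler rather than from this code.

```python
from typing import Callable, Dict, List

# Hypothetical cadence per suite; a real job scheduler would own this timing.
SCHEDULE = {"business_critical": "hourly", "less_critical": "nightly"}


def run_suite(tests: Dict[str, Callable[[], None]],
              alert: Callable[[str], None]) -> List[str]:
    """Run every test, alert on each failure, and return the failed test names."""
    failures = []
    for name, test in tests.items():
        try:
            test()
        except Exception as exc:  # any error in a production check is worth an alert
            failures.append(name)
            alert(f"{name} failed: {exc!r}")
    return failures


def login_works():
    assert True  # placeholder for a real end-to-end check


def checkout_works():
    assert 1 + 1 == 3, "totals do not match"  # deliberately failing example
```

Calling `run_suite({"login": login_works, "checkout": checkout_works}, alert=print)` would print one alert for the failing check and return its name.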
6. Automation Frameworks and Mitigating Risks
But for an automation framework, my absolute favorite is Robot Framework. It works with most applications. For your job scheduler, there's Jenkins, CircleCI, and Travis. For your alerting tools, there's PagerDuty and Slack. Companies don't test in production because of fear and lack of trust in their systems. Mitigate the risks of testing in prod by using feature flags, a canary release, and an A/A test. Start small and separate deployment from release.
OK. So, we went through the how, and this was the entire testing in production process. We went from A to Z, how to set it up, how to make sure that your tests don't interact with real end-users, how to differentiate that data, how to set it up with feature flags, and I thought, this makes total sense to me. But if it's so simple, why isn't everybody doing this? Why isn't everybody testing in production? And the truth is that people are scared. Companies don't test in production because of this fear and lack of trust in their systems, and for the same reason, they refuse to invest in the tools and process changes that are going to generate that trust. They're too afraid of the risks, and there's a few things you can do to mitigate the risks of testing in prod.
So the first one that we talked about is using feature flags. So target your internal teammates, test with them, this is also called dogfooding, and then turn on the feature already knowing that your feature works and you didn't break anything that was existing. The next thing you can do is a canary release, which is just a percentage rollout, and it allows you to release your feature to a small subset of users before you release it to your entire user base, because if something goes wrong, would you want 100% of your users to encounter the issue or 1%? The next thing you can do is start with an A/A test, which means you give both sets of users, in and out of the feature flag, the same experience and make sure that the data that's coming in is the same for both. And what this is going to do is build your confidence in the feature flagging system. And obviously, start small. Don't start out with your most complex flow and decide to test in prod. You want to start with something simple. And the outcome of testing in prod is you can release faster, because you just press a button and your feature is released, you don't have to go through an entire release cycle. Because your code is ready. You just separate deployment from release.
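A percentage rollout like the canary release described above is often implemented by hashing each user into a stable bucket. The sketch below assumes that approach; the talk does not say how any particular flagging tool does it, and `bucket` and `in_rollout` are hypothetical helper names.

```python
import hashlib


def bucket(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_rollout(user_id: str, flag_name: str, percentage: int) -> bool:
    """True if the user falls inside the first `percentage` buckets."""
    return bucket(user_id, flag_name) < percentage
```

Because the bucket depends only on the user and the flag name, a user who is in the 1% canary stays in as the percentage grows, so nobody flips back and forth between the old and new experience during the rollout.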
7. Benefits of Testing in Production
Increased developer velocity leads to increased confidence and team happiness. Testing in production is the only way to know if features are working in production. Shifting the testing culture is the hardest part of the process.
And the next thing is that you have increased developer velocity. So, your developers spend more time creating new features and less time fixing bugs. And this just leads to an increased confidence and increased team happiness. And if I haven't convinced you that this is a good idea, I would like everyone to think of the last feature that your team deployed. Is it working right now in production? How do you know? Your users haven't reported anything to you, so you don't know. Testing in production is the only way to know that your features are working in production right now. And oftentimes, shifting your company's testing culture is the hardest part of this process, so getting over that fear is a really big part of this.
8. Using Feature Flags and Handling Bugs
Start using feature flags and test in production to ensure your features work in the real environment. Feature flags allow you to test features ahead of time and prevent big production issues. If a bug occurs, feature flags have a kill switch to turn off the feature instantly. This minimizes damage and avoids code redeployment. Using feature flags is a great way to handle bugs that are difficult to replicate in local testing. Company structure plays a significant role in creating testing environments. Organizational changes may be necessary to facilitate a testing in production approach.
What I would suggest is start using feature flags, go to Split.io, click on free developer account and you can start using feature flags and see if it works for you. And in case you haven't been paying attention at all for the past 20 minutes, I want you to take away two things. The first is that nobody cares if your features are working in staging. We care if it works in production and the only way to know if it's working in production is to test it in production.
Thank you guys so much for listening and I'm here for questions. You can follow me on Twitter, send me an email and thank you.
Thank you so much for that talk. Thanks for taking the time to chat with us. We do have some Q&A from our audience. You ready to jump in?
Yeah, I'm ready. Let's do it.
Alright. So, RDM asked, Talia, how do you and your team deal with critical bugs that impact the whole site or page?
So, you mean like after we do the whole testing in production process and we have a bug in production? So, if that happens and you're using a canary release, it'll only affect a small percentage of your users. But what's more important is if you use feature flags to test that feature in production ahead of time, you're not gonna have these big production issues anyway. You'll be able to test in the environment that the feature will live in. So, you won't have those surprises. You won't have those big production issues. And if you do, feature flags have something that's called a kill switch, where you just turn the feature off. It's like a click of a button and you just turn the feature off. And you don't have to redeploy any code. You don't have to revert anything in GitHub. You press the button and the feature is off. So, the damage is very, very minimal when you're using feature flags.
That's awesome. That's a great way to... I can just think of all the ways that I would use it, especially, the application I work on, for example, has a bunch of bugs that show up for specific accounts that are super hard to test locally because you have to replicate all of the conditions. Yeah. That seems super useful. To Kran, I asked, environments in my experience are as much about company structure as they are about tech. Do you agree? If so, what kind of organizational changes do you think would be necessary to facilitate this approach?
That's a great question.
9. Organizational Requirements and Benefits
In terms of organization, having a solid automation framework and testing practice in place is crucial. It's important to address any resistance to testing in production by using examples from the past and emphasizing the value. Ignore those who oppose the practice due to fear or resistance to change. Testing in production is an innovative approach with endless benefits.
That's a really good question. So, in terms of the organization, I feel like there has to be a couple things that need to be in place. So, the first thing is your team needs to have a solid automation framework. And I talked about this a little bit in my talk, but you need to have a solid testing practice in place with automation in place. You can't just start testing in production and not have any automation setup. So, that's like a big part of the company's testing culture.
Another thing is people need to want this to happen. If you have people on your team who are really against testing in production and they really don't understand the value, use examples from your past. Ask them, remember when we tested this thing in staging and it worked perfectly, and then as soon as we launched to production there was this issue? Or think of times where your staging environment was down and you had to test something. And then also, if you haven't gone to split.io, you can create a free developer account and start using our SDKs. It's super useful. I also have a ton of tutorials on there. So yeah, that's where I would start. But I also will say there's always going to be people who say testing in production will never work and you have to use staging, and usually it's those old people at the companies who've been there for like 20 years and don't really like change. So ignore those few people. For the most part, this is a really innovative practice. And if done correctly, I mean, the benefits are just endless.
10. Testing in Production and Feature Flag Management
Testing in production is much different now than it was years ago. We have the tools that enable us to do it safely. Removing flags after a feature is released and working depends on the use case. Managing dependencies between feature flags involves targeting the same automation bots. Testing in production is usually paired with trunk-based development. Feature flag services that support multiple dimensions for describing users are being evaluated.
Yeah, for sure. Yeah, it's always difficult to drive that organizational change, change people's minds, especially if they've been burned by the thing that you're trying to do. But testing in production is much different now than it was years ago.
Yeah, because we have the tools that enable us to do it safely. We're not just, you know, throwing code into production and like, okay, let's see what happens. It's very secure, very planned. Yeah, you can do it now.
Tom asks, do you remove flags after a feature is released and working?
Yes. I actually wrote a whole blog post on when to sunset a feature flag and when to deprecate a feature flag. So, basically, it depends on the use case that you're using the flag for, and that's how you know when to remove the flag. But for testing in production, once a feature is completely released, and it's released to 100% of your population and you know it's working, then you can remove the flag. And you don't want to have old feature flags in your code base.
Right. Of course. William asked, and this one, I think, is really interesting. I work with a lot of libraries that we have to update across products, so William's question jumped out to me. He asked, how do you manage dependencies between feature flags?
Okay, so basically, when you're targeting your automation bots inside of your feature flags, you just make sure that you target the same bot in the different feature flags that you need. So let's say you have user flow 1 and user flow 2, and they're two different features, but they're dependent on each other to work. So what I would do is I would target my test user in both feature flags, so that when that user runs the automation for both flags, you'll know if anything fails, because it's the same user that's running the tests.
Nice. That's great. Giuseppe said great talk. I agree. It was great. Is this practice usually paired with trunk-based development?
Yes. Yes, yes, it is.
Okay. Thomas asked, we're evaluating feature flag services now, and we're looking for one that supports multiple dimensions for describing users. Dimensions in parentheses.
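The "same bot in every dependent flag" rule can be checked before a test run. This is a sketch only: the allow lists would really come from the flagging service's configuration, and `missing_targets`, `qa-bot`, and the flag names are hypothetical.

```python
def missing_targets(bot_id, allow_lists, required_flags):
    """Return the flags in `required_flags` where the bot is not targeted."""
    return [
        flag_name
        for flag_name in required_flags
        if bot_id not in allow_lists.get(flag_name, set())
    ]


# Hypothetical snapshot of flag configuration: flag name -> targeted users.
allow_lists = {
    "user_flow_1": {"qa-bot", "dev@example.com"},
    "user_flow_2": {"dev@example.com"},  # someone removed the bot here
}

problems = missing_targets("qa-bot", allow_lists, ["user_flow_1", "user_flow_2"])
```

Running a check like this first also guards against the fragility mentioned earlier, where tests fail only because someone removed the test user from a flag's targeting list.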
11. Feature Flag Services and User Segmentation
Thomas is evaluating feature flag services and wants to know if Split supports changing flags for specific users across different tiers or globally. Split allows you to segment the user base into different categories and add dynamic configurations for each segment. Creating different user segments and configuring them in Split is possible. Talia suggests creating a free account on split.io to explore the available SDKs and ask questions about tutorials.
For example, we'd like to change flags for specific users across all free tiers versus pro tier customers or globally across every user. Does Split support this?
Sorry, can you repeat that?
Yeah. It was a long question. So it sounds like Thomas is evaluating feature flag services and they are looking for one that supports multiple dimensions for describing users. For example, we'd like to change flags for specific users across all free tiers versus pro tier customers or even globally across every user, and they're asking if Split supports this.
Yes. So with Split, what you can do is segment your user base into different categories. So you can say free users in one segment, paid users in another segment, and then you can add dynamic configurations to say, you know, for these users, I want this displayed, and for these users, I want this displayed. And you can configure it however you like. So as long as you create the different segments of users that you need, yeah, that's totally possible in Split. I would really suggest, if you haven't logged in to split.io, create a free account. We have so many different SDKs you can use, and I'm happy to answer questions if you guys have questions about the different tutorials.
That's awesome. Thank you so much, Talia, for joining us and thanks for that awesome talk.
Thank you.