How do you know your feature is working perfectly in production? If something breaks in production, how will you know? Will you wait for a user to report it to you? What do you do when your staging test results do not reflect current production behavior? In order to test proactively as opposed to reactively, try testing in production! You will have an increased accuracy of test results, your tests will run faster due to the elimination of bad data, and you will have higher confidence before releases. This can be accomplished through feature flagging, canary releases, setting up a proper CI/CD pipeline, and data cleanup. You will leave this talk with strategies to mitigate risk, to better your understanding of the steps to get there, and to shift your company’s testing culture, so you can provide the best possible experience to your users. At the end of the day, we don't care if your features work in staging, we care if they work in production.
Using Feature Flags to Enable Testing in Production
AI Generated Video Summary
Today's Talk discusses enabling tests in production, including challenges with staging environments, the use of feature flags for testing, and automating feature flag testing. It also covers running tests in production while ensuring no impact on real users, determining what to test in production, recommended tools and dependencies, and mitigating risks. The importance of testing in production and shifting the testing culture is emphasized, along with the need for a solid automation framework and managing feature flag dependencies.
1. Enabling Tests in Production
Hi, everyone. Today we'll discuss enabling tests in production, including what it is, how to set it up, and common pitfalls. As a former test engineer, I faced challenges with staging environments that differed from production. Data mismatch and configuration drift caused issues, and staging was slow with poor performance. Staging downtime hindered critical bug fixes. Let's explore these problems and their impact on testing.
Hi, everyone. I'm Talia, and today we're going to talk about how to enable tests in production. We're going to talk about what testing in production is, how to set it up, and common pitfalls that people usually run into.
So, this is my contact information, my Twitter and my e‑mail, in case you guys have questions later. But a little bit about me. I'm a developer advocate at Split. And I used to be a test engineer, and I worked in QA and automation and testing for a while before I joined Split. And being a test engineer was really difficult for me, because most of the problems that I had revolved around staging and using this dummy environment, and staging isn't the same as production. So, I would have so many problems, and these are some of the problems that I dealt with that I'm sure most of you have dealt with too. If you've dealt with any sort of test environment, any sort of QA environment, anything that's not production, these are some of the things that made it really hard for me to do my job.
So, the first problem was data mismatch. The data in staging doesn't match production, which means test results don't always match. I used to work really hard on making sure I tested every single product requirement. I would go through the documentation with the product owner, work with my developers to fix all the bugs, and make sure my end-to-end tests were passing. Then I would sign off on the feature, and as soon as it launched to production, there would be a bug. It's such a horrible feeling when there's all this pressure on you to make sure that your feature works in a dummy environment.
The next thing that happened to me with data mismatch was something called configuration drift. Let's say that you get paged one night because there's an incident for your app. You look at the logs and you identify the problem, but in order to fix it, you have to update a specific configuration in production. So you make the change in production and you go back to sleep. Although you fixed the issue, you've just created an even bigger divide between your staging and your production environments. This divide is called configuration drift, and many times, staging environments are not the same as production because of changes made during incident management, which just widens the drift. And I felt like, what's the point of testing in staging if it's not gonna give me the same results as production?
The next problem I had was that staging was really slow. There was just really bad performance. When you're writing tests in staging, you often have to add waits and sleeps because things take longer to load. For example, click on a button, wait 10 seconds for something to happen, perform this action, wait another 10 seconds for something to happen. Your user is not going to wait 10 seconds for something to appear. You know, in tech time, that's crazy talk. So that's not how my users are going to interact with my features in production. So why make that different in staging?
Nobody cares if staging is down. This is another reason, another thing that I had to deal with is that I would be assigned to test different issues, to test different hotfix tickets. And these were just critical bug fixes that needed to get immediately released to production. So I would log into staging to test it, but staging would be down. So I have to ping the DevOps guy. But the DevOps guy says you need to open an IT ticket and then the IT ticket has to get escalated by my manager.
2. Testing in Production and Feature Flags
And meanwhile, all I'm trying to do is test this ticket for our product. My end users are not going to log into staging to use my application. So I did a ton of homework and I researched what other companies are doing. The first thing is that it's the norm for companies to use staging environments. Most companies use more than one staging environment. Big name companies like Google, Facebook, Netflix, Twitter, they're all testing in production. Testing in production means testing your features in the environment that your features will live in. I also learned that testing in prod doesn't mean you only test in prod. You're still going to use staging for GDPR and SOX related data and privacy issues. The answer was feature flags. A feature flag is basically just a way to separate code deployment from feature release. How does it work? Our developers would create a feature flag from the UI and then target all of our internal teammates.
And meanwhile, all I'm trying to do is test this ticket for our product. And nobody seems to care. It's not a priority for anybody. Nobody is going to get a call in the middle of Thanksgiving dinner because staging is down. And I was so fed up with dealing with a really bad staging environment and a really bad testing experience and being blamed when things didn't work. And I thought there has to be a better way to test software.
My end users are not going to log into staging to use my application. They're going to log into production. So I did a ton of homework and I researched what other companies are doing. And this is what I learned. So the first thing is that it's the norm for companies to use staging environments, especially companies that are still waterfall. The next thing is that most companies use more than one staging environment. So staging, pre-prod, beta. Most companies have more than one. And big name companies like Google, Facebook, Netflix, Twitter, they're all testing in production. And when I read that, I thought, what is testing in production? Like, how is that possible? What do you mean, testing in production? So testing in production means testing your features in the environment that your features will live in, not using a dummy environment like staging. And I thought, wow, this is so perfect. This is going to solve all of my problems.
And I also learned that testing in prod doesn't mean you only test in prod. So you're still going to use staging for GDPR and SOX related data and privacy issues, and I thought, this is perfect, because what I can't test in production, I would just test in staging. But those critical user flows, I can run those in production. And I thought, this is great. Like, how do I do this? What are the steps to get there?
And the answer was feature flags. A feature flag is basically just a way to separate code deployment from feature release. The idea here is you deploy your code to production behind a feature flag, test it in prod, and then release the feature with a click of a button as soon as it's bug-free. So how does it work? This is kind of what it looks like. Our developers would create a feature flag from the UI and then target all of our internal teammates. What that means is that only the users who are targeted in the feature flag will have access to the feature while the flag is off.
So here you can see devs, testers, product design. Only they are going to have access to this new feature while the feature flag is off, because they're the only ones who are targeted. These people on the right, these real end users, can't see anything related to the feature because they're not targeted in the feature flag.
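As a rough illustration of the targeting idea described above, here is a minimal Python sketch. It is not the API of Split or any other flagging service: the `FeatureFlag` class, its fields, and the user IDs are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical in-house flag, not the API of any real flagging service."""
    name: str
    enabled: bool = False  # turning this on releases the feature to everyone
    targeted_users: set[str] = field(default_factory=set)  # internal allow list

    def is_on_for(self, user_id: str) -> bool:
        # Targeted users see the feature even while the flag is off.
        return self.enabled or user_id in self.targeted_users


flag = FeatureFlag("new_checkout", targeted_users={"dev_ana", "tester_bo"})

print(flag.is_on_for("tester_bo"))    # internal teammate, targeted
print(flag.is_on_for("customer_42"))  # real end user, not targeted

flag.enabled = True                   # the "click of a button" release
print(flag.is_on_for("customer_42"))
```

The code is deployed for everyone, but only the allow list can reach the new path until `enabled` is flipped, which is the separation of deployment from release that the talk describes.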
3. Automating Feature Flag Testing
While the feature flag is off, you test all functionality, design, and requirements. Bugs have no impact on end users as they are not targeted. If a bug is found, it is sent back to the development team for fixing and retesting. Once the feature is working in production, the flag can be turned on, ensuring existing features are not broken. Automating feature flag testing has two options: targeting test users and automating flows or overriding feature flags with custom abstractions. Both options have their pros and cons, but they provide faster and more self-documenting tests.
And so while the feature flag is off, you go in and you test everything. So you test all of your functionality, you test your design, you go through all the requirements, make sure everything works. If there's a bug, it has no impact on your end users because, again, they don't have access to it. They're not targeted. So when there's a bug, you send it back to your development team. They fix it. You test it again and that process will continue until you have a bug-free feature in production.
And then once you know your feature is working in production, you can turn on the flag already knowing that your features are working in production 100% and you didn't break anything that was existing. And now your users are happy and they're dancing because they have a perfect feature and I thought, this is all wonderful. This is such a great process. Like feature flagging is great, but how do you automate it? You can't possibly manually test every feature every time you release. And with feature flags, you have this added complexity, right? So how do you automate it? And there's two options here for automation. And I'm going to go through both.
So the first option is that you target your test users and automate the flows with them. What that means is that when you target your users, you also create an automation robot, just a test user that's going to be used to run these tests in production. So every time this test user logs in, they have access to this new feature. And what's great about this option is that the tests will continue to run even when you turn the feature flag on; you won't have to do any additional configuration. The only downside to this approach is that there is increased fragility, because if someone removes that user from the targeting list or from the allow list in the feature flag, then your tests are going to fail. So you just have to make sure that, once you add that user, no one is going to remove it from the configuration.
The next option is to override your feature flags and make a custom feature flag abstraction. Basically what this means is that for each feature you have three tests. In the first test you simulate the feature flag on. For this test's duration, if you get any request asking if the feature flag is on, so if the test comes in and says, hey, is the feature flag on, you say yes, and then you run the test that way. In the second test, you simulate the feature flag off. If any requests come in from the test asking if the feature flag is on, you say no, and you run the test that way. And then in the last test, you want to validate that you can go through the entire flow, regardless of whether the flag is on or off. With this approach, you're very explicit in the test, and the test becomes much more self-documenting and descriptive. So whenever any test runs using feature flags, the system under test is gonna fake out all the variants in the experiment. And because it's fake, you're gonna reduce the complexity of the different scenarios, which means faster tests. So basically what you're doing is you're setting the state of the flag for the duration of the test.
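The three-test pattern for the override option can be sketched as follows. This is one possible shape for such an abstraction, assuming an in-memory flag lookup; `is_enabled`, `flag_override`, and `checkout` are hypothetical names, and a real setup would wrap whatever flag client the application actually uses.

```python
from contextlib import contextmanager

# Hypothetical in-memory flag state standing in for a real flag client.
_FLAGS = {"new_checkout": False}
_OVERRIDES = {}


def is_enabled(name: str) -> bool:
    # An override wins, so the test decides which variant it sees.
    return _OVERRIDES.get(name, _FLAGS.get(name, False))


@contextmanager
def flag_override(name: str, value: bool):
    """Pin a flag to one state for the duration of a single test."""
    had_previous = name in _OVERRIDES
    previous = _OVERRIDES.get(name)
    _OVERRIDES[name] = value
    try:
        yield
    finally:
        # Restore whatever was there before so tests do not leak state.
        if had_previous:
            _OVERRIDES[name] = previous
        else:
            _OVERRIDES.pop(name, None)


def checkout(cart_total: float) -> str:
    # System under test: both code paths must complete the purchase.
    if is_enabled("new_checkout"):
        return f"new flow charged {cart_total:.2f}"
    return f"old flow charged {cart_total:.2f}"


def test_flag_on():
    with flag_override("new_checkout", True):
        assert checkout(10).startswith("new flow")


def test_flag_off():
    with flag_override("new_checkout", False):
        assert checkout(10).startswith("old flow")


def test_flow_completes_either_way():
    for state in (True, False):
        with flag_override("new_checkout", state):
            assert checkout(10).endswith("charged 10.00")


for test in (test_flag_on, test_flag_off, test_flow_completes_either_way):
    test()
print("all three flag tests passed")
```

Because each test names the flag state it runs under, the test body documents which variant it covers, and nothing depends on how the flag happens to be configured in the live system.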
4. Running Tests in Production
When running tests in production, it's important to ensure that they only interact with testing entities and not affect real users. This can be achieved through a back-end flagging system that separates real data from test data. By using specific test attributes, tests can identify and differentiate between test and real data in production. However, there may be exceptions when testing software integrated with third parties, requiring special handling. Despite these challenges, testing in production is valuable for identifying bugs and improving the quality of the software.
And then when you run your tests in production, you want to make sure that your tests only interact with other testing entities, right? This is something a lot of people fear, is that, you know, I don't want to affect real people and real users in production. So what you do is you have a back-end flagging system. Something like is test user equals true, is test equals true, something that clearly identifies all of your testing objects in production and that way you separate real data from test data in your data dashboard.
So let's say you're using Datadog or Looker or whatever. When your data comes in, you can create a dashboard in whichever tool you're using, and you can say that all of the data coming in for test users goes in this bucket, and all of the data coming in for real users goes in this other bucket. That's how you can differentiate between real data and test data. So my stakeholders are going to look at the real data, while I as a tester and my engineering team are going to look at the test data and see if there are any bugs that were caught, and see what needs to be updated in the tests. Those two need to be separated, and this is how you separate them.
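To make the two buckets concrete, here is a small sketch in plain Python rather than in any dashboard tool. The event shape and the `is_test` field name are assumptions for illustration; the talk only says the marker should be something like "is test equals true".

```python
from collections import defaultdict

# Hypothetical events; the "is_test" field name is an assumption.
events = [
    {"user": "customer_1", "action": "purchase", "is_test": False},
    {"user": "automation_bot", "action": "purchase", "is_test": True},
    {"user": "customer_2", "action": "signup", "is_test": False},
]


def bucket_events(events):
    """Split incoming events into a 'real' bucket and a 'test' bucket."""
    buckets = defaultdict(list)
    for event in events:
        key = "test" if event.get("is_test", False) else "real"
        buckets[key].append(event)
    return buckets


buckets = bucket_events(events)
print("real:", [e["user"] for e in buckets["real"]])
print("test:", [e["user"] for e in buckets["test"]])
```

Stakeholder dashboards would then read only the real bucket, while the testing team reads only the test bucket.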
And what's great about this is that the tests are looking for specific elements with these specific test attributes in production. So if the test doesn't find that test thing in production, it's going to fail and you're going to get alerted. So this can be something like an ARIA label or a data attribute, just something that you can say, this is a test thing and this is a real thing. There are exceptions though. If your software is integrated with a third party, it can be tricky to test. You can create a unique header in the API request that you send to the third party and say, hey, any requests that you get with this header is a test, and I want you to treat it in this other way. So sometimes you have to make exceptions when you're testing in prod, maybe send an email confirmation to a specific place rather than to the end user. Sometimes you have to make those changes, but it's worth it when you're testing in production, when you're testing in a live environment.
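The third-party exceptions mentioned here could look roughly like this. Both the header name `X-Test-Request` and the QA inbox address are invented for this sketch; a real integration would use whatever marker the third party agrees to honor.

```python
TEST_HEADER = "X-Test-Request"        # hypothetical header agreed with the third party
TEST_INBOX = "qa-inbox@example.com"   # hypothetical mailbox for test confirmations


def build_headers(user: dict) -> dict:
    """Add the test marker to outgoing third-party requests for test users only."""
    headers = {"Content-Type": "application/json"}
    if user.get("is_test"):
        headers[TEST_HEADER] = "true"
    return headers


def confirmation_recipient(user: dict) -> str:
    # Test confirmations go to a dedicated inbox instead of a customer address.
    return TEST_INBOX if user.get("is_test") else user["email"]


bot = {"email": "bot@example.com", "is_test": True}
customer = {"email": "sam@example.com", "is_test": False}

print(build_headers(bot))
print(confirmation_recipient(bot))
print(confirmation_recipient(customer))
```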
5. Determining What to Test in Production
To determine what to test in production, consult your product manager to identify the most important business flows and revenue-generating features. Additionally, work with your data analyst to understand user behavior and prioritize testing for critical areas. These insights will guide you in selecting which flows to test in production.
A question I get a lot of times is how do you know what to test in production. There's two places to start. The first one is to go to your product person, go to your product manager and ask them, what are the most important business flows in our product? Which features give us the most business value? What gives us the most revenue? What's the most important thing for our product that we need to make sure that this works all the time? The next place is to go to your data analyst, your data scientist and figure out what are people doing the most? Keep in mind, these are two separate things. What are people doing the most that if it breaks, you're going to have a lot of issues, you're going to have a lot of problems in production. In those two lists, you should have a really good idea of where to start and which flows to test in prod.
6. Dependencies and Recommended Tools
Besides feature flags, you'll need an automation framework to automate the testing process. A job scheduler is necessary for running tests incrementally, with critical tests running every hour and nightly tests for less critical ones. An alerting tool integrated with the job scheduler can notify you of test failures. Recommended tools include Split for feature flagging and Robot Framework for automation. Other options like Cypress and Protractor for Angular applications are available. Job schedulers like Jenkins, CircleCI, Travis, and Maven are also commonly used.
Then, besides feature flags, there's some other dependencies that you need. You're going to need an automation framework. You don't want to manually run every test. You want to have that process automated, because you need to know when something fails and you need to know right away. The speed of automation makes that really easy. I think that's pretty self-explanatory.
You also need a job scheduler, and I'll go through a couple of my recommendations. You need a job scheduler to run your tests incrementally. You can have two different sets of tests. Your most important tests that run every hour, because they're business critical, and then you can have nightly tests that run every night, because they're less critical.
7. Mitigating Risks of Testing in Production
For alerting tools, you can use PagerDuty for critical alerts and Slack for warnings or anomalies. Testing in production is not common due to fear and lack of trust in systems. Mitigate risks by using feature flags, conducting canary releases, and starting with A/A tests. Testing in production allows for faster releases, increased developer velocity, and improved team confidence. Consider the last feature your team deployed and ask yourself whether you would know it is working in production without waiting for user reports.
For your alerting tools, there's PagerDuty, Slack. And again, you can customize these. To say, for those business critical alerts, I want to use PagerDuty. And for those maybe warnings or things that look weird in the test, use Slack.
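The routing rule described here is simple enough to sketch. This example only decides which channel a failure belongs to; it makes no PagerDuty or Slack calls, and the `FailedCheck` type and channel names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailedCheck:
    test_name: str
    business_critical: bool


def choose_channel(failure: FailedCheck) -> str:
    # Returns a channel name only; wiring it to a real alerting tool is left out.
    return "pagerduty" if failure.business_critical else "slack"


failures = [
    FailedCheck("checkout_flow", business_critical=True),
    FailedCheck("profile_avatar_upload", business_critical=False),
]
for failure in failures:
    print(failure.test_name, "->", choose_channel(failure))
```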
OK, so we went through the how. And this was the entire testing in production process. We went from A to Z: how to set it up, how to make sure that your tests don't interact with real end users, how to differentiate that data, how to set it up with feature flags. And I thought, this makes total sense to me. But if it's so simple, why isn't everybody doing this? Why isn't everybody testing in production? And the truth is that people are scared. Companies don't test in production because of this fear and lack of trust in their systems. And for the same reason, they refuse to invest in the tools and process changes that are going to generate that trust. They're too afraid of the risks. And there are a few things you can do to mitigate the risks of testing in prod. The first one that we talked about is using feature flags. So target your internal teammates and test with them. This is also called dogfooding. And then turn on the feature, already knowing that your feature works and you didn't break anything that was existing.
The next thing you can do is a canary release, which is just a percentage rollout. It allows you to release your feature to a small subset of users before you release it to your entire user base. Because if something goes wrong, would you want 100% of your users to encounter the issue or 1%? The next thing you can do is start with an A/A test, which means you give both sets of users, in and out of the feature flag, the same experience and make sure that the data that's coming in is the same for both. What this is gonna do is build your confidence in the feature flagging system. And obviously start small. Don't start out with your most complex flow and decide to test it in prod. You want to start with something simple. The outcome of testing in prod is that you can release faster, because you just press a button and your feature is released. You don't have to go through an entire release cycle, because your code is already deployed; you just separate deployment from release. The next thing is that you have increased developer velocity. Your developers spend more time creating new features and less time fixing bugs. And this just leads to increased confidence and increased team happiness. And if I haven't convinced you that this is a good idea, I would like everyone to think of the last feature that your team deployed. Is it working? Right now? In production? How do you know? Your users haven't reported anything to you, so you don't know.
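A percentage rollout needs each user to land consistently inside or outside the canary group. The talk does not say how that assignment is made; the sketch below assumes a common approach, hashing the user ID into a stable bucket from 0 to 99, and the function names are hypothetical.

```python
import hashlib


def bucket(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket in 0..99 (the hashing scheme is an assumption)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_canary(user_id: str, flag_name: str, percentage: int) -> bool:
    # Users whose bucket falls below the rollout percentage get the feature.
    return bucket(user_id, flag_name) < percentage


users = [f"user_{i}" for i in range(10_000)]
for pct in (1, 10, 100):
    exposed = sum(in_canary(u, "new_checkout", pct) for u in users)
    print(f"{pct}% rollout exposes {exposed} of {len(users)} users")
```

Because the bucket depends only on the flag name and user ID, a user who is in the 1% group stays in the group as the percentage grows, so widening the rollout never takes the feature away from someone who already has it.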
8. Testing in Production and Dealing with Bugs
Testing in production is the only way to know that your features are working in production right now. Shifting your company's testing culture is the hardest part. Start using feature flags and see if it works for you. Nobody cares if your features are working in staging. We care if they work in production. Thank you for listening. I'm here for questions. How do you deal with critical bugs? If you use feature flags to test features in production ahead of time, you are much less likely to have big production issues. Feature flags also give you a kill switch to turn features off easily.
Testing in production is the only way to know that your features are working in production right now. And oftentimes, shifting your company's testing culture is the hardest part of this process. So getting over that fear is a really big, big part of this.
So what I would suggest is start using feature flags, go to split.io, click on free developer account, and you can start using feature flags and see if it works for you. And in case you haven't been paying attention at all for the past 20 minutes, I want you to take away two things. The first is that nobody cares if your features are working in staging. We care if it works in production. And the only way to know if it's working in production is to test it in production.
So thank you guys so much for listening. And I'm here for questions. You can follow me on Twitter, send me an email. And thank you.
Thank you so much for that talk. Thanks for taking the time to chat with us. We do have some Q&A from our audience. You ready to jump in? Yeah, I'm ready. Let's do it.
Alright. So RDM asked, Talia, but how do you and your team deal with critical bugs that impact the whole site or page? So you mean like after we do the whole testing in production process and we have a bug in production. So if that happens and you're using a canary release, it'll only affect a small percentage of your users. But what's more important is if you use feature flags to test that feature in production ahead of time, you're not going to have these big production issues anyway. You'll be able to test in the environment that the feature will live in. So you won't have those surprises, you won't have those big production issues. And if you do, feature flags have something called a kill switch, where you just turn the feature off. It's like the click of a button and you just turn the feature off. And you don't have to redeploy any code. You don't have to revert anything in GitHub. It's just you press a button and the feature is off. So the damage is very, very minimal when you're using feature flags. That's awesome.
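The reason a kill switch needs no redeploy is that the flag state is read at request time rather than baked into the build. A minimal sketch of that idea, with a hypothetical `flag_store` dictionary standing in for a flagging service:

```python
# Hypothetical flag state that the application reads on every request.
flag_store = {"new_checkout": {"released": True, "killed": False}}


def checkout_page(store: dict) -> str:
    state = store["new_checkout"]
    # The kill switch wins over the release state.
    if state["released"] and not state["killed"]:
        return "new checkout"
    return "old checkout"


print(checkout_page(flag_store))
flag_store["new_checkout"]["killed"] = True  # the "click of a button"
print(checkout_page(flag_store))
```

The same deployed code serves both responses; only the stored flag state changes.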
9. Testing in Production and Organizational Changes
Testing in production requires a solid automation framework and a testing culture. Use examples from past experiences to show the value. Ignore those resistant to change. Testing in production is different now with the tools available.
That's a great way to... I can just think of all the ways that I would use, especially sometimes... Yeah, if you have... The application I work on, for example, has a bunch of bugs that show up for specific accounts that are super hard to test in local because you're going to have to test... Replicate all of the conditions. Yeah. That seems super useful.
Tukran asked, environments in my experience are as much about company structure as it is about tech. Do you agree? If so, what kind of organizational changes do you think would be necessary to facilitate this approach?
That's a great question. That's a really good question. In terms of the organization, I feel like there are a couple of things that need to be in place. The first thing is, your team needs to have a solid automation framework. I talked about this a little bit in my talk, but you need to have a solid testing practice in place with automation in place. You can't just start testing in production and not have any automation set up. That's a big part of the company's testing culture. Another thing is, people need to want this to happen. If you have people on your team who are really against testing in production and they really don't understand the value, use those two things that I suggested at the end. Use examples from your past. Ask them, remember when we tested this thing in staging and it worked perfectly, and then as soon as we launched to production, there was this issue? Think of times when your staging environment was down and you had to test something. And then also, if you haven't gone to split.io, you can create a free developer account and start using our SDKs. It's super useful. I also have a ton of tutorials on there. So yeah, that's where I would start. But I also will say, there's always gonna be people who say testing in production will never work and you have to use staging, and usually it's like those old people at the companies who've been there for 20 years and don't really like change. So ignore those few people. For the most part, this is a really innovative practice and if done correctly, the benefits are just endless. Yeah, for sure. Yeah, it's always difficult to drive that organizational change, change people's minds, especially if they've been burned by the thing that you're trying to do. But testing in production is much different than it was years ago. Yeah, and it's because we have the tools that enable us to do it safely.
Like we're not just throwing code into production and like, okay, let's see what happens.
10. Removing Flags and Managing Dependencies
When to remove a feature flag depends on the use case. For testing in production, once a feature is completely released and working, the flag can be removed. Managing dependencies between feature flags involves targeting the same user in different flags to ensure interdependent features work properly. Feature flagging is commonly used in conjunction with trunk-based development. Split supports multiple dimensions for describing users, allowing for targeted flag changes across different user segments.
It's very secure, it's very planned. Yeah, yeah, yeah, you can do it now.
Tom asks, do you remove flags after a feature is released and working? Yes. Yes, so I actually wrote a whole blog post on when to sunset a feature flag and when to deprecate a feature flag. So basically, the use case that you're using a flag for tells you when to remove the flag. But for testing in production, once a feature is completely released, it's released to 100% of your population, and you know it's working, then you can remove the flag. And you don't wanna have old, old feature flags in your code base. Right, of course, yeah.
William asked, this one I think is really interesting. I work with a lot of libraries that we have to update across products. William's question jumped out to me. He asked how do you manage dependencies between feature flags? Okay, so basically, when you're targeting your automation bots inside of your feature flags, you just make sure that you target the same bot in the different feature flags that you need. So let's say you have user flow one and user flow two and they're two different features, but they're dependent on each other to work. What I would do is target my test user in both feature flags, so that when that user runs the automation for both flags, you'll know if anything fails, because it's the same user that's running the tests. Nice, that's great.
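One way to guard against a bot being targeted in one flag but not its dependency is a quick pre-flight check before the automation runs. This is a sketch under assumed names: `flag_targets` is a hypothetical snapshot of each flag's allow list, not something exported by a particular flagging tool.

```python
# Hypothetical snapshot of each flag's allow list.
flag_targets = {
    "user_flow_one": {"automation_bot", "dev_ana"},
    "user_flow_two": {"automation_bot"},
    "user_flow_three": {"dev_ana"},
}


def missing_targets(user_id: str, required_flags: list[str], targets: dict) -> list[str]:
    """Return the flags in which the user is not targeted."""
    return [name for name in required_flags if user_id not in targets.get(name, set())]


# Flows one and two depend on each other, so the same bot must be in both.
print(missing_targets("automation_bot", ["user_flow_one", "user_flow_two"], flag_targets))

# A flag the bot was never added to (or was removed from) shows up here.
print(missing_targets("automation_bot", ["user_flow_one", "user_flow_three"], flag_targets))
```

Running a check like this first also turns the fragility mentioned earlier, someone removing the test user from an allow list, into a clear message instead of a confusing test failure.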
Yousef V, or Yousef, said, great talk. I agree, it was great. Is this practice usually paired with trunk-based development? Yes, yes, yes, it is.
Okay, Thomas asked, we're evaluating feature flag services now, and we're looking for one that supports multiple dimensions for describing users. "Dimensions" is in parentheses, I think. For example, we'd like to change flags for specific users across all free tiers versus pro-tier customers or even globally across every user. Does Split support this? Sorry, can you repeat that? Yeah, it was a long question. So it sounds like Thomas is evaluating feature flag services, and they are looking for one that supports multiple dimensions for describing users. For example, we'd like to change flags for specific users across all free tiers versus pro-tier customers or even globally across every user. And they're asking if Split supports this. Yeah, so with Split, what you can do is segment your user base into different categories. So you can say free users in one segment, paid users in another segment, and then you can add dynamic configurations to say for these users I want this displayed and for those users I want that displayed. And you can configure it however you like. So as long as you create the different segments of users that you need, that's totally possible on Split. I would really suggest, if you haven't logged in to split.io, create a free account. We have so many different SDKs you can use, and I'm happy to answer questions if you guys have questions about different tutorials.
That's awesome. Thank you so much Talia for joining us and thanks for that awesome talk. Thank you. Bye. Bye. Bye, bye.