Most of us have heard that tests should be isolated, composable, or deterministic, but what does that mean in practice? How do you write a good test, and how does the rest of your codebase change once you do? What effect does it have on your developer experience? In this talk I'll walk through a handful of properties good tests have, show how we can write tests that follow these guidelines in JavaScript, and discuss when to consider bending the rules a bit.
Test Kitchen: A Recipe for Good Tests
AI Generated Video Summary
Iris, a software engineer, shares her cooking journey and invites others to share recipes on Twitter. She discusses testing guidelines, emphasizing the importance of automated tests that inspire confidence and run all the time. Iris provides tips for faster and more effective testing, including running tests in parallel and focusing on behavior. She also highlights the importance of making tests robust, readable, and maintainable. Finally, Iris emphasizes the value of testing, predictive tests, and audience preferences in software development.
1. Introduction to Iris and her cooking journey
I'm Iris, a software engineer from Austria, currently living in London. During the pandemic, I've been cooking a lot and trying different recipes from around the world. If you have any interesting recipes or dishes from your home country, please share them with me on Twitter.
My name is Iris, which is Greek for rainbow. I'm a software engineer from Austria, but I've lived kind of all over Europe, and I currently call London my home. And the reason I chose a cooking-related title for my talk is that I haven't done much else since the pandemic started. So from homemade pasta, bread, and pizza, to theme nights like West Africa, Sri Lanka, Mexico, or my own Vamana, I've pretty much made it all over the last year. So if you have any interesting recipes or anything from your home country that you'd like to share with me, please send them to me on Twitter, my handle is right up there.
2. Testing Guidelines and Automation
And when I don't cook, I develop products for our users at Spotify. We are one of the world's biggest audio streaming services with over 70 million tracks and almost 350 million monthly active users. Let's dive into the guidelines. Kent Beck lists 12 properties of tests. Tests should inspire confidence, enabling you to refactor your application without fear. Tests should be automated and run without human intervention, allowing teams to work autonomously and iterate quickly. Automated tests should run all the time, from development to deployment.
And when I don't cook, I develop products for our users at Spotify. In case you're not familiar with Spotify, we are one of the world's biggest audio streaming services with over 70 million tracks, including 2.2 million podcast titles. And with almost 350 million monthly active users, you can imagine the quality and testing practices are taken very seriously here.
So let's dive right into the guidelines. Do you remember the slide? Where did I get all of those adjectives from? The truth is, I have not spent nearly enough time in this space to come up with a comprehensive list of properties a test should have. So I'll instead leave that up to someone who has, like Kent Beck. In a Medium post in 2019, Kent lists 12 such properties. But what do they mean in detail? Let's look at them one by one and see how we can use them to write better tests and code in JavaScript and React.
Let's start with what I believe is the most important one of all of these, and probably the only one on the list that I would never, ever compromise on. Tests should inspire confidence. Confidence that your application works. Let's think about an example here. You want to refactor a central part of your application. Maybe you want to, I don't know, move your state management from Redux to the Context API. Or maybe you want to use a script to migrate your codebase from JavaScript to TypeScript. Let's not think about automated tests for a moment, but about how you would generally go about doing a big change like that. Would you just change the code and deploy it? I, at least, would be extremely careful and thoroughly test each part that I'm touching before deploying, because that manual testing is what gives me the confidence that I didn't just break the application. And that is exactly what testing is all about: inspiring that confidence and enabling you to vigorously refactor your application without the fear of breaking it.
And that brings me perfectly to the second point: tests should be automated and run without human intervention. Say you work on a bigger project where several teams work on different components in your application. You now want to change a single line in the base button component that all other teams use. Not only will it take a lot of coordination between all these different teams, the testing will also have to be a joint effort, because you might be able to change the button, but you don't know how it is used in all the different places all over the application. And it's simply not feasible for this and every other seemingly small change to have to go through manual testing for each release, because the coordination and time effort are just too high and grow with the scale of the application as well. So if you want to enable teams to work autonomously and iterate quickly on their part of the application at a certain size, you can only achieve this through automation. That brings me back to the question, when should automated tests run? And the answer is basically all the time. During development, it helps to have a quick feedback loop, so at least run unit tests in watch mode so that you can find bugs early, before even pushing any code. Once you're ready to merge your changes, all your tests should run on that branch. Then after merging, before deploying to production, your tests should run on the main branch as well. This includes unit tests, integration tests, and end-to-end tests, if you have a way of running them on this branch.
3. Tips for Faster and More Effective Testing
Lastly, in a bigger system with many services and websites playing together, you might want to run your tests whenever an upstream dependency changes as well. But how do we make our tests fast? First of all, run them in parallel and only rerun the tests related to the files you've changed. Mock slow dependencies and run tests in the appropriate environment. When writing tests, focus on the behavior of the system under test and test from the consumer's perspective. This allows you to change the internal structure without touching the tests.
Lastly, in a bigger system with many services and websites playing together, you might want to run your tests whenever an upstream dependency changes as well, because you often don't know exactly when that happens. And my team often deploys to production every couple of hours.
You might now be wondering, isn't this impractical? Doesn't all this testing slow your deploys down? This is where the next piece of advice comes in. You do not want to have to wait half an hour for your tests to pass before you can deploy an urgent fix. Or, worse yet, disable test runs on important bug fixes because they take too long to run. This would remove the confidence again that we talked about earlier and make writing tests pointless.
But how do we make our tests fast? First of all, nothing says that we need to run tests sequentially. If there is one thing that will save you a ton of time running your tests, it's to run them in parallel. Then, there's no need to run your entire test suite with every single change. Just rerun the tests related to the files that you have actually changed or touched. You can do this in many tools; Jest's watch mode does it, for example.
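As a rough sketch of what that can look like with Jest (assuming Jest is your test runner; the worker cap is a placeholder, not a recommendation from the talk):

```js
// jest.config.js: a minimal sketch, assuming Jest as the test runner.
module.exports = {
  // Jest already runs test files in parallel worker processes by default;
  // capping the workers can help on shared CI machines.
  maxWorkers: '50%',
};

// On the command line:
//   jest --watch        reruns only the tests related to files you've changed
//   jest --onlyChanged  one-off run of just the tests affected by your changes
```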
Another thing I like to do in my integration tests is to mock slow dependencies like network calls, animations, or anything related to timers. If you have a countdown component which renders "countdown done" after two seconds, I wouldn't sit around and wait two seconds for that to happen until I can confirm it worked. Instead, I would mock the timers and run them all instantly, and there are also ways of making sure that it's been exactly two seconds. Doing this will make my tests more deterministic and faster, but I'm always careful not to mock too much, because I do want to test my components like a consumer would.
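A minimal sketch of that idea with Jest's fake timers and React Testing Library; the `<Countdown />` component and its texts are hypothetical stand-ins, and `@testing-library/jest-dom` is assumed to be set up for `toBeInTheDocument`:

```jsx
import { render, screen, act } from '@testing-library/react';
import { Countdown } from './Countdown'; // hypothetical component

test('renders "Countdown done" after two seconds', () => {
  // Replace real timers with controllable fakes, so nobody waits two seconds.
  jest.useFakeTimers();
  render(<Countdown seconds={2} />);

  // Fast-forward exactly two seconds, instantly and deterministically.
  act(() => {
    jest.advanceTimersByTime(2000);
  });

  expect(screen.getByText('Countdown done')).toBeInTheDocument();
  jest.useRealTimers();
});
```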
Lastly, another tip to make your tests faster: run them in the appropriate environment. What I mean by that is, don't run tests for a utility function in a real browser, for example, but instead make use of tools like jsdom, which are faster but a good approximation of a browser environment. Go for the fastest tool that still gives you confidence.
All right, and now, how do you write tests that give you confidence? Tests should be behavioral and structure-insensitive. What this means to me is that I need to think about the responsibility of the thing that I'm testing. If I'm testing a function, I would call it like its consumers would. If it's a React component used by our users, I would think about how users interact with it and try to replicate that. If you avoid testing from the component's internal perspective and instead find ways of testing from your consumer's perspective, be it a different React component or a human being, you will test the behavior of the system under test rather than its internal structure. This also means that you can now change the internal structure however you like, or refactor the component, without ever having to touch the test. All of a sudden, the test actually becomes a way of ensuring that the behavior has not changed while you were working on the inner workings.
Let's look at a practical example. Let's say we have a counter component which increases each time you click a button.
4. Testing Behavior and Properties of Good Tests
To test the behavior of your component, check the text rendered to users. Simulate user behavior instead of calling properties on React components directly. Good tests should be deterministic, isolated, and composable. Deterministic tests always produce the same results given the same inputs. Isolated tests do not depend on one another and can be run in any order. Robust tests are self-sufficient and don't fail randomly. Making the underlying code more robust helps in creating more robust tests.
How would you go about testing it? You could, for example, check the internal state of the button, of the counter, sorry, then call the onClick property of the button, and then check that the internal count state has incremented. If you do that, and we just change the name of the internal state from count to, say, numberOfClicks, the test will break although all the behavior stays the same.
So, what I would personally do instead is to actually check the text rendered to users, because that's the bit of behavior that we're actually interested in. So, if you want to test the behavior rather than the structure of your component, don't peek into internal state to test the logic, but instead test the resulting change in the user interface, just like a user would.
And there's something else fishy with this test. We are relying on the primary button's onClick property. If we choose to render the button differently, say we use a secondary button, or the name of the onClick property changes, again our test will fail although the behavior is still the same. Instead, I would opt for actually simulating a click on "Increment", because that is what a user would be doing as well. So, don't test the behavior of your component by calling properties on React components directly. Instead, simulate the actual user behavior.
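Here is a sketch of what such a behavioral test could look like with React Testing Library; the `<Counter />` component and its labels are assumptions, not the talk's exact slide code:

```jsx
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { Counter } from './Counter'; // hypothetical component

test('increments the rendered count when clicking "Increment"', async () => {
  const user = userEvent.setup();
  render(<Counter />);

  // Assert on what the user sees, not on internal state.
  expect(screen.getByText('Count: 0')).toBeInTheDocument();

  // Simulate a real click instead of calling the onClick prop directly.
  await user.click(screen.getByRole('button', { name: 'Increment' }));

  expect(screen.getByText('Count: 1')).toBeInTheDocument();
});
```

Renaming the state variable or swapping the button component leaves this test untouched, because it only knows about the rendered output.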
Now, let's look at some more properties of good tests. Tests should be deterministic, isolated, and composable. But what do these words actually mean in the context of tests for our React applications? Let's start with deterministic. At the core, this means that given the same inputs, your test will always give you the same results. There's no randomness, no flakiness. And this is great because if your tests do in fact fail at random, you can't rely on them, and they won't give you that confidence that we talked about.
Now, isolated. This means that your tests do not depend on one another. And if tests are, in fact, isolated, that results in them being composable: they can be run in any order and will always produce the same result. Tests that have these properties have two major benefits. On one hand, if the tests are isolated, so are the failures, and that makes it really easy to find out where a bug is happening. On the other hand, there is also a practical advantage: if the order doesn't matter, you can run them in parallel, which, as I mentioned, is a lot faster. To summarize, all three of these are properties of robust tests. They're self-sufficient and don't fail at random. How can we achieve that? Well, one great way of making your tests more robust is by actually making the underlying code more robust in the first place. So, let's look at how we can do both. One thing that will help is not changing state outside the scope of your code.
5. Resetting Global State for Isolated Tests
If you have to change state outside of your scope, reset the changes to your global state before and after each test. For example, if you have a product module that writes to a global product storage, make sure the tests are isolated and do not depend on each other.
If you really do have to change state outside of your scope, then to make tests isolated and remove dependencies between them, you will need to reset the changes to your global state before and after each test. Let's look at an example. Here we have a product module which writes to a global product storage. It has functions for retrieving, adding, and removing products. The first test here adds an apple and a banana, and the second test removes the apple again. This means these tests are not, in fact, isolated, because the second one depends on the first one running first. That means, if you change the order, your tests will fail, making them brittle.
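To make that concrete, here is a sketch of how such a brittle pair of tests might look (the product module and its API are illustrative, not the slide's exact code). A quick patch would be to reset the storage in a beforeEach hook; the deeper fix follows below.

```js
// These functions all write to one shared, module-global products array.
import { getProducts, addProduct, removeProduct } from './products';

test('adds an apple and a banana', () => {
  addProduct({ name: 'apple' });
  addProduct({ name: 'banana' });
  expect(getProducts()).toHaveLength(2);
});

// Only passes if the test above ran first; the tests are not isolated.
test('removes the apple again', () => {
  removeProduct('apple');
  expect(getProducts()).toHaveLength(1);
});
```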
6. Making Tests More Robust
The functions in the product module change the global product state. To fix this, we could reset the products before each run or fix it on the side of the code. Let's look at some more tips to make your tests more robust. Changing something outside of your scope is a side effect, so robust code isolates side effects. Let's isolate side effects in an example component that creates a random number. Extract business logic from components into different modules to make it reusable and tests more deterministic. Make sure to wait for everything to return and test specific parts of a component's behavior per test case. Use specific assertions or create custom matchers for more specific tests.
The underlying problem, however, is not the test. The functions in the product module change the global product state. To fix this, we could, for example, reset the products before each run, or we could fix it on the side of the code: instead of writing to a global products array, encapsulate the state, so that every consumer, be it a test or your actual consumer, can create their own scoped version of the products, and the tests become isolated again.
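A sketch of that fix, with illustrative names: a factory function encapsulates the state, so each test (and each real consumer) gets its own scoped products.

```js
// products.js
export function createProducts() {
  const products = []; // scoped per instance instead of module-global
  return {
    getProducts: () => [...products],
    addProduct: (product) => products.push(product),
    removeProduct: (name) => {
      const index = products.findIndex((p) => p.name === name);
      if (index !== -1) products.splice(index, 1);
    },
  };
}

// products.test.js: each test creates its own instance, so order no longer matters.
import { createProducts } from './products';

test('adds an apple and a banana', () => {
  const store = createProducts();
  store.addProduct({ name: 'apple' });
  store.addProduct({ name: 'banana' });
  expect(store.getProducts()).toHaveLength(2);
});

test('removes the apple again', () => {
  const store = createProducts();
  store.addProduct({ name: 'apple' });
  store.removeProduct('apple');
  expect(store.getProducts()).toHaveLength(0);
});
```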
Let's look at some more tips to make your tests more robust. Changing something that is outside of your scope is a side effect, so robust code in general isolates side effects, and robust tests mock those side effects and other randomness. Let's look at an example. Here we are creating a component that creates a random number and then renders that random number to the user. Creating a random number is a side effect, so on the side of your code, let's isolate that. Using React, there are two great ways of isolating the side effect: we can either move it into a property of our component, or into a utility function that we import. I'm going to do the second one. All that changes, as you see, is that generateRandomNumber now comes from the outside. But this has the effect that, on one hand, if we ever need a random number generator somewhere else again, we can reuse it. On the other hand, testing this becomes trivial, because all of a sudden we can just mock generateRandomNumber's return value and then make sure that we actually rendered that number. You can use the same trick for your business logic as well. If you extract business logic from your components into different modules, you'll make your business logic reusable and your tests more deterministic. With that, you can then also test the extracted modules in isolation, without having to render a component around them, which should be faster and a little bit easier.
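A minimal sketch of that refactor and its test; the file names, the component, and the mocked value are all illustrative:

```jsx
// random.js: the side effect now lives in its own reusable module.
export const generateRandomNumber = () => Math.floor(Math.random() * 100);

// RandomNumber.jsx: the component just renders whatever the utility returns.
import { generateRandomNumber } from './random';
export const RandomNumber = () => <p>Your number is {generateRandomNumber()}</p>;

// RandomNumber.test.jsx: mock the module and the test becomes deterministic.
import { render, screen } from '@testing-library/react';
import { RandomNumber } from './RandomNumber';
import { generateRandomNumber } from './random';

jest.mock('./random');

test('renders the generated random number', () => {
  generateRandomNumber.mockReturnValue(42);
  render(<RandomNumber />);
  expect(screen.getByText('Your number is 42')).toBeInTheDocument();
});
```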
Lastly, one common source of bugs, as well as flaky tests, are race conditions, where the order in which asynchronous calls return might play a role. Make sure you wait for everything to return, or test different sequences to account for this.

Tests should also be specific. If a test fails, the cause of the failure should be obvious. The name of the test should already give you a good hint as to where and what might be failing, but if that doesn't help, the exact error message in the console should. To achieve that, try to only test one specific part of the behavior of a component per test case. While this might not always be possible for integration tests or end-to-end tests, in unit tests it should be quick enough to set up several tests with a similar setup. So, rather than testing your happy path and different error scenarios all in one test case, like here on the left, split them out, like on the right, and you will see a couple of benefits. If you have a failure on the left, you don't know which behaviors are broken, you only know that this one assertion failed, because after it failed, the rest of the test case is not executed anymore. Whereas on the right side, you see that the other tests still pass and it's just this one specific behavior that is failing, and that should tell you exactly where to look for your bug.

Another tip that can make tests more specific is the use of specific assertions, or the creation of custom matchers. Look, for example, at the two following ways of asserting that an array has three items in it. On the top, we are checking that the length is equal to three, and we get the error on the right: expected three, received two. If you instead use the toHaveLength assertion, or a matcher, you get a lot more relevant information in the console when you get a failure.
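Two quick sketches of those last points, with hypothetical names. First, splitting one catch-all test into specific cases; second, the more specific assertion:

```js
import { validateEmail } from './validateEmail'; // hypothetical function under test

// Instead of asserting the happy path and every error case in one test,
// split them, so a failure points straight at the broken behavior:
test('accepts a valid email address', () => {
  expect(validateEmail('iris@example.com')).toBe(true);
});

test('rejects an email address without an @', () => {
  expect(validateEmail('iris.example.com')).toBe(false);
});

// And prefer the specific matcher over a generic equality check:
test('has three items', () => {
  const items = ['apple', 'banana'];
  expect(items.length).toBe(3);  // failure only says: expected 3, received 2
  expect(items).toHaveLength(3); // failure also prints the received array
});
```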
7. Making Tests Readable and Maintainable
Using custom matchers is a great way to make test failures more readable. Good tests should be quick to comprehend and serve as documentation for expected behavior. Descriptive naming and keeping tests short contribute to readability. Extracting common functionality into utility functions improves readability and reusability.
And this is just a contrived example, but using custom matchers is a great, great way to make your test failures more readable. And speaking of readable, that's another property of good tests. It should be really quick to comprehend a test from reading it, and reading a test should basically be like reading documentation for the expected behavior of a component. And how do we achieve that? Spoiler alert: because tests are code too, it's pretty much the exact same tips as making your code more readable. Starting with the classic: good naming. So we use descriptive names. I like to follow the pattern of nesting components or features, and then at the end going for "it does X when given Y". For example, "it closes the modal when pressing the escape key". Also, keep your tests short. Just like with your other code, you can achieve that by extracting parts of your tests into other functions or utility functions. Look at this, for example: the left side and the right side, which one is more readable? Spoiler alert, they're the exact same test, just that in one example I've extracted the common functionality for waiting for a tracking event, whereas in the other it's still inlined. This function can now be reused in other parts of the test suite, and it also makes the test a whole lot more readable.
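A sketch of that kind of extraction; the player component and the tracking helper are hypothetical, but the shape is what matters:

```jsx
import { render, screen, waitFor } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { Player } from './Player';   // hypothetical component
import { tracker } from './tracker'; // hypothetical tracking module

// The extracted utility: one readable line per use, reusable across the suite.
const waitForTrackingEvent = (type) =>
  waitFor(() => expect(tracker.getEvents()).toContainEqual({ type }));

test('tracks a play event when pressing play', async () => {
  const user = userEvent.setup();
  render(<Player />);
  await user.click(screen.getByRole('button', { name: 'Play' }));
  await waitForTrackingEvent('play'); // instead of inlining the whole waitFor block
});
```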
8. Importance of Testing and Predictive Tests
Lastly, make sure you use the testing library, whichever one you might be using, to its fullest, to make the test more expressive, and your intent clearer to the reader. Tests should not only be readable, they should also be quick and easy to write. There should always be a balance between spending time writing tests and writing code. Use the right tool to write your tests. Tests should be predictive, allowing you to refactor fearlessly and automate manual tasks. Achieving a high level of confidence in your test suite will change the way you work as an engineer, reducing time spent on hotfixing bugs and monitoring deploys.
Lastly, make sure you use your testing library, whichever one you might be using, to its fullest, to make the test more expressive and your intent clearer to the reader. If we look back at the example of implementing a counter from before, what is more expressive: peeking into a randomly named state variable, or making sure that the headline has changed from something to something else? Similarly, what is more expressive: a line of CSS changing in the snapshot of a component, or a visual representation of that change, before and after? This is why I'm not a big fan of snapshot tests and prefer using visual regression testing tools like Percy instead.
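As a small sketch of that headline idea (the component and its texts are hypothetical, reusing the earlier counter):

```jsx
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { Counter } from './Counter'; // hypothetical, as before

test('updates the headline after one click', async () => {
  const user = userEvent.setup();
  render(<Counter />);
  await user.click(screen.getByRole('button', { name: 'Increment' }));

  // Reads like the expected behavior, unlike a snapshot diff or a state peek:
  expect(
    screen.getByRole('heading', { name: 'You clicked 1 time' })
  ).toBeInTheDocument();
});
```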
And tests should not only be readable, they should also be quick and easy to write. In most cases, there is no point in spending more time writing your tests than writing your code. And while I believe that refactoring your code in a way that makes it easier to test is generally beneficial for your code base because it will make it more reusable, deterministic, and less coupled, there should always be a balance. If something is particularly hard to test, ask yourself, how important is it to be confident that this part of the application works? I would, for example, absolutely spend a lot of time writing a hacky and unreadable test, in other words, compromising on all the other properties that I've walked through already to ensure that a user can log into Spotify. But I wouldn't necessarily do the same to ensure their profile picture is aligned correctly on Internet Explorer.
Another thing that often helps with this is moving up or down in the testing pyramid. If something's really tricky to test in a unit test, maybe ignoring an edge case and testing the general behavior in an integration test is a good compromise. And lastly, use the right tool to write your tests. In the past, it's been extremely cumbersome to write tests for web applications, but with tools like Cypress, with its visual debugger, or Testing Library, which encourages testing from the user's perspective, there's no excuse at all not to at least have coverage for the most important flows in your application. So if the tool that you're using is holding you back, I strongly encourage you to consider switching to a different one. And now for the last one.
Tests should be predictive. The reason I saved this one for the end is that I think it ties everything together beautifully. We've already discussed that tests should give you confidence. This property goes a step further. That is to say, your test suite as a whole should be able to predict if your application works or not. If a test fails, that part of the application will fail in production. If all tests pass, the application will work in production. It is a really, really high bar to achieve this level of confidence, and we need good coverage on all fronts: speedy automated unit tests, integration tests, end-to-end tests, visual regression tests, whatever tests you have. But once you get to that level, the way you spend your days as an engineer changes. You'll spend a lot less time hotfixing bugs in production or monitoring your deploys to make sure they don't break the system. No longer will you be scared of deploying on Fridays or before going home. Instead, you will be able to refactor any small or big part of the application fearlessly. And you can even start automating manual tasks, like keeping dependencies up to date. As long as all tests pass, your code is suitable for production, no matter if the code was changed by you, a different team, or even a bot. And that just gives you endless possibilities. With that comes the most important change for us as developers: without needing to extinguish fires on a weekly or even daily basis, our work becomes a lot less stressful, and we have the time to focus on what's actually important.
9. Delightful Experiences and Audience Preferences
Developing delightful experiences for our users. Thank you. During the pandemic, 52% of people have been enjoying homemade sourdough, bread, or other baking projects. The poll results at Spotify were very different, reflecting the international nature of the audience.
Developing delightful experiences for our users. Thank you. Well, we asked what people have been eating during the pandemic, and 52% of the people said homemade sourdough, bread, or other baking projects. Are you proud of our audience? Yeah, very much so. I mean, that's the same for me, as you've seen in all of those pictures. Yeah, that's really, really cool. We had the same poll at Spotify and the results were very, very different. But was that only for the people in the UK, or also international? Because I think an international crowd changes everything. Yeah, it was actually for our entire mission, which I think spans the US offices, so New York, and also London and Stockholm. Okay, so that's also really international, cool.
Q&A
Q&A on Crashes and Refactoring Tests
Alexius is asking about crashes in the Spotify desktop client. While I don't work on the client, we do track crashes and have monitoring in place. If you want to report bugs, you can reach out to our customer support on Twitter or through our forum. We value our audience's input and have implemented highly voted suggestions, like the recently announced HiFi feature. Regarding refactoring or redesigning components, it depends on whether your tests give you a good idea of the component's functionality. If they do, you can prioritize refactoring the component first. However, if there are important scenarios not covered by your tests, it's best to update the tests before refactoring.
So let's go into the Q&A. We have a lot of questions from our audience for you, so let's get to them right away. Alexius is asking: the Spotify desktop client fails to load the homepage quite often recently. Do you have any reporting tool which collects crashes on the client side? Yes, so I don't work on the client, so I can't talk too much about that. I know that we generally track crashes and things like that; we do have monitoring. I really can't speak to any details because I don't touch those clients. If you want to report bugs, I know that we have different channels for that. You can even just reach out on Twitter. I know we have one of the nicest customer support teams in the world, we hear that every year; there are apparently ratings for them somewhere online. They're really nice and friendly people. Cool, yeah. And also, I forget the name, but there's this place where you can make suggestions for Spotify, right? Yeah, exactly, there's a forum that we actually regularly look at as well. You might have heard that we announced HiFi; that was one of the highest-voted things in there. That's super nice, that workflow, really giving your audience a say in what happens. Yeah, so that's awesome.
The next question is from Hama. Imagining you need to do some refactoring or redesigning of components, would you make sure to improve the old tests before refactoring, or would you prefer to refactor first and then improve the tests? Okay, so you're refactoring the tests, did I understand the question correctly? No, you're refactoring or redesigning components. Okay, okay. Then would you update the tests before or after? I think it depends. If your tests give you a good idea of whether your component is working or not, then you can probably go ahead with refactoring the component first, because you really do want to invest in what's most important for your company, and refactoring tests is probably not what pays the bills. If you can, I would start with the component. But if your tests, say, lack coverage for this one really important scenario that your component handles, then do go for that first, just to make sure that the behavior of the component doesn't change as you refactor it.
Test Coverage and Wrapping Test Assertions
I agree that test coverage is not useless. It helps identify missing tests and highlights areas of the code base that are well-tested or lacking coverage. For example, we discovered that while our low-level components had good coverage, our bigger containers had poor coverage. To address this, we added Cypress tests to test integration between these components. By focusing on what is missing rather than aiming for a specific percentage, we can effectively improve our test suite. Another question raised the issue of wrapping test assertions in functions, which some claim can decrease readability. However, I need more information to provide a definitive answer.
I'm thinking about it and I agree. The next question is from Sasha. Oh, are you sharing your slides after the presentation? Say again? Are you sharing your slides after? Yeah, absolutely. They're probably already in a folder somewhere. If not, if I forgot to do that, then I'll add them very soon. But yeah, they will all be available online, I think. Great. Great.
So Sasha, look out for them. Popplinguje is asking: do you think test coverage is useless? So what do you think about test coverage? I don't think it's useless. I think it's a very good tool to spot where you're missing tests. I don't think that it makes sense to go, okay, we need to have 90% coverage on this level and 78% on that level; I don't think coverage is a good measurement of success. But where it's helped my team a lot is just to see which parts of our codebase are well-tested and which parts are missing tests. And we noticed, for example, that we had really, really good coverage for those very low-level components, but we had pretty bad coverage on the bigger containers, everything that combines logic with the tiny little components. So then... The hard-to-test stuff. Yeah, exactly. Then we made the decision, okay, we need to change this, so we added Cypress tests on top of that to test the integration between those kinds of things. And then it actually makes sense to look at where our gaps are using the coverage reports. Yeah. So then you're not looking at the percentage, but you're looking at the report line by line and saying, hey, this is red, but we need to test this scenario. Exactly, looking at what is missing currently rather than trying to fulfill some kind of percentage ratio or anything like that. All right, awesome.
The next question is from Sochiakropka. Some claim that wrapping test assertions in functions decreases readability and forces you to go seeking for them. What do you think about this? Okay, I'm not sure I can compute this. Can you repeat the first part, wrapping? Wrapping test assertions in a function, so kind of extracting parts of the test into their own functions, could decrease readability and forces you to seek them out.
Extracting Functions and Improving Readability
I think it depends. I'm a big fan of extracting things into functions that make sense in a unit. If you have something that you use in several places, then extract it into a function. For readability reasons, if the test is long and you want to test other things as well, put them in a separate function. Also, if a big test is mostly set up, extract the setup into a separate function.
And what do you think about doing this? I think it depends. I'm a big fan of extracting things into functions that make sense as a unit. So you can have a setup function that sets up your tests, and I think that's fine. Whereas going the other way around, where you have all of your different scenarios in, say, an array of objects that set up your test, and then you map over those and use different parts, that's where I think it becomes unreadable.
So I just try to abstract away things that make sense as a unit. If you have, you know, "user successfully logged in" as something that you use in several places, then I think that makes sense to extract into a function.
Yeah, exactly. So you would only do this when you're reusing something? Or also, honestly, for readability reasons. If the test is long and it's just a side thing in your test, and you want to test other things as well, well, maybe that's already a problem in itself, maybe you should have a separate test for each thing. But if, in an integration test, you maybe also want to check that the login works or something like that, I wouldn't dedicate half of the test to that login; I would put it in a separate function, just to make it more readable and bring the point across.
Yeah. Yeah. So, also, if you have a big test, but 80% of that is setup for getting to the right page or setting up the data, then extract the setup. What's it called, arrange, act, assert, right, the three steps? You can make an arrange function, an act function, an assert function. Yeah, absolutely.
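As a rough sketch of that arrange-act-assert extraction (every name here is hypothetical):

```jsx
import { render, screen } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import { App } from './App';                        // hypothetical app
import { logIn, goToCheckout } from './testUtils';  // hypothetical shared helpers

// Arrange: all the setup noise lives in one well-named function.
const arrangeLoggedInUserOnCheckoutPage = async () => {
  const user = userEvent.setup();
  render(<App />);
  await logIn(user, 'iris@example.com');
  await goToCheckout(user);
  return user;
};

test('shows the order summary on the checkout page', async () => {
  const user = await arrangeLoggedInUserOnCheckoutPage();                  // arrange
  await user.click(screen.getByRole('button', { name: 'Review order' })); // act
  expect(screen.getByText('Order summary')).toBeInTheDocument();          // assert
});
```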
All right. We have time... oh, we don't have time for one last question, but luckily Iris will be in a spatial chat where you can continue the conversation and talk about testing. Unfortunately we have to let you go here now. I really enjoyed talking to you and hope to see you again. I hope you make the cut again for our next edition. Hopefully. Thank you so much.