Storybook is a complex OSS project, integrating with a wide range of stacks, and used in various ways by millions of devs. What's it like maintaining a project like that? How do we ensure it doesn't break?
How We Test Storybook Itself
AI Generated Video Summary
This Talk discusses the use of TypeScript and Storybook in software development. It covers the premise of components and the complexity of testing Storybook. The setup process for Next.js and Storybook is explained, along with the testing workflow and CI integration. The Talk also touches on caching, bug reports, and the release process. Documentation management and improving test run time are discussed, as well as testing feature flags and mobile usage.
1. Introduction to TypeScript and Storybook
Not perfect. And this is kind of a talk of a type that I've never given before. So before, I always gave a talk about, oh, you should do this new thing right now. This one is more of a case study. And so there's a lot of interesting bits for me. I hope they're interesting for you as well.
I'm from the Netherlands. And as introduced, I work for a company called Chromatic. But my full-time job is maintaining Storybook. So I have to get this out of the way before I can talk about what makes Storybook internals tick and work.
What is Storybook? Storybook is a tool that you can use to build better components. Because the core principle of Storybook is that you can work on them in isolation. Instead of working on your components top to bottom, you can basically start working on any component first. You can catalog all of those components and all of their states, all of the variants. And you can, very importantly, test them. That is what this conference is all about, right? And then you definitely want to share those components with your stakeholders. Storybook is quite popular, which is very humbling for me. All of these companies are using Storybook. And the reason is that at some point when you're building applications, you might start with some easy components. And everything feels super easy. But at some point, you reach a level of components that is just complex. I've seen components of like 5,000 lines of code. And everything is contained within the file. Like if you wanted to understand all of it, you could. But oftentimes, you kind of learn how code runs and how it works by how it's used.
2. The Premise of Components and Storybook
The premise of components is that you can use them all over. Storybook brings together different edge cases and allows you to visualize them. These edge cases can be combinatorial, combining various states and languages.
The premise of components is that you can use them all over. So if a complex component has all of these different variants and variations across the application, well, you're not going to look all across the application and search for all of those use cases. So that's really what Storybook is about. It's bringing all of those different weird edge cases that you might not even encounter in your app normally and bringing them together so that you can visualize them there.
Often overlooked is that these weird edge cases can actually be combinatorial. So a loading state and a non-authenticated state and a different language can all coalesce together.
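To make the combinatorial point concrete, here is a minimal sketch that enumerates every combination of a few independent state axes. The axis names and values are made up for illustration and are not taken from Storybook itself.

```typescript
// Hypothetical state axes for a single component; names are illustrative only.
type Axes = Record<string, readonly string[]>;

// Build every combination of one value per axis (a Cartesian product).
function combinations(axes: Axes): Record<string, string>[] {
  let result: Record<string, string>[] = [{}];
  for (const [name, values] of Object.entries(axes)) {
    const next: Record<string, string>[] = [];
    for (const partial of result) {
      for (const value of values) {
        next.push({ ...partial, [name]: value });
      }
    }
    result = next;
  }
  return result;
}

const axes: Axes = {
  data: ["loaded", "loading"],
  auth: ["signed-in", "signed-out"],
  locale: ["en", "nl"],
};

const allStates = combinations(axes);
console.log(allStates.length); // 2 * 2 * 2 = 8 combinations
```

Three axes with two values each already give eight states, which is why it helps to have them laid out side by side instead of hunting for them in a running app.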
3. Complexity and Testing of Storybook
All right. With that out of the way, I can start talking about my actual presentation, right? So Storybook is, in itself, pretty complex. And that's because of what we do. Storybook is like a web app that you bootstrap next to your own web app. And so we have to deal with whatever your web app kind of did because we're looking at the same source code as your web app.
So we have to deal with multiple builders, Webpack and Vite. We have to deal with multiple frameworks like React, Vue, Angular, Svelte, et cetera. And then there's meta frameworks layered on top of those, like Next.js and SvelteKit. We have to deal with multiple package managers, npm, Yarn, Yarn PnP, and pnpm. Some people really love TypeScript, me included. But then some people feel like, no, I don't want like a translation layer between writing my code and running my code, which is also valid. And so we support both languages. And we try to support them both equally.
And something just went wrong with the slides. And then on top of that, like, all of those kind of things are outside of our control. But we also make our own lives more difficult by adding more feature flags and making Storybook very configurable. Because we want anyone to be able to use our tool, but loads of people means loads of opinions. And making everyone happy means you must add configurability. We try to modularize our feature set into add-ons. And we want to allow anyone else to create such feature sets as well. So we actually have a community add-on API that anyone can use. And then the elephant in the room, honestly, is backwards compatibility. So Storybook has been around for about 7 to 8 years. And so our code base has changed a lot. Our mentality, our ideas of what Storybook should do, have actually changed a lot. But at the same time, we don't want to tell users, like, what you've been doing for the last N years, just throw that away, and do this other thing instead. Or when we have to do something like that, we want to give them clear migration guides, or even auto-migrations.
4. Testing and UI of Storybook
We use TypeScript to ensure code interlinking and boost confidence. ESLint helps with subtle bugs, and unit tests cover easy translations. Storybook has its own UI, with components and stories within Storybook.
Yes, that is part of the easy stuff. We use TypeScript across our entire code base to make sure that all of these different packages and all of these code paths interlink correctly. TypeScript is a huge factor in how much confidence we have in our code. We use ESLint to kind of take care of some of the subtle bugs that you might encounter. And then obviously we've got a bunch of unit tests for all of the small fragments of code individually. Anything that is an easy translation from A to B, we've got unit tests for. And then interestingly enough, Storybook does have some UI of its own, which is written in React. And so we have a Storybook for Storybook. We have a bunch of UI components that have stories that make up Storybook. And then it can be quite meta where you see a piece of Storybook inside of Storybook.
5. Testing Storybook and Performance
If someone adds Storybook to their project, they need to ensure it still runs and can create a build. Add-ons and configuration changes should also be tested. Performance and install size are important factors, and merging PRs should not negatively impact them. The size of the static version of Storybook should also be monitored.
All right. The hard stuff. Because this is actually something that would keep us up at night if we didn't have tests for it. If someone were to add Storybook to their project, does that work? After they've added it, does Storybook still run? And can it create a build? If they add add-ons to it, do they work? If they change some configurations, do those configurations do what they're supposed to do? Many of our configurations actually kind of portal into some other tool's configuration. If you configure Storybook to support MDX, actually we configure Webpack to go and support MDX. So our tests also test a lot of the giants that we stand upon. And another big part for Storybook, very important for us, something that we've focused on a lot for the last one and a half years, is performance and install size. And so for us, it's really important to know, when we merge PRs, is performance the same or is it better? Is it worse? Is it an okay trade-off to make? And same for install size, and how big is it when you create a static version of your Storybook? How big is that? Is that increasing by this PR? Yes or no?
6. Setting up Next.js and Storybook with NPM Proxy
We just do the triple A: arrange, act, and assert. Setting up a new Next.js project and adding Storybook is not easy. The Storybook CLI does different things based on the project's existing directory. We need an NPM proxy to test code from our repository. We download and compile local packages, set up the NPM proxy, and publish the packages. Then we can set up the project, add Storybook, and make key modifications to the Storybook config. Performing these checks takes time.
Well, we don't want to do this manually all the time, right? Like I definitely don't. And these steps are not super easy because setting up a new Next.js project like needs an active Internet connection. It needs to talk to NPM. Adding Storybook also invokes a CLI that you would normally invoke via something like NPX. And then it goes and talks to NPM again to download the right packages based on the file or folder that it's in.
So the Storybook CLI does different things based on what's already in the directory that you're running it in. So if you have a Vue project, the Storybook CLI will add Storybook for Vue onto it. If you're using React, it will add Storybook for React onto it, etc. So in this test, if we were to write it just like this, there's another complicating factor because, well, we don't want to test stuff that's already on NPM, right? That would be a bit too late.
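As a rough illustration of "does different things based on what's already in the directory", here is a minimal sketch that picks a framework from a project's package.json dependencies. The lookup table, the returned labels, and the function name are assumptions for this example. The talk does not describe how the real CLI detects the project type.

```typescript
// Hypothetical lookup from an installed dependency to a framework label.
// Order matters: meta frameworks are checked before the libraries they build on.
const DETECTION_ORDER: ReadonlyArray<readonly [string, string]> = [
  ["next", "nextjs"],
  ["@sveltejs/kit", "sveltekit"],
  ["@angular/core", "angular"],
  ["vue", "vue"],
  ["svelte", "svelte"],
  ["react", "react"],
];

interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Return the first matching framework label, or undefined when nothing matches.
function detectFramework(pkg: PackageJson): string | undefined {
  const installed: Record<string, string> = {
    ...pkg.dependencies,
    ...pkg.devDependencies,
  };
  for (const [dependency, framework] of DETECTION_ORDER) {
    if (dependency in installed) return framework;
  }
  return undefined;
}

// A Next.js project also depends on React, so the meta framework has to win.
console.log(detectFramework({ dependencies: { next: "14.0.0", react: "18.2.0" } })); // "nextjs"
```

The ordering is the interesting design choice here: a Next.js project would otherwise be mistaken for a plain React one.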
So we want to test code that is in our repository, but then the CLI comes from NPM. And it's going to talk to NPM to download code. So what we need is like an NPM proxy in front of all of this. And that NPM proxy needs to be filled with stuff, right? And so in order to fill that proxy with the packages that come from our repository, we need to go and get that repository. We need to download all of the packages to make that work. We need to compile all of those local packages, then set up the Verdaccio NPM proxy and publish the packages. And then we can set up the project. Then we can add Storybook to it. And we might actually want to make some key modifications to the Storybook config because we might want to test those. And then we can just perform those checks.
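The ordering of those arrange steps can be sketched as a small dependency graph. The step names and the dependencies between them are my reading of the description above, not Storybook's actual task definitions.

```typescript
// Hypothetical names for the steps described in the talk.
type StepName =
  | "compile"
  | "run-registry"
  | "publish"
  | "create-project"
  | "add-storybook"
  | "modify-config"
  | "check";

// Which steps must have finished before a given step can start (assumed).
const dependsOn: Record<StepName, StepName[]> = {
  compile: [],
  "run-registry": [],
  publish: ["compile", "run-registry"],
  "create-project": [],
  "add-storybook": ["publish", "create-project"],
  "modify-config": ["add-storybook"],
  check: ["modify-config"],
};

// Depth-first walk that lists every prerequisite before the step that needs it.
function planFor(target: StepName): StepName[] {
  const order: StepName[] = [];
  const visit = (step: StepName): void => {
    if (order.includes(step)) return;
    for (const dep of dependsOn[step]) visit(dep);
    order.push(step);
  };
  visit(target);
  return order;
}

console.log(planFor("check").join(" -> "));
// compile -> run-registry -> publish -> create-project -> add-storybook -> modify-config -> check
```

Asking for an earlier target, such as `planFor("publish")`, returns only the steps that target needs.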
All right, we're almost there. We don't want to do this whole thing in one go for every line of code that we change. That would be horrendous. This takes some time to run all of this.
7. Testing Workflow and CI Integration
To ensure efficient testing and support for various configurations, we compile and publish 80 packages, set up Storybook in 50 projects, make 5 modifications, and run 8 tests. Although running these tasks in parallel on CI helps, it's crucial to consider the workflow for day-to-day Storybook maintainers. To provide fast feedback, we've created a script that allows targeted testing and handles repository state and caching. The script is the same for CI and local runs, and failing CI jobs provide a simple command to reproduce the issue.
So if we want to do this more often than twice a day, we need all sorts of caching layers in between. But the scope is even much bigger than that. So I just showed you one project, right? But, again, we support a lot of different types of projects. We support projects generated by Vue CLI. We support CRA. We support Next.js and Vite CLI. Many of these generators have all sorts of flags to generate with different languages, etc. And, as I said, we want to support and test for the multitude of configurations that we allow users to set. And we don't just have one quick check whether Storybook works, but we want to check multiple things. We want to check whether the test runner works. We want to test whether the static build works, etc. So the real scenario kind of looks like this. 80 packages to compile, 80 packages to publish, 50 different projects to set up, which we need to add Storybook to, and 5 different modifications to make, and then about 8 tests to run.
Now, luckily, if we're running this on a CI, we can do a lot of this stuff in parallel. And so it doesn't have to take a lot of time, but it still takes some time. But it's very important to not just think about CI. It's important to think about what this workflow and all of this testing means for the average day-to-day Storybook maintainer. How do you make sure that the PR that you're working on works? Well, you could just push it onto GitHub and wait for the CI, but you definitely want to be able to do it locally as well and get a decently fast feedback cycle. So what we've done is we've created a script that anyone in the repository can invoke with a specific target test or step in mind. So say I want to test whether an end-to-end test after a static build works. You can target that specific test. And then the script just kind of figures out exactly what state the repository is in right now, with all of the caching that we do. And then it can start from the right place in time. And then importantly, the script is the same in the CI as when running it locally, which means that when a CI job fails, which is kind of the point of a CI job, right? A CI job should tell you when something is wrong. If a CI job never fails, then something is actually wrong. So we definitely are good there. Our CI jobs fail a lot. But importantly, when the CI job fails, then at the end it just says run this one single line of code in your command line and you will be able to reproduce. And this line of code is not super complex.
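The "start from the right place" idea can be sketched like this, assuming each step can report whether its output is already available. The `Task` shape, the step names, and the `stepsToRun` helper are hypothetical and only illustrate the technique. They are not the real script.

```typescript
interface Task {
  name: string;
  // True when this step's output already exists, for example in a cache.
  isReady: () => boolean;
}

// Given steps in run order, return the ones still needed to reach the target.
function stepsToRun(pipeline: Task[], target: string): string[] {
  const targetIndex = pipeline.findIndex((t) => t.name === target);
  if (targetIndex === -1) throw new Error(`Unknown task: ${target}`);

  // Walk backwards from the target to find the latest step that is already done.
  let start = 0;
  for (let i = targetIndex - 1; i >= 0; i--) {
    if (pipeline[i].isReady()) {
      start = i + 1;
      break;
    }
  }
  // The target itself always runs, even if it ran before.
  return pipeline.slice(start, targetIndex + 1).map((t) => t.name);
}

// Pretend the first three steps are cached and the static build is not.
const pipeline: Task[] = [
  { name: "compile", isReady: () => true },
  { name: "publish", isReady: () => true },
  { name: "sandbox", isReady: () => true },
  { name: "build", isReady: () => false },
  { name: "e2e-tests", isReady: () => false },
];

console.log(stepsToRun(pipeline, "e2e-tests")); // [ 'build', 'e2e-tests' ]
```

Because the same function would run locally and in CI, a failing job can report the target it was given and a developer can repeat it with the same input.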
8. Yarn Task and Templates
I think it's very descriptive. Yarn task end-to-end-dev. You can optionally tell it where to start. We have a long list of templates that Storybook supports. Real people run yarn create next-app every day. Our CI config involves setup, sandbox creation, building, and testing. We have a daily job that runs project generators and caches the outputs.
I think it's very descriptive. And this is what it looks like. Yarn task end-to-end-dev. And then you can optionally tell it where to start. In this case, I say auto. And then we specify a template.
And so our CI config kind of looks like this. We've got some setup to do. So this build takes just over three minutes. And then we create all of these sandboxes. And then we build them all. And then we test them all. And if I were to zoom in onto the build sandboxes here. These are actually running many, many projects all at the same time. Which is not cheap. But it does work. So all of these projects are running in complete parallel. And if one of them fails, we'll know. Something we realized is that these projects, these sandboxes, that we bootstrap Storybook on top of, they don't actually change all that often. So we have a daily job that runs these project generators. And then caches these outputs into a repository.
9. Cache, Templates, and Bug Reports
The cache is useful for speeding up repository cloning and tracking changes in the Next.js generator's output over time. It helps us adapt to new versions and ask users to create issue reproductions. Users can choose templates on storybook.new, start a project with initialized Storybook, and file bug reports without checking out anything.
And this cache turns out to be super useful. Not just for us, but also for others. So cloning a repository is a lot faster than running this generator usually. But also importantly, we can actually see what the output of the Next.js generator is over time. So we can see what the output is as it changes. And this gives us hints for if Storybook fails to initialize on a new version of Next.js, for example. Like what internal changes they made. So that we can adapt to it and stay future-compatible but also backwards-compatible. And we can use this cache as well to ask users to create issue reproductions. So this is actually linked to StackBlitz. And if you use storybook.new, you can choose one of these templates. And start a project and get Storybook initialized on top of it. And you can file an issue that is demonstrating the bug. Without actually checking out anything.
10. Testing, Performance, and Publishing
We inject more components and stories into Storybook to improve coverage. We test sandbox building, dev mode, and run tests with different tools. We monitor Storybook's performance over time and compare it to future PRs. Our release cadence is every 4 to 6 weeks, with major versions once or twice a year. We can do alpha releases frequently and easily create canary releases. Staged releases allow continuous merging of PRs into the development branch. Automation helps with creating change logs.
Okay, so we've done all of this arranging. Now let's act. So what do we actually test? So when you initialize Storybook yourself as an end-user, you only get like three stories and three components. That is obviously not enough to get good coverage. So we inject a whole bunch more components and stories into it. We check if every sandbox can be statically built. And whether the dev mode works. We test it using the test runner. And we test it with Playwright. And we also, importantly, test it with Chromatic. Which my colleague Ruben will say more about in a later talk.
We also ensure that Storybook gets faster and smaller over time. Because we take data from all of these caching steps and put it in a database. And we can see how long these steps took. So when we're building the static version of Storybook, to then run an end-to-end test on later, we take that timing on how long it took. And we can compare it to future PRs.
Alright, I also want to say something about publishing. Our release cadence is every 4 to 6 weeks. We try to do a major version about once or twice a year. We can do alpha releases pretty often. Often multiple times a week. And we can take any PR and do a canary release with a few clicks of a button. We do do staged releases. Which means that we can get PRs merged into our development branch pretty much constantly. We don't need to wait. And then we have a very easy way of creating a PR that effectively says: if we merge this, we do a release. And there's a whole bunch of automation for creating the change log automatically there as well.
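Comparing a PR's timings against stored ones could look roughly like this. The `Measurement` shape, the step names, and the 10% tolerance are assumptions for the sketch. The talk does not say how the comparison is actually done.

```typescript
interface Measurement {
  step: string;
  seconds: number;
}

// Return the steps that got slower than the baseline by more than the tolerance.
function findRegressions(
  baseline: Measurement[],
  candidate: Measurement[],
  tolerance = 0.1, // allow 10% noise before flagging anything (assumed value)
): string[] {
  const before = new Map(
    baseline.map((m): [string, number] => [m.step, m.seconds]),
  );
  const regressions: string[] = [];
  for (const { step, seconds } of candidate) {
    const previous = before.get(step);
    if (previous === undefined) continue; // new step, nothing to compare against
    if (seconds > previous * (1 + tolerance)) regressions.push(step);
  }
  return regressions;
}

const mainBranch: Measurement[] = [
  { step: "static-build", seconds: 60 },
  { step: "dev-start", seconds: 10 },
];
const pullRequest: Measurement[] = [
  { step: "static-build", seconds: 70 },
  { step: "dev-start", seconds: 10.5 },
];

console.log(findRegressions(mainBranch, pullRequest)); // [ 'static-build' ]
```

The same comparison works for sizes instead of durations, which is how install size and static build size could be tracked per PR.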
11. Storybook Documentation Management
We have documentation for all versions of Storybook, allowing users to access relevant information even when using older versions. The documentation is managed in the monorepo and stored in Git for version control. The front end of the documentation website retrieves the content from GitHub based on the desired version.
One thing often overlooked for a project like this is that we have documentation as well. And this documentation is not just for one version, but for all versions that we ever published. Because someone that's using Storybook 7.x, they still want to be able to read how they should do things, even though we've released Storybook 8. So the way that works is we have a front end for the documentation website somewhere else. But then the content of the documentation website is managed in the monorepo, which allows you, when you're adding features, to also add the documentation along with it, which is great. And then Git is just a perfect way to actually keep that content in a history there as well. So the front end makes requests to GitHub to pull in the content of the right version that people want to see.
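A minimal sketch of "pull the content of the right version from GitHub", assuming each published version maps to a Git ref. The owner, repository, ref naming, and file layout below are placeholders for illustration. The talk does not state the real ones.

```typescript
interface DocsSource {
  owner: string; // placeholder values, not the real repository coordinates
  repo: string;
  latestVersion: string;
}

// Assumption: the latest docs live on "main" and older ones on a "release-x.y" ref.
function refForVersion(version: string, source: DocsSource): string {
  return version === source.latestVersion ? "main" : `release-${version}`;
}

// Build a raw-content URL for one documentation page at one version.
function docsUrl(version: string, page: string, source: DocsSource): string {
  const ref = refForVersion(version, source);
  return `https://raw.githubusercontent.com/${source.owner}/${source.repo}/${ref}/docs/${page}.md`;
}

const source: DocsSource = {
  owner: "example-org",
  repo: "example-repo",
  latestVersion: "8.0",
};

console.log(docsUrl("7.6", "get-started", source));
// https://raw.githubusercontent.com/example-org/example-repo/release-7.6/docs/get-started.md
```

The point of the design is that the front end holds no content of its own: picking a version only changes which ref it reads from.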
12. Managing Documentation and CI Challenges
The documentation content is managed in the monorepo, while the website front end lives elsewhere, allowing us to add documentation along with features. Git is used to keep the content in history. Our biggest mistake was getting the parallelization wrong, so one template's failure can affect the others. Storing and restoring the workspace takes the most time in our CI. Some sandboxes are expected to fail, which is problematic for alpha releases. I'm pretty much over time.
We learned a bunch of lessons writing all of this and doing all of this. I would say our biggest mistake was kind of getting the parallelization wrong. So kind of the separation of concerns, I guess. So when I showed you that graph of our CI, notice how building sandboxes is one thing, and they all run in parallel. But that means that if the Next.js or Vue CLI thing breaks, then that CI job just stops. And we don't run anything further. So we constantly are kind of racing to make sure everything works. And so one template's failure can affect the others.
We also noticed that storing and restoring the workspace, which is a feature of CircleCI we heavily depend upon, takes a lot of time. In fact it's kind of taking the most time in our entire CI. We also noticed later that not all sandboxes are kind of equal in importance. Which means that we have some sandboxes that we kind of expect to fail sometimes. Which is really bad because of the previous mistake we made. Because like some of these sandboxes are for alpha releases of things we want to support. But then those are not stable themselves. And that is my talk. I think I'm pretty much over time.
13. Improving Test Run Time and Developer Experience
I forked Storybook last summer and ran the tests. We were testing out Replay with it. It takes about 16 minutes for a PR to go from kicking off the CI build to going green. First-time contributors tend to wait on the CI. It's a balance between test confidence and developer experience. The goal is to invert the parallelization to improve run time.
But thanks so much for listening. Yeah so that was super interesting. I actually forked Storybook last summer and ran the tests. We were testing out Replay with it. And it's obviously a very robust, very mature test base. And so it was great to see those details there.
One of the questions that we have here is how long does it take to run all the tests in CI? And does it feel productive for you on a daily basis? It definitely can be improved. It is a question of what is fast enough for the amount of pressure that you're under. So if you've got a chill development then you can take some more time. If you've got lots of small tasks to work on it's okay to wait. But to answer the question I think it takes about 16 minutes or so for a PR to go from just kicking off the CI build to it going green.
Okay, yeah and I know that you mentioned talking about that developer testing experience as well. And doing that locally. Do you find that developers tend to do that before kicking it to CI pretty frequently? I think it differs between a collaborator and a contributor that just opened their first PR. I think they're more likely to wait on the CI because reading the contributing docs is a lot of work. So I think that's also just fine. Yeah, like you said it's always a balance right? Between the confidence and the security that you get from the tests and how long they take to run. And how that could potentially impact developer experience versus the end user experience.
Now you mentioned that there are some things that you may have done along the way so far to improve that run time. Can you talk a little bit more about what some of those are? Could you clarify the question? Yeah, sorry. So you mentioned that there's things that you could do to make the run time a little bit faster. But that you've worked on that so far. What have you found has improved that run time? So the thing that I really want to do but just haven't found the time for yet is to invert that parallelization that I was talking about. So that we effectively say each sandbox gets its own little lane. And so it's one thing that doesn't feel like it's wasting a lot of time. But if we have like 40 sandboxes and one takes like half a minute to complete and the other one takes four. Well the end-to-end test for that first one that only took like 30 seconds won't kick off until the other one is completed. And so again it doesn't feel like a lot of time but I feel like there's probably actually quite a lot of time on the table. Not to get the final green checkmark at the PR. But oftentimes you're just waiting for one specific thing which could be a lot faster.
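The difference between the two layouts can be worked through with the numbers from the answer above: one sandbox that builds in half a minute and one that takes four. The one-minute test duration and the sandbox names are assumptions added for the example.

```typescript
interface Sandbox {
  name: string;
  buildMinutes: number;
  testMinutes: number;
}

// Stage barrier: no test starts until every build has finished.
function finishTimesWithBarrier(sandboxes: Sandbox[]): Map<string, number> {
  const slowestBuild = Math.max(...sandboxes.map((s) => s.buildMinutes));
  return new Map(
    sandboxes.map((s): [string, number] => [s.name, slowestBuild + s.testMinutes]),
  );
}

// One lane per sandbox: each test starts as soon as its own build is done.
function finishTimesWithLanes(sandboxes: Sandbox[]): Map<string, number> {
  return new Map(
    sandboxes.map((s): [string, number] => [s.name, s.buildMinutes + s.testMinutes]),
  );
}

const sandboxes: Sandbox[] = [
  { name: "fast-sandbox", buildMinutes: 0.5, testMinutes: 1 },
  { name: "slow-sandbox", buildMinutes: 4, testMinutes: 1 },
];

console.log(finishTimesWithBarrier(sandboxes).get("fast-sandbox")); // 5
console.log(finishTimesWithLanes(sandboxes).get("fast-sandbox")); // 1.5
console.log(finishTimesWithLanes(sandboxes).get("slow-sandbox")); // 5
```

The final green checkmark still arrives after five minutes either way, but feedback for the fast sandbox arrives after a minute and a half instead of five, and a broken slow sandbox no longer blocks the fast one's tests.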
14. Testing Feature Flags and Mobile Usage
Instead of waiting for the final check, finding bottlenecks along the way can be helpful. Storybook uses feature flags, and sandboxes allow for config overrides. Mobile testing is not currently a priority, as Storybook is primarily used as a desktop app. Improvements to the mobile UI are planned for Storybook 8.
Instead of waiting for the final check. Yeah, so it's always finding those little bottlenecks along the way that can help a lot. So you mentioned that obviously Storybook is very configurable and you use feature flags for that. So how does Storybook test those feature flags? So when we set up those sandboxes we can override some of the config. And kind of force the Storybook's main.ts config file to be edited. So we can put that into the template object. What those modifications are. And so we just have more templates that show the Storybook with each modification.
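As a sketch of that idea, a template can carry optional overrides that get merged into the generated main config. The `Template` and `MainConfig` shapes, the flag name, and the template names are hypothetical and only show the pattern of one base template plus variants that each toggle something.

```typescript
interface MainConfig {
  features: Record<string, boolean>;
  addons: string[];
}

interface Template {
  name: string;
  // Optional overrides forced into the sandbox's main config (assumed shape).
  modifications?: { mainConfig?: Partial<MainConfig> };
}

// Merge a template's overrides on top of the config the sandbox starts with.
function applyModifications(base: MainConfig, template: Template): MainConfig {
  const override = template.modifications?.mainConfig ?? {};
  return {
    features: { ...base.features, ...override.features },
    addons: override.addons ?? base.addons,
  };
}

const baseConfig: MainConfig = {
  features: { exampleFlag: false },
  addons: ["example-addon"],
};

const templates: Template[] = [
  { name: "example/default" },
  {
    name: "example/flag-enabled",
    modifications: { mainConfig: { features: { exampleFlag: true } } },
  },
];

for (const template of templates) {
  console.log(template.name, applyModifications(baseConfig, template).features);
}
```

Each variant then goes through the same build and test steps as any other sandbox, so a flag gets covered by adding one more entry to the list.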
All right, great. Yeah, so I see lots of questions. We have time for maybe one or two more. But let's start with how do you test on mobile? We do not have tests for mobile at this point in time. We have Playwright tests on all of those sandboxes. And we also have Chromatic tests on all of those sandboxes. But I don't think any of those are tested currently on a mobile viewport. Definitely not on a real device. I'm not sure if that is worth it for us from a cost perspective. Because Storybook kind of feels like a desktop app to a lot of folks. And more of a quick reference thing on mobile. So for us it's just not a high priority concern.
Yeah, that was going to be my follow up. How often do you actually see people using Storybook on mobile? And I imagine probably not too often at least yet. I mean that's a bit of a chicken and egg problem. Where if Storybook doesn't function all that well on a phone people are less inclined to use it. And then we're less inclined to make it a very nice product. I think we should put in the time. We'll see in Storybook 8 that the mobile UI is going to be a lot better. Yeah, and if anybody is using Storybook on mobile I guess make sure that you... They'll be happy! Yeah, let Norbert know and see how y'all can help with that. So awesome. Well I think that's about the time that we have. But if you do have any additional questions for Norbert, again make sure that you join him at the speaker Q&A. And we'll see you there and around the conference. So thank you so much.