The Evolution of Browser Automation


In this session, we’ll take a look at what has happened behind the scenes in browser automation over the years and what the future has in store for us. We will examine how web testing will develop and what challenges this will bring for conventional frameworks like Selenium or WebdriverIO, as well as for newer frameworks such as Cypress, Puppeteer and Playwright. Lastly, we will experiment with some new automation capabilities these frameworks provide to test some of the new web features.


Hello, y'all. Thank you for joining the session, and a particularly big thanks to the TestJS Summit organizers and speaker committee for inviting me to open the conference. I'm very excited about all the great talks from experts around the world that we will get to see over the next two days. And I'm very happy to see such great events continue to take place despite the difficult global situation that we find ourselves in. I would love to spend the next 25 minutes speaking a little bit about how browser automation has evolved over the last decade or so. And I hope it gives a little bit more context when you hear about automation tools in the upcoming sessions.
But before we start, let me introduce myself. I'm Christian. I work in the open source program office at Sauce Labs. Most people probably know me as the maintainer of WebdriverIO, which is a project that got me excited about automated testing and browser automation many, many years ago. Maintaining the project really taught me a lot about how browsers work in general and how open source and open standards are developed. And those are all topics that I'm fortunate to work on full time these days.
The reason I wanted to give this talk is that I see a lot of misconceptions about how browser automation actually works. It's an interesting challenge, especially for cloud vendors and cloud providers, because as a user you see little delineation between your automation framework and the automation actually happening in the browser. So for instance, if your click doesn't happen even though the test script passes without errors, or if the script cannot find an element even though you clearly see that the element is there when you check it yourself, people tend to blame the framework first and then, at some point, the cloud vendor second.
In reality, there are a lot of nuances and processes responsible for making that click happen in the first place, maybe in a VM that is miles and miles away from the machine that actually runs your test. So let's have a look at how a click command in the framework actually ends up being a click event in the browser. To do so, I would like to start with a small recap. Browser automation has been around for more than a decade, and there have been quite a few interesting developments and influences over the years, especially with the web changing from how it was 15 years ago to what it is today. So let's recap what has happened so far and how we got to where we are right now.
It all started around 2004, when Jason Huggins needed to test an expense tool at ThoughtWorks to make sure it worked on IE as well as on Firefox. He called that tool Selenium, and it's probably a project that you all know already. A year later, another actor jumped onto the scene claiming to have built a better tool, which was called WebDriver. That was Simon Stewart. Over the years, both tools gained more and more popularity as browser automation became an established way to test web applications. So at some point, Jason thought it would be a good idea to create a company. That company was called Sauce Labs.
It was apparent that both tools, the WebDriver project and the Selenium project, were great, but each had limitations in specific areas. Selenium, which back then ran inside the browser, had problems with cross-origin policies and with automating anything around the browser in general, while WebDriver had other limitations when it came to automating certain elements. So Jason and Simon merged the projects and joined forces to overcome these limitations and provide the best experience possible at the time.
Over the years, these frameworks gained more and more popularity, to the point where people had a clear idea of how automation works. So a working group at the W3C was formed to standardize this process. The goal was really to make sure that a click in, for instance, Chrome was the same as a click in Firefox. The people there started an effort to draft a standard with the requirements in mind that people had for browser automation at that point in time. This created a lot of confidence and traction in the ecosystem, and a lot of new projects started to flourish. We see the release of WebdriverIO in 2011, and we see other projects like Appium that bring the same principle into the mobile space.
What happened then was quite interesting. The web changed a lot, and so did the way web applications are built. What used to be a server delivering static websites became more and more a dynamic, JavaScript-heavy web application built with frameworks like React, Vue, Angular, or Svelte. That drastically changed the requirements people had when testing applications. Suddenly, frontend and backend became more and more decoupled, and people wanted to focus on testing only the frontend application rather than deploying the whole stack. And with the continuous development of more web APIs that became available in the browser, people had more and more use cases to test. A lot of these use cases were not really in focus when Selenium and WebDriver were developed.
Luckily, during those times, we had companies like Cypress who stepped in and met the needs of developers in a really extraordinary way. They tried to close the gaps, as did a lot of other tools that started to pop up in the ecosystem. During all these developments, the standard that was supposed to solve these problems was finalized and became a W3C Recommendation.
However, while it allowed you to run automation across all browsers, its original design was already behind, and it was clear that it wouldn't solve the problems that developers have building modern web applications today. So, almost at the same time, a new effort was started to develop a new protocol, drawing on the experience gained from creating the first one and on the new requirements that developers have building modern web apps today.
So, if we look into the ecosystem, we can pretty much group tools into two buckets. On the one side we have the more conventional tools, like Selenium or WebdriverIO, and on the other side we have the so-called non-standard tools. Both groups have some interesting characteristics. The conventional ones, as you might expect, all use the WebDriver protocol and therefore allow you to do true cross-browser automation. Every command you can run in WebDriver is tested in every browser, like any other standard that you have on the web. However, given the way some frontend frameworks are built, there can still be some incompatibilities when testing web apps. And because the protocol was originally designed to do everything a user would be able to do, it is not well suited for developers who like to introspect all areas of the application. These tools aren't really that popular among devs and are used more by QA folks. However, many of them are open-governed, open-source projects with a long history and a large community. WebdriverIO, for instance, is part of the OpenJS Foundation alongside Node.js, Marko, and webpack. And then we have Selenium, which is a project of the Software Freedom Conservancy.
On the other side, we have what I call the non-standard tools, which all have their own ways to automate the browser and their own sets of advantages and disadvantages over each other. These custom approaches are usually based on some sort of JavaScript emulation or on the use of browser APIs.
That limits them all to certain browsers, but it also gives them capabilities that you would not have with WebDriver. They are therefore much more interesting for developers who like to introspect web apps, the network, the DOM, and things like that. What is interesting is that all these projects are funded by companies, with multiple people working on them full time. Looking at all these projects together, we see tools like Cypress and TestCafe that take the approach of using web APIs for automation. We have Puppeteer and Playwright, which rely on native browser APIs. And lastly, we have Selenium, WebdriverIO, and many other tools that rely on the WebDriver protocol. It's worth pointing out that some tools, like Cypress or WebdriverIO, actually use a mixture of two approaches. For some automation capabilities, Cypress needs to use browser APIs, for example to take screenshots of the browser. On the other side, you have WebdriverIO, which uses browser APIs for its performance testing features or its integration with Google Lighthouse.
Let's have a closer look at how you can automate browsers these days in general. There are, as I mentioned before, only three common approaches: one through JavaScript, another through browser APIs, and the last through a browser driver. I would classify using JavaScript and web APIs as the first generation of browser automation. Even though newer tools like Cypress or TestCafe use this approach, it has actually been around since the creation of Selenium, which was, like Cypress, a test runner within the browser. In fact, for some automation commands, like finding out if an element is visible, Selenium still uses JavaScript that it injects into the page. This is because it is almost impossible to define a visible state in the context of a web standard like WebDriver.
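To illustrate why visibility is hard to pin down in a specification, here is a minimal sketch of the kind of check a framework could inject into a page. It is not Selenium's actual script: the `ElementLike` type and the `isProbablyVisible` function are hypothetical, and the style and box values are passed in as plain data so the example runs without a DOM. In a browser they would come from `getComputedStyle()` and `getBoundingClientRect()`.

```typescript
// Minimal structural types so the sketch runs without a real DOM.
// These are assumptions for illustration, not part of any framework.
interface StyleLike {
  display: string;
  visibility: string;
  opacity: string;
}

interface ElementLike {
  style: StyleLike;
  rect: { width: number; height: number };
}

// One possible heuristic: an element counts as visible when it is rendered,
// not hidden, not fully transparent, and has a non-empty box.
function isProbablyVisible(el: ElementLike): boolean {
  if (el.style.display === "none") return false;
  if (el.style.visibility === "hidden") return false;
  if (parseFloat(el.style.opacity) === 0) return false;
  return el.rect.width > 0 && el.rect.height > 0;
}

// Hypothetical sample elements.
const submitButton: ElementLike = {
  style: { display: "block", visibility: "visible", opacity: "1" },
  rect: { width: 120, height: 32 },
};
const hiddenBanner: ElementLike = {
  style: { display: "none", visibility: "visible", opacity: "1" },
  rect: { width: 0, height: 0 },
};
const transparentOverlay: ElementLike = {
  style: { display: "block", visibility: "visible", opacity: "0" },
  rect: { width: 300, height: 200 },
};

console.log(isProbablyVisible(submitButton));
console.log(isProbablyVisible(hiddenBanner));
console.log(isProbablyVisible(transparentOverlay));
```

Even this short version leaves out cases such as elements scrolled off-screen, covered by another element, or clipped by a parent's overflow, which is part of what makes a single precise definition difficult.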
The advantage of this approach is that you have full control over the execution environment of your application. It allows you to run commands fast and reliably. However, you can only do as much as JavaScript, and particularly the web APIs, allow you to do. That is a lot, but you need workarounds for certain situations, such as taking screenshots. With JavaScript as the automation engine, there are usually two common approaches. One is used by Cypress, where you run the application under test in an iframe and have a test runner in the parent frame accessing the execution context. The browser pretty much loads a test runner that renders the application under test in an iframe. The test runner can then exchange messages with the iframe or access it with JavaScript directly. TestCafe approaches this a bit differently. It uses a proxy for all the requests that come from the page and thereby injects its own JavaScript into the HTML page. Again, the general limitation of this approach is that you're caught in the JavaScript sandbox and have no access to anything outside of it. The workarounds that tools like Cypress or TestCafe use go through browser APIs.
Browser APIs are, I would say, the second generation of browser automation, because they were originally used in the WebDriver project to automate browsers. Back then, it was common to abuse browser extensions to trigger certain events in the browser. Other browsers, like Internet Explorer, were automated through a native COM interface that Microsoft provided in Windows operating systems back then. Today, browsers have evolved and offer much better ways to introspect what's happening inside them. We see a lot of those capabilities used not only in the automation space, but also for debugging the browser through tools like Chrome DevTools.
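The proxy-based injection described above can be sketched as a small function that rewrites an HTML response before it reaches the browser. This is only an illustration of the idea, not TestCafe's actual implementation; the function name `injectAutomationScript` and the script URL are made up for this example.

```typescript
// Hypothetical helper: add a driver script tag to an HTML document,
// the way a rewriting proxy might before forwarding the response.
function injectAutomationScript(html: string, scriptUrl: string): string {
  const tag = `<script src="${scriptUrl}"></script>`;
  // Match an opening <head> tag (with or without attributes), but not <header>.
  const headOpen = /<head(\s[^>]*)?>/i;
  if (headOpen.test(html)) {
    // Insert right after <head> so the script runs before application code.
    return html.replace(headOpen, (match) => match + tag);
  }
  // No <head> found: prepend so the script still loads first.
  return tag + html;
}

const originalPage =
  "<html><head><title>App</title></head><body><header>Hi</header></body></html>";
const rewrittenPage = injectAutomationScript(originalPage, "/__automation__/driver.js");
console.log(rewrittenPage);
```

Because the injected script runs inside the page, it is subject to the same sandbox limits as any other page script, which is why such tools fall back to browser APIs for things like screenshots.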
Again, the problem with these browser APIs is that they work very differently from browser to browser and aren't really well documented outside of the browser teams. They also change a lot from one version to another. Think about it as having three browser friends who offer you help but all speak a different language, with one friend being Safari, who doesn't really like to listen to you and always wears headphones. To give all your browser friends one task, you would need to understand and speak all of their languages. This is why tools like Puppeteer, or frameworks based on Puppeteer, can usually only automate one type of browser, in this case a Chromium-based one. Luckily, we see that browser vendors have started to join forces to speak a more similar language, which is why we see, for instance, Puppeteer support in Firefox Nightly these days. On the other side, we have former browser team members implementing custom adjustments to browser engines to allow the same type of communication with all of these browsers.
Lastly, we have the WebDriver protocol, which I would argue is the third generation, as it combines the previous two approaches to ensure a consistent automation experience across all browsers. It's a web standard, which means the protocol is actively developed by all browser vendors. When we meet as a working group, we have people from Microsoft, from Google, and from Apple at one table discussing the standard. All changes to the protocol are tested, like all web standards are. If we add a command, that command is tested in all browsers, and the tests run continuously to check that it behaves reliably across browsers. As I have mentioned several times, the original design of the protocol falls short of the requirements of testing modern web apps today. The way the WebDriver protocol works is that for every browser you have some sort of driver that can translate well-defined commands, like a click on an element, into an automation command that the browser understands.
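As a rough illustration of those well-defined commands: in the W3C WebDriver specification, finding an element and clicking it are plain HTTP requests sent to the driver. The sketch below only builds the request descriptions and does not talk to a real driver. The `HttpRequest` type, the helper names, and the sample session and element ids are assumptions made for this example.

```typescript
// Hypothetical description of an HTTP request to a browser driver.
interface HttpRequest {
  method: "GET" | "POST" | "DELETE";
  path: string;
  body?: Record<string, unknown>;
}

// WebDriver "Find Element": POST /session/{session id}/element
function findElementRequest(sessionId: string, cssSelector: string): HttpRequest {
  return {
    method: "POST",
    path: `/session/${sessionId}/element`,
    body: { using: "css selector", value: cssSelector },
  };
}

// WebDriver "Element Click": POST /session/{session id}/element/{element id}/click
function clickElementRequest(sessionId: string, elementId: string): HttpRequest {
  return {
    method: "POST",
    path: `/session/${sessionId}/element/${elementId}/click`,
    body: {},
  };
}

// Made-up ids; a real driver returns them when a session or element is created.
const findRequest = findElementRequest("session-123", "#submit");
const clickRequest = clickElementRequest("session-123", "element-456");
console.log(findRequest.method, findRequest.path, JSON.stringify(findRequest.body));
console.log(clickRequest.method, clickRequest.path);
```

Each of these requests is answered by the driver only after it has translated the command for its browser, so a simple "find and click" already costs two full round trips.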
The protocol is often, and correctly, described as slow and outdated. One of the reasons is that this translation step in the driver requires an extra request. You can think about the WebDriver protocol like this: as a developer or test author, you are managing a browser factory. But of course, due to COVID, you have to work from home. In this case, your home does not have a telephone or internet, so you cannot contact your factory directly. You need an assistant to carry out commands in that factory. The assistant comes to you, asks you for a command, goes to the factory, executes the command, and at some point comes back with the result. This can sometimes take up to 200 milliseconds, if not more, particularly if you run in the cloud. On a busy day, you sometimes run over 1,000 commands in a single test. This obviously is not a very effective way to run a factory.
While a single transaction is slow, this approach has the advantage of being really scalable. We have customers at Sauce Labs who run test suites with 1,000 tests in parallel, which can tremendously reduce the execution time of a whole test suite even though a single test can still be kind of slow. Another advantage is that you can automate not only factories with browsers but literally all kinds of factories, be it a mobile factory or an IoT factory. If you run, let's say, a browser factory, you have an assistant that understands the inner workings of a browser and can help you make that command a reality. If you have a mobile factory, you have an assistant that understands iOS or Android and can execute your commands there. This idea of having a translator that executes a trivial command, like a click on a button, in a complex user agent works quite well. As I mentioned before, your assistant really is able to carry out all kinds of commands as long as it understands the factory.
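The numbers mentioned above can be turned into a quick back-of-the-envelope calculation. It takes the worst-case figures from the talk (200 milliseconds per command, 1,000 commands per test, 1,000 tests) and assumes, purely for illustration, that all tests are equally long and that round-trip latency is the only cost. The function names are made up for this sketch.

```typescript
// Time one test spends purely on command round trips, in seconds.
function roundTripOverheadSeconds(commands: number, latencyMs: number): number {
  return (commands * latencyMs) / 1000;
}

// Wall-clock time for a suite when tests run in equally sized parallel batches.
function suiteWallClockSeconds(
  tests: number,
  perTestSeconds: number,
  parallelSessions: number
): number {
  const batches = Math.ceil(tests / parallelSessions);
  return batches * perTestSeconds;
}

const perTestSeconds = roundTripOverheadSeconds(1000, 200);
const serialSeconds = suiteWallClockSeconds(1000, perTestSeconds, 1);
const parallelSeconds = suiteWallClockSeconds(1000, perTestSeconds, 1000);

console.log(perTestSeconds);  // 200 seconds of pure latency per test
console.log(serialSeconds);   // 200000 seconds (roughly 55 hours) one test at a time
console.log(parallelSeconds); // 200 seconds with 1,000 parallel sessions
```

Under these assumptions a single test is never faster than 200 seconds, but the whole suite is not slower than that either once every test has its own session, which is the scalability trade-off described above.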
The limitation, however, is that the assistant always has to go to the factory, execute the command, and come back. This limitation is very problematic for testing modern web apps, where you really need to introspect the whole factory itself. Luckily, the W3C working group has kicked off a new effort to develop an updated standard that would overcome the problems that we see in automating browsers today. Based on the little thought experiment that we just did with the assistant, this new world of browser automation would look like this. You would be some sort of commander who oversees not only a single factory but all other domains of your ecosystem. Instead of giving one command at a time, you can give multiple commands to multiple actors at the same time while listening to different inputs from them. It makes you some sort of automation god, so to speak.
It comes with quite a lot of interesting features. For one, it tries to address the capabilities that developers ask for today, who want introspection into the network, the DOM, the console, and similar things. It will likely come with primitives to modify network data without having to use a proxy, and it will allow you to introspect the DOM to handle elements flawlessly. In addition, it will provide enough access that implementers can be very creative when building test frameworks. For example, there is a bootstrap script planned that allows you to execute a script in the execution context before your app is loaded. That allows you to monkey-patch APIs or install your own automation extensions. It also gives you access, to some extent, to browser APIs to control the browser using native capabilities. You can finally talk to your Safari friend. But as with all the web standards that we have right now, backwards compatibility is important. You will be able to continue to test legacy browsers, and browsers across more than just one version. That said, this protocol is still in the making.
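The details of the new protocol were still being drafted at the time of this talk, so the following is only a sketch of the general idea: several commands are sent without waiting, responses are matched to commands by id, and events can arrive at any time. `BidiClientSketch`, the message shape, and the method and event names (`navigate`, `click`, `console.message`) are all hypothetical, and the transport is simulated in memory instead of using a real connection.

```typescript
// Hypothetical message shapes: a response carries the id of its command,
// an event carries a name and arrives unsolicited.
type IncomingMessage =
  | { id: number; result: unknown }
  | { event: string; params: unknown };

class BidiClientSketch {
  private nextId = 1;
  private pending: { [id: number]: (result: unknown) => void } = {};
  private listeners: { [event: string]: Array<(params: unknown) => void> } = {};

  constructor(private send: (raw: string) => void) {}

  // Send a command immediately and remember who is waiting for its result.
  command(method: string, params: unknown, onResult: (result: unknown) => void): number {
    const id = this.nextId++;
    this.pending[id] = onResult;
    this.send(JSON.stringify({ id, method, params }));
    return id;
  }

  // Subscribe to events pushed by the browser side.
  on(event: string, listener: (params: unknown) => void): void {
    (this.listeners[event] = this.listeners[event] || []).push(listener);
  }

  // Called by the transport for every incoming message.
  receive(raw: string): void {
    const message = JSON.parse(raw) as IncomingMessage;
    if ("id" in message) {
      const callback = this.pending[message.id];
      delete this.pending[message.id];
      if (callback) callback(message.result);
    } else {
      for (const listener of this.listeners[message.event] || []) {
        listener(message.params);
      }
    }
  }
}

// Simulated session: two commands in flight, answers arrive out of order,
// with an event in between.
const sentMessages: string[] = [];
const activityLog: string[] = [];
const client = new BidiClientSketch((raw) => sentMessages.push(raw));

client.on("console.message", (params) => activityLog.push("event:" + JSON.stringify(params)));
const navigateId = client.command("navigate", { url: "https://example.com" }, () =>
  activityLog.push("navigate done")
);
const clickId = client.command("click", { selector: "#submit" }, () =>
  activityLog.push("click done")
);

client.receive(JSON.stringify({ id: clickId, result: {} }));
client.receive(JSON.stringify({ event: "console.message", params: { text: "hello" } }));
client.receive(JSON.stringify({ id: navigateId, result: {} }));

console.log(sentMessages.length);
console.log(activityLog.join(" | "));
```

Compared with the one-request-one-response model, nothing here blocks on a round trip, which is what makes it possible to listen to network, DOM, or console activity while commands are still in flight.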
As with all web standards, feedback is important. If you automate a browser, which I think all of you do to some extent, come by and provide feedback. This is really important so that the new design of the protocol actually covers everyone's use cases and requirements. Another interesting fact is that this effort will change quite a lot about how tests are run in the future. Instead of sending one command and getting one response back, you will be able to send thousands of commands at a time and receive thousands of messages at a time. The amount of data exchanged will increase a lot, which is particularly interesting and problematic for cloud vendors like Sauce Labs, where the current business model is pretty much that you send a browser command through the internet to a VM in the cloud and get a response back. With the new WebDriver protocol, this probably won't work anymore. If you tried to run this at the same scale with the same number of browsers, you could technically DDoS your own CI/CD system. Cloud vendors need to find a solution to overcome this problem.
There have been some interesting developments in this market. For one, we will likely see a change in the execution model. We won't send every single command through the internet anymore just because your test lives on a different side of the world than the browser. These components will be moved together. The cloud vendor will likely either ask you to upload the test to the cloud or provide you with a way to run the test with the browser in your own infrastructure. Another trend I've seen is that functional testing alone is no longer the only way we measure the quality of an application. Vendors will be more present throughout the whole software development lifecycle of the application and will accompany you through the development, testing, and release process.
Information will be captured from all sides of the spectrum and used to indicate what is actually wrong in your application. Lastly, with the increase in all these quality signals, we will also see an increase in new testing types. Over the last years, we have already seen development in that space. For instance, there has been development in performance testing with WebDriver. This year, we will likely see more accessibility testing tools. There will be more testing types like that, and a lot of interesting developments. While the protocol will create a lot of work for us as a cloud vendor, it also provides a tremendous number of opportunities to build the tooling we need to ship better applications to the web.
What's next? What will the future of browser automation look like? Honestly, I can't really tell for sure. I also will definitely not recommend one framework over another. I'm excited about all the developments that we have in this space. Even though I would like to see the WebDriver protocol succeed as a standard, at the same time I'm rooting for tools like Cypress or Playwright that really push the boundaries of automation and provide so much value for the developer persona. Honestly, without these frameworks, I don't think we would see the kind of development that we have in the standards space right now. That said, if you want to improve the quality of the web, you can't do it without standards, which is why I'm excited as well about the developments that are happening in that space. For one, we have this new WebDriver protocol in the making that opens up so many more opportunities. It really combines the best of all three approaches and creates tremendous chances for implementers to build tools that can effectively test and automate web applications. It is also very exciting to see that more and more web standards are shipped with WebDriver extensions.
We have, for example, the Web Authentication API, which allows you to create virtual authenticators with WebDriver. Very recently, we have seen changes in HTML that led to the creation of a new WebDriver command to change the time zone. We have many more APIs, like the sensor API, browser reporting APIs, and permission APIs, that already have WebDriver extensions. That said, creating any form of standard is usually a very slow process. You need to get people on board, you have to make sure that what you standardize makes sense for all users, and you have to write the spec. These changes won't happen overnight. It is really good to see, though, that all these developments in the tooling and standards space are taking place and are getting faster and faster.
We are at the end of my talk, and I wanted to end with the following quote by Maja and James, who are developers at Mozilla. They say: "An automation solution based on a proprietary protocol will always be limited in the range of browsers it can support. The success of the web is built on multi-vendor standards. It's important that test tooling builds on standards as well, so that tests work across all the browsers and devices where the web works." With that, thank you so much for giving me this opportunity to speak, and thank you all for listening.
Hi, Christian. Thanks again for your talk. It was really good. What do you think of the result of the polls?
I mean, I was expecting that answer. Every time you get a technical question, "it depends" is the right way to go. To answer it, in my opinion, yes, it depends. I guess there are a lot of ways you can automate browsers. Right now, the most popular ones usually come down to three, which I explained in my talk.
Yes. Yeah, like I said, it depends. It's probably a much-used answer for a lot of questions. It just works. Speaking of questions, I saw we had a few questions in the Q&A channel already.
Triderman, I'm so sorry if I'm pronouncing this wrong, asked: what are your thoughts on low-code automation platforms like Tosca?
Yeah, I've been exploring low-code solutions a lot recently, as they are becoming more and more interesting. I think it's a great way to introduce more people to testing. These tools are also becoming more and more sophisticated and really provide valuable solutions for people who just don't code, and that's totally fine. We want those people to be able to test software as well.
Yeah, in my experience, having people with knowledge about the product who are able to help write the tests is invaluable. Okay. Yeah. Sudharsan, again, so sorry if I'm butchering the names, wrote: I prefer to run browser tests in headless mode in cloud runtimes like Lambda or Google Functions. I would like to know which framework has native support. Do you know anything about this, Christian?
I've been experimenting with that in the WebdriverIO project, to be able to just run a bunch of Lambda functions in the wild and scale up to thousands without touching your own CPU. That's a really interesting thing, and I think it's something we could explore going forward. At least for people who really want to test efficiently on one browser that runs in a Docker container, it might be a reliable solution. So let's see what the future brings.
Yeah, I've never done anything like this, but let's hope the future brings a lot more to support this. The stock broker asked: how much time until native tools will eclipse these emulated and browser-API-reliant tools, and in what ways?
I'm not sure if I get this question right. How much time until native? So I think the browser APIs will continue to develop, as they are usually a good way to help people debug the browser as well. So we will definitely see a lot of development there across browsers and across browser engines.
Yeah, I guess maybe the question was about usage or something, but maybe the stock broker can clarify if this didn't answer his question. Guy T said: my experience with these frameworks is a lot of "wait for X to exist". Is there a future where these tools can intelligently understand SPAs? Yeah, I've had that experience quite a lot myself.
Yeah, I think that's a good question. The thing about automating a browser is that a browser, as a piece of software, is really asynchronous. The idea of a browser is a really asynchronous idea. Methods where you have to wait for an element to exist are something you need to make browser automation work. There are tools that try to help you with that. And I think tools that are really good at this will succeed as automation frameworks, because it's really difficult for people who don't develop browsers to understand that. We should make this as simple as possible, and there will definitely be development in this direction. We see it across all the frameworks right now, and I think this kind of development will continue.
Yeah. Okay. Philly Fala said: are smart TV vendors and other more obscure browser vendors also part of this standardization?
No, currently the WebDriver protocol focuses on the web. There have been some efforts in the past to create some sort of mobile standard around automation. The problem is that we need to get these vendors on board to sit at the table and agree on those kinds of standards. As long as that doesn't happen, it will be difficult to create a standard like this. For the web, all of the browser vendors have agreed to work on those kinds of standards. I would love to see that in the IoT space as well, and we will see what the future brings.
Okay. I think we have time for one last question. Joey Halfridge asked: will the web authentication extension allow basic HTTP auth?
We use it to keep the bots off of sites in development, but have to turn it off to run tests.
I unfortunately haven't looked too much into that WebDriver extension. I'm not sure. I don't think it will allow basic auth to happen, but it is something that is on the WebDriver BiDi roadmap for sure.
Okay. I think all of the rest of the questions can be answered in the discussion room, where Christian will join you shortly. Thank you so much again for joining us. It was a pleasure. Thank you.
34 min
15 Jun, 2021
