Three Ways to Automate Your Browser, and Why We Are Adding a Fourth: WebDriver BiDi

A journey through the overwhelming number of ways to automate browsers. Join Michael to see what happens behind the scenes of "await page.goto('');" et al. See what pros and cons each of the three ways of browser automation has.

Understand why we are adding a fourth - WebDriver BiDi.

19 min
05 Jun, 2023

AI Generated Video Summary

This Talk discusses browser automation techniques, including the introduction of a new one, WebDriver BiDi. It covers the history of browser automation, different techniques for automating browsers, and the use of web APIs and browser extensions. The Talk also explains how automation tools communicate with browser drivers and the challenges of waiting for elements to appear on the screen. It highlights the differences between the WebDriver protocol and the Chrome DevTools Protocol, and introduces the WebDriver BiDi project, which aims to combine the best parts of both protocols. Lastly, it shows WebDriver BiDi's support for console monitoring and presents WebDriver BiDi as a good choice for stable automation.

1. Introduction to Browser Automation

Short description:

I'm Michael Hablich, a product manager on the Chrome team, working on reducing the friction of testing and debugging web applications. Today, I'll talk about browser automation techniques and why we're adding a fourth one, WebDriver BiDi. Quality assurance and testing activities take up a big chunk of software development costs, and test automation is a very good way to reduce the continuous costs of testing. Browser automation automates user interactions and pretends to be a user, with typical use cases including test automation, web scraping, and rendering parts of pages such as ads. Let's take a short tour of the history of browser automation, from the native APIs of the 90s to the complexities introduced by Java applets and Flash.

Hi folks. I'm Michael Hablich, a product manager on the Chrome team, where I'm working on reducing the friction of testing and debugging web applications. I have the honor today to talk about browser automation techniques and why we're adding a fourth one: WebDriver BiDi.

I have spent around 20 years working in tech, and a big chunk of that was building test automation solutions for enterprises. One could say I had a lot of fun automating browsers, .NET applications, and more niche technologies like PowerBuilder.

So, why am I here? Well, the Chrome team periodically reviews the satisfaction of web developers and, surprise, testing, in particular across browsers, is a top pain point for web developers. Quality assurance and testing activities take up a very big chunk of software development costs, and you can't simply cut them away. QA is necessary because either you test your applications or your users will, and the latter carries some risk. Test automation is a very good way to reduce the continuous costs of testing.

So, let's first define what browser automation is about and briefly skim how it works. Simply put, browser automation automates user interactions and, from the browser's point of view, pretends to be a user. Often such interactions are stored as source code, as seen on the left side. These interactions are then replayed, as you can see on the right side. Typical use cases of browser automation technologies are test automation, web scraping, or rendering parts of pages such as ads. Today, I'm focusing on the first one: test automation.

The previous slides showed the current state of browser automation: tests defined in JSON and JavaScript, fast and stable automation, and so on. Before we ended up in such a cozy place, a lot of history happened. Let's take a short tour. The web was born in the 90s. People started using browsers on a limited set of big screens, and testing in that decade was mostly done against static content. Browsers like Netscape Navigator or Internet Explorer were shipped. Browser automation at that time was done via native APIs. For example, I can still remember using Visual Basic 6 to automate Internet Explorer. In 1996, Java applets and Flash became a thing. They made automating webpages even more complicated, because the browser automation APIs provided by the browser vendors did not work for Java applets and Flash containers. Manual testing or injecting scripts were the way to go for these technologies. In the 2000s, more browsers joined the scene, including Chrome.

2. Browser Automation Techniques

Short description:

Developers started building rich and interactive web experiences. Selenium and WebDriver were created to address test automation challenges, with WebDriver becoming a W3C standard. Multiple JavaScript testing libraries were introduced, using different techniques to automate browsers. We'll cover the WebDriver Protocol, Chrome DevTools Protocol, and Web APIs plus browser extensions. There are two major categories: high level, executing injected JavaScript, and low level, executing remote commands. Let's focus on the approach of using web APIs and browser extensions to build an automation layer.

Developers started to build very rich and interactive experiences on the web. YouTube and Google Maps are some very good early examples of this. With smartphones coming into the picture, the need for test automation increased, because suddenly there was a requirement for cross-browser and cross-device compatibility. Selenium and the WebDriver project were born to solve these test automation challenges.

At that time, it was common to write Selenium tests in Java. In 2009, Node.js brought JavaScript development to the backend, and it also enabled running tests written in JavaScript. More JavaScript frameworks came into the picture. At the same time, Selenium and WebDriver merged into a single Selenium WebDriver project. With its growing popularity, WebDriver became a W3C standard in 2018, which we now call WebDriver Classic.

With more developers building richer applications in JavaScript, these developers also wanted to write their test automation in JavaScript. Multiple JavaScript testing libraries for the web were introduced to address this need, and not all of them use WebDriver as the underlying automation technology. They use different techniques to automate the browser, which we are going to talk about today. We will cover the WebDriver protocol, supported by solutions like Selenium, Nightwatch.js, or WebdriverIO; the Chrome DevTools Protocol, CDP for short, powering Puppeteer, Chrome's own automation library, and Playwright; and web APIs plus browser extensions, leveraged by TestCafe or Cypress, for example.

Let's take a step back and talk about how tools automate browsers. I mentioned three major ways to automate a browser. They also fall into two major categories. Let's intensify the complexity a bit: we have high-level approaches, which execute JavaScript injected into the browser, and low-level approaches, which execute remote commands. For example, Cypress utilizes browser extensions and Node.js to execute a test directly in the browser. To gain greater control of the browser, like opening multiple tabs or testing third-party iframes, we need to go deeper and execute remote commands. This is done with other techniques, which we'll simply call protocols. The two common protocols are WebDriver and the Chrome DevTools Protocol, CDP for short. No worries, we will explore all of this together shortly.

I'm going to start with the approach of using web APIs and browser extensions to build your own automation layer. Essentially, these solutions leverage a range of web APIs, JS injection, browser extensions, proxies, etc., to build their very own automation layer. Going into detail here would exceed the size of this talk, so I'm going to stop here and segue over to WebDriver, the automation technique built upon a standard. It's one of the low-level protocols, so let's take a brief look at how it works in principle.

3. Automating Test Cases with Browser Drivers

Short description:

Let's assume you're a web developer or a tester who wants to automate a test case. Your automation tools translate your scripts into HTTP requests and communicate with the browser drivers through WebDriver commands. The browser drivers then communicate with the browser via internal, browser-specific protocols. Let's take a look at an example of a script in WebdriverIO. Each action is translated into HTTP requests. The browser drivers handle the requests and send back the response over the same HTTP connection. However, waiting for elements to appear on the screen can be tricky, because the browser drivers are not able to notify the automation libraries. The libraries need to constantly send requests to check the status.

Let's assume you're a web developer or a tester who wants to automate a test case, the most common use case for browser automation. You pick a test automation tool, write tests in it, and then run these tests as part of your CI. Behind the scenes, your automation tool translates your scripts and runs them in the browser through some sort of protocol, WebDriver or CDP, for example.

Here you can see that there's an additional entity between automation tools and browsers: the browser drivers. Typically, these need to be installed separately to automate a browser via WebDriver. Your automation tool translates your scripts into HTTP requests and communicates with the browser driver through WebDriver commands. The browser driver then communicates with the browser via internal, browser-specific protocols. Since WebDriver Classic is a web standard, it is well supported by all major browser vendors. With every new browser release, the vendor updates and publishes a new version of the driver.
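To make the HTTP layer concrete, here is a minimal TypeScript sketch of the requests a WebDriver Classic client might send for the coffee example. The endpoint paths, the `using: "css selector"` locator strategy, and the element-reference key follow the W3C WebDriver specification. The helper names, session id, and selector are hypothetical. Real libraries such as WebdriverIO add headers, retries, and error handling on top of this.

```typescript
// Minimal description of an HTTP request a WebDriver Classic client sends to a driver.
interface WebDriverRequest {
  method: "GET" | "POST" | "DELETE";
  path: string;
  body?: Record<string, unknown>;
}

// The W3C WebDriver spec identifies element references by this fixed key.
const ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf";

function setWindowRect(sessionId: string, width: number, height: number): WebDriverRequest {
  return { method: "POST", path: `/session/${sessionId}/window/rect`, body: { width, height } };
}

function navigateTo(sessionId: string, url: string): WebDriverRequest {
  return { method: "POST", path: `/session/${sessionId}/url`, body: { url } };
}

function findElement(sessionId: string, cssSelector: string): WebDriverRequest {
  return {
    method: "POST",
    path: `/session/${sessionId}/element`,
    body: { using: "css selector", value: cssSelector },
  };
}

function clickElement(sessionId: string, elementId: string): WebDriverRequest {
  return { method: "POST", path: `/session/${sessionId}/element/${elementId}/click`, body: {} };
}

// A successful Find Element response looks like { value: { [ELEMENT_KEY]: "<id>" } }.
function elementIdFromResponse(response: { value: Record<string, string | undefined> }): string {
  const id = response.value[ELEMENT_KEY];
  if (id === undefined) throw new Error("response does not contain an element reference");
  return id;
}
```

A client would send `setWindowRect(...)`, `navigateTo(...)`, and `findElement(...)`. It would then read the element id with `elementIdFromResponse` and finally send `clickElement(...)`. Each step is one HTTP request and one response.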

Let's take a look at some actual code. Jecelyn, Chrome tooling's excellent developer advocate, created awesome demos, and I'm going to borrow them in the coming slides. So thank you, Jecelyn. Let's say you have a script that navigates to a page and clicks on a coffee to add it to your shopping cart. This is what the script looks like in WebdriverIO. Each action is translated into HTTP requests. For example, when you set the window size, your automation tool sends an HTTP POST request to change the window. Here's a demo of setting the viewport size in the Safari browser. In our example, these are the three HTTP commands that happen behind the scenes. Most of the time, the browser driver handles the request and sends back the response over the same HTTP connection. However, the second step is a little bit tricky. Often you need to wait until the element is shown on the screen. In our case, the coffee is loaded from a network request, and we need to wait for that before we can find and click it. Due to the nature of HTTP, the browser driver is not able to notify the automation library when the coffee is ready. The library needs to constantly send requests: Is the espresso there? Is it there now? And so on. Each time, the browser driver checks the status and responds.
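A Classic driver cannot push a notification, so a client library waits for an element by re-sending Find Element until it succeeds or times out. The sketch below simulates that loop against a hypothetical fake driver with a simulated clock, using no real HTTP and no real timers. It shows both costs described above: extra round trips, and an artificial delay between the moment the element appears and the moment the client notices. All names and timings are hypothetical.

```typescript
// Hypothetical stand-in for a browser driver. The element "appears" once the
// simulated clock reaches readyAtMs, e.g. after the coffee list has loaded.
interface FakeDriver {
  findElement(selector: string, nowMs: number): string | null;
}

function makeFakeDriver(readyAtMs: number): FakeDriver {
  return {
    findElement(selector: string, nowMs: number): string | null {
      return nowMs >= readyAtMs ? `element:${selector}` : null;
    },
  };
}

interface PollResult {
  elementId: string | null;
  requests: number; // each attempt stands for one HTTP request/response round trip
  elapsedMs: number; // simulated time at which the client noticed the element
}

// Re-send "find element" every intervalMs until it succeeds or timeoutMs passes.
function pollForElement(
  driver: FakeDriver,
  selector: string,
  intervalMs: number,
  timeoutMs: number,
): PollResult {
  let requests = 0;
  for (let now = 0; now <= timeoutMs; now += intervalMs) {
    requests++;
    const elementId = driver.findElement(selector, now);
    if (elementId !== null) return { elementId, requests, elapsedMs: now };
  }
  return { elementId: null, requests, elapsedMs: timeoutMs };
}
```

Suppose the element becomes available at 1200 ms and the poll interval is 500 ms. Then `pollForElement(makeFakeDriver(1200), "#espresso", 500, 5000)` needs four requests and only notices the element at 1500 ms. Shorter intervals reduce that delay but increase the number of requests.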

4. Automation Protocols and Chrome DevTools

Short description:

WebDriver is slower compared to other protocols because each Classic command requires its own HTTP request and response. WebDriver Classic has the best cross-browser support but is limited in supporting some low-level controls. The Chrome DevTools Protocol enables debugging and automation of websites. Puppeteer uses CDP under the hood to communicate directly with Chromium-based browsers. CDP supports more low-level controls and features like intercepting network requests and simulating device mode.

From the example above, we know that WebDriver is slower than other protocols because each Classic command requires its own HTTP request and response. Obviously, this is a simplification, and there are techniques to mask and mitigate this to a certain extent.

In summary, WebDriver Classic has the best cross-browser support. However, it is slower and has limitations in supporting some low-level controls, which we will look into later.

Next, let's take a look at the Chrome DevTools protocol. From the name, you can guess that this protocol is designed to enable Chrome DevTools to debug webpages. It turns out that a lot of the features that are needed for debugging a website can also be used to automate a website.

For example, it doesn't really matter whether we get the inner text of an HTML element in order to display it in Chrome DevTools or to verify it in one of your tests. Since the protocol is fast and powerful, Puppeteer uses CDP under the hood for automation purposes. Unlike WebDriver, CDP communicates directly with Chromium-based browsers. No separate browser driver is needed, as that part is essentially already built into the browser. Automation tools issue CDP commands, and these commands are sent to the browser via WebSockets.
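To illustrate the inner-text example, the sketch below builds a CDP `Runtime.evaluate` command and reads the value back from the reply. The `id`/`method`/`params` framing and the nested `result.result` shape of the response follow the Chrome DevTools Protocol. The selector and helper names are hypothetical. The WebSocket transport is left out so the code runs without a browser.

```typescript
// Shape of a CDP command sent over the WebSocket: a numeric id, a method, and params.
interface CdpCommand {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Build a Runtime.evaluate command that reads an element's inner text.
// returnByValue asks the browser to serialize the result instead of returning an object handle.
function innerTextCommand(id: number, selector: string): CdpCommand {
  return {
    id,
    method: "Runtime.evaluate",
    params: {
      expression: `document.querySelector(${JSON.stringify(selector)})?.innerText ?? null`,
      returnByValue: true,
    },
  };
}

// Runtime.evaluate replies with { id, result: { result: RemoteObject } }.
interface EvaluateResponse {
  id: number;
  result: { result: { type: string; value?: unknown } };
}

// Return the text if the page produced a string, otherwise null.
function innerTextFromResponse(response: EvaluateResponse): string | null {
  const remote = response.result.result;
  return remote.type === "string" ? String(remote.value) : null;
}
```

The same command could serve DevTools (to display the text) or a test (to assert on it), which is the overlap between debugging and automation described above.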

Here's a quick refresher on our demo. We're going to navigate to a page and then click on the Espresso entry to add it to our cart. This is what the Puppeteer script looks like for our coffee. It looks very similar to the previous example with WebdriverIO. Each action is translated into CDP commands. For example, this is roughly what happens: first, we navigate to a page; second, we find a coffee; and third, we add it to our cart. Here, you can see these CDP commands being issued directly in DevTools with the Protocol Monitor. In case you want to try it out yourself, you will need to enable the Protocol Monitor in the DevTools settings.
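The three steps above map roughly onto CDP commands like the ones below. This is a simplified sketch, not the exact sequence Puppeteer sends. Puppeteer also waits for lifecycle events, resolves element handles, and typically moves the mouse before clicking. The page URL, selector, and coordinates are hypothetical placeholders.

```typescript
interface CdpCommand {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Hands out increasing ids so responses can later be matched to their commands.
class CommandBuilder {
  private nextId = 1;

  build(method: string, params: Record<string, unknown> = {}): CdpCommand {
    return { id: this.nextId++, method, params };
  }
}

// Roughly the commands behind "go to the page, find the coffee, click it".
// x and y would come from the bounding box returned by the Runtime.evaluate step.
function orderEspressoCommands(url: string, selector: string, x: number, y: number): CdpCommand[] {
  const b = new CommandBuilder();
  const mouse = { x, y, button: "left", clickCount: 1 };
  return [
    // 1. Navigate to the page.
    b.build("Page.navigate", { url }),
    // 2. Find the coffee and compute the center of its bounding box.
    b.build("Runtime.evaluate", {
      expression: `(() => { const r = document.querySelector(${JSON.stringify(selector)}).getBoundingClientRect(); return { x: r.x + r.width / 2, y: r.y + r.height / 2 }; })()`,
      returnByValue: true,
    }),
    // 3. Click it: press and release the left mouse button at that point.
    b.build("Input.dispatchMouseEvent", { type: "mousePressed", ...mouse }),
    b.build("Input.dispatchMouseEvent", { type: "mouseReleased", ...mouse }),
  ];
}
```

Each of these objects would be serialized with `JSON.stringify` and sent as one WebSocket message. This is the same kind of message you can type into the Protocol Monitor.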

Going back to Puppeteer and how it uses CDP to automate the browser: CDP uses WebSockets, so communication is bidirectional by default. Once the browser completes a request, it sends the updates back to the tool automatically. No polling is needed to wait for the coffee, so there is no artificial delay. Also, since CDP is designed to cover all debugging needs, it supports more low-level controls than WebDriver, such as intercepting network requests, simulating device mode and geolocation, getting console messages, and so on. Back when the WebDriver protocol was developed, there was no need for low-level control. However, times have changed, and testing now requires more and more fine-grained actions.
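Because a WebSocket is a two-way channel, a CDP client has to tell two kinds of incoming messages apart. Responses carry the `id` of the command they answer. Events carry a `method` name and arrive whenever the browser has something to report. The hypothetical `CdpRouter` below is a minimal, synchronous sketch of that dispatch logic. In a real client, `dispatch` would be called from the WebSocket's message handler.

```typescript
// Incoming CDP messages are either responses (carry the command id) or events (carry a method, no id).
type CdpIncoming =
  | { id: number; result?: unknown; error?: { message: string } }
  | { method: string; params: unknown };

type ResponseHandler = (result: unknown, error?: string) => void;
type EventHandler = (params: unknown) => void;

class CdpRouter {
  private pending = new Map<number, ResponseHandler>();
  private listeners = new Map<string, EventHandler[]>();

  // Register a callback for the response to the command with this id.
  expectResponse(id: number, handler: ResponseHandler): void {
    this.pending.set(id, handler);
  }

  // Register a callback for every event with this method name.
  on(method: string, handler: EventHandler): void {
    const list = this.listeners.get(method) ?? [];
    list.push(handler);
    this.listeners.set(method, list);
  }

  // Route one text frame received on the WebSocket.
  dispatch(raw: string): void {
    const msg = JSON.parse(raw) as CdpIncoming;
    if ("id" in msg) {
      const handler = this.pending.get(msg.id);
      this.pending.delete(msg.id);
      handler?.(msg.result, msg.error?.message);
    } else {
      for (const handler of this.listeners.get(msg.method) ?? []) handler(msg.params);
    }
  }
}
```

Nothing here asks the browser whether the page has loaded. Once the client has enabled the `Page` domain, a `Page.loadEventFired` event simply arrives and is routed to its listener, which is why no polling loop is needed.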

5. The WebDriver BiDi Project

Short description:

Although CDP is fast and offers low-level control, it only works in Chromium-based browsers, and it's not a standard. WebDriver is relatively slow and not low-level enough. The WebDriver BiDi project aims to combine the good parts of both protocols, offering bidirectional messaging, low-level controls, cross-browser support, and standardization. It complements WebDriver Classic and reduces the need for automation libraries to use CDP directly. However, WebDriver BiDi is still a work in progress, with browser vendors and test automation library vendors still finalizing the specification and its implementations.

Although CDP is fast and offers low-level control, it only works in Chromium-based browsers, and it's not a standard. So both browser automation protocols have their drawbacks: CDP is not a standard and is browser-specific, while WebDriver is relatively slow and not low-level enough. But both protocols also offer unique benefits.

CDP is fast and bidirectional, and it provides low-level control over the browser. WebDriver is built for testing, is a standard, and is supported across browsers. So, what if we take only the good parts from both protocols? That is what the WebDriver BiDi project is about. It is a new standard for browser automation that combines the good parts of WebDriver and CDP: bidirectional messaging, low-level controls, cross-browser support, and standardization.

It's built for testing, not for debugging. This diagram looks very similar to what you have already seen for CDP and WebDriver. The reason is that WebDriver BiDi works like a combination of both. WebDriver BiDi commands are sent via WebSocket connections. The receiver can be a classic driver like ChromeDriver or the browser directly. A few things are important to note here, though. WebDriver BiDi is meant to complement WebDriver Classic: BiDi commands and Classic commands can be run next to each other, so there's no need for a big migration. I will talk about this in a little more detail in a few minutes. Also, the need for automation libraries to use CDP directly should drastically decrease. In fact, we are working on making Puppeteer use BiDi instead of CDP under the hood. As mentioned before, WebDriver BiDi combines the benefits of CDP and WebDriver. It's important to note that this new standard is a collaboration of various browser vendors, test frameworks, and test infrastructure providers.
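The WebDriver BiDi specification provides one way for Classic and BiDi to run next to each other: opting in when a Classic session is created. Setting the `webSocketUrl` capability to `true` in the New Session request makes a supporting driver return a WebSocket URL. BiDi commands can then be sent over that WebSocket while the session keeps accepting Classic HTTP commands. The sketch below only builds and parses the JSON payloads. The helper names, session id, browsing-context id, and URLs are hypothetical.

```typescript
// Step 1 (WebDriver Classic, HTTP POST /session): request a BiDi WebSocket
// by setting the webSocketUrl capability in the New Session body.
function newSessionBody(browserName: string) {
  return { capabilities: { alwaysMatch: { browserName, webSocketUrl: true } } };
}

// The driver answers with { value: { sessionId, capabilities } }. When BiDi is
// supported, capabilities.webSocketUrl holds the URL to connect to.
interface NewSessionResponse {
  value: { sessionId: string; capabilities: { webSocketUrl?: string } };
}

function bidiUrl(response: NewSessionResponse): string {
  const url = response.value.capabilities.webSocketUrl;
  if (typeof url !== "string") throw new Error("driver did not return a webSocketUrl");
  return url;
}

// Step 2 (WebDriver BiDi, WebSocket): commands are JSON with id, method, and params,
// much like CDP. "complete" waits until the page has finished loading.
function navigateCommand(id: number, context: string, url: string) {
  return { id, method: "browsingContext.navigate", params: { context, url, wait: "complete" } };
}
```

Because the Classic session stays alive, an existing test suite can keep its HTTP-based commands and adopt BiDi features, such as event subscriptions, one at a time.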

Now, the bad news. There is always bad news, right? WebDriver BiDi is still a work in progress. Browser vendors and test automation library vendors are still working on finalizing the specification and the implementations. If you are interested in more detail, check out the handy QR code, which opens a dashboard showing browser support for WebDriver BiDi. Enough of the bad news, though.

6. WebDriver BiDi and Console Monitoring

Short description:

WebDriver BiDi is still in progress, but parts of it are already being shipped incrementally. Automation libraries like Selenium, WebdriverIO, and Puppeteer have initial BiDi support. WebDriver BiDi allows monitoring of console messages, which can be useful for testing and error detection. A demo shows how to set up the browser, enable the WebSocket URL, and monitor log messages with BiDi. This works in Chrome and Firefox, and also in Puppeteer. WebDriver BiDi combines the benefits of both protocols and is a good choice for stable browser automation. You can try it out today if your browser and test automation library support it, and provide feedback.

Let's talk about the good news. It's more important anyway. I just told you WebDriver BiDi is still in progress. That's true. However, we are shipping parts of WebDriver BiDi incrementally, which means you can already start using it today. Automation libraries like Selenium, WebdriverIO, or Puppeteer have landed initial BiDi support.

What does this mean in practice for you as a web developer? How can you benefit from this today? Well, look at the link I provided before, where you can track cross-browser support of WebDriver BiDi. It turns out that WebDriver BiDi already has the capability to monitor console messages; that capability is the log entry circled in red there. How is this useful? Let's show it with another demo provided by Jecelyn. Let's go back to our previous example where we are ordering a coffee. It's common to call some additional analytics API when a user performs an action like adding a coffee to the cart. We want to make sure that these APIs are called during the test, and we want to monitor for errors surfaced in the console to find out if something is going wrong.

This demo uses WebdriverIO as an example. First, we need to set up the browser and enable the WebSocket URL in the settings. In this case, I use Chrome, but it works in Firefox as well. After the setup, we can start to monitor the log messages with BiDi. There are two parts of the code that make it happen. First, you need to subscribe to the log event in the session. Then you can listen to the event and handle it. In this case, we just log the data, but you can do a lot more with it, of course. Here is the demo running in Firefox. Observe the output: the console message and the error from the webpage are captured successfully and processed. This also works in Puppeteer, which is also starting to use WebDriver BiDi under the hood. Here you can see the same thing done in Puppeteer: launch Puppeteer with the WebDriver BiDi protocol, and then you're set to monitor the event with BiDi instead of CDP.
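Under the hood, the two parts of the demo code correspond roughly to two kinds of WebDriver BiDi messages. The first is a `session.subscribe` command for the `log.entryAdded` event. The second is the stream of `log.entryAdded` events the browser then pushes for console calls and uncaught JavaScript errors. The sketch below shows that raw message flow rather than the WebdriverIO or Puppeteer API. It keeps only a subset of the event fields, and the helper names and sample messages are hypothetical.

```typescript
// BiDi command that subscribes the session to log events.
function subscribeToLogs(id: number) {
  return { id, method: "session.subscribe", params: { events: ["log.entryAdded"] } };
}

// Subset of the log.entryAdded event parameters used here.
interface LogEntry {
  type: string; // "console" for console.* calls, "javascript" for uncaught errors
  level: "debug" | "info" | "warn" | "error";
  text: string | null;
}

interface BidiEvent {
  method: string;
  params: LogEntry;
}

// Collect error-level log entries from raw WebSocket messages,
// skipping command responses and unrelated events.
function collectErrors(rawMessages: string[]): string[] {
  const errors: string[] = [];
  for (const raw of rawMessages) {
    const msg = JSON.parse(raw) as Partial<BidiEvent>;
    if (msg.method !== "log.entryAdded" || msg.params === undefined) continue;
    if (msg.params.level === "error") errors.push(`[${msg.params.type}] ${msg.params.text ?? ""}`);
  }
  return errors;
}
```

In a test, you could fail the run whenever `collectErrors` returns a non-empty list. One example is an analytics call, triggered by adding a coffee to the cart, that logs an error.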

So let's wrap it up. Browser automation is hard, and there's no single way to automate the browser. WebDriver BiDi combines the benefits of both protocols for stable automation, which, in my completely unbiased view, makes it clearly a good choice. You can try out the first slices of WebDriver BiDi today, if your browser and test automation library support it, of course. And if you do, don't forget to give us feedback.
