1. Introduction to Selenium 4.0
Hello, everyone. Today, I'll talk about Selenium 4.0 and its new features, including relative locators, new window APIs, and event-driven code. Stay tuned for the exciting features in Selenium Grid.
Hello, everyone. And welcome to my talk for today. I'm going to be talking about Selenium 4.0 which was only released a couple weeks ago. But before we get into that, I would like to kind of just tell you who I am.
My name is David. I head up the open source team at BrowserStack. I'm a Selenium core contributor and a co-editor of the WebDriver specification within the W3C, and I am the chair of the Browser Testing and Tools Working Group.
So, whenever things need to be standardized, they tend to come through my working group, where we try our best to make sure that all the browsers are going to support you, the end user. And here is our agenda for today. I'm going to be talking about what Selenium 4 is and some of the new features that have gone into it, like relative locators, the new window APIs, and the new ability to print pages. Then I'm going to spend a little bit of time talking through the new event-driven code: instead of having to poll for how your test should work with Selenium, you can now get events emitted to you and carry on with your tests. These are really cool new features. And at the end, I'll finish off with some of the really great new features within Selenium Grid.
2. Selenium 4.0 Overview
Selenium 4 has been an amazing amount of work that took five years and over 4,400 commits from contributors around the world. The main change is that you'll be able to just drop it in and things should just work. There will be deprecation warnings, but they should not be scary. We're preparing for Selenium 4.1 and beyond.
So Selenium 4 has had an amazing amount of work go into it. Think of it kind of like building a city. It took five years for us to get this out, and over 4,400 commits from numerous contributors around the world. We've rewritten large parts of the code base. We've deprecated large parts of the code base. And so whenever you start using it, it's going to be a big change, but the main thing is that you won't really notice it: you'll be able to just drop it in and things should just work. This is one of the key reasons we spent so much time focusing on the little bits to make that easier. So, when you change, there will be a number of deprecation warnings that you have to work through, but none of them should be scary at all. It's all about making sure that we're ready for whenever we go to Selenium 4.1 and beyond.
3. Relative Locators in Selenium 4.0
One of the really cool new features in Selenium 4.0 is relative locators. With relative locators, you can easily find other elements on the page based on the location of a known element. By using technology developed with Sahi and ThoughtWorks, Selenium looks at the bounding box of elements and their proximity within the DOM to perform relative locating. The returned list of elements starts from the closest to the furthest away.
So, with that, I'm going to start talking about some of the really cool new features. One of the ones that I'm really excited about is relative locators. The idea here is that, if you know where one element is on the page, you can start looking for other elements on the page quite simply. So, in this case, say you wanted to start in the center of the board here, where it says "An open process". Selenium is an open source project. We want to do everything we can in the open. And now we want to find anything to the top left, oh, sorry, to the top right, that might have "open" in it. The new ability would allow us to find an element starting with "An open process" and then find anything to the top right that might have a link text, or partial link text, that says "open". And we start looking through the DOM to find what we can. We do this by using technology that was first developed with Sahi and another project that ThoughtWorks managed. It tries to do relative locating by looking at the bounding box, so the little squares that each element on the page creates, and seeing how far away it is. It also does a little bit of looking within the DOM to see if the element is close to you within the DOM itself, just to try to speed up some of the returns. And then when you get back a list of elements, they normally start from closest to furthest away in case you have multiple.
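To make the bounding-box idea concrete, here is a minimal sketch in Python of "find what is above and to the right of a known element, nearest first". It is only an illustration of the principle described in the talk, not Selenium's actual algorithm; the `Box` class, the `above_and_right_of` function and the page coordinates are all made up for the example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Bounding box of an element: top-left corner plus size, in CSS pixels."""
    name: str
    x: float
    y: float
    width: float
    height: float

    @property
    def center(self):
        return (self.x + self.width / 2, self.y + self.height / 2)


def above_and_right_of(anchor, candidates):
    """Return the candidates up and to the right of anchor, nearest first."""
    ax, ay = anchor.center
    matches = []
    for box in candidates:
        cx, cy = box.center
        # Screen coordinates grow downwards, so "above" means a smaller y.
        if cx > ax and cy < ay:
            matches.append((hypot(cx - ax, cy - ay), box))
    matches.sort(key=lambda pair: pair[0])
    return [box for _, box in matches]


# Hypothetical layout: the known element sits in the middle of the page.
anchor = Box("An open process", x=400, y=300, width=200, height=100)
page = [
    Box("Open governance", x=900, y=20, width=100, height=40),
    Box("Open source", x=650, y=150, width=100, height=40),
    Box("Footer link", x=650, y=600, width=100, height=40),
]
nearest_first = [box.name for box in above_and_right_of(anchor, page)]
```

In the Python bindings the real feature is reached through `locate_with(...)` from `selenium.webdriver.support.relative_locator`, chained with calls such as `.above(element)` and `.to_right_of(element)`; the sketch above only mirrors the ordering behaviour.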
4. Selenium 4.0 Features
Relative locators are a cool feature, but be careful with page reflows. Selenium 4 introduces new window APIs, allowing the creation of tabs and communication between them. Another feature is the ability to print pages, creating PDFs for automation purposes. Browser vendors also benefit from this feature for testing their print capabilities.
This is a really, really cool feature. But I would be very careful in using it: if your page reflows, especially if you're going from desktop to mobile and you've got a smaller area to test on, things might not be in the same relative location. They might move down, they might move around. So just be aware. But it's a really cool feature when you do start using it.
The next set of APIs that I want to talk about is the new window APIs. For years, people have wanted Selenium to create new tabs. They wanted to be able to communicate from one tab to another, or from one window to another. And Selenium never really allowed that, because we were never fully in control of the browser enough to be able to do this. Working with browser vendors, we found a way that we can do this. We learned a lot along the way. "Window" is one of those weird terms in a browser that means so many different things. This is one of the key problems that we had with Selenium when trying to support these. If you ever did something that should have opened a new tab, you would have noticed we tried to force it to open a new window. Now you are able to create new tabs and windows and move around between them using the existing switch-to-window APIs. This can be really, really cool. We have the ability to create a specific window, so that would be a window with its own tab. If you wanted to just be in a tab, you could open a new tab. This is available only in Selenium 4, but it will work with any of the drivers that have it moving forward, which is definitely Chrome and Firefox. Safari, I know, will be having it soon if it doesn't have it yet.
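As a rough sketch of what this looks like from a test, assuming the Python bindings: `driver.switch_to.new_window("tab")` asks for a new tab (or `"window"` for a new window), and the existing `driver.switch_to.window(handle)` call moves between them. The helper name `open_in_new_tab` is hypothetical, and the function only relies on the driver object passed in.

```python
def open_in_new_tab(driver, url):
    """Open url in a new tab and return (original_handle, new_handle)."""
    original = driver.current_window_handle
    # "tab" asks for a tab in the current window; "window" asks for a new window.
    driver.switch_to.new_window("tab")
    driver.get(url)
    return original, driver.current_window_handle


# Typical use with a real driver (not run here):
#   original, new = open_in_new_tab(driver, "https://www.selenium.dev")
#   driver.switch_to.window(original)  # go back to the first tab
```

Returning both handles keeps the test in control of which tab it is talking to, since the driver stays focused on the new tab until it is switched back.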
The next feature that we noticed a lot of people really wanting was the ability to print pages. Printing can be super hard to test against. And so, people were trying weird and novel ways to print. They would use Selenium with a robot framework or invoke new and wonderful ways to print it. But now, with the new page printing APIs, you can create PDFs. These will be returned from the driver using Base64 encoding, so that you can decode them and save them to disk or, if you want, just decode them into a PDF document and use it all in memory. We noticed a lot of people wanted this feature just for basic automation. Not necessarily for their testing, but for basic automation. Browser vendors wanted this feature because they wanted it to test their new print features.
5. Event-Driven Code and New Features
They also wanted it to test CSS: they're already doing their image comparison testing, but this was another way to make sure that CSS works the same between different browsers. So this is a really cool feature. I hope you use it. And if you hit any bugs, do let us know.
So, learning from what users want: at the end of last year the Selenium project did its first-ever survey, we listened to what people wanted, and we've hopefully implemented some of the really cool features that they asked for. We've added the ability to get through basic and digest authentication, so that whenever you have a website that's got basic authentication, we'll handle the URL and allow you through. In the past, if people wanted to wait for a certain element to have a mutation on the DOM, they would have to poll the driver and say, has this changed? Has this changed? Now, with the new APIs, you can set up a mutation listener. So, when it mutates, you can send a message back to your tests and you can carry on straight away. Which is a really cool feature. Hopefully it will make your tests a little bit less flaky.
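The print flow described above can be sketched in a few lines, assuming the Python bindings, where `driver.print_page()` returns the PDF as a Base64-encoded string. The helper name `save_page_as_pdf` is made up for the example; it decodes the string and writes the bytes to disk, and also returns them for in-memory use.

```python
import base64
from pathlib import Path


def save_page_as_pdf(driver, path):
    """Print the current page and write the decoded PDF bytes to path."""
    encoded = driver.print_page()          # Base64-encoded PDF, as text
    pdf_bytes = base64.b64decode(encoded)  # the same bytes can be used in memory
    Path(path).write_bytes(pdf_bytes)
    return pdf_bytes


# Typical use with a real driver (not run here):
#   driver.get("https://www.selenium.dev")
#   save_page_as_pdf(driver, "selenium.pdf")
```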
6. Selenium 4.0 Features Continued
These are only in the slides because I don't have a lot of time to talk to them, so I can't demo them. But hopefully, you'll get a sense of how simple adding these features to your code is.
So, here's basic authentication. The main change for Selenium users is that we have to create a CDP connection. The CDP connection uses the Chrome DevTools Protocol underneath to be able to speak to the drivers. And in this case, we create our connection. Whenever we want to do basic authentication, we register that we are likely to get something like this. So, this tells us internally to tell the browser that if you hit authentication, these are the things that we want. In this case, the first argument is admin, and that is your username. The password is also admin. Not the best security, but anyway. And we've got to use that connection, which is a WebSocket connection into the browser. We do what we know, and then hey presto, we get through. So, it's not really a lot of change. And we need to do it in this way so that we can make sure we set up the right connections along the way.
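For context on what the registered credentials are for: with HTTP basic authentication, the browser answers the server's challenge with an `Authorization` header built from the username and password. The sketch below shows only that encoding, using the admin/admin pair from the talk. It is not the Selenium API from the slides, and the function name is hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


header = basic_auth_header("admin", "admin")
```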
For DOM mutations, it's not that different. In this case, we're going to create our connection just as before. And then whenever we get a log mutation event, it will be passed to a listener, and then we carry on with our tests. So, in this case, I wanted to make sure that after I clicked a button, the display was no longer none, that it actually had a proper value. And so, in our test, we're just going to find the element, click it, find the revealed element, in this case, and then wait for it to be in the right state, and then we can do all our assertions. So, again, we've tried to make this API as simple as possible.
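The difference between polling and listening can be sketched without a browser. In this minimal Python example, `MutationFeed` is a hypothetical stand-in for the connection that emits mutation events, and `listen_for` is a made-up helper. Neither is part of Selenium; they only show the shape of "register a listener, act, then wait for the event".

```python
import threading


class MutationFeed:
    """Hypothetical stand-in for a connection that emits DOM mutation events."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def emit(self, event):
        for callback in self._listeners:
            callback(event)


def listen_for(feed, element_id, attribute):
    """Register a listener now; return a function that waits for the first match."""
    seen = threading.Event()
    matched = []

    def on_mutation(event):
        if event["element"] == element_id and event["attribute"] == attribute:
            matched.append(event)
            seen.set()

    feed.add_listener(on_mutation)

    def wait(timeout=5.0):
        # Blocks until the listener fires; no repeated "has this changed?" calls.
        return matched[0] if seen.wait(timeout) else None

    return wait


feed = MutationFeed()
wait = listen_for(feed, "revealed", "style")  # 1. register before acting
# 2. A timer stands in for the click that eventually changes the element.
change = {"element": "revealed", "attribute": "style", "value": ""}
threading.Timer(0.05, feed.emit, args=[change]).start()
event = wait(timeout=2.0)                     # 3. carry on once the event arrives
```

Registering the listener before the click matters: if the mutation happened first, the event would already be gone by the time the test started listening.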
7. Advancements in Selenium for Improved Testing
Adding a listener is the only thing you need to do. Network interception allows you to modify HTTP responses. Selenium Grid has been re-architected for improved scalability and observability. Docker images and Helm scripts make scaling out easier. The code has been improved, and help messages are available for users.
And so, just adding your listener is the only thing you need to do. Otherwise, your test would just be as normal. And that's what we see at the last, like, three lines of code.
The next one is what I think a lot of people have been really excited about, which is network interception. Again, we create our connection, and every time an intercept message comes back, we can act on it. This one you need to be aware of: if you have a page that's doing a lot of network calls, the driver and the code are going to be doing a lot of speaking to each other, and so this might slow down your tests, but if it gives you what you need for certain tests, it's probably worth it. You would just create your HTTP response, which in this case says which URL you're looking for that needs to be changed. You can add headers, you can change the body, you can even do redirects if you wanted, or if you wanted to insert images or anything, you can do that, and then, whenever that request happens, you'd be able to change it. So, this one I think is really cool. I've seen a lot of cool things that people have been doing, and I've been playing with it quite a lot.
And then, finally, Selenium Grid, which I think a lot of people use to help scale up their tests, has been re-architected for the future. There are no longer weird and wonderful HTTP calls going across to different nodes. It's been built with the future in mind. It's using modern technologies like event buses and things like that. It also has observability built into it. So, if you've been using Jaeger or things like that to see how your code is doing, you'll be able to integrate that with your tests, using the documentation on the Selenium project. So, you can see when a test did a certain thing and how that responded internally in your system. And you'll be able to track it all the way through. So, if you ever get those weird and wonderful errors in the backend, hopefully the new tools from Selenium will be able to solve that with you. There's improved scalability with the new Docker images that have been created, and Helm scripts. So, if you wanted to scale out really quickly, you can. And these are maintained by the Selenium project. Now, sometimes this is going to be a lot of work, but hopefully we've made it a lot easier to get started so that you can scale out to what you need. These should just work on all your cloud providers like Azure or AWS and do whatever you need. We'll be improving this as we move forward. So, if you do hit any issues, please do raise bugs. Then there are a lot of little things that we've done in the background, just making sure the code is better. There are also a lot of ways that it just helps users. So, if you ever get stuck, the best thing to do is get the Selenium server to print out help messages along the way and you'll hopefully be able to get started.
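The interception idea, "if a request for this URL comes through, answer with my response instead", can be sketched in plain Python. `StubResponse` and `Interceptor` are hypothetical names for illustration and are not the Selenium API from the slides; `fetch` stands in for the real network.

```python
from dataclasses import dataclass, field


@dataclass
class StubResponse:
    """Hypothetical replacement response for one URL."""
    url: str
    status: int = 200
    headers: dict = field(default_factory=dict)
    body: str = ""


class Interceptor:
    def __init__(self, fetch):
        self._fetch = fetch    # called for requests nobody stubbed
        self._stubs = {}
        self.intercepted = []  # URLs that were answered with a stub

    def stub(self, response):
        self._stubs[response.url] = response

    def handle(self, url):
        stub = self._stubs.get(url)
        if stub is None:
            return self._fetch(url)
        self.intercepted.append(url)
        return stub


def real_fetch(url):
    # Stand-in for the network; a real page would load the resource here.
    return StubResponse(url, body="from the network")


interceptor = Interceptor(real_fetch)
interceptor.stub(StubResponse(
    "https://example.test/api/user",
    headers={"Content-Type": "application/json"},
    body='{"name": "stubbed"}',
))
stubbed = interceptor.handle("https://example.test/api/user")
passed_through = interceptor.handle("https://example.test/index.html")
```

Every request goes through `handle`, which is the cost mentioned in the talk: on a page with many network calls, that extra round trip per request adds up.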
Selenium Server Help Messages and Testing Tabs
If you ever get stuck, the best approach is to get the Selenium server to print out help messages along the way. I'm pleasantly surprised with the poll results, and they are going to help me with the work I want to do over the next few months. It's important to recognize that a real mobile device is more than just shrinking a browser to the right size. People need to test flows with multiple tabs open, and historically moving between tabs was always forced into a brand new window.
And, with that I think I've come to the end of my talk.
I'm looking forward to your questions, so send them in. What do you think about the results? I'm pleasantly surprised. It was what I was hoping for. And, because, like, I know a lot of people when they think mobile, they kind of think, like, shrinking a browser to the right size might be good enough. And, that's why I put the word real in there, as a real mobile device. And, so, hopefully people notice that nuance. But, I'm pleasantly surprised and it's going to help me with some of the work that I want to be doing over the next, like, three to four months. So, this is awesome.
That's really nice. And, I'm really happy that we could help in a way. And, definitely, I would add more input there that we really, really need to test. I do agree with you that it's not fun when everyone considers the browser just a window that opens so easily and not the real browser that we have around. And, yes, Nick Vick, you know I'm a big fan of the mobile browsers. I cannot lie. We do have already a few questions. I do want to remind you all that you can add your question on Discord. And, I'm going to take them as soon as I see them. And, let's start with one. Leas was wondering...
Sure. ...from the talk: so, we can use a test flow and test with multiple tabs open? Meaning, if I need to test something that opens in a new window and then proceed to test the new tab, it will work? Yes. (Not window, tab, after the correction.) Yeah, yeah. So, it should work. Historically, with Selenium, if you wanted to move between tabs, it was always forced into a brand new window.
Handling Windows, Printing, and Endless Scrollers
The idea of a window in browsers can be confusing, but Selenium 4.0 aims to fix common misconceptions and make it easier to work with windows and tabs. In the past, automation had to be creative, such as creating full pages with just a button. Printing in Selenium 4.0 respects media print and supports better interoperability between browsers. It prints only what has already been downloaded and rendered, avoiding the complexities of endless scrolling. Endless pages, like those on Twitter, present interesting testing challenges.
Because if you've worked on browsers, like, Yohanna, you know this, the idea of a window is overloaded so much. Because a tab is a window. A window of tabs is a window. And, so, you get all these common misconceptions. And, so, a lot of that's been fixed. So, hopefully, it fits people's mental model around windows and moving between windows and creating new windows so that they can do it much easier. And, so, you can create those new workflows much better.
Oh, that's useful. And you mentioned, like, everything being a window. I remember when we started doing automation back, like, a long time ago when we didn't have proper tools, we would create a full page with a button to be sure that our browser would touch a button. We were loading just a button. Oh, my God. There were some creative automation days, yeah. Yeah.
Okay, going to Mark's question, and it's about the print pages. Is it rather a screenshot or does it respect media print? So, it should be respecting media print, and, if it's not, that would be a bug. The main reason why it was created was because browser vendors wanted to have better interoperability around what a print looks like between different browsers. And so, in that way, all your styling and everything, so the media print, should be exactly the same. And, if it's not, then there are interop issues, and that's why they needed to fix that. So, hopefully that answers your question.
Yeah, well, Mark, let us know in the Discord again. And he had a second question. How does it handle endless scrollers? Or is it just the currently visible part? So, it's the currently visible part. Like, if you were on, say, Twitter, and you pressed print, you would only see what has been downloaded and rendered. It's not going to be continuously scrolling. This is also why it's not using screenshots, because in some browsers, we try to do full page screenshots. And the way we do that is by scrolling the screen and stitching it all together, and it's not pretty, especially on an endless scrolling page like Twitter. And so, it should just be what you see is what you get in the page. Yeah. The endless pages are really interesting to test: when do you get to an end?
Event-Driven Code and Basic Authentication
Yeah. I'm not sure about interesting, but yeah. Yeah, exactly. Like, interesting in the not-funny sense.
Indeed. And you do a lot of work on this, and on your presence on social media, I know for a fact. So, don't hesitate to contact David and let him know what you find there. It will definitely help them too.
Okay. Refector, Eric, I think I identify the name, or just maybe a full sentence there. Those examples, like basic authentication, with this create CDP connection call, do they work with any other browser than Chrome or Chromium?
So, at the moment, it's Chromium browsers. The Mozilla team are trying to add new features as we go through. I don't think basic authentication works there yet, because it relies on a CDP domain called Fetch.
Differences between Fetch and Selenium
The CDP Fetch domain is a slightly different idea from fetch in the DOM. Browsers are working on improving their support, especially Mozilla, Safari and WebKit, and WebKit contributors like Igalia. The goal is to use true browsers instead of manipulated ones.
So, I don't want it to be confused with fetch in the DOM. These are slightly different ideas. Again, conflating the same name everywhere is what browsers do well. And so, they haven't fully implemented that, but I know that the Mozilla team are working quite extensively on improving this. And I know that moving forward, Safari and WebKit, and people like Igalia, who contribute to WebKit, are looking at how they can improve WebKit for some of these. So, where possible, we're going for true browsers rather than browsers that we've manipulated along the way, like some of the frameworks do. It just broke my heart to be there, but okay.
Selenium 4 Mobile Testing and Cloud Services
Selenium 4 mobile testing can be used with cloud services like Sauce Labs and BrowserStack, but it is also possible to interact with Firefox on Android locally using a real device or emulator. Documentation improvements are ongoing, and while cloud services offer convenience with multiple devices, they are not essential.
You said you are working on it, I'll take that as a hope. Oh, yeah. It's my old team doing it and they're doing a fantastic job. Oh, that's nice. That's even more emotional.
Is the Selenium 4 mobile testing only used with cloud services like Sauce Labs and BrowserStack? Sorry, could you repeat that question again? Is the Selenium 4 mobile testing only used with cloud services like Sauce Labs? So, I can't talk for Sauce Labs. Obviously, I don't know where they are in their release cycle. But Selenium 4 should be available on BrowserStack. You can use it there. However, having worked through the Selenium 4 release, Mozilla were very keen to improve how you interact with Firefox on Android. And so, you could do that locally: if you had a real device or an emulator, you just tell geckodriver roughly where to go and it should do it. I appreciate we need to improve the documentation around here, and this is partly where my question was coming from: the initial parts are there, but we are going to be improving it along the way. Cloud services do make life easier because you can get multiple different devices, but you don't necessarily need them.
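As a rough illustration of "telling geckodriver where to go", the sketch below builds the capabilities for Firefox on Android using geckodriver's `androidPackage` and `androidDeviceSerial` options under `moz:firefoxOptions`. The helper name and the `emulator-5554` serial are assumptions for the example, and how the capabilities are handed to a driver depends on the bindings in use.

```python
def firefox_android_capabilities(package="org.mozilla.firefox", device_serial=None):
    """Build capabilities asking geckodriver to drive Firefox on Android."""
    firefox_options = {"androidPackage": package}
    if device_serial is not None:
        # Only needed when more than one device or emulator is attached.
        firefox_options["androidDeviceSerial"] = device_serial
    return {"browserName": "firefox", "moz:firefoxOptions": firefox_options}


# Hypothetical emulator serial, as an example of a locally attached target.
capabilities = firefox_android_capabilities(device_serial="emulator-5554")
```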