Challenges of Testing and Monitoring WebRTC Applications


The pandemic accelerated the era of digital transformation. We’ve all become accustomed to conducting video calls with others from virtually any device and location.

WebRTC is a centerpiece of this world, enabling users to reach out to one another directly from the comforts of their browsers and applications.

In this session, Tsahi explains what exactly WebRTC is, how it differs from other web technologies, and how you can ensure that your WebRTC application gets adopted and used successfully by your clients.



Hi, I'm Tsahi Levent-Levi, CPO at Spearline. I want to talk to you about the challenges of testing and monitoring WebRTC applications. If we want to talk about that, we need to start with the question: what exactly is WebRTC? Especially in this type of conference, where what we do is talk about testing, and JavaScript testing in web environments.
This is the definition I usually use: WebRTC is an HTML5 specification that's used to add real-time media communications directly between browsers and devices. So if I'm going to communicate with someone over the Internet, inside the web browser, I'm going to use WebRTC to send and receive media in real time. This is what WebRTC is for. It is a set of standards-based APIs that is available in each and every browser out there today.
How exactly are calls made with WebRTC? We've got two people here. One is using a browser, and the other might be using a mobile application or a mobile browser. There is a website for the application that they use to communicate with each other. The guy on the left here, Redhead, is going to send a message to the application server saying: you know what? I want to invite that person to talk. Here is my offer. The server is going to look at the message. It knows where to find Brownhead on the right, and it's going to forward that message to him. Along the way, it can make changes to the message if it wants to.
Now, the guy on the right here, Brownhead, has received that message. He knows that someone is inviting him to a call. He also knows that this is Redhead on the left, and he knows what that person wants to do and what type of codecs, for example, to use in the call. A codec is what we use to compress, to encode and decode, audio and video sent over the network. So he's going to accept the call by sending an answer. This answer, again, is going to go through the server to the other browser.
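To make the offer/answer exchange concrete, here is a minimal sketch of the relay described above. It is not real WebRTC code: in a browser the offer and answer are SDP descriptions produced by the peer connection, and the server would be reached over the network. The `SignalingServer` class, the user names, and the dictionary payloads are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SignalingServer:
    """Hypothetical in-memory relay that forwards messages between named users."""
    inboxes: dict = field(default_factory=dict)

    def register(self, user):
        self.inboxes[user] = []

    def forward(self, sender, recipient, message):
        # The server only relays. It could also inspect or modify the message here.
        envelope = {"from": sender, **message}
        self.inboxes[recipient].append(json.dumps(envelope))

    def receive(self, user):
        return json.loads(self.inboxes[user].pop(0))


def run_call_setup():
    server = SignalingServer()
    server.register("redhead")
    server.register("brownhead")

    # Redhead invites Brownhead and lists the codecs it is willing to use.
    server.forward("redhead", "brownhead",
                   {"type": "offer", "codecs": ["opus", "VP8", "H264"]})

    # Brownhead learns who is calling and what they propose.
    offer = server.receive("brownhead")

    # Brownhead accepts with the subset of codecs it supports (assumed set).
    supported = {"opus", "VP8"}
    chosen = [c for c in offer["codecs"] if c in supported]
    server.forward("brownhead", offer["from"], {"type": "answer", "codecs": chosen})

    # The answer travels back through the same server.
    return offer, server.receive("redhead")


if __name__ == "__main__":
    for message in run_call_setup():
        print(message)
```

Only these two small messages pass through the server in this sketch; the media itself is not modelled at all.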
Up until now, everything here is not really related to WebRTC; it's just how things happen on the web. The magic of WebRTC comes next, and this is where actual real-time media is sent directly from one user to another, and vice versa, interactively. Now they can communicate directly with each other, just by exchanging these few messages before starting the WebRTC session. This is the only time and instance that the browser can send a message directly to another browser and not go through a server.
Now, WebRTC sits somewhere between VoIP, voice over IP, and the web, the Internet as we know it today, web pages. And that causes a lot of issues, because we've got two different disciplines here that are competing with each other. It raises a question for people like us who deal with testing: what exactly do we test and monitor? Are we going to use tools for VoIP testing and monitoring, or are we going to use web tools? The answer is both, and it depends, or it's complicated.
So let's see what challenges we need to deal with. We start by understanding who the lead actors are, and understanding that most of them are out of our control as developers or testers. First and foremost, we have the application. We've written it. We know everything about it. We're in control of that application and every line of code in it. Then there are the users. The users use their own devices and their own peripherals. For example, to do this session I'm using a Rode microphone, not the built-in microphone in my machine, because it's better for these things. And I've got a headset because I need it to communicate with you guys. Then there's the network. You don't control the network. You don't own it. You don't know where the user is. In my home, because I do this all the time, I'm connected directly over Ethernet with fiber to the home, so I know that I've got a very good network.
Other people might do these calls, or try to do them, from an elevator, a basement, or while riding in a car. And that's going to affect what we're going to do. Last but not least are the browsers. You're already used to the fact that browsers are out of our control, but for people doing voice over IP, this is new. It's different, and they need to deal with that.
Let's see how we're going to deal with each of these challenges, and we'll start with the browsers. As you know, browsers update quite frequently, most of them every month. At that cadence, we're going to get a new release of Firefox, of Edge, and of Chrome. Safari ends up being updated once every one to three months. As a developer, you can't go to Google and tell them: look, there's a bug that I need to solve, and I know you're going to introduce it in the next Chrome release. Can you please delay that release until I fix it? Not going to happen. They're just going to dump a new release of Chrome into the market, and that's going to be the end of it.
Now the problem is that with WebRTC specifically, each browser release is slightly different. They're still working on the implementation of WebRTC and on how it works over the network. That means a new release is going to break your application probably once or twice a year, especially if you're using a lot of different features of WebRTC, or taking it to the edge, to the extreme abilities of WebRTC in browsers.
So the first thing we need to do is agree that we need browser automation in our testing. Why? Because we want to be able to repeat tests all the time and very frequently. To do that, we're going to use either Selenium or Puppeteer, which allow us to simulate and automate our scenarios using real browsers.
In voice over IP, for example, most of the testing is done using testing tools that are purpose-built and proprietary in nature. Most of the testing tools available for WebRTC actually run on top of browser automation like Selenium and Puppeteer.
Then we need to talk about networks. And networks are complicated. This is true for everything running on the web, but WebRTC is different because things happen in real time. If I send media, I expect it to be received on the other side with sub-second latency. I'm sending it, and I want it to be there on the other side in 100 milliseconds, 200 milliseconds, 500 milliseconds at most. Otherwise, it can't be interactive. Doing it at such low latency and at high bit rates, especially if it's video, means that there are things that are going to affect me that don't really affect web pages as much: bandwidth, packet loss, and jitter.
I need to understand how much bandwidth I have available, and then play with the bit rate that the codecs use to encode and send the media in order to stay within the limits that I have. I need to be able to deal with packet loss on the network. If I send a web page to someone from a server and some of the packets of that page go missing, that's not a big deal, because a retransmission takes place anyway. You're not even aware of it. You might get the web page 200 milliseconds later. But if I'm trying to actually talk to someone and there is packet loss, I can't even retransmit. There's not enough time for that. So how I deal with packet loss is vastly different with WebRTC. And then there is jitter: the difference between the pace at which packets arrive and the pace at which they were sent over the network. The more jitter there is, the harder it is going to be to play the media back on the other end.
Now, all of these network issues need to be considered together with some other aspects. The first one is geography. If I have two people on a call, one in Paris and the other in the United States, it would be wise for the server that connects them to be somewhere in between, and not in India. If we start playing with geographies, we need to understand where our users are, and we need to test where our users are.
Then there's the last mile, which I already alluded to briefly earlier. If the user is far away from the access point, there is going to be a reduction in bit rate, an increase in packet loss and latency, and the quality is going to degrade. We need to test for that and see what happens. If the last mile is ADSL, a 3G cellular network, an LTE network, or just Wi-Fi, it has different characteristics, and we need to understand how our application works there. Why? Because we sometimes need high bandwidth, in large group calls for example, and we need to see what happens if we don't have enough bandwidth on the network. How is our application going to behave in these cases? How is it going to degrade the actual look and feel for the users? Is it going to stop showing some of the participants, or is it going to reduce the bit rate and the quality for everyone in the same way? These are things that we need to be able to test for.
Last but not least, all of these conditions are dynamic in nature. I might have high bandwidth now and low bandwidth five minutes into my WebRTC session. Why? Because my son decided that he wants to play Fortnite while I'm on this call and I don't have enough bit rate on the network. Or because something else is happening on a shared network, either locally or remotely.
So what are we to do with all of these things? We simply need to take them into account. And that means that in our testing of WebRTC applications, we need to think about them globally and locally at the same time.
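Two of the network metrics discussed above, jitter and packet loss, can be pinned down with a little arithmetic. The sketch below uses the interarrival jitter smoothing formula from RFC 3550, the RTP specification, and a deliberately simplified loss calculation that ignores sequence number wraparound. The function names and the sample values are assumptions for illustration.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in ms, using the RFC 3550 smoothing formula."""
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        # d: how much the spacing on arrival differs from the spacing on send.
        arrival_gap = recv_times_ms[i] - recv_times_ms[i - 1]
        send_gap = send_times_ms[i] - send_times_ms[i - 1]
        d = arrival_gap - send_gap
        # Move the estimate 1/16 of the way toward the latest deviation.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def packet_loss_fraction(received_seq_numbers):
    """Fraction of packets missing between the lowest and highest sequence seen.

    Simplification: assumes sequence numbers do not wrap around.
    """
    if not received_seq_numbers:
        return 0.0
    expected = max(received_seq_numbers) - min(received_seq_numbers) + 1
    return 1.0 - len(set(received_seq_numbers)) / expected


if __name__ == "__main__":
    # Packets sent every 20 ms, with the last one arriving 16 ms late.
    print(interarrival_jitter([0, 20, 40], [50, 70, 106]))
    # Packet 3 never arrived.
    print(packet_loss_fraction([1, 2, 4, 5]))
```

A constant delay produces no jitter in this model; only changes in the spacing between packets do, which is why a slow but steady link can still play back smoothly.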
We need to test for multiple regions and multiple locations, not only for multiple calls. So I want the calls to be in the US now, and now I want them in Europe, and now I want them in India. Now I want to do a group call with two people joining from the US, two from Europe and one from India, and see what happens with such a large group call.
The other thing is that we want to have control over the local network behavior of the device being tested. I want to be able to limit its bit rate. I want to be able to inject packet loss or jitter into that device during testing. And I want to do it dynamically, saying things like: let's start with no limits and wait for two minutes. Now we're going to reduce the bit rate to 100 kilobits per second and see what happens for two minutes. Then I'm going to remove the limit again, and I'm going to check how much time it takes the service to ramp back up to the highest bit rate the network allows. Is that going to take five seconds, 30 seconds, one minute, or never? Some of these are bugs, like never, and some can be optimized further. And we want to have the tools and the means to run such tests in a smart way.
Then there's the user and their devices. And devices are different. There are different operating systems, different CPUs, and different hardware acceleration, which we need for encoding and decoding video because it reduces the load on the CPU. There are different cameras, microphones, and display resolutions. The call will be very different if I take it on my desktop here on the table or on my phone. It's a different CPU. The screen resolution is different. The camera resolution is different. All of these things are going to affect the type of tests that I'm going to run. And there are several things that you need to understand and plan for. The first one: we need to manage device automation.
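Before getting into device automation, here is a minimal sketch of the dynamic network scenario described above: no limit, then 100 kilobits per second for two minutes, then no limit again, followed by a measurement of how long the bitrate takes to recover. Applying the limit to a real machine is outside this sketch. The `NetworkStep` type, the sample bitrates, and the recovery target are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NetworkStep:
    start_s: int                # seconds from the start of the test
    limit_kbps: Optional[int]   # None means no bandwidth limit


# No limit for two minutes, 100 kbps for two minutes, then no limit again.
# Steps are assumed to be sorted by start time.
SCHEDULE = [
    NetworkStep(0, None),
    NetworkStep(120, 100),
    NetworkStep(240, None),
]


def limit_at(schedule, t_s):
    """Return the bandwidth limit that applies at time t_s."""
    current = schedule[0].limit_kbps
    for step in schedule:
        if step.start_s <= t_s:
            current = step.limit_kbps
    return current


def ramp_up_seconds(samples, limit_removed_at_s, target_kbps):
    """Seconds from removing the limit until the bitrate first reaches target_kbps.

    samples is a list of (time_s, measured_kbps) pairs in time order.
    Returns None if the bitrate never recovers, which is the "never" case.
    """
    for t_s, kbps in samples:
        if t_s >= limit_removed_at_s and kbps >= target_kbps:
            return t_s - limit_removed_at_s
    return None


if __name__ == "__main__":
    # Hypothetical measurements taken every ten seconds around the recovery.
    samples = [(230, 95), (240, 98), (250, 400), (260, 900), (270, 1500)]
    print(limit_at(SCHEDULE, 130))
    print(ramp_up_seconds(samples, limit_removed_at_s=240, target_kbps=1500))
```

Returning `None` for the "never" case keeps it distinct from a slow recovery, so a test can fail hard on the former and only warn on the latter.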
We need to have virtual machines that we run with Selenium, for example. And they should be configurable in the number of cores and the amount of memory available on them. We also want raw camera and microphone simulation as part of it. Think about it. If I'm going to use a machine, even a mobile device, running in some data center, and what the camera sees is a static image all the time, the encoder isn't going to work hard, and I'm not going to get the experience you get when someone is actually talking with a camera facing them. So it's important to deal with such use cases and be able to inject raw camera and microphone data into an actual test.
Last, and sad but true, you are going to do manual testing with WebRTC. You can't automate everything around WebRTC testing. Some of it will be manual at the end of the day, especially with browsers that are not as easy to play with, like Safari, or when what we're testing runs on mobile. Mobile test automation is hard enough, and doing it for something as generic as WebRTC is even harder, because we need to deal with the microphone and the camera, the audio environment of the device, where it is, what the camera is facing, and the network of the device. It's a lot different than just testing a website on mobile devices.
Then there's your application. And you can say, well, what's so different about testing a WebRTC application compared with any other web application? The answer is, first of all, that it runs workflows that are specific to you. If you look at voice over IP, most services use the same protocols, the same workflows, the same testing, so it was easy to build testing tools for that. With WebRTC, testing needs to be as dynamic as the web, which is why we're using browser automation with Selenium and Puppeteer.
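One way to get the camera and microphone simulation mentioned above when automating Chrome is through its fake media command-line switches. The sketch below only builds the argument list; the helper name and the file names are assumptions, and the media files themselves (a video of someone talking, a speech recording) would need to be prepared separately.

```python
def chrome_media_args(video_path=None, audio_path=None):
    """Build Chrome switches that replace the real camera and microphone."""
    args = [
        "--use-fake-ui-for-media-stream",      # skip the camera/microphone permission prompt
        "--use-fake-device-for-media-stream",  # use fake capture devices instead of real ones
    ]
    if video_path:
        # Feed the fake camera from a file instead of a generated test pattern.
        args.append(f"--use-file-for-fake-video-capture={video_path}")
    if audio_path:
        # Feed the fake microphone from a recording.
        args.append(f"--use-file-for-fake-audio-capture={audio_path}")
    return args


if __name__ == "__main__":
    # Hypothetical file names.
    for arg in chrome_media_args("talking_head.y4m", "speech.wav"):
        print(arg)
```

With Selenium, each entry would typically be passed to `ChromeOptions.add_argument`; with Puppeteer, the list would go in the `args` launch option.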
The other thing is that it needs to scale properly, from one call to multiple calls in parallel, or from a call with two users to a group call with 50 users in the same service. How exactly do you do that with your testing?
And then there's the whole fact that we're connecting people. I'm not connecting a person to a website where he needs to go through a login phase, then go to the e-commerce site, pick two or three items, then fill out a form and click the Buy button at checkout. What I just described is a single user working in front of a server. But think about it. If you're doing something with WebRTC, it's going to be interactive. There's going to be a call taking place. I am the teacher. I create the lesson. I join. After I join, the students are going to join the room, not before. If someone mutes, I want to see what happens to the others as he mutes. So it takes two to tango.
I need to be able to synchronize between different browsers running in the same test, inside the same session, because it needs to be orchestrated. I need to understand exactly which browser does what at what point in time, and one browser needs to wait for another in order to interact and do the things that it needs to do. That means you need to build synchronization mechanisms into the automation tools that you use to test WebRTC. It starts with asking the question of who joins when, and being able to answer it. The answer is usually going to be: let's use out-of-band message passing. I need to be able to send messages between the browsers that are not part of the application that I'm testing, saying something like: you, browser number two, please wait. And then once browser one finishes something, it can send a message saying, well, I'm done, so that browser two can start its part of the process.
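The teacher and students scenario can be sketched with a small out-of-band channel. In this minimal example, threads stand in for browsers and an in-process `SyncBoard` stands in for the messaging service; in a real test the browsers run on separate machines, so the channel would be a network service. All names here are assumptions for illustration.

```python
import threading


class SyncBoard:
    """Hypothetical out-of-band channel used by the test agents, not by the app under test."""

    def __init__(self):
        self._events = {}
        self._lock = threading.Lock()

    def _event(self, name):
        with self._lock:
            return self._events.setdefault(name, threading.Event())

    def announce(self, name):
        """Tell everyone waiting on this name that they may proceed."""
        self._event(name).set()

    def wait_for(self, name, timeout_s=5.0):
        """Block until someone announces this name, or fail the test."""
        if not self._event(name).wait(timeout_s):
            raise TimeoutError(f"nobody announced {name!r}")


def run_lesson(student_count=3):
    board = SyncBoard()
    log = []
    log_lock = threading.Lock()

    def record(entry):
        with log_lock:
            log.append(entry)

    def teacher():
        record("teacher: created lesson and joined")
        board.announce("teacher-joined")

    def student(n):
        board.wait_for("teacher-joined")  # students must not join before the teacher
        record(f"student {n}: joined")

    # Start the students first to show that they really do wait.
    threads = [threading.Thread(target=student, args=(n,)) for n in range(student_count)]
    threads.append(threading.Thread(target=teacher))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print("\n".join(run_lesson()))
```

The timeout matters as much as the wait: if the teacher's browser crashes, the students fail with a clear error instead of hanging the whole test run.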
OK, so this out-of-band message passing is really important. Then you need to be able to spawn and orchestrate these tests at scale. It's not one or two browsers at a time. It's usually 10, 20, 50, 100, or thousands of browsers running in parallel. And these browsers require a lot of CPU and a lot of memory. Usually you'll need two to four cores, so vCPUs, per browser to run a test that does a simple WebRTC video call. Just bear that in mind. So the costs are going to be high, and orchestrating and allocating so many machines, and getting the quotas for them in the cloud, is not that simple.
On top of that, you need to add expectations and assertions. I'm not going to just check that the browser reached the page I wanted. I need to actually validate that there is video. I need to validate that the packet loss is low and that the bitrate is above the threshold that I need. All of these kinds of assertions need to be added, and they are not part of traditional web testing. You need to look at WebRTC metrics and deduce from them whether the test succeeded or failed.
Now something we don't talk about a lot, and this is the elephant in the room when it comes to WebRTC testing, is predictability. If I'm running a test, I want to be able to run it again tomorrow and get the same results. Think about it. You're doing a manual test of a group call. You need to get 10 people to join a room. How are you going to get 10 machines joining a room? Through browsers. Are you going to ask people in your company to join? What is that testing exactly? Their network connection or your application? What are you going to do with the false positives? What happens if you try to run it tomorrow and one of the people is unavailable or is on a different network? You're not going to get the same results. So you need to bear that in mind and take it into account.
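The expectations and assertions mentioned above can be sketched as a check over two snapshots of incoming video statistics. The field names follow the inbound RTP entries of the WebRTC statistics API (getStats) as exposed by browsers; how the snapshots reach the test code, and the thresholds used, are assumptions for illustration.

```python
def evaluate_inbound_video(before, after, min_kbps=500, max_loss=0.02):
    """Compare two stats snapshots and return a list of failed expectations."""
    elapsed_s = (after["timestamp"] - before["timestamp"]) / 1000.0  # timestamps are in ms
    if elapsed_s <= 0:
        raise ValueError("snapshots must be taken at increasing times")

    # Bitrate comes from the growth of the byte counter over the interval.
    bitrate_kbps = (after["bytesReceived"] - before["bytesReceived"]) * 8 / 1000.0 / elapsed_s

    received = after["packetsReceived"] - before["packetsReceived"]
    lost = after["packetsLost"] - before["packetsLost"]
    total = received + lost
    loss = lost / total if total > 0 else 0.0

    frames = after["framesDecoded"] - before["framesDecoded"]

    failures = []
    if frames <= 0:
        failures.append("no video frames decoded")
    if bitrate_kbps < min_kbps:
        failures.append(f"bitrate {bitrate_kbps:.0f} kbps below {min_kbps} kbps")
    if loss > max_loss:
        failures.append(f"packet loss {loss:.1%} above {max_loss:.1%}")
    return failures


if __name__ == "__main__":
    # Hypothetical snapshots taken ten seconds apart.
    before = {"timestamp": 0, "bytesReceived": 0, "packetsReceived": 0,
              "packetsLost": 0, "framesDecoded": 0}
    after = {"timestamp": 10_000, "bytesReceived": 1_250_000, "packetsReceived": 990,
             "packetsLost": 10, "framesDecoded": 300}
    print(evaluate_inbound_video(before, after) or "all expectations met")
```

Working from counter deltas rather than absolute values means the check can be repeated over any window of the call, for example only the two minutes during which the network was constrained.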
You need to be able to build that predictability into your own application or into your own testing tools, so that once QA runs a test, R&D can run it again and get the same results, and QA can run it again to validate that the problem was fixed.
The other thing is visibility. You want to be able to actually look at the results and understand them. You can't just take screenshots and say, well, the video looks OK. It doesn't work that way. You need to understand what goes on in the underlying protocols of WebRTC itself. That means that you first of all need to make your tests repeatable, so that if you run them multiple times, you get the same results. And you need to collect and visualize everything. We're used to collecting things like machine performance, events from the browser, and console logs. With WebRTC, we also need to collect WebRTC API calls and getStats calls. getStats is an API call in WebRTC that gives you internal metrics and statistics of the peer connection object, which is the object that represents the connection in WebRTC. So you need to be able to collect all of these and then analyze and visualize them properly.
Now, you can do this on your own. There are open source tools for that. Or you can come, and I invite you to do that, to Spearline and look at TestRTC. We have products that do testing and monitoring for WebRTC-based applications, and they can speed up the work that you need to do. So thank you, and see you again.
21 min
03 Nov, 2022
