1. Introduction to WebRTC
Hi, I'm Tsahi Levent-Levi, CPO at Spearline. I want to talk to you about the challenges of testing and monitoring WebRTC applications. WebRTC is an HTML5 specification used for real-time media communications between browsers and devices. It enables sending and receiving media in real time through standards-based APIs available in all modern browsers. Calls in WebRTC start with users exchanging messages through an application server, which forwards the messages and can modify them along the way. The users can then communicate directly with each other using real-time media.
Okay, so if I'm going to communicate with someone through the Internet, inside a web browser, I'm going to use WebRTC in order to send and receive media in real time. This is what WebRTC is for. It is a set of standards-based APIs that are available in each and every browser out there today.
How exactly are calls made with WebRTC? We've got two people here. One is using a browser, and the other one might be using a mobile application or a mobile browser. There is a website or application that they use in order to communicate with each other. The guy on the left here is going to send a message to the application server, saying, you know what? I want to invite that person to talk. Here is my offer. The server is going to look at the message. It knows where to find Brownhead on the right, and it's going to forward that message to him. Along the way, it can make changes to this message if it wants to.
Now the guy on the right here, Brownhead, has received that message. He knows that someone is inviting him to a call. He also knows that this is Redhead on the left, and he knows what that person wants to do and what type of codecs, for example, to use in the call. A codec is what we use to compress, to encode and decode, audio and video over the network. So he's going to accept the call by sending an answer. This answer, again, is going to go through the server to the other browser. Up until now, everything here is not really related to WebRTC; it's just how things happen on the web. The magic of WebRTC comes next. This is where actual real-time media is sent directly from one user to the other, and vice versa, interactively. They can now communicate directly with each other just by exchanging these few messages before starting the WebRTC session.
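The offer/answer exchange described above can be sketched as a toy relay. This is a minimal sketch, not a real signaling protocol: the message shape, the `SignalingServer` class and the user names are hypothetical, and the SDP strings are shortened placeholders (real SDP is produced by `RTCPeerConnection.createOffer()` and `createAnswer()`, and is CRLF-delimited). The server forwards each message to its recipient and, as the talk notes, may rewrite it on the way; here it strips a VP9 codec line.

```typescript
// Hypothetical message shape for illustration only. In a real app the `sdp`
// string would come from RTCPeerConnection.createOffer() / createAnswer().
type SignalMessage = {
  type: "offer" | "answer";
  from: string;
  to: string;
  sdp: string;
};

type Handler = (msg: SignalMessage) => void;
type Rewriter = (msg: SignalMessage) => SignalMessage;

// Toy application server: it knows how to reach each user and relays their
// messages, optionally rewriting them along the way.
class SignalingServer {
  private readonly users = new Map<string, Handler>();
  private readonly rewrite: Rewriter;

  constructor(rewrite: Rewriter = (m) => m) {
    this.rewrite = rewrite;
  }

  register(user: string, handler: Handler): void {
    this.users.set(user, handler);
  }

  send(msg: SignalMessage): void {
    const target = this.users.get(msg.to);
    if (!target) throw new Error(`unknown user: ${msg.to}`);
    target(this.rewrite(msg));
  }
}

// Usage: the server drops any VP9 line from the (simplified) SDP it relays.
const log: string[] = [];
const server = new SignalingServer((m) => ({
  ...m,
  sdp: m.sdp.split("\n").filter((line) => !line.includes("VP9")).join("\n"),
}));

server.register("brownhead", (msg) => {
  log.push(`brownhead got ${msg.type}: ${msg.sdp}`);
  // Accept the call by sending an answer back through the same server.
  server.send({ type: "answer", from: "brownhead", to: msg.from, sdp: "a=rtpmap:96 VP8/90000" });
});
server.register("redhead", (msg) => {
  log.push(`redhead got ${msg.type}: ${msg.sdp}`);
});

server.send({
  type: "offer",
  from: "redhead",
  to: "brownhead",
  sdp: "a=rtpmap:96 VP8/90000\na=rtpmap:98 VP9/90000",
});
```

Once both messages have been relayed, the media itself would flow directly between the two peers; that part is outside this sketch.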
2. Challenges of WebRTC
WebRTC sits between VoIP and the web, which causes issues and raises the question of whether to test and monitor it with VoIP tools or with web tools. The answer is both, depending on the situation. Let's explore the challenges.
This is the only time and instance in which a browser can send a message directly to another browser without going through a server. Now, WebRTC sits somewhere between VoIP, voice over IP, and the web, the Internet as we know it today with its web pages. And that causes a lot of issues, because we've got two different disciplines here that are competing with each other. For people like us who deal with testing, that raises the question: what exactly do we test and monitor? Are we going to use tools for VoIP testing and monitoring, or are we going to use web tools for that? The answer is both: it depends, and it's complicated. So let's see what challenges we need to deal with.
3. Managing External Factors in Web Development
We have the application that we control, users with their own devices and peripherals, the unpredictable network, and the ever-changing browsers. Browsers update frequently, and each release can potentially break WebRTC applications, especially those that use advanced features.
We start by understanding who the lead actors are, and that most of them are out of our control as developers or as testers. First and foremost, we have the application. We've written it. We know everything about it. We're in control of that application and of every line of code in it.
Then there are the users. The users use their own devices and their own peripherals. For example, to actually do this session, I'm using a Rode microphone instead of the built-in microphone in my machine, because it's better for these things. And I've got a headset because I need it to communicate with you guys.
Then there's the network. You don't control the network, you don't own it, and you don't know where the user is. In my home, because I do this all the time, I'm connected directly to Ethernet with fiber to the home, so I know that I've got a very good network. Other people might make these calls, or try to, from an elevator, a basement, or while riding in a car. And that's going to affect what we need to do.
Last but not least are the browsers. You're all already used to the fact that browsers are out of our control. But for people doing voice over IP, this is new. It's different. And they need to deal with that. Let's see how we are going to deal with each and every one of these challenges.
And we'll start with the browsers. As you know, browsers update quite frequently, most of them every month. In that cadence, we're going to get a new release of Firefox, of Edge and of Chrome. Safari ends up being updated once every one to three months. As a developer, you can't go to Google and tell them, look, I've got this bug that I need to solve, that I know that you're going to introduce in the next Chrome release. Can you please delay that release until I fix that bug? Not going to happen. They're just going to dump a new release of Chrome into the market and that's going to be the end of it.
Now, the problem is that with WebRTC specifically, each and every browser release is slightly different. The browser vendors are still working on their implementations of WebRTC and of how it works over the network. That means a browser release is going to break your application probably once or twice a year, especially if you use a lot of different WebRTC features or push WebRTC to the extreme limits of what browsers can do.
4. Testing Challenges and Network Considerations
Testing needs browser automation, using Selenium or Puppeteer, so that tests can be repeated frequently. WebRTC requires low latency and high bitrates, which are affected by bandwidth, packet loss and jitter. Dealing with packet loss and jitter in WebRTC is different from dealing with them for web pages. Geographical location and the last mile also affect call quality, so testing in different locations is important.
So the first thing we need to do is agree that we need browser automation for testing. Why? Because we want to be able to repeat tests all the time and very frequently. To do that, we're going to use either Selenium or Puppeteer, which allow us to simulate and automate our test scenarios using real browsers.
In voice over IP, for example, most of the testing is done using purpose-built testing tools that are proprietary in nature. Most of the testing tools available for WebRTC actually run on top of browser automation such as Selenium and Puppeteer.
Then we need to talk about networks, and networks are complicated. This is true for everything running on the Web, but WebRTC is different because things happen in real time. If I send media, I expect it to be received on the other side with sub-second latency. When I send it, I want it to be there on the other side within 100 milliseconds, 200 milliseconds, or 500 milliseconds at most. Otherwise, it can't be interactive.
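A rough way to check a call against the 500-millisecond budget mentioned above is to halve the round-trip time that WebRTC reports. This is a minimal sketch under two assumptions: the path is symmetric, and encoding and jitter-buffer delays are ignored, so it only estimates network latency. The `currentRoundTripTime` field (in seconds) exists on the "candidate-pair" entry returned by `getStats()`; the helper names are hypothetical.

```typescript
// Upper bound from the talk for interactive media: 500 ms at most.
const MAX_ONE_WAY_MS = 500;

// Rough one-way network latency from a round-trip time, assuming a
// symmetric path. `currentRoundTripTimeSec` mirrors the "candidate-pair"
// stats field currentRoundTripTime, which is reported in seconds.
function oneWayLatencyMs(currentRoundTripTimeSec: number): number {
  return (currentRoundTripTimeSec * 1000) / 2;
}

// True when the estimated one-way latency fits the interactive budget.
function isInteractive(currentRoundTripTimeSec: number, budgetMs: number = MAX_ONE_WAY_MS): boolean {
  return oneWayLatencyMs(currentRoundTripTimeSec) <= budgetMs;
}

// Usage: a 0.3 s round trip is about 150 ms one way, within budget;
// a 1.2 s round trip is about 600 ms one way, which is not.
const okCall = isInteractive(0.3);
const laggyCall = isInteractive(1.2);
```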
Doing this at such low latency and at high bitrates, especially for video, means that some things are going to affect me that don't really affect Web pages as much: bandwidth, packet loss and jitter. I need to understand how much bandwidth I have available and then adjust the bitrate that the codecs use to encode and send the media so that it fits within that limit. I also need to be able to deal with packet loss that happens on the network.
OK, if a server sends a Web page to someone and some of the packets of that Web page get lost, that's not a big deal, because they are retransmitted anyway. You're not even aware of it. You might get the Web page 200 milliseconds later. But if I'm trying to actually talk to someone and packets are lost, there often isn't enough time to retransmit them. So the way I deal with packet loss is vastly different with WebRTC. And then there is jitter, the variation in the timing at which packets arrive compared to how they were sent over the network. The more jitter there is, the harder it is going to be to actually play the media back on the other end.
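Packet loss and jitter can both be computed from per-packet data. The sketch below uses the interarrival jitter estimator defined in RFC 3550 (RTP), where each packet's transit-time difference from its predecessor is smoothed with a gain of 1/16, and a simple loss fraction from counters like the `packetsReceived` and `packetsLost` fields of "inbound-rtp" stats. For readability it assumes send and arrival times are both in milliseconds; real RTP timestamps are in codec clock-rate units. The type and function names are hypothetical.

```typescript
// A packet's send time and local arrival time, both in milliseconds
// (an assumption for readability; RTP timestamps use clock-rate ticks).
interface PacketTiming {
  sentMs: number;
  arrivedMs: number;
}

// RFC 3550 interarrival jitter:
//   D = (R_j - R_i) - (S_j - S_i)
//   J = J + (|D| - J) / 16
function interarrivalJitterMs(packets: PacketTiming[]): number {
  let jitter = 0;
  for (let i = 1; i < packets.length; i++) {
    const d =
      (packets[i].arrivedMs - packets[i - 1].arrivedMs) -
      (packets[i].sentMs - packets[i - 1].sentMs);
    jitter += (Math.abs(d) - jitter) / 16;
  }
  return jitter;
}

// Fraction of expected packets that never arrived.
function lossFraction(packetsReceived: number, packetsLost: number): number {
  const expected = packetsReceived + packetsLost;
  return expected === 0 ? 0 : packetsLost / expected;
}

// Usage: packets sent every 20 ms that all take exactly 50 ms in transit
// produce zero jitter, because the spacing is preserved on arrival.
const steady = interarrivalJitterMs([
  { sentMs: 0, arrivedMs: 50 },
  { sentMs: 20, arrivedMs: 70 },
  { sentMs: 40, arrivedMs: 90 },
]);
```

Because the estimator is smoothed, a single late packet nudges the jitter value rather than spiking it, which is why sustained jitter is what hurts playback most.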
Now, everything we need to deal with on the network comes with other aspects to keep in mind. The first one is geography. If I have two people on a call, one in Paris and the other in the United States, it would be wise for the server connecting them to be somewhere in between, and not in India. Once geography comes into play, we need to understand where our users are, and we need to test from where our users are.
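The Paris and United States example can be expressed as a server-placement rule: pick the region whose worst participant round-trip time is lowest. This "minimax" heuristic is an assumption chosen for illustration, not necessarily how any given service places its servers, and the region names and RTT values below are hypothetical.

```typescript
// rttMs[region][participant]: measured or estimated round-trip times in ms.
type RttTable = Record<string, Record<string, number>>;

// Pick the region that minimizes the worst participant's RTT.
function pickRegion(rttMs: RttTable, participants: string[]): string {
  let best = "";
  let bestWorst = Infinity;
  for (const [region, row] of Object.entries(rttMs)) {
    // A participant with no measurement for this region counts as unreachable.
    const worst = Math.max(...participants.map((p) => (p in row ? row[p] : Infinity)));
    if (worst < bestWorst) {
      bestWorst = worst;
      best = region;
    }
  }
  if (best === "") throw new Error("no region reachable by all participants");
  return best;
}

// Hypothetical numbers for illustration only.
const rtt: RttTable = {
  "eu-west": { paris: 20, newYork: 80 },
  "us-east": { paris: 85, newYork: 15 },
  india: { paris: 130, newYork: 210 },
};
const chosen = pickRegion(rtt, ["paris", "newYork"]); // "eu-west"
```

In a test, the same table would come from probes running where the users actually are, which is why testing from those locations matters.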
Then there's the last mile, which I already alluded to briefly. If the user is far away from the access point, the bitrate is going to drop, packet loss and latency are going to increase, and the quality is going to degrade. We need to test for that and see what happens there.
5. Testing Challenges and Device Automation
In WebRTC testing, we need to consider the characteristics of different networks and how our application behaves under varying bandwidth conditions. Dynamic changes in bandwidth and shared network environments also need to be taken into account. It is important to test globally and locally, including multiple regions and locations. Controlling the device's local network behavior, such as limiting bitrate and injecting packet loss or jitter, is essential for testing. Different devices have varying operating systems, CPUs, hardware acceleration, cameras, microphones, and display resolutions, all of which impact testing. Managing device automation is a crucial aspect of WebRTC testing.
If the last mile is ADSL or a 3G cellular network or LTE network, or just Wi-Fi, it has different characteristics to it and we need to understand how our application works there. Why? Because we need high bandwidth sometimes in large group calls, for example, and we need to see what happens if we don't have enough bandwidth on the network. How our application is going to behave in these cases, how is it going to degrade the actual look and feel for the users? Is it going to remove some of the people from being viewed? Or is it going to reduce the bitrate for everyone and reduce the quality for everyone in the same way? So these are things that we need to be able to test for.
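The two degradation strategies raised above, hiding some participants or lowering everyone's quality equally, can be compared with a small allocation sketch. The function names and the `minKbps` quality floor are hypothetical, and real services use more elaborate logic (for example simulcast layers or active-speaker priority); this only shows the trade-off a test would need to observe.

```typescript
interface Allocation {
  visible: number;       // remote participants whose video is shown
  perStreamKbps: number; // downlink bitrate available to each shown stream
}

// Strategy A: keep everyone visible and split the bandwidth evenly.
function reduceForEveryone(availableKbps: number, remotes: number): Allocation {
  if (remotes <= 0) throw new Error("need at least one remote participant");
  return { visible: remotes, perStreamKbps: availableKbps / remotes };
}

// Strategy B: keep a minimum per-stream quality and show only as many
// participants as fit. `minKbps` is a hypothetical quality floor.
function dropSomeVideos(availableKbps: number, remotes: number, minKbps: number): Allocation {
  if (remotes <= 0) throw new Error("need at least one remote participant");
  const visible = Math.min(remotes, Math.floor(availableKbps / minKbps));
  return { visible, perStreamKbps: visible === 0 ? 0 : availableKbps / visible };
}

// Usage with 2000 kbps of downlink and 10 remote participants:
const everyone = reduceForEveryone(2000, 10);  // 10 streams at 200 kbps each
const someHidden = dropSomeVideos(2000, 10, 300); // 6 streams at about 333 kbps each
```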
Last but not least, all of these conditions are dynamic in nature. I might have high bandwidth now and low bandwidth five minutes into my WebRTC session. Why? Because my son decided that he wants to play Fortnite while I'm on this call, and there isn't enough bitrate left on the network. Or because something else is happening in a shared environment, on a shared network, either locally or remotely. So what are we to do with all of these things? We simply need to take them into account. That means that when testing WebRTC applications, we need to think globally and locally at the same time. We need to test across multiple regions and multiple locations, and not just one region at a time, where the calls run in the U.S., then in Europe, then in India. No, I want to run a group call with two people joining from the U.S., two from Europe and one from India, and see what happens with such a large group call.
The other thing is that we want control over the local network behavior of the device being tested. I want to be able to limit its bitrate. I want to be able to inject packet loss or jitter into that device during the test. And I want to do it dynamically, saying things like: let's start with no limits and wait for two minutes. Now we reduce the bitrate to 100 kilobits per second and see what happens for two minutes. Then we remove the limit again and check how long it takes the service to ramp back up to the highest bitrate it can reach on the network. Is that going to take five seconds, 30 seconds, one minute, or never? Some of these outcomes, like never, are bugs, and some can be optimized further, and we want the tools and the means to run such tests in a smart way.
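The dynamic scenario above can be written down as a schedule plus a ramp-up measurement. This is a minimal sketch: the `Step` schedule, the 90% "recovered" threshold and the bitrate samples are hypothetical, and actually applying the limits (for example with Linux `tc`/netem on the test machine) is not shown. `rampUpSeconds` returns `null` for the "never" case.

```typescript
// One step of a network-impairment schedule; `limitKbps: null` means no limit.
interface Step {
  durationSec: number;
  limitKbps: number | null;
}

// The scenario from the talk: no limit for 2 minutes, 100 kbps for 2 minutes,
// then the limit is removed again.
const schedule: Step[] = [
  { durationSec: 120, limitKbps: null },
  { durationSec: 120, limitKbps: 100 },
  { durationSec: 120, limitKbps: null },
];

// Start time (s) of the step at `index`, e.g. when the limit is lifted.
function stepStartSec(steps: Step[], index: number): number {
  return steps.slice(0, index).reduce((t, s) => t + s.durationSec, 0);
}

interface BitrateSample {
  tSec: number;
  kbps: number;
}

// Seconds from the moment the limit is lifted until the measured bitrate first
// reaches `targetFraction` of the pre-limit baseline; null means "never".
function rampUpSeconds(
  samples: BitrateSample[],
  liftedAtSec: number,
  baselineKbps: number,
  targetFraction: number = 0.9,
): number | null {
  for (const s of samples) {
    if (s.tSec >= liftedAtSec && s.kbps >= baselineKbps * targetFraction) {
      return s.tSec - liftedAtSec;
    }
  }
  return null;
}

// Hypothetical measurements after the limit is lifted at t = 240 s.
const measured: BitrateSample[] = [
  { tSec: 240, kbps: 100 },
  { tSec: 250, kbps: 600 },
  { tSec: 260, kbps: 1200 },
  { tSec: 270, kbps: 1400 },
];
const rampUp = rampUpSeconds(measured, stepStartSec(schedule, 2), 1500); // 30
```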
Then there is the user and their devices, and devices are different. There are different operating systems, different CPUs, and different hardware acceleration, which we need for encoding and decoding video because it reduces the load on the CPU. There are also different cameras, microphones and display resolutions. The call will be very different if I make it from my desktop here on the table or from my phone: the CPU is different, the screen resolution is different, and the camera resolution is different. All of these things affect the type of tests I'm going to run, and there are several things you need to understand and plan for. The first one is that we need to manage device automation.
6. Virtual Machines and Raw Data Injection
We need to have virtual machines that will run with Selenium, for example, and they should be configurable with the number of cores that we have on them, the amount of memory we have available. We also want to be able to have raw camera and microphone simulation as part of it. It is important to deal with such use cases and be able to inject raw camera and microphone data into an actual test.
We need virtual machines that run Selenium, for example, and they should be configurable in terms of the number of cores and the amount of memory available on them. We also want raw camera and microphone simulation as part of that. Think about it. If I use a machine, or even a mobile device, running in some data center, and the camera sees a static image all the time, the encoder isn't going to work very hard, and I won't get the actual experience of someone talking in front of a camera that is facing them. So it is important to deal with such use cases and to be able to inject raw camera and microphone data into an actual test.
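One common way to inject raw media into Chrome under automation is through its fake-capture command-line switches, which read video from a Y4M file and audio from a WAV file. The sketch below only builds the switch list; the helper name and file paths are hypothetical. With Puppeteer the list would be passed as `puppeteer.launch({ args })`, and with Selenium through `ChromeOptions.addArguments()`. Other browsers use different mechanisms.

```typescript
// Chrome switches that replace the real camera and microphone with media
// read from files, so the encoder sees moving video and real speech instead
// of a static image.
function fakeMediaArgs(videoY4mPath: string, audioWavPath: string): string[] {
  return [
    "--use-fake-ui-for-media-stream", // auto-accept the camera/microphone prompt
    "--use-fake-device-for-media-stream", // use fake capture devices
    `--use-file-for-fake-video-capture=${videoY4mPath}`, // raw video source (.y4m)
    `--use-file-for-fake-audio-capture=${audioWavPath}`, // raw audio source (.wav)
  ];
}

// Hypothetical file names for a "talking head" clip and a speech recording.
const launchArgs = fakeMediaArgs("/media/talking-head.y4m", "/media/speech.wav");
```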
7. Manual Testing and Application Workflows
You can't automate everything in WebRTC testing, especially when using browsers like Safari or testing on mobile devices. WebRTC applications have specific workflows and need to scale properly. Testing WebRTC requires synchronizing between different browsers, using out-of-band message passing, and building synchronization mechanisms into automation tools. Orchestration and scalability are also important, as tests often involve multiple browsers running in parallel and require significant CPU and memory.
Now, last, but sadly true: you are going to do manual testing with WebRTC. You can't automate everything in WebRTC testing. Some of it will end up being manual, especially with browsers that are harder to automate, such as Safari, or when what you're testing runs on mobile. Mobile test automation is hard enough; doing it for something as generic as WebRTC is even harder, because we need to deal with the microphone, the camera, the audio environment of the device, where it is, what the camera is facing, and the device's network. That's a lot different from just testing a website on mobile devices.
Then there is your application. You might ask, what's so different between testing a WebRTC application and any other web application? The answer is, first of all, that it runs your own specific workflows. In voice over IP, most services use the same protocols and the same workflows, so it was easy to build testing tools for them. With WebRTC, testing needs to be as dynamic as the web, which is why we're using browser automation with Selenium and Puppeteer. The other thing is that it needs to scale properly, from one call to multiple calls in parallel, or from a call with two users to a group call with 50 users in the same service. How exactly do you do that in your testing?
And then there's the fact that we're connecting people. I'm not connecting a person to a website, where they go through a login phase, browse an e-commerce site, pick two or three items, fill out a form and click the buy button in the checkout cart. What I've just described is a single user in front of a server. But if you're doing something with WebRTC, it's going to be interactive; there's going to be a call taking place. I am the teacher, I create the lesson, and I join. After I join, the students join the room, not before. If someone mutes, I want to see what happens to the others when they mute.

So it takes two to tango. I need to be able to synchronize between different browsers running in the same test, inside the same session, because the session needs to be orchestrated. I need to know exactly which browser does what at which point in time, and one browser needs to wait for another in order to interact and do what it needs to do. That means you need to build synchronization mechanisms into the automation tools you use to test WebRTC. It starts with asking who joins when, and being able to answer that. The answer is usually going to be out-of-band message passing: during the test, I need to be able to send messages between the browsers that are not part of the application. Something like, browser number two, please wait. Then once browser one finishes something, it can send a message saying, I'm done, so that browser two can start its part of the process. So this out-of-band message passing is really important.

Then you need to be able to spawn and orchestrate these tests at scale. It's not one or two browsers at a time; it's usually 10, 20, 50, 100, or thousands of browsers running in parallel. And these browsers require a lot of CPU and a lot of memory.
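The out-of-band message passing described above boils down to "wait until another browser signals a step". The sketch below is a hypothetical, in-process stand-in: in a real harness the `OutOfBand` channel would live in the test runner or another service outside the application under test, and each `async` function would drive a separate browser. It shows the teacher joining before the students, even though the students' scripts start first.

```typescript
// Minimal in-process stand-in for an out-of-band channel between test
// browsers. Hypothetical helper, not part of any specific testing tool.
class OutOfBand {
  private readonly waiters = new Map<string, Array<() => void>>();
  private readonly signalled = new Set<string>();

  // Announce that a step is done, releasing everyone waiting on it.
  signal(topic: string): void {
    this.signalled.add(topic);
    for (const release of this.waiters.get(topic) ?? []) release();
    this.waiters.delete(topic);
  }

  // Wait until another participant signals `topic`
  // (resolves immediately if that already happened).
  waitFor(topic: string): Promise<void> {
    if (this.signalled.has(topic)) return Promise.resolve();
    return new Promise<void>((resolve) => {
      const list = this.waiters.get(topic) ?? [];
      list.push(() => resolve());
      this.waiters.set(topic, list);
    });
  }
}

// Usage: students must not join the lesson before the teacher does.
async function lessonScenario(): Promise<string[]> {
  const oob = new OutOfBand();
  const order: string[] = [];

  const student = async (name: string): Promise<void> => {
    await oob.waitFor("teacher-joined");
    order.push(`${name} joined`);
  };
  const teacher = async (): Promise<void> => {
    order.push("teacher joined");
    oob.signal("teacher-joined");
  };

  // The students start first to show that they really wait.
  await Promise.all([student("student-1"), student("student-2"), teacher()]);
  return order;
}
```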
8. Challenges of WebRTC Testing
To effectively test WebRTC applications, you need to consider the costs and complexities of orchestrating multiple machines in the cloud. Additionally, you must add specific expectations and assertions to validate video, packet loss, and bitrate. Predictability is a significant challenge, as running tests with multiple participants and maintaining consistent results can be difficult. Visibility is crucial, requiring the collection and analysis of WebRTC API calls and metric statistics. testRTC by Spearline offers testing and monitoring solutions for WebRTC applications.
Okay, usually you'll need two to four cores, or vCPUs, per browser to run a simple WebRTC video call test. Just bear that in mind. So the costs are going to be high, and orchestrating and allocating so many machines, and getting the quotas for them in the cloud, is not that simple.
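The two-to-four-vCPU rule of thumb translates directly into a machine count. This is back-of-the-envelope arithmetic, not a sizing tool: the 16-vCPU machine size is a hypothetical choice, and memory, which the talk also mentions, is ignored here.

```typescript
// How many machines of a given size are needed to run `browsers` browsers,
// using the talk's rule of thumb of 2 to 4 vCPUs per browser.
function machinesNeeded(browsers: number, vcpusPerBrowser: number, vcpusPerMachine: number): number {
  const browsersPerMachine = Math.floor(vcpusPerMachine / vcpusPerBrowser);
  if (browsersPerMachine < 1) throw new Error("machine too small for one browser");
  return Math.ceil(browsers / browsersPerMachine);
}

// Usage: 100 browsers on hypothetical 16-vCPU machines.
const heavy = machinesNeeded(100, 4, 16); // 4 browsers per machine -> 25 machines (400 vCPUs)
const light = machinesNeeded(100, 2, 16); // 8 browsers per machine -> 13 machines
```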
On top of that, you need to add expectations and assertions. I'm not just going to check that the browser reached the page I wanted; I need to actually validate that there is video, that packet loss is low, and that the bitrate is above the threshold I need. All of these assertions need to be added, and they are not part of traditional web testing: they need to look at WebRTC metrics and, from those, deduce whether the test succeeded or failed.
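Assertions like these compare two `getStats()` samples taken some seconds apart. The sketch assumes the fields `timestamp` (milliseconds), `framesDecoded`, `packetsReceived`, `packetsLost` and `bytesReceived` from the "inbound-rtp" video entry; the `checkCall` helper and the threshold values are hypothetical, and each product would choose its own limits.

```typescript
// A few fields of the "inbound-rtp" (video) entry from RTCPeerConnection.getStats().
interface InboundVideoSample {
  timestamp: number; // milliseconds
  framesDecoded: number;
  packetsReceived: number;
  packetsLost: number;
  bytesReceived: number;
}

interface Thresholds {
  maxLossFraction: number; // e.g. 0.02 for 2%
  minKbps: number;
}

// Returns the failed checks between two samples; an empty array means pass.
function checkCall(first: InboundVideoSample, last: InboundVideoSample, t: Thresholds): string[] {
  const failures: string[] = [];
  const seconds = (last.timestamp - first.timestamp) / 1000;

  // Video is present only if new frames were decoded during the window.
  if (last.framesDecoded <= first.framesDecoded) failures.push("no video frames decoded");

  const received = last.packetsReceived - first.packetsReceived;
  const lost = last.packetsLost - first.packetsLost;
  const loss = received + lost === 0 ? 0 : lost / (received + lost);
  if (loss > t.maxLossFraction) failures.push(`packet loss ${(loss * 100).toFixed(1)}%`);

  const kbps = seconds > 0 ? ((last.bytesReceived - first.bytesReceived) * 8) / seconds / 1000 : 0;
  if (kbps < t.minKbps) failures.push(`bitrate ${kbps.toFixed(0)} kbps`);

  return failures;
}
```

A test would fail if the returned list is non-empty, and report its entries, which is more informative than a screenshot that "looks okay".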
Now something we don't talk about a lot and this is the elephant in the room when it comes to WebRTC testing, that's going to be the predictability. If I'm running a test I want to be able to run the test again tomorrow and get the same results, okay? Think about it, you're doing a manual test of a group call, you need to get 10 people to join the room. How are you going to get 10 machines joining a room through browsers? Are you going to ask people in your company to join? What is that testing exactly? Their network connection or your application? What are you going to do with the false positives? What happens if you try to run it tomorrow and one of the people is unavailable or is on a different network? You're not going to get the same results. So you need to bear that in mind and take that into account. You need to be able to build that predictability into your own application or into your own testing tools so that once the QA runs a test the R&D can run it again to get the same results and the QA will be able to run it again to validate that the problem was fixed.
The other thing is visibility. You want to be able to actually look at the results and understand them. You can't just take screenshots and say, well, the video looks okay. It doesn't work that way. You need to understand what goes on in the underlying protocols of WebRTC itself. That means that, first of all, you need to make your tests repeatable, so that running them multiple times gets you the same results, and you need to collect and visualize everything. If we're used to collecting machine performance, browser events and console logs, WebRTC means that we also need to collect WebRTC API calls and getStats calls. getStats is a WebRTC API call that gives you internal metric statistics of the RTCPeerConnection, which is the main connection object used in WebRTC. You need to be able to collect all of these and then analyze and visualize them properly. You can do this on your own, since there are open source tools for it, or, and I invite you to do that, you can come to Spearline and look at testRTC. We have products that do testing and monitoring for WebRTC-based applications, and they can speed up the work you need to do.
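Collecting `getStats()` results over time is the raw material for that visualization. The sketch below polls a stats source at a fixed interval and keeps only selected stat types. `StatsSource` and `collectStats` are hypothetical names; in a browser, an `RTCPeerConnection` should fit the `StatsSource` shape, because its `getStats()` resolves to a map-like `RTCStatsReport` with a `forEach` method.

```typescript
// Minimal view of one stats entry: every entry has a type and a timestamp.
type Stat = { type: string; timestamp: number; [key: string]: unknown };

// Anything whose getStats() resolves to a map-like report.
interface StatsSource {
  getStats(): Promise<{ forEach(cb: (stat: Stat) => void): void }>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll getStats() `count` times, `intervalMs` apart, keeping only the stat
// types of interest. Each element of the result is one snapshot.
async function collectStats(
  source: StatsSource,
  count: number,
  intervalMs: number,
  types: string[] = ["inbound-rtp", "outbound-rtp", "candidate-pair"],
): Promise<Stat[][]> {
  const snapshots: Stat[][] = [];
  for (let i = 0; i < count; i++) {
    const report = await source.getStats();
    const kept: Stat[] = [];
    report.forEach((stat) => {
      if (types.includes(stat.type)) kept.push(stat);
    });
    snapshots.push(kept);
    if (i < count - 1) await sleep(intervalMs);
  }
  return snapshots;
}

// In a browser test, something like collectStats(pc, 60, 1000) would take
// one snapshot per second for a minute from an RTCPeerConnection `pc`.
```

The snapshots can then feed the bitrate, loss and jitter calculations used in the assertions, and be stored so that a failed run can be inspected and compared with a rerun.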