Performance can make or break a website, but how can you quantify that? In this session we will look at the Core Web Vitals as a way to measure performance on the web. Specifically, we'll go through the history of web performance measurements, where the new metrics come from and how they are measured.
Core Web Vitals - What, Why and How?
AI Generated Video Summary
This Talk provides an introduction to the Core Web Vitals and their contribution to Google Search. It discusses the evolution of website performance metrics and the need to consider factors beyond the time to first byte. The concept of Core Web Vitals is introduced, consisting of three metrics: Largest Contentful Paint, First Input Delay, and Cumulative Layout Shift. The upcoming Page Experience signal, launching in May 2021, will combine Core Web Vitals with existing ranking signals. The Talk also addresses challenges in measuring performance and provides insights on layout stability and visual completeness.
1. Introduction to the Core Web Vitals
Hello and welcome to my session about the Core Web Vitals. We'll talk about web performance, the Core Web Vitals, and how they contribute to Google Search. Website performance is about quantifying whether a website is fast and delightful for users. It has evolved over time and continues to evolve as our understanding of web performance changes.
Hello and welcome to my session about the Core Web Vitals: their what, why, and how, more specifically. So this is a testing conference, and I'm always a little humbled to speak at testing conferences because I'm not that much into the testing space anymore. I do write tests when I write my code, but you all are probably more expert here than I am. Nonetheless, testing your website performance is an important thing, and the Core Web Vitals are a tool to accomplish exactly that.
So I think it makes sense to discuss these things. I'll look at three different things with you tonight. First things first, we'll talk about web performance, or what website performance actually is. Then we'll talk about the Core Web Vitals, and then we'll also talk about how the Core Web Vitals will contribute to Google Search in the form of the Page Experience signal launching in May. So there are some SEO, or search engine optimization, implications from this as well.
So let's start with what website performance is. Intuitively, we all know the answer to this question: is a website fast and delightful to use or not? But if you want to compare that between sites, and maybe even between different versions of the same site, it becomes a lot more tricky, because you want something that you can compare and track over time, and intuitive measurements don't really tick that box. The goal is to quantify it, to have some sort of number or metric that tells us if a website is fast and delightful for a user to use or not. As we will see in this talk, this has evolved over time and continues to evolve even today, as our understanding of what makes a website fast, performant, and delightful for users changes, and as the web and the kinds of websites we build change. There won't be an easy answer; that's the spoiler alert. But let's have a look at this.
2. Quantifying Web Page Performance
One of the earliest metrics to quantify web page performance is the time to first byte. However, this metric is no longer sufficient to determine if a website is fast and delightful. The website architecture has changed, and bandwidth and connection speeds are not the main bottleneck anymore. A better metric is the overall completeness of the response. For example, a slower website that delivers a more complete response is considered better than a faster website that delivers an incomplete response. Time to first byte is still useful in identifying connection issues, but other factors such as rendering speed should also be considered.
How could we quantify web page performance? One of the earliest metrics has probably been the time to first byte. We measure how long it takes for the first byte of the response to come back from the server to our computer or device, so that the browser can start parsing and eventually rendering the page.
And historically, this has made a lot of sense. With classical websites, like this example.com case, our browser makes a request, the web server responds with the HTML, and then the content becomes visible in the browser. There are a few factors that we can influence as website owners and developers to keep this fast. We can make sure that our server is fast, has enough memory and capacity, and has good network bandwidth. We can also make sure that the server is physically close enough, because it simply takes time for data, as electrical or light impulses, to travel. If I'm here in Switzerland and the server is in Australia, it might take a while until the request has made its way to Australia and the response comes back. It might be lost on the way and then have to be retransmitted. So this can take significantly longer than when the server is, for instance, in my own city. I live near a data center, so if the site is hosted there, the round trip takes basically no time at all, and the time to first byte will be a lot shorter than it would be with a server in Australia.
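In the browser, the time to first byte can be read from the Navigation Timing API. Here is a minimal sketch; the helper name `ttfbFromNavigationEntry` is mine, not part of any standard:

```javascript
// Time to first byte from a PerformanceNavigationTiming-like entry:
// responseStart is when the first byte of the response arrived,
// startTime is when the navigation began, both in milliseconds.
function ttfbFromNavigationEntry(entry) {
  return entry.responseStart - entry.startTime;
}

// In a real page you would feed it the live navigation entry, e.g.:
//   const [nav] = performance.getEntriesByType("navigation");
//   console.log("TTFB:", ttfbFromNavigationEntry(nav), "ms");
```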
3. Evaluating Website Speed Metrics
Looking only at the time to first byte is not sufficient to determine website speed. Many metrics, such as Speed Index and First Contentful Paint, are used to evaluate when elements start to appear and how long it takes to reach visual completion.
But the point still stands: that metric is not sufficient. If you just look at the time to first byte and say, what? My server responded in 0.1 seconds and the data was there in 0.5 seconds, how can this be slow? You might miss things. And that's why we have looked at many, many other metrics. For instance, Speed Index, where we try to figure out not just when the website is there or when the network part is mostly done, but when things start to pop up and how long it takes over time to near visual completion. Then we looked at First Contentful Paint: how long does it take to get the first bit of the content actually visible in the browser window?
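In the browser, paint timings like First Contentful Paint are reported through a `PerformanceObserver`. A small sketch; the helper `firstContentfulPaint` is my own name for the extraction step:

```javascript
// Pull the First Contentful Paint time out of a list of paint entries.
// Returns undefined until the browser has reported that paint.
function firstContentfulPaint(paintEntries) {
  const entry = paintEntries.find((e) => e.name === "first-contentful-paint");
  return entry ? entry.startTime : undefined;
}

// In the browser, paint entries arrive via a PerformanceObserver:
//   new PerformanceObserver((list) => {
//     const fcp = firstContentfulPaint(list.getEntries());
//     if (fcp !== undefined) console.log("FCP:", fcp, "ms");
//   }).observe({ type: "paint", buffered: true });
```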
4. Metrics and Interactivity
We had tons of metrics to track different aspects of web performance. It's not just about when stuff starts to show up. For example, imagine you have an online pizza shop. The menu loads quickly, but if your clicks are only processed seconds later and a hundred pizzas suddenly land in your cart, it's not delightful. We also consider metrics like Time to Interactive.
And then we had tons of other metrics over time to track different aspects. Because it's not just about when stuff starts to show up. Imagine you have an online shop for pizzas; you want pizza delivered to your place. The site shows up really quickly, the menu is there in no time, in the blink of an eye, fantastic. But then you're like, I want this pizza, I want this pizza, hello, I want this pizza. You click, click, click, click, click. And after five seconds, suddenly you have a hundred pizzas in your cart. That's not delightful either. So we also looked at other metrics like Time to Interactive: when can I actually start to interact with the content? And a lot of other metrics.
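That clicking lag is exactly what the browser's Event Timing API exposes, and what the First Input Delay metric introduced later in this talk is built on. A minimal sketch, assuming a `first-input` entry; the helper name `firstInputDelay` is mine:

```javascript
// First Input Delay for a first-input entry: the gap between the moment
// the user interacted (startTime) and the moment the browser was free to
// start running event handlers (processingStart), in milliseconds.
function firstInputDelay(entry) {
  return entry.processingStart - entry.startTime;
}

// Browser usage via the Event Timing API:
//   new PerformanceObserver((list) => {
//     for (const entry of list.getEntries()) {
//       console.log("FID:", firstInputDelay(entry), "ms");
//     }
//   }).observe({ type: "first-input", buffered: true });
```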
5. Evolving Performance Metrics and Challenges
Performance metrics have evolved over time, becoming more complex and difficult to communicate. Measuring performance is challenging, as metrics need to be stable, sensitive, and reflective of the user experience. It's important to find a metric that is comparable over time and stable, rather than too sensitive or rough. Additionally, metrics should be able to be generated in lab settings and gather real user data to account for different devices and connections. As performance understanding changes, metrics may fall out of favor, leading to fluctuations in scores and potential discomfort when reporting to stakeholders. One approach is to focus on obtaining vital signs for the web.
And this has evolved over time, at seemingly random intervals. At some point, someone said: actually, you know what, this metric doesn't really reflect what we looked for or what users experienced; here's a new metric. And then someone else said: that's a great metric, but it also needs to take this other aspect into account, for instance interactivity. And thus we don't look at only one metric, but at a set of metrics. Which, unfortunately, also makes this more complicated, not only to understand but also to communicate to others.
So if you are communicating with others and you say, you have a Lighthouse score of 100 out of 100, that really doesn't mean that much to them. Or they might say, oh, it's only 80 out of 100 possible points. That still does not really say much, because are these last 20 points really a problem, or is that just cosmetic?

So measuring performance is actually a big challenge, as it turns out. One thing is that we want metrics to be relatively stable, but also fine-grained and sensitive enough to really spot problems when they occur. We had a metric called First Meaningful Paint that tried to figure out when the meaningful part of the content was showing up, and it was usually very janky. That means you could measure the same website three times and get three different results without changing anything in the circumstances. That's not really helpful. You want a metric that is comparable over time and more or less stable. It will never be 100% stable, but you don't want to pick one that is too sensitive, and you don't want to pick one that is too rough, like time to first byte, which is a very broad, rough metric that doesn't really reflect things either. And these metrics really need to reflect the actual user experience. I said that already with the time to interact: we needed a way to actually track these things as well.

Also, we would like the data to be able to be generated in lab settings, where we can run these things automated and with things that are not necessarily public yet. If we can only gather real user data, that is a little tricky. But it would also be nice to get real user data, to get a feeling for how this actually looks on real people's machines and connections. Because we, with our high-end computers and good, stable, fast internet connections, might not be the target group of our website, and people on phones, on flaky connections, might have a very, very different experience.
So it would be cool if our metrics could be measured in both contexts.
Another big thing was that as we were changing the metrics, because our understanding of performance changed over time, you might end up working on improving a metric and then finding that this metric fell out of favour, because now we look at other metrics. So your score is constantly increasing, and then it drops, and you're like, why? Have we done something wrong? Which also puts you in the uncomfortable position that whoever you're reporting to might ask for these metrics, and if your numbers are now lower than they were when you started the initiative to make things faster, that might not be very comfortable. So we needed to change a few things. And we figured out that one approach would be to basically get vital signs for the web.
6. Introduction to Core Web Vitals
Core web vitals are three metrics that help web developers, testers, and SEOs measure and improve user experience in terms of performance. These metrics are already available in various Google tools, such as Lighthouse, Chrome DevTools, PageSpeed Insights, and Google Search Console. The thresholds for judging website performance are based on field data and are updated roughly every year.
These are called the Core Web Vitals. But what are they? They are basically three metrics for web developers, testers, and SEOs to look at to figure out what the user experience in terms of performance is on each page, to measure that, and to work reliably on improving it.
The thresholds that we use to judge whether websites are doing well, need improvement, or are doing poorly are based on field data that we gathered and analyzed. So these targets can be, and already are being, achieved by lots of websites. Even if you don't necessarily hit them yet, you can achieve them; that is definitely possible. And to fix the moving-goalposts issue, we will update them roughly every year. We published them in May last year, 2020, and we will probably give an update on them at Google I/O this year as well. After that it will again be a roughly yearly cadence for us to review the thresholds and the metrics.
7. Metrics of Core Web Vitals
The Core Web Vitals consist of three metrics. Largest Contentful Paint measures visual completeness, with a good value being less than 2.5 seconds. First Input Delay measures how long it takes for the page to respond to user input, with a target of under 100 milliseconds. Cumulative Layout Shift measures the stability of page content, with a value below 0.1 considered acceptable.
So what are these three metrics that make up the Core Web Vitals? The first one measures visual completeness. How long does it take until I actually see what I care about? What's the main content, and how long does it take to show up? That's measured by the Largest Contentful Paint metric. It's basically the visual loading time that we had measured with other things before. A good value is less than 2.5 seconds. If you are getting your main content visible within 4 seconds, that is in the needs improvement area. Everything that takes longer than 4 seconds has an impact on how users perceive your site, so we would recommend making the website faster then.
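Those thresholds can be expressed as a tiny helper. A sketch using the numbers from the talk; the function name `classifyLcp` is mine, not part of any API:

```javascript
// Bucket a Largest Contentful Paint value (in seconds) using the
// thresholds from the talk: good up to 2.5 s, needs improvement up
// to 4 s, poor beyond that.
function classifyLcp(seconds) {
  if (seconds <= 2.5) return "good";
  if (seconds <= 4) return "needs improvement";
  return "poor";
}
```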
Last but not least, we also want to make sure that the page content is visually stable. What does that mean? Well, it means: how much does it move around? You probably know this from your phone or computer: the website shows you a button and you want to interact with that button, but before you can click on it, something else moves, a new thing is there, you click on that, and you're like, oh no, I didn't want to interact with this. Why did that happen? We measure that with a new metric called Cumulative Layout Shift. It's basically how much of the content shifted and by how much it shifted. That value should be below 0.1, because everything between 0.1 and 0.25 is in the needs improvement range, and everything above that is definitely considered a problem.

What are these values? To be fair, I haven't really found the unit I should use, because it's not really percent, but the way you calculate it is: how much of the page is affected by the shift, and by how much does it shift. In this case, for instance, we have a website that has two halves, the gray half and the green half. After a while, a button pops into the middle of the page, which means the entire lower half shifts. So 50% of the page is affected by the shift. The button and the spacing it introduces make up roughly 14% of the page, so the content shifts by 14%. We can multiply the 50% affected area by the 14% shift, and that gives us 7%. It's not really percent, but 0.07 is the value we get if we multiply 0.5 by 0.14. And 0.07 would actually be within the acceptable range. Now assume that this button pops in at the top of the page instead; then everything on the page would shift.
8. Core Web Vitals and Page Experience
Learn about the Core Web Vitals, test your different page versions, integrate them into your automated flow, and check for any issues using the Search Console. The upcoming Page Experience signal, launching in May 2021, will combine existing ranking signals with the Core Web Vitals measurements. AMP will no longer be required for the top stories carousel. Don't worry about the Page Experience update, but ensure there are no issues with Core Web Vitals, mobile friendliness, or safe browsing. Connect with us on Twitter or check out our documentation for more information.
So 100% of it would shift. It would probably still shift by 14%, so that would be 0.14, which is already above the threshold. So you can see that both large amounts of space being taken up after the first rendering and a shift that moves everything on the page are taken into account as problematic shifts.
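The arithmetic in this example is just the affected fraction multiplied by the shift distance fraction. A minimal sketch of the talk's simplified calculation (the real metric, defined in the Layout Instability spec, derives these fractions from element geometry in the viewport):

```javascript
// Score of a single layout shift, per the talk's simplified model:
// the fraction of the page affected by the shift times the fraction
// it moved by.
function layoutShiftScore(impactFraction, distanceFraction) {
  return impactFraction * distanceFraction;
}

// The two cases from the example: half the page shifting by 14% stays
// within the acceptable range, the whole page shifting by 14% does not.
const midPageButton = layoutShiftScore(0.5, 0.14); // about 0.07, below 0.1
const topPageButton = layoutShiftScore(1.0, 0.14); // 0.14, above 0.1
```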
What can you do with regard to the Core Web Vitals? web.dev/vitals has lots of information. Learn about these metrics: understand what they do, how they are measured, and how you can improve on them. It's very useful to know how this works. Test all your different page versions. If you have a mobile, a desktop, and an AMP version of your content, then test all three of these, or whichever combination you have. Do look into integrating this into your automated flow, because that's really helpful. It also has, or will have, an impact on Google Search.
There will be a new signal called Page Experience. In May 2021, we want to launch this new ranking signal, which is composed from existing ranking signals. We take things like mobile friendliness, safe browsing, HTTPS, and intrusive interstitials, which are already ranking factors, replace the proprietary page speed measurement that we used in ranking with the Core Web Vitals measurements, and combine all of these into a signal called the Page Experience signal. Page speed and mobile user experience are not new ranking factors; they have been ranking factors before, so it's not something to worry about too much. It is just something to be aware of. Both the page speed, measured using the Core Web Vitals, and the other signals I just mentioned will form this new Page Experience signal that will launch in May.
One of the upsides is that once this Page Experience signal has launched, we know how fast different pages are, and AMP will no longer be a requirement to show up in the top stories carousel. If you are a news site and want to have your articles in the top stories carousel, once the Page Experience signal has launched, AMP will not be a requirement to show up in there anymore.
What can you do about these things? If people in your organization are worried about the Page Experience update, don't be. It is an update, but it's not the biggest we've done. Do check your pages. Make sure that there are no Core Web Vitals issues, no mobile friendliness issues, no safe browsing issues. You can use the Search Console, a free tool we put out there at search.google.com/search-console. Sign up, get a feeling for how your pages are doing, and use the Core Web Vitals report as well as the Mobile-Friendly Test to figure out where there are areas for improvement, and work on those. If you want to learn more, feel free to ping us on Twitter at @googlesearchc, or ping me at @g33konaut. You can also check out our documentation at developers.google.com/search, which has loads of information.
YouTube Channel and Q&A
We also run a YouTube channel with regular office hours at youtube.com/GoogleSearchCentral. Then we go into the questions from our audience. The first question is from our guest, Yanni. He's asking about visual completeness within 2.5 seconds versus the first input within 100 milliseconds. He wants to know whether it's feasible to have effective input if the main content is not loaded yet, and what exactly is measured by the first input metric.
And we also run a YouTube channel with regular office hours at youtube.com/GoogleSearchCentral. With that, I'd like to say thank you so much for watching and listening, and bring on your questions.
I'm really excited to hear what you are up to. Martin, hey, thanks for joining us. How are you doing? Hi there. Yeah, oh, I'm doing pretty well. I mean, all things considered, still doing well, I guess. Yeah, how are you doing? Good. Happy to hear that. Yeah, very well, very well. I can't complain. I mean, it's been a lovely two days here at TestJS Summit. So anything that happens in life, you forget with such nice conference days.
We're going to go into the questions from our audience. So strap yourselves in. We're going to go... Okay, the first question is from our guest, Yanni. And he's asking: visual completeness within 2.5 seconds, but first input within 100 milliseconds? Is it really feasible to have any sort of effective input if the main content is not loaded yet? Or does the first input metric measure the time between the user clicking and the input showing up? He doesn't quite understand what exactly is measured there.
Understanding CLS and Layout Stability
The CLS metric is not time-bound and can be challenging for single-page applications. Layout shifts during navigation can result in high CLS scores, even when there is no visual instability. Feedback on CLS is being collected through a survey, as the metric is due for a major rework. Page shifts caused by privacy/cookie notifications and banners can affect layout stability, so it's important to avoid shifting elements for a good score.
Again, it's not really easy to express that in the metrics yet, unfortunately. Well, you'll get there one day, Martin, I know you will. I'm not smart enough for that; that's something they need to work on, the smart people. Just act, just play the role.
Sure, sure, I'm on it. Next question, and I think that's the last question we have time for, is from Autogibbon: is the CLS metric time-bound? For example, I've seen some websites shift all the time while you're using them, and others spend the first few seconds of loading bouncing stuff around.
It is not time-bound, as far as I'm aware. That's actually one of the biggest complaints about it, because single-page applications currently have a bit of a hard time with CLS. Technically, when you navigate from one view to another, you have a huge layout shift, right? Pretty much everything on the page is affected and it shifts by a lot. So as CLS is measured throughout the page lifetime in the user's browser, you might see high CLS scores when there isn't really visual instability; it's just the way that single-page applications happen to work. If you go to Chrome devs, I think that's the account name, basically the Chrome developers Twitter account, you will find the link to our survey where you can give feedback on problems with CLS. I know that metric is definitely in for a major rework, because the lack of time boxing or time bounding on CLS can cause high CLS values where there isn't really a user experience problem.

Yeah, it feels like cheating.

Yeah. Like I said, that's all the time we have for our face-to-face Q&A. Oh, we have one more. What a quick one, Martin. The question is from Saf: how does page shift handle things like privacy/cookie notifications, banners and stuff?

So I had a look at that, and it depends on how it's implemented. If it is implemented outside of the rest of the layout flow, so if you're basically absolutely positioning things on top without other things shifting, that's not a layout shift. Unfortunately, many solutions on the market are implemented in a way that does shift things around, and then that is a problem. So don't shift things around if you want to have a good score.

If you want Martin's approval, don't shift things around. Well, thanks, Martin. For the rest of the questions, you're going to have to go to Martin's speaker room. He's going to go to the spatial chat; click the link below in the timetable and you'll find Martin there.
Martin, thanks. Yes! Love to see you again. Thank you very much. Hopping over to spatial chat. Thanks a lot.