These days web performance is one of the most important things everyone wants to optimize on their apps, and it's clear to everyone how dramatic the impact of a poorly optimized website is on business. Yet we as an industry completely fail in recognizing its complexity and widely misuse the most common tool to measure it — Google Lighthouse. If you’re one of those people thinking that good performance equals a good Lighthouse score, you’ve also fallen into this trap and this talk is for you.
You’re Probably Using Lighthouse Wrong: How We Got Tricked by a Single Magic Number
AI Generated Video Summary
1. Introduction to the Talk
Hello. Hey, hey, how are you doing? Are you having fun? I'm happy, because I'm having a lot of fun. It's great to be here again. The title of my talk is, You're Probably Using Lighthouse Wrong. My name is Filip Rakowski. I'm a chief developer experience officer and co-founder of Vue Storefront. I'm also a technology council member of MACH Alliance. MACH Alliance is an alliance of the biggest enterprise vendors that are modernizing the e-commerce landscape and I'm extremely proud to represent it. I work in the e-commerce industry. Building an e-commerce storefront is harder than it seems. And I can guarantee that you will feel physical pain once you learn what faceting is.
Hello. Hey, hey, how are you doing? Are you having fun? Are you having fun? Okay. Good, good. I'm happy, because I'm having a lot of fun.
It's great to be here again. I think it's the first time I'm here in Vue London, in Vue.js Live, this time. And the title of my talk is, You're Probably Using Lighthouse Wrong. And I know it sounds a little bit provocative. That was my intention. But I don't assume you're using it wrong. I really hope you're using it in the right way. But just in case you're using it wrong, here's my talk.
My name is Filip Rakowski. I'm a chief developer experience officer and co-founder of Vue Storefront. I was introduced as a CTO because I used to be a CTO, but we hired a better CTO. So, right now I can move to the things that I'm best at, and that I enjoy a little bit more. So, yeah. I'm also a technology council member of MACH Alliance. Who here heard about MACH Alliance? Okay. Okay. So, MACH Alliance is an alliance of the biggest enterprise vendors that are modernizing the e-commerce landscape and I'm extremely proud to represent it. And who heard about Vue Storefront? Please raise your hand. Nice. Nice. It's getting better every year.
So, you know, I work in the e-commerce industry. And I have worked in the e-commerce industry literally all my life. And building an e-commerce storefront is harder than it seems. You may feel powerful after displaying, you know, the first product, the first data on your website from one API endpoint, but the road from there to production is very long and it's often painful. And I can guarantee that you will feel physical pain once you learn what faceting is.
2. Importance of Performance and Mobile Consumption
So, the goal of Vue Storefront is actually to provide tools that save you from this pain. And Vue Storefront is open source, so you can check it on GitHub and give a star if you like it. I'm not encouraging, but, you know, it would be nice.
And in the e-commerce industry, performance is one of the most important things to look at really. The fact that the way people look at it is often completely wrong is another topic, but that's what I'm going to address in this talk. So, Amazon did a study on that topic. And what they learned is that every 100 milliseconds of added page load time costs 1% of revenue. For Amazon, it's millions of dollars, really. 100 milliseconds.
And speaking about numbers, if you need a good source of arguments for your boss, for example, to take care of performance because you know it's important but you need the argument, check out this website, WPO Stats, which stands for web performance optimization stats. And it will give you great, great insights on how optimizing performance helps other companies to grow their revenue.
And, you know, as long as we were using PCs or laptops as our primary machines, which believe me, like seven years ago was the normal way to consume the web, no one seemed to be concerned with the growing size of websites. CPU power and internet bandwidth were growing faster than websites were growing in size. It all changed when mobile phones started to become the preferred way of consuming the web. And according to Google Research in 2017, it took on average 15 seconds to fully load a webpage on a mobile phone. Imagine 15 seconds. If I didn't have only 20 minutes, I would just wait to give you, you know, this perception. At that time, the awareness about the impact of this poor mobile performance on business started to emerge. But we were still lacking an easy way to actually link those two components.
3. Understanding Google Lighthouse Metrics
Performance and business metrics. Google Lighthouse, introduced in 2018, quickly gained popularity along with progressive web apps. However, the understanding of these concepts was limited to Google's marketing. Lighthouse's simplicity, represented by a single number, is both its strength and weakness. Web performance and user experience cannot be reduced to a single number. There are many nuances around Lighthouse, and this talk aims to explore them. Lighthouse measures page quality in terms of performance, accessibility, best practices, and SEO. While it provides valuable insights, it cannot definitively determine user experience. That is ultimately determined by your users.
Performance and business metrics. And everything changed when Google Lighthouse started to gain popularity. So I remember when it was first introduced in 2018. It became super popular, super rapidly adopted. The same way as progressive web apps and at that time, everyone started to be obsessed about performance. Everyone started to be obsessed about progressive web apps. But you know what? They really didn't know much about those. It was all marketing from Google. And unfortunately, not that much has changed since then. So what makes Lighthouse so widely adopted? I think it's simplicity. You run a test, you get a number between 0 and 100. That tells you how good or how bad the performance of your website is. Everyone, even those without any technical background, can understand that. And honestly, that's the root of the problem. Because the reality is not that simple. Web performance or user experience cannot be represented by, you know, just a single number on the screen. In addition, there are tons of nuances around Lighthouse. So how it works, where the numbers come from, et cetera, et cetera. And we will navigate through all these nuances and at the end of this talk I hope you will feel that you know how Lighthouse works, where these numbers come from, when to trust it and when not to trust it. But let's start with a simple question. What does Google Lighthouse measure, really? Can anyone tell me? What does Google Lighthouse measure?
Okay. I have a feeling that a lot of people don't try to answer this question. So, they assume that the number has to be high to be right and when it is low, it is bad and that's it. That's all they need to know. And as we can read on the Google Lighthouse website, the goal of Lighthouse is to measure page quality. So, the audit divides quality into four different metrics. Performance, accessibility, best practices, and SEO. All of those combined should give you a very good perspective on the quality of the website and by that, try to accurately predict the user experience on this website. And "try to predict" is super important here because no audit will give you any definitive answer on whether your user experience is good or bad. Guess what? Your users will.
4. Understanding Google Lighthouse Score Calculation
Google has always promoted performance as the most important factor, but there are many other factors influencing user experience. The example of BMI illustrates how relying on a single metric can lead to incorrect conclusions. The Google Lighthouse score is calculated from various metrics, each with its own weight. It's important to optimize the high-weight metrics, but the algorithm changes with each version, so comparing scores from different versions is not accurate.
So, Google has always promoted this performance score as the most important one. And I think in the heads of the general audience and all of us really, Lighthouse score equals performance score. So, a quality page means a page with, you know, a high performance score. And don't get me wrong here, performance is definitely a major factor influencing the user experience, but the truth is that even all four metrics of Google Lighthouse won't tell you if it is good or bad. In reality, there are just so many factors influencing good user experience that it's impossible to tell that just by using any tool.
So, without knowing the context, it is very easy to make bad decisions that logically could seem correct. So, let me give you an example. BMI. Who knows BMI? Do you know what it stands for? Body mass index. And I think body mass index is exactly like Google Lighthouse, and I will tell you why. So, does 30 BMI mean that you're obese? If you look at the chart, you can say, yeah, with full confidence, right? But when you dig a little bit deeper into the details of how the results should be interpreted, you learn that this scale doesn't work for a very large number of people. Older adults, women, muscular individuals, the interpretation for them is different. But the list doesn't end here. The interpretation for children and adolescents is also different. So actually, the only group that comes to mind that actually fits into BMI are, I don't know, non-muscular, middle-aged males. That's it. So the initial results could lead you to decisions that are bad. And if you don't dig into those specific details, you will just make bad decisions. So this kind of thinking could lead us to real disasters. For example, here on the chart, you can see that we can quickly jump to the conclusion that we can put an end to really horrible things just by banning worldwide cheese consumption.
So let me quickly explain how the Google Lighthouse score is calculated to make sure that we are all on the same page. The Lighthouse score is calculated from a bunch of other metrics. Each of them has its own weight. Some of them are more important, some of them are less important, and it's important to know that because when you want to optimize the score, you should start by optimizing the things that have the biggest weight, not because it will give you the best results, but just because they are the most important ones. And the algorithm is changing with each version. So it's a very important thing to acknowledge because if you're optimizing a website or if you're doing, I don't know, a migration or something like that, never compare the previous Lighthouse score with the new Lighthouse score. This is not the right thing to compare because most likely the scoring algorithm has changed during that time. And you know, the score could decrease or increase mostly because the algorithm has changed, not because you have changed something. Okay.
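To make the weighting concrete, here is a minimal sketch in Python. The weights are the ones published for the performance category in Lighthouse 8 and Lighthouse 10, as best recalled, so check them against the official scoring calculator before relying on them. The per-metric scores and the function name `performance_score` are made up for illustration. Real Lighthouse first maps each raw metric value onto a 0 to 100 score using a curve, which this sketch skips.

```python
# Performance category weights in percent (assumption: recalled from the
# Lighthouse scoring documentation for versions 8 and 10).
WEIGHTS_V8 = {"FCP": 10, "SI": 10, "LCP": 25, "TTI": 10, "TBT": 30, "CLS": 15}
WEIGHTS_V10 = {"FCP": 10, "SI": 10, "LCP": 25, "TBT": 30, "CLS": 25}


def performance_score(metric_scores, weights):
    """Weighted average of per-metric scores (each already on a 0-100 scale)."""
    total_weight = sum(weights.values())
    weighted_sum = sum(metric_scores[name] * weight for name, weight in weights.items())
    return round(weighted_sum / total_weight)


# Hypothetical per-metric scores for one unchanged page.
page = {"FCP": 90, "SI": 80, "LCP": 60, "TTI": 40, "TBT": 50, "CLS": 100}

score_v8 = performance_score(page, WEIGHTS_V8)
score_v10 = performance_score(page, WEIGHTS_V10)
print(score_v8, score_v10)
```

With these made-up inputs the same page gets 66 under the version 8 weights and 72 under the version 10 weights, although nothing on the page changed.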
5. Understanding Lighthouse Score Calculation
We know how the Lighthouse score is calculated, but the environment in which the test is run is extremely important. Using Chrome DevTools is the least reliable way to measure performance as external factors like network, CPU, and extensions can influence the score. Running Lighthouse in incognito mode and using proper throttling can mitigate their impact, but results will still vary on different devices. Implementing Lighthouse CI or using PageSpeed Insights, a website by Google, can provide more consistent results. PageSpeed Insights performs quick Lighthouse audits remotely on Google data centers, usually using the closest one to your location. Keep in mind that websites that depend on multiple APIs can have slightly different results between runs.
So we already know how this score is calculated. This is like 50% of truth. But what we don't know yet is where this number is exactly coming from. So we know how it is calculated. We don't know the environment. And the environment you run the test in is extremely, extremely important.
So most people use Chrome DevTools to measure the performance, so to run the Lighthouse audits, and I will tell you this. This is the least reliable way of doing that. Okay, so raise your hand if you're doing this. So don't, please. Because there are multiple external factors that are actually influencing your score. One of them is your network. Another one is your CPU. Another one is extensions. Yes, if you have any extensions, they are also included in this Lighthouse audit. For example, if you have Grammarly, you can take a look, it is adding something at the very bottom of your page. So we can decrease the impact of those by running it in incognito mode, by applying proper throttling, et cetera. But this is still going to be different on different devices.
So let's say we have 20 people in the company. If everyone ran the Lighthouse audit, the results would be completely different. And you can have much more consistent results if you implement something like Lighthouse CI that is running on every pull request. Or use PageSpeed Insights. What is PageSpeed Insights, who heard? Good, good. Good. Because that would be my recommendation. This is a website made by Google itself and you can use it to perform very quick Lighthouse audits on a website remotely on one of the Google data centers. So PageSpeed Insights usually uses the data center that is the closest to your location. Sometimes it could use a different one. For example, if the one that it would normally use is under heavy load. So you could have slightly different results between runs. In addition, just keep in mind that when you're having a website, on this website you probably have multiple APIs.
6. Challenges with Lighthouse Usage
Nitpicking on the Lighthouse score is not the point because it will always be different. PageSpeed Insights score is not a good representation of how users experience your website. Core Web Vitals are important for SEO. Lighthouse is just an algorithm and can be tricked. Cheating Lighthouse scores is pointless and deceptive. Lighthouse is a wonderful tool, but it can be misused.
They could also have inconsistencies. So really, nitpicking on the Lighthouse score is not the point because it will always be different. There will always be some differences between runs and those differences are coming from the, you know, dependencies for example.
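One common way to live with this run-to-run variance is to audit the same page several times and look at the median and the spread rather than a single number. The sketch below assumes you have already collected the scores yourself; the values and the helper name `summarize_runs` are hypothetical.

```python
from statistics import median


def summarize_runs(scores):
    """Summarize repeated Lighthouse scores for the same page and environment."""
    return {
        "median": median(scores),
        "min": min(scores),
        "max": max(scores),
        "spread": max(scores) - min(scores),
    }


# Hypothetical scores from five consecutive audits of one unchanged page.
runs = [62, 71, 58, 66, 64]
summary = summarize_runs(runs)
print(summary)
```

If the spread between identical runs is 13 points, as in this made-up sample, a 3 point difference between two single runs tells you very little.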
And so even though the PageSpeed Insights score will be more consistent, it's still far from being, you know, a good representation of how your users are actually experiencing your website. Because this score is measured on an emulated budget Motorola. So unless all of your users are using a budget Motorola, it's not a very good representation of how they are actually experiencing your website.
The good news is PageSpeed Insights will also tell you how your website performs in the real world. So at the very top of every audit, you will see three metrics called Core Web Vitals. They have to be green to positively impact your SEO results. Yes, you can get an SEO boost if you have green Core Web Vitals and the boost is individual for each of the metrics. And the three others at the bottom are also quite important, especially Interaction to Next Paint that will become a new Core Web Vital in March of the next year.
And another problem with Lighthouse is that it is just an algorithm. So you just learned how it works, right? And if you know how something works, then you can trick it. It gets an input, it gives an output, right? And you can easily find a lot of articles that are actually showing how you can build the least accessible website in the world with a 100 Lighthouse score. But you can do even more. It's equally easy to trick the performance score. You can detect the Lighthouse user agent and serve a completely different version of your website for the auditing tool.
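For reference, "green" has a concrete meaning: each metric is judged at the 75th percentile of real user visits against published thresholds. The sketch below hard-codes the thresholds Google documents for LCP, FID, INP and CLS; the function names and the sample values are assumptions made for illustration.

```python
# (good_up_to, poor_above) thresholds; times in milliseconds, CLS is unitless.
THRESHOLDS = {
    "LCP": (2500, 4000),
    "FID": (100, 300),
    "INP": (200, 500),
    "CLS": (0.1, 0.25),
}


def rate(metric, p75):
    """Classify a 75th percentile field value as good, needs improvement or poor."""
    good, poor = THRESHOLDS[metric]
    if p75 <= good:
        return "good"
    if p75 <= poor:
        return "needs improvement"
    return "poor"


def passes_core_web_vitals(p75_values):
    """The assessment passes only when every supplied metric is rated good."""
    return all(rate(metric, value) == "good" for metric, value in p75_values.items())


# Hypothetical field data for one site.
site = {"LCP": 2400, "INP": 350, "CLS": 0.05}
print({metric: rate(metric, value) for metric, value in site.items()})
print(passes_core_web_vitals(site))
```

In this made-up sample the site fails the assessment because of a single metric, even though the other two are green.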
So after hearing my presentation, you can get the impression that, in my opinion, Lighthouse is completely useless. And this is definitely not my point. I think Lighthouse is a wonderful tool, really. Wonderful tool. And, you know, it contributes a lot to a faster web. But the problem is not in the tool itself, it's more in the way how we are using it. Or misusing it. Because the name Lighthouse has its purpose.
7. Using Lighthouse and Understanding User Experience
Lighthouse is a tool that guides you in improving page quality, but it doesn't provide definitive answers. Good performance is a tradeoff, and sacrificing certain elements may be necessary. Lighthouse is useful for comparing different versions of a website and auditing competitors. While a score below 100 may be a disgrace for a blog, a score of 60 is considered good for an e-commerce website. However, Lighthouse is not a reliable tool for measuring actual user experience. To understand how users experience your website, talk to them directly.
Its goal is to guide you on improving page quality, not on giving you definitive answers if it is good or bad. And I've seen websites with great user experience and a low Lighthouse score and the opposite. Good performance is a tradeoff. You have to always keep that in mind. So 100 is never a goal. Because you always have to sacrifice something to make the performance better. Sometimes it's an analytics script, sometimes it's a feature. And it's not always a good business decision really to get rid of some of them. So remember it's a tradeoff. Don't treat performance as the ultimate goal, because you might end up with a website that is only displaying text without CSS because that would be the perfect website for Lighthouse, right?
And now, when is it worth using Lighthouse? So to me, Lighthouse shines the most when we want to quickly compare different versions of our website to see if there was any improvement or maybe decrease in its performance. So it's definitely worth implementing Lighthouse in your CI/CD pipelines using Lighthouse CI. And you should also audit websites with similar complexity or your competitors to get a realistic perspective of what is a good score and what is a bad score. Because for a blog, everything below 100 is a disgrace. Really, it's a shame. But for an e-commerce website, 60? It's a good score. It's much more complex. It requires much more analytics. It's just this type of website where performance shouldn't be the highest priority. It should be extremely high, because as we learned, we are losing money. But at the same time, the user experience is not only performance. And Lighthouse is definitely not a good tool to measure the actual user experience. Because it's synthetic data. It has nothing to do with how your users are experiencing that.
To understand how your users are experiencing your websites, well, try talking to your users. You could be surprised about how they are experiencing it. And you really don't have to set up any additional monitoring tools to check that. If you audit the page on PageSpeed Insights, at the very top, you will see how it is scoring against those four most important metrics, well, three. So you could think, okay, where is this data actually coming from, right? This is an old picture of the previous results of PageSpeed Insights. Right now, we have three at the top but I think it's irrelevant.
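A build-to-build comparison of this kind can be sketched as follows. This is not how Lighthouse CI itself is implemented; it is a hypothetical helper that assumes you have stored several scores per build together with the Lighthouse version that produced them, and it uses an arbitrary 5 point tolerance for noise.

```python
from statistics import median


def compare_builds(baseline, candidate, tolerance=5):
    """Compare median scores of two builds audited in the same environment."""
    if baseline["lighthouse_version"] != candidate["lighthouse_version"]:
        # Scores from different Lighthouse versions use different scoring rules.
        raise ValueError("scores come from different Lighthouse versions")
    delta = median(candidate["scores"]) - median(baseline["scores"])
    if delta < -tolerance:
        return "regression"
    if delta > tolerance:
        return "improvement"
    return "within noise"


# Hypothetical stored results for two deploys.
before = {"lighthouse_version": "10.4.0", "scores": [64, 66, 62]}
after = {"lighthouse_version": "10.4.0", "scores": [55, 57, 52]}
print(compare_builds(before, after))
```

The version check is the design choice worth keeping: it turns the "never compare scores across Lighthouse versions" advice into a hard failure instead of a silent misreading.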
8. Data Collection and Lab Data vs Real-World Data
This data comes from the last 30 days and it's collected on the devices using Chrome. If your website is crawlable, you will see the real-world data. The data comes from the Chrome User Experience Report (CrUX), which collects performance metrics from devices using Google Chrome. You can access the metrics history in the CrUX dashboard in Google Data Studio. Lab data measured in a specific environment should not be optimized at the expense of real-world data. A high synthetic score does not necessarily indicate a good user experience. That's all for today. If you're interested in e-commerce, check out my newsletter.
So where does the data come from? This data comes from the last 30 days and it's collected on the devices using Chrome. That's a very important detail. So it doesn't include other browsers. And if you are using Chrome, actually, you are automatically sending this data to Google. So this is how they know how the website is used. And the only requirement for actually collecting this data from any website is to make it crawlable. So if your website is crawlable, then you will see the real-world data. Nothing else you have to set up.
And this data, if you go a bit deeper, it comes from something that is called the Chrome User Experience Report, CrUX, in short, which collects performance metrics from all of those devices using Google Chrome. And you can also get access to the history of your metrics in the CrUX dashboard in Google Data Studio. It's all free. It can generate a really nice report that will show you how your performance was changing over time, how it is changing, you know, depending on the device, et cetera, et cetera. Extremely useful. And again, the only thing that you need to do is make your website crawlable, then the data is collected.
Now, before I finish, I keep talking about this difference between so-called lab data that is measured in a specific environment and the real-world data of your users, and it's important not to optimize only for the former. And here is why. If you do a PSI benchmark on a few websites, you could often see this picture. What is wrong in here? Who can tell me? Anyone? Okay. So, this is an e-commerce website. And we see that the synthetic score is 81. I said 60 is good for an e-commerce website. 81 is amazing, right? But if you look at how it translates to the real user experience, we see that it is terrible. Like, the Core Web Vitals are not passed. And if we look at things like Interaction to Next Paint, it's terrible. So, this is just an example of what could happen if you only optimize for the Lighthouse score. It could be completely mismatched with how your users are experiencing your website. And that's all I have prepared for today. So, I hope you learned something. If we have any e-commerce geeks in here, I'm running a newsletter about this. So, you can check that.
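The same lab and field numbers are also available programmatically through the PageSpeed Insights API. The sketch below only builds the request URL and reads a response that has already been fetched. The endpoint and the field names (`lighthouseResult`, `loadingExperience`, `overall_category`, `category`) are written from memory of the v5 API and should be checked against its reference; the sample response is hand-written and heavily trimmed, not real data.

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_psi_url(page_url, strategy="mobile"):
    """Build the request URL for one audit (an API key parameter is omitted here)."""
    return PSI_ENDPOINT + "?" + urlencode({"url": page_url, "strategy": strategy})


def summarize(response):
    """Pull the lab score and the field (CrUX) ratings out of a parsed response."""
    lab = response["lighthouseResult"]["categories"]["performance"]["score"]
    field = response.get("loadingExperience", {})
    metrics = {name: data["category"] for name, data in field.get("metrics", {}).items()}
    return {
        "lab_score": round(lab * 100),
        "field_overall": field.get("overall_category"),
        "field_metrics": metrics,
    }


# Hand-written, trimmed sample shaped like a PSI response (hypothetical values).
sample = {
    "lighthouseResult": {"categories": {"performance": {"score": 0.9}}},
    "loadingExperience": {
        "overall_category": "AVERAGE",
        "metrics": {
            "LARGEST_CONTENTFUL_PAINT_MS": {"percentile": 2900, "category": "AVERAGE"},
        },
    },
}

url = build_psi_url("https://example.com")
result = summarize(sample)
print(url)
print(result)
```

Keeping the lab score and the field rating side by side in one summary makes it harder to report the first while ignoring the second.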
AI and Measuring Performance
And if we have any web performance geeks, I'm posting a lot of tweets on Twitter if you're interested. Thank you for your time and for having me. I've learned a lot about Lighthouse today. The top question right now is what about a machine learning approach to evaluating a website? AI algorithms can be used for A/B tests and to correlate business and performance metrics. There may be more tools in the future. Measuring the performance of pages requiring authentication is a challenge, but it's less important in dashboards or B2B applications.
And if we have any web performance geeks, I'm posting a lot of tweets on Twitter if you're interested. So, thank you so much. Thank you for your time. Thank you for having me. Thank you so much. Please step into my office. Really, really enjoyed the talk. I've learned a lot about Lighthouse today as well, myself. It's really funny, actually, because I thought we'd be able to get through the rest, the whole of today without really talking about AI. I know we did Copilot. But the top question right now is what about a machine learning approach to evaluating a website? That's a broad question. Do you see maybe any uses for... Here's the thing. I mean, it's not that much connected to performance, but it's actually connected to the things that we're working on in the e-commerce space. So you could use AI algorithms, for example, for the A/B tests. You could use AI, for example, to see how your business metrics and how your performance metrics are correlating. And by doing that, you can predict that, OK, so perhaps I should increase performance a bit because historically it gave me a better revenue increase, or things like that. But yeah, that's, I think that's all I can say about that.
I agree. I feel like there's going to probably be more tools that will come out as time goes on. We'll see. And this is another good question that came in from Alvaro, which is, how can you measure the performance of pages that require authentication first? We currently rely on DevTools Lighthouse because PageSpeed Insights does not support this case. Yeah. And so for that, I would suggest just having a separate version of the website that you can measure synthetically. And unfortunately, if you have pages that require authentication, they just cannot be measured that much. But at the same time, if you have websites that require authentication, these are probably like dashboards or B2B applications. And for those, the performance is usually not that much of a concern. I mean, it's much less important than, for example, in e-commerce. But yeah, it's an issue. It's an issue.
You can do some things around that. You can set up an alternative version of your website, even have some traffic on it, like synthetic traffic, but it's just hard.
Nice, nice. And we have another question as well, which is what about the use of webpagetest.org and the role that that could play for performance testing in the real world? I am absolutely a fan of WebPageTest. So I was talking about Lighthouse, but this is not my go-to tool for measuring performance. My go-to tool is actually WebPageTest, so just try it out. You will see how many insights you're getting, how many very actionable tips you're getting from it. Also, another one I would recommend is Yellow Lab Tools. So Google it: Yellow Lab, actually. So Yellow Lab, I think, is the most actionable one. If you run an audit, you will quickly see why I'm saying that.
So yeah, nice, nice. Another one as well is, any ideas on how to automate the synthetic measurement of the UX, of the user experience? Of the user experience, I have no idea how to measure synthetic user experience, but if you meant performance, then I mentioned I think twice a tool called Lighthouse CI. So this is basically a very simple GitHub action. Implementing it is really not hard. Also Jakub Andrzejewski, who was talking like 30 minutes ago, has a very nice article on how to add that into your CI/CD pipelines. So check that out. And this is actually great to compare how the performance was changing from one deploy to another. But again, the numbers that the real users are getting could be completely different.
Optimizing Third-Party Libraries for Performance
Google Tag Manager, Analytics, and other third-party libraries can impact performance. To mitigate this, consider using tools like Partytown or Nuxt Script to load analytics scripts asynchronously or in parallel, separate from the main thread. Avoid loading analytics scripts together with other elements in the main thread to improve user experience.
Nice. We're going to go to the last question. I know there's a couple of more. So remember, if you are online, there is the speaker Q&A room and if you are here, you can go upstairs and talk to Filip as well. But for the last one, Google gave us Lighthouse, but maybe if they're not optimizing Google Tag Manager, Analytics, and some of their other sort of third-party libraries and stuff, do you have any advice for third-party libraries and performance and how to maybe still increase performance when you're depending on these other libraries?
Yes, so first of all, there is a very nice tool called Partytown. Who heard about Partytown? Yes, from Builder.io. So what Partytown is doing, it's actually allowing you to run those analytics scripts in a Web Worker. This way, you're not impacting the main thread. So they're just loading in parallel. But sometimes, it could also impact your analytics data, so make sure that the analytics tool that you're using is on the list of tools supported by Partytown. And also, we've heard today that Daniel will be proposing an RFC of Nuxt Script. Nuxt Script, like Next Script, is actually a way to also load your scripts, either in a Web Worker or in an asynchronous way. So take a look at that. Basically, the worst thing you can do is loading your analytics scripts together with everything that is in the main thread. Because they will impact the user experience. Try to have them asynchronous. Try to have them in parallel, but never in the main thread.
That's amazing. I love when talks connect to each other. Thank you so much for sitting and talking with us today. And thank you so much. Give him a round of applause.