Production errors can be tricky, especially when you can’t reproduce them easily or they don't happen very often. In this talk, we’ll go through a case study of a quantity-mismatch shopping cart issue and the troubleshooting steps we took to solve it. Then we'll draw from that issue some lessons that we can all learn from as frontend developers.
Lessons Learned From Troubleshooting a Shopping Cart Issue
AI Generated Video Summary
This Talk discusses lessons learned from troubleshooting a shopping cart issue in a restaurant marketplace app. The bug was difficult to reproduce but occurred more frequently as the app grew. The investigation involved checking frontend logs and using tools like Sentry and Fullstory. The solution involved using the customer's view at checkout as the source of truth and emphasizing the importance of testing and financial responsibility.
1. Lessons from Troubleshooting Shopping Cart Issue
I'm going to talk about lessons learned from troubleshooting a shopping cart issue. Today we're talking about React and working with an ecosystem. I'll share a bug I encountered in a restaurant marketplace app. It had a standard eCommerce flow, but a strange problem occurred.
I'm going to talk today about lessons learned from troubleshooting a shopping cart issue. So, most of us have used shopping carts. My name's Hussein. I'm a staff dev at Shopify, and I've been doing full stack for about 10 years now. React for seven. I've made every possible mistake with React. There's my Twitter there, if you want to give me a follow. Chelsea fan, unfortunately.
Why this talk? So, today we're talking a lot about React. A lot of us use React. We code in React. But the reality is, you always work with an ecosystem. Every time. So, whether it's the browser you're using React on, the web APIs, like you saw a lot of event listeners. You deal with customers, if you take a step back. All of us deal with customers in our code. And we're always having a business domain. Specifically me, I'm in eCommerce now. So, we all deal with those. So, you deal in an ecosystem, it adds complexity to your app, which in turn adds bugs to your app. And today I'm talking about one of those bugs that I had.
Just a brief background about this bug. It wasn't at Shopify, it was at a startup I worked at before, in 2019. It was a restaurant marketplace app, built in React and Redux, hundreds of users, millions in GMV, not that big compared to Shopify, of course. We had about 50 employees, and about 10 to 15 of them were developers. So, there's a happy path to the app, which is pretty standard in eCommerce. You log in, add items to your cart, provide shipping info, pay, we get that money, and then you receive your items. Pretty standard, right? Like this is what most eCommerce sites do. So that's what we had. But then we had a very strange problem up here.
2. Troubleshooting Shopping Cart Bug
Once a month or less, a customer would report a bug where they received fewer items than ordered. We checked the order on the back end, server logs, and emails to the supplier. Our data in the database matched everything, so we concluded the customer had made a mistake. It's important to consider assumptions when troubleshooting bugs.
Once a month or less, a customer would report a specific bug, and they said they received fewer items than what they actually ordered. So what does that mean? So, the app is kind of very different now, so I had to do a little bit of screenshot work. So you can see here, five cases of pineapple, for example, is what the order shows. And a customer, for example, would say they actually ordered six or seven cases, not five. Very strange, very bad. So what do we do? In the startup life, we did the same thing any developer would do. Checked the order on the back end, made sure the numbers were correct. Checked our server logs to see if there were any errors, and that the numbers matched up. Checked the emails that we sent to the supplier, to see if they had the correct number. And what we saw is, it turns out our numbers in the database matched everything. So we said, customer, you're wrong. Our data is right, too bad. You know what I mean? So you made a mistake, essentially. And that's why it's important to, kind of, talk about our assumptions when we have bugs. So this gives you some context into what we were thinking at the time.
3. Challenges with Shorts and Missing Items
In the restaurant industry, shorts and missing items are common. Customers often order large quantities, providing ample opportunity for mistakes. The bug we encountered was rare and difficult to reproduce. However, as we grew, it started happening more frequently with bigger customers. We had to investigate it seriously, trying both backend and frontend approaches.
So in the restaurant industry in North America, we have this concept of shorts. So when you order from Amazon, you order two or three items, you're going to get those items. Nobody tells you after you order, hey, too bad, I can only get you two things out of those three. In the restaurant industry, it's different. I'm supposed to show up with the five cases of pineapple that you ordered, but I only have four. So I say, hey, customer, I only had four this morning. I'll give you a credit. Or, you know, sometimes the customer sees five cases, and they say, this one looks terrible, I'm not taking it, so they give it back on the spot. So this is common in the industry. Shorts and missing items are common. The other thing is, like, when I order from Amazon, I order, like, two or three things at a time. Like, I never order, you know, 50 coffee bags, even though I want to. But in this industry, you know, people order, like, 20 cans of tomatoes, 50 heads of lettuce, 60, whatever. So there's a lot of opportunity for mistakes in those items. A lot of times our customers weren't tech savvy, so a lot of times the errors were not actual errors. So this is something we were used to. And the bug didn't even happen that often. It was like less than once a month, and we couldn't reproduce it at all. So to us, you know, this is the mindset we approached the problem with. And it was very, like, dismissive. And that's the big reason why I think it stayed in there.
So the problem is, as we grew, it kept happening, with bigger customers, and once a week now. So we had to start investigating it very, very seriously. So again, the backend approach we took, this is the one we did before. We tried it again. Tried the same thing. Nothing's wrong. So what's going on? But we know it's a problem now. Okay. Frontend approach.
4. Investigating Shopping Cart Bug
We checked our frontend logs and used tools like Sentry and Fullstory to investigate the shopping cart bug. We saw the bug happening in production, but couldn't reproduce it. The issue was related to requests not reaching the server, possibly due to poor Wi-Fi in restaurant kitchens.
Check our frontend logs. We were using Sentry at the time. Still nothing. We tried to reproduce it in different browsers and mobile devices. QA team. Still couldn't find anything.
Finally we looked through a recording software called Fullstory. So if any of you are familiar with Fullstory, it's kind of an analytics tool that allows you to record your user sessions. Very nice app. LogRocket is another example that I love, for frontend developers especially. So with this recording, I was actually able to see the bug happening in production. I saw a customer have, like, six items in their shopping cart, go to checkout, and then it's all of a sudden five. So I saw it with my own eyes, no denying it now. But still couldn't reproduce. And if you can't reproduce, how do you fix it?
So now we have to reproduce it. So how could this happen? So this is an example. Like a shopping cart, you see this a lot, with the plus and minus kind of buttons. So what happened was a customer was clicking plus plus plus plus plus plus. And, you know, we were debouncing and things like that, but every time they pressed the button, they were sending a request to the server. So somehow, this request to change the quantity was just not getting to the server. Right? Somehow it wasn't. So by the time you came to the checkout page, you didn't see it. So I had a hunch at the time, because I was working a lot with restaurants in person. One thing I noticed is that a lot of times at restaurants, the person themselves, they put their order in from their kitchen. And in the kitchen, the Wi-Fi is usually pretty bad. So I was like, maybe the request is not making it because the Wi-Fi sucks. You know. So this could be the reason. So what I did, you might be familiar with this.
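To make that failure concrete, here is a minimal TypeScript sketch of how a click can change what is on screen without ever reaching the server. It is not the app's actual code: `CartLine`, `Send`, and `flakyNetwork` are hypothetical names, debouncing is left out for brevity, and the dropped sixth request simply stands in for bad kitchen Wi-Fi.

```typescript
// Hypothetical model of the failure; none of these names come from the real app.

// Returns true if the request reached the server, false if it was lost.
type Send = (quantity: number) => boolean;

class CartLine {
  displayed = 0;      // what the customer sees in the cart
  serverQuantity = 0; // what the server last recorded
  private readonly send: Send;

  constructor(send: Send) {
    this.send = send;
  }

  // Optimistic update: the screen changes first, then the request goes out.
  increment(): void {
    this.displayed += 1;
    if (this.send(this.displayed)) {
      this.serverQuantity = this.displayed;
    }
    // A lost request is silently ignored here, so the two numbers drift apart.
  }
}

// Simulated kitchen Wi-Fi: the sixth request never arrives.
let attempts = 0;
const flakyNetwork: Send = () => {
  attempts += 1;
  return attempts !== 6;
};

const line = new CartLine(flakyNetwork);
for (let i = 0; i < 6; i++) {
  line.increment();
}

console.log(line.displayed, line.serverQuantity); // 6 5
```

The customer sees six, the server has five, and nothing in the logs looks wrong, which matches what the recording showed.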
5. Lessons from Troubleshooting Shopping Cart Bug
We found the solution to the shopping cart bug by using what the customer sees at the time of checkout as the source of truth. We don't care about what the server says if there's a mismatch between the shopping cart and the checkout. The lessons learned include starting by assuming the bug is your fault and using screen recordings to help front-end developers.
Do some throttling, slow 3G. I'm like, let's give it a try. I mean, we've tried everything. And what do you know? That was the problem. Right? So we were able to reproduce it now. So how do we fix it? Well, we knew the request didn't make it in time for the checkout page, which is why it happened. So you see the different quantities. So what we ended up doing, and this is kind of an example here, is we're using what the customer sees at the time of checkout, or the time of the shopping cart, as the source of truth, not the server. What the customer sees. Because that's all that mattered. So if there's a mismatch between the shopping cart and the checkout, right? Like there's six in the shopping cart and then five at the checkout. We don't care about what the server says. It's what the customer saw. So we change it back. And that's what we did. Because these are two different pages. So that way, the version that the customer sees is always what they get. And we never had that problem again.
So just to keep this brief, the lessons learned. The source of truth is not the server. Not always the server. Sometimes it is. Start by assuming that the bug is your fault. There's a saying about it, it's called "select is not broken", which basically means, like, you know, the database itself is not broken. You broke something. It's you that caused the problem. Come at it from that mindset. Screen recordings are very helpful for frontend developers. If you can do it, make sure you block any personal information. A lot of them do out of the box, but sometimes, depending on your industry, it might be more strict.
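The fix described here can be sketched roughly as follows, in TypeScript. This is an illustration under assumptions, not the implementation from the talk: the `CartItem` and `Correction` shapes and the `reconcile` function are hypothetical, and the sketch only compares what the customer saw in the cart against what the server returns at checkout.

```typescript
// Hypothetical shapes; the talk does not show the real implementation.
interface CartItem {
  sku: string;
  quantity: number;
}

interface Correction {
  sku: string;
  from: number; // quantity the server had recorded
  to: number;   // quantity the customer saw in the cart
}

// Compare the customer's view with the server's view at checkout.
// On any mismatch, the customer's view wins.
function reconcile(
  seen: CartItem[],
  server: CartItem[],
): { items: CartItem[]; corrections: Correction[] } {
  const serverQty = new Map<string, number>();
  for (const item of server) {
    serverQty.set(item.sku, item.quantity);
  }

  const corrections: Correction[] = [];
  for (const item of seen) {
    const recorded = serverQty.get(item.sku) ?? 0;
    if (recorded !== item.quantity) {
      corrections.push({ sku: item.sku, from: recorded, to: item.quantity });
    }
  }

  // The checkout shows a copy of exactly what the customer saw.
  return { items: seen.map((item) => ({ ...item })), corrections };
}

const seenInCart: CartItem[] = [
  { sku: "pineapple-case", quantity: 6 },
  { sku: "tomato-can", quantity: 20 },
];
const fromServer: CartItem[] = [
  { sku: "pineapple-case", quantity: 5 },
  { sku: "tomato-can", quantity: 20 },
];

const result = reconcile(seenInCart, fromServer);
console.log(result.corrections);
```

In a real app the corrections would presumably be sent back to the server before payment, so the order that gets placed matches what was on the customer's screen.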
6. Importance of Testing and Financial Responsibility
Don't test your app in the best conditions. Wrong assumptions can cost you money. At Shopify, our scale is massive with millions of dollars in sales per minute. I share these lessons to prevent repetition and pass knowledge to other developers.
And don't test your app in the best conditions. We do that a lot as developers. You're like, on my, you know, MacBook, 64 gigs of RAM, and best Internet in the world, it was perfect. So what's the problem, right? So why does this matter?
Wrong assumptions can and will cost you money. I personally cost the company a few thousand dollars. No big deal, right? Now at Shopify, our scale is massive. $3.5 million worth of sales per minute on Black Friday. So I can't do that. If I do that at Shopify, it's a lot bigger of a financial responsibility. So I take these lessons as a way not to repeat them at Shopify. And I pass this knowledge down to other developers to make sure that we approach bugs with a mindset similar to the one we had after we solved it.
So that's my talk.