AI Generated Video Summary
1. Introduction to Web Bots
Hey, everyone. I'm Adam. I'm super happy to be here, and I'm here to ask what's going on with bots on the web. I'm not talking about the nice ones, like testing bots. I'm talking about the bad ones. We'll talk about simple detections, and how the bots got better. We'll talk about what's possibly the best bot out there, beating most detection solutions. And lastly we'll get to my favorite part, which is how you can find it anyway.
But before all that, one reason I'm here is that I've always liked hacking stuff, and now I'm a reverse engineer at DoubleVerify. They measure ads, but my job is playing hide and seek with these bots so advertisers can avoid them. And it's not just advertisers and games: social media, concert ticket sellers, a lot of people face this issue, because the internet was not designed with bot detection in mind. Seriously. The only real standard is robots.txt, telling bots what they're allowed and disallowed to do. Basically the honor system, asking good people to play nice. And when that's all you have, yeah, real story: when I was 16, a high school project may or may not have dropped service to some site. But some people actually do this on purpose and at scale: denying service to real users, scalping sneakers, sneaking around social media with fake users. So to make the internet better, we want to detect them.
3. Detecting Bots with User Agent and Behavior Tests
User agent, we talked about that. That property on window.navigator is read-only. So bot makers hide it with Object.defineProperty. You look at the property descriptor, you see somebody did funny stuff there, trying to hide a user agent, and that's suspicious. The image here being: you have a bad attribute identifying you, you fix it into something perfectly fine, but as the bot maker you accidentally leave behind an artifact that incriminates you, and that artifact is what gets used for detection.
This is going to be a common theme: the cat and mouse of bot detection. Another example is that the bot maker can override the toString on something they're trying to hide. So you look at the toString of the toString, and they hide that too. There's a fun game established there, with clever ways around it. The key takeaway here is the cat-and-mouse theme, which keeps producing vectors for more detections.
4. Bots, CAPTCHAs, and Automated Browsers
Behavioral data, such as how often each user clicks, helps separate humans from bots. CAPTCHAs were originally used to distinguish humans from bots, but today they mostly just slow the bots down. The evolution of bots and the advancements in automated browsers have made them more flexible and harder to detect. Puppeteer, Google's browser automation library, is the kingpin of botting, making it accessible and difficult to detect.
How much does each user click? You look at the edges: on one side you have zero, which is my father clicking absolutely zero times after a long day of work. But on the other edge, you have people clicking 172 times per second. OK, we're getting somewhere there.
And also, CAPTCHAs. That's what we came up with originally to distinguish humans from bots. You might be asking, hey, why didn't you start with that? The reason is that the bots trained, and they absolutely demolish humans on the simple ones. This goes for complex CAPTCHAs too. So CAPTCHAs aren't there to prevent bots. What they do currently, the reason you still see them, is slow the bots down.
Moving forward, let's talk about how the bots got better, getting closer to the advanced detection part. The bot makers haven't been sleeping all this time. They got better, and they keep getting better; with every little patch they evolve. Eventually, the game became written in their favor. Ten years ago, they were struggling with Python scripts. In recent news, the problem is public enough that here's some guy complaining about it to Elon Musk. Ping me in the Q&A if you want to talk about the hiders and the seekers some more. Point being that this evolutionary process is really good at gradual improvement and problem solving.
5. Detecting Bots: Canvas Fingerprinting and Beyond
Canvas fingerprinting can be faked, and stealth bots fill the objects missing from Chromium's headless mode with fake values to defeat detection. Easy bots can be detected, but hard bots require techniques like hardware concurrency checks, behavior tests, and session-level data analysis. By analyzing user agent distributions, you can identify even the best bots. The internet needs more smart people to combat the evolving bot problem.
I want to take the time to explain just one of these before I fly through the others. Chromium, when it's headless, doesn't include chrome.csi, chrome.app, all that performance stuff. So you think, okay, can I detect Puppeteer in headless mode with this? Not so fast, buddy. They fill every single object with fake values: window.chrome, loadTimes, runtime, app, so on and so forth, anything that can be used for detection. And it's stupid easy to use, annoyingly elegant. Remember CAPTCHAs? Two lines here, a little bit of money, and they solve that. And all of this is just these two lines here. Everything we talked about, super easy to use. Bot tests fail to find it; they literally hang the failed bot-detection results on their repo. But I promise we'll get to what actually works. That bot? I made it within 10 minutes, all right.
What still holds at the time of writing is behavior tests and session-level data analysis. Behavior tests that still work usually look at window-context discrepancies when interacting with the DOM. Data analysis can take many shapes. For example, let's say you picked the user agent perfectly; the question is what value you put there. Look at this graph: it's user agents along navigations to an app, and each point is how many people navigated with that specific user agent. You'd say there's some variance here, but this is what it's supposed to look like. The blue line is almost flat. That's probably because the bot author got right which user agents to vary between, but got the weights entirely wrong; they're probably just producing them at random. Normal sites don't look like this, and here's one way you can detect even the best bot.
So at the start I asked you: what's up with bots on the web? I can't tell you for sure, but what I do know is that they're getting better, and we need more smart people like you to be aware, so that the internet becomes a better place.