In 2018, a new attack vector against JavaScript codebases was published: Prototype Pollution.
At first glance, it seemed pretty limited in impact: it would basically be a good way to crash some code. However, multiple remote code executions have since been built on this vector.
In this talk, we will clarify what prototype pollutions are, their real impact, and how to prevent them from happening in your codebase.
Prototype Pollution in JavaScript
From:

Node Congress 2023
Transcription
Thanks a lot. So we'll talk about prototype pollution in JavaScript, but before we do that, let's talk about something very important: me. The bio is a bit outdated, and that's on me. I don't work at Datadog anymore, as of two weeks ago. I'm working at a company named credit tools. It is not even a company yet: I still need to tell the lawyer to incorporate it. And it's not even an association, because it's basically just myself. So if you want to feel sad for me, you have to know that I feel so lonely that I do my stand-up meetings with ChatGPT in the morning. And speaking of the logo of the company, another AI designed it. So yeah, that's basically me being unemployed, on sweet French unemployment money, trying to build a company.
But let's talk about prototype pollution. First of all, before we pollute prototypes, let's learn what prototypes are. Even if it's a JavaScript conference, it doesn't hurt to go back to basics. Also, you can notice there is no design in my slides whatsoever, because designing these slides consisted of removing the Datadog logo and color scheme late yesterday on the plane. Then we will talk about the impact of prototype pollution and how to avoid prototype pollution in JavaScript. This talk was designed for 25 minutes and I only have 20 of them.
JavaScript is prototype-based and somewhat typed. OK, what do I mean? Let's start with the typed part, because it's probably the most trolly. Well, you've got the typeof operator, and you can check the types of variables. So typeof true is boolean. typeof null is object. That's called a billion-dollar mistake, but that's not the topic of this talk. typeof 10 is number, typeof 10n is bigint, and so on. We even have the undefined type for stuff that is undefined. JavaScript is so great. And pretty much everything else is an object. So objects are objects. String objects are objects. Regexes are objects. Null is an object. And objects are objects, obviously, again.
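The typeof results listed above can be checked with a few lines. This is only a minimal sketch; the samples array and its labels are illustrative names, not something from the slides.

```javascript
// Each entry pairs a label with the result of typeof for the values mentioned in the talk.
const samples = [
  ['true', typeof true],                        // 'boolean'
  ['null', typeof null],                        // 'object'
  ['10', typeof 10],                            // 'number'
  ['10n', typeof 10n],                          // 'bigint'
  ['undefined', typeof undefined],              // 'undefined'
  ['{}', typeof {}],                            // 'object'
  ['new String("a")', typeof new String('a')],  // 'object'
  ['/a/', typeof /a/],                          // 'object'
];

for (const [label, type] of samples) {
  console.log(`typeof ${label} -> ${type}`);
}
```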
And objects have methods. If you create the object { foo: 1 } and call hasOwnProperty('foo') on it, it will return true. And if you create the object { foo: 1 } and check hasOwnProperty('hasOwnProperty'), it will return false, because the method hasOwnProperty does not belong to the object itself. So where does it belong? Let's use the best tool in the history of programming, the debugger. We can inspect our object and see that the method hasOwnProperty exists on something that's called a prototype. And the prototype can be accessed from the object directly.
So what are prototypes? In JavaScript, objects have prototypes. When a method or a property is not found on an object, it's looked up on the prototype. But prototypes are objects. So if a method or a property is not found on a prototype, we check its prototype. And that prototype is an object too, so if it's not found there either, we check the next prototype. That's what we call the prototype chain. It means that we recursively check the whole prototype chain, until we get to null, to find a method or a property. And if it's nowhere in the prototype chain, it's undefined. And undefined is not a function.
So the prototype chain is basically a tree. On the left-hand side, I defined a prototype named myProto, which I use to create objects. I defined a class with a constructor function, oldStyleClass, and I give it a prototype directly. And I even define a class with the class keyword. When I create objects with these on the right-hand side, they will share prototypes or not. OK, let's go a bit deeper. item3 has myProto as its prototype, because on line 8 of the right-hand side code, we call Object.setPrototypeOf on this object. So we have a method to arbitrarily put a prototype on an object. But item1 and item2 have the prototype of the class, because we created them with new and called the constructor of the class at the bottom of the left-hand side.
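The lookup along the prototype chain can be sketched as follows. The slides are not part of the transcript, so findOwner, NewStyleClass and the method names are assumptions made for illustration, loosely following the names the speaker mentions.

```javascript
// Hypothetical helper: return the object on the prototype chain that owns `key`,
// or null when the chain ends without finding it.
function findOwner(obj, key) {
  for (let current = obj; current !== null; current = Object.getPrototypeOf(current)) {
    if (Object.prototype.hasOwnProperty.call(current, key)) {
      return current;
    }
  }
  return null;
}

const foo = { foo: 1 };
console.log(foo.hasOwnProperty('foo'));                             // true
console.log(foo.hasOwnProperty('hasOwnProperty'));                  // false
console.log(findOwner(foo, 'hasOwnProperty') === Object.prototype); // true
console.log(findOwner(foo, 'missing'));                             // null

// Three ways of ending up with a prototype: a plain object, a constructor function, a class.
const myProto = { hello() { return 'hello'; } };
function OldStyleClass() {}
OldStyleClass.prototype.bar = function bar() { return 'bar'; };
class NewStyleClass extends OldStyleClass {
  baz() { return 'baz'; }
}

const item1 = new NewStyleClass();
const item3 = {};
Object.setPrototypeOf(item3, myProto);

console.log(findOwner(item1, 'baz') === NewStyleClass.prototype); // true
console.log(findOwner(item1, 'bar') === OldStyleClass.prototype); // true
console.log(findOwner(item3, 'hello') === myProto);               // true
```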
But since this class extends oldStyleClass, the prototype of oldStyleClass, defined on lines 8 to 13 on the left-hand side, is still on the chain. So if I look for the method bar on item1 or item2, it won't be available on the class's own prototype, but it will be available on oldStyleClass.prototype. OK, is that clear? I hope so. Please don't throw stuff at me. OK, thanks. Yeah, but the people telling me it was clear already know everything about prototypes.
So how do we access the prototype of an object? We have multiple ways. Let's have a class named MyClass, because I'm very original in the way I name my classes, and create two items, myItem and myItem2. If we check whether showProp, which is a method on the class, is available on these objects, it is not: hasOwnProperty on line 13 tells us it's not available on the object directly. But if we do Object.getPrototypeOf(myItem), this returns the prototype of the class, and this has the own property showProp. That's what we have on line 14. We can also access the prototype with __proto__. So myItem.__proto__.hasOwnProperty('showProp') will return true. And if we console.log myItem.constructor.prototype.hasOwnProperty('showProp'), it will return true too. What is worth noticing is that there is a single instance of this prototype in the heap, meaning myItem.__proto__ is exactly the same thing as myItem2.__proto__. Pretty sure nobody has said "prototype" that many times in such a short amount of time.
So what's a prototype pollution? Well, a prototype pollution happens when an arbitrary payload handled by the JavaScript codebase can overwrite properties or methods somewhere on the prototype chain of one or multiple objects. This usually happens when we use a merge function, and we will see in detail why. So what do I mean? Let's take an example. Let's take the Hoek library in version 4.1.2.0. And we have something called maliciousPayload on line 2.
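The three ways of reaching a prototype mentioned above can be compared side by side. This is a reconstruction under assumptions: MyClass, showProp, myItem and myItem2 follow the names in the talk, while the class body and the has helper are made up for the sketch.

```javascript
class MyClass {
  constructor() {
    this.prop = 42; // own property, stored on each instance
  }
  showProp() {      // method, stored once on MyClass.prototype
    return this.prop;
  }
}

const myItem = new MyClass();
const myItem2 = new MyClass();

// Hypothetical shorthand for an own-property check.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

console.log(has(myItem, 'showProp'));                        // false: not on the instance
console.log(has(Object.getPrototypeOf(myItem), 'showProp')); // true
console.log(has(myItem.__proto__, 'showProp'));              // true
console.log(has(myItem.constructor.prototype, 'showProp'));  // true

// Both instances point at the very same prototype object.
console.log(myItem.__proto__ === myItem2.__proto__);         // true
```

Because the prototype is shared, anything written to it becomes visible from every instance, which is what makes pollution possible in the first place.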
And that's a JSON string that contains __proto__, with oops set to "It works". And then we create an object named a. This object has nothing to do with maliciousPayload. In this whole codebase, it's never passed to the same function as maliciousPayload. It doesn't know about maliciousPayload. So we console.log a.oops, and it's undefined, because oops is not present on the a object, nor is it present on its prototype chain, because it inherits the default prototype chain of any object in JavaScript. Then we call Hoek.merge on a brand new object, which has nothing to do with a, and JSON.parse(maliciousPayload). And after that, when we read a.oops, it returns "It works", because the prototype of all objects, I mean Object.prototype, the main prototype in the heap of this JavaScript codebase, now has the oops property.
How come? Well, it's because of a recursion. This is the merge function that we've seen on line 5 of the previous slide. This function performs a recursive merge, meaning that if you merge two objects and they have properties in common on their sub-objects, it will recursively merge them. But because __proto__ is an accessible property on most objects, the function will say: oh, I've got something under __proto__ to write on this object, and this object has __proto__, so I'll take the value of the object's prototype and start writing stuff on that. So what I mean is that through the merge function, because of this recursive call on line 18, we go up the prototype chain and we write oops.
And that works in Lodash too, because I want to terrify everyone. I mean, in older versions of Lodash. You've got another payload: constructor, prototype, isAdmin. You remember I showed you that we can access the prototype of an object by going through constructor.prototype. Well, we do exactly the same thing. We have an object b, and b.isAdmin isn't defined.
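The recursive merge problem described above can be reproduced without any library. In this sketch naiveMerge is a hypothetical, deliberately vulnerable function written for illustration; it is not the code of Hoek or Lodash, and only the __proto__ payload is shown.

```javascript
// Deliberately vulnerable: recurses into any key whose value is an object on both sides,
// including the inherited __proto__ accessor.
function naiveMerge(target, source) {
  for (const key of Object.keys(source)) {
    const value = source[key];
    const existing = target[key];
    if (value !== null && typeof value === 'object' &&
        existing !== null && typeof existing === 'object') {
      naiveMerge(existing, value); // for key "__proto__", `existing` is Object.prototype
    } else {
      target[key] = value;
    }
  }
  return target;
}

const maliciousPayload = '{"__proto__": {"oops": "It works!"}}';

const a = {};
const oopsBefore = a.oops;
console.log(oopsBefore); // undefined

// `a` is never passed to the merge, yet it is affected by it.
naiveMerge({}, JSON.parse(maliciousPayload));
const oopsAfter = a.oops;
console.log(oopsAfter); // It works!

// Undo the pollution so the rest of the process is not affected.
delete Object.prototype.oops;
```

JSON.parse creates an own property literally named __proto__ on the parsed object, which is why Object.keys returns it and the merge follows it into Object.prototype.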
We use Lodash's merge on an object that has nothing to do with b, with the payload as the second argument, and then b.isAdmin is true. And Lodash is insanely popular. This slide is not for the JavaScript people, it's for the security people. You all know that Lodash is insanely popular and cool.
So what are the impacts of prototype pollution? Because it sounds like a good way to mess with someone's codebase, but can it be evil? Since 2018, there have been around 200 CVEs. CVEs are public vulnerabilities, disclosed for everyone to know. And not every vulnerability gets a CVE, so that's probably the tip of the iceberg. There have been a few remote code executions, in Kibana and in Parse Server. And no, I'm not working at Datadog anymore, so I can't say what I think about Kibana without looking somewhere. So expect some slides about that. And KTH University published a very interesting paper on this topic last summer.
So, the Kibana case, CVE-2019-7609. It's a very, very fun one. I love it. Kibana uses Node.js child processes for certain things. If you want to do some computation in Kibana, it will spawn a child process. And when you spawn a child process from Node.js, the environment is shared from the parent process to the child process through a JavaScript object. You see where this is going? There's an environment variable in Node named NODE_OPTIONS that you can use to pass command-line arguments to the node binary. So instead of doing node --something, you can set NODE_OPTIONS, put your --something in it, and then when you start Node, it will pick this up. And there is also the -e or --eval command-line option in Node that enables you to run code passed as a string on the command line. So the example here is node -e with console.log("hello"), which actually starts a Node process and runs console.log("hello"). Let's prototype pollute this. Well, oh, I do have, oh yeah, I'm missing something. OK, I'm missing a slide.
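The two Node.js features just described, and the way a polluted prototype can reach an environment object, can be tried out in a small sketch. Assumptions: buildChildEnv is a hypothetical helper and not Node.js or Kibana code, and a harmless flag (--max-http-header-size) stands in for what an attacker would actually place in NODE_OPTIONS.

```javascript
const { execFileSync } = require('node:child_process');

// 1. `node -e <string>` evaluates the string as if it were a script file.
const hello = execFileSync(process.execPath, ['-e', 'console.log("hello")'], {
  encoding: 'utf8',
}).trim();
console.log(hello); // hello

// 2. NODE_OPTIONS carries command-line flags through the environment.
const headerSize = execFileSync(
  process.execPath,
  ['-e', 'console.log(require("http").maxHeaderSize)'],
  {
    encoding: 'utf8',
    env: { ...process.env, NODE_OPTIONS: '--max-http-header-size=32768' },
  }
).trim();
console.log(headerSize); // 32768

// 3. Hypothetical env builder that copies keys with for...in,
//    which also visits inherited enumerable properties.
function buildChildEnv(overrides) {
  const env = {};
  for (const key in overrides) {
    env[key] = overrides[key];
  }
  return env;
}

let childEnvKeys;
Object.prototype.NODE_OPTIONS = '--max-http-header-size=32768'; // simulated pollution
try {
  childEnvKeys = Object.keys(buildChildEnv({}));
} finally {
  delete Object.prototype.NODE_OPTIONS; // always undo the simulated pollution
}
console.log(childEnvKeys); // [ 'NODE_OPTIONS' ]
```

The empty overrides object never mentions NODE_OPTIONS, yet the built environment ends up with it as an own key. That is the shape of the gadget: the value comes from the prototype, not from the caller.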
Sorry about that. So basically, there was a prototype pollution in Kibana, and you were able to write NODE_OPTIONS on the prototype chain. You would pollute the prototype, and every JavaScript object in this heap, when you check whether NODE_OPTIONS is defined on it, would respond: oh yeah, there is a -e with this string to execute. And that's basically what happened to Kibana. So if you had access to a Kibana server, to a Kibana UI, you could run arbitrary code on the server, and then spawn another child process with your shell arguments and do everything you want. So we fixed that. -e is not allowed in NODE_OPTIONS anymore, but there is a bypass and it's public, but I won't talk about that. And Kibana has fixed the prototype pollution. The original paper is very interesting, very accessible, and there is also a talk on that topic. Very, very interesting to watch.
Parse Server. So Parse, who is familiar with Parse here? It was very, very popular a while ago. It's basically a backend project for mobile applications that exposes an API in front of a MongoDB server. It was acquired by Facebook, shut down by Facebook, and now there is only an open-source version. Google is not the only one shutting down products people love. And basically that's an API in front of MongoDB. You can store objects, you can query objects from MongoDB through a web API. It was vulnerable to prototype pollution, before it was fixed, of course, and it uses a library that's named bson.js. BSON is a format to store objects in MongoDB. It stands for binary JSON or something like that. But BSON allows you to store functions in MongoDB, which you can then deserialize. By default, they are not deserialized. Because deserializing a function that comes from a database would basically mean: let's do eval on that string that comes from the database and that I know nothing about. And you don't want that to be a default.
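Why an option that is off by default stops being off once a prototype is polluted can be shown in a few lines. The function below is a hypothetical stand-in for an option check, not the actual code of the bson library; evalFunctions is used as the option name only because that is the option the talk refers to.

```javascript
// Hypothetical option check: the caller is expected to opt in explicitly.
function wantsFunctionEval(options = {}) {
  // A plain property read falls back to the prototype chain
  // when the caller did not set the option.
  return options.evalFunctions === true;
}

const optInBefore = wantsFunctionEval({});

Object.prototype.evalFunctions = true; // simulated pollution
let optInAfter;
try {
  optInAfter = wantsFunctionEval({});
} finally {
  delete Object.prototype.evalFunctions; // undo the simulated pollution
}

console.log(optInBefore, optInAfter); // false true
```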
But if the evalFunctions option on the options object passed to the bson library is true, well, you will be evaluating those arbitrary functions. And because Parse allows you to write pretty much anything you want in your database from the network, because that's Parse, you could actually run arbitrary code when the object is retrieved, which is, oh my God, that's because they checked the deep version.
How to prevent prototype pollution? Because I'm a responsible person, I don't want you to feel scared and say, yo, let's use a language without prototypes, like Python. How to prevent it? Well, let's filter in our merge functions. You see, for instance, on line 9 here, line 3, and line 4 here, that we filter out __proto__. And that's how we've been fixing a lot of libraries. Lodash has had more prototype pollutions than any other library I know, and they have all been fixed one by one. If you find a new one, feel free to responsibly disclose it to the maintainers, whatever library it is.
Sometimes you know that your code path is critical, and you want to make sure that you're using hasOwnProperty. Well, hasOwnProperty can be tampered with through third-party attacks, but that's something else. So if you expect a property to exist on an object, make sure that it exists on the object itself and not on its prototype chain.
This one I like. It's what I call building defensive objects. I don't know if that's the academic term, but you can use Object.create(), which creates a new object with its argument as prototype. Well, null is an object, so you can do Object.create(null). These objects won't have all the methods you expect them to have, hasOwnProperty and so on. But such an object will be safe from prototype pollution, because it doesn't have any prototype.
Sanitization. Make sure that the stuff that gets into your process from the outside is safe. Do data validation.
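The first three mitigations above (filtering keys in merge functions, own-property checks, and prototype-less objects) can be combined in one short sketch. safeMerge, FORBIDDEN_KEYS and the payload are assumptions made for illustration; this is not how any particular library implements its fix.

```javascript
// Keys that give access to a prototype and are therefore skipped by the merge.
const FORBIDDEN_KEYS = new Set(['__proto__', 'constructor', 'prototype']);

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
const isPlainObject = (value) =>
  value !== null && typeof value === 'object' && !Array.isArray(value);

// Hypothetical recursive merge that never follows inherited properties.
function safeMerge(target, source) {
  for (const key of Object.keys(source)) {
    if (FORBIDDEN_KEYS.has(key)) continue; // filter out dangerous keys
    const value = source[key];
    if (isPlainObject(value)) {
      // Only recurse into objects the target itself owns.
      if (!hasOwn(target, key) || !isPlainObject(target[key])) {
        target[key] = {};
      }
      safeMerge(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

const payload = JSON.parse(
  '{"__proto__": {"oops": 1}, "constructor": {"prototype": {"isAdmin": true}}, "a": {"b": 1}}'
);
const merged = safeMerge({}, payload);
console.log(merged);        // { a: { b: 1 } }
console.log(({}).oops);     // undefined
console.log(({}).isAdmin);  // undefined

// A "defensive" object: no prototype, so nothing can be inherited.
const bag = Object.create(null);
bag.user = 'alice';
console.log(Object.getPrototypeOf(bag)); // null
console.log('toString' in bag);          // false
console.log(hasOwn(bag, 'user'));        // true
```

Skipping constructor and prototype also drops legitimate data that happens to use those names, so the filter list is a trade-off rather than a universal rule.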
I love the Joi library because I'm a hapi fanboy, but there are a lot of amazing libraries to do data sanitization. Use them. They are very cool. And anyway, you should use them if you're building a web server with Node.js. As mentioned, that will also probably reduce your attack surface for NoSQL injection, so go for it.
Conclusions. Oh my god, I'm on time. What now? Monitor incoming objects. Node.js has an option to disable __proto__. It might break some code, so be warned, because of the internet. But you can use it. And go for sanitization and prototype-less objects. Oh, you remember when I told you you should use Python? That was a joke. In January, someone published a paper. There is no proof of actual use in the wild for malicious attacks, but there's been a paper about class pollution saying that basically Python is vulnerable to that too. So nowhere is safe.
Some links. The slides will be on Twitter. Just shout on Twitter if you want the slides. Let's stay in touch. You can find my Twitter with this short URL. I hope you enjoyed this presentation and that you have questions I can answer. Thanks so much for being an amazing crowd.
We will come to this question in due course. There were a lot of suggestions there on how you can mitigate the risks brought about by prototype pollution. If our audience were to go home, go to the codebases they're currently working on, with the concerns they may now have, what would be the first thing you'd encourage people to do? Maybe it's a simpler, low-lift action, or starting off a more significant piece of work.
That's a very good question. I think the first thing to do after being at a conference talk about security and Node all together, when you come back, is to check where your objects come from. And that's true for mitigating prototype pollution, but also all sorts of injections.
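Checking the shape of incoming objects does not require a specific library to understand. The sketch below is a dependency-free illustration of the idea; validateShape and userSchema are hypothetical names, and a real project would more likely rely on a validation library such as the one mentioned in the talk.

```javascript
// Hypothetical schema: the only keys we accept, with the typeof we expect for each.
const userSchema = { name: 'string', age: 'number' };

function validateShape(input, schema) {
  if (input === null || typeof input !== 'object' || Array.isArray(input)) {
    throw new TypeError('expected a plain object');
  }
  // Reject anything we did not ask for, including "__proto__" or "constructor".
  for (const key of Object.keys(input)) {
    if (!Object.prototype.hasOwnProperty.call(schema, key)) {
      throw new TypeError(`unexpected property: ${key}`);
    }
  }
  // Copy the expected own properties into a prototype-less object.
  const clean = Object.create(null);
  for (const key of Object.keys(schema)) {
    if (!Object.prototype.hasOwnProperty.call(input, key) ||
        typeof input[key] !== schema[key]) {
      throw new TypeError(`invalid or missing property: ${key}`);
    }
    clean[key] = input[key];
  }
  return clean;
}

const accepted = validateShape(JSON.parse('{"name": "Ada", "age": 36}'), userSchema);
console.log(accepted.name, accepted.age); // Ada 36

let rejection = null;
try {
  validateShape(
    JSON.parse('{"name": "Eve", "age": 1, "__proto__": {"isAdmin": true}}'),
    userSchema
  );
} catch (err) {
  rejection = err.message;
}
console.log(rejection); // unexpected property: __proto__
```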
So, what I mean is that, at some point, if you have a web application, it might accept objects from the outside. It can be the query string, it can be the body. At some point, there is a library in your codebase that will parse a text HTTP request and return a JavaScript object. That's probably the main source of malicious inputs. Because your application can be vulnerable to third-party attacks from your npm modules, or to inputs coming from the network. And me, I prefer to think about the things coming from the network. Talk to ZB if you're interested in the other part of the threat model. So, check the objects that come into your app and check what you are doing with them. Are you sanitizing them? Do you know their shape? Is there an error in your code when they don't look like what you expect? That means properties that you're expecting on the object, but also, are you sure that the object doesn't have properties you don't expect? So, for objects that get into your app, you must know what they look like. And that will be the first way to know that nobody is injecting some kind of __proto__ or constructor elements into your objects.
Cool. Thank you so much. So, some questions from the audience. I think you touched on this, but I'll ask the question explicitly regardless. Is there an easy way to check if my service is vulnerable to this attack, as someone who uses a lot of third-party npm modules?
Snyk, npm audit: you will already know about the vulnerable modules if they are known. Check if you're using merge methods in your own code. But yeah, basically, making sure you don't have known vulnerabilities in your codebase is the first step.
Ideally, frameworks should be able to handle these kinds of things for us. Do they? Is there something in Express or Fastify or equivalents that prevents prototype pollution?
As far as I know, no. But I'm not up to date on the Fastify documentation.
The Express documentation, I'm up to date on, because it hasn't changed in five or six years. But I don't think so. That's a very good point. I guess Matteo will say PRs are welcome. So, feel free to PR Fastify.
The next question is something you mentioned in your talk, but it's had a couple of thumbs up, so I want to ask it regardless. What is the -e argument in the NODE_OPTIONS bypass?
OK. So, -e is short for --eval. Basically, it gives you the opportunity to pass a string as an argument instead of a file and run this string as if it were a JavaScript file. So, instead of doing node index.js, you do node -e and you put a string with your whole JavaScript code, and that will execute it.
Great. Thank you. Does Object.assign also create polluted prototypes?
That's a good question. I want to say no, but I'm not sure of it. So, that's your homework for tonight. I don't think so, but it's worth checking.
Didn't think you were coming to Node Congress for homework. Are there other ways of doing RCE with prototype pollution without NODE_OPTIONS, like in the Kibana example?
Well, there was the BSON example with a function being deserialized. So, I guess so. In a way, that really depends on the application you're attacking. Does this application have string evaluation at some point, whether it's through environment variables, through eval, through the end of transcode? In that case, yes, but it's very business-logic dependent.
Thank you. Got a few more to get through. Based on your talk, should we therefore be using Maps more instead of objects to avoid these problems?
I mean, yes. Yes. If you're very, very cautious about that, you need to ensure that there is no intrinsic pollution, meaning that someone overrides the Map methods. They can't do that, as far as I know, with prototype pollution, but they can do that with a malicious third party. The Node.js codebase is actually very defensive against that.
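The Object.assign homework and the Map suggestion can both be checked with a small experiment. This is a sketch of what current engines do with a parsed __proto__ key, written as an illustration rather than as a statement from the speaker; the payload and variable names are assumptions.

```javascript
const payload = JSON.parse('{"__proto__": {"polluted": true}}');

// Object.assign copies own enumerable properties one level deep using ordinary
// assignment, so the "__proto__" key goes through the __proto__ setter of the target.
const target = Object.assign({}, payload);
console.log(target.polluted);   // true: the prototype of `target` itself was swapped
console.log(({}).polluted);     // undefined: Object.prototype is untouched

// A Map stores keys as entries, so "__proto__" is just another key.
const store = new Map();
for (const [key, value] of Object.entries(payload)) {
  store.set(key, value);
}
console.log(store.has('__proto__')); // true
console.log(store.polluted);         // undefined
```

So a shallow Object.assign does not pollute the shared Object.prototype in this experiment, but it can still hand the payload control over the prototype of the target object, which may matter if that target is then used for option lookups.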
So, you can check. But yeah, Maps are probably one of the smartest things you can use in your web application. As far as I'm concerned, I love Maps. If you've been to James' talk about asynchronous storage, the first version I proposed of asynchronous storage would force you to use a Map as the store. So, as a Map lover, I would say yes, but I'm biased.
Just a tad. What if I merge an object with the native spread operator instead of an old version of Lodash?
I think you're safe from prototype pollution, but it's like Object.assign: I never tried it. So, I can't be assertive on that.
I think with a lot of these questions, it is probably a case of, as you're trying to solve for this risk, trying all the solutions and seeing what the outcome is, right?
Also, I kind of hope that if these ways were vulnerable to prototype pollution, we would widely know it as a community. That's why I tend to think we are safe, but as a security person, I don't want to end up saying we are safe.
And you don't want to assume either.
Exactly.
Are there, or could there be, some kind of ESLint rules that detect these problems?
Good question. It's hard, because it needs a bit of taint tracking. There could be at least a Semgrep rule that checks that you're merging based on incoming objects. So, Semgrep, for those who are not familiar, is a static analysis tool. It's open source. It's written in OCaml, built on the west coast. It's really, really cool. And it's designed to find vulnerabilities. But it's a bit smart. It's more powerful than most linting tools because it has some kind of symbolic execution engine and can basically run part of your code in a VM. I mean, it executes part of your code and decides if it's vulnerable. So, I'm not sure about a linter, but I'm pretty sure about Semgrep, yeah.
Cool. Awesome. The last question that we have in this list: what about merging constructor.prototype instead of __proto__?
Would that get through the __proto__ filter and work?
Say that again? I'm not sure.
What about merging constructor.prototype instead of using __proto__?
It depends on the vulnerability. So, if I understand the question properly, that's how it has been fixed in Lodash. It kind of depends on how these merging functions actually work. We want to be able to merge prototypes to build mixins anyway, so we don't want to prevent that altogether.
Awesome. Thank you ever so much. Thank you, everyone, for, wow. We went from just, oh, we should ask some questions, to, boom, a huge list of questions. And that's awesome. On behalf of our whole audience, thank you for, again, a really thought-provoking and tactical talk about how to mitigate the risks of prototype pollution.
Thanks so much for having me and thanks so much for being an amazing crowd.
Let's do a big group clap. A ten!