The omnipresence of open-source software and the low barrier to entry on npmjs are serving as a catalyst for supply chain security incidents that continuously impact JavaScript developers. What can we do to protect ourselves?
Security Controls in the JavaScript Supply Chain
From: JSNation 2022
Transcription
Thank you, everyone, for joining me today. My name is Liran Tal, I'm a developer advocate at Snyk. I would like to talk to you about several stories happening in the JavaScript ecosystem, in which I'm involved in several ways. First, of course, through my work at Snyk, trying to help developers, you know, me and yourself, all of us, build secure software and ship it, whether that's in your CI, your IDE or wherever. It's a really great way to interact and engage with developers. But through that work I also do a lot of things with the community, through the OWASP security project, or through things like the Node Foundation security triage of vulnerabilities, and a lot of work around open source. And that helps me get a clear picture of what is going on and where things are going.
So with that said, today I would like to share with you some real-world stories and tell you how developers, like yourselves, play a very fundamental and key role in the security ecosystem and even in security incidents that have been happening recently. Also, what is the current state of affairs with supply chain security in open source and the JavaScript ecosystem?
Now, I realize that probably everyone relates to this in a very emotional way, right? When you go and do an npm install. Yes? So I'm here to tell you that this is okay. You are feeling something that every one of us feels before we do an npm install. And this whole talk will basically be about why you feel that way, but it will also give you some preventive measures, some security controls that you can add tomorrow in your team to mitigate the risks around the things that happen there. So that feeling that you have, if you can relate to that meme, is actually based on some foundational scientific research.
One of those studies, a couple of years ago, showed us that when we install the average npm package, we put a lot of trust in the maintainers and third-party dependencies that we're bringing in. Just by installing the average npm package, you're probably trusting about 79 third-party packages and 39 maintainers. That's a lot. That means there's probably going to be a lot of noise, and potentially pain, to remediate some of this. But this is the truth of things. And I'm also here to tell you that this isn't a new concern. In fact, this whole question of where we put our trust as developers, how much we should trust and what exactly we should trust, is something that was talked about almost 40 years ago.
This person called Ken Thompson, he's a Turing Award winner, and he wrote this essay called Reflections on Trusting Trust. I highly recommend reading it, but I'll give you the gist of what it means. So this person went off and said, I want to show you what it means to trust people. And then he added a backdoor to the Unix login program. Of course, people review code, right, in open source. So then he continued this chain and added the backdoor to the compiler, so that when it compiles the login program it injects the backdoor. But people also review the compiler code. How do you compile compilers? You need one entry point to begin with. And so he went on and added that backdoor, that thing he wanted to show us as an experiment, to the compiler binary that compiles the compiler, which then compiles the Unix login program. So if you reviewed the source of the Unix login program, and if you reviewed the source of the compiler, you would not see it anymore, because you still need a binary compiler to compile all of those. And that is where things are happening. That is where the backdoor is inserted.
It's a very interesting insight, revealing how software carries traits and passes them on to the programs that get generated out of it. And so I highly recommend reading this, because it shows us why trust is important and how much further we need to go in order to put that trust somewhere.
So still, you know, open source is great, and we can't deny the fact that to build software today we need to use open source software, even when the program that we build is not open source itself, maybe half of it or whatever. That's the reality. And of course, why not? Why not use open source software, right? Because essentially what we really want is not to reinvent the wheel. We want to take work that great people have done and put it into practice. And this is a great productivity tool. By now I'm pretty sure we're hitting that two million mark on npm. So, I don't know, kudos to us, to all of you here helping promote open source software. But at the same time, we need to understand and recognize this gift that open source has given to the world, and what it actually means.
So all of those packages there are essentially the supply. This is part of the supply chain security story. And it's relatively easy for us to think that supply chain security is just npm dependencies, but it's not really just that. In fact, if we go all the way back to the basics of how software is being built, we can see that we have several connection points along the way. So you're a developer, you're building something, maybe pushing it to GitHub. That's basically your source control. Then there's a build getting triggered. Then there's some output out of that. Maybe that's a package, or maybe it's getting pushed onto some CDN or whatever. And you're using some open source throughout the build process.
So all of that is essentially how we're building software. But these integration points are what supply chain security really means at the very basic level. Essentially, anyone at one of these intersection points can go ahead and insert bad code, which we've seen. For example, the Linux hypocrite commits, I think that was last year, an incident in which researchers from a university tried to get bad code in on purpose, which makes Ken Thompson's point, if you want to relate to that. And compromising source control is something that's happening. For example, the PHP source code was not managed on GitHub, and someone was able to get access to the PHP Git servers and potentially modify the code. That's millions of servers running on the internet and getting the backdoors or the Trojans out of that.
And there's more and more. Someone can modify your code. Your build might be compromised, maybe you're not building those GitHub Actions correctly with the best practices. Maybe you're using a bad dependency, like we've seen with event-stream. Maybe the actual result of what you build, what your consumers actually get, does not go through the formal CI/CD processes, which is a very relevant security story for us in the ecosystem, because Codecov was part of this problem, where the binary was actually changed behind the scenes. So all of this is how software is getting built today, and this is the whole supply chain security story. npm packages are a part of it, but not all of it.
Still, I think we're seeing that developers are being targeted right now, and have been for a few years already, if not more, as a malware distribution vehicle, or targeted directly with attacks to steal all of our tokens for npm and for GitHub and for everything else. Because the stuff that we have on our laptops is, well, we have secrets for production, right? And access keys for staging and all of those things.
So if you install an npm package and it does something bad, you should be worried about that. So with that said, with this intro done, let's go and talk about some preventive measures. What can we do as security controls for this ecosystem?
Starting off with something that I've actually done in the past, which is lock file tampering. Maël, who opened the session here today talking about Yarn 4, has actually talked about this, about the security aspect of package managers and how this is now getting mitigated. So back in 2019, I disclosed research on potential security problems with lock files. It has to do with lock files on Yarn and npm and so on, and basically with how they are managed. So let's take a practical approach and see what this actually means.
This is a screenshot of how I opened a pull request to a repository. And I'm pretty sure you can all recognize what's going on here. There's essentially no code change that I've proposed. But this pull request still includes a malicious package. There is a Trojan hiding here in this pull request. And this is the entire code for the pull request. Just this. package.json and the lock file. So what's really going on here? Because it looks like this isn't a typosquatting attack, since the names of the packages are legitimate. The versions check out. And if you were running something like Snyk as a Git integration, it would tell you that upon pull request it does a check and none of these specific libraries and versions introduce new vulnerabilities. So essentially everything looks to be okay here, right? Okay. Well, there's a Yarn lock file here. Which we all kindly ignore, right? Because who wants to code review this? I don't. Just as much as I don't want to review regexes. This is not supposed to be human readable. Not supposed to be human consumable. Still, this poses a threat. Let's see what it is. So I expand this and this is my lock file.
And if you were to start reviewing this, you can see there's a line change of 5,000 or whatever, so this is pretty long. So I'll scroll down a little bit and we'll try to review it together. Scroll, scroll, scroll. Okay. Do you see it? Do you see the issue? Not yet. Almost. There we go. Okay.
So all I need to do when I give that pull request to a project is use this really cool feature of package managers like npm and Yarn, which allows us to install packages off of really weird things, like a GitHub gist. Like a tarball, like essentially the head commit of a source control repository. So I can do that. And once the integrity check and everything else checks out, I can go ahead and push this into my pull request. I can change the source of the ms package so that it comes not from npm but from my own GitHub, and it will install it. And I'm saying this is a malicious package because, once you install it, I may introduce some post-install scripts that run commands, so I can install whatever I want on your machine.
How do we mitigate this? When I was researching and disclosing this, I came up with the idea of linting lock files. We all use linters for different things, like ESLint for your code quality and clean code or whatever you want to use it for, which is great. This is another one you should probably think about adding, because it gives you the ability to say that your lock file needs to follow specific trust policies. For example, not only about the origin, where something comes from, the allowed hosts here, but also whether some tarball is getting installed over a plain HTTP connection, which enables people to perform man-in-the-middle attacks. So you want to have this kind of trust policy, and this is how you do it.
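The trust policy described above can be sketched in a few lines. This is a minimal illustration, not the linter shown in the talk: the `Lockfile` shape is a simplified assumption modeled on the `packages` map of an npm `package-lock.json`, and `checkLockfile`, `ALLOWED_HOSTS` and `Violation` are hypothetical names chosen for the example. It flags any entry whose `resolved` URL is not fetched over HTTPS or whose host is not on the allowlist.

```typescript
// Hypothetical, simplified shape of an npm package-lock.json (assumption).
interface LockfilePackage {
  resolved?: string;
  integrity?: string;
}

interface Lockfile {
  packages?: { [path: string]: LockfilePackage };
}

interface Violation {
  name: string;
  resolved: string;
  reason: string;
}

// Assumed trust policy: HTTPS only, and only hosts on this allowlist.
const ALLOWED_HOSTS: string[] = ["registry.npmjs.org"];

function checkLockfile(
  lockfile: Lockfile,
  allowedHosts: string[] = ALLOWED_HOSTS
): Violation[] {
  const violations: Violation[] = [];
  const packages = lockfile.packages || {};

  for (const name of Object.keys(packages)) {
    const resolved = packages[name].resolved;
    // Entries without a source, such as the root project, are skipped.
    if (resolved === undefined) continue;

    // Split "scheme://authority/..." without relying on runtime globals.
    const match = /^([a-z][a-z0-9+.-]*):\/\/([^\/?#]+)/i.exec(resolved);
    if (match === null) {
      violations.push({ name, resolved, reason: "unrecognized source" });
      continue;
    }

    const scheme = match[1].toLowerCase();
    const authority = match[2].toLowerCase();

    if (scheme !== "https") {
      violations.push({ name, resolved, reason: "not fetched over HTTPS" });
      continue;
    }
    // Reject user-info tricks such as https://registry.npmjs.org:x@evil.example/
    if (authority.indexOf("@") !== -1) {
      violations.push({ name, resolved, reason: "credentials in URL" });
      continue;
    }

    const host = authority.replace(/:\d+$/, ""); // drop an explicit port
    if (allowedHosts.indexOf(host) === -1) {
      violations.push({ name, resolved, reason: "host not on allowlist" });
    }
  }
  return violations;
}
```

A CI step or pre-commit hook could parse the lock file, call `checkLockfile`, and fail when the returned list is not empty. A lock file in which a package was re-pointed at a GitHub tarball, as in the pull request above, would be reported with the reason "host not on allowlist".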
So use it in your CI or your pre-commit hooks or whatever you're using, but essentially you want to have more mitigation measures. So besides using this, you should figure out two things here, right? First of all, you probably do not want to allow or receive any contributions to lock files because of this issue, because realistically none of us is really going to review those lines of a lock file, so let's not open this door to begin with. And also, related to how we manage dependencies, you essentially want to hand all of this dependency management off to some bots, because they're good at it and they can raise those automatic PRs for us. So that's another thing to realize.
So continuing on, arbitrary command execution. For all of us, that's like a feature of package managers, so it's amazing. Let's see. npm install colors, I'm going to go and copy paste that into my terminal, but yeah, maybe I should take a few seconds before I run that command. And I should, because this actually happens. npm, as a package manager, allows any dependency along the tree, no matter how big or small, to execute commands before or after something in that tree is installed. And so if I went on and did npm install colors on the day when there was actually a malicious colors version published to the npm registry, I would be exposing myself to malicious packages, or maybe compromised modules, compromised maintainers and things like that. So this really did happen in January 2022, not so long ago. In case you missed it, if you had installed colors from npm, that's something that would have happened. But let's drill down a little bit to realize what's going on there. So colors has been sabotaged by its own maintainer, I won't get into why, but that's what has been happening. You can see that it hasn't had any downloads in the last two years.
No, sorry, no new versions in the last two years. But suddenly a bunch of new versions have been released. And at this point in time, very, very quickly, in just the last seven days, it gains something like 100K downloads from end users downloading this version. What's going on with this version that all of those 100,000 users have been downloading, maybe me and you? Well, let's see. Essentially that sabotaged code introduces a denial of service into this package. So if you've used colors in some way, your app might have been broken due to this. You can also go ahead and look at the GitHub repository, where people were actually chiming in and suggesting more efficient ways of doing infinite loops. Highly recommended. Open source, right? Amazing, we get a lot of feedback on this.
And you might think, colors, I'm not using this in my project, I've never seen it, right? Except you're living in the JavaScript ecosystem. You're not using it as far as you know. But have you checked whether any of your dependencies are using it? Well, here's a real example of why this is so impactful. If you were using Amazon's AWS CDK, you would have been affected by this, because AWS CDK uses it. And not only that. A lot of other packages use this. Some of them I guess you might know, like Salesforce's oclif, like prompt, like Playwright, the testing framework. All of those use this colors package somewhere down the tree. And if you had them installed, it would have broken your workflows. The whole dependency tree is really big and colors is impactful. I won't go into this, but I'll tell you that this was a very severe issue that happened in the ecosystem.
And so what can we do to mitigate this? Well, one of the first things is, of course, if you do an npm install, please, please, please, add --ignore-scripts.
Add that flag to possibly every npm command you're running, or to your .npmrc, so that no one is able to run arbitrary commands on your machine. The caveat is that it might break some stuff. Like, if you're installing node-sass or something like that, which needs to run some native compilation, it might get broken. So you need to work around those things. But most of us probably do not need to trust everything and everyone and have this insecurity by default.
How about avoiding blind upgrades? This is another thing that I've been seeing happen, talking to developers all the time. They have things in their CI like running an npm update, or running an npm-check-updates command. And essentially they are running that in CI because they want to always update to the latest version and test that none of the packages they depend on has broken their code. Which is, I mean, understandable. But it is exposing you, again, to a plethora of issues that could be happening. Security incidents like dependency confusion and a ton of other things. Why would you want to be there? If you had done that in your CI, and that CI was running in those days when colors was out, when node-ipc was out, all of those security incidents, you would have been getting those malicious versions automatically. So why would you want to do this to yourself?
So you need to think about how to do this well, right? So upgrade, but with context. Which essentially means, please, again, use those automated bots. You can use GitHub, you can use Snyk, whatever you want. But use them in order to streamline those package upgrades, and not in a way that gives the packages all of this access to your machines. In fact, some of them can actually protect you. With Snyk, what we've done is that we are doing npm upgrades for your packages. Not just for security, but also just because they're out of date.
But when we do those, if node-ipc or colors 1.4.1 just got out yesterday, we do not immediately rush to give you those updates. We actually went and looked at a bunch of security incidents that happened in the past, at what was actually happening there and how much time it took the ecosystem to go ahead and remediate them. And that is why we have this inherent delay of about 21 days before we suggest a newly upgraded package. So if something malicious is going on right now, you're not getting that malicious package the next day, before everyone has had time to react to it.
Number four is, what you see is not what you execute. Which is a favorite of mine. I don't know how many of you have heard about the Trojan Source attack. But let's drill into a bit of code. Here's a bit of code I took from a Fastify middleware thing. It's a Node.js example. Take a second, look at it, and tell me where you see the vulnerability coming in. Is it the first paragraph, the second paragraph, the third paragraph? Is it all okay? Of course, you're developers, what am I thinking? I'll highlight this and you'll find it right away. Found it? Okay. Let's see what's going on there. In fact, this code has an issue. The issue is not with the code itself. The issue is with how the code is written. I'm not talking about logic bugs or bad checks. Basically, what you're seeing here is not what the compiler is going to end up seeing. Or the JavaScript interpreter. In fact, let me zoom in on this and give you a different view of it. I just copy pasted it, put it into VS Code as test.js, and ran it. Here is an example. Same code. What changed? Nothing. But if I run it, it's going to tell me, you're an admin. But why? This is semantically incorrect. Why would this happen?
Well, it's happening because of what you're not seeing there, what your IDE is not telling you, and what you're not seeing when you review source code on GitHub, which is this. You're not seeing that it has Unicode control characters that shift the way strings are displayed. And so what you're actually looking at is a comment hidden within the code. And that changes the entire logic of the code. So imagine that you were getting this as a code contribution, and it looks like legitimate code, and you approve it because it's okay. But we were missing this whole thing.
So luckily, this whole Trojan Source attack research, which happened in the past year, has been responsibly disclosed. It means that at this point in time, if you review things in VS Code, or if you use the Snyk extension, or if you review code on GitHub, all of those have already adapted, so that when things like that happen they show you a warning message at the top that tells you there are potentially control characters here, take a look at this. And they do make it visible. But before, this wasn't the case. I just downloaded an old Atom version from about a year ago and showed you an exact copy paste of how this actually works. So, some ideas for mitigating it? Oh, there we go. I like that dog. I like dogs in general. So it fits.
So Trojan Source attacks, right? We have some ways to mitigate them. And essentially what we want is to be able to mitigate them as fast as possible. Again, you have this already in VS Code, in IDEs and things like that. So you can either use that, or you could use an ESLint plugin I wrote for it, and no matter what VS Code version or IDE or whatever you're using, it will detect them and tell you about it.
Next up, avoiding dependency confusion. Well, let's see what this actually means. I'll run through this pretty quickly. I'll come back to it towards the end.
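As an aside on the Trojan Source mitigation mentioned above, the core of such a check is small. The sketch below is not the ESLint plugin from the talk; `findBidiControls` and `BidiFinding` are hypothetical names, and the character ranges are the Unicode bidirectional embedding, override and isolate controls (U+202A to U+202E and U+2066 to U+2069). It scans source text and reports where those invisible characters occur.

```typescript
// One reported occurrence of a bidirectional control character.
interface BidiFinding {
  line: number;      // 1-based line number
  column: number;    // 1-based column, counted in UTF-16 code units
  codePoint: string; // for example "U+202E"
}

// True for the Unicode bidi controls: U+202A..U+202E are the embeddings
// and overrides, U+2066..U+2069 are the isolates.
function isBidiControl(code: number): boolean {
  return (code >= 0x202a && code <= 0x202e) || (code >= 0x2066 && code <= 0x2069);
}

// Hypothetical helper: scan source text and list every bidi control found.
function findBidiControls(source: string): BidiFinding[] {
  const findings: BidiFinding[] = [];
  const lines = source.split(/\r?\n/);

  for (let lineIndex = 0; lineIndex < lines.length; lineIndex++) {
    const text = lines[lineIndex];
    for (let i = 0; i < text.length; i++) {
      const code = text.charCodeAt(i);
      if (isBidiControl(code)) {
        findings.push({
          line: lineIndex + 1,
          column: i + 1,
          codePoint: "U+" + code.toString(16).toUpperCase(),
        });
      }
    }
  }
  return findings;
}
```

A review bot or pre-commit hook could run this over changed files and block the change, or ask for a second look, whenever the result is not empty. Legitimate right-to-left text in strings or comments may also trigger it, so treating findings as a prompt for human review is one reasonable design choice.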
So this is research that's been going on in the ecosystem for quite a bit. And a lot of actual pen testers have been using it to try and get inside companies' internal systems, because of the way those companies incorrectly manage dependencies and the configuration around them. So there's a bunch of this. I won't go into how this whole thing works. But essentially dependency confusion is rooted in the fact that private packages hosted internally by a company are not found in the npm registry. That namespace is open and free for anyone to register, and then a potential misconfiguration could allow someone to take that name, add malicious code into it, like an npm install script, and run that on your machine.
How do you mitigate it? There's a bunch of tooling today that is pretty simple. But if you do things like npm update, or manually manage your dependencies, there's a big chance that you will be prone to those dependency confusion attacks. So again, do not do this. You see, it's a repeating theme across different types of attacks. Mitigating this is pretty easy if you want to use one of those tools. We created one back then. It's called snync. The idea is that it scans your package.json. It even scans your Git commits to understand when you introduced a private package and how that relates, in terms of time frame, to what's on npm. It will give you warnings, where potentially you are vulnerable right now, or where there's something suspicious about a package that exists but you're not entirely sure whether it is malicious or not.
So with all that said, thank you so much. We're on time. Thank you for coming to my talk and I hope you all write secure code. Thank you so much.
I think everybody was taking notes. And otherwise if they were not. I'll share the slides after that. Thank you. Asking for a friend. Yeah, share the slides.
If you still have questions, definitely put them on Slido so I can read them from the screen here. So to pick one, what are your thoughts on the false feeling of safety introduced by features like npm's ignore-scripts, when a Node module can access the FS at any time at runtime?
Yes. Well, great question. There are some talks about this. The Node Foundation, or today the OpenJS Foundation, has a security ecosystem working group. So you could also join that, and all of it is managed very transparently and openly on GitHub, so you can actually join the discussions, the monthly calls, et cetera. There is now a recent discussion about establishing a threat model for Node applications. This is related to the whole Deno versus Node.js question in terms of the security aspect. So this is being discussed there. One of my thoughts, of course, is that I'd be happy if there were a way for us to compartmentalize the capabilities of maybe specific modules, maybe the whole app or whatever. But this is a real threat anyway, regardless of npm. So this is a thing we still need to fix.
All right. And does Snyk plan on, whoa, stuff is moving around. Can Snyk be used to automate security checks in CI/CD?
Yes, that's actually a pretty cool thing. So I didn't say this before, but Snyk is free, you can just use it for security and whatever you want. And essentially when you connect your Git repository with Snyk, that's probably going to be the most productive experience you can have, because at that point you don't need to run it in CI yourself. We add the check hooks, and we specifically scan pull requests for whether they add new vulnerabilities or not. If you want, we'll fail the check. We'll monitor the project for new vulnerabilities coming in.
So that's the whole thing, and there's a CLI and whatever, but you don't need all of that. We'll also automate pull requests to fix things for you. So that's going to be the ideal integration.
So a good question as well. I missed something from your presentation. I don't know, the XKCD comic about dependencies. Yeah. Being managed by one single person in Nebraska. By themselves for years. That's for a different talk about open source maintenance. I think it ties into security as well, right? So beyond checking our code and making sure that we're not doing anything less than smart, we should also support open source in a way that makes it more sustainable. What are your thoughts?
We should do it. We should do it. I want to support open source in any way I can. So I'm all for that.
Does Snyk have a special program for open source maintainers so they can get credits or something?
Yeah, we actually just wrapped up in May this Snyk Love JavaScript campaign, where we looked into a lot of the JavaScript dependencies that we use, went to the GitHub profiles for them and sponsored them. So that's what we did.
That is really exciting. Thank you. Where can we find out more about that? Just Google Snyk JS Love. Awesome. Thank you so much for your talk. Thank you to the people asking questions. There's a Snyk booth. You can just come and ask me at the booth. And talking about the booth, you're also joining the little black box thing where you can ask more questions. Oh, true. Yes. Yes. Maybe our remote peeps are also dying to ask you questions. Hopefully join that, maybe not dying, but still interested. So please also join the Q&A booth there. Yes.