Do you know what’s really going on in your node_modules folder? Software supply chain attacks have exploded over the past 12 months and they’re only accelerating in 2022 and beyond. We’ll dive into examples of recent supply chain attacks and what concrete steps you can take to protect your team from this emerging threat.
It's a Jungle Out There: What's Really Going on Inside Your Node_Modules Folder
AI Generated Video Summary
The Talk discusses the recent compromise of the UA parser.js package and the need for supply chain security in the open source community. It explores the reasons for security risks in open source and the need for a new approach to detect and block malicious dependencies. The different attack vectors and maintainer vulnerabilities are also discussed. The speaker emphasizes the importance of evaluating packages and protecting your app, as well as the need for a mindset shift in how we view open source. The Talk concludes with an introduction to Socket.dev, a tool focused on supply chain attack detection.
1. Introduction to Node Modules and Open Source
Hello and welcome. I'm Feross, an open source maintainer with experience in creating npm packages. Let me tell you a story about a popular package called UAParserJS and its journey from being published on GitHub to becoming widely used.
Hello and welcome. Thanks for coming to my talk. It's a jungle out there. What's really going on inside your Node modules folder?
I'm Feross and I'm an open source maintainer. I started WebTorrent, which is a peer-to-peer file transfer protocol, and StandardJS, a linter that catches bugs and enforces code style. I've been doing open source since 2014 and have created over a hundred npm packages. In the past, I volunteered on the Node.js board of directors and I also teach a class on web security at Stanford University. Now I'm the founder of a startup called Socket, which helps protect the open source ecosystem.
Before we get started, let me tell you a story. On January 13th, 2012, over ten years ago, a developer named Faisal Salman published a new project to GitHub. It was called UAParserJS and it parsed user agent strings. Lots of people found this project useful, and so over the next 10 years, Faisal continued to develop the package with help from many open source contributors. He published 54 versions as the package grew in popularity. It eventually grew to 7 million downloads per week and was used by nearly 3 million GitHub repositories.
2. Compromised UA Parser.js Package
Now, let me tell you a different story. On October 5th, 2021, a hacker offered to sell the password to an NPM account controlling a package with over 7 million weekly downloads. Two weeks later, UA parser.js was compromised, resulting in the publication of three malicious versions. These versions contained malware that executed upon installation, leading to the theft of passwords and the mining of the Monero cryptocurrency. The package was reported and removed after four hours.
Now, let me tell you a different story. On October 5th, 2021, on a notorious Russian hacking forum, this post appeared. A hacker was offering to sell the password to an NPM account that controlled a package with over 7 million weekly downloads. His asking price was $20,000 for this password.
Now, this is where the two stories intersect. Two weeks later, UA parser.js was compromised and three malicious versions were published. Malware was added to these packages that would execute immediately whenever anyone installed one of the compromised versions.
So, now let's take a look at what that malware does. This is the package.json file for the compromised version, and you'll see that it uses a preinstall script. This means that this command will run automatically anytime this package is installed. So, now let's look at what that script does. The first thing you'll see is that it branches based on the operating system of the target. On Mac, nothing happens, which is lucky for Mac users, but Windows and Linux users aren't so lucky. And you'll see here that a command is spawned for each of these platforms using child_process.exec.
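The install-time hook described above can be spotted mechanically before anything is installed. The following is a minimal sketch, assuming the manifest has already been parsed into an object; the `PackageManifest` shape, the function name, and the example manifest are hypothetical and are not taken from the compromised package.

```typescript
// Lifecycle scripts that npm runs automatically while installing a package.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"] as const;

// Only the parts of package.json this sketch looks at (assumed shape).
interface PackageManifest {
  name: string;
  scripts?: Record<string, string>;
}

interface InstallHook {
  hook: string;
  command: string;
}

// Returns every install-time hook the manifest declares, with its command.
function findInstallScripts(manifest: PackageManifest): InstallHook[] {
  const scripts = manifest.scripts ?? {};
  const found: InstallHook[] = [];
  for (const hook of INSTALL_HOOKS) {
    const command = scripts[hook];
    if (typeof command === "string" && command.length > 0) {
      found.push({ hook, command });
    }
  }
  return found;
}

// Hypothetical manifest for illustration only.
const exampleManifest: PackageManifest = {
  name: "some-package",
  scripts: { preinstall: "node preinstall.js", test: "mocha" },
};

console.log(findInstallScripts(exampleManifest));
```

A hit here is not proof of malware, since install scripts have legitimate uses, but it tells a reviewer which command would run automatically on install.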
So, now let's take a look at what that pre-install.sh script does. The very first line fetches the user's country and figures out whether the user is coming from Russia, Ukraine, Belarus, or Kazakhstan and stores that in a variable. Now if the user comes from one of those countries, then the script exits without doing anything further. However, if you come from any other country, then the script proceeds to download an executable file from this IP address, mark that file as executable, and then run it. And now based on these command line flags, you can see here that this program is a Monero miner, which is going to be used to mine the Monero cryptocurrency for the attacker.
Now this is the script on Windows. It's very similar. So it starts off with downloading that same or similar Monero miner, but it also downloads a DLL file as well and runs that. And then here you can see it just starting up the Monero miner and registering the DLL file on Windows.
Now, what does this extra DLL file do? Well, it steals passwords from over one hundred different programs on the Windows machine, as well as all the passwords in the Windows Credential Manager. So yikes, this is a really nasty piece of malware. And anyone unlucky enough to run this lost all their passwords and had to do kind of a complete reset of their online accounts. Not a fun time. So this is kind of the aftermath. So this package was published for about four hours, and the open source community was pretty diligent and reported it.
3. Compromised Package and Supply Chain Security
4. Why is this happening now?
We're downloading code from the internet written by unknown individuals and executing it with full permissions on our devices. It's a miracle that this system has mostly worked for this long, but not everyone is good. Let's explore why this is happening now.
So one question you might ask is why is this happening now? I want to start by just pointing out that what we're trying to do here is kind of crazy. We're trying to download code from the internet written by unknown individuals that we haven't read that we execute with full permissions on our laptops and our servers where we keep our most important data. So this is what we're doing every day when we use npm install. And I just have to say really quickly that I personally think it's a miracle that this system works, and that it's continued to mostly work for this long. It's a testament I think to how good most people are. But unfortunately not everyone is good. So let's dive into why this is happening now.
5. Reasons for Security Risks in Open Source
90% of your app's code comes from open source. Open source allows us to quickly build powerful web apps without being experts in various domains. The abundance of transitive dependencies and the complex dependency tree in node_modules pose security risks. Many developers don't read the code they execute, relying on others to find vulnerabilities. Malicious packages can remain undetected for months. Popular tools provide a false sense of security by scanning for known vulnerabilities.
The first reason is that 90% of your app's code comes from open source. So we're really standing on the shoulders of giants. And open source is the reason why we can get an app off the ground in hours and days instead of weeks or months. And it's the reason that we don't need to be an expert in cryptography or in time zones or the virtual DOM to build a powerful modern web app. It's also the reason why your node_modules folder is one of the heaviest objects in the universe.
Another reason is that we have lots and lots of transitive dependencies. The way that we write software has changed. We use dependencies a lot more liberally. So installing even a single dependency often leads to many, many transitive dependencies that come in as well. A 2019 conference paper found that installing an average npm package introduces implicit trust in 79 third-party packages and 39 maintainers, creating a surprisingly large attack surface. So what we have here is a visualization that my team at Socket created that shows you what Webpack looks like if you go into the node_modules folder and really look at what's inside. Each gray box here represents a package and each purple box represents a file or files inside of a package. As you take away each layer of the dependency tree, you'll see that you just keep finding more and more packages nested inside the top-level package, until you eventually get down to the bottom here. But this is just an insane number of files and a lot of modules flying around here.
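To make the idea of implicit trust concrete, here is a small sketch that counts everything reachable from a single direct dependency. The graph and its package names are made up for illustration; a real tool would build the graph from a lockfile or from the node_modules tree.

```typescript
// Maps a package name to the names of its direct dependencies.
type DependencyGraph = Map<string, string[]>;

// Collects every package reachable from `root`, excluding `root` itself.
function transitiveDependencies(graph: DependencyGraph, root: string): Set<string> {
  const seen = new Set<string>();
  const stack: string[] = [...(graph.get(root) ?? [])];
  while (stack.length > 0) {
    const name = stack.pop() as string;
    if (name === root || seen.has(name)) continue; // skip cycles and repeats
    seen.add(name);
    for (const dep of graph.get(name) ?? []) stack.push(dep);
  }
  return seen;
}

// Hypothetical graph: "app" declares a single direct dependency.
const exampleGraph: DependencyGraph = new Map<string, string[]>([
  ["app", ["bundler"]],
  ["bundler", ["parser", "watcher"]],
  ["parser", ["tokenizer"]],
  ["watcher", ["tokenizer"]],
  ["tokenizer", []],
]);

console.log(transitiveDependencies(exampleGraph, "app").size); // 4
```

One declared dependency turns into four trusted packages in this toy graph, which is the same effect the paper measures at registry scale.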
The next reason is that no one really reads the code. There are some people who do, but by and large, people don't look at the code that they're executing on their machines. One big reason is that npm really doesn't make this very easy. If you go to the package page for UA parser.js and you click on the explore tab here, you'll see that you can't even see the files of this package. So, people have to resort to clicking the GitHub link, checking GitHub, and hoping that the code on GitHub matches the code that's on npm, which is not necessarily true. But that's okay. We can rely on Linus's law that given enough eyeballs, all bugs are shallow. So, if there is a security issue in a package or malware in a package, we can rely on others to find it. Right? But if everyone does that, then who is finding the malware? And so, maybe this is the reason why on average a malicious package is available for 209 days before it's publicly reported. This comes from a research paper by Ohm et al. So, that's 209 days during which the wrong npm command can end extremely badly. And I find this number, personally, very shocking. A 2021 paper at NDSS, a prestigious security conference, also found similar results, including that 20% of this malware persists in package managers for over 400 days and has more than 1,000 downloads.
And the fourth reason is that popular tools give a false sense of security. A lot of popular tools scan for known vulnerabilities.
6. The Need for a New Approach
In 2022, scanning for known vulnerabilities is no longer sufficient. It can take weeks or months for a vulnerability to be discovered and reported. Vulnerabilities are accidentally introduced by maintainers, while malware is intentionally introduced by attackers. Fast development and quick updates increase the risk of supply chain attacks. We need a new approach to detect and block malicious dependencies.
So in 2022, I believe this is no longer sufficient. We can't just scan for known vulnerabilities and stop there. And yet that's what the most popular supply chain security products do, leaving you vulnerable. The thing is, it can take weeks or months for a CVE or a known vulnerability to be discovered, reported and detected by tools. And so it's just not fast enough. So it may be worth taking a minute here to just quickly distinguish between known vulnerabilities and malware because they're very different. Vulnerabilities are accidentally introduced by maintainers, by the good guys, and they have varying levels of risk. So sometimes it's okay to intentionally ship a known vulnerability to production if it's low impact. Even if you have vulnerabilities in production, they may not be discovered or exploited before you update to a fixed version. So you have some time to address these kinds of issues usually. Now malware on the other hand is quite different. Malware is intentionally introduced into a package by an attacker, almost never the maintainer. And it will always end badly if you ship malware to production. You don't have a few days or weeks to mitigate the issue. You need to really catch it before you install it on your laptop or on a production server. But in today's culture of fast development, a malicious dependency can be updated and merged in a very short amount of time. And so unfortunately, this leads to increased risk of supply chain attacks. Because the quicker you update your dependencies, the fewer eyeballs that have had a chance to look at the code. So I really think we need a new approach to detect and to block malicious dependencies.
7. Supply Chain Attack Vectors
Before we dive into the mechanics of a supply chain attack, let's explore the different attack vectors. The most common one is typo-squatting, where attackers publish packages with similar names to legitimate ones. Another vector is dependency confusion, where attackers register packages with the same name as internal versions, leading to accidental installation. The third vector is hijacked packages, where criminals infiltrate popular packages to steal credentials or abuse resources for cryptocurrency mining.
But before we get into that, let's look a little deeper into how a supply chain attack actually works and the mechanics of it. So we downloaded every package on npm, and we spent a few weeks poking around. The download was 100 gigs of metadata and 15 terabytes of package tarballs. And as we poked around this metadata and all these packages, we noticed a few trends in the types of attacks we saw. So I'm going to go over these attacks. These are what we found.
So there are attack vectors, which is sort of how the attacker tricks you and gets you to run their code in the first place. And then there are attack tactics, which are what the attack code actually does or the techniques that the attacker uses to hide their code. So let's talk about attack vectors. The first and the most common attack vector is typo-squatting. So typo-squatting is when an attacker publishes a package which has a very similar name to a legitimate and popular package. And so you can see here, there are two packages here with very similar names and one of these is malware and one of these is the real package. But I would guess that it would be hard for you to know that without actually cracking open these packages to see what's inside. So let's open up the malware package and take a look at what it's doing. So you can see here, again, it's using an install script which is a very common technique that malware uses. And if you open up this install script to look at the code, you'll find that the file is heavily obfuscated. But I can tell you, even without knowing exactly what this code is doing, you can bet this is not something that you want to run on your machine.
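One simple way to flag a possible typosquat is to compare a candidate name against a list of well-known names using edit distance. The sketch below assumes a hand-written list of popular names and a distance threshold of 1; both are illustrative choices, and real detection would need a much larger list and more signals than spelling alone.

```typescript
// Classic dynamic-programming edit distance (insert, delete, substitute).
function editDistance(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(
        (prev[j] as number) + 1,        // delete from a
        (curr[j - 1] as number) + 1,    // insert into a
        (prev[j - 1] as number) + cost, // substitute or keep
      ));
    }
    prev = curr;
  }
  return prev[b.length] as number;
}

// Returns the popular package that `name` most likely imitates, or null.
function likelyTyposquatOf(name: string, popular: string[], maxDistance = 1): string | null {
  if (popular.includes(name)) return null; // exact match: it is the real package
  for (const candidate of popular) {
    if (editDistance(name, candidate) <= maxDistance) return candidate;
  }
  return null;
}

// Illustrative list only.
const popularNames = ["browserslist", "express", "lodash"];

console.log(likelyTyposquatOf("browserlist", popularNames)); // "browserslist"
console.log(likelyTyposquatOf("lodash", popularNames));      // null
```

A match only means the name deserves a second look; plenty of legitimate packages have names that are one character apart.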
The next attack vector that we saw is called dependency confusion. This is pretty closely related to typosquatting. Dependency confusion happens when a company publishes packages to an internal npm registry and uses a name that hasn't been taken yet on the public npm registry. Later, an attacker can come along and register a package with the same name on the public registry and confuse internal tools so that they accidentally install the public version. So this is why it's called a dependency confusion attack. Looking through the recently deleted npm packages, we were able to find a bunch of likely dependency confusion attacks, and most of these packages had malicious code in them. All these packages have names which appear to conflict with internal company package names. And you can see here, a whole bunch of different organizations, including governments, were affected by this. And here are a bunch more, clearly targeting these specific companies here in this list.
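A rough way to audit for this exposure is to check each internal package name against what exists on the public registry. The sketch below is only an illustration: the `publicNames` set stands in for a real registry lookup, the package names are invented, and treating scoped names as lower risk assumes the organization actually controls that scope on the public registry.

```typescript
type ConfusionRisk = "scoped" | "unclaimed" | "claimed-publicly";

// Classifies one internal package name.
// `publicNames` is a stand-in for querying the public registry (assumption).
function classifyInternalPackage(name: string, publicNames: Set<string>): ConfusionRisk {
  if (name.startsWith("@")) return "scoped"; // assumes the scope is owned publicly
  return publicNames.has(name) ? "claimed-publicly" : "unclaimed";
}

// Hypothetical snapshot of public names and hypothetical internal packages.
const publicSnapshot = new Set<string>(["acme-billing-utils"]);
const internalPackages = ["@acme/auth", "acme-billing-utils", "acme-internal-logger"];

for (const name of internalPackages) {
  console.log(name, classifyInternalPackage(name, publicSnapshot));
}
```

In this sketch, "claimed-publicly" is the urgent case because someone else already holds the name, while "unclaimed" marks a name an attacker could still register.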
And finally, the third vector that we see a lot is hijacked packages. So these are the ones that you usually see in the news quite a lot. You know, so criminals and thieves finding ways to infiltrate our communities and infect popular packages. Once they infect a popular package, you know, once they get control of it and they can publish to it, they'll steal credentials or install backdoors or abuse compute resources for cryptocurrency mining.
8. Maintainer Vulnerabilities and Attack Tactics
Maintainers can become victims of attacks due to weak passwords, malware on their laptops, or being tricked by malicious actors. Attack tactics include the use of install scripts, privileged API usage, and obfuscated code. Install scripts, although having legitimate uses, can be a vector for malware. Privileged API usage allows attackers to access the network, file system, and environment variables to steal secrets. Obfuscated code makes it difficult to understand its purpose, and attackers may publish different code to NPM than on GitHub.
And so these hijackings happen for various reasons. Sometimes it's because the maintainer chooses a weak password or reuses a password, or maybe the maintainer gets malware on their laptop. This is also not helped by the fact that npm doesn't currently enforce 2FA for all accounts, although they are starting to enforce it for the most popular accounts. And finally, sometimes maintainers just get tricked and give access to a malicious actor. This is partially due to the fact that maintainers are overworked, and when someone offers a helping hand, it's sometimes hard to say no to the help. So this is a big vector as well.
So now let's talk about some attack tactics. What does this attack code actually do? As we mentioned, install scripts are a huge vector. Most malware is in install scripts. This is a quote from a paper we mentioned earlier: most malicious packages, actually 56%, start their routines upon installation, which might be due to poor handling of arbitrary code during install. In the npm package manager, packages are allowed to just say, hey, when this package is installed, we want to run some code. Unfortunately, install scripts do have some legitimate uses, so we can't just disable them. It's not an easy problem to solve. So let's take a look at another example of an install script. Again, you'll see it right here in the package.json file. Super common.
The next is privileged API usage. We see packages accessing the network, accessing the file system, and accessing environment variables. This is very, very common because when an attacker runs code, what they usually want to do is steal some secrets, and they need the network to exfiltrate those secrets. So this is a typical example of malware that does that. You can see here that it's making an HTTP request to an IP address, and it's sending some data. The data it happens to be sending is process.env, which contains all the environment variables in the environment. And then here is another file that it includes, which is a different exfiltration technique that uses DNS instead of HTTP. The way this works is it creates a DNS resolver, gathers the environment variables, and then does a DNS lookup with those variables as the subdomain. So it's just another way to get the data out of the system.
And finally, we have obfuscated code. We took a look at an example of this earlier. With obfuscated code like this, it's really hard to see at a glance what it's doing, although there are tools that attempt to deobfuscate code like this. There's also another kind of obfuscation, which is that attackers can publish different code to npm than they do on GitHub. And when they do that, as I mentioned earlier, npm doesn't make it easy to see what code is actually in the npm package.
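The privileged API usage described above leaves textual traces that a simple scanner can surface. This is a deliberately naive sketch using regular expressions over source text; the pattern list and the `Capability` names are assumptions made for illustration. Patterns like these are easy to evade (especially with obfuscated code) and can produce false positives, so a hit is a prompt for review rather than a verdict.

```typescript
type Capability = "network" | "filesystem" | "shell" | "environment";

// Rough textual signals for each capability (illustrative, not exhaustive).
const CAPABILITY_PATTERNS: Array<[Capability, RegExp]> = [
  ["network", /\brequire\(\s*['"](?:node:)?(?:https?|net|dns)['"]\s*\)/],
  ["filesystem", /\brequire\(\s*['"](?:node:)?fs['"]\s*\)/],
  ["shell", /\brequire\(\s*['"](?:node:)?child_process['"]\s*\)/],
  ["environment", /\bprocess\.env\b/],
];

// Returns, per capability, the 1-based line numbers where a signal appears.
function scanSource(source: string): Map<Capability, number[]> {
  const hits = new Map<Capability, number[]>();
  source.split("\n").forEach((line, index) => {
    for (const [capability, pattern] of CAPABILITY_PATTERNS) {
      if (pattern.test(line)) {
        const lines = hits.get(capability) ?? [];
        lines.push(index + 1);
        hits.set(capability, lines);
      }
    }
  });
  return hits;
}

// Harmless stand-in text for the kind of file described above.
const sampleSource = [
  'const https = require("https");',
  'const dns = require("dns");',
  "const payload = JSON.stringify(process.env);",
].join("\n");

console.log(scanSource(sampleSource));
```

Reporting line numbers rather than a yes/no answer lets a reviewer jump straight to the code that touches the network or the environment.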
9. Evaluating Packages and Protecting Your App
When developing our product, the Wormhole, which enables secure file sharing, we prioritized security and privacy. We implemented common security practices, such as early consideration of security, writing tests, and conducting code reviews. However, we realized there was room for improvement and began exploring solutions.
And so, a lot of people who are trying to evaluate a package will rely on the code that's on GitHub, and there's no guarantee that that code is the same. OK, so now let's talk about how you can protect your app. We asked ourselves this question when my company was working on a product called Wormhole, which lets you share files with end-to-end encryption. Our goal was to try to build the most secure and private way to send files. So we did all the usual security things that we could think of. We thought about security early in the design process. We wrote tests, we enforced code reviews, and we were pretty thoughtful about the dependencies that we chose to use. But we still felt like we could do better. And so we started thinking really carefully about this problem and what we could do to make it better.
10. Choosing Better Dependencies and Mindset Shift
Choose better dependencies and take responsibility for the behavior of open-source code in your app. The popular MIT license states that open-source code is provided as is, without warranty or liability. We need a mindset shift in how we view open-source.
So the first thing I recommend is that you try choosing better dependencies. If you ship code to production, you are ultimately responsible for it. And as an industry, I think we need a mindset shift here, because people assume that they can just install stuff from the internet and that it's going to be safe. And that's not necessarily true. If you're shipping code to production that includes open-source code, then really, ultimately, that code is part of your app. And so you are ultimately responsible for the behavior of that code. The most popular open-source license, the MIT license, literally says this. It says that the open-source code is provided as is, with no warranty of any kind, and that in no event shall the author be liable for any claim, damages or other liability. And so, while this is legally true, most people don't think of their open-source code this way. And I think we really do need a mindset shift.
11. Dependency Evaluation and Updating
We often rely on heuristics to pick dependencies, without thoroughly examining the code. Socket provides a tool to quickly assess package security, highlighting install scripts and binary/native code. Quality scores are also displayed. Dependencies of packages, like Angular calendar, may have invasive behavior, such as running shell scripts and accessing the file system or network. Socket can highlight these behaviors in the code, making it easier to evaluate packages. Additionally, consider updating dependencies at the right cadence to avoid known vulnerabilities and outdated code.
The other thing is very few of us actually read the code that we're shipping to production. And so, we rely on other heuristics to help pick dependencies. So maybe, you know, we look at does the code get the job done? Does it have an open-source license? Does it have good docs? Does it have lots of downloads and GitHub stars? Does it have recent commits? Does it have types? And does it have tests? And we're not really cracking open the code to go much beyond this.
So, what that means is that we're not really aware of what the code may be doing. And so, we built a tool at Socket to help with this problem, so you can quickly, at a glance, get an idea of the security of a package. This is what it looks like. You can go to Socket and look up packages to figure out what behavior the package has. In this example here, you can see that this package contains install scripts, and that's called out very prominently on the page. It's the first thing that you see. This package also happens to contain binary or native code, which means that it's not easy to audit the code. It's not human-readable. And so, both of these issues are called out. In this case, this is not a supply chain attack by any means, but it is nice that this is called out very prominently so that you can make an informed decision about whether you want to use this package or not. You can also see that we have very helpful quality scores that show up at the top of the page as well.
Now, let's take a look at another example. So, this package here, Angular calendar, is quite a useful package. It's a calendar component that shows up on the page and renders a little calendar. But if you dig into its dependencies, you'll actually find that some of its dependencies are doing quite invasive things. So, here you'll see that one of its dependencies contains install scripts. It also runs shell scripts and accesses the file system and accesses the network. So, this is probably not something that you would expect a web component to be doing, and so it may be worth a little bit of further investigation to figure out what's going on here before you use this package. The other thing that we do that's quite cool is we can highlight when packages do these things and put that directly in line in the code. So, in this package here, I opened it up to take a look at the files and I could see here that the module is accessing the network as well as accessing environment variables, and I can see the exact lines where the package is doing each of these things. And so it makes it a little bit easier to get an idea of what a package is doing before you run it. So, if you want to research packages on Socket before you use them, this is the URL you can use and I highly recommend you take a look at some packages there and use that information to make an informed decision before you select a package.
Okay, the other thing you can do is think about updating your dependencies at the right cadence. So, what do I mean by this? So, there's a question about how quickly you should update your dependencies and this is actually a question we struggled with on our team as well. So, if you, you know, you can think of it as should we update slowly or should we update really really quickly and aggressively? If you update too slowly, you're exposed to known vulnerabilities and you're running code that's old and that you know, may have issues, may have some bugs that have been fixed in the newer version and so there's some downsides to updating too slowly.
12. Balancing Auditing and Automation
Updating too quickly exposes you to supply chain attacks, but doing nothing leaves you vulnerable. A full audit is costly and time-consuming, while not auditing has its downsides. A happy medium is using automation to evaluate dependencies and manually audit the most suspicious ones. The security information is shown in pull requests, empowering developers to address issues. The bot we created runs a full health report on new dependencies and leaves comments on potential issues. It's a low-cost way to enhance the review process.
On the other hand, if you update too quickly, you expose yourself to supply chain attacks because you're now running code that may have been published, you know, literally yesterday or in the last couple of days which means that you haven't had that many eyeballs able to look at the code and so, you know, as you think about security, you have to balance, you know, this trade-off and there really is no perfect solution here. It's just a hard problem.
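One way to encode this trade-off is a cooldown: hold off on a new version until it has been public for some minimum number of days. The sketch below is an assumption about how a team might express such a policy, not something described in the talk; `minAgeDays` is a policy knob with no recommended value, and a real policy would likely make exceptions for releases that fix known vulnerabilities.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days elapsed between publication and `now`.
function ageInDays(publishedAt: Date, now: Date): number {
  return Math.floor((now.getTime() - publishedAt.getTime()) / MS_PER_DAY);
}

type UpdateAdvice = "wait" | "ok-to-update";

// Advises waiting until a release has been public for at least `minAgeDays`.
function updateAdvice(publishedAt: Date, now: Date, minAgeDays: number): UpdateAdvice {
  return ageInDays(publishedAt, now) >= minAgeDays ? "ok-to-update" : "wait";
}

// Hypothetical dates for illustration.
const today = new Date("2022-01-12T00:00:00Z");
console.log(updateAdvice(new Date("2022-01-10T00:00:00Z"), today, 7)); // "wait"
console.log(updateAdvice(new Date("2021-12-01T00:00:00Z"), today, 7)); // "ok-to-update"
```

A longer cooldown gives the community more time to notice a bad release, at the cost of running older code for longer.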
Another idea is to audit every dependency. If you're building a truly security-critical application like we were doing with Wormhole, then one option is to literally read every line of code of your dependencies. So if we again put this on an axis, starting from a full audit on the one hand, reading every line of code, to YOLOing on the other hand, and by YOLOing I mean doing nothing: how closely should you audit your dependencies? And what you see here is that we're in the same situation. We have trade-offs and really no good solutions. Doing a full audit is something that only the biggest and richest companies seem to do in practice. It's a lot of work. Usually you need to have a security team looking at every one of these packages, and they also have to approve them one at a time and add them to an allow list, which is really slow. This is expensive just because of the time and the effort that it takes. On the other hand, doing nothing and just installing whatever you want without even looking at the code has its downsides. It means that you're vulnerable to supply chain attacks. It's risky. And a breach or bad security press can also be expensive, especially as regulators start to crack down on this issue more. So this is another difficult trade-off. What do you do? Most teams, I think, err on the side of doing nothing, but I think this is just a hard problem.
So one thing that we tried to do when we were building Wormhole is to think about a happy medium. Is there a way to use automation to do something in the middle? What we ended up doing is using automation to evaluate all of our dependencies, so we could use static analysis to look through packages to try to find malware, hidden code, typosquatting attacks, and this kind of thing. That way we could manually audit only the most suspicious packages, so we could spend our limited team resources looking at the code for the most suspicious packages, and that's the most high-impact way that we could spend our time. This seems much better to me than an all-or-nothing approach where you either audit everything or you just hope for the best and look at nothing. The other thing we wanted to do is make sure that the security information was shown directly in pull requests so that the developers on our team were empowered to solve the security issues that they saw before they deployed into production. So what does this actually look like? This is the bot that we created. It's implemented as a GitHub app that you can install on your GitHub repository. Whenever it sees that the package.json file or the yarn.lock file has been modified, it will take a look at the new dependency that's been added and run a full health report against that dependency, and if any issues are found, it will leave a comment describing whatever issue was discovered. That way the developer reviewing the pull request can look at it and have their attention drawn to this potential issue. In this screenshot here, you can see that I accidentally installed the package browserlist instead of browserslist, which is actually a very easy mistake to make.
And actually for that reason, browserlist, the typo package, has something like 700,000 downloads a year. So this is really, really helpful. This is the kind of thing that augments your review process, and it's very low cost since it only raises issues that are really worth your attention and it runs automatically. So if you want to try this app out, we've published it for anyone to use.
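The first step such a bot has to take is working out which dependencies a pull request added or changed. The sketch below compares two parsed manifests; it is an illustration of that step only, not Socket's implementation, and the `ManifestDeps` shape, function names, and version ranges are hypothetical.

```typescript
// Only the dependency fields of package.json (assumed shape).
interface ManifestDeps {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

interface DependencyChange {
  name: string;
  from: string | null; // null means the dependency is newly added
  to: string;
}

// Merges runtime and dev dependencies into one name-to-range map.
function allDependencies(manifest: ManifestDeps): Map<string, string> {
  return new Map<string, string>([
    ...Object.entries(manifest.devDependencies ?? {}),
    ...Object.entries(manifest.dependencies ?? {}),
  ]);
}

// Lists dependencies that are new or whose version range changed.
function changedDependencies(before: ManifestDeps, after: ManifestDeps): DependencyChange[] {
  const oldDeps = allDependencies(before);
  const changes: DependencyChange[] = [];
  allDependencies(after).forEach((to, name) => {
    const from = oldDeps.get(name) ?? null;
    if (from !== to) changes.push({ name, from, to });
  });
  return changes;
}

// Hypothetical before/after manifests from a pull request.
const baseManifest: ManifestDeps = { dependencies: { express: "^4.17.1" } };
const headManifest: ManifestDeps = {
  dependencies: { express: "^4.17.1", browserlist: "^1.0.0" },
};

console.log(changedDependencies(baseManifest, headManifest));
```

Each entry in the result is a candidate for further checks before the pull request is merged, which keeps reviewer attention on what actually changed.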
13. Socket.dev: Features and Call for Feedback
It's free, so you can install our GitHub app by just going to socket.dev. It has a bunch of cool features like blocking typo squats, blocking malware, detecting hidden code, detecting privileged API usage, and detecting suspicious updates. We have 70 detections in five different categories: supply chain risk, quality, maintenance, known vulnerabilities, and license. We focus on actionable problems that users want to know about. Try it out at socket.dev, it's free for open source forever, and share your feedback to improve supply chain security in 2022. We're also hiring at Socket if you're interested in working on this project. Thanks for your time.
It's free, so you can install our GitHub app by just going to socket.dev, and I recommend you give it a try and let me know what you think. It has a bunch of cool features, so it can block typosquats, as I just showed you, but it can also block malware, detect hidden code, and detect privileged API usage, such as the use of the file system, network, child process, et cetera. And it can also detect suspicious updates, so these are updates that significantly change the package's behavior. So we have a whole bunch of things we look for in packages. We actually have 70 detections in five different categories: supply chain risk, quality, maintenance, known vulnerabilities, and license. And these are basically all static analysis rules that we wrote. You can kind of think of it as a linter in a way. So it's looking at the package's code and looking for these different problems. We tried to focus all of the rules on problems that you as a user of the package really want to know about, and not things that require a lot of knowledge of the internals of the package. So the things that it finds need to be actionable to you as the developer choosing to use this package. And so that's what we tried to do in our rule development here. So yeah, if you want to try this out, if you want to poke around our website and look at these different issues, you can try it out at socket.dev. And we have made it free for open source forever. And if you have a private repo, it's free while we're in beta. And I really do want people to give this a shot and share their feedback with us, because this supply chain security problem is big and only getting bigger. And I really do want the community to share their feedback with me on this.
And I think together we can really do a good job improving supply chain security in 2022 and making 2022 not the year that the supply chain is destroyed, but rather the year that it's protected better than ever. So please share your feedback with me. There's my email and my Twitter. And also we're hiring at Socket if you're interested in working on this project and helping to secure the software supply chain. Thanks for your time.
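As a rough illustration of what a linter-style rule for privileged API usage could look like, the sketch below scans JavaScript source text for `require(...)` calls and `import ... from` statements that pull in Node.js built-in modules for file system, network, or process access. It is a deliberately crude stand-in: a regular expression, not a real parser, and not one of Socket's actual rules. The module list, the pattern, and the function name are assumptions for the example.

```python
import re

# Node.js built-in modules that grant file system, network, or process access.
PRIVILEGED_MODULES = {"fs", "net", "http", "https", "child_process"}

# Matches require('x'), require("x"), and the `from 'x'` part of an import
# statement, with an optional "node:" prefix on the module name.
MODULE_REFERENCE = re.compile(r"""(?:require\(\s*|from\s+)['"](?:node:)?([\w/.-]+)['"]""")


def privileged_modules_used(source):
    """Return the sorted privileged module names referenced in `source`."""
    found = {match.group(1) for match in MODULE_REFERENCE.finditer(source)}
    return sorted(found & PRIVILEGED_MODULES)


sample = """
const cp = require('child_process');
import https from "node:https";
import helper from './helper.js';
"""
print(privileged_modules_used(sample))
```

A text match like this misses dynamic requires, bare `import 'fs'` statements, and obfuscated code, which is one reason real tools analyze the parsed syntax tree instead.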
QnA: Supply Chain Attack Detection and Storage
The speaker discusses the responses to a question about having a process to detect and block supply chain attacks. Approximately 40% answered no, 33% don't know, and 27% answered yes. The speaker expresses surprise and speculates on the processes used by those who answered yes. The speaker also mentions the confusion between vulnerability scanning and supply chain attack detection. The audience question is about the storage used for tarballs, and the speaker explains that Amazon S3 is currently used. They also mention having a server for local development and the use of a smaller subset of NPM for some developers. The audience member also comments on the responsibility for code quality.
So the question was if one of your packages has a supply chain attack today, do you have a process that will detect and block it? And 40% unfortunately says no. 33% doesn't know. And 27% says yes. We're really happy that one third roughly says yes or maybe a quarter. Was this what you were expecting? I wonder what process they're using. Maybe they installed Socket during the talk and maybe they went and added it to their GitHub repos to get protected. Or maybe they're using some other process like a vulnerability scanner and maybe because I think maybe you'll ask me about this but there's a little bit sometimes confusion between vulnerability scanning and looking for supply chain attacks. But who knows? Maybe they have a process. I'm not sure. Yeah, it was a little bit surprising. Yeah, yeah. Well, let's see in like a month or maybe two when people have had the chance to implement Socket. How many people will still be answering no?
So we're going to jump into the audience questions. First one is from, oh, let me scroll back a little bit, from Cici Miller. Where did you find 50 terabytes of storage? He also wants this in his life. Yeah, we're just using Amazon S3 right now to store all the tarballs and basically make a full clone of NPM. Originally, we were using Backblaze because they have a little bit cheaper storage. They have a similar thing to S3, but anyway, yeah, it's easy. Cloud storage is cheap. Well, relatively cheap, I guess. It's not cheap if you're doing it as a side project, but for a company, it's something that's affordable. I think we're paying a couple hundred dollars a month or something like that to store everything on S3, so it's not too crazy. For local development, we have a server in our apartment that actually stores the full clone of everything locally so we can do fast local development, but not every developer on the Socket team has a full clone of NPM locally to work with. So they just use a smaller subset when they're doing development on Socket. Yeah, exactly, so you could have remote employees in the future too that work with the subset. CcMailer also said, "Not my code, not my problem" is so often used as the get-out clause for issues caused. I try to make sure it's good code before it gets there. I'm not perfect, but get most stuff caught early. Well, that's more of a statement than a question, I guess.
QnA: Responsibility for Dependencies and Code
When deploying an app, you're ultimately responsible for everything it does. The 'not my code, not my problem' mindset is no longer a good excuse. With the prevalence of supply chain attacks, it's important to take steps to minimize the risk and be a better developer. If you add a dependency, it becomes your code and your problem if it compromises your product's security.
But yeah, good to hear. I was going to say, I think that "not my code, not my problem" mindset is… Maybe I didn't understand what he was saying, but like I said in my talk, I think at the end of the day, when you're deploying an app, you're ultimately responsible for everything that the app does. I mean, yeah, I guess you could say, there's nothing I could have done. We didn't know that the dependency was going to behave in this way. So it's not my fault. But now that Socket exists, and now that the problem of supply chain attacks is getting more and more prevalent, I don't know if that's a really good excuse. I think there are definitely things you can do to make it less likely, and to at least look at the most suspicious packages, the most suspicious changes that are made to packages, and to take a closer look at those. So, yeah, I think to be a better developer, to be as good as you can be, part of that is trying to think about how to make sure that the packages aren't compromised or hijacked, and doing what you can to help. Yeah. Yeah. And "not my code, not my problem"? Well, if you're adding the dependency, then it is kind of your code, right? And then it becomes your problem because your product is broken or insecure.
QnA: NPM Audit and Socket: Different Approaches
npm audit is great for vulnerability reporting, but it can be noisy with many low-impact issues. The focus should be on fixing high-impact and critical vulnerabilities. There is a discussion about improving npm audit to make it more useful. Socket is focused on supply chain attacks and aims to provide fewer but more severe warnings. npm audit and Socket can work well together as they serve different purposes.
So, question from AntiTomic. Hey Feross, awesome and very useful talk. Thanks. I also really like how Socket sounds. What do you think about solutions like npm audit? And is it better to use a solution like Socket and similar?
Well, I guess, yeah. So, I think that npm audit is great. It's awesome to have vulnerability reporting built into npm directly, so it can tell you when you're installing packages that have known vulnerabilities. But I would say a lot of people kind of ignore npm audit because there are so many things reported by it. I have to admit, a lot of my projects, and probably even Socket itself, may have a few low-impact vulnerabilities that npm audit reports, and we don't have an immediate urgency to fix them because we've looked at them and said, yes, I could see how somebody might use this to slow down our web server. A regular expression denial of service is a common one that you'll see, where there's some slow regular expression in the server, and somebody could maybe use that to slow down your server and make it spend time crunching some heavy regular expression. But at the end of the day, that kind of thing is often not going to be a problem in practice, because an attacker has to find it and has to exploit it and has to use it against you. And probably most of that stuff isn't actually found by an actual attacker, right? I'm not saying you shouldn't try to update and remove those from your code. Obviously, that's good, you should do that. But I'm saying a lot of apps just have those issues, and there are so many of them. And really, the important thing is to fix the really high-impact and critical ones and not necessarily to fret about all the low-impact ones. There are just so many being reported. So I guess if I had to say what I think about npm audit, it's great. It's awesome that it's raising awareness about this problem.
It's a little bit noisy for my liking. And I think there's been a bit of a discussion in the community about whether npm audit is too noisy to be useful. How can we improve it, basically, so that it's more useful to end users? Because I think right now, you see on almost every installation that you do that npm audit is going to complain about something, right? And so at some point, as a human, you start to get, what do they call it, warning blindness or alert blindness. You see the same alert so many times that you start to ignore it, right? So yeah, I think our goal with Socket is that we're focused on supply chain attacks, which are very different than vulnerabilities. When they happen, they're very bad. They're very severe, and they're a thing that you want to catch before you even run that code on your computer. And so I think we want to have fewer alerts, fewer warnings than npm audit. But when we do warn, we want it to be because something very serious is happening. And we want to basically step in and protect people's apps. So yeah, I think they can work well together. I just think they're doing different things. Yeah, it's not a competition. They work next to each other and do different things.
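One way to act on the "fix the high-impact and critical ones first" advice is to filter an audit report by severity before anyone reads it. The sketch below is a minimal example under stated assumptions: the input is assumed to be a JSON document with a top-level `vulnerabilities` object keyed by package name, each entry carrying a `severity` field, which is modeled on what recent npm versions print for `npm audit --json`. Check the shape against your own npm version. The package names in the sample are hypothetical.

```python
import json

# Severity levels in ascending order of seriousness.
SEVERITY_ORDER = ["info", "low", "moderate", "high", "critical"]


def serious_findings(report_json, minimum="high"):
    """Return package names whose reported severity is at least `minimum`."""
    report = json.loads(report_json)
    threshold = SEVERITY_ORDER.index(minimum)
    return sorted(
        name
        for name, finding in report.get("vulnerabilities", {}).items()
        if SEVERITY_ORDER.index(finding.get("severity", "info")) >= threshold
    )


# Hypothetical report with made-up package names, for illustration only.
sample_report = json.dumps({
    "vulnerabilities": {
        "slow-regex-lib": {"severity": "low"},
        "template-helper": {"severity": "moderate"},
        "unsafe-deserializer": {"severity": "critical"},
        "old-http-client": {"severity": "high"},
    }
})
print(serious_findings(sample_report))
```

Filtering reduces alert fatigue, but it only covers known vulnerabilities. It does not detect a supply chain attack such as a newly published malicious version, which is the separate problem discussed above.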
QnA: Ad Sizes and Socket Integrations
The speaker discusses the different sizes of online ads and their dimensions. They mention the standard sizes like 728 by 90 and 300 by 250. The next question is about Socket and its availability as a GitHub app or a CLI tool for GitLab CI/CD. The speaker explains that Socket is focused on analyzing npm packages and detecting 70 issues. They mention their plans to provide integrations with GitHub, GitLab, and other platforms, including a CLI tool, an API, and a GitHub app. The CLI tool will allow users to incorporate Socket into their own scripts and workflows.
Yeah. Yeah, exactly. Yeah, exactly. What you mentioned about the looks, it reminds me of how I see ads, basically online. I don't even without. Yeah, I don't see them anymore. Exactly. And anything that any image that is shaped in that like rectangle shape, or that square shape. It's like your eyes... 728 by 90. Yeah, if it's 728 by 90, yeah, you know the exact dimensions. Yeah, I may have worked in it. Do you want to know the other standard sizes? Yeah, it's like 300 by 250 or something. I think I know some of them, but let's not go into that.
Next question is from Ler. Socket looks awesome. Is there only a GitHub app, or is there also a CLI tool available that can be used for GitLab CI/CD, for example? Yeah, great question. So with Socket right now, we really have been focused on solving the hardest part of the problem, which is to be able to analyze every npm package and find all these 70 issues that we can detect. And so we have all this really good data. And unfortunately, for a lot of it you have to go to our website and type in a specific package to look up our report for it. And what we're working on now, in the coming weeks and months, is as many integrations as possible, so there are different ways that people can consume the data. So the GitHub app is one. We're working on a CLI as well, which should hopefully be ready, I would say, probably in April, if not early April. And we're working on an API, a REST API and a Node.js API. And all of these different ways, so GitLab, GitHub Actions, there are so many ways people have been asking to fit it into their workflow. And we obviously want to support as many as we can. And I think one nice thing about a CLI, of course, is that people can take the CLI and use it in their own scripts and in their own workflows. So that will unlock that ability for them.
QnA: Continuing the Conversation and Conclusion
We're working on it. Thank you for joining us. If you want to continue the conversation, reach out to me on Twitter or email. Hope to see you again soon.
So yeah, that's definitely in our plan. And we're working on it. Cool. Super nice to hear.
Well, that's all the time we had for Q&A. So I want to thank you for joining us. If you want to continue the discussion with Feross, you can do so in his Speaker Room. Oh, you don't have a Speaker Room. Sorry. You can't. Sorry. Thanks a lot for joining. Just message me on Twitter, you know? Message me on Twitter or email me. I love to hear from people. I'll respond if you email me.
All right. That's good to hear. So you heard it here first, if you want to continue the conversation, you can do so on this website with a bluebird. Thanks a lot for joining us, Feross. I hope to see you again soon. Thanks, Mateen.