The Dark Side of Open Source

Join Feross, CEO of Socket, on a thrilling journey into the dark side of open source software. Come along for the ride as we explore the unseen risks lurking within everyday software dependencies. See firsthand how AI-driven solutions, specifically large language models, are helping us battle against malicious dependencies within the npm ecosystem. Arm yourself with the knowledge and tools to protect your codebase in this ever-evolving battle.

Feross Aboukhadijeh
37 min
04 Apr, 2024

Video Summary and Transcription

The talk explores the dark side of open source, focusing on supply chain attacks and the need for improved security measures. It highlights the dangers of loading external code and the importance of mitigating supply chain risks. The talk also discusses the use of AI and LLMs in code analysis to enhance security. It emphasizes the challenges of sustaining open source projects maintained by individual contributors and the future of supply chain security. Lastly, it touches on the variations in open source definitions and the empowerment of the open source community.

Available in Español: El Lado Oscuro del Open Source

1. The Dark Side of Open Source

Short description:

Welcome to the talk on the dark side of open source. We'll explore malicious code and threats in the NPM and JavaScript ecosystems. I have experience in open source and cybersecurity, and now work on open source security at Socket. Socket helps developers and security teams find, audit, and manage open source software. Today, over 90% of applications rely on open source dependencies, making supply chain security crucial. The open source ecosystem is under attack, with software supply chain attacks affecting companies of all types.

Hey, everybody. It's Feross, and welcome to this talk on the dark side of open source. I'm really excited to share with you some of the lesser-explored parts of open source. We're going to dig into some examples of malicious code and give you a sense of some of the threats out there in the NPM and JavaScript ecosystems. So let's get started.

So first off, a little bit about me. I started out in open source. I worked on some pretty popular packages, including WebTorrent and StandardJS, and I'm also a former member of the Node.js Foundation board. And so I really got to see a massive increase in the usage of open source within companies and in the community. Then I moved into more of a security focus. I taught the web security course at Stanford, and now I'm working on open source security at Socket.

So real quick, just a couple words on Socket. So Socket's a tool that helps protect your code from everyone else's, and we help developers and security teams to ship faster and spend less time on security busy work by helping them safely find, audit, and manage open source software. We have a ton of companies using us. A lot of these are actually open source projects, and we're protecting over a quarter million repositories today. So I'm really happy that we've been able to help protect the community to this degree.

Okay, so let's talk a little bit about our applications. In the last five years, the way that we write software has undergone a really massive shift. Today it's really common to see applications where over 90% of the code comes from open source dependencies. That means code that you and your teammates didn't write. The average open source dependency actually has 79 transitive dependencies. So in this world where your application is built on thousands of dependencies, software security is not just about your code. It's about every piece of code that you depend on.

In this talk we're going to be talking about open source dependencies because we're JavaScript developers, but this is actually a broader issue. There's this term, software supply chain, and it really includes all the third-party code that you rely on, whether it's APIs, cloud services, or even dependencies like your operating system. All the parts and pieces that make up our software are what we're talking about when we talk about supply chain security.

And unfortunately, the open source ecosystem is under attack. We've seen software supply chain attacks surge in the past couple of years. There are headlines pretty regularly about different breaches and attacks, and these attacks impact all types of companies. Really, anyone who depends on open source, and I know you do depend on open source, will at some point be affected by one of these attacks, just because of the scale of NPM.
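
To make the point about transitive dependencies concrete, here is a minimal sketch of how one direct dependency pulls in others. The graph and every package name in it are hypothetical, invented for illustration. A real resolver also has to deal with version ranges and duplicate copies, which this sketch ignores.

```typescript
// Hypothetical dependency graph: package name -> names of its direct dependencies.
type DependencyGraph = Record<string, string[]>;

// Collect every package reachable from `root` (direct and indirect),
// excluding `root` itself. The `seen` set also protects against cycles.
function transitiveDependencies(graph: DependencyGraph, root: string): Set<string> {
  const seen = new Set<string>();
  const stack: string[] = [...(graph[root] ?? [])];
  while (stack.length > 0) {
    const name = stack.pop() as string;
    if (name === root || seen.has(name)) continue;
    seen.add(name);
    for (const dep of graph[name] ?? []) stack.push(dep);
  }
  return seen;
}

// Invented example: the app declares two dependencies but ends up with five.
const exampleGraph: DependencyGraph = {
  "my-app": ["web-framework", "logger"],
  "web-framework": ["router", "body-parser"],
  "router": ["path-matcher"],
  "body-parser": ["logger"],
  "logger": [],
  "path-matcher": [],
};

const allDeps = transitiveDependencies(exampleGraph, "my-app");
console.log(`declared: ${exampleGraph["my-app"].length}, installed: ${allDeps.size}`);
```

Even in this toy graph, the installed set is larger than the declared one, and every package in it is code that can run with your application's privileges.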

2. Supply Chain Attacks and the Problem of Trust

Short description:

The SolarWinds hack was a sophisticated supply chain attack that compromised SolarWinds and impacted thousands of networks. NPM packages, often maintained by individuals or small teams, can also be vulnerable to supply chain attacks. I'll present a real example of a recent attack that exfiltrated environment variables. As developers, we rely on trust in open source, but it takes too long to detect malicious packages and they're often not catalogued for future reference.

And so how many of you have heard of the SolarWinds hack? This was pretty big news a few years ago. It was a sophisticated supply chain attack that compromised a supplier called SolarWinds, and the way that it worked was that an attacker added malicious code into one of SolarWinds' software products. They did this by basically getting into the network of SolarWinds and adding their attack code into the SolarWinds product. And then downstream of that, they were able to get into thousands of networks of SolarWinds customers, including U.S. government agencies and large corporations. And while it's pretty hard to figure out the exact monetary damages of this attack, the costs associated with the investigation, the remediation, and the increased cybersecurity measures as a result of this were probably in the billions of dollars.

That's SolarWinds. That's a company that has a security team and a lot of effort to defend their products. Now let's talk about NPM packages, right, which are often maintained by individuals or small teams of volunteers. So sometimes software supply chain security can be kind of abstract, right? It's kind of like, what are we talking about here? So I wanted to make this really concrete for everyone and just really show you, what does a supply chain attack look like? So here's a real example. This is an attack that we detected a few days ago. And we're going to discuss, like, what's going on here. So let me help you out a bit. I'll highlight a few parts of the code here. Does that help? So now if you look at this, you can see, were a developer to install this package, this malicious code would immediately run in an install script, and it would exfiltrate or steal their environment variables, which can include, obviously, secrets, tokens, keys, and then it would send it to an attacker-controlled server. So you can see those three parts there. It's requiring the network package, it's accessing the environment variables, and then it has this sort of obfuscated or hidden network request down there on that third line. And this is really how a software supply chain attack can lead to a breach at a company.
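
The code from the slide is not reproduced in this transcript, so what follows is only a rough sketch of how the three signals described above could be flagged, not the attack itself and not Socket's detection method. It assumes you already have the package's parsed `package.json` and its source text in hand. The type names, function names, and regular expressions are all hypothetical and deliberately simplistic. The one grounded fact is that npm runs the `preinstall`, `install`, and `postinstall` scripts of a package during `npm install` unless scripts are disabled.

```typescript
// Only the manifest fields this sketch inspects.
interface PackageManifest {
  name: string;
  scripts?: Record<string, string>;
}

// npm lifecycle scripts that run automatically when a package is installed.
const INSTALL_HOOKS = ["preinstall", "install", "postinstall"];

// Return the install-time hooks a manifest declares.
function installHooks(manifest: PackageManifest): string[] {
  const scripts = manifest.scripts ?? {};
  return INSTALL_HOOKS.filter((hook) => typeof scripts[hook] === "string");
}

// The three signals from the talk, as crude text heuristics (assumptions, not a real scanner).
interface SourceSignals {
  loadsNetworkModule: boolean; // e.g. require("https")
  readsEnvironment: boolean;   // e.g. process.env
  hasEncodedString: boolean;   // hex/base64 decoding or \x escapes that can hide a hostname
}

function scanSource(source: string): SourceSignals {
  return {
    loadsNetworkModule: /require\(\s*['"](?:node:)?(?:https?|net|dns)['"]\s*\)/.test(source),
    readsEnvironment: /process\.env\b/.test(source),
    hasEncodedString:
      /Buffer\.from\([^)]*['"](?:hex|base64)['"]\s*\)/.test(source) ||
      /\\x[0-9a-fA-F]{2}/.test(source),
  };
}

// Inert sample text that exhibits all three signals; it is only scanned, never executed.
const sampleSource = [
  'const https = require("https");',
  'const data = JSON.stringify(process.env);',
  'const host = Buffer.from("6578616d706c652e696e76616c6964", "hex").toString();',
].join("\n");

const sampleManifest: PackageManifest = {
  name: "hypothetical-package",
  scripts: { postinstall: "node index.js", test: "jest" },
};

console.log(installHooks(sampleManifest), scanSource(sampleSource));
```

Any one of these signals is common in legitimate packages. It is the combination, running automatically at install time, that makes a package worth a closer look. Text matching like this is easy to evade, which is part of why the talk turns to deeper analysis.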

Now, this package was targeting probably Airbnb, given the name of the package, but honestly we don't really have all the details on what the goal of this package was, but the name is very suspicious, I'll just say that. And so fundamentally the problem here is that, you know, as developers we're using so many packages, but we just don't have the time to read every line of code in our dependencies. And so we're always trusting other people fundamentally, and open source is built on trust. And for the most part this trust is well placed. Most people are good. But there are a few bad apples. And unfortunately it does take us as a community a little bit too long to find these types of bad packages today. Right now we're looking at over 200 days to detect a malicious package as a community. And so, you know, this is pretty bad. This is from a research paper published in 2021. And the other big problem is when we find these malicious packages as a community, we report them and they get taken down, but they're often not catalogued and saved in any way. So they don't go into the typical vulnerability tracking systems like the National Vulnerability Database. They just get taken down and then no one knows whether or not they may have installed that package in the past.
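
If a catalogue of removed packages did exist, checking whether you had ever installed one could be fairly mechanical. The sketch below assumes a hypothetical list of removed name and version pairs, with all names invented. It also assumes a parsed `package-lock.json` in the newer lockfile format, where the `packages` object is keyed by install path such as `node_modules/foo`. It illustrates the idea and does not describe any existing database or tool.

```typescript
// Only the lockfile fields this sketch reads.
interface LockfileEntry {
  version?: string;
}
interface Lockfile {
  packages?: Record<string, LockfileEntry>;
}

// One entry in a hypothetical catalogue of packages removed for being malicious.
interface RemovedPackage {
  name: string;
  version: string;
}

// "node_modules/a/node_modules/@scope/b" -> "@scope/b"
function packageNameFromPath(installPath: string): string {
  const marker = "node_modules/";
  const index = installPath.lastIndexOf(marker);
  return index === -1 ? installPath : installPath.slice(index + marker.length);
}

// Return every locked package that matches an entry in the removed list.
function findRemovedPackages(lockfile: Lockfile, removed: RemovedPackage[]): RemovedPackage[] {
  const hits: RemovedPackage[] = [];
  for (const [installPath, entry] of Object.entries(lockfile.packages ?? {})) {
    const version = entry.version;
    if (installPath === "" || version === undefined) continue; // skip the root project
    const name = packageNameFromPath(installPath);
    if (removed.some((r) => r.name === name && r.version === version)) {
      hits.push({ name, version });
    }
  }
  return hits;
}

// Invented data for illustration.
const exampleLockfile: Lockfile = {
  packages: {
    "": { version: "1.0.0" },
    "node_modules/left-helper": { version: "2.1.0" },
    "node_modules/left-helper/node_modules/@acme/internal-utils": { version: "9.9.9" },
  },
};
const removedCatalogue: RemovedPackage[] = [{ name: "@acme/internal-utils", version: "9.9.9" }];

console.log(findRemovedPackages(exampleLockfile, removedCatalogue));
```

A check like this only covers what the current lockfile records. Answering "did we ever install it" would also require looking at lockfile history in version control.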
