1. Introduction to Observability in Node.js
Today, I will be talking about examining observability in Node.js. Observability is a term from control theory. In software products and services, observability means you can answer any question about what's happening on the inside of the system by observing or asking questions from the outside, without needing to ship new code to answer them. Because system complexity is outpacing our ability to predict what is going to break, there are a lot of useful tools that come to the rescue.
Hello, all. Today, I will be talking about examining observability in Node.js. My name is Liz Parody, I'm the Head of Developer Relations at NodeSource, and I am from the beautiful country of Colombia.
This is the agenda for today. First, we're going to talk about what observability is, then why observability is important, and how we can use Node.js internals for observability, including performance hooks, trace events, heap snapshots, and the V8 inspector. Then we will look at some external tools, benchmarks, and conclusions at the end.
So, let's begin by understanding what observability is and why it's important. Observability is a term from control theory. A simple definition could be: it's a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. In other words, we see the results, or the output, and we can know what's happening on the inside. Let's take, for example, bananas and avocados. In Colombia, we have a lot of bananas and avocados. It's amazing. And just from looking at the outside, we can know how they are on the inside, whether they're ripe or not yet ready for consumption. In software products and services, observability means you can answer any question about what's happening on the inside of the system by observing or asking questions from the outside, without needing to ship new code to answer them. And that's very important. We shouldn't need to write new code to observe what's happening in one of our systems. Because system complexity is outpacing our ability to predict what is going to break, there are a lot of useful tools that come to the rescue.
2. Importance of Observability and Choosing Tools
Talking about system complexity, software is becoming exponentially more complex. The number of products and tools is multiplying. With environments as complex as we see today, simply monitoring for problems is not enough. Observability gives you the instrumentation you need to understand what's happening in your software and fix it. Observing is exposing the internal state to be viewed externally, and monitoring is collecting and displaying the information that has been exposed. To solve complex Node.js problems, it's necessary to have an observability tool. In choosing an observability tool, we're going to focus first on Node.js internals, starting with performance hooks, and then on external tools.
Talking about system complexity, software is becoming exponentially more complex. In infrastructure, we're seeing things such as microservices, containers, Docker, Kubernetes, and others that decompose monoliths into microservices, which are great for products but can be hard on humans. The number of products and tools is also multiplying. There are countless platforms and tools for empowering people to have better observability and control over their code. That's great for users, but it makes it very hard to choose which one is best.
Now that you know what observability is, you may be wondering why it's important and why you should care. With environments as complex as we see today, simply monitoring for problems is not enough to recognize, find, and fix the number of issues that arise. Sometimes the new issues are unknown unknowns, which means that you don't know the problem, and even worse, you don't know how to find it. So without tools to observe the environment, it's almost impossible to fix that problem. This is why observability is important. It gives you the instrumentation you need to understand what's happening in your software and fix it. A good observability tool helps you find what and where the problem is. It doesn't add overhead to your app. We don't want our app to slow down. Quite the opposite. It also has enhanced security, integrates flexibly, and doesn't require you to modify your code. And it's also very important to differentiate between observing and monitoring. Observing is exposing the internal state to be viewed externally, and monitoring is collecting and displaying the information that has been exposed, which usually involves writing automation tools around that.
If you want to solve the most complex Node.js problems, such as memory leaks or performance issues, or even just follow best practices to keep your code healthy, it's necessary to have an observability tool. The next step is choosing the observability tool that is best for our needs. So in choosing an observability tool, we're going to focus first on Node.js internals, and then we're going to check some external tools. In Node.js internals for observability, first we're going to talk about performance hooks. This is particularly helpful for checking on performance. It's an object that can be used to collect performance metrics from the current Node.js instance. Performance monitoring is not something you should start considering once you start seeing problems. Instead, it should be part of your development process in order to detect possible problems before they are visible in production. Because of the asynchronous nature of Node, code profiling with regular tools can be very challenging, especially because part of the time spent could be outside of your own code and inside of the event loop itself. This is why it's important to use either internal Node.js tools like performance hooks or external tools, as we're going to discuss later.
3. Demo of Performance Measurement and Tracing
In this demo, we measure the performance of different search engines using performance hooks. The fastest search engine in this example is .co, followed by Google, Bing, and Yahoo. Another example demonstrates the use of performance hooks to measure the duration of a simple 'hello world' program. However, there are tradeoffs, as manual code instrumentation impacts reliability and the performance observer has a significant overhead cost. Profiling with Node's --prof flag generates a log file that, once processed with --prof-process, provides insights into where the application's time is spent. Trace events enable centralized tracing information, including file system access and performance data. Tracing in Chrome allows for visualization of events and their durations. HeapSnapshot provides a static snapshot of memory usage in V8, allowing analysis of memory usage patterns.
Now let's do a little demo. Here I'm using PerformanceObserver from performance hooks, and then we have the four biggest search engines: Google, Yahoo, Bing, and .co. We have an initial mark and an end mark here, and we're going to measure the time between them. And here we're just going to console log the duration. So if we go to the terminal, we can see that the fastest one in this example is .co, followed by Google, then Bing, and lastly Yahoo with one second, and this one is 447 milliseconds.
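The mark-and-measure pattern from this demo can be sketched roughly as follows. This is a minimal sketch, not the code shown in the talk: `fakeRequest`, `timeTask`, `runDemo`, and the labels `fast` and `slow` are hypothetical names, and timers stand in for the real search-engine requests so the example runs without network access.

```javascript
const { performance, PerformanceObserver } = require('node:perf_hooks');

// Hypothetical stand-in for an HTTP request to a search engine:
// resolves after roughly `ms` milliseconds.
function fakeRequest(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Collect every "measure" entry as it is reported.
const durations = new Map();
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    durations.set(entry.name, entry.duration);
    console.log(`${entry.name}: ${entry.duration.toFixed(1)} ms`);
  }
});
observer.observe({ entryTypes: ['measure'] });

// Place an initial mark, run the task, place an end mark,
// then measure the time between the two marks.
async function timeTask(name, task) {
  performance.mark(`${name}-start`);
  await task();
  performance.mark(`${name}-end`);
  performance.measure(name, `${name}-start`, `${name}-end`);
}

async function runDemo() {
  await timeTask('fast', () => fakeRequest(10));
  await timeTask('slow', () => fakeRequest(60));
  // Observer callbacks are delivered asynchronously, so give them
  // a moment to run before disconnecting.
  await new Promise((resolve) => setTimeout(resolve, 50));
  observer.disconnect();
  return durations;
}

const finished = runDemo();
```

To time real requests, the task passed to `timeTask` would be replaced with a function that performs the request and resolves when the response arrives.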
Let's look at another example. Here we are creating just a simple hello world using PerformanceObserver from performance hooks, and I'm just going to console log the duration of this hello world, and that's it. So if we just run this hello world app with node, it takes just 8 milliseconds. While this is very informative, there are some tradeoffs. It requires instrumenting your code manually, which impacts reliability. In the case of the performance observer, there is a significant overhead cost to the observer, which is not good. It makes the application slower.
4. HeapSnapshot and Chrome DevTools
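A heap snapshot of the kind this section's title refers to can be captured from code with the built-in `node:v8` module. The following is a minimal sketch under stated assumptions: the `cache` array and the output file name are hypothetical, chosen only so the snapshot has something recognizable in it.

```javascript
const v8 = require('node:v8');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Allocate some objects so the snapshot has something to show.
// `cache` is a hypothetical example structure.
const cache = [];
for (let i = 0; i < 10000; i++) {
  cache.push({ id: i, label: `item-${i}` });
}

// Write the snapshot to the temporary directory. The call is synchronous
// and blocks the event loop while the heap is serialized.
const snapshotPath = path.join(os.tmpdir(), `demo-${process.pid}.heapsnapshot`);
const writtenTo = v8.writeHeapSnapshot(snapshotPath);

const sizeMb = fs.statSync(writtenTo).size / (1024 * 1024);
console.log(`Heap snapshot written to ${writtenTo} (${sizeMb.toFixed(1)} MB)`);
```

The resulting `.heapsnapshot` file can be loaded in the Memory tab of Chrome DevTools for inspection. Because writing it pauses the process, this is better suited to development or carefully controlled diagnostics.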
5. The V8 Inspector and Observability Tools
The V8 Inspector is a development tool that helps you monitor your application. Chrome DevTools integrated into V8 expands its capabilities. There are multiple ways to get started, including using the inspect flag or the inspect break flag for local development. A demo showcases the use of the inspect break flag and the WebSocket communication session. However, the V8 inspector should never be used in production. Node.js provides observability tools like profiling and performance hooks, but they have limitations. External tools like the blocked library can help with observability.
The V8 Inspector. This is not an observability tool, but instead a development tool that helps you monitor what's happening in your application.
A few years ago, Chrome DevTools was integrated directly into V8, expanding its capabilities to include Node.js applications.
There are a few ways to get started. One is using the --inspect flag, as we can see here, which will start the inspector. Then you can pass the host and port that you want it to listen on, just as here. If no parameters are passed, it will listen on 127.0.0.1:9229 by default, as we can see here. Another way, useful when doing local development, is the --inspect-brk flag. It has the same host and port options as the --inspect flag, but also puts a breakpoint before the user code starts. So you can do any type of setup you prefer without having to try to catch breakpoints in your code at runtime.
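On the command line, these options take the form `node --inspect app.js`, `node --inspect=host:port app.js`, and `node --inspect-brk app.js`, where `app.js` is a placeholder script name. The same inspector can also be opened programmatically through the built-in `node:inspector` module. The following is a minimal sketch, assuming the process was not already started with an inspect flag; port `0` is used here only so the example does not collide with another inspector on the default port.

```javascript
const inspector = require('node:inspector');

// Open the inspector on the loopback address. Port 0 asks the operating
// system for any free port, which avoids clashing with the default 9229.
inspector.open(0, '127.0.0.1');

// The URL is the WebSocket endpoint a DevTools client connects to.
const url = inspector.url();
console.log(`Inspector listening at ${url}`);

// Close the inspector again so nothing stays exposed.
inspector.close();
```

Binding to the loopback address keeps the debugging endpoint reachable only from the local machine, which matches the default behavior of the flags.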
However, the V8 inspector should never be used in production, because DevTools actions halt the event loop. This is acceptable in development, but it's not suitable for production environments. For production environments, we will see later which options are best. The problem with the things Node.js already provides for observability, like profiling, performance hooks, and others, is that they tell you that there is a problem but not where to find it. Also, sometimes they are not easy to implement. They don't give you enough information, or it is not presented in an easy way, like the graphs or central performance metrics that standard tools provide. They also have significant overhead. Generally, they are not viable in production and lead to data overload, which means that there's a ton of data and it requires expertise to separate signal and significant data from noise.
But there are some pros, because this is great tooling with extensive data and insight. So now, we will check some external tools for Node.js observability. First is the blocked library. The blocked npm package is a concise example of using timers for observability. It helps you check whether the event loop is blocked.
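The timer-based idea behind this kind of check can be sketched without any dependency. This is a hedged illustration of the general technique, not the blocked package's own code or API: `watchEventLoop` and `busyWait` are hypothetical helpers, and the threshold and interval values are chosen only for the demonstration.

```javascript
// Report event loop delays above `threshold` ms, checking every `interval` ms.
// `watchEventLoop` is a hypothetical helper, not the blocked package's API.
function watchEventLoop(onBlocked, { threshold = 10, interval = 100 } = {}) {
  let last = process.hrtime.bigint();
  const timer = setInterval(() => {
    const now = process.hrtime.bigint();
    const elapsedMs = Number(now - last) / 1e6;
    last = now;
    // A timer that fires later than scheduled means the loop was busy.
    const lagMs = elapsedMs - interval;
    if (lagMs > threshold) onBlocked(Math.round(lagMs));
  }, interval);
  timer.unref(); // do not keep the process alive just for the watcher
  return timer;
}

// Simulate blocking work with a synchronous busy-wait.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

const reports = [];
const watcher = watchEventLoop((ms) => {
  reports.push(ms);
  console.log(`Event loop blocked for about ${ms} ms`);
}, { threshold: 10, interval: 50 });

busyWait(200);

// Stop watching shortly after the blocking work has been reported.
const finished = new Promise((resolve) => {
  setTimeout(() => {
    clearInterval(watcher);
    resolve(reports);
  }, 100);
});
```

The callback only receives a delay in milliseconds, so it shows that the loop was blocked but says nothing about which function caused it.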
6. Observability Tools and N|Solid
If you're running Node 8 or higher, you can get a stack trace pointing to the blocking function. The blocked function reports every value over the configured threshold, which defaults to 10 milliseconds. While it's useful for understanding event loop overhead, it can produce false positives in some scenarios. Another external tool is New Relic, an observability platform that helps engineers create better software. Datadog is a monitoring service for cloud-scale applications, providing metrics on requests, latency, distributions, errors, and more. Instana is an application performance monitoring tool for microservices, providing detailed metrics on calls, error rate, mean latency, and more. Dynatrace is a software intelligence platform that monitors and optimizes application performance and development, with Node.js observability features. However, these external tools have limitations and overhead. N|Solid is an enterprise runtime for Node.js with minimal overhead.
If you're running Node 8 or higher, you can get a stack trace pointing to the blocking function. The blocked function reports every value over the configured threshold, which defaults to 10 milliseconds. And you can do whatever you want with it: graph it, log it, alert on it, and so on. While it's useful for understanding event loop overhead, it can produce false positives in some scenarios because of the time offset. In addition, it can create a kind of numbing effect, alerting you to event loop blocks without pointing to what is actually causing the blockage.
Many times, developers will just ignore it, as there's no clear action to take. Another external tool is New Relic. New Relic is an observability platform built to help engineers create better software. From monoliths to serverless, it helps you instrument everything and analyze, troubleshoot, and optimize your entire software stack. It also provides different solutions. This is how New Relic insights appear. You can see web transaction times, application activity, error rate, hosts, and others. The next one is Datadog. Datadog is a monitoring service for cloud-scale applications, providing monitoring of servers, databases, tools, and services through a SaaS-based data analytics platform. With Datadog, you can check requests, latency, distributions, errors, percentage of time spent, and other metrics, as we can see here. Instana is an application performance monitoring tool for microservices. It lets you manage the performance of your application in real time and see every detail about the inner workings and interdependencies of your application services. We can see some of the metrics here, like calls, error rate, mean latency, top services, processing times, and others.
Next is Dynatrace. Dynatrace produces a software intelligence platform based on artificial intelligence to monitor and optimize application performance and development, IT infrastructure, and user experience. For Node.js observability, it can tell you the number of processes, CPU and memory usage, connectivity and availability percentages, traffic, the most time-consuming requests, and other Node.js metrics. But there is a problem with all of these solutions, all of these external tools. The way APMs work, they become agents, as we can see here, which are basically intermediaries between the application and the Node.js runtime. The APM is injected into your code and encapsulates your application so it can extract the information, and this has a significant cost, also known as overhead. Another problem is that sometimes you have to modify your own code in order to implement the APM. But they can be very useful tools that provide you with additional insights and extensive data. Now let's look at one tool that doesn't have this problem because it's not an APM. It's an enterprise runtime for Node.js and it adds minimal overhead: N|Solid. N|Solid is a drop-in alternative to the Node.js runtime.
7. Benefits of N|Solid for Node.js Observability
N|Solid is an enhanced observability tool built specifically for Node.js, offering low-impact performance insights and greater security for mission-critical applications. It provides valuable insights into N|Solid processes, including CPU usage, garbage collector counts, and more. Unlike traditional APM tools, N|Solid operates at a lower level, avoiding the overhead of wrapping user code. In benchmarks, N|Solid outperforms other tools in load times and speed, with minimal memory overhead. Observability in Node.js is crucial for security and performance, and N|Solid is the top choice for Node.js-specific observability. Other tools may have their merits, but N|Solid offers the best of both worlds.
It's enhanced to deliver low-impact performance insights and greater security for mission-critical Node.js applications. It offers faster time to resolution, a stronger infrastructure, and better security. This is important because traditional APM tools sit on top of the Node.js runtime layer and add performance overhead that might vary from one application to the next depending on the architecture.
N|Solid was built specifically for Node.js: it is a Node.js runtime itself, not an agent. This is the console overview, which provides valuable insights into the clusters of N|Solid processes running in a variety of configurations. You can see the number of processes, vulnerabilities, hosts, and the number of applications. In the cluster view, you can see each process along with its CPU percentage, garbage collector counts, and other metrics.
It's important to clarify that the tools shown on the previous slides all contain libraries that help you expose data, but their main function is as a monitor. For example, I can export data using the New Relic API and consume it via AWS. This is where N|Solid has an advantage. The additional metrics that N|Solid provides can be consumed by many monitoring solutions without any additional overhead. This is the best of both worlds.
And finally, we will see some benchmarks. If we check load times, which is the time it takes for a Node.js process to become available to receive and process requests, we can see that vanilla Node.js is the fastest with 30 milliseconds, followed by N|Solid with 40 milliseconds. Then we can see Instana with 210 milliseconds, which is an increase of 600%, then New Relic, Datadog, and finally AppDynamics, which is an increase of 3600%. For startup times, we can see that vanilla Node.js takes 30 milliseconds, followed by N|Solid with 35 milliseconds, then 150 and 250 milliseconds for AppDynamics and Datadog, which is an increase of 600%. As for added baggage, or memory overhead, N|Solid only adds 2 MB of memory, while New Relic adds 15 MB and Datadog adds 57 MB of overhead. Finally, measuring speed, N|Solid is the fastest with almost 10,000 RPS, then AppDynamics with 2,000 RPS, and finally Datadog with 1,500 RPS.
In conclusion, observability in Node.js is very important for security and performance. It allows you to fix errors faster, and if you are focusing on Node.js specifically, N|Solid is the best observability tool out there. The other tools are great, but they come with a cost because they add noticeable overhead from wrapping the user's code in their own libraries. N|Solid avoids these penalties by observing the application at a lower level, allowing it to make observations without directly affecting how the program runs. For other types of applications, and depending on your needs, there are other great observability tools that add a lot of value. Thank you so much. This is where you can find me on social media, and if you have any questions, please let me know.