Almost every developer thinks that adding one more log line would not decrease the performance of their server... until logging becomes the biggest bottleneck for their systems! We created one of the fastest JSON loggers for Node.js: pino. One of our key decisions was to remove all "transport" to another process (or infrastructure): it reduced both CPU and memory consumption, removing any bottleneck from logging. However, this created friction and lowered the developer experience of using Pino and in-process transports is the most asked feature our user.
In the upcoming version 7, we will solve this problem and increase throughput at the same time: we are introducing pino.transport() to start a worker thread that you can use to transfer your logs safely to other destinations, without sacrificing neither performance nor the developer experience.
Transcript
Intro
Hi, everyone. I am Matteo Collina. Today, I'm going to talk about one of my favorite topics, logging. Just a little bit of a little bit of intro of me before we start. I'm Matteo Collina, @matteocollina on Twitter. I work for a company called NearForm. We are a professional services company based in Ireland. I'm also a member of the technical steering committee and I'm the co-creator of Fastify. And this is what we're talking about today, Pino. I also do... I'm also software architect. So I typically help companies run JS in production. I also write every week on my newsletter @nodeland.dev. Please subscribe.
Anyway as part of my...Oh, this is actually important, maybe? I have a lot of downloads on MPM. This is...I ended up maintaining a lot of things in the ecosystem because as my activity, as a consultant, I tend to balance between client work and the need of the client with the maintaining Node.js ecosystem, and that is a good synergy.
So, I end up implementing new libraries and new modules to solve problems that my clients have. And I bring those new things to my clients. As part of this initiative, I built various things like Fastify and Pino.
About Pino
[01:50] What's Pino? Pino is one of the fastest logger for Node.js. It has a minimal feature set. Well, we'll talk a little bit about the minimal and it has a great community. Pino's now been downloaded 6 million times per month, and we have four collaborators, maybe three, but whatever...four collaborators and it's version six and we are working on version seven. This talk, it's about version seven. Hey, this is actually nice. So, new things coming through.
How to use Pino? Using Pino's very simple. You can just create your logger. You adjust a factory function. So you just create your logger and then you start doing info, error, debug those things that you like. We also have this child logger functionality that enables you to create a new child, with a certain set of properties already prepopulated. It's a new line, the limited Jason logger. So it produces a new line limited, Jason. So I said Jason followed a new line and another Jason followed a new line, pretty interesting.
[02:50] It's fast. And you can use Pino to drastically reduce the overhead of logging inside your application. You cannot really bring it to zero because it's still doing IO in the end, but it helps reducing it quite a lot. Know that logging should be fast. It should not have very expensive logging because if you have very expensive logging, then you will be inclined to log less and not more. And you need more observability most of the time.
There is this nice talk of the cost of logging that my partner in crime, David Mark Clements, and myself have done at Node conf 2016. It's a long time ago, but Pino's been around for a bit. So hey, happy days. And you can check it out because it's a really good talk. And most of what we say is still modern. There was a bunch of things that are different, and we're going to change in v.7. that we're going to talk about soon.
Why Pino is so fast?
[03:53] So, well, the first bit that I want to cover about Pino and how come it is so fast. How can it be fast? Well, I just want to give you a little bit of overall and thank you, James. These lights are amazing. This diagram is probably one of the best one we have ever done. This diagram explains the event loop and explains when we are doing IO and how can we achieve very performant IO in Node.js out right performance, Node.JS applications.
So first and foremost, you have data that is coming on the event loop. So you have data, you have events coming in. Some IO has finished or maybe an HTTP request has arrived. When that lands in JavaScript land, it means that, well, before doing that, the event loop called some C++. So inside the C++ code inside, Node.js, then we kick off the JavaScript function, which is a function that gets executed. Which in turn, might resolve a promise or schedule a next stick. And therefore we have to process also the micro tick cues.
[05:14] After all of that is said and done, we go back into the event loop. From the moment we start calling inside C++ to the moment that we return, the event loop is blocked. So there's no more event, no nothing. There's nothing happening IO wise in the system. This is not great. Well, or it is great in the sense that if I want to make it fast, if you make my code fast, but the time is I spend executing JavaScript needs to be as more as possible.
So what happens if you have a slow IO operation...So, let's imagine in the context of logging that you're writing to a destination and you need to do some mangling and processing of your logs before moving them out. Now, because of you're writing all those logs and then therefore you have a slow IO operation and the amount of task concurrent task increase, right? But if the concurrent task, a synchronous concurrent task increase, then it also increase the memory consumption of your application.
[06:44] But if the memory consumption increase, you have more garbage collection to do, which runs on the CPU. And so when the CPU runs, the GC grows, the memory grows, the GC grows in use. And then the CPU gets filled in the garbage collection activity, too. Now, that's the problem, right? You see? Because latency and throughput are deeply connected. So, if I have a slow operation in my code, then it will increase concurrency, which will increase memory pressure, which will slow down the CPU. And it's possible that, because of that slowdown in CPU, the buildup of events to process would go up again and enter some sort of catastrophic cycle where until the pressure is relieved somewhere, everything would be blocked.
So, the trick here is to make sure that most of the process is done as fast as it possibly can. That's it. That's the answer to this. And it does not schedule anymore work. Now, this means in the context of logging, for example, that if you are having, if you want to send the same log line to multiple destinations, it will be problematic because we have, or especially even over a network, because if we try to do that, that data will stay alive for longer. And we are creating actually more work for our event loop. In Pino, we do all the data processing synchronously. So, whenever you call that in for whatever, you can be certain that using the default that all the processing is as done within the macro tick. So there's nothing scheduled to be executed later. This is phenomenal and really important for getting good throughput because we make sure that there is no memory located left, and it gets essentially cycled very easily.
[09:12] Now, you could turn on the async mode in Pino so that the logging is then flashed after a bit. Okay? When, and is not written immediately, but it's a little bit tougher on the debugging side of things. We'll talk about that in a second.
Worker_threads
[09:33] In the original presentation about Pino from 2016, we had this light about the fact that Pino is more than a module. It's a way of life. Well, Pino, it is more than a module. And at the beginning, we flagged this because it was clear that we are proposing something radically different. And what were we proposing in 2016? Well, we were telling people, well, look, you need to implement your transport separate processes, or rely on your infra. So you log on standard output and your infra peak standard output and send it somewhere else. Yes, we still recommend that. That has not changed. However, for several teams have reached out to us and asked us, "Well, but we really need to execute the distribution in our process." The typical target is sending your log to Datadog or Elasticsearch or something like that. Well, Pino allows you to do that. Well, Pino will soon allow you to do that from the same process without blocking the main thread. We'll see that in a moment.
[10:49] What happened was that all people started asking us for the same features. They wanted to format their logs. They wanted to send it to remote destination. They wanted to implement log rotation. This was a big thing. And really, they wanted to send multiple logs to multiple places at the same time. By us saying, well, you should do all those things out of process. We were saying to all those teams, "Well, you need to have your operations people or your DevOps to take care of those things."
It was harder for most teams, to be honest. So and users keep asking those questions every day, every day, every day. So, hey, what could we do? Well, there was something that we could have done. We could have used worker thread.
[11:45] Worker threads are now stable in all long-term support release lines of Node.js 12 and 14, of course. And they offer some easy synchronization primitives, for example, Atomics, and it is great. And maybe we could use them to build transports, right? Wow. Well, how? First of all, this idea is not new. It was originally showed to me Mark Martin, at the conference a few years back, where he was actually using a native add-on to do most of this work. And it was shown immense, a really good potential in term of throughput and performance. So, it was pretty great.
So, what I did was, I started writing this library called thread-stream. What does thread-stream do? Well, it essentially wraps a worker thread in a stream based API. So it's not purely have Writable stream because it does not innate from Writable, but it's fast and you can write streams to it. You can't write buffers. So basically you called streams, right? And you just serve writing to your thread. And on the other thread, you need to provide a writable stream in. That's it. That's the only interface that you need to provide. So once you have done this thing, that has provided you both of those, it's actually pretty great because then now I have a one way to send data to my worker thread as a stream. But a stream is the interface that Pino have.
[13:24] So, let's see how to use that. Well, what we do is, we want to have our main thread send logs to our worker thread. That's what we do. So, we will have two trades, one for main and one for the logs processing.
How do they communicate? They communicate using a ring buffer. Using a ring buffer allows them to write to this buffer, mostly without locking the buffer itself. So one can write and the other one can consume without blocking between them. So, and I can always determine the part to read from that work.
[14:04] So, I'm going to introduce you to pino.transport now. I build up all this tension, and hopefully we release it right now. So pino.transport is a new function inside Pino that allow you to do log processing in a remote transport, sorry, in a worker thread, that is wrapped as a transport. You can use modules, you can specify either a top dependency or a single file, with an absolute path. Or you can use a destination file or you can use a console and prettyPrint. So, that's it. So you can actually do a lot of those things in different ways.
Demo
[14:54] So, it's time for a demo. So, what I'm going to do, I'm going to open up Kibana and we'll see our way. So this is Kibana. We have last five minutes. This is nothing. And I have my script. So, first of all, I'm going to show you the example and this is the basic example of Pino. This just log things out and to standard output. So, what we used to do is Node example, and you see it prints this, but then I could get prettyPrinting. Note that what I've done here is...That's it. Okay. So what I've done here is, I have called executed Node modules being Pino ... slash pino-pretty to prettify and pluralize my locks.
Now, what I can do instead with the transports is these new... Yeah, open anyway. Yay. Is use this new technique where I am creating my transport. I am specifying a file for Warren destination. I'm going to use a Pino Elasticsearch to send the data to Elastic, specifying the Elasticsearch note and doing prettyPrint. So, let's see how, if that works. So, I've been doing...And oh, note that this, the rest does not change at all, is exact same thing. And it's pretty cool, right? Because I can write my transport, not keep my app, but doing multiple destinations in the same process.
[17:03] Well, let's run this. So if I run example transports, I got the exact same output as before. I only got these log lines added because in fact, this shows only the warning logs. Okay? Here it is. It's pretty great. And then we can see if this was updated. And now you can see that they also have logs inside Elasticsearch. And it's really nice because you can see that we have all our data in a really easy to process function, which Kibana gives us. So it's pretty great.
Cool. I just wanted to thank you for watching this talk. If you have any more questions about Pino and Fastify and Node.js, please reach out to me @matteocollina or ping me an email at matteocollina@nearform.com. Oh, the way, we are hiring for all sorts of roles. So, if you're interested, hey, check us out. Bye.