What are some of the problems of the current Node core HTTP client, and how can we move beyond them? A talk about how undici tries to bring the next step in client-side HTTP in Node. Includes practical advice on using undici and common pitfalls, with some commentary on API design decisions and future plans, plus an extra Fetch API bonus.
Evolving the Node HTTP Client with undici
AI Generated Video Summary
The Talk discusses the current state of the Node HTTP client and the problems it faces, including the lack of support for HTTP pipelining and the intrinsic link between request and response objects. The speaker introduces the library undici, which aims to provide a more user-friendly API for HTTP in Node. The Talk highlights the performance advantages of using WebAssembly in the undici HTTP client and the plans to include it in Node Core. The speaker also mentions the support for signals and the ability to make POST requests in undici. Additionally, the Talk covers the customization options in undici, the different types of dispatchers available, and the potential inclusion of undici in Node Core. Future plans include support for HTTP 2 and 3, DNS lookup enhancements, and improvements to fetch and pool scheduling. The Talk concludes by discussing the differences in TCP implementations across operating systems and the considerations for adding Web APIs and standards to Node Core.
1. Introduction to Node HTTP Client and undici
Hello, everyone. My name is Robert Nagy. I'm the lead developer at nxtedition, a frequent contributor to Node.js, and a TSC member. I'm going to talk about the current state of the Node HTTP client and the problems we see. The response object is not actually a Node stream, it lacks support for HTTP pipelining, and the request and response objects are intrinsically linked. We have tried fixing these issues multiple times, but it's difficult without causing disturbance. A few years ago, Matteo Collina created the library undici, which aims to provide a more user-friendly API for HTTP in Node.
Hello, everyone. My name is Robert Nagy. I'm the lead developer at nxtedition, a frequent contributor to Node.js, and a TSC member. And I'm going to talk a little bit about some work we've been doing on the HTTP client in Node using a library we created called undici.
So, first of all, let's talk a little bit about the current state of the Node HTTP client. I think most people are familiar with the HTTP request API. And people that are using NPM libraries like Got and Axios, etc., these are usually just wrappers around the HTTP request call that Node Core provides. Now, this has a long history and it has worked for a very long time. But those of us that have been working on maintaining this API feel we have kind of reached the end of the line of what we can do with reasonable effort with this API.
So what are some of the problems we see here? First of all, the response object you receive from the request API is not actually a Node stream. It just pretends to be a Node stream. There are various reasons for that, both compatibility and performance reasons. But there are slight differences that can cause weird issues if you're unlucky, especially if you're using APIs that expect streams and you want to use it seamlessly. It doesn't have support for HTTP pipelining. Pipelining is a feature of the HTTP protocol that can provide some significant performance advantages. Also, the request and the response object in this API are intrinsically linked. So if you destroy, for example, the request, even though it has completed, it will also kill the response. And this linkage is very difficult to fix or remove without breaking the entire ecosystem. It's entirely based on streams, which causes some limitations on what we can achieve in terms of performance. And a lot of the internals in the API are publicly accessible, and we have ecosystem modules that depend on internal implementation details. Back then, we didn't have symbols, so it wasn't possible to actually hide things from users. All of these things cause issues that we believe are very difficult to fix without causing undue disturbance in the entire ecosystem.
Another point here is that we have tried fixing these issues multiple times, and that's caused problems and we had to revert them. I'm actually one of the people who has spent a lot of time trying to resolve these issues, and it's quite disappointing when you are forced to revert due to compatibility or performance regressions, et cetera. So here are some pull requests that just revert work that has been done towards this. So a few years ago, Matteo Collina created this library called undici, which is a new take on what HTTP can and could be in Node. And I got involved a year or so ago, and I've done a lot of work making this production ready. So what are our aims? What are we trying to achieve with undici? We want a more user-friendly API so people don't have to go to a third-party library by default.
2. Node HTTP Client and undici
We have managed to replace the entire HTTP client outside of Node core by using WebAssembly, which provides performance advantages. undici supports Keepalive and pipelining, addressing issues in Node Core. We have sorted out the problem with Keepalive in undici, and achieved almost 10 times better performance compared to the Node Core client. We are developing undici outside of core for later inclusion and have hidden all the internals behind symbols. We're also looking into implementing fetch on top of undici. It's important to consume or destroy the body to release the connection, and we provide a helper function called dump. We have support for signals.
We don't want to have any native dependencies. This is actually required in order to ship the entire HTTP client outside of Node core without native dependencies. HTTP parsing in Node is done by llhttp, which is a native library. But we have managed to get around this by using WebAssembly, and it's working very well, and we actually do see some performance advantages with using WebAssembly.
Another important thing is we want to support Keepalive and pipelining, so I'm just quickly going to explain that. Keepalive is actually something that the Node core client supports, but it's not enabled by default. And there are some things you have to think about in Node Core that undici tries to handle for you. So without Keepalive, every request you do actually creates a new connection and then closes it. With Keepalive, you can reuse the connection for subsequent requests, so you skip the overhead of closing and establishing a new connection. And with pipelining, it's actually possible to send multiple requests before the server has replied, and thereby you can reduce some of the latency of your requests. And we have spent a lot of time making sure undici supports this natively.
And as I mentioned, there is a problem with Keepalive that Node Core doesn't handle: once a connection is established, there's a timeout for which the server will keep that connection alive, waiting for further requests before closing the connection. Now, if you're unlucky, you could actually send a request at the exact moment the server's timeout happens, and the server will close the connection while your request is arriving. Some servers provide a Keepalive hint in their responses, so you can actually find out for how long the server expects to keep the connection open, and therefore the client can do things to avoid this unexpected closure. So that's something we've sorted out in undici.
We have also looked at performance. With undici we were able to achieve almost 10 times better performance relative to the Node Core client, which I have to say is a rather good result. We are developing it outside of core at the moment for later inclusion. There are some advantages to this, especially in terms of development velocity, and we've hidden all the internals behind symbols so we don't have a repeat of third-party libraries depending on implementation details. And we're also looking into implementing fetch on top of undici. undici's most basic API is undici request: you basically do an await undici request, and you get back a body, a status code, and headers. The body is a Node stream, but we have also implemented some helper functions inspired by the fetch body mixin specification, so you have body.text(), body.json(), and body.arrayBuffer(), and you can just await those; otherwise the body is a normal Node.js stream. So this is quite simple. One important note that I've noticed some people miss: even if you don't care about the body, you should consume it or destroy it to release the connection, because the way it works with keep-alive is that unless the previous request has completed, you can't actually process the next one. So it's important to either destroy the body, dump it, or consume it. The downside of destroying the body is that it will actually cause the socket to be destroyed, so we provide a helper function called dump, which will try to read the entire body so that the connection can be reused; but if the response from the server exceeds a certain threshold, it chooses to eventually close the connection, so you don't have to download a gigabyte of data before being able to reuse the connection. And yes, if you don't do this, then the connection won't be freed for the next request.
3. Support for Signals and POST Requests
We have support for signals and the ability to make POST requests. However, it is recommended to avoid using HEAD requests due to compatibility issues with servers. Instead, you can use a range request or read one byte of information to achieve similar functionality.
We have support for signals like any good citizen of the promise world. You can also make POST requests; you just pass the body as an option to the request method. Please avoid using HEAD. HEAD requests have some limitations due to compatibility issues with servers that might or might not send a body with the HEAD response, and therefore undici actually always chooses to close connections to be safe. A workaround, in order to maintain similar functionality with the same performance, is to just use a range request if possible, or you can read one byte and receive a lot of the information that you otherwise would have used a HEAD request for.
4. Configuring Keepalive and Stream API in Undici
Keepalive provides options for customizing request settings. A dispatcher called an agent dispatcher can be created in undici to change these settings. You can configure the Keepalive timeout, pipelining depth, and the number of connections to a server. If a request in the pipeline fails, undici will retry subsequent idempotent requests. The stream API can be used to avoid the unnecessary performance overhead of the readable glue stream in the request API.
Keepalive provides some options. You can create your own custom so-called dispatcher in undici, which allows you to change some settings on how requests are made. So here we're creating a dispatcher called an agent dispatcher, which has some options. We can have a Keepalive timeout: this is how long we expect the server to keep the connection open. This should probably be lower than what you expect the server to do. If the server provides Keepalive hints, then those will actually override whatever setting you use here, so you can actually be quite aggressive in setting it low.
We also have a limit: if the server provides a hint, that hint could be two days, and maybe you don't want to use that, so we have a max threshold. And also, since the connection's timeout actually counts from the moment the server sent the response, there is a delay before the response is received by undici. So we have a timeout threshold, which basically takes the transport latency into account. If the timeout is five seconds, we will remove one second, so the client assumes there are four seconds left until the connection will be killed by the server. That's all with that.
You can also configure pipelining. undici does not do pipelining by default. So you can configure here how deep of a pipeline you want: how many requests undici can send before having received a response. What the best value is here is a little difficult to say. It depends on your use case, but two or three is usually fine. And then you can also choose how many connections undici may make to a server. If you make a request to a server and there's no connection available, then undici will start a new connection. This way you can limit the number of connections, similar to how it works in Node Core.
An important thing to keep in mind is, if you have multiple requests queued in the pipeline, and the one at the head of the pipelining queue fails, then undici will automatically kill the connection and retry everything after the one that failed, if those requests are idempotent, like GET and PUT requests. Otherwise, you will have to retry things yourself. Now, there are some other APIs we have been experimenting with in undici. One of the downsides of the request API is that it always creates this readable glue stream. So, basically, undici reads from the socket, parses it with llhttp, and then writes it to the readable stream, and then you read it from the readable stream to use it. And this has the unnecessary performance overhead of the glue. You can avoid that by using the stream API, which basically allows you to return a writable stream that the response should be written to, instead of this intermediate glue stream. As you noticed before, there is always a closure created here.
5. Undici API and Dispatchers
We provide an option to send in an opaque property for micro-optimization. We have undici pipeline for easily creating transform pipelines over HTTP requests. undici is extensible and uses dispatchers to handle different kinds of requests. There are different types of dispatchers available, including the undici client, undici pool, balanced pool, undici agent, and proxy agent. All these methods use the global dispatcher by default, but you can pass a dispatcher in the options to use a different one.
So, we provide an option to send in an opaque property, which will be provided in the callback. That way, the V8 engine doesn't need to create the closure. This is a little bit of a micro-optimization, but the option is there.
We also have undici pipeline, which exists in order to easily create transform pipelines over HTTP requests. So, you post data into an HTTP request and then you get a data body back. And then you can use it with stream pipelines, for example.
You don't actually need this. You can achieve the same thing with undici request and a generator. It's a little bit of extra code, but it's just as good. This is also one of the reasons why we are developing undici outside of core: in order to experiment with different API ideas. Now, undici is very extensible. It basically uses the concept of a dispatcher to dispatch different kinds of requests. The API is rather simple: you need to implement these methods and also the one event, called drain. I will not go into details there. But we provide a few different types of dispatchers that are usable.
So, first, we have the undici client, which is the most basic one. It provides you with a single connection to the server, provides support for keep-alive, and also allows you to follow redirect responses. Then we have the undici pool, which creates multiple clients and dispatches requests over multiple connections, and it uses depth-first scheduling: if every client has a pipelining value of two, it will first fill the pipeline before going to the next connection. We have a balanced pool; while the previous pool will always connect to the same server, so it will have multiple connections to the same server, the balanced pool allows you to connect to different servers and have requests balanced over those, and that also uses depth-first scheduling. The undici agent is built upon the pool, and it allows you to have multiple different origins, and it will also follow cross-origin redirects. We also have a proxy agent, which allows you to proxy through servers. I won't go into details there. And all of these methods I have been showing you, like undici request, use something we call the global dispatcher, which is the default, and you can change it yourself. All calls that don't explicitly set a dispatcher will use that. But you can also pass a dispatcher in all of these APIs in the options, and then that dispatcher will be used.
6. Customization and undici in Core
We support follow redirects and allow customization of server connections. You can provide a connect method to implement your own custom socket or DNS lookup. We have been discussing including undici in core, considering the pros and cons. Implementing fetch based on undici is in progress, with experimental support in Node 17. Contributions to the fetch implementation are welcome. There are some differences with web behavior, particularly in garbage collection in Node.js.
As I mentioned before, we support follow redirects: you just pass maxRedirections, so how many redirects you allow, and undici will take care of that. We can also customize how we connect to servers. You can provide to some of the dispatcher implementations a connect method, which, as you can see, takes some parameters, and then you can have your own custom socket implementation or custom DNS lookup logic.
So for example, if you have a QUIC implementation, you could run HTTP over QUIC here by just providing an object that looks like a socket but goes over QUIC. Some of the dispatchers have factory methods. For example, pool and agent use dispatchers internally, so they have their own logic and then dispatch onto a sub-dispatcher, and you can change what that is. In the agent case, the default factory checks how many connections you have passed: if it's one, then it uses a client, because that has less overhead since you don't have to manage connections; otherwise it uses a pool. But here you could use your own custom implementation and reuse the agent logic.
We have been discussing including undici in core. It's an ongoing discussion with pros and cons, why we should include it or why not. I would just say look at the issue and read through; there are a lot of thoughts there. One of the important advantages of having it outside of core has been the development velocity. But as things get more stable and more thought through, there might be a time to include it in core. Now, we have also spent a lot of work implementing fetch based on undici, and we are actually landing experimental support in Node 17, under the experimental fetch flag. So now we're getting close to finally having fetch in core. If you want to help out making it non-experimental, then please head over to the undici repo and help us improve the tests. There are always spec compliance edge cases that can be found and sorted out. It's quite easy to contribute to the fetch implementation in undici. We are using a very literal way of implementing the spec in undici: basically, we take the spec and try to follow the spec steps as literally as possible. So if you want to help out, just read the spec. There are some to-dos here that you can see if you can figure out. But it's reasonably easy to contribute. We do have some differences with web behavior, primarily in how garbage collection works. In the browser, fetch will automatically garbage collect. I was talking before about how you always have to consume the body; that's the same in fetch. The browser automatically does that during garbage collection, but garbage collection in Node.js works a little bit differently.
7. Focus on Usability and Future Work
We are currently focusing on usability and compliance, with performance improvements in the pipeline. Future work includes support for HTTP 2 and 3, DNS lookup support for the balanced pool and pool, and enhancements to fetch and pool scheduling. We also aim to address the lack of support for HTTP 100 Continue requests from the HTTP 1 spec. Thank you for your attention.
So, we can't do that. We are also still missing File: implementing the File class from the web spec in core is needed for full spec compliance. And right now, we are focusing on usability, making it easy to maintain and contribute to, and on compliance. Performance is not the priority at the moment, but we are, of course, looking at improving that as well. I personally would recommend using the other APIs if performance is the important part. I think fetch is good for usability and for cross-browser and Node compatibility.
Future work. Of course, we would love to support HTTP 2 and 3, so that's something we're working on. It would be great to have DNS lookup support for the balanced pool and the pool. We're looking at improving fetch. Also pool scheduling: the way requests are dispatched and scheduled over multiple connections. There are interesting things you can do there. Should it be depth-first or breadth-first scheduling? Should we weight different connections based on their latency and how many times requests have failed on a connection or upstream? So, there are a lot of interesting things to do there. And we're still lacking support for HTTP 100 Continue requests from the HTTP 1 spec.
And that's all I have. If you have any questions, I hope to be able to answer them. Thank you. It seems like we have a clear winner: 48% of people use fetch. Are you surprised by this at all? Both yes and no. We just recently landed fetch as experimental in Node.js, so I guess that move was good given this result. And personally, I'm more performance minded, so I use request, which is the least common one here in the voting, which is also quite surprising. So yeah, that's a nice hint on where to put most energy.
Node Fetch Implementation and Q&A
The Fetch in Node is built on top of undici, not the normal Node HTTP code. Nock doesn't work with undici, but undici has its own mocking implementation. Node Core does not currently support QUIC, but there are plans to add support in undici. Benchmarks show that reusing connections in undici is significantly faster than opening and closing them all the time.
Yeah, that sounds great. Our audience has a lot of questions for you. But I have even more. One thing, so you talked about the Node implementation of Fetch. And I'm also quite excited about that. I hope everyone is. Does it use undici? Does it have anything in common?
Yes. So actually, the Fetch in Node is built on top of undici. We don't expose anything from undici other than Fetch in Node, and that's a conscious decision in order not to expose two experimental APIs that can later change and break. But the implementation of Fetch is done on top of undici, and not the normal Node HTTP code.
That's awesome. I suppose that everybody running Node would essentially be using undici now. So that's the best way to…
Oh, we have a question from a viewer, Joda Developer, who asked: how do you mock a request in undici? For example, there is the Nock library that allows you to mock requests. So if you're using undici requests, have you tried mocking them? And how would you do that?
So Nock doesn't work because it depends on the internals of the HTTP client. We have mocking implemented in undici. So there's mock client and mock pool and the tools that are required to basically do the same thing you would do with Nock. So it's all there.
So Node Core does not support QUIC yet. I think it landed as experimental, but was later removed. There is a PR that James is working on for landing it, unless I missed something. Once that's landed, it should be quite easy to add support for it in undici, to implement HTTP 1 over QUIC. Implementing HTTP 3 support is a little bit more work.
That sounds really interesting. Thank you. Another question is, is there any benchmark on the overhead from creating a new connection every time as opposed to the undici approach?
So we have some benchmarks included. And yeah, it's hard to say where all our performance improvements come from. And again, it depends on your use case. But I would say it's significantly faster to keep and reuse connections than opening and closing them all the time, especially in terms of latency.
Differences in TCP and the Future of HTTP in Node
Are there any notable differences across operating systems in terms of TCP implementations? The current plan is to include Undici into Node Core, but there are considerations regarding stability, API surface, and the future of HTTP in Node. Adding Web APIs and standards to Node Core is a debated topic, with arguments for both tailored and uniform solutions. Fetch, although specific to browsers, improves developer experience and familiarity. It may have lower performance compared to other APIs, but it offers convenience for quick implementation.
Are there any notable differences from operating system to operating system? Like, do the TCP implementations actually make a difference here? In terms of CPU overhead, yeah, maybe. But you still need to perform the actions required by the TCP handshake. So it's more of a network protocol thing.
Okay. Well, the next question is: what is the current plan or progress on including undici into Node Core? We just talked about fetch and how fetch is running on top of undici. I remember in the past there have been multiple requests to add things like WHATWG streams to Node. Do you think this fetch milestone we achieved would actually open the door to a number of other things? The primary focus has been to implement the web standards into Node.js, because those are well specified and there are not many breaking changes to expect in the future. If and once we land undici into core, that does put some restrictions on what we can do in undici and how quickly we can do things. So there's always a trade-off between having it in Node Core and giving it a mark of "we won't break stuff", or having it outside and keeping up the development velocity. I think we are getting near the point where undici feels stable in terms of the API surface, so it might be worth pulling it into core. But I would like to see a bigger plan, in terms of both server and client side, on how we want the future of HTTP to look in Node before we start including new APIs.
Yeah, that is really interesting. One thing: you mentioned adding more of these Web APIs and Web standards, so to say, into Node Core. Now, this is a hotly debated thing, right? Not everybody agrees that adding things like Fetch to Node is a good idea. One of the arguments is that Fetch et al. are designed by bodies like WHATWG, who focus on browser use cases, and so an alternative HTTP implementation that cares more about server-side cases would be more appropriate for Node. How do you think this stacks up? And, for example, there are so many things in Fetch that are specific to browsers, like cookie jars. How does that work in undici? Yeah, I mean, this is always a trade-off between making a tailored solution and a uniform solution. I think a lot of the point of Fetch is improving the developer experience, making it easy to write good code, right? Because people work in the browser, so they need to know Fetch anyway, and they know the ins and outs and the edge cases there. Having to learn a third and fourth and fifth and sixth API, even if those are better for specific use cases, has some downsides. For example, undici fetch has lower performance than the other APIs we provide, like undici request. But it's still a good idea to include it, because it helps with developer familiarity, and developer hours are much more expensive than buying faster hardware. So that's my take on the whole thing. If you just want to get things done quickly and get them to work, go with undici fetch. If performance is your top priority, then I would recommend some of the other APIs that are yet to be included.
Undici Fetch Implementation and Future Plans
There are certain implementation details in Fetch that only make sense in a browser context, like cookies. In undici, we try to follow the spec literally, even if it's not possible in a Node.js environment. Redirects in undici fetch work differently than in the browser, following the approach of Cloudflare and Deno. Garbage collection and consuming responses are also considerations. undici has inspired fixes in Node Core streams. Future plans for undici include support for QUIC, HTTP 2, and HTTP 3. The File class from the web spec is awaited for full Fetch implementation.
Yeah, that actually sounds really great. Thank you. I think I made the question too long, but I tried to hammer in a second part: there are certain implementation details in Fetch which only make sense in a browser context, things like cookies and so on. So how does that work in undici? Good question, and that's something we have to handle on a case-by-case basis. The way we implement Fetch in undici is that we try to follow the spec literally. So we actually implement spec steps that are not possible to reach in a Node.js environment, hoping that in the future we can either modify the spec or make our own modifications to make things work.
For example, redirects in undici fetch work a little differently than in the browser; we have followed the way Cloudflare and Deno do it. There are also issues like garbage collection that can affect how the implementation behaves, and whether it's important to fully consume all responses or not. In undici fetch you have to consume responses, while in the browser it's a little nicer: it will clean up during garbage collection.
Yeah, thanks. That was really insightful. We have another question from a viewer, who asks: is undici your main drive for the many fixes that you've made to Node Core streams? Or, in other words, was the complexity of streams a roadblock for undici development? I guess it was, but I didn't even realize it at the time. I've spent a lot of time working on HTTP streams and the HTTP client and server in Node, and a lot of work was needed to get streams working in a good way. I had a talk at NodeTLV about this issue with streams, and once I got as far as I could with streams and hit the roadblock with the HTTP stuff, that's when I moved on to undici. And yes, there are some fixes in streams inspired by issues I found in undici.
Yeah, that's quite interesting. I remember that this thing about reforming Node streams has been such a long topic, and people have thought about it for so long. I'm glad that you're making progress on this. Another question that I had for you: you already have a bunch of interesting features, so to say, in undici. Is there any other web standard or web feature that's on your radar? Is there something that you plan to do next? A little of what we talked about: there's QUIC and HTTP 2 and HTTP 3 that we are looking at in undici, and of course in Node Core we are waiting on the File class from the web spec to be able to fully implement Fetch. Great. Well, that's amazing. Thank you so much, Robert. This was an amazing presentation, and it was quite informative, at least for me, and I hope the same applies to all of our viewers. If you're not done asking all these questions to Robert, you might want to join him in the speaker room; there's a link to the speaker room. Join him on Spatial Chat. Thank you so much, Robert.