Push Notifications: Can’t Live With Em, Can’t Live Without Em


Consider how many notifications you get per day... and now consider the millions of other people who are also receiving notifications. 16 million notifications a day that have places to be and people to see, in a race against time, load, and latency.

So what happens behind the scenes to ensure that all those notifications get where they need to go, and quickly? A combination of auto-scaling, RabbitMQ, scrupulous monitoring, and a tireless dev team. In this talk I’ll discuss the functionality of the message bus at the core of Vonage’s communications platform, the Node scripts that power the entire operation, and how you can use similar solutions for a number of different challenges.

9 min
24 Jun, 2021

AI Generated Video Summary

The Talk explores the journey of a notification in a communications platform, highlighting the challenges of infrastructure engineering. Trace IDs and local storage play a crucial role in ensuring the arrival of notifications, allowing for easy debugging if they don't reach the device. The logs demonstrate the journey of a notification, reaching the app store in just 4 milliseconds.

1. The Journey of a Notification

Short description:

When I was growing up, I loved a children's book about a louse who travels the world and finds a perfect match. This story reminded me of the journey of a notification in our communications platform, which must arrive in milliseconds. Let's explore the real-life challenges of infrastructure engineering and the steps involved in delivering notifications to different devices, including desktop and mobile apps.

When I was growing up in the USA to Israeli parents, most of my books and movies from an early age were actually in Hebrew, of course, to teach me the language before the English took over my brain.

One tape that I particularly loved was called Hakina Nekhama, Nekhama the Louse, which if you're not familiar with this word, is the singular version of lice. Gross, I know.

The story told of this louse who decided that she doesn't want to stay in one head forever. She wants to get out there and travel the world, see different heads and different cities. But obviously everybody hates her and wants her gone, and she journeys tirelessly from one head to another until she accidentally lands on the head of a bald man, who is actually very excited to have her because now he's got the same problem as people with hair. They become friends and live happily ever after.

Now this is obviously a ridiculous story. But when I joined Vonage last year and learned about our communications platform and how many messages it handles per day, I considered the one-in-a-million little notification making its way on a rapid-fire journey of 5, 10, just milliseconds, arriving where it needs to be. It suddenly made me think of that pink little louse I loved as a kid, one in a million, who with purpose and ambition eventually made it to the head of the bald man who wanted her, her perfect match, precisely where she needed to be. Miraculous, isn't it?

Here's the thing, though: in the world of children's books, it's happily ever after, but in real life, things don't go as planned. Objects get lost, and notifications never make it to their destination. So let's talk about this real life of infrastructure engineering, of tracing our steps and ensuring that we know what's going on at every point of the dangerous journey. But first, a little context.

The Vonage Business Communications platform, the system I'll talk about today, offers capabilities for voice, messaging, video, and voicemail on both desktop and mobile. The platform is home to 150,000 users who through all these functionalities produce 800 notifications per second. And the kicker? All those notifications get to where they need to be in 15 milliseconds, because that's the kind of standards we're used to nowadays. So what does this 15 millisecond journey of a cute little notification look like? Let's start at the beginning.

Say you're on your computer using the desktop app. Your laptop, through the client, connects to the messaging infrastructure that we call the bus station and makes an introduction: "Hey, I belong to John's computer. I'd like to sign up for notifications." The bus station pings its API, which we named Frizzle because the children's book references never end, to tell it about the new connection. All the identifying information gets stored. Frizzle communicates with the message broker, which creates a new queue for your particular user and, with the help of the message protocol, determines which queue to put your messages in. The protocol returns a URL, and thus a WebSocket connection is formed between your client and this entire thing that we call the bus.

Now on the other end, a message gets sent to you. It hits the Frizzle API belonging to the bus, Frizzle sends the notification to RabbitMQ, and the message protocol sends it to your device. Cool.
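
The registration and delivery flow described above can be sketched roughly as follows. This is only an in-memory stand-in written for illustration: it is not RabbitMQ, not the real Frizzle API, and not a WebSocket server. The class name `BusSketch` and its methods are hypothetical. It shows just the core idea of one queue per user, created on sign-up, with notifications routed to whichever clients that user has connected.

```javascript
// In-memory stand-in for the bus (hypothetical; not RabbitMQ or the real Frizzle API).
class BusSketch {
  constructor() {
    this.queues = new Map();      // userId -> notifications waiting for delivery
    this.subscribers = new Map(); // userId -> delivery callbacks, one per connected client
  }

  // A client introduces itself; its user gets a queue if there is none yet.
  register(userId, deliver) {
    if (!this.queues.has(userId)) this.queues.set(userId, []);
    if (!this.subscribers.has(userId)) this.subscribers.set(userId, []);
    this.subscribers.get(userId).push(deliver);
    this.flush(userId);
  }

  // A sender publishes a notification addressed to one user.
  publish(userId, notification) {
    if (!this.queues.has(userId)) this.queues.set(userId, []);
    this.queues.get(userId).push(notification);
    this.flush(userId);
  }

  // Hand queued notifications to the user's connected clients, if there are any.
  flush(userId) {
    const clients = this.subscribers.get(userId) || [];
    if (clients.length === 0) return; // nobody connected: keep messages queued
    const queue = this.queues.get(userId);
    while (queue.length > 0) {
      const notification = queue.shift();
      clients.forEach((deliver) => deliver(notification));
    }
  }
}
```

In the real system the delivery callback would be a WebSocket connection and the queue would live in the message broker; the sketch only mirrors the routing decision.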

But you're wondering now, what about cell phones? The mobile app? I can't exactly maintain a WebSocket connection with the phone at all times, can I? So what about those notorious push notifications from the title of my talk? And more importantly, how am I following the progress of those notifications? Because with 800 notifications per second, it can be very easy to lose track. Well, when you fire up your mobile app, a similar flow happens. Your phone connects to the bus station and introduces itself with your ID, information about your device, and something called a push token, assuming you clicked Allow Notifications in the app. The API handles and stores away this information, and you are now signed up to receive notifications. Now, when Frizzle sends the notification to your queue, that notification also gets sent in parallel to what we call the bus HTTP service. Written in Node, the service knows whether you've signed up to receive push notifications.

2. The Role of Trace IDs and Local Storage

Short description:

If you have a notification and want to guarantee its arrival, trace IDs in the logs play a crucial role. By using a continuation-local storage package such as cls-hooked, you can save the trace ID in local storage, making it accessible for any log or request at any point in the flow. In the HTTP service, middleware grabs the trace ID from the headers and adds it to the session's local storage. The trace ID is then included in logs and requests, so the logger doesn't need to know about it. Finally, when the notification is sent to PushMe, the trace ID is attached to the request headers, allowing for easy debugging if the notification doesn't reach the device. These logs demonstrate the journey of a notification, reaching the app store in just 4 milliseconds.

If you have, the notification and your information get sent to PushMe, another Node service for push notifications. This service has a database of those push tokens mapped to each user, which tell it which operating system and which device to send that push notification to. Now with an infrastructure in place, how can we guarantee arrival for 16 million daily notifications? The obvious answer is the trace IDs in the logs, ensuring that the same ID is always used for the specific notification in each service, right? But how do we do that while separating business logic from infrastructure? We don't want a trace ID showing up all over our code now, do we? A handy npm package comes to the rescue.

And this part's important even if you're not trying to build a massive communications platform. I'm talking about a continuation-local storage package, cls-hooked, that uses async hooks in order to maintain something like a local storage for each request session. If you're already using Node 14, though, this functionality is built in with an experimental native API. It allows for the trace ID to be saved in local storage so that it can be retrieved for any log and any request at any point of the flow.
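
As a minimal sketch of the idea, here is how per-request storage looks with `AsyncLocalStorage` from Node's built-in `async_hooks` module, which is most likely the native API referred to above. The helper names (`withTrace`, `currentTraceId`, `lookUpDevice`) and the trace ID values are made up for illustration and are not taken from the Vonage code.

```javascript
const { AsyncLocalStorage } = require('async_hooks');

// One storage object per process; each run() call gets its own isolated store.
const traceStorage = new AsyncLocalStorage();

// Run `handler` with the trace ID available to everything it calls,
// including code that resumes later after awaits or timers.
function withTrace(traceId, handler) {
  return traceStorage.run({ traceId }, handler);
}

// Read the trace ID of the current request, or undefined outside any request.
function currentTraceId() {
  const store = traceStorage.getStore();
  return store ? store.traceId : undefined;
}

// Hypothetical business logic: note that no trace ID is passed as a parameter.
async function lookUpDevice(userId) {
  await new Promise((resolve) => setTimeout(resolve, 5));
  return { userId, seenWithTrace: currentTraceId() };
}

// Two overlapping "requests" each keep their own trace ID.
async function demo() {
  return Promise.all([
    withTrace('trace-a', () => lookUpDevice('john')),
    withTrace('trace-b', () => lookUpDevice('jane')),
  ]);
}
```

Because the store follows the asynchronous call chain, `lookUpDevice` sees the right trace ID even though both requests are in flight at the same time.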

So, let's consider this part of the diagram. Imagine a push notification headed for your phone telling you it got a message from Jonathan. As we know, Frizzle sends this notification as a request to the bus HTTP service. Upon arriving at the HTTP service, middleware grabs the trace ID from its headers and places it in the session's local storage. Various logs are written as different things happen, like asserting arrival, for example, and finding information about the user's device. And each time, that trace ID is added to those logs before they're written. Assuming you're accepting push notifications, the interceptor grabs the notification right before it heads to PushMe and tacks the trace ID onto the headers of the request. What's cool is that cls-hooked can be used for other things, not just logs, like in a situation where you need the user's details in every step throughout the flow of your service.

But let's see what it looks like in our code, which has all been taken from the HTTP service. We first create an instance of the middleware like this, and we retrieve it later like this. So when a request arrives at the service, it is intercepted by the tracing middleware, which checks whether it has a trace ID (assuming it arrived from Frizzle, it should) and then saves that ID to the local storage instance. And at this point we're handling business logic, right? We're checking if we should send this guy a push notification. At each point of the way where a log is to be written, and before it gets sent, it will be caught by the middleware here, which will add the trace ID by taking it from the local storage and then writing the log. Now the same kind of interceptor sits on Axios, the HTTP client, so that every time a request is about to go to another service, we catch that HTTP request right before and add the trace ID to its header. And all of this ensures that the logger doesn't need to know anything about the trace ID. Cool, right?
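
The slides are not part of this transcript, so the following is only a rough sketch of the three pieces just described, not the actual Vonage code. It uses Node's built-in `AsyncLocalStorage` rather than cls-hooked, and the header name `x-trace-id`, the fallback to a random ID, the PushMe URL, and all function names are assumptions made for illustration.

```javascript
const { AsyncLocalStorage } = require('async_hooks');
const { randomUUID } = require('crypto');

const TRACE_HEADER = 'x-trace-id'; // assumed header name
const traceStorage = new AsyncLocalStorage();

// Express-style middleware (req, res, next): save the incoming trace ID
// for the rest of the request. Generating one when it is missing is an assumption.
function tracingMiddleware(req, res, next) {
  const traceId = req.headers[TRACE_HEADER] || randomUUID();
  traceStorage.run({ traceId }, next);
}

// Build a log entry, adding the trace ID from local storage just before writing.
function buildLogEntry(message, fields = {}) {
  const store = traceStorage.getStore();
  return { ...fields, message, traceId: store ? store.traceId : undefined };
}

function log(message, fields) {
  console.log(JSON.stringify(buildLogEntry(message, fields)));
}

// Shaped like an HTTP client request interceptor: takes a request config,
// returns a config with the trace ID header added.
function addTraceHeader(config) {
  const store = traceStorage.getStore();
  if (!store) return config;
  return { ...config, headers: { ...config.headers, [TRACE_HEADER]: store.traceId } };
}

// Hypothetical business logic: the trace ID never appears here.
function handleNotification(req) {
  log('checking push subscription', { userId: req.body.userId });
  // Hypothetical URL for the push service.
  return addTraceHeader({ url: 'http://pushme.internal/send', data: req.body });
}
```

With Express and Axios installed, wiring this up would look something like `app.use(tracingMiddleware)` and `axios.interceptors.request.use(addTraceHeader)`, so neither the handlers nor the logger callers ever touch the trace ID directly.
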
When PushMe finally sends that notification to your device, we'll have that final log with the trace ID in question, and we know we sent that notification to the app store. That way, if a notification was sent to your phone and you didn't get it, we can pretty much blame Apple, because don't we love blaming Apple? And this is what a series of logs looks like for a notification of an inbound call. If you look at the timestamps, you'll notice that it's going from Frizzle to the HTTP service to PushMe in a matter of 4 milliseconds. So this may not be the heroic journey of a cute little parasite around the world, but hey, for all you know, these could very well be the logs of a little notification that made it around the world in under 15 milliseconds.
