Getting Started with Open Source Observability

Learn how to instrument Node and client applications using OpenTelemetry and popular open-source observability tools like Prometheus and Grafana.

Connor Lindsey
21 min
24 Oct, 2022

Video Summary and Transcription

Observability is a crucial tool for engineers to ship reliable websites and have a good developer experience. Metrics, logs, and traces are fundamental data types for achieving observability. OpenTelemetry is an open source standard for observability data, and instrumentation can be done manually or automatically. Data collection and transportation can be handled by packages built by the OpenTelemetry community, and running a collector can add security benefits. Grafana is used for data visualization, and it allows for analyzing web performance, exceptions, and traces. Traces capture user interactions and provide insights into user behavior and error occurrences. Leveraging these tools allows for gaining insights into various parts of the system and making progress toward engineering goals.

1. The Story of Strong Bad Inc.

Short description:

Let's talk about the fictional but not entirely unrealistic story of Strong Bad Inc., a young software company. Strong Bad Inc. is an email advice platform where people can go online and ask questions, which Strong Bad will answer. On his first day, after getting his dev environment set up, Homestar Runner was super excited to pick up his first ticket and add a tags input to the email submission form. After the change was merged, the website went down; the team found a fix and were able to ship it to production, and the website was back up. As engineers, we want to ship reliable websites, debug issues quickly and confidently, and have a good developer experience. Observability is one tool in our toolbox to help achieve those goals.

Let's talk about the fictional but not entirely unrealistic story of Strong Bad Inc., a young software company. My name is Connor Lindsey and I'm a software engineer at Grafana Labs.

Strong Bad Inc. is an email advice platform where people can go online and they can ask questions which Strong Bad will answer. Like, can you draw a dragon or do you like techno music? Strong Bad is a good developer but he wanted a little bit of extra help around the office so he brought on his friend Homestar Runner to help out with some development tasks.

On his first day, after getting his dev environment set up, Homestar Runner was super excited to pick up his first ticket and add a tags input to the email submission form. Everything was great: he opened his first PR, it got approved, and everyone was ecstatic when it got merged into production. He was so excited to be contributing code on his first day, only to find out after they came back from lunch that the website had been taken down. They had no idea what was going on, so they walked over to customer support to see what was happening, and indeed they confirmed that the system was down. After a couple of hours of stressful debugging, they found a fix and were able to ship it to production and, hooray, the website was back up. But they were left wondering what caused the bug. Even after debugging, they weren't 100% sure what the root cause was, and even with the fix, they didn't have much confidence about what had happened or how they could prevent it in the future.

So Homestar was wondering to himself, there's got to be a better way. As engineers, we have the same goals. We want to ship reliable websites. When we have issues, we want to debug them quickly and confidently, understanding the root cause of what is going on. We want performant websites, and we want to understand the performance issues that are happening so we can improve them. Ultimately, we want reliable and performant sites so we can offer a positive user experience, so that people can go to our websites and successfully accomplish what they came there to do.

We also want to have a good developer experience. It should be as painless as possible to write, ship, and maintain code. I think one of the most frustrating things as a developer is trying to reproduce a really difficult-to-reproduce bug where you have no leads and no idea where to get started, and you're left wandering aimlessly through your logs, through whatever breadcrumbs you have.

So, observability is one tool in our toolbox that we can reach for to help achieve some of those goals. This concept comes from control theory, and a formal definition would be the ability to infer the internal state of a system based on its outputs. So, when we think about our front-end applications running on end-user devices, they're kind of like a black box.

2. Types of Observability Data

Short description:

We can gain insight into how our applications are running in production by using observability. Metrics, logs, and traces are fundamental data types that help achieve observability. Metrics are numeric, aggregated data, so some detail is lost, which is why traces and logs are important for more detailed insights. Traces give a detailed overview of request flow across service boundaries, while logs provide a linear stream of events. Having all these data types and correlating them is crucial for achieving observability.

We don't have full control or insight into what is happening when they're running. You know, it works on my machine, but I have no idea why it's broken on a customer's machine. And so, observability as a concept can help us gain insight into how our applications are running in production in the same way that we do when we're running them locally, where we have full insight into exactly what's happening. We have all of our logs, we have our dev tools, we can see the network tab, et cetera.

We want to have some of those same tools and some of those same insights in production that we have when running locally. So, when talking about observability, let's talk about some of the tools, some of the things that we can reach for to achieve that goal. One of them is the different types of data that we can collect. Metrics, logs, and traces are kind of the fundamental data types that we have to work with. These in and of themselves do not mean that our systems are observable, but they're starting points. They're tools that we can work with to achieve observability.

So, metrics are numeric or aggregated data types. And because of the aggregations that occur to get these metrics, you lose some level of detail, which is why traces and logs are really important accompanying tools that we can reach for. Some metrics that we can see on this dashboard are things like memory versus CPU usage, the number of requests that our servers are getting, or, for the frontend, the distribution of page load times.
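As a rough illustration of that trade-off, here is a minimal TypeScript sketch that aggregates page load times into a handful of numbers. The sample values, the `summarize` helper, and the nearest-rank percentile are assumptions made for this example; they are not taken from the talk or from any particular metrics library.

```typescript
// Hypothetical page load samples in milliseconds (made-up values).
const pageLoadMs: number[] = [120, 135, 150, 160, 180, 210, 250, 400, 900, 3200];

interface LoadSummary {
  count: number;
  mean: number;
  p50: number;
  p90: number;
}

// Nearest-rank percentile computed over a sorted copy of the samples.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("percentile needs at least one sample");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Collapse many individual measurements into a few aggregate values.
function summarize(samples: number[]): LoadSummary {
  const total = samples.reduce((sum, value) => sum + value, 0);
  return {
    count: samples.length,
    mean: total / samples.length,
    p50: percentile(samples, 50),
    p90: percentile(samples, 90),
  };
}

const summary = summarize(pageLoadMs);
console.log(summary);
```

The summary says that most loads are fast and a few are slow, but it no longer says which request took 3200 ms or why. That missing detail is what traces and logs are there to supply.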

The next data type is traces. Traces, similar to a stack trace, will give you an overview of a request as it passes through your system. Unlike a stack trace, which you get when you are running a single application, a distributed trace, when talking about observability, can go across service boundaries. So, for example, you could see a trace of what happens when a user logs into your website. Some operations occur on the frontend, which then makes an HTTP request to your backend, which then goes and makes a request to a caching layer or a database, for example. Traces give you a lot of detail about how the request flows through all of those systems and can be really, really useful.
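To make the login example concrete, here is a small TypeScript sketch of how spans from different services could be linked into one trace through a shared trace ID and parent span IDs. The `Span` shape, the IDs, and the timings are simplified assumptions for illustration; real tracing libraries such as OpenTelemetry record more fields and generate the IDs for you.

```typescript
// Simplified, hypothetical span shape (not a real library type).
interface Span {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  service: string;
  name: string;
  startMs: number;
  endMs: number;
}

// One made-up login request as it crosses service boundaries.
const spans: Span[] = [
  { traceId: "t1", spanId: "a", service: "frontend", name: "submit login form", startMs: 0, endMs: 480 },
  { traceId: "t1", spanId: "b", parentSpanId: "a", service: "backend", name: "POST /login", startMs: 40, endMs: 450 },
  { traceId: "t1", spanId: "c", parentSpanId: "b", service: "cache", name: "lookup session", startMs: 60, endMs: 75 },
  { traceId: "t1", spanId: "d", parentSpanId: "b", service: "database", name: "select user", startMs: 80, endMs: 410 },
];

// Walk the parent/child links and render the trace as an indented tree.
function renderTrace(all: Span[], parentId?: string, depth: number = 0): string[] {
  const lines: string[] = [];
  const children = all
    .filter((span) => span.parentSpanId === parentId)
    .sort((x, y) => x.startMs - y.startMs);
  for (const span of children) {
    const indent = "  ".repeat(depth);
    lines.push(`${indent}${span.service}: ${span.name} (${span.endMs - span.startMs} ms)`);
    lines.push(...renderTrace(all, span.spanId, depth + 1));
  }
  return lines;
}

const traceLines = renderTrace(spans);
console.log(traceLines.join("\n"));
```

Reading the tree top to bottom shows where the time went for this one request, which is the kind of detail an aggregated metric cannot give you.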

Next, we'll look at logs. These are a linear stream of events, which are really handy to have but can often feel like a firehose. So, having good formatting and a good log aggregation tool makes logs a lot more useful and a lot easier to work with. Individually, each of these data types is really powerful, but they have different pros and cons as to the type of information that they're showing, their cost, their performance implications, you know, what it takes to operate them. And so, having all of them and being able to correlate them is super, super useful. For example, when we think about a metric, it's this aggregated high-level overview of a single data point, whereas a trace is a single request that's passing through a system.
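One way to picture "good formatting" and correlation is structured logging: each log line is a JSON object that carries a trace ID, so the lines for one request can be pulled out of the firehose. The sketch below is a minimal illustration in TypeScript; `formatLog`, `logsForTrace`, the field names, and the sample messages are all hypothetical and not part of any specific logging tool.

```typescript
type Level = "debug" | "info" | "warn" | "error";

// Build one structured log line as JSON. Extra fields (such as a trace ID)
// are merged in after the core fields.
function formatLog(
  level: Level,
  message: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({ timestamp: now.toISOString(), level, message, ...fields });
}

// Made-up log stream; fixed dates keep the example deterministic.
const logLines: string[] = [
  formatLog("info", "email form loaded", { traceId: "t0" }, new Date(0)),
  formatLog("error", "email submission failed", { traceId: "t1", route: "/submit" }, new Date(1000)),
  formatLog("info", "retrying submission", { traceId: "t1" }, new Date(2000)),
];

// Correlate logs with a trace by filtering on the shared trace ID.
function logsForTrace(lines: string[], traceId: string): string[] {
  return lines.filter((entry) => JSON.parse(entry).traceId === traceId);
}

const related = logsForTrace(logLines, "t1");
console.log(related.join("\n"));
```

In practice a log aggregation tool would do this filtering for you; the point is only that a shared identifier is what lets a log line be tied back to a single trace.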
