Data Visualization for Web Developers


In this workshop, through hands-on projects we'll learn how to use Observable, a browser-based reactive coding platform, to rapidly build insightful, interactive visualizations in JavaScript. After completing this workshop, you'll have the basic tools & techniques you need to start using dataviz to better understand your code, your projects & your users, and make better data-driven decisions as a developer.

139 min
16 Jun, 2021



Video Summary and Transcription

Today's workshop focused on data visualization in web development, highlighting its value in understanding user needs, optimizing feature development, and analyzing code bases. The workshop covered topics such as data wrangling, mapping abstract data values to visual values, and tools like D3 and Vega-Lite. Participants learned how to create visualizations using Observable notebooks, the Plot library, and different types of plots. The exercises included working with user behavior and device types, API responses, and adding interactivity to charts. Overall, the workshop emphasized the importance of data visualization in making informed decisions and accessing insights and analytics.


1. Introduction to Data Visualization

Short description:

Welcome, everyone. Today we'll explore data visualization and learn how to create meaningful visualizations without being an expert. We'll discuss what DataVis is and then dive into hands-on exercises to build our own visualizations.

Welcome, everyone. I hope you're all doing well. My name is Anjana Vakil. I am a Developer Advocate at Observable, a company that makes a JavaScript notebook environment for doing fast and, hopefully, easy data visualization in the browser, which is what we're going to be doing today. You'll learn a lot more about that as we go.

I'm usually based in San Francisco, California, in the US, but right now I'm coming to you from New York City. A couple more words about myself: I work as a developer advocate now, and my background is not in computer science but in the humanities, in philosophy and linguistics, including the social science side of linguistics. I had some experience with Python, doing a little bit of data analysis and data visualization, before I became a full-stack JavaScript developer a few years ago. Now I'm working at Observable and thinking about data and data visualization again. But I am definitely not an expert in visualization. I work with a lot of experts in visualization, and they tell me when I'm wrong about things, which is great. What I want to convey today is that you don't have to be an expert in JavaScript, in data science, or in data visualization to create meaningful visualizations. So we're going to talk a little bit about what DataVis is (I'm going to say DataVis a lot, because saying "visualization" takes forever), and then we're going to get really hands-on and spend most of the workshop building our own visualizations. We'll talk more about how we're going to do that in a moment.

We're going to alternate between talking as a group and working hands-on on our own. So feel free to take breaks, turn off your video, or do whatever you need to do. We're all in this virtual world together, so no worries if you need to drop out, come back, or go off video. I think that pretty much covers our bases. We will definitely take breaks as needed, but please feel free to holler if you need a break or if you have any questions.

2. Data Visualization in Web Development

Short description:

Data visualization is the process of translating data into a visual representation, allowing us to quickly grasp the meaning and patterns in the data. As web developers, data visualization can help us with feature development, understanding performance, and analyzing code bases. By visualizing usage data, we can identify user pain points and prioritize feature development. We can also analyze performance data to optimize our sites and services. Additionally, visualizing code-related data helps us understand the development process and collaborate effectively. These are just a few examples of how data visualization can be valuable to web developers.

Alrighty. Hopefully everyone can see my screen. I'm going to drop you all the link to these slides, just in case anybody is having difficulty finding or seeing them. One second, just making sure I have the right link here. Goodness, virtual life is hard. It was certainly a lot easier when we could all be in person, but then again, we wouldn't all be able to be scattered all over the world right now if we were at an in-person conference. Okay, virtual presentation, here we go. This link has the presentation that we're going to be going through. It should be accessible, but let me know if it doesn't work for you. So: what is data visualization, why do we care about it as web developers, and how can we start using it to get some of our web development work done and level up as developers? That is what we're going to be digging into today. So what is data visualization? That is a complicated question with a lot of possible answers. One possibly boring, if very correct, answer is that data visualization is the process of translating data, such as some values in a spreadsheet or in a JSON object, into a visual, graphical representation of that data. Mechanically, that is what we're doing when we visualize data: we take some numbers or strings or whatever values we have, and we turn them into something we can see. But that's a boring way to describe it. Why is that important? Maybe it's important to us because data visualization is a big, fancy topic, and lots of people are seeking out people with skills in it. It's a skill we can learn, put on our resumes and in our portfolios, and get exciting new opportunities from, which is totally true.
That's still maybe not the core motivation for why we as web developers should be interested in data visualization, although it is definitely a great skill to have and something that employers value. So that's a bonus. But really, data visualization is a means to an end. The goal we have when we have some data is to find meaning in that data: to find insights about the world or, in our case, about the web development we're doing. Visualization, when we can actually see that data, can be a really great way to quickly grasp the meaning and the patterns in it, and what that data means for us in the real world. That's closer to the core objective here. If we can quickly take a huge table or JSON file or whatever it is and understand what it means for our day-to-day lives as developers, we can do our jobs better. We can make better decisions. We can be more productive. We can learn insights about the things we're building and the users we're building them for. So that is really the motivation: we are trying to use visuals to quickly discover meaning, patterns, and insights from data.

Okay, so looking at data is pretty easy. Let me know if this is too small, but we can take some data in a table and "visualize" it in the sense of just laying it out in a tabular format and seeing all the values. But how easy is it to find actual meaning in data that's just laid out in a table like this? I think it's pretty hard. If we turn it into a visual representation, though (here we're looking at some build times for integration test jobs, which we're going to talk a lot more about in our next project), we can see: ooh, this lint job runs really quickly, but some of these tests on macOS sometimes take forever, and the Ubuntu tests are a little faster. I use Ubuntu, so I like to see Ubuntu performing well, but that's neither here nor there. The point is, and I don't know if you agree, I think it's a lot easier to see those patterns in the chart than in the table. That is what we're going for with data visualization. And of course, since we're web developers, we're going to be particularly interested in data relevant to our work building websites. So why would we, as web developers or aspiring web developers, be interested in doing data viz? How can it help us directly? One thing it can help us with is feature development. As we're trying to figure out new features for our websites, for the UIs and products we build, we can look at usage data to see what people are really struggling with, what they're missing, or what other sites have that we don't. What should we build? What are the pain points in our users' workflows? What are the different use cases that people coming to our websites have, and how can we prioritize between them to make the best use of our precious development time?
So we can do data viz before we build something, to understand what we should build and how we should prioritize our decisions around it. Then, once we have built a feature, we can look at the data afterwards to see how well it's being adopted. Is it actually serving the people we were hoping to serve? Are they enjoying it? Are they using it the way we expected, or in interesting new ways? How can we keep iterating on and improving these features? As web developers, we often spend a lot of time making gradual improvements and trying to give people a better and better user experience, whether that's on the front end, in the UI, or on the back end, in terms of how people use our APIs or get information from our site. So looking at how people actually use the things we build is one big area where visualization can be helpful, and we're going to look at some examples of that in our projects today. Another thing it can really help us with, and maybe you've seen this around the web world, is understanding performance: how our sites and services are performing. Are they fast enough? Are they doing what they're supposed to do, when they're supposed to do it, on a reasonable timeline? How reliable are they? Are they returning errors a lot of the time, or are they giving people what they want, when they want it? Where are the pain points in our performance? Where are the bottlenecks where we could most efficiently focus our energy to give people a more performant experience?
And once we've looked at one of those bottlenecks and made some changes to improve the performance of our site, we can look at the data afterwards to see how well that investment worked out for us and how well our efforts paid off. So performance is definitely another big area where we as web developers can benefit a lot from looking at our data visually. Another benefit, for web developers or whatever kind of software developers we may be, is looking at data about the code we write, the code bases we work in, and how we collaborate on them. This is especially true for open-source projects with big contributor communities, but even if we're working on closed-source code internal to a company, or on hobby projects, we can use data about our code, our Git commits, and our changes to understand how the development process is working out for us. For example, we can visualize how a code base is organized or how the development workflow is running. Are my teammates able to give me reviews in a timely manner? Are the builds we need for our day-to-day operations as developers running smoothly? How productive am I, my team, my company, or my collaborator community? Are we taking a really long time to get to each other's pull requests? Are we pushing a lot of code with errors that we then have to keep rerunning and repushing to fix the integration tests? And if we work in open source especially, we can understand how many contributors we have, how engaged they are, and how we can help them have a better experience if we're maintainers. Or, if we're trying to learn how to contribute to a new project, we can understand the community we're trying to join.
So those are just a few of the areas where I think data visualization can be really valuable to web developers in particular. But I would love to know, if folks want to share any thoughts, either by coming off mute or in the chat: have you seen any other really useful use cases for data visualization in the day-to-day life of a web developer? These are just a few examples, so if there's another case where you thought, wow, data really helped me out here, we'd love to hear about it. I'm going to give folks a second to think on it while I take a sip of water. A participant says: Yeah, we use them a lot. I work for a security company, and we use them to make graphs of attacks on people's products. So there's always a history, like today you were attacked once, yesterday three times, and things like that. We always have graphs available, and people can filter on the time range they want to show on the screen.

3. Data Visualization Use Cases and Data Sources

Short description:

Being web developers, we can use our web development skills together with our DataVis skills to build cool dashboards and tools. Data visualization can help us understand user needs, pain points, and optimize feature development. We can find publicly available data sets on the web for practice or use data from our own projects. Some examples include sports statistics, current events, and data from services like GitHub's API. Check the slides for links to public data sets.

Awesome, that's a great example. Being able to show the concentration of hot spots, where things are working or not working, or where attacks are happening, is a really interesting case. Security is another huge one: if we can really quickly see where there might be danger zones, that can definitely save a lot of pain. Great point.

Does anybody else want to share anything? I just realized I can't hear anyone. Can you hear me? Um, yeah. Okay. So Juan mentions using DataVis mostly for the end product, creating tools or apps where we can check the data. Yes, that's the interesting part about being web developers working with data visualization: we can combine our web development skills with our DataVis skills to build really cool dashboards and internal tools, or public-facing tools that surface data on the web through pages and custom dashboards. That is absolutely a great use case. On this slide, for example, I'm talking a lot about the development process, but as you say, it's also really useful for figuring out how to build the best user-facing products we can. In terms of feature development, that means really understanding our users: what they need, how they use our sites, and what pain points they might be having. That's another really important area, and this is by no means an exhaustive list. These are just some strong use cases for how data visualization can help us as we do our work. Okay, cool. Feel free to keep sharing thoughts if you have them, but meanwhile, let's dig in. We're here to have a hands-on workshop. So, if we're already sold on data viz as something we want to try out and get more experience with, let's see how we can do that. First of all, of course, we're going to need some data to visualize. Usually, if we're talking about data around our products, our sites, our security, or our code bases, we're going to have our own data. But as we're learning data viz, we might not be using production data from our company or anything like that.
So there are some really great sample data sets publicly available on the web that you can use to practice data viz skills, to learn how to work with data from different domains, or just for fun, on side projects: building a really cool data viz of, I don't know, whatever sport you're into, the statistics of your favorite team, or current events. The coronavirus pandemic, for example, has been a major topic in the last year or two. You can find a lot of great data on the web about those types of things. In the slides, these links point to some places where you can look for data if you need something to start with, or some inspiration for a data viz to build. Of course, you can also use data from your developer life: log data from services you run, or data from GitHub's API or the APIs of other hosting services that can tell you what's going on. We're going to look at a couple of examples of the type of data you might have. But I wanted to flag some public data sets so that if you want something to get your hands on for a weekend project or whatever, here are some places you can check out. All of these are live links in the slides that I pasted in earlier.

4. Data Wrangling and Tidy Data

Short description:

Assuming we have some data, our next task is to wrangle it. Data wrangling involves reshaping data, filtering out unnecessary elements, and transforming data using operations like map and reduce. Tidy data, where each row represents one observation and each column corresponds to a single feature, is the easiest to work with. Today, we'll focus on simple data wrangling techniques, but it's important to note that data wrangling is a crucial step in data visualization.

Now, let's assume we have some data, whether it's toy data that we're just playing with (today we're going to be looking at a lot of mock data, toy examples just for the sake of learning) or real data that our boss dumped on us and that we now have to figure out what to do with.

Assuming we have some data, our next task is to wrangle it. When we talk about data wrangling, we're talking about reshaping data. Data often comes with a lot of stuff we don't care about, in a format that's not very convenient for making a visualization, or that makes it hard for us to see the patterns and get the insights we're looking for. So we want to start by wrangling that data into the right shape and limiting it to just the content we need to answer the questions we're interested in. The great news is that once we get our data into JavaScript, usually as an array of objects or an array of arrays, we can use JavaScript's built-in array methods to reshape it.

For example, I really love this graphic of how the array methods filter, map, and reduce work. If I want a sandwich at the end of the day, but I start out with just a bunch of whole vegetables and a loaf of bread, I have some work to do to get from there to a nicely packaged sandwich that I can easily eat. Filter allows us to exclude elements of the data that we don't care about: if I don't like cucumbers, I can filter all of the cucumbers out of my sandwich ingredients. Map allows us to transform data from one format to another by giving it a function that is applied to each element. So if I have a data set with a loaf of bread, a pepper, a tomato, and a head of lettuce, and I use map with a function that says "chop each of these up," I get sliced bread, sliced peppers, sliced tomatoes, and so on. Now I can work with them more easily, because what I want is slices that I can layer onto a sandwich. Then, in data wrangling, we often end up doing some kind of aggregation, bringing disparate elements of data together into a unified value. Reduce is an operation that lets us take the slices we got out of our map function and pass them to reduce with something like a "stack these layers" function, which combines all those values into a single value: in this very silly graphic, a sandwich that I can take a nice juicy bite out of. If you've been working in JavaScript, you've probably had to do this type of manipulation on an array here and there.
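The filter, map, and reduce steps from the sandwich analogy can be sketched in plain JavaScript. This is a minimal illustration, not code from the workshop notebooks: the `jobs` records and their field names (`name`, `os`, `seconds`) are hypothetical, loosely modeled on the CI build-time example mentioned earlier.

```javascript
// Hypothetical CI job records; the field names are assumptions for illustration.
const jobs = [
  { name: "lint", os: "ubuntu", seconds: 40 },
  { name: "test", os: "ubuntu", seconds: 300 },
  { name: "test", os: "macos", seconds: 540 },
];

// filter: keep only the elements we care about (here, only test jobs).
const testJobs = jobs.filter((job) => job.name === "test");

// map: apply the same transformation to every element (seconds -> minutes).
const minutes = testJobs.map((job) => job.seconds / 60);

// reduce: combine all the values into a single aggregate (total minutes).
const totalMinutes = minutes.reduce((sum, m) => sum + m, 0);

console.log(minutes); // [ 5, 9 ]
console.log(totalMinutes); // 14
```

Each step returns a new value and leaves `jobs` unchanged, which makes it easy to chain these calls while exploring data.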
We're definitely going to be using those kinds of operations today, especially filter and map; I spend a lot of my day writing filters and maps. Of course, there are more sophisticated libraries and toolkits for doing more complex operations on data. Today we're going to do some pretty simple data wrangling, but the more you get into DataViz, the more evident it becomes that a huge part of data visualization is doing the wrangling beforehand to get the data into the right format for your visualization. Another thing to say about wrangling is that tidy data is really the easiest type of data to work with. So what is tidy data? Let me see if I can make this a little... oops, a little bigger. Spoilers. Okay, I don't know if this is too small to see. Here we go. Hopefully this is a little bit legible. Tidy data (there's a link in here where you can read more about it) is data where we have one observation for each row in our data set, each column corresponds to a single feature or property of that data, and each cell holds an individual value. Here we're looking at it as a table, but we could imagine it as an array of arrays or an array of objects. On the left side, we have some data that's not tidy: case counts from different countries, with a "type" column that says, okay, this row has the cases in Afghanistan, this row has the population in Afghanistan for a certain year, and so on. That's a bit tricky to work with, because the type column tells us how to interpret the count column, and that gets confusing.
It's a lot easier to work with if we separate everything out so that we have separate columns for cases and population, and every observation, every year-and-country combination we've seen, gets its own row. We're not going to talk a ton about transforming things into tidy data; today we'll mostly work with data that is already tidied, just for the sake of time, because we only have a few hours together. But there's a link in this slide deck to a great resource on how to work with tidy data and why it's beneficial.
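Here is a minimal sketch of the reshaping described above, assuming untidy rows shaped like `{ country, year, type, count }`. The country label and all numbers are made up for illustration, and `tidy` is a hypothetical helper name; dedicated wrangling libraries provide more general pivot operations.

```javascript
// Untidy rows: the "type" column says how to read the "count" column.
// Country label and numbers are made up for illustration.
const untidy = [
  { country: "A", year: 1999, type: "cases", count: 10 },
  { country: "A", year: 1999, type: "population", count: 1000 },
  { country: "A", year: 2000, type: "cases", count: 20 },
  { country: "A", year: 2000, type: "population", count: 1100 },
];

// Pivot so that each (country, year) observation gets one row and each
// measured property ("cases", "population") gets its own column.
function tidy(rows) {
  const byObservation = new Map(); // Map preserves insertion order
  for (const { country, year, type, count } of rows) {
    const key = `${country}|${year}`;
    if (!byObservation.has(key)) byObservation.set(key, { country, year });
    byObservation.get(key)[type] = count; // the type value becomes a column name
  }
  return [...byObservation.values()];
}

const tidied = tidy(untidy);
// → [{ country: "A", year: 1999, cases: 10, population: 1000 },
//    { country: "A", year: 2000, cases: 20, population: 1100 }]
```

With one row per observation, each column can be handed directly to a visual channel (for example, `year` on the x-axis and `cases` on the y-axis) without first decoding a `type` column.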

5. Data Visualization and Mapping

Short description:

Data visualization is a slippery slope from lightweight visualization to big data and machine learning. We need to map abstract data values to concrete visual values using scales. Scales are functions that transform data values into visual values. Channels encode data properties into visual channels like position or size. This is the key concept of data visualization.

Okay, let me take a pause there, take a sip of water, and see if anybody has questions. Someone says it feels like the start of machine learning. It is a very slippery slope from doing lightweight data visualization, as we're going to do today, to getting into really big data and having to process it with very sophisticated statistical models that ultimately end up in the realm of machine learning and AI. So yes, the bigger your data gets and the more complex the wrangling gets, the more likely it is that things like machine learning become relevant for your interests. Starting to do some of this data analysis is a slippery slope into machine learning, but it's also a great pathway, I think. Awesome, okay. So now we've talked about what visualization is and about wrangling the data. Once we have wrangled things into a workable format, we can start to actually visualize them. As we said before, one perhaps boring way to describe data visualization is translating, or mapping, data into visuals. That means we have to take abstract values, like numbers of cases, and turn them into concrete values that correspond to pixels on a screen: maybe the width in pixels of a bar in a bar chart, or maybe a color or an opacity value. So in data viz we often talk about scales. A scale lets us transform a space in the world of abstract data (the data in our spreadsheet or CSV or JSON file), which is often called the domain of the scale, into values that exist in the visual realm (maybe widths in pixels, or maybe colors), which is often called the range. If you think back to math class and functions mapping from a domain to a range, scales are essentially functions that take our data values and turn them into visual values.
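Since a scale is just a function from domain to range, one can be written by hand. The sketch below assumes a linear relationship; `linearScale` is a hypothetical helper, and the 0 to 600 second domain and 0 to 300 pixel range are made-up values. D3 provides a full-featured equivalent as `d3.scaleLinear()`, configured with `.domain()` and `.range()`.

```javascript
// A minimal linear scale: maps a value from the data domain [d0, d1]
// to the visual range [r0, r1] by linear interpolation.
// `linearScale` is a hypothetical helper written for illustration.
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Domain: build durations from 0 to 600 seconds (assumed).
// Range: bar widths from 0 to 300 pixels (assumed).
const width = linearScale([0, 600], [0, 300]);

console.log(width(0)); // 0
console.log(width(300)); // 150
console.log(width(600)); // 300
```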
That's what we're going to be working with a lot: mapping certain properties in our data to certain visual values. We often talk about channels, encoding data from a certain property into a certain visual channel, like the x position (how many pixels across a mark sits on the x-axis) or the size of a circle or a bar. We're going to dig into that in concrete terms in our hands-on project today, but I feel like this is the one key thing in data visualization: going from this data space to this visual space. I really love this graphic that my colleague Mike made of this transformation.
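To make channels concrete, the following sketch encodes each record of some hypothetical test-run data into a bar: the `seconds` property goes into a width channel through a linear scale, and the `passed` property goes into a color channel through a simple categorical mapping. The field names, the 400-pixel maximum width, and the colors are all assumptions for illustration; in practice, a library like Observable Plot builds these scales for you.

```javascript
// Hypothetical test-run records (field names assumed for illustration).
const runs = [
  { job: "lint", seconds: 30, passed: true },
  { job: "test-ubuntu", seconds: 240, passed: true },
  { job: "test-macos", seconds: 480, passed: false },
];

// Scale for the width channel: seconds (domain) -> pixels (range 0..400).
const maxSeconds = Math.max(...runs.map((r) => r.seconds));
const x = (seconds) => (seconds / maxSeconds) * 400;

// Scale for the color channel: a categorical mapping from status to color.
const color = (passed) => (passed ? "green" : "red");

// Encode each data property into a visual channel of one mark (a bar).
const bars = runs.map((r) => ({
  label: r.job,
  width: x(r.seconds),
  fill: color(r.passed),
}));

console.log(bars.map((b) => b.width)); // [ 25, 200, 400 ]
```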

6. Introduction to Data Visualization Workshop

Short description:

Today, we'll explore data visualization using JavaScript and tools like D3 and Vega-Lite. We'll also work with a new library called Observable Plot, which provides a quick and easy way to create visualizations. No prior knowledge of Observable Plot is required, and we'll learn together. You can use Observable's browser environment for the exercises, but an account is not mandatory. We'll cover topics like integration test data, browser usage data, and API response data. Let's get started!

Alrighty, so that is what we're going to be doing today. We're going to map values from data, which we may or may not have had to wrangle a little, into the visual space. Then, if we get to it in time, we'll also add a bit of interactivity, so that we can really take advantage of the web to have a hands-on, interactive experience of playing with our data. We'll dig into what that feels like a little later.

But before we jump in, I want to say that whether you're an experienced JavaScript developer or new to JavaScript (and if you're a Python developer, you definitely have a ton of options as well), JavaScript has a lot of great tools for data visualization. There are some really powerful but also fairly low-level and complicated libraries like D3.js, a really popular one that you may have heard of before. D3 is great for building very custom, amazingly impressive-looking visualizations with lots of animations, transitions, and fanciness, like the one we just saw on the previous slide. Woah, so fancy. There are also some lighter-weight libraries intended more for quickly getting a visualization going when you're just trying to explore your data and don't necessarily need something super unique, like the very bespoke, very fancy visualizations you see in the New York Times, the Financial Times, or other publications. Sometimes you just want to see a bar chart, or a scatter plot of dots for your x and y values; we're going to look at some of those later. Vega-Lite is another great library for that. There are also some great tools for data wrangling, and a couple of libraries are linked here that you can check out later. TidyJS (we talked about tidy data before) is a great library informed by the R statistical language and environment, so if anybody comes from an R world, it's a useful tool for moving from R into JavaScript. Another great library is Arquero, developed by the University of Washington dataviz team. And there are a lot of different charting libraries out there, so there are some links in here, and you can search on your own and find all kinds of them.

Today, what we're going to work with is what I hope is a very quick and easy-to-use library called Observable Plot. Observable, the company, was founded by Mike Bostock (I don't know if you've heard of him), who created the D3 library we just mentioned. Mike has been building data visualization libraries and tools for over 10 years, and he recently spearheaded a new visualization tool at Observable called Plot. It's intended to be a higher-level, lighter-weight, quick and easy tool for getting started with data visualization, one that makes it really easy to explore data and quickly go from zero to a meaningful visualization. That is what we're going to be working with today. We're going to assume absolutely no familiarity with Observable Plot, because it is brand new, so nobody is familiar with it. And because it is brand new, it can have bugs; it's still at an early stage. So we're all going to be learning together. I'm also new to working with Observable Plot, so I'm just going to share a few things I've learned about quickly making useful plots. There are a lot more learning resources, full documentation, and so on at observablehq.com/@observablehq/plot. So if you want to check out more documentation after the workshop or between exercises, feel free to dive in there.

We're going to do our exercises today on Observable, which is a totally in-browser JavaScript environment, so we'll be looking at some notebooks and seeing what that means. You can sign up for a free Observable account if you wish, or maybe you already have one. That makes it easy to take the worksheets we're going to look at in a moment and create your own copy (your own fork), so you can save your work easily and come back to it later. However, you do not need an Observable account for any of the work we're doing today, because everything runs live in your browser. All you need for the exercises is a web browser open to one of these pages. Without an account, saving your work is just a little more complicated: you'd have to copy and paste it into a file, or download a file, edit it locally, and bring it back, so there are a couple of extra steps. But absolutely no worries, and if you don't care about saving your work at the end of the day, you don't have to worry about it at all. Okay, so let's jump in, shall we? We're going to make some plots today. If I haven't been talking for way too long (which I probably have), we'll hopefully get to a few different projects and step through some basic data viz skills as we do them. The first thing we're going to look at is our integration tests, our web end-to-end tests: are they passing? Are they not passing? Are they taking a long time to run? How are they doing? We're going to look at some integration test data, which is the type of thing you might be used to looking at
if you work with GitHub and, every time you push, some tests run on your code and you see whether they passed or failed. That project is going to teach us about scaling: taking features and using channels and scales to move them from the data world into the visual world. After that, we're going to take a look at some browser usage data (which devices and screen sizes people are using to look at a website) and what that can tell us. That's going to teach us how to ask the questions we're interested in of our data. Do we need to aggregate the data to get a single value? Do we need to split it up so we can see the details better? We're going to dig into that. And finally, if we have time, we're going to look at some API response data to understand more about how quickly our servers are responding to people's requests. That's going to teach us how to make interactive graphs with input widgets that really give people the ability to play with the data hands-on. So that is what we're going to be doing. And yes, folks are noticing that some of these links are not working. Let me just verify that we have these links open. Give me one moment... okay, that's why that's not working. Okay, fixed. So now we're going to start with the first project, "Are tests passing?", and hopefully if you click it now, you should be able to access that notebook. I think it's because I'm in fullscreen. Okay, can you all still see my screen here? Let me know if you can't.

7. Using Observable Notebooks for Data Visualization

Short description:

In this workshop, we'll be using an Observable notebook to work on exercises that progressively modify a visualization of data from GitHub Actions. The notebook allows us to edit code and see the changes in real time. We'll start with a basic plot and make it more exciting by following a series of to-dos. If you have any questions or get stuck, I'm here to help. At the end, we can discuss the solutions together. Now, someone asked about the difference between Plot and Vega-Lite. Plot is informed by Vega-Lite, another visualization library, and that's not because there was anything wrong with Vega-Lite.

Let me make this a little bit bigger. What we've been looking at is an Observable notebook, but it's kind of a special one because I'm using it for slides, so it looks a little different than usual. In an Observable notebook, what we have is a mix of JavaScript code (for example, this is some JavaScript code, and so is this, and this is code using Plot) and, above the source code of each chunk of code, which we call a cell, some kind of output. That's interspersed with some text and some other little widgets that are just helpful for understanding what's going on.

The way Observable works is that it all runs in your browser. You can edit any of this code by clicking in the left margin of a cell, which opens up the source code. In this case it's Markdown; this is a little Markdown cell, so I can edit that. And even though I (@anjana) own this notebook, you can edit anything in it. You can't break it, because all of your edits are just local to you. However, if you do have an Observable account, you can click the menu up here and choose the fork option to make a new copy of this notebook, which will then be yours to edit and save, so you can keep your work after the workshop.

We're not going to dig super far into how Observable works; hopefully it'll be self-explanatory from the way these exercises are structured. But the nice thing about Observable is that it's a reactive environment: when I update something in the notebook, everything that depends on it updates as well. For example, at the top here I have some data, a test jobs data array with 338 little JavaScript objects in it. If I slice it (oops, no, I've got to do an await here, because this is an asynchronous value), now I have 238 things in this table down here. The data updates as I make changes. This is just a silly example, so ignore what I'm doing here; I'm going to put it back. But as you can see, when I mess something up, everything downstream in the notebook changes too. This is something a little bit unusual about Observable: if you're used to working in vanilla JavaScript or Node.js, you'll notice that Observable re-runs every cell that depends on what you changed, any time you change something. And that's going to work to our advantage today. The exercises in here are hopefully pretty self-explanatory, and of course questions are always welcome. But let me just set the stage for you.
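To make the reactive model concrete, here is a toy sketch in plain JavaScript. It is not how Observable's runtime is built (the real runtime also handles promises, generators, and much more); it only illustrates the core idea that redefining a cell re-runs every cell downstream of it. The names `createNotebook`, `define`, and `value` are hypothetical.

```javascript
// Toy model of notebook reactivity (hypothetical API, not Observable's runtime).
// Each cell is a function of named input cells; redefining a cell re-runs
// everything that depends on it, directly or indirectly.
function createNotebook() {
  const cells = new Map(); // name -> { inputs, fn, value }

  // Run one cell's function with the current values of its inputs.
  function evaluate(name) {
    const cell = cells.get(name);
    const args = cell.inputs.map((input) => cells.get(input).value);
    cell.value = cell.fn(...args);
  }

  // Names of the cells that list `name` among their inputs.
  function dependentsOf(name) {
    return [...cells.entries()]
      .filter(([, cell]) => cell.inputs.includes(name))
      .map(([dependent]) => dependent);
  }

  // Re-evaluate a cell, then propagate the change downstream.
  function recompute(name) {
    evaluate(name);
    for (const dependent of dependentsOf(name)) recompute(dependent);
  }

  return {
    // Inputs must already be defined; this toy has no cycle detection.
    define(name, inputs, fn) {
      cells.set(name, { inputs, fn, value: undefined });
      recompute(name);
    },
    value(name) {
      return cells.get(name).value;
    },
  };
}

// Usage: a data cell and a cell that depends on it.
const notebook = createNotebook();
notebook.define("testJobs", [], () => Array.from({ length: 338 }, (_, i) => ({ run: i })));
notebook.define("rowCount", ["testJobs"], (jobs) => jobs.length);
console.log(notebook.value("rowCount")); // 338

// "Editing" the data cell re-runs rowCount automatically.
notebook.define("testJobs", [], () => Array.from({ length: 338 }, (_, i) => ({ run: i })).slice(100));
console.log(notebook.value("rowCount")); // 238
```

In a real notebook you never wire up dependencies by hand like this; Observable works them out from the names each cell references.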

In this project, we're looking at some data from GitHub Actions. If you haven't used GitHub Actions, there are some links where you can read up on it. Essentially, if you have a repository of code on GitHub, you can set up what are often called continuous integration jobs: code that runs every time you, let's say, push a new set of commits to a certain branch or merge in a pull request. What we have here is data about integration tests, which make sure the code works properly in different environments. It comes from the public GitHub data of the Svelte library, if anybody has worked with that. It doesn't matter much what Svelte is (it's a front-end framework); what matters is that it's a big open source repository that runs a lot of tests whenever code is pushed to GitHub. So we're seeing data about different runs (a run might correspond to a particular commit or push to the repository), the names of the different tests, the conclusion status (whether they succeeded, were canceled, or failed), and the number of minutes each job took to run. There's a series of to-dos in this notebook, and we're going to walk through them as exercises that progressively modify a visualization of this data. We start out with this pretty unhelpful grid of black squares, and you can read the code comments to get a sense of a really basic chart: a Plot plot, if you will. Each to-do asks you to make a change to that visualization, and we're going to progressively make it more and more exciting.
At this point, I think we can take a break from the group setting and spend some time working through this worksheet. If you get stuck, that is what I'm here for, so we can talk about it. At the end we can talk through the solutions, but there are also hints and code snippets: if you get stuck, click the hint and solution buttons and you can copy and paste some helpful code from there. There are also some guidelines and tips for how to do each exercise. Hopefully it's self-explanatory and makes sense; if not, please let me know. Then we'll come back and talk through the solutions together. Does anybody have any questions at this point? OK, we have a question about Plot versus Vega-Lite: why did Observable make a new visualization grammar that is similar to Vega-Lite? If you haven't worked with Vega-Lite you might not recognize it, but Plot is very much informed by that library. Was there something wrong with it? Definitely not.
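For readers following along outside the notebook, here is a sketch of what the starting chart amounts to: a cell mark with the run number on x and the job name on y. The field names `run_number` and `name` are assumptions about the wrangled data (the transcript describes the columns, not their exact keys), and `basicJobsPlot` is a hypothetical helper. `Plot` is passed in as a parameter only so the sketch needs no import; in a notebook you would call the built-in `Plot` directly, and elsewhere you would `import * as Plot from "@observablehq/plot"`.

```javascript
// Starting point for the exercise: one rectangle per job run, no color yet,
// so every cell gets the default (black) fill.
// Assumed row shape (hypothetical): { run_number, name, conclusion, minutes }.
function basicJobsPlot(Plot, testJobs) {
  return Plot.plot({
    marginLeft: 160, // room for job names on the y-axis; too small and labels get cut off
    marks: [
      Plot.cell(testJobs, {
        x: "run_number", // horizontal position: which workflow run
        y: "name",       // vertical position: which job
      }),
    ],
  });
}
```

In a notebook cell, `basicJobsPlot(Plot, testJobs)` would return the chart element for Observable to display.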

8. Using plot and Observable

Short description:

Vega and Vega-Lite are excellent tools. Plot is intended to work really nicely with Observable's reactive environment and also to make it much easier to create interactive charts. We are working through the first project notebook, which you can fork, edit, and save in your browser. Feel free to sign up for an account and make your own copy if you'd like. Let me know if you have any questions.

Vega and Vega-Lite are excellent tools. Plot is intended to work really nicely with Observable's reactive environment, and also to make it much easier to create interactive charts. There's also a whole breakdown of the differences between Plot and Vega-Lite that I can find the link for in a second. If you're curious about those differences and the motivations behind Plot, I definitely recommend checking out some of these resources. Let me find that for you and drop it in the chat. Thanks.

We're working through the first project notebook, which I'll drop here again. Hopefully it's pretty self-explanatory. All I was explaining was that in Observable, you can sign up for a free account if you want to fork the notebook, make your own copy, and save your work. Otherwise, you can edit the code directly in your browser; it won't be saved if you refresh the page, but you can tinker with everything. It all runs right in your browser, so you won't mess anything up. Don't be afraid! But definitely feel free to sign up for an account and fork it if you'd like your own copy. Hopefully the descriptions and instructions are self-explanatory, but if you have any questions, just let me know.

9. Displaying SVG Elements and Using Plot in React

Short description:

Question: How does Plot display the SVG elements? Plot creates SVGs to represent the plots, generating different SVG elements for tick marks, labels, and cell marks based on the specification provided. To display the SVG, you can append it as a child of an HTML node, or create an HTML node to insert the plot into; how you do that depends on the framework you're using. While there may not be a specific React wrapper for Plot, you can use the export feature in Observable to generate a snippet that can be easily dropped into React.

Question: How is this displayed? I assume Plot creates the plot, but how do we display it? Do you mean how Plot turns the code into the visuals you're seeing under the hood, or is it not showing up the way you think it should?

No, I mean: when you want to show something on the screen, you have to use, for example, HTML, a table, an image, and so on. So how is it doing that under the hood?

Yeah, OK. If we inspect this, we might be able to get a sense of what's going on. Plot creates SVGs to draw these plots. You can poke around in the HTML here: if you inspect it, you can see it's creating an SVG with lines for the tick marks and text elements for the labels. In this case it's a rectangular plot, so these cell marks are essentially SVG rects. So Plot does a bunch of work behind the scenes to take your specification and turn it into concrete SVG elements. Does that make sense?
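As a rough illustration of "turning a specification into concrete SVG elements", here is a toy function that does a tiny slice of that job: it gives each distinct x value its own column and each distinct y value its own row, then emits one `<rect>` per data row. `cellsToSvg` is a hypothetical helper for intuition only; Plot's real rendering also computes scales, axes, labels, and margins, and builds DOM nodes rather than strings.

```javascript
// Toy renderer (hypothetical, not Plot's code): data + channel names in, SVG markup out.
// Each distinct x value gets a column and each distinct y value gets a row.
function cellsToSvg(data, { x, y }, { size = 20 } = {}) {
  const columns = [...new Set(data.map((d) => d[x]))];
  const rows = [...new Set(data.map((d) => d[y]))];
  const rects = data.map((d) => {
    const left = columns.indexOf(d[x]) * size;
    const top = rows.indexOf(d[y]) * size;
    // Leave a 1px gap so neighboring cells stay distinguishable.
    return `<rect x="${left}" y="${top}" width="${size - 1}" height="${size - 1}"/>`;
  });
  const width = columns.length * size;
  const height = rows.length * size;
  return `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">${rects.join("")}</svg>`;
}

// Usage: three job runs become three rects in a 2x2 grid.
console.log(
  cellsToSvg(
    [
      { run_number: 1, name: "lint" },
      { run_number: 2, name: "lint" },
      { run_number: 2, name: "test" },
    ],
    { x: "run_number", y: "name" }
  )
);
```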

That makes perfect sense. And how do we display this SVG? How do we get this element back? For example, in React, do we call a function that generates an SVG component, or...?

Mm-hmm. Okay, yeah. But outside of Observable, outside of the exercise we're doing right here? I'm trying to see how I would use it in my day-to-day. Yeah, of course, definitely. This is something we can talk about quickly later. But essentially, in the links... where did it go? Okay. If we scroll down to our Plot link and open up the Plot documentation, it has a link to the GitHub repo, github.com/observablehq/plot, which has a lot more details about using Plot as a standalone tool: how you can install it, and how you can call it from vanilla HTML or what have you, essentially how to get it into your JavaScript. You import it, for example as an ES module, and then you append the plot somewhere in your document, or create an HTML node where you insert it. Does that make sense? Depending on what framework you're using, that will change how you insert it; in this case, we're just doing it in a little script and appending a child. Um, yeah, this actually answers my question. When I see that the result of Plot.plot is passed to appendChild, which adds it as a child in the HTML, I'd have to look at whether there's a React wrapper already, which would mean some smart people are working on it and maintaining it. If not, I guess it's not so complicated; it looks straightforward. I'm not sure whether there's a specific React wrapper, as you say, because Plot is such a lightweight thing that it should be able to stand alone. However, there's a handy feature in Observable: if you go to the menu in the top right and choose Export, there's an option to embed cells.
When you do that, you choose a cell that has a name (sorry, let me get out of here). So if I give this basic plot cell a name (in Observable this is how we do it; we don't need a var, let, or const), if I...

Sorry to interrupt: is Observable a language? Observable is, you could say, an environment. What we're running in this notebook is a JavaScript-based environment, but there are some differences between Observable and vanilla JavaScript, and I can drop you a link about how that works in a moment. Something is going a little bit wrong here, so let me try this other notebook instead; I'm not sure what's going on with that one. Here's how I've done the embed in this one. I've got a chart I want to embed, which in this case is called chart. I go to Export (it's thinking about it; I think my computer is a little unhappy with me), then Embed, and I select the cells I want. There's an option to get a snippet that you can easily drop into React. So if you do some work in Observable and want to bring it into React, you can use that export option.
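Outside Observable, the "append a child" approach described here boils down to: build the chart node, append it to a container, and remove it when you're done or when the data changes. Here is a minimal, framework-agnostic sketch; `mountChart` is a hypothetical helper, and `makeChart` stands for whatever produces your chart node (for example, a function that calls `Plot.plot`, which returns a DOM element). In React, you might call something like this inside a `useEffect` and return the cleanup function from it, but treat that as an assumption and check Observable's embedding documentation for the recommended approach.

```javascript
// Hypothetical helper: append a freshly built chart to `container` and return
// a function that removes it again, so the chart can be rebuilt when data changes.
function mountChart(container, makeChart) {
  const chart = makeChart();     // e.g. () => Plot.plot({ marks: [...] })
  container.appendChild(chart);  // the "append a child" step
  return () => chart.remove();   // cleanup before re-rendering or unmounting
}

// In a browser page (assumes Plot is loaded, testJobs exists, and #chart is in the DOM):
// const unmount = mountChart(document.querySelector("#chart"), () =>
//   Plot.plot({ marks: [Plot.cell(testJobs, { x: "run_number", y: "name" })] })
// );
// ...later, when the data changes: unmount(), then mount again with the new data.
```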

10. Connecting Data and Visualization

Short description:

There would be little reason for anyone to create a wrapper framework if the snippet is generated. Where does the information in this snippet come from? Well, the data, since information is what we create after processing the data. React is a reactive environment, and so is Observable, so there's a little bit of massaging needed to get their reactivities to play nicely together. The tooltip question got me thinking: how well do Observable and the Plot library do with accessibility? Plot is an open source library on GitHub, and there's still a lot of work to do; one of those things is ARIA roles.

Yeah, there'd be no reason for anybody to create a wrapper framework, then, if it's literally generated. However, where does the information in this snippet come from? Well, the data, since information is what we create after we've processed the data. Right? Yeah. So you mean integrating: if you have data somewhere in your React app and you're trying to pull it into a visualization you've made in Observable and are now embedding, how do you connect the two? Is that the question? Yeah. Let me drop some links, but essentially, React is a reactive environment, and Observable is also a reactive environment, so there's a little bit of massaging needed to get their reactivities to play nicely together. Let me grab a couple of notebooks that explain how to do that, one about embedding and one about React. That should help. Cool, I look forward to reading them. Which notebook are we currently on? I don't want to fall behind everybody else. But yes, please share those links and I'll ask further questions afterwards. Sure, sure. Okay, so this one is an overview of embedding, and this one is about embedding in React. Thanks again. Hopefully that gets you started, and if you have any specific hiccups or questions, we can follow up offline afterwards, if that sounds good. That's very nice, thanks again. Next question: how do tooltips work? That's a great question. Okay, it seems like folks maybe need a couple more minutes, which is totally fine. Why don't we go until the hour, then?
And if folks are done early and want to take a little break, you can totally do that. Okay, so, questions about tooltips. Let me close this. There are some hints. I believe the question was how to get the tooltip content populated, and hopefully those hints are working for us. If it's not working, I'd suggest (and we can go through the solution in a moment) taking a look at the solution and comparing it to see if there's any difference, maybe in where you're implementing that title property. Make sure it's in the options object that we pass as the second argument to the cell function. If it's there, it's returning a string, and it's still not working, then maybe we can take a look at the code and debug it together. I wish I could walk over to your computer and take a look. There's a way to DM me in Slack, sorry, in Zoom. Or we'll walk through it in a moment: I think everybody is going to take another five minutes, if that still sounds good, to work through things on their own, and then we can walk through the solution together. Does that sound okay, Pavel? Awesome. In the meantime, let me try to find you some documentation from Observable that might also be helpful. The tooltip question actually got me thinking: how well do Observable or the Plot library do with accessibility, since most of the stuff is generated? That's a great question, and I love that it's something we're thinking about. I don't think I can say that it's compliant with accessibility guidelines; I'd have to double-check with my team on that. In terms of things like the colors you're using and the contrast between them, I can't say the defaults are necessarily always best practice for accessibility.
As we're going to see (or maybe you've gotten a glimpse of it in the middle exercise, where we're changing the colors), you do have control over scales like the color scale, and over things like font sizes. For ARIA roles, I believe they've made some effort to make it okay, but I'm not sure I can make any hard guarantees there. That's a great question, and something I'd definitely love to follow up on with my team. I just checked, actually: for example, the rects don't have any accessibility attributes. You just have a fill; there's no way to do it directly. As far as I know, you can make an SVG accessible with some SVG-specific tags. And that takes me to the next question: is Observable, or in this case I guess we're talking about Plot, open source? Can we contribute? It is, and I'm going to address both of those points by dropping in this link. Plot is an open source library on GitHub. It's very new, released maybe six weeks ago or something like that, so there's still a lot of work to do. One of those things, for example, is ARIA roles; folks have created issues for that. You can search the issues there to find other issues or bug reports. Definitely upvote or comment to give feedback if you'd like to see something happen; that helps show us how important it is to the community that we work on those things. That's a great way to contribute back, even if it's just an upvote, a thumbs-up on the issue. And of course, if you find bugs, which is totally possible, you can check the issues to see whether someone has already flagged a similar issue. If not, feel free to open a bug report, feature request, or what have you. That's pretty cool, thanks for sharing. Thank you for bringing it up; it's a really important question, and definitely something I hope we get to soon.
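One common workaround for the missing accessible names mentioned here is to give the rendered SVG `role="img"` and an `aria-label` that summarizes what it shows. The sketch below is an assumption, not a Plot feature: `summarizeConclusions` and `labelChart` are hypothetical helpers, and the `conclusion` field name is assumed. A single label like this is a stopgap; it does not replace a proper data table or per-mark descriptions.

```javascript
// Hypothetical helper: count runs per conclusion status and phrase the result
// as a sentence, e.g. "3 job runs: 2 success, 1 failure". Map keeps first-seen order.
function summarizeConclusions(testJobs) {
  const counts = new Map();
  for (const { conclusion } of testJobs) {
    counts.set(conclusion, (counts.get(conclusion) ?? 0) + 1);
  }
  const parts = [...counts].map(([status, n]) => `${n} ${status}`);
  return `${testJobs.length} job runs: ${parts.join(", ")}`;
}

// Hypothetical helper: expose the chart to assistive technology as one labelled image.
function labelChart(chart, label) {
  chart.setAttribute("role", "img");
  chart.setAttribute("aria-label", label);
  return chart;
}

// In a browser (assumes Plot is loaded and testJobs exists):
// const chart = labelChart(Plot.plot({ marks: [...] }), summarizeConclusions(testJobs));
```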

11. Plotting Data with Plot and Channels

Short description:

We started with a basic plot that lacked insight. Plot operates in a declarative way, similar to Vega and Vega-Lite. Channels are important in working with Plot. We added a fill to express the conclusion property through color. Channels use scales to configure the mapping of data values to visual values. We set up the domain with data values and the corresponding range with meaningful colors. Opacity was used to express the duration of each job: the opacity scale turned longer durations into darker rectangles. The API uses two arrays for domain and range. For numeric variables, specifying start and stop values for the domain and range is more practical than enumerating every possible value; that will be covered in the next exercise.

It's a very small team working on this, and it's a battle against there only being 24 hours in a day. But hopefully this is something we'll be able to make some strides on quickly. Thank you for bringing it up. How are folks doing? How are we feeling? I wish I could see everybody, but hopefully folks have had a little more time to mess with the exercises here and play around with some plots.

Let's walk through what we did in this exercise, just to recap. As questions come up, please, please, please drop them in the chat or come off mute and ask; I think I can hear everybody now. We started out with some data that was pre-wrangled for us, which is nice: basically a bunch of job names and a bunch of runs of those jobs. We started with this basic plot, which is not very helpful and doesn't give us much insight, and then broke it down.

What we see here is one way of creating a plot; in the next exercise, we'll see a slightly different syntax we can also use. But essentially, as we've hopefully discovered through this exercise, Plot works in a very declarative way, similar to Vega and Vega-Lite if you've used those: we just say what we want our chart to look like, and Plot figures out how to make it happen. The most important thing is to have at least one mark. We have this marks array, where we could also create different types of marks and layer them, and you can read up on the different types of marks in the Plot documentation. What we have here is a cell mark, which draws these little rectangles: one rectangle for each element (each value) in the data.
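Because `marks` is an array, layering is just a matter of listing more than one mark. As a hypothetical variation on the exercise (not part of the worksheet), this sketch draws a text mark on top of each cell showing its rounded duration. `Plot.cell` and `Plot.text` (with its `text` channel) are Plot marks; the field names `run_number`, `name`, and `minutes` and the helper `layeredJobsPlot` are assumptions. `Plot` is passed in only so the sketch needs no import.

```javascript
// Two layers over the same (assumed) data: cells first, then text drawn on top.
// Later marks in the array are drawn above earlier ones.
function layeredJobsPlot(Plot, testJobs) {
  return Plot.plot({
    marginLeft: 160,
    marks: [
      Plot.cell(testJobs, { x: "run_number", y: "name" }),
      Plot.text(testJobs, {
        x: "run_number",
        y: "name",
        text: (d) => String(Math.round(d.minutes)), // label each cell with its minutes
        fill: "white", // constant color so labels show on the default black cells
      }),
    ],
  });
}
```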

Here we pass in the data first, which is our test jobs data. Then we pass in an options object to the mark that tells it which so-called channels should express which properties of the data. Here we're expressing the run number (essentially which GitHub Actions run it was) on the x-axis, horizontally, and the job names on the y-axis. We also have a few config options that are essentially just for formatting. Hopefully you got a chance to play with those and see how, if you reduce the margins, the labels get cut off. If not, feel free to play with that afterwards. Nothing super exciting there, so we'll blow past it. But these channels are really the important thing for working with Plot.

For any mark we're trying to make, we have a variety of channels we can use. For these cell marks, we have the x and y channels, but also channels like fill (the coloring or shading of the cell) and its opacity, and stroke (the line around the cell) and its opacity. Then there's the title channel, a special text channel that lets us put text values into tooltips. We go through modifying this chart to assign more properties of the data to more of these channels, to flesh out a more interesting chart. If we look at the solution to the first exercise, we start by adding a fill (or maybe you played around with some of the other things you could do here). If we add fill: "conclusion" to the options for our mark (and add a comma so that it doesn't break the JavaScript), that expresses whatever is in the conclusion property of each data object (these are little JSON objects) through the fill channel, which uses color to express different values. Then, in the next exercise, we see how channels use scales.
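Adding color is one more entry in the mark's options. Here is a sketch of the first exercise's change, assuming the rows have `run_number`, `name`, and `conclusion` fields (hypothetical names) and using a hypothetical helper `jobsPlotWithFill`. Plot channels accept either a field name or an accessor function, so `fill: "conclusion"` and `fill: (d) => d.conclusion` should be equivalent here. `Plot` is passed in only so the sketch needs no import.

```javascript
// Exercise 1: express each run's conclusion through the fill (color) channel.
// Without a color scale config, Plot picks the colors itself.
function jobsPlotWithFill(Plot, testJobs) {
  return Plot.plot({
    marginLeft: 160,
    marks: [
      Plot.cell(testJobs, {
        x: "run_number",
        y: "name",
        fill: "conclusion", // equivalently: (d) => d.conclusion
      }),
    ],
  });
}
```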

In this case, fill uses the color scale to configure how those abstract data values are mapped to visual values. If we grab the code from the solution (I'm just going to paste it all in here), we can see that in the color scale config we set the domain to the three different values we know we have in our data: success, failure, and canceled. You can explore that by, say, looking at the table at the top and checking which conclusion statuses you see there. Those are our input values. Going back to our scale visualization, that's the domain: the data values, the real values in our data. In this case they're not numbers; they're categorical values, so there are only so many categories of conclusion status we can have. So we set up our domain with those data values, and then we set a corresponding range: a semantically meaningful color for each of the values in our domain. The range is the visual space we're mapping to in our scale. Setting up that scale config tells Plot how to map those categories to the different colors, and once we do that, we hopefully see more meaningful colors. Then we used opacity to express the number of minutes each job took to complete, which is our duration value, called minutes. The opacity scale turns a longer duration in minutes into a darker rectangle, that is, a higher opacity value between zero and one. Anjana? Yeah. Sorry to interrupt, I just wanted to ask about the color. This might not be something known, so it's okay,
and I'd understand if it was already that way. But why are there two arrays instead of an array of objects that each pair a domain value with a range value? Why is the API a domain array and then a range array? It seems harder to map and harder to edit. Okay, I totally hear that feedback. We're looking at a categorical variable here, essentially something that only has three values, and in that case I totally see what you mean: maybe it would make more sense to say success maps to green and failure maps to red. But imagine we do this for a numeric value, where we don't know exactly which numbers are in our data set, only that they're going to be between zero and 1000, let's say. In that case, it's a lot easier to specify the domain and the range each as a start and a stop value ("between here and here") for both the data space and the visual space, rather than enumerating every single possible value and how it translates to another value. Sorry, I got a bit lost: you said you can just give a minimum value to specify a range? Can you show us that? Yes, that's going to be an example in our next exercise, so maybe we can come back to it in a moment. You're going to get a chance to play around with numeric variables and their domain and range there. This one is more categorical data, so we don't have a really solid example in this data set, but it'll become clearer when we start looking at scatter plots: dots on an x and y axis. That's a good question.
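To see why separate `domain` and `range` arrays work for both kinds of data, here is a toy version of the two scales discussed. An ordinal scale pairs the i-th domain value with the i-th range value; a linear scale treats both arrays as [start, stop] and interpolates between them, so it never needs to list every possible number. `ordinalScale` and `linearScale` are hypothetical helpers for illustration; Plot's own scales do much more (clamping, nice ticks, color interpolation, and so on). The status spellings follow GitHub's API ("cancelled"), which is an assumption about the worksheet data.

```javascript
// Ordinal (categorical) scale: the i-th domain value maps to the i-th range value.
function ordinalScale(domain, range) {
  return (value) => {
    const i = domain.indexOf(value);
    return i === -1 ? undefined : range[i]; // unknown values are unmapped
  };
}

// Linear (quantitative) scale: domain and range are [start, stop] pairs, and any
// value in between is interpolated, so the data never has to be enumerated.
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// The categorical color mapping from the exercise (assumed status spellings).
const color = ordinalScale(["success", "failure", "cancelled"], ["green", "red", "gray"]);

// Minutes (assumed to fall between 0 and 1000) mapped to an opacity between 0 and 1.
const opacity = linearScale([0, 1000], [0, 1]);

console.log(color("failure")); // "red"
console.log(opacity(250));     // 0.25
```

With only three categories the paired arrays can feel indirect, but the same `{domain, range}` shape is what lets a numeric scale be described by just its two endpoints.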

12. Keyboard Shortcuts in Observable

Short description:

Some folks are having issues with the keyboard shortcuts in Observable. There is a help icon on the right-hand side of the screen that provides information about the keyboard shortcuts. Hopefully, this reference will help resolve any conflicts or confusion.

Okay, and some folks are having issues with the keyboard shortcuts. Sorry to hear that. If you have questions about Observable's keyboard shortcuts, there's a little help icon on the right-hand side of the screen that opens a side pane listing which keyboard shortcuts Observable pays attention to. I feel your pain; I have run into this myself. In some cases, especially if you've mapped certain shortcuts yourself, they might conflict, but hopefully having this reference will help you understand what's going on. Sorry for any issues that's causing.

13. Adding Tool Tips and Analyzing the Chart

Short description:

We've mapped the opacity and added a stroke channel for better visibility. Tool tips can be added using the title channel, and we can pass a string field name or use a function to combine different fields into a string. By analyzing the chart, we can identify failing or frequently canceled jobs, as well as tests that take a long time to complete. This information can help us improve the health and efficiency of our projects. If you're interested in working with integration tests, Observable provides pre-configured data visualizations for duration and flakiness. These templates can be customized with your own GitHub credentials and repository. You can find more details in the templates section on Observable's website. If you have any questions about this exercise or want to explore the inner workings of Plot, feel free to check out the source code on GitHub.

Okay, moving along. So we've got the opacity mapped, and we're also adding a stroke channel. It's optional, but when the values are very short, for example, it becomes hard to see what color the squares are, so adding that stroke channel gives them a little border and you can see, okay, it's gray. Hopefully we're starting to build this muscle of: I pick a channel, I pick the data that goes into it, and then I configure how that channel behaves by configuring the scale it uses. Hopefully that's starting to feel good.

And then our last task was to add tooltips using the title channel. I know folks had some questions around that, so let's dig into it. The Plot README on GitHub has a lot of detail about how Plot works, and speaking of Control-F, I'm just going to search for "title" here. Okay. The title channel gives us a tooltip, and it expects a string of text. One problem we might run into is accidentally returning undefined or an empty string, because, as the fine print here says, titles are only added if they're non-empty. So what we want is to pass a string to the title channel. One simple option is to pass in the minutes: I add another channel, title, map the minutes field to it, and now I get just the numbers when I hover over one of these cells. That's a simple way to pass a field name to the title channel. Or we can get a little fancier and write a function that combines a few different fields into a string. This is definitely not the only way, but one option is a JavaScript template literal: the backticks define a string template, and the dollar-sign curly braces insert values into it. Inside this function, for each data point, the string is built from the data point's conclusion field and then its minutes field, on which we call toFixed so we only see two decimal places. Before, we were seeing a lot of digits, which is kind of annoying, so we're just going to round that off.
This is just one way to do it; you could also concatenate strings with the plus operator, among other things. But if we pass in that function, and it returns a string for every data point, we should start seeing these fancier little tooltips show up.
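The title function described above can be tried outside of Plot. This sketch assumes a data point with `conclusion` and `minutes` fields, as named in the walkthrough; the job name and values are made up. Note that `toFixed(2)` rounds to two decimal places rather than truncating. In the notebook, a function like this would be passed as the `title` option of the mark.

```javascript
// Made-up data point; the field names follow the walkthrough.
const job = { name: "unit-tests (macos-latest)", conclusion: "cancelled", minutes: 5.283333333 };

// Title function: receives one datum and returns the tooltip text.
// A template literal interpolates fields; toFixed(2) rounds to 2 decimals.
const title = (d) => `${d.conclusion}: ${d.minutes.toFixed(2)} minutes`;

// The same string built with + concatenation, the alternative mentioned above.
const titleConcat = (d) => d.conclusion + ": " + d.minutes.toFixed(2) + " minutes";

console.log(title(job)); // cancelled: 5.28 minutes
```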

I have a question. Can we use functions like that in all of the other channels? Yes, you definitely can. For example, on the y-axis here I have some strings that show the job names, right? I could pass in a function that, for every data point, takes the name and converts it to uppercase. That still returns a string, and now I get these transformed values. With numeric values you can imagine doing operations, like multiplying by a scalar; we'll see more examples of that in our later projects. So yes, you can pass functions in to manipulate the data. Another option is to wrangle and manipulate the data array itself before you pass it into Plot: for example, I could apply whatever transforms I want to the array, maybe with an array map, to change the values. But in many cases it's easiest to just throw a function into one of these channels (now I've lost which cell I'm in) to do a simple transform. Great question.
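Both options from that answer can be sketched in plain JavaScript: an accessor function that a channel calls once per data point, and an up-front `map` over the data array. The rows are made up and the field name `name` is an assumption; in the notebook, the accessor would be passed as something like `y: (d) => d.name.toUpperCase()`.

```javascript
// Made-up rows; "name" is the assumed field shown on the y-axis.
const jobs = [
  { name: "lint", minutes: 0.8 },
  { name: "unit-tests (macos-latest)", minutes: 5.3 },
];

// Option 1: an accessor a channel would call once per datum.
const yAccessor = (d) => d.name.toUpperCase();

// Option 2: wrangle the array up front with map (the original array is not
// mutated), then refer to the field by name.
const upperJobs = jobs.map((d) => ({ ...d, name: d.name.toUpperCase() }));

console.log(jobs.map(yAccessor)); // LINT, UNIT-TESTS (MACOS-LATEST)
console.log(upperJobs[1].name); // UNIT-TESTS (MACOS-LATEST)
```

The accessor keeps the transform next to the chart; the `map` version is handier when several charts need the same cleaned-up field.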

Okay, dope. So by the end of all this, we've hopefully got a pretty informative chart. What can we see now that we couldn't see when we were looking at a bunch of black squares? Well, we can see which jobs are failing or being canceled a lot. For example, something seems to be going on with this unit test for macOS latest, because for many different runs this job ends up getting canceled after quite a few minutes. If I'm working on improving the health of my project, I might want to look into what's going on in these macOS tests. Do I have a bug on macOS that I should look into? Or is it a problem with the tests themselves, something I just need to fix to make my life as a developer easier, so I don't have to wait five minutes just to cancel my test all the time? Similarly, we can see that some jobs are succeeding but taking a really long time. Again, it's Mac (this is why I'm on Linux, people; no, just kidding). Some of these Mac tests pass but take a really long time, so I might want to check whether there's something inefficient in my code, or something in the test code itself that's doing more work than it has to. Either it's making my site run slowly, or it's making my developer workflows run slowly, or both. So if I'm looking for ways to make things faster, those tests are a worthwhile place to spend some energy. If integration tests are an interesting area for you: on any page on Observable there's a little dropdown menu with a page called Templates, where we have a bunch of pre-configured data visualizations for different goals.
Some of them relate to business analytics and so on, but we actually have a couple related to integration tests, namely duration and flakiness, so you can dig into a lot more detail there. These are meant to be prebuilt visualizations that you drop your own GitHub credentials into and then point at your own repo, or whatever repo you're interested in. There's this duration one, and another one for flakiness that shows which of your tests are flaking out. So if you want to dig into this type of integration test data more, check those out after the workshop. Alrighty, does anybody have any questions about this exercise before we move on? Oh, I see I missed something in the chat: some questions about the titles. Okay, so again, since this is all open source, if you want to see how Plot does anything, you can dig into the source code on GitHub. The README has lots of documentation, and all the source code is right there as well. So if you're curious how marks process tooltips and things like that, you can dig into the different source files and see what's going on under the hood, with the scales and everything else too. Feel free to dive in.

14. Working with User Behavior and Device Types

Short description:

We'll now move on to the next exercise, which involves working with data related to user behavior and device types. We'll start by wrangling the data to get it into the right format. Once that's done, we'll move on to visualizing the data using different types of plots, such as bar plots and scatter plots. We'll explore how to aggregate and group the data to gain insights, as well as how to analyze specific categories in more detail. Throughout the exercise, there are hints and solutions available if needed. Feel free to take breaks as needed and work at your own pace. If you have any questions or need clarification, don't hesitate to ask. Now, let's dive into the next project and get started!

And again, that's at github.com/observablehq/plot. Okay, so how are we feeling? We made some dataviz. We visualized some test durations. Most importantly, we got a sense of how channels work, how they take in properties from the data, and how we can manipulate their scales by configuring the scale domains and ranges. I see some thumbs up. Awesome. Hooray. All right. We've been going at it for a while, and I want to give folks time to take a break. Since we're all working at our own pace, it might make sense to introduce the next exercise and then take a longer period where everybody can take a break and then get back to work, or keep working if you're in the zone and take a break afterwards. Does that sound okay to everyone? I see a couple of nods and thumbs. Excellent. Sorry, I still can't see everybody at the same time. Let's take a look at our next project, and let me make sure you can all see this one too. Okay, I think everybody should now be able to see this notebook: "What size devices are users browsing with?" Let's open it up, and close this one so that hopefully my computer doesn't crash. Fingers crossed. Okay, so for our next project we're going to dig into a different type of data, data oriented around user behavior. This data is made up, just toy data, but it's the type of data you can often get out of an analytics platform like Google Analytics. That, and several hosting services that offer similar data, will tell you who is coming to your site and, for example, what kind of devices they're using to access your web page. So what we see here is some data about different device types. 
There are some desktop devices, some mobile, and a couple of tablets in there, along with the screen resolutions people use when they access the site. For each combination of a particular device type and a particular resolution, we have the count of users we've seen with that combination, and the total number of sessions in which those users browsed the site with that device at that resolution. You can look at these objects in the table here; check the first one to see what type of data this is. All of these values seem to be strings: strings for the resolution, with a little x in the middle, and strings for what are supposed to be numbers of users and sessions. So for this one we have a little bit of work to do. This is the type of data that might come as a CSV, or that my analytics team gave me in a spreadsheet that I downloaded as a CSV. We're using Observable file attachments to parse these CSV files into JavaScript objects, so now we've got the data into JavaScript land and can use JavaScript to wrangle it into the right format, with the information we need. So our first task is a little simple data wrangling. Don't forget that you have hints and the solution here if you get stuck; the to-dos spell out what we want the data to look like. Once we've done that wrangling, we can start visualizing, and we're going to try a new type of plot: in this case, a bar plot. 
Later we're going to look at a scatter plot, which we don't have the data for yet, that maps resolution width and height onto the X and Y axes, and then we'll use some other channels to dig into the real meat of this data. So that's our task. In addition to the data wrangling, we'll look at how we can aggregate or group the data to pull together information for particular categories, and later, how we can split the data apart when we want more detail and aggregation isn't what we're after. Hopefully the to-dos in here are self-explanatory, as they were before. If folks have any questions right now, by all means ask, or feel free to jump in whenever. If it sounds good, we can take the time until the hour to do a combination of working through these exercises and taking a break, however you want to proportion break time versus coding time. That's up to you; we're all adults here. So if you need a bio break right now, like I do, go and take it and then come back; otherwise, feel free to move at your own pace. I was looking at the previous one as well, and I have a quick question. At the end there's an appendix that says "the cells below power the exercise above", and it says import QA, styles from... What is this exactly? Sorry, finding the mute button, always hard. That import statement is pulling in cells from another Observable notebook. 
If you click the notebook name at the end of the import statement, it'll take you to the original notebook, which is another Plot worksheet, if you want to check it out. Essentially, QA and styles are little utility functions that make the hint and solution buttons and make it easy to copy and paste those hint and solution snippets. They're just teacher helpers for building the worksheet. So it's just a reference used by the worksheet, like you said; it's not related to the exercise? Sorry? It's not related to the exercises... No, no. Yeah, it's just underlying, structural code. But it does show that Observable notebooks can all be imported into each other.

15. Using Observable Notebooks and Data Visualization

Short description:

You can write functions or have data in one notebook and pull it into another to reuse it. Observable is a platform where people can create content and share code. It is similar to Jupyter, but runs in the browser and updates reactively. The log type doesn't work for the R channel, but you can write functions to transform the data. The solutions for the device size exercises are available. Data wrangling is an important step in data visualization. Group transforms allow for aggregating data. You can create bar charts using different syntaxes in Observable Plot. The groupX function is used to specify the outputs of the group transform.

And so you can write a function, let's say, or have a value or have some data in one notebook and then pull it into another notebook to easily reuse it. So you don't have to keep redefining the same stuff all the time.

Yes, and this notebook, is it like an already existing thing? Can you just use it or did you guys create the whole notebook system?

So, Observable is what we do: we create the platform, but we have a big community of all kinds of people who create content on it. As long as a notebook is published, meaning it's public and visible to everybody on Observable, anyone can grab code from it. Some handy tools are built into the platform; Plot itself, for example, you don't have to import, it's available out of the box. But if you see a really handy utility that somebody else wrote on Observable and you want to use it in your own projects, you can import it from their notebook. So it works across the whole platform. Yeah. Yeah. The thing is, and I hope I'm not overstepping here, this reminds me of Jupyter, and I was wondering: so you're basically a Jupyter for JS, because Jupyter is for Python, I think?

Yeah. Jupyter was originally developed for Python, but it actually lets you run a bunch of different languages, so the main difference between Jupyter and Observable isn't exactly the language. Jupyter lets you run arbitrary kernels in different languages on the backend, because of the way it works: you have the front end for the notebook, but it's talking to a server on the backend, whether that's running locally or hosted somewhere like JupyterHub. So you can have a backend in JavaScript, or in Ruby, or in other languages. In Observable, everything runs in the browser. There's no server this is talking to, except the Observable servers that save your work if you're using an Observable account. So that's one big difference: it's all JavaScript, and it's all running in your browser. The other main difference is that Jupyter runs top to bottom: you run a cell, then you run the next cell, and so on. Observable runs reactively. Under the hood it uses principles of functional reactive programming, so that when you update one thing, every other cell in your notebook that uses the data you just updated, or the function you just changed, updates automatically, and you don't have to go through and manually rerun everything every time you change something. The idea is for it to be quicker and more responsive for prototyping and iterating, working on things bit by bit, so it updates right away every time you make a little change. I can share some documentation on how Observable runs that might clear that up, and there's also a series that explains some of the differences between Observable and Jupyter. So I can grab some links for y'all.

Thanks. Thanks for the thorough explanation.

Sure thing. There's also a question in the chat about using the log type for the R channel. That's a great question, and I'm glad you stumbled upon it. We can get into it as we go over the answers, but essentially, the log type doesn't work for the R channel because the R scale starts from zero: it's trying to show an accurate areal encoding, where the area of each circle is proportional to its value. The log of zero isn't defined, so a log scale doesn't work well for the R channel. However, as we saw in the question just before, you can write arbitrary functions on the data you pass into a channel. So you could do a log transform yourself, calling Math.log on whatever value you're passing into the R channel, to get around the scale not being able to take the log of zero. I'll leave that as a challenge to the reader, and I think we'll see a workaround for this in some of the solution code. Words are hard, but yes, it's because we're trying to preserve an accurate, meaningful area encoding in the R channel.
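Two facts behind that answer can be checked directly: the logarithm of zero is not finite, and an area-true circle needs a radius proportional to the square root of the value. The `radius` helper and the session counts below are hypothetical, for illustration only. `Math.log1p` is an assumed alternative that the workshop does not use; it is included because `Math.log` alone still fails on a count of zero.

```javascript
// The logarithm of zero is not a finite number, so a log scale cannot
// anchor a radius encoding at zero.
console.log(Math.log(0)); // -Infinity

// Area-proportional radius (general principle, hypothetical helper):
// since area = PI * r^2, r grows with the square root of the value,
// so a value 4x larger gets 4x the area and 2x the radius.
const radius = (value, maxValue, maxRadius) =>
  Math.sqrt(value / maxValue) * maxRadius;

// Workaround from the workshop: transform the data before it reaches the
// channel. This assumes every session count is at least 1; a count of 0
// would still become -Infinity.
const sessions = [1, 10, 100, 1000]; // made-up counts
const logged = sessions.map((s) => Math.log(s));

// Assumed alternative (not from the workshop): Math.log1p(s) = log(1 + s),
// which maps 0 to 0 and so tolerates zero counts.
const logged1p = [0, ...sessions].map((s) => Math.log1p(s));

console.log(radius(1, 4, 10)); // 5
```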

Cool, so as promised, here's a little documentation about Jupyter, and another explanation of reactivity in Observable notebooks: how they run and how they update things as you work. Hopefully those are helpful for that previous question.
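To illustrate the reactive idea discussed above (change one cell, and every cell that depends on it updates), here is a deliberately tiny toy model. It is not how the Observable runtime is implemented; `ToyNotebook` is a hypothetical class that assumes cells are defined in dependency order and contain no cycles.

```javascript
// A toy model of reactive cells, for illustration only.
// NOT the Observable runtime: no async, no cycle detection, no ordering logic.
class ToyNotebook {
  constructor() {
    this.cells = new Map(); // name -> { inputs, fn, value }
  }

  // Define (or redefine) a cell from the names of the cells it reads.
  define(name, inputs, fn) {
    this.cells.set(name, { inputs, fn, value: undefined });
    this.recompute(name);
  }

  // Recompute one cell, then every cell that reads it, down the chain.
  recompute(name) {
    const cell = this.cells.get(name);
    cell.value = cell.fn(...cell.inputs.map((n) => this.cells.get(n).value));
    for (const [other, c] of this.cells) {
      if (c.inputs.includes(name)) this.recompute(other);
    }
  }

  get(name) {
    return this.cells.get(name).value;
  }
}

const nb = new ToyNotebook();
nb.define("minutes", [], () => [1, 2, 3]);
nb.define("total", ["minutes"], (m) => m.reduce((a, b) => a + b, 0));
nb.define("label", ["total"], (t) => `${t} minutes total`);
console.log(nb.get("label")); // 6 minutes total

// Redefining an upstream cell updates everything downstream automatically.
nb.define("minutes", [], () => [10, 20]);
console.log(nb.get("label")); // 30 minutes total
```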

Okay, yeah, go ahead, sorry.

Oh yeah, no worries. So how did this go? How did our device sizes exercises work out? That's hard to say. Hopefully folks got through at least a few of these to-dos. We're running a little short on time, so rather than working through this step by step together, I'll just point out that the solutions are here. If you click these solution buttons, you can see one possible way (again, there are multiple ways) to do this wrangling, for example. If anybody has questions, especially about the wrangling step, I'm definitely here to answer them. The main thing we're trying to show is that in the data wrangling step, you can use things like the map method to create whatever new data you need. In this case, we're splitting the resolution string into two separate values and putting each in its own property, so we can easily access them in the plot later. You can also convert the types of values; here we're using the unary plus operator to convert strings to numbers, and so on. This is the type of thing you end up doing a lot in data visualization before you even get to the visualization part: massaging your data into something that'll be easy to feed into the type of visualization you're trying to create.
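Here is a sketch of that wrangling step in plain JavaScript. The raw field names (`device`, `resolution`, `users`, `sessions`) and the rows are assumptions modeled on the description of the CSV; the notebook's own solution may name or shape things differently. It splits the `WIDTHxHEIGHT` resolution string into two properties and uses the unary plus operator to turn numeric strings into numbers.

```javascript
// Assumed raw rows as parsed from the CSV: every value is still a string.
const raw = [
  { device: "desktop", resolution: "1920x1080", users: "1200", sessions: "3400" },
  { device: "mobile", resolution: "414x896", users: "950", sessions: "2100" },
  { device: "tablet", resolution: "768x1024", users: "80", sessions: "150" },
];

// Split "WIDTHxHEIGHT" into two numeric fields and coerce counts with unary +.
const wrangled = raw.map((d) => {
  const [width, height] = d.resolution.split("x");
  return {
    device: d.device,
    width: +width,
    height: +height,
    users: +d.users,
    sessions: +d.sessions,
  };
});

console.log(wrangled[0]);
```

Using `map` returns new objects and leaves the parsed rows untouched, so the original data stays available if you need to compare.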

Okay, so we also looked at some group transforms, and we're starting to get into the notion that sometimes we don't want to look at data point by point; sometimes we want to aggregate it and see totals for a whole category. In this case we have a bunch of different resolutions for desktop, a bunch for mobile, and fewer, but still a bunch, for tablet. We want to smoosh them all together and understand how many total users or sessions there are. If we copy this solution (I'll just use the shortcut here), we're creating a bar chart, and we're using a slightly different syntax than we did last time. Before, we called Plot.plot and passed in a marks array; that's one way to work with Plot. Another way, if you know you have just one mark, as here where we have a single bar mark, is to call the mark function directly, then call plot as a method on it and pass in whatever margins or other scale configuration you want. These two are equivalent; they're just two ways to write your plots with Observable Plot, and which one you use is a matter of preference or style, so you can get a feel for which you like better. Either way, once we call the barY function, it works much as before: we pass in our data. But here, instead of directly passing the object that configures which channels we want, we wrap it in a call to Plot.groupX. Its first argument is an object saying which outputs we want from the group transform: a sum of whatever's in the y channel, and a sum for the title channel. The second argument is what we would originally have passed into the bar mark: our mapping of channels to field names.

16. Group Transform and Categorical Variables

Short description:

We can use the group transform to sum up data for different categories on the X channel. It's a bit more complicated than a straight-up channel to property mapping. The group transform can also perform other reductions, such as mean, median, count, and proportion. There's another similar transform called bin, which is used for chunking continuous variables into categories. Group works well for categorical variables, while bin is used for continuous variables. The fill field does not go through the group transform, so it remains unchanged. Hopefully, this explanation helps you understand the group transform better.

So I want device on the x-axis and users on the y-axis. When I pass the users field to y, the group transform sums up users for each of the devices I'm passing in on the x channel. So groupX is basically saying: take whatever categories are on the x channel, create groups, and compute the outputs I'm asking for in this first "please give me these outputs" object. In this case that's the y channel, and title gets converted from the individual number of users for a particular resolution to the sum of all users in that x group. For me, at least, this feels a bit more complicated than a straight channel-to-field mapping. Does anybody have questions about the group transform and how we can use it to do things like sum? There are other operations as well, like mean and median, which you can read about in the Plot documentation. Transform... okay, these are scale transforms. Oh, sorry, group is what I'm looking for. Oh, I passed it. Here are some other options: we did sum, but you can also do things like counts or proportions. That's all in the group section. There's also a similar transform called bin. Group works well for categorical variables: for device there are only three options, mobile, tablet, and... desktop, right, what I'm on right now. So group works well for categorical variables, and bin is the equivalent for a continuous variable: it chunks the values into little categories. And Dagas has a comment: having that output color configuration in the input section is somewhat confusing. Let's see. Oh, the fill. Yeah, basically the fill isn't going through this group transform; it stays the same as it would have been otherwise.
We're not doing any aggregation on the fill field. You can think of the second object as the inputs to the different channels, and the first object as the outputs I want this group transform to produce. Fill still just gets passed through, if that makes sense. As always, it takes a little getting used to, but hopefully this was a useful way to get our feet wet with group transforms. Does anybody have any questions before we move on?
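What the group transform computes can be approximated in plain JavaScript: bucket the rows by the x category and sum the y field in each bucket, which is what `Plot.groupX({y: "sum"}, {x: "device", y: "users"})` asks Plot to do. For contrast, the second helper sketches binning, which chunks a continuous field into fixed-width intervals. `groupSum` and `binCount` are hypothetical helpers, not Plot's implementation; the field names are assumed and the rows are made up.

```javascript
// Made-up, already-wrangled rows.
const rows = [
  { device: "desktop", width: 1920, users: 1200 },
  { device: "desktop", width: 1366, users: 700 },
  { device: "mobile", width: 414, users: 950 },
  { device: "mobile", width: 375, users: 600 },
  { device: "tablet", width: 768, users: 80 },
];

// Group: one bucket per category, summing the value field within each bucket.
function groupSum(data, key, value) {
  const totals = new Map();
  for (const d of data) {
    totals.set(d[key], (totals.get(d[key]) ?? 0) + d[value]);
  }
  return totals;
}

// Bin: chunk a continuous field into fixed-width intervals, counting rows per
// interval. Each bin is keyed by its lower edge.
function binCount(data, value, binWidth) {
  const counts = new Map();
  for (const d of data) {
    const start = Math.floor(d[value] / binWidth) * binWidth;
    counts.set(start, (counts.get(start) ?? 0) + 1);
  }
  return counts;
}

console.log(groupSum(rows, "device", "users")); // desktop 1900, mobile 1550, tablet 80
console.log(binCount(rows, "width", 500));
```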

17. Visualizing Data with Dot Plot

Short description:

We created a dot plot to visualize different devices. The X-axis represents width, the Y-axis represents height. We used fill to color the devices, and opacity and R to show the number of sessions. The log scale was used for the opacity scale to better display the data. However, log scales don't work for R, so we applied the Math.log function to the sessions number before feeding it into the R channel. Setting an explicit domain (a minimum and a maximum) helped constrain the chart and focus on the informative area. This experimentation and flexibility are part of the fun of data visualization. We can pass three values to the domain to specify categorical values. Removing one zero from the 50,000 value would bring it closer to the same range as the other values.

Okay. Okie doke. Then we looked in the other direction: instead of aggregating our data, let's look at every little data point we can. So here we created a dot plot of our different devices, using the Plot.dot syntax. Again, just for the sake of time, because we're getting towards the end of our workshop, sadly, let me copy the solution code and paste it in here. And again, this is just one possible way of doing this.

Let's see, what has gone wrong here? I'm missing some data or something. Hmm. We've got width on the x-axis. Let me simplify this a little and see if we can get it going. Hopefully yours are working better than mine. Were people having issues with this exercise? Maybe there's a bug in the solution code. Let me try refreshing to see if something got stuck. Okay, I don't know. Was this working for other people, or is it just broken for me? The finished one should look like this one, so let's scroll down and look at this instead. It was working for people? Yeah, I'm not sure. Maybe I changed something somewhere else by accident.

Okay, so what we're doing here, we've got width on the X axis, height on the Y axis. And then there are many different ways that you could have encoded the rest of the information. Right? So this is just one way. I'm using fill here to color the different devices. So we have like yellow for the mobile devices, blue for the desktop ones. And then also using opacity and R to show the different numbers of sessions. Now, there was a question about using the log scale for R. So for the opacity scale, so for fill opacity, that channel uses the opacity scale. And here I've specified for it to be a log scale because without that, if you tried playing around with this we lose, it's really hard to see a lot of the data because there's some really, really high values and then there's some really, really low values. So sometimes changing the type of, oops. Changing the type of the scale can help kind of surface data in a more visually helpful way. And this is what it's all about, right? It's about being able to quickly see what's going on with the data. So in this case, I think the log transform works well, like the log scale works well. But as we said, log scales don't work for R because having, yeah, having a log of zero. R depends on having sort of zero as the base, the minimum value, and that doesn't work well when you're trying to take a log of zero. So instead we can do is pass in a function to the R channel to kind of apply the math dot log function to the sessions number before we feed it into the channel. And that gives us essentially a similar thing. So you can see without that, it's a little bit harder to see the difference in the number of sessions. And if we don't use that transform and we just kind of pass in sessions as normal, it's like get no information here, it's all, it's way too big. The numbers are way too big for this to work effectively with the radius. 
This kind of massaging, trying stuff out and seeing what's useful and what's completely unhelpful like this, is part of the fun of developing a data visualization. And that goes back to the question about Observable hopefully being helpful for quickly making those changes, playing around with things, and seeing what works. Then we can also add a title; we saw how that worked in the last one. Also in this one, if we don't set a domain on these axes, you'll see that somebody, somewhere has a really, really giant monitor. It's a wild outlier, and it throws off the automatic calculation. So if we want to constrain the chart to only take in a certain subset of the overall domain of values in our data, we can specify that domain here, as an array containing the minimum and maximum values. That gets back to the question we had earlier about using the domain and range configuration for continuous variables like the width numbers here: variables that move smoothly from zero up to 40,000 or whatever it was we had. So we can cut that off to help zoom in on the more informative area of the chart, as it were. Did anybody have any questions about that?
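To see why one giant monitor throws off the automatic domain, here is a sketch that maps made-up widths onto a 640px-wide chart, once using the data's own extent and once using an explicit [min, max] domain. The scale helper and the 4000px cutoff are illustrative assumptions; in Plot the explicit version would be a scale option such as x: {domain: [0, 4000]}.

```javascript
// Hypothetical browser widths in CSS pixels, including one giant-monitor outlier.
const widths = [360, 390, 768, 1280, 1440, 1920, 40000];

// Minimal linear scale from a [min, max] domain to a [min, max] pixel range.
const scaleX = ([d0, d1], [r0, r1]) => (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);

// Automatic domain: the extent of the data, so the outlier sets the maximum.
const autoX = scaleX([Math.min(...widths), Math.max(...widths)], [0, 640]);

// Explicit domain: zoom in on the informative region.
const zoomedX = scaleX([0, 4000], [0, 640]);

console.log(widths.map((w) => Math.round(autoX(w))));
console.log(widths.map((w) => Math.round(zoomedX(w))));
```

With the extent-based domain, everything except the outlier lands in the first few dozen pixels; with the explicit domain the common widths spread out, while the outlier maps far past 640, because this simple scale does not clamp.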

So how does it know that it's a range? What happens if you have a third number in that domain? A third number in this? Well, let's try it out. I was wondering. Oh, yeah, let's see, because if it goes down again... Okay, let's say we had a bigger one. Yeah, so I think what's happening here is that it's, I guess, taking... You can remove one zero to keep it on the same range; it's only going to be 800 afterwards. Sorry, say that again? You can remove one zero from the 50,000 and it's going to be closer. Yeah, just in case. Yeah. So, as we saw before, we can pass three values to the domain. Before, we had the categorical values for the different completion statuses of the test jobs: three different categories, so we passed a domain spanning those three categorical values. I'd have to check the implementation to know what this is doing, but I believe in this case it's ignoring the middle value.

18. Understanding Scales, Plot, and D3

Short description:

When working with scales, it's important to understand the expected values and the correspondence between the specified domain and the actual data. Feedback is always welcome for improving the API. Plot and D3 are created by the same person but serve different purposes. Plot is designed for quick data exploration, while D3 offers more customization and flexibility. Plot allows you to quickly gain insights from your data, while D3 requires more effort to create visualizations. We've explored grouping, aggregations, faceting, and other advanced techniques. Now, let's move on to the next exercise.

No, it's not ignoring it. I think it's probably just getting confused. Yeah. I'm not entirely sure, but whether you can have only a minimum and maximum or a longer array depends on the type of variable you're working with and the type of property. So the place to look to find out more is the documentation for these scales. For the X scale, I believe it always expects a minimum and maximum value, so let's look at the scale documentation to see what's expected. Min on the left, max on the right; I see it in the docs. Exactly, here. So these are going to be either min and max, min on the left and max on the right, or an array of categorical, AKA ordinal, values, like the success, failure, skipped or success, failure, canceled values we had in the last visualization. Would you consider changing the API based on feedback, or is it pretty much settled? Yeah, feedback is always welcome, definitely. As we said before, you can search through the issues on the GitHub repo to see whether other people agree or have already raised the point, whether it's a bug report, a feature request, or a request to change the API. If not, you can submit an issue and let us know what you think. But in general, I think the important takeaway for today is this concept that with these scales, we're mapping from a domain to a range. Even if you don't use Plot after this, even if you go on to use Vega-Lite or D3 or some other library, that core idea holds: we need to define the domain of data that we're taking in, and then the range of pixel values, colors, opacity levels, or what have you that we're putting out.
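The two shapes of domain can be sketched as two tiny scale functions: a continuous scale that takes a [min, max] pair and interpolates between them, and an ordinal scale that takes a list of categories and looks each value up. These helpers and the colors are hypothetical stand-ins for what a library like Plot or D3 provides (in Plot, roughly color: {domain: [...], range: [...]}).

```javascript
// Continuous scale: a two-element [min, max] domain, interpolated linearly.
const continuous = ([d0, d1], [r0, r1]) => (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);

// Ordinal scale: the domain lists every category; each maps to the range value
// at the same position (cycling if the range is shorter than the domain).
function ordinal(domain, range) {
  const index = new Map(domain.map((d, i) => [d, i]));
  return (v) => (index.has(v) ? range[index.get(v) % range.length] : undefined);
}

const opacity = continuous([0, 100], [0, 1]);
const color = ordinal(["success", "failure", "canceled"], ["green", "red", "gray"]);

console.log(opacity(25));        // 0.25
console.log(color("failure"));   // red
console.log(color("skipped"));   // undefined, because "skipped" is not in the domain
```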
That transformation is the main thing I'd like everybody to take away, but we're absolutely open to feedback on how this API could be more helpful to you.

No, but to be completely honest, in my case, I've never had this experience with visualization before. I think you definitely got across the point you were just trying to explain, and I'm actually curious about this and want to see more. Awesome. Yeah, thanks again. To follow up with a question, though, I think it's unrelated: do you see the 1400 on the Y axis? The 1400 on the Y axis, around here? Yeah, all the way at the end, there's a dot. Oh, yeah, uh-huh. Do you know why? Is it because of the 5000, and it's reaching it? Well, in this case, it might be that it's just sort of reaching it. This might be one of those things where, if Plot were an older library that had gone through more edge cases, there might be different defaults, for example not showing anything past a certain level. You can see it even more pronounced in the next exercise when we look at faceting, where it sort of pops onto the next facet. So there might be some bugs in here. But it's also worth remembering that when we map the domain to the range, we're telling Plot how to do that transformation, like in the animation we saw before; we're not telling Plot, "I guarantee I won't give you any values outside that domain." So another thing to consider is how the domain we're specifying actually corresponds to what's in our data. That's where filtering comes in: doing that data wrangling up front to make sure you only have the types of values you're expecting. You might want to throw out data points above a certain level or with a certain characteristic, for example.
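Doing that wrangling up front can be as simple as filtering, or clamping, the rows against the domain you plan to give the scale. The records, field names, and the [0, 5000] domain below are illustrative assumptions.

```javascript
// Hypothetical session records; one comes from a very large monitor.
const data = [
  { width: 390, height: 844, device: "mobile" },
  { width: 1440, height: 900, device: "desktop" },
  { width: 51200, height: 1400, device: "desktop" },
];

// The [min, max] domain we plan to hand to the x scale.
const domain = [0, 5000];

// Option 1: drop rows whose width falls outside the domain.
const inDomain = data.filter(({ width }) => width >= domain[0] && width <= domain[1]);

// Option 2: keep every row but pin out-of-range widths to the domain's edges.
const clamped = data.map((d) => ({
  ...d,
  width: Math.min(Math.max(d.width, domain[0]), domain[1]),
}));

console.log(inDomain.length, clamped.map((d) => d.width));
```

Dropping is the better fit when outliers are noise; clamping keeps them visible at the edge of the chart, which can be misleading if readers don't know it happened.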
That's the other thing to keep in mind: the correspondence between what you're telling Plot to expect and what you're actually throwing at it, if that makes sense.

Thanks for the answer. To get into the subject of datavis in more detail, as you mentioned, to get the broader idea and understand how I can use this to my advantage or for my needs: you mentioned D3 a couple of times just now, along with another library I hadn't actually heard of, which is why I'm focusing on D3. My question is, how does it relate to Plot? Is one based on the other, or do they solve different problems? I would say they're solving different problems. Plot and D3 were both created by Mike Bostock, so literally the same person came up with both of them. D3 is a much lower-level library. It's very powerful and very customizable, but it can also take a lot of legwork, a lot of boilerplate, and a lot of thinking to get to even a very simple visualization. Doing this kind of faceting, for example, takes a fair bit of thinking, work, and cognitive load in D3. Plot is intended to be less customizable, less flexible, maybe less powerful; it may not have all the bells and whistles and supercharged capabilities of D3, but it's intended to get you to a meaningful visualization as quickly as possible. That's what we saw with faceting: we added basically one line to our plot, and that turned one plot into three. When we added that one facet line, we went from our regular scatter plot to three different ones broken up by a category, or feature, of the data. The idea with Plot is that it's geared more towards data exploration. Here we're quickly trying to get up to speed, understanding this dataset and seeing what's going on.
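Under the hood, faceting amounts to grouping rows by a categorical field and drawing one chart per group. Here is a minimal sketch of that grouping step, with made-up rows and a hypothetical groupBy helper; Plot handles this for you through its facet option (along the lines of facet: {data, y: "device"} in the Plot versions of the time).

```javascript
// Group rows by a categorical field; each group would become one facet.
function groupBy(rows, key) {
  const groups = new Map(); // Map preserves first-seen order of the categories
  for (const row of rows) {
    const k = row[key];
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(row);
  }
  return groups;
}

// Hypothetical browser-size sessions.
const sessions = [
  { width: 390, height: 844, device: "mobile" },
  { width: 820, height: 1180, device: "tablet" },
  { width: 1440, height: 900, device: "desktop" },
  { width: 412, height: 915, device: "mobile" },
];

const facets = groupBy(sessions, "device");
for (const [device, rows] of facets) console.log(device, rows.length);
```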
And before we put this exercise to bed, let's talk about what we can see here. Certain patterns emerge. We can quickly see some aspect ratios that are popular. We can see, for example, that people on mobile seem to be very much in portrait mode and not so much in landscape mode, and that's a little easier to see once we split these out. We can see the same story elsewhere, or actually, on tablet, it seems like people are more split between the two orientations. Those kinds of things, going very quickly from "I have no code in front of me" to "I wrote five lines of code and now I understand my data better": that kind of data exploration is really what Plot is intended for. That's why I think it's great for folks like us who are web developers. We're not full-time data visualization developers working on really complex visualizations. Maybe we are, or maybe we'll go on to be. But we want to understand our data as quickly as possible, and that's what Plot is intended to do. D3, by contrast, is like building your own Lego spaceship out of little tiny pieces, whereas Plot is like a Lego kit that comes with bigger units you can quickly put together to get a really cool X-Wing fighter or whatever. Hopefully that's a helpful way to think about their different use cases, and I think we have more documentation about Plot and D3 that I can send you all to answer that question. So, okay. Oh yes, our workshop is scheduled to end two minutes from now, so I'm afraid we're a bit behind schedule and may not be able to get through the next exercise. But let's take a look at it together, and I'll share it with you all right now. Okay.
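The portrait versus landscape pattern can also be checked numerically with a small aggregation: classify each session by whether it is taller than it is wide, then count per device. The rows and field names are made up for illustration.

```javascript
// Portrait when taller than wide; a perfectly square viewport counts as landscape here.
const orientation = ({ width, height }) => (height > width ? "portrait" : "landscape");

// Count portrait vs. landscape sessions for each device type.
function countOrientations(rows) {
  const counts = {};
  for (const row of rows) {
    if (!counts[row.device]) counts[row.device] = { portrait: 0, landscape: 0 };
    counts[row.device][orientation(row)] += 1;
  }
  return counts;
}

// Hypothetical sessions.
const rows = [
  { width: 390, height: 844, device: "mobile" },
  { width: 412, height: 915, device: "mobile" },
  { width: 844, height: 390, device: "mobile" },
  { width: 820, height: 1180, device: "tablet" },
  { width: 1180, height: 820, device: "tablet" },
  { width: 1440, height: 900, device: "desktop" },
];

const byDevice = countOrientations(rows);
console.log(byDevice);
```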
So before we move on from the browser sizes, I just want to sum up again. What we've done here is get a bit more used to some more advanced techniques, like grouping, aggregations, and faceting to split things apart. We've also looked at a few more edge cases with scales, transforms, and things like that.

19. Making Informed Decisions with Data Visualization

Short description:

This is the type of data visualization I can put into practice right now. I can make decisions based on the patterns in the data and quickly make informed decisions. Whether using Plot, Tableau, Power BI, or custom D3 visualizations, understanding the data helps me do my job better.

But hopefully what we're seeing is that this is the type of data visualization I can actually put into practice right now, for example if I'm designing UIs. A while back, I was implementing a design for a mobile layout, and there was this one really annoying screen width where it just wasn't working with either the tablet design or the mobile design. When I'm frustrated in a situation like that, I can look at one of these charts and see: okay, there actually are quite a few users in that weird bucket, so it's worth my frustration right now to get this one pixel width to work. Or maybe: nope, nobody's using this, so I'm going to deprioritize it and not stress myself out. So it's really about making decisions based on the patterns we see in our data. If we can quickly go from numbers in the data to "oh, I see, everybody's on this aspect ratio, look at that," then we can hopefully make more informed decisions more quickly. That's the idea of what we're trying to get into here. Whether you go on to use Plot, or a no-code solution like Tableau or Power BI or one of the other tools out there, or whether you get really nitty-gritty and build custom D3 visualizations, the idea is the same: how can I do my job better if I understand the data better?
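The question "are there actually users in that weird bucket?" is a binning question. Here is a sketch that counts sessions per 100px width bin and computes the share in one awkward range; the widths and the 600 to 699px range are made-up examples. In Plot, a bin transform (for instance Plot.rectY(data, Plot.binX({y: "count"}, {x: "width"}))) would draw the same kind of counts as a histogram.

```javascript
// Count values per fixed-size bin; each bin is keyed by its lower edge.
function binCounts(values, binSize) {
  const counts = new Map();
  for (const v of values) {
    const start = Math.floor(v / binSize) * binSize;
    counts.set(start, (counts.get(start) || 0) + 1);
  }
  return counts;
}

// Hypothetical viewport widths, one per session.
const widths = [375, 390, 412, 600, 610, 640, 768, 1440];
const bins = binCounts(widths, 100);

// Share of sessions in the hypothetical awkward 600 to 699px bucket.
const share = (bins.get(600) || 0) / widths.length;
console.log(bins, share);
```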

20. Exploring API Responses and Adding Interactivity

Short description:

In this exercise, we will explore API responses, focusing on server response times and status codes. We will work with data similar to server logs, containing information about requested paths, request times, status codes, and response durations. The exercise involves creating a chart using channels and scales, and there is a possibility for adding interactivity using Observable Inputs. These inputs provide HTML widgets like checkboxes, sliders, and search boxes that can be used to filter and modify the chart. The goal is to make the chart more dynamic and allow users to quickly explore specific data points of interest. The exercise includes hints, solutions, and references for further learning. Feel free to customize the chart and explore different possibilities. The aim is to combine web development skills with data visualization to create powerful and interactive visualizations. If you have any questions or need assistance, feel free to ask. There are additional resources and readings available for further exploration. Thank you for participating in the workshop!

Does anybody have any questions about this exercise before we quickly share the third one? I'll take that as a no; of course, questions are always welcome. Okay, so hopefully this one is also loading for you all. If you can't see it, please let me know, but it should be available now. This one is going to have to be more of a homework assignment, I guess, and if folks have to drop off, please feel free. I have a couple more minutes, so I'll just walk us through it and we can talk about a couple of things. The topic of this one is API responses: how quickly is the server responding to people's requests, and what status code is it giving back after how much time? What we have is some data that we can imagine is made up, but very similar to the type of data you'd get from your server logs. Whether you're using a hosted environment like, I don't know, AWS if you're on the cloud, or hosting things on Netlify or Heroku or one of these services, you can usually get information about what requests are being made and what responses your service is giving back. So we have information about the paths people requested, when they requested them, the status codes the server returned, and the duration in milliseconds it took to give that response. Again, this is going to be homework, but by now hopefully we have enough tools: we know how channels work, we know how scales work, and we know how to manipulate the data if we want or need to wrangle it a bit. So you can go through and hand-craft a chart for this. There is a possible solution, but the possibilities are huge here, so you can take a look at that.
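Before charting the API responses, a quick summary table can help decide what to encode. Here is a sketch that groups responses by status code and reports the count and median duration; the field names path, time, status, and duration (in milliseconds) are assumptions based on the description above, not the exercise's actual schema, and the rows are made up.

```javascript
// Made-up log rows: requested path, request time, status code, duration in ms.
const responses = [
  { path: "/api/user", time: "2021-06-16T10:00:00Z", status: 200, duration: 120 },
  { path: "/api/user", time: "2021-06-16T10:00:05Z", status: 200, duration: 80 },
  { path: "/api/auth", time: "2021-06-16T10:00:07Z", status: 401, duration: 30 },
  { path: "/api/auth", time: "2021-06-16T10:00:09Z", status: 200, duration: 2400 },
  { path: "/old-page", time: "2021-06-16T10:00:12Z", status: 302, duration: 15 },
];

// Median of a numeric array (average of the two middle values for even lengths).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// One summary row per status code: how many responses and their median duration.
function durationByStatus(rows) {
  const groups = new Map();
  for (const { status, duration } of rows) {
    if (!groups.has(status)) groups.set(status, []);
    groups.get(status).push(duration);
  }
  return [...groups].map(([status, durations]) => ({
    status,
    count: durations.length,
    medianMs: median(durations),
  }));
}

const summary = durationByStatus(responses);
console.table(summary);
```

The median is used rather than the mean so that a single slow outlier, like the 2400ms request here, doesn't dominate the summary.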
But what I really want to show y'all (I'm just going to skip to the end here) is a few to-dos for adding interactivity to these charts. There are some handy things called Observable Inputs; there's a link up here, let me open it. They're essentially prebuilt HTML input widgets, like form inputs, that give you different UI controls, such as checkboxes, radio buttons, sliders, dropdowns, et cetera. We've actually been looking at these table inputs this whole time, and there's also a search box, which is very convenient. Observable gives you a way to capture the value of an HTML widget like this. Say I check off some of these boxes, which I've set up with some options and maybe a label. If I precede one of my assignments with the keyword viewof, Observable captures a reactive reference to whatever the currently selected values of this checkbox are. So I can see the array it returns updating as the user (I'm the user in this case) interacts with it. That shows the reactivity, and then I can use that value in the rest of my code. Hopefully you can see where this is going: we can hook some of these inputs up to our chart so that we have really quick ways to change what's shown on it. For example, say I want to hide some of these API statuses, or narrow it down to a single one, like, I dunno, these 302s; there aren't very many. I can filter my data based on the status codes I've checked here to quickly modify what's shown in the chart. Similarly, there's a search widget that lets us search by a particular path. So maybe I want to see how the user or auth requests are doing.
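The filtering that the checkbox and search widgets drive can be sketched as one plain function of the data and the two input values. In a notebook, statuses would come from a cell like viewof statuses = Inputs.checkbox([200, 302, 401], {label: "Status"}) and query from a search box; here they are ordinary arguments, and the sample rows and field names are made up.

```javascript
// Made-up response rows.
const responses = [
  { path: "/api/user", status: 200, duration: 120 },
  { path: "/api/user", status: 200, duration: 80 },
  { path: "/api/auth", status: 401, duration: 30 },
  { path: "/api/auth", status: 200, duration: 2400 },
  { path: "/old-page", status: 302, duration: 15 },
];

// Keep rows whose status is among the checked ones and whose path contains the
// search text (case-insensitive). An empty query matches every path.
function filterResponses(rows, statuses, query) {
  const allowed = new Set(statuses);
  const q = query.trim().toLowerCase();
  return rows.filter((r) => allowed.has(r.status) && r.path.toLowerCase().includes(q));
}

console.log(filterResponses(responses, [302], ""));
console.log(filterResponses(responses, [200, 401], "auth"));
```

Because the notebook re-runs any cell that references statuses or query, passing this function's result to the plot is enough to make the chart update as the inputs change.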
I'm afraid we don't have a ton of time to talk through how this works, but it's all explained in this worksheet. The idea is that when we really start to leverage the power of doing data visualization on the web, in a live web page where we have HTML widgets and JavaScript and can get these reactive updates, it becomes really powerful, because then we can dig even more quickly into the particular things we're interested in. It's not like when somebody at work makes a chart and puts it into a PowerPoint, and you can only see the one chart they designed; you can't mess with it very easily. Now we know how to mess with the code to change things the way we want. But sometimes you're creating a visualization for someone else, maybe your boss or someone who doesn't know how to code. Or maybe you just don't have time, and you want to be able to quickly change which status codes you're looking at at a given moment. Hooking our plot up to these HTML widgets gives us the ability to very quickly change how the chart is displayed, and again, to get to the information we actually care about. Because we don't care about the dots; we care about things like whether people are having a bad time because our requests are taking 24 seconds to return. So I'll leave this as an exercise for the reader, if that sounds okay. Some of us care about the dots. That's true, I care about the dots too, that's a good point, Brian. But it's what the dots represent. Anyway, great point. The exercises are hopefully self-explanatory, and the solutions are in there along with the hints. You can also see the finished project we're trying to get to.
But what I really hope is that you can take these ideas and go back to some of your charts from exercises one and two and add some interactivity. Maybe you can add some checkboxes for whether you're looking at mobile or tablets or what have you, or add some filters based on searching for particular words in the data. Hopefully you can take this and run with it. You can look at this documentation about Observable Inputs to see all the different ones there are, and you can also write your own arbitrary HTML widgets if you want a more complicated form. There are some links at the bottom to other cool things people have made that you can try out; some of these involve very fancy, complex interactions. So hopefully this gives you some inspiration and ideas for how you can put your web development skills back into data visualization. It's like a snake eating its own tail: the web development makes the visualization better, the visualization makes the web development better, and everything keeps getting better in a virtuous cycle of improvement. That's the hope with this third exercise. I know we haven't really had time to get into it and we're already way over time, but does anybody have any questions about that or about any of this stuff? I'll leave you all with some references here. We've already talked about our takeaways: wrangling, and the different kinds of filtering, aggregating, and so on that we need to do. There are some resources in here, including more readings, conference talks, tutorials, examples, videos, and courses you can work through, so check those out in the slides we sent at the beginning.
And yeah, that's all I really wanted to share with you. Big thanks to everybody for being here today. I'll stick around for a few more minutes if anybody has questions or wants to talk through anything. Thank you for the talk, it was great. Thank you, and thank you for being here and for all of the hard work behind the scenes.

21. 3D Data Visualization and Tracking Visitors

Short description:

Plot is not set up for representing three axes in data visualization, but you can use libraries like Three.js for 3D work. Other third-party libraries like Mapbox integrate with 3D layers. The workshop notebooks are currently unlisted but accessible through the URLs. Forking a notebook in your Observable account creates a private copy. If you refresh the page without an account, your progress will be lost. Copying the source code can help prevent data loss. For tracking visitors on a site, Google Analytics is a popular solution, but there are other open-source alternatives and bespoke analytics tools used by companies like Netflix.

Really fast, just curious: can you represent three axes in an Observable data visualization? So like X, Y, Z. Like a 3D database. So, Plot isn't really set up for that. However, if you use something like Three.js, a library that's set up for 3D work, you totally can. Thanks, everybody in the chat. If you search for Three.js, for example, in the search box at the top of any page, you can see some examples of visualizations folks have made. Okay, this one is in Spanish; we don't need to read all of the words (although I'm sure some of us can), but you can see it's sort of a 3D bar chart. I think Matt Dugan had made a 3D pie chart situation; let me see. Anyway, yes is the short answer. It's a bit more work, in the sense that it's not quite as out-of-the-box as Plot is. Okay, I can't find his 3D one, but I'll look for it after this. To a certain degree, Three.js is a lot of work too. Yeah, so it's yes, but not quite as quickly as we've been making visualizations here today. Oh, thanks very much, Louisa, thanks for coming. There are also some other third-party libraries, like Mapbox, that integrate with 3D layers and things like that. That's another thing you can look for, for example if you're trying to create topographical maps. You can find various inspirational notebooks by searching around Observable. Not sure where this one is. Oh, it's taking a minute. I hope it doesn't crash my computer; I'm going to close it just in case.
But yeah, wow, that is very impressive. Great question, though. Oh, thanks, Patricia, I hope it was very useful. Does anybody have any other thoughts or questions? Yeah, sorry. First of all, I heard you were opening notebooks and so on. My question is: are we going to have access to those notebooks later on, or are they private? No, no. Right now they're not public-public yet, because you all came to this workshop and not everybody in the world did. They're unlisted, like unlisted YouTube videos. I'll probably make them more public after a while, because you get first dibs. But since you know the URLs, you can access them; they'll be up there. I can't guarantee they'll be there in 100 years, but they'll be up for the foreseeable future. And if you have an Observable account, you can always fork the notebook, which makes a copy in your own account that will be yours forever, until the end of time. Or the end of the internet. Sorry? When we fork it, we can just keep it unpublished, right? Absolutely. When you fork it, it defaults to private, so it's just going to be a little private notebook in your own account (well, not little; some of these are very long). If you want to publish it at some point, you can use the publish options here, but if it's just a personal workspace, you can totally do that. You can also create new notebooks and pull in elements or cells from the ones you've got privately. As long as you're working in your own account, you can pull things across between private notebooks; other people just won't be able to see them. You mentioned that if you have an account, you get to save your progress and everything.
Does that mean that if I refresh the page, I lose all my progress? If you don't have an account, essentially yes: if you refresh the page, your changes won't be saved, because everything has been happening in your client. However, you should see a little banner at the top that says something like "fork to save your changes," so if you want to save them, you can do that. Otherwise, you can copy out the source code and put it into your own module or something like that, so you don't lose it if the browser crashes. That said, what you'd see on refresh is what you saw the first time you looked at this, which includes the solutions and everything, so hopefully most of what we've looked at is there and you'd only have a little to make up if you accidentally lose your work. But yes, you can copy and paste it for sure, if that helps. Yeah, for sure. Thanks for answering the question. Also, a somewhat far-fetched, mostly off-topic question: you mentioned a lot of analysis of logs and builds and so on, which helps us produce a better product, let's say. From your personal experience, how would you recommend tracking, for example, visitors on a site? I know there are solutions like, oh, I forgot the name of the Google thing. Google Analytics, yeah. Something similar to that browser device data; Google Analytics or sites like that would give you that type of thing. Yeah. And if you want to go more in-house, do you know of any open-source alternatives, just out of curiosity? Non-Google Analytics. There certainly are a lot, but I'm probably not the best person to ask. There are some links in those slides to folks who do a lot of really bespoke analytics. I think Netflix has a really amazing team for this.
They've really pioneered a lot of cool analytics work, so I'd refer you to folks who know more about it than me to hear how they process that data. But essentially, it becomes its own domain: processing metrics data and making sure it all gets ingested, transformed, and stored in some kind of data warehouse properly. So depending on the scale of your operation, that can be a very deep rabbit hole to dive into, if you know what I mean.

22. Accessing Insights and Analytics

Short description:

Google Analytics, Netlify, Cloudflare, and GitHub offer individual ways of delivering insights and analytics. The GitHub API, including the GitHub Actions API, provides access to various data, including interactions, packages, users, and contributions. Creating custom actions allows flexibility in performing different tasks. Check out the GitHub documentation for more details. Thank you all for attending the workshop and for your engagement. Feel free to reach out on Twitter, Observable, or the user forum for further support and discussions.

But I'm afraid that off the top of my head, I don't really have one. Things like Google Analytics, or your host if you're on something like Netlify, or something like Cloudflare if it's serving your traffic at the edge: those all have their own individual ways of delivering those insights and analytics. So it depends on what you're hosted on. I don't know of a super easy-to-use library that fits all use cases off the top of my head, I'm afraid.

No, that's probably... actually, you gave me a very good idea, so thanks again. Yeah? Yeah, one last thing about this whole... Oh yeah: the pipelines, the failing pipelines and how long the pipelines take, everything you mentioned getting from GitHub, for example. I could just Google it, but is this actually available from GitHub? Does GitHub save this and make it available per repository or per user, et cetera?

Yes, absolutely. I know we've been moving super fast through this, but in the exercise there's a little link to the GitHub Actions API here. This is a notebook that shows how you could pull this data from the API; it's intended to focus on these integration tasks. But if you click through to the actual GitHub API docs, you can see everything that's available. This is just for their Actions endpoint, or series of endpoints, but you can also get all kinds of other data, about things like interactions on the site, the different packages that are published, the users that visit, the contributions to a repo, all kinds of information. So the GitHub API is a huge source of really interesting data, because you can pull it for a lot of public repos to get datasets that might be fun to wrangle and play with. I'd definitely recommend checking out the GitHub docs; let me drop that link here. So you could even have webhooks that let you know, hey, this action passed or failed, and so on?
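As a sketch of turning that API into a dataset: GitHub's REST endpoint for listing a repository's workflow runs returns a workflow_runs array whose items include fields such as conclusion, run_started_at, and updated_at. The helper names below are hypothetical, the duration is only an approximation (last update minus start time), and the sample payload is made up so the summary can run offline.

```javascript
// Fetch recent workflow runs for a public repository (needs a runtime with fetch,
// such as a browser or Node 18+). Unauthenticated requests are rate-limited.
async function fetchWorkflowRuns(owner, repo) {
  const url = `https://api.github.com/repos/${owner}/${repo}/actions/runs?per_page=100`;
  const res = await fetch(url, { headers: { Accept: "application/vnd.github+json" } });
  if (!res.ok) throw new Error(`GitHub API responded with ${res.status}`);
  return res.json();
}

// Count runs by conclusion and approximate each run's duration in seconds.
function summarizeRuns(payload) {
  const byConclusion = {};
  const durationsSec = [];
  for (const run of payload.workflow_runs) {
    const key = run.conclusion || "pending"; // conclusion stays null until a run finishes
    byConclusion[key] = (byConclusion[key] || 0) + 1;
    if (run.run_started_at && run.updated_at) {
      durationsSec.push((Date.parse(run.updated_at) - Date.parse(run.run_started_at)) / 1000);
    }
  }
  return { byConclusion, durationsSec };
}

// Made-up payload with the same shape, so the summary runs without network access.
const samplePayload = {
  workflow_runs: [
    { conclusion: "success", run_started_at: "2021-06-16T10:00:00Z", updated_at: "2021-06-16T10:05:00Z" },
    { conclusion: "failure", run_started_at: "2021-06-16T11:00:00Z", updated_at: "2021-06-16T11:01:30Z" },
    { conclusion: null, run_started_at: "2021-06-16T12:00:00Z", updated_at: "2021-06-16T12:00:10Z" },
  ],
};

const runSummary = summarizeRuns(samplePayload);
console.log(runSummary);
```

In a live notebook you would pass the result of fetchWorkflowRuns("owner", "repo") to summarizeRuns instead of the sample payload, then chart the durations or conclusions with the same techniques used in the earlier exercises.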

Yeah, I'm not a super big scholar on Actions. But if you check out Actions, you can create your own. I'm not entirely sure how it works with free versus paid plans, but as far as I understand, you can essentially create your own jobs that do pretty much whatever you want. For example, a job could write things to logs somewhere or send a Twilio text message notification, that sort of thing. Lots of different possibilities there. So I'd definitely recommend having a spelunk through their documentation.
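To make the webhook idea a bit more concrete, here is a minimal sketch of the part of a receiver that decides what to say when GitHub delivers a `workflow_run` event. It reads only a few fields from the payload (`action`, `workflow_run.name`, `head_branch`, `conclusion`, `html_url`, and `repository.full_name`). The names `WorkflowRunEvent` and `formatRunNotification` are hypothetical, and actually sending the message (for example via Twilio) is deliberately left out. A real receiver should also verify the `X-Hub-Signature-256` header that GitHub sends when a webhook secret is configured.

```typescript
// Only the fields this sketch reads from a `workflow_run` webhook payload.
interface WorkflowRunEvent {
  action: string; // "requested", "in_progress", or "completed"
  workflow_run: {
    name: string;
    head_branch: string | null;
    conclusion: string | null; // e.g. "success", "failure"; null until completed
    html_url: string;
  };
  repository: { full_name: string };
}

// Hypothetical helper: build a short alert for a finished run.
// Returns null for events that shouldn't trigger a notification.
function formatRunNotification(event: WorkflowRunEvent): string | null {
  if (event.action !== "completed") return null;
  const run = event.workflow_run;
  const outcome =
    run.conclusion === "success" ? "passed" : `ended with "${run.conclusion ?? "unknown"}"`;
  const branch = run.head_branch ?? "unknown branch";
  return `${event.repository.full_name}: "${run.name}" on ${branch} ${outcome} (${run.html_url})`;
}
```

Returning a string (or null) rather than sending anything keeps the logic easy to test; whatever delivers the text, whether a log line, a chat message, or an SMS, can be swapped in separately.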

That's awesome. Thanks for sharing. And I just wanted to say thank you for this awesome workshop. It was really engaging, and I look forward to seeing more of your content. It's very nice how you explain things and how excited you get; it's kind of contagious.

Well, thanks so much. That's so lovely to hear. And thank you all, everybody, for being here, for asking these great questions, for making these really cool charts, and for sharing your feedback about Plot and everything. Please go open issues and upvote issues in the Plot repo. My contact info is in these slides here, so you can find me on Twitter and on Observable. We have a user forum too: if you keep using Observable (which you don't have to) or you keep using Plot and you want to ask questions, you can post on the forum. It's a really friendly community, and folks will help you out. So if you have any issues with the exercises, or you want any support as you go further in your data visualization journey, feel free to get involved on the forums, and you know where to find me now. Thanks so much. It's been really fun. I wish we could all see each other in person, maybe next year or whenever, but in the meantime, thank you all for being here from all these different time zones. I know it's really late for some people, so thank you so much. This has been really fun. Thank you.

At Shopify, the Insights team creates visualization experiences that delight and inform. We've done a lot of great work prioritizing accessibility and motion design for web. Our mobile experiences though, were a bit of an afterthought, but not anymore! In this talk, we'll go through how we created our data viz components library; How we encapsulated core logic, animation, types and even UI components for web and mobile; and also why keeping things separate sometimes is better - to create awesome UX.