In this talk, I'll take you through my journey as I joined the team supporting our Smart TVs application and share my experience learning one of the most overlooked but essential pieces of functionality we have.
Let's build a TV Spatial Navigation
AI Generated Video Summary
Today's Talk is about building a spatial navigation library for Smart TVs. The speaker shares their experience and challenges in building applications for Smart TVs. They demonstrate the functionality of spatial navigation using React and React Router. The navigation engine class is developed to handle TV control events and navigate through elements. Circular navigation is implemented to make navigation easier for users in TV applications.
1. Introduction to Spatial Navigation
Today we're going to talk about spatial navigation and how to build our own. I'll share what I learned and how we can create our own library. Let's discuss why we need to build our library and the challenges of building applications for Smart TVs. Although the web browsers are sophisticated, spatial navigation is still a work in progress.
Welcome everyone, thank you very much for joining this session. Today we're going to talk about spatial navigation. But rather than just talking, we're going to build our own.
My name is Sergio Abalos. I'm a husband, a father, and the parent of a small sausage dog. And when I'm not running after any of them, I work as a front-end developer at Spotify, where I am part of the team building the Spotify client that runs on your smart TV. That's the reason why, even though this is a React conference, we won't be talking about mobile, we won't be talking about desktop or laptop, and especially we won't be talking about your mouse. Instead we're going to be talking about the TV control, that little boring device that I'm sure all of you have in your living rooms.
So in case you were wondering, spatial navigation refers to, as the definition here says, a user input modality that allows mouseless navigation around the page. Basically, it is what you do with the directional keys of your TV control (up, down, left and right) when you select an app, select an item, or execute any action.
The reason I chose this topic is that, like most of you in the audience, I bet, I had been working on mobile or desktop until I came to develop an application for smart TVs. I wanted to bring the topics that felt unique to me, and probably to most of you who have been working on other platforms, and I didn't need to look very far. I saw that Spotify had developed an internal library for handling this specific case, so I decided to take a look at it, just out of curiosity, and I was fascinated. Not because the code was wow, no, the code was fine, but because I felt it was a very interesting problem to solve. So this is what this talk is about: I want to share with you what I learned as I started digging into the code and, like I said, instead of just talking about it, we're going to build our own.
Excuse me, because I don't have a lot of experience, I'm not going to do any live coding, but I'm going to show you step by step how you can build your own library. First of all, though, I think it's very important to ask ourselves why we need to build our own library. Are we using our time efficiently? If we're building an application for a Smart TV, shouldn't spatial navigation be provided by the TV platform? And the answer is yes, it is, if we are building a fully native application. So let me try to explain: the Smart TV market is quite segmented, with many brands, and each of these players has its own operating system.
For each of them, one has to build a native application that users install from that platform's own application store. You cannot get around this, but to make our life a lot easier, we decided to use a hybrid application where only the user interface is built as a web application. That gave us incredible interoperability, because we can use the same source code, the same project, to run on all these platforms. But of course, it came at a cost: we lost support for some of the features that the platform provides, like spatial navigation.
After this, I was thinking: okay, it's 2023. Today, web browsers are very sophisticated pieces of software. Isn't spatial navigation already supported by web browsers? Not yet. It's a work in progress. There is a draft proposing this new API from 2019. But again, this market is very new. I think Smart TV applications started coming out in 2017, so it feels like yesterday. It's understandable that the browsers are still catching up, because I bet not all the players out there are using this hybrid model as we are.
2. Building Applications and Limitations
There is an open source project for Smart TV applications, but we couldn't use it because it wasn't available when we started building our own. I recommend using it. We can start building our own application by wrapping navigational nodes, which are elements the user can interact with, and assigning identifiers to them. However, this approach has limitations with dynamic views, is error-prone, and makes debugging harder.
Then I was thinking, okay, but we're not the only company building a Smart TV application. There must be an open source project. So I looked it up and yes, there's actually one. Norigin Media, thank you very much for contributing that. Unfortunately their library wasn't available by the time we started building our own applications, so that's the reason we couldn't use theirs. But I totally recommend using it.
So, having answered that question, we are not wasting our time, so let's get started. If I asked you, off the top of your head, by intuition, what would you build? I ask this question because this is the first thing I had in mind when I learned there was a library for solving spatial navigation. I even read that the same approach was used by other companies; they described it themselves in their blogs. It's something similar to this. Imagine that you have, for example, the side menu in the Spotify client, with its different elements. If you remove all the images and styles, then you get the skeleton with the links to home, search, your library and so on. And since this is a web application, each of them is a link to one of these views.
So our intention was to just wrap all those elements, which we call navigational nodes, meaning those elements that the user can interact with using the TV control. We give each one an identifier to tell them apart, and we tell each one where the selection should go when the user presses up, down, and so on. This is a very naive approach, and it actually gets the job done. There's no problem with that. But of course, there are a few caveats.
The first one is that it is very difficult to work with dynamic views. Nowadays, almost every application offers some sort of recommendation or personalization, so when you're building the user interface, most likely you don't know what you're going to get. The next one is that it's error-prone because, as a developer, you need to assign the unique identifiers. We're human. We make mistakes. That's common. And finally, there is a lot of information that has to do with the spatial navigation algorithm but has nothing to do with the layout or the view, so it just makes debugging a lot harder.
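To make the naive approach concrete, here is a minimal sketch of what hand-wired navigation might look like. The names `NaiveNode`, `sideMenu` and `moveNaive` are hypothetical and do not come from the talk or from any library; the point is only to show how every identifier and every neighbour has to be written by hand.

```typescript
// Hypothetical illustration of the naive approach: every node carries a
// hand-written identifier and explicit neighbours for each direction.
type NavDirection = "up" | "down" | "left" | "right";

interface NaiveNode {
  id: string;
  // Identifier of the node to move to for each direction, if any.
  neighbours: Partial<Record<NavDirection, string>>;
}

// Skeleton of a side menu (identifiers are made up for this example).
const sideMenu: NaiveNode[] = [
  { id: "home", neighbours: { down: "search" } },
  { id: "search", neighbours: { up: "home", down: "library" } },
  { id: "library", neighbours: { up: "search" } },
];

function moveNaive(
  nodes: NaiveNode[],
  currentId: string,
  direction: NavDirection
): string {
  const current = nodes.filter((node) => node.id === currentId)[0];
  // Stay where we are when the node is unknown...
  if (current === undefined) {
    return currentId;
  }
  // ...or when it has no neighbour in that direction.
  const target = current.neighbours[direction];
  return target !== undefined ? target : currentId;
}
```

A typo in any of the identifiers silently breaks a path, and a list whose length is only known at runtime cannot be wired this way ahead of time, which is exactly the set of caveats described above.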
3. Building the Navigation Library
We built a demo site to demonstrate the functionality of spatial navigation. The site has buttons that render random gifts or prizes and a link to go back. However, the site doesn't listen to TV control events. We guide you through the code, which uses React and React Router for routing. The heuristics for building the navigation library start with registering all the navigational nodes.
So we thought, we can improve this. Let's make it better. To make this demonstration, I built a demo site that basically works like any other desktop application, and we are going to add the spatial navigation functionality on top of it.
So let's go to the site. It looks something like this. You can see that there are only 10 buttons. Each of them takes you to the same page, but it randomly renders a new gift or a new prize, and it also has a link to go back to the previous page. And again, it renders something random. So it works with the mouse, obviously, as you can see, but it doesn't listen to the TV control events. So let's do that. Let's add the functionality for this. But first, I just want to guide you through the code that is running this demo site, so you have a very clear idea of how it works.
First, as we are at this React conference, we're using a React app, and this is the index that you usually get after running Create React App, where we render our app component. The app component looks like this. It has a router configuration using the React Router library, where we only have two different paths. The first one is the welcome page, the one that you see as soon as you load the app, and the second is the surprise page, the one that you are taken to after you click on one of the question boxes. If we go to those two views, it's, again, something very similar. We have our React component. For the first component, the welcome page, we have an empty array of 10 positions that we use just to render the same component 10 times, namely the question box component. On the other side, in the surprise page, we have hard-coded the paths of the images of the different prizes. We pick one randomly, and then we just render that image, along with a link to go back to the previous page. Finally, what we call the link element is, again, something very simple. We have a React component for rendering the question box, which just uses a link from the React Router library to wrap the question box that you're rendering. The other one is exactly the same. The only difference is that instead of rendering the question box, we are rendering the children, which is just a text component. Cool.
So, what are the heuristics for building the navigation library? You can do it in many ways, but the method that we decided on is this. First, we register all the navigational nodes, meaning all the nodes in the app that the user can interact with using the TV control.
4. Building the Navigation Engine
We listen to the TV controls and find the selected node based on the current selection and the key pressed. The navigation engine class has methods for registering nodes and handling navigation events. We create a schema and a class for the navigational nodes, and provide a context provider for the instance. The external API uses a hook function that returns the rendered element and a boolean for focus. We develop the useFocusRef hook function to handle element references and register/unregister nodes. We test the functionality and confirm the presence of one node.
Then we listen to the events coming from the TV control. Then we find the node to select, depending on where the current selection is and where you want to go, based on the key that was pressed. And finally, we just update the state to indicate which element should be selected now.
If you look at the diagram, steps one to three, this is the way we decided to build it. We'll have a navigation engine class that has two methods: one for registering all those navigational nodes, and another one, handleNavigation, that will respond to the key down events coming from the TV control. Cool.
So let's start with the first step, the navigational nodes. Here we have a schema for the navigational node, which has just an identifier and a reference to the HTML element that is rendered. Then we create the class, which has a private variable, an array of these navigational nodes, plus one method for adding values and another method for removing them. Then we go to the index script, where we create an instance of this NavigationEngine class. For debugging purposes only, we add it to the global scope. You will see how we use it in each of the steps when we're testing. And finally, we wrap the app in a context provider for this instance, so that it's available throughout the app. Cool.
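The registration step described above could be sketched roughly as follows. The names `NavigationEngine`, `registerNode` and `unregisterNode` come from the talk; the exact fields, the `getNodes` accessor and the use of `unknown` in place of `HTMLElement` (so the sketch also runs outside a browser) are assumptions made for this example.

```typescript
// Assumed shape of a navigational node: an identifier plus a reference to
// the rendered element. `unknown` stands in for HTMLElement here.
interface NavigationalNode {
  id: string;
  element: unknown;
}

class NavigationEngine {
  // Private list of every node the user can reach with the TV control.
  private nodes: NavigationalNode[] = [];

  registerNode(node: NavigationalNode): void {
    // Replace an existing entry with the same id instead of duplicating it.
    this.unregisterNode(node.id);
    this.nodes.push(node);
  }

  unregisterNode(id: string): void {
    this.nodes = this.nodes.filter((node) => node.id !== id);
  }

  // Hypothetical read-only accessor, handy when inspecting the engine
  // from the console while debugging.
  getNodes(): readonly NavigationalNode[] {
    return this.nodes;
  }
}

// Usage: register two nodes, then remove the first one again.
const engine = new NavigationEngine();
engine.registerNode({ id: "box-1", element: {} });
engine.registerNode({ id: "box-2", element: {} });
engine.unregisterNode("box-1");
```

In the React app, a single instance like `engine` would be created in the index script and shared through a context provider, with each component registering itself on mount and unregistering on unmount.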
Then we go back to the question box and the navigation link, where we use what is going to be the external API. It's something very simple compared to what I mentioned before in the naive approach. We have a hook function that returns only two values. The first one is a callback that lets us know which HTML element we're rendering. The second is just a boolean value that tells you whether the element that you're rendering is focused or not. It is as simple as that. Cool.
So, we develop this hook function called useFocusRef. As I mentioned, it keeps a reference value, and it creates the callback that will be executed by the HTML element just to point the reference to this element. Then we create a unique identifier and obtain the instance of the navigation engine. With the help of the useEffect hook function, when this component is mounted we call the method registerNode, and when it's unmounted we call the method unregisterNode, just to make sure that there are no memory leaks. For now, we're going to leave the isFocused value as false, and in the last step we're going to come back and put in the real value, but don't worry about that now. Cool, so we go back to our app and quickly test it. If I console.log the navigation engine, we can see that we do indeed have ten elements, and you can see that the first one is pointing to the reference of the first HTML element, and so on. If I click on any of them, then the nodes should be removed properly, and yes, you can see that we only have one node.
5. Listening to TV Control Event
We go back and now we have ten again. By the way, in case you are wondering why there are these screenshots, it's just a backup plan in case the demo app is not working, but luckily everything is going fine. We can go to step number two and listen to the TV control event. This is one of the simplest steps, where we are only adding an event listener for the key down event. We already have an array where we have identified which keys represent the directions, and we also have a map between the internal values that we define for the direction keys and the values that come from the platform.
We go back and now we have ten again. So cool, it's working. By the way, in case you are wondering why there are these screenshots, it's just a backup plan in case the demo app is not working, but luckily everything is going fine. So you can just ignore them for now.
Cool, so now we can go to step number two and listen to the TV control event. This is one of the simplest steps, where we are only adding an event listener for the key down event, and the callback function that we pass is the one that we obtain from the getKeyEventHandler function, which we develop in this way. Basically, this callback function only uses a helper to identify whether the key that was pressed is one of the four direction keys. Those are the only ones we care about in this application, and as you know, some TV controls have a lot of keys. That's why we just want to focus on these four. To do that, we already have an array where we have identified which keys represent these directions, and we also have a map between the internal values that we define for the direction keys and the values that come from the platform. In reality, if you are running this for the desktop platform only, you don't really need this step. But I felt it was important to show it here, because these maps are very different from platform to platform, since each platform has its own values. So this is an example of one of those integration tasks that each application has to do. But like I said, this is not strictly necessary for this demo app.
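A rough sketch of this step might look like the following. The function name `getKeyEventHandler` follows the talk, but its signature and the table `keyToDirection` are assumptions. The key names used are the standard `KeyboardEvent.key` values that desktop browsers report; a TV platform would plug in its own table.

```typescript
type NavDirection = "up" | "down" | "left" | "right";

// Map from platform key values to internal directions. These are the
// standard KeyboardEvent.key names on desktop browsers; each TV platform
// would supply its own table (assumption: one table per platform).
const keyToDirection: { [key: string]: NavDirection } = {
  ArrowUp: "up",
  ArrowDown: "down",
  ArrowLeft: "left",
  ArrowRight: "right",
};

// Only the `key` field is needed, so the handler accepts any object that
// has it. A real KeyboardEvent satisfies this shape.
type KeyLikeEvent = { key: string };

function getKeyEventHandler(
  onDirection: (direction: NavDirection) => void
): (event: KeyLikeEvent) => void {
  return (event) => {
    // Ignore every key that is not one of the four directional keys.
    if (Object.prototype.hasOwnProperty.call(keyToDirection, event.key)) {
      onDirection(keyToDirection[event.key]);
    }
  };
}

// In a browser this would be wired up roughly like:
// window.addEventListener("keydown", getKeyEventHandler(console.log));
```

Keeping the platform table separate from the handler means the rest of the engine only ever sees the four internal direction values.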
6. Finding and Selecting Next Nodes
We test if we are listening to keyboard events. We use the getBoundingClientRect method to get the coordinates and sizes of registered navigational elements. We filter the elements based on the direction of the key pressed, the main axis of the current selected element, and the distance from the current selected node. We implement the handleNavigation method in the navigation engine to handle key directions, filter methods, and selection of the next node. We replace console.log with handleNavigation for TV control directional keys. We test the functionality by pressing down and to the right.
So cool. We can go back to the number two, and we are only going to test if we are listening to the keyboard events that you can see here. The arrow left, the arrow... Let me do it once again. If I press down, you can see that it's down. If I press up, so on, left and right. Cool.
So we can go to step number three, the one that I think is the most fun: find the selected node. In case you were wondering why we need the reference to the HTML element, it's because you can call a method called getBoundingClientRect that gives you the coordinates, relative to the viewport, of where the element is rendered, and also its dimensions. I feel this is super cool, because if you grab all the navigational nodes that you have registered so far and you run this method for each of their elements, then you get all the information that you need, the coordinates and the sizes. So you can forget about what you're rendering. With this information you have everything you need to create the algorithm and select the next node. So let's do that. How do we choose the next node? First we filter by the direction of the key that was pressed, then we filter by the main axis of the currently selected element, and finally, when you have narrowed it down to only a couple of candidates, you just pick the one that is closest to the currently selected node.
Let's look at it step by step. Imagine that you have a matrix of 5 by 5, a little bit bigger than what we saw before in the demo app, and you are sitting in the middle. If you are going to the right, then you want the last two columns, but if you are going up, then you want the first two rows. Then you filter by the main axis. For example, if you are moving horizontally, your main axis is bounded by the top and bottom edges of the current element, but if you are moving vertically, then you use the left and right edges. After that, you just choose by distance, the one that is sitting closest.
Ok, let's implement this in code. First we go to the navigation engine and we create a method called handleNavigation that takes as a parameter the direction of the key that was pressed. Then, as I mentioned before, we filter by the direction of that key, and to make our life easier, we have already hardcoded the filter method that we are going to call depending on the direction. We take a similar approach for the main axis, then we pick by distance, and finally, if there is any element to select, we store it in a new private variable of the navigation engine called selectedNode. Finally, we replace the console.log that we had before with this handleNavigation method, so that it runs whenever one of the directional keys on the TV control is pressed. Cool!
So we can go and test. We go to step number 3. First, this is annoying, let me just remove the warnings. I type engine selectedNode, and we can see that initially it is undefined. If I press down, then we are going to have a new value. And it's done. Let's go to the right now. Boom, it's there.
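The three filters described above (direction, main axis, distance) can be sketched as a pure function over rectangles. This is an illustration under assumptions, not the library's actual code: `Box` is a hypothetical subset of what `getBoundingClientRect()` returns, and `findNextNode`, the edge-based direction test and the centre-to-centre distance are choices made for this example.

```typescript
type NavDirection = "up" | "down" | "left" | "right";

// Subset of the values getBoundingClientRect() reports for an element.
interface Box {
  left: number;
  top: number;
  width: number;
  height: number;
}

interface PlacedNode {
  id: string;
  box: Box;
}

const rightEdge = (b: Box): number => b.left + b.width;
const bottomEdge = (b: Box): number => b.top + b.height;

// Step 1: keep only the nodes lying in the direction of the pressed key.
const directionFilters: Record<NavDirection, (from: Box, to: Box) => boolean> = {
  left: (from, to) => rightEdge(to) <= from.left,
  right: (from, to) => to.left >= rightEdge(from),
  up: (from, to) => bottomEdge(to) <= from.top,
  down: (from, to) => to.top >= bottomEdge(from),
};

// Step 2: keep the nodes that overlap the current one on the main axis.
// Horizontal moves compare top/bottom edges, vertical moves left/right.
function sharesMainAxis(direction: NavDirection, from: Box, to: Box): boolean {
  if (direction === "left" || direction === "right") {
    return to.top < bottomEdge(from) && bottomEdge(to) > from.top;
  }
  return to.left < rightEdge(from) && rightEdge(to) > from.left;
}

// Step 3: distance between the centres of two boxes.
function centreDistance(a: Box, b: Box): number {
  const dx = a.left + a.width / 2 - (b.left + b.width / 2);
  const dy = a.top + a.height / 2 - (b.top + b.height / 2);
  return Math.sqrt(dx * dx + dy * dy);
}

function findNextNode(
  nodes: PlacedNode[],
  current: PlacedNode,
  direction: NavDirection
): PlacedNode | undefined {
  const candidates = nodes
    .filter((node) => node.id !== current.id)
    .filter((node) => directionFilters[direction](current.box, node.box))
    .filter((node) => sharesMainAxis(direction, current.box, node.box));

  // Pick the closest remaining candidate, if there is one.
  let best: PlacedNode | undefined;
  for (const candidate of candidates) {
    if (
      best === undefined ||
      centreDistance(current.box, candidate.box) < centreDistance(current.box, best.box)
    ) {
      best = candidate;
    }
  }
  return best;
}
```

In the engine, `handleNavigation(direction)` would build the list of boxes from each registered element, call something like `findNextNode`, and store the result as the selected node when one is found.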
7. Updating the Cursor and Testing Functionality
We go down, boom, and then we go to the left. It's working. Look, I can even do this, and it can go. Let's go to the last step, update the cursor. We add a publish-subscribe mechanism to inform which node is selected. We add methods for adding subscriptions and executing callbacks. We replace the false value in useFocusRef with the useNodeFocus hook to handle focus state. We test the functionality and it works.
Do you see? We go down, boom, and then we go to the left. Ok, excellent, so it's working. Look, I can even do this, and it can go. Awesome, so let's go to the last step, update the cursor. If we go back to the diagram that I showed before, there is just one new addition that we can make. It's a publish-subscribe mechanism, where each node that is registered subscribes to the navigation engine, and whenever there is a handleNavigation event we notify each of them, just to inform them which one is selected and which one is not.
So we come back to the navigation engine and add a method for adding a subscription, basically for keeping the callbacks that we're going to execute when there is a handleNavigation call. We also have a method for later executing all those callbacks and, like I said, every time a new element has been selected, we just call this notifyAllSubscribers method.
Lastly, we go to useFocusRef, where you remember that we had the false value. Now we remove it and replace it with a new hook function called useNodeFocus. What it does is keep a local state for this component, which by default is false. With the help of a useEffect function, every time this component mounts, the first thing it does is go to the navigation engine and check: hey, am I supposed to be focused? Depending on the value, you update the state. Then you subscribe, so that whenever there is a handleNavigation call you check again, am I supposed to be focused, and update the state accordingly. And whenever this component is unmounted, obviously we want to unsubscribe, again to avoid memory leaks.
Cool, we have all the basic requirements, and we can go and test real quick, step number four. We go back and we can see this working. And it still works with the mouse, right? It's important to check that. Excellent, it's working.
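The publish-subscribe part could be sketched as below, independent of React. `notifyAllSubscribers` is named in the talk; the class name `FocusPublisher`, the `select` and `isSelected` methods and the choice to return an unsubscribe function are assumptions for illustration. In the talk this logic lives inside the navigation engine itself.

```typescript
type FocusListener = (selectedId: string | undefined) => void;

// Hypothetical stand-alone version of the engine's subscription logic.
class FocusPublisher {
  private selectedId: string | undefined = undefined;
  private listeners: FocusListener[] = [];

  // Returns an unsubscribe function so callers can clean up on unmount.
  subscribe(listener: FocusListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((other) => other !== listener);
    };
  }

  // Lets a freshly mounted node ask "am I supposed to be focused?".
  isSelected(id: string): boolean {
    return this.selectedId === id;
  }

  // Called once the navigation step has picked a new node.
  select(id: string): void {
    this.selectedId = id;
    this.notifyAllSubscribers();
  }

  private notifyAllSubscribers(): void {
    // Iterate over a copy so listeners may unsubscribe while being notified.
    for (const listener of this.listeners.slice()) {
      listener(this.selectedId);
    }
  }
}
```

A hook such as useNodeFocus would call `isSelected` on mount to set its initial state, pass a state-updating callback to `subscribe`, and return the unsubscribe function from its effect so it runs on unmount.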
8. Implementing Circular Navigation
In TV applications, circular navigation is a common feature that makes it easier to handle the limited control options. Instead of requiring users to navigate back to the beginning when reaching the end, circular navigation allows them to continue in the same direction. To implement circular navigation, we need to create a navigational container to mark the region, build a navigation tree, and update the node selection algorithm. We start by wrapping the desired region in a navigational container component and using the useRegisterNavigationalContainer hook. The navigational container has a new attribute called 'cycle' to specify the navigation direction. We then create a context provider to define the ancestor ID for the elements being rendered within the container.
Are you ready for a challenge? This is something that I felt was very important to demonstrate in this talk. You can see that there's a new button called level up, and what we have is two rows. One of them works as usual, I call it normal, and there is another one that we call circular. So what do I mean by circular? In applications running on a TV there is a very common feature called circular navigation.
Let's be honest, let's be frank. Using the TV control is not the most user-friendly experience. The mouse is a lot more natural, since the selection follows the movement of your hand, but here you only have four keys. So there's this feature to make it a little bit easier to handle. In certain regions, for example the side menu, you want to soften the constraints, meaning that if the user is pressing down, down, down, and they reach the end, instead of asking them to press up, up, up to go back to the beginning, they can just continue pressing down and the selection wraps around to the start. That is what we refer to as circular navigation. I think this is a total game changer, and quite frankly this is the feature that really got me into this topic, and I will explain why.
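The wrap-around behaviour itself can be illustrated with plain index arithmetic. This is only a sketch of the idea on a one-dimensional list; the function names are hypothetical, and the talk's implementation works on a navigation tree rather than on array indices.

```typescript
// step is +1 for "down"/"right" and -1 for "up"/"left".
type Step = 1 | -1;

// Normal navigation: the selection stops at either end of the list.
function nextIndexNormal(current: number, step: Step, count: number): number {
  return Math.min(count - 1, Math.max(0, current + step));
}

// Circular navigation: moving past one end continues from the other end.
function nextIndexCircular(current: number, step: Step, count: number): number {
  // Adding `count` before the modulo keeps the result non-negative.
  return (current + step + count) % count;
}
```

With five menu items, pressing down on the last item keeps the selection on index 4 in the normal case and moves it to index 0 in the circular case.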
Imagine that we are on the demo app and we have these nodes, where you want to distinguish those that belong to the circular navigation from the rest that don't. If you look at this example, you realize that we can no longer use the list that we have, because you need to distinguish which nodes belong to the circular region. So instead of a list we end up using a navigational tree, and this is what I think is fascinating, because it's no longer only about the coordinates and the dimensions, it's also about the relationships between the nodes. So let's build this. These are the steps that we can follow for the circular navigation and, again, there are many ways to do it, but these are the ones that we came up with. First we create a navigational container, just to mark which region belongs to the circular navigation and which doesn't. Then we build a navigation tree, and finally we update the node selection algorithm. Let's start with the first step.
So you remember this view that I'm showing you. This is the code. It is very similar to the welcome page that I showed before. Instead of having an array of 10 empty elements we use 5, and we use it to render two rows, but for the second one we wrap it in the navigational container, just to mark the region where we want to apply this new attribute. Then we create a new React component called the navigational container, where we call a hook function called useRegisterNavigationalContainer. This function is very similar to the one that we were using to register the normal navigational node. The differences are, first, that we don't have any reference to an HTML element, because we are not rendering anything, and second, that we have a new parameter for the attributes of this navigational container. I think it is clearer if I show you the schema. We have a new type called navigational container that has a new field for attributes, where for now it only has the property cycle, which describes whether the cycle is vertical or horizontal. And since we're using TypeScript, we also need to update the method in the navigation engine so it can receive both types.
Okay, so let's go and check if the navigational container is registered properly. If I go to the navigation engine and I bring up the nodes value, you will see that we have the 10 elements, but there is one right here that doesn't have any HTML element, compared to this one, and it has the cycle property, so it is registering properly. If I go back, then we have just the 11 elements as usual, where the last element is only the link to go to this page.
Cool, let's go to step number two, build a navigational tree. For building the relationships of the navigational tree, the only thing that we're going to do is add a parameter called ancestor ID that points to the parent.
Frankly, it is a different approach from what we chose internally. This is the approach that Norigin Media decided to use; at Spotify we decided instead to keep an array of the children. I think there is not much difference between them, but I picked this method because it's easier to demonstrate than the children's array. You can go with whichever you prefer.
All right, so now we create a context provider where we define the ancestor ID of the elements that we're going to be rendering. Let me try to explain. If we go to the index script, we have the context provider wrapping the whole application, where by default the ancestor ID is the root. But in the navigational container we update the value of the ancestor ID with the unique ID that we obtained when we registered this navigational container. That means that any children rendered within this navigational container will have this new ID as their ancestor ID.
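The ancestor ID idea can be sketched with plain data. The `cycle` property and the notion of an ancestor ID come from the talk; the type names, the `attributes` field name, the `ROOT_ID` constant and the `childrenOf` helper are assumptions made for this example.

```typescript
type CycleAxis = "vertical" | "horizontal";

// A regular navigational node: rendered element plus a pointer to its parent.
interface LeafEntry {
  id: string;
  ancestorId: string;
  element: unknown; // stands in for HTMLElement outside a browser
}

// A navigational container renders nothing itself, so it has no element.
interface ContainerEntry {
  id: string;
  ancestorId: string;
  attributes: { cycle: CycleAxis };
}

type NavEntry = LeafEntry | ContainerEntry;

// Assumed default ancestor for everything rendered outside a container.
const ROOT_ID = "root";

function isContainer(entry: NavEntry): entry is ContainerEntry {
  return "attributes" in entry;
}

// With parent pointers, the children of a container are found by scanning
// the flat list of registered entries.
function childrenOf(entries: NavEntry[], ancestorId: string): NavEntry[] {
  return entries.filter((entry) => entry.ancestorId === ancestorId);
}

// Example registry: one normal node under the root, and a circular
// container holding two nodes.
const registry: NavEntry[] = [
  { id: "normal-1", ancestorId: ROOT_ID, element: {} },
  { id: "row-2", ancestorId: ROOT_ID, attributes: { cycle: "horizontal" } },
  { id: "circular-1", ancestorId: "row-2", element: {} },
  { id: "circular-2", ancestorId: "row-2", element: {} },
];
```

Storing an array of children on each container, as the speaker says Spotify does internally, holds the same information; the parent-pointer form simply keeps registration to a single extra field per entry.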