Building JS Apps with Internationalization (i18n) in Mind


At Adobe we build products for the world. This talk will provide a high-level overview of internationalization (i18n), globalization (g11n), and localization (l10n) best practices: why they are important and how to implement them in design, UX, and any JS codebase, using vanilla JS examples and top open source library recommendations.



Hello JS Nation! Thanks for joining me today. I'm really excited to share some ideas about building JavaScript apps with internationalization in mind. My name is Naomi Meyer, and I work on the globalization engineering team at Adobe, where I do internationalization engineering for a lot of different Adobe apps. This is where you can find me online: my Twitter and my website. If there's anything you're feeling passionate about, please let me know; I would love to continue this conversation online.

Here's our agenda for today. I'm going to start with a name example and then move into some definitions of localization, internationalization, and globalization, just to level set and make sure we're all on the same page. Then I'll move into my top five tips to avoid the most common mistakes we find when internationalizing JavaScript, and I'll end on culturalization. Overall, the aim of this presentation is to encourage you all to create experiences that are equally usable, relevant, and meaningful for users all across the globe, and to really amplify the voices of our global users. So I would like to invite and encourage you, my fellow JS coders, to go out there and really put the world in the World Wide Web. Let's talk about how we can do that.

Let's start with a name example. This is a very common string in English, where folks introduce themselves with the syntax "Hello, my name is Naomi Meyer." Here my first name, or given name, is in blue, followed by my last name, family name, or surname. If we take this simple string and translate it into Japanese, Hindi, Hebrew, Arabic, Korean, and Chinese, we can see that the first name, last name order is sometimes switched to last name, first name. And sometimes the text is read left to right, while other times it is read right to left.
This is a really visual representation of how a user's name is written differently across locales, and it's the kind of high-level problem we're trying to solve with internationalization. My name is a fairly simple American English name. Here are some other common names from Brazil (in Portuguese), Russia, and India that don't fit easily into this simple first name, last name paradigm, which shows some of the challenges we face.

Here's how Google handles that problem: when you create a Google account, you enter a first name and a last name as distinct data fields. Users from regions and languages that don't follow that paradigm are going to have trouble. Twitter, on the other hand, recently came out with a great solution: a single name field where users can enter their name in their native script, in their own first name, last name or last name, first name order, and it is stored as one field. I think this is a really cool solution to the name internationalization problem.

Now that our heads are thinking more deeply about internationalization, let's move on to some definitions. What are we talking about here? At the most granular level we have translation, where "hello" becomes "hola", "konnichiwa", or "bonjour". The next level up is localization. In the United States we spell "localization" with a Z, but in the United Kingdom it's spelled "localisation" with an S. Those are both English, but they're different regional dialects, and that's the level of locale we get into with localization. The next level up is internationalization.
This is more on the engineering side: we wrap the application in internationalization tools so that it can be shipped in translated forms. If software is a house, internationalization means reaching into the pipes, changing them out, and creating a system that can be easily translated. The next level up is globalization, and all of these fall under the umbrella of globalization, or g11n. Note that these abbreviations are numeronyms: for globalization we take the first letter, G, followed by the number of letters between the first and the last (11), and then the last letter, N.

The more I think about these big ideas, the more I see that culture is deeply rooted in our thinking patterns and affects how our users interact with and benefit from digital experiences. So internationalization and globalization go way beyond translation. By acknowledging cultural characteristics and really celebrating the differences, we innovate, improve accessibility, and build products for the whole world of users. If we look at these two visualizations, we can see that the majority of people on Earth do not speak English as their first language, yet the majority of digital content is currently in English. This is a real opportunity for us to expand digital content into different languages so that users all across the world can use it in their native mother tongues. Let's talk about the top five tips for how we can do that.

Tip number one is to use Unicode everywhere. To start, what is Unicode? I'm sure we're all very familiar with seeing meta charset="UTF-8" in our HTML on the web. So what is UTF? UTF stands for Unicode Transformation Format. Here is how the character A and the Japanese character O are represented in UTF-8, UTF-16, and UTF-32. UTF-8 is the most common encoding on the web.
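To make that comparison concrete, here is a small sketch that measures how many bytes a character takes in each encoding. It assumes a runtime with the standard TextEncoder and TextDecoder globals (modern browsers and Node.js), and it uses お (U+304A, HIRAGANA LETTER O) as an assumed stand-in for the Japanese character on the slide. Only UTF-8 comes from a real encoder here; the UTF-16 and UTF-32 sizes are derived from code unit and code point counts. The helper name encodedSizes is hypothetical.

```javascript
// Hypothetical helper: byte size of a string in UTF-8, UTF-16, and UTF-32.
// TextEncoder only emits UTF-8, so the other two sizes are derived:
// JS strings are sequences of 16-bit code units (2 bytes each), and UTF-32
// uses exactly 4 bytes per code point (spreading a string yields code points).
function encodedSizes(text) {
  return {
    utf8: new TextEncoder().encode(text).length,
    utf16: text.length * 2,
    utf32: [...text].length * 4,
  };
}

// "お" (U+304A) is an assumed stand-in for the Japanese character on the slide.
for (const ch of ["A", "お"]) {
  const { utf8, utf16, utf32 } = encodedSizes(ch);
  console.log(`${ch} -> UTF-8: ${utf8}, UTF-16: ${utf16}, UTF-32: ${utf32} (bytes)`);
}
// A -> UTF-8: 1, UTF-16: 2, UTF-32: 4 (bytes)
// お -> UTF-8: 3, UTF-16: 2, UTF-32: 4 (bytes)

// Encoding to UTF-8 and decoding again gives back the identical string.
const original = "Hello, お";
const roundTripped = new TextDecoder("utf-8").decode(new TextEncoder().encode(original));
console.log(roundTripped === original); // true
```

For this Japanese character UTF-16 is actually more compact than UTF-8, while for ASCII text the reverse holds.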
UTF-16 is used by Java and Windows, and UTF-32 is used by Linux and various Unix systems. UTF is really useful because it's reversible: conversions between all of these forms are algorithmic and fast, and they allow lossless round-tripping.

Many programming languages directly use one of these UTF encodings. As JavaScript engineers, we need to ask which one JavaScript uses, because this really matters when we think about encoding. Whether you're in React, Angular, Vue, or Svelte, they're all encoded the same way under the hood. ECMAScript, the standardized version of JavaScript, defines strings as sequences of 16-bit code units, and depending on the operation those units are treated either like UCS-2 or like UTF-16. UCS-2 is a fixed-width Universal Character Set encoding that always uses 2 bytes per character, while UTF-16 is a variable-width Unicode Transformation Format that represents characters outside the Basic Multilingual Plane with surrogate pairs, that is, two 16-bit code units. Because one system is always 2 bytes and the other uses surrogate pairs, we get a lot of really weird and interesting JavaScript bugs when dealing with character encoding.

Let's talk about some of the common bugs we see. The first one involves combining marks. Here we take the Unicode code point for LATIN SMALL LETTER A and add a combining mark code point, COMBINING RING ABOVE. Together they form the Danish character å, a single grapheme built from two separate code points. If we console.log these two code points together, we get back the a with the little circle above it. Great. But because these are two separate code points, we can run into problems. Similar to that å, we have the accented é in café, which we see in Spanish.
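Both examples can be reproduced in a few lines. This minimal sketch builds the strings from explicit escape sequences so the combining marks are visible in the source: U+0301 COMBINING ACUTE ACCENT for café and U+030A COMBINING RING ABOVE for å. It also previews the String.prototype.normalize fix that the talk turns to next; the variable names are illustrative.

```javascript
// "cafe" + U+0301 COMBINING ACUTE ACCENT: displays as "café",
// but is stored as five UTF-16 code units.
const drink = "cafe\u0301";
const parts = drink.split("");

console.log(drink);                                // café
console.log(drink.length);                         // 5
console.log(parts[parts.length - 1] === "\u0301"); // true: the last element is the bare accent

// normalize() defaults to NFC, which composes "e" + U+0301 into U+00E9 (é).
const normalized = drink.normalize();
const normalizedParts = normalized.split("");

console.log(normalized.length);                           // 4
console.log(normalizedParts[normalizedParts.length - 1]); // é

// The Danish å built from "a" + U+030A COMBINING RING ABOVE behaves the same way.
const ring = "a\u030A";
console.log(ring.length, ring.normalize().length);        // 2 1

// Comparing user input: the precomposed and combining forms only match
// after both sides are normalized to the same form.
const typed = "caf\u00E9";                                // precomposed é
console.log(typed === drink);                             // false
console.log(typed.normalize() === drink.normalize());     // true
```

Normalizing both sides before comparing or validating is what keeps a correctly typed accented name from being rejected.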
If we define a variable called drink as c-a-f-e followed by the combining accent and then console.log it, we see café. But if we check drink.length, the length is five, and if we split the string into an array, the final element is the combining mark on its own. As I'm sure you can imagine, a lot of confusion arises in string manipulation when you forget that these code point sequences exist. We can handle this with the normalize method built into JavaScript: String.prototype.normalize returns the Unicode normalization form of a string, which is really handy straight out of the box. If we normalize that same café string, the length is now four, and the final element of the split array is the e with the accent above it. We no longer get that extra element for the combining mark. This is great to keep in mind to avoid localization bugs in string manipulation.

It's also really important from a human perspective. I saw this tweet recently: "Found on a US government website. Everyone, this is what systemic racism is." It's when folks are excluded. Raquel Vélez's last name is Spanish and has an accent over the e, but she can't enter it in the last name field of a US government form because the form doesn't handle that accented character correctly. By fixing simple string encoding issues, we can help solve these problems on a human level.

Let's move on to tip number two: wrap all strings in an object for translation. In this high-level pseudocode example, we have a hardcoded string, "Hello World". This hardcoded string is what not to do.
What we can do instead is take this string and wrap it in an internationalization object with a list of resources for each locale. Here, English has an object with the key helloMessage and the value "Hello World". We then display the helloMessage key, and "Hello World" is returned as its value. That's the simple example in English. We can then add additional locales to our resource files, such as Japanese, "konnichiwa". Best practice is to store each locale's resources in its own separate file.

There are lots of ways to do that, and a lot of really great open source tools are available. Here are some I recommend: i18next, Globalize.js, Polyglot.js, and Localize. There are many more excellent systems for wrapping all of your strings in an object for translation. I'm not going to go into the deep guts of these individual implementations; instead, I want to focus on tool-agnostic solutions and high-level tips you can use across all of these tools.

Tip number three is to test in different languages and different character sets. A great way to do that is the pile of poo test from Mathias Bynens. If you include a pile of poo emoji (💩) in your test strings, it checks how you handle complex Unicode characters, because the emoji lies outside the Basic Multilingual Plane and is stored in JavaScript as a surrogate pair. You can also include a longer string such as "Iñtërnâtiônàlizætiøn☃💩" to ensure encoding works correctly end to end in your JavaScript app. I highly recommend testing as many complex characters as possible.
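Tips two and three can be combined in a small, tool-agnostic sketch. The resource layout follows the talk's helloMessage example, while the t() function, the greeting key, and the {name} placeholder syntax are hypothetical illustrations, not the API of any of the libraries mentioned above. A whole-sentence template with a placeholder (instead of concatenating "Hello, " + name) lets each translation put the name where its grammar needs it, and the final loop runs pile-of-poo style test strings through the lookup.

```javascript
// Per-locale resources keyed by message ID. In a real app each locale would
// typically live in its own file; they are inlined here to stay self-contained.
const resources = {
  en: { helloMessage: "Hello World", greeting: "Hello, {name}!" },
  ja: { helloMessage: "こんにちは世界", greeting: "こんにちは、{name}さん！" },
};

// Hypothetical lookup helper: find the message for a locale, fall back to
// English when the locale or key is missing, then fill {placeholders}.
// Placeholders without a matching param are left as-is.
function t(locale, key, params = {}) {
  const table = resources[locale] ?? resources.en;
  const template = table[key] ?? resources.en[key] ?? key;
  return template.replace(/\{(\w+)\}/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(params, name) ? String(params[name]) : placeholder
  );
}

console.log(t("en", "helloMessage")); // Hello World
console.log(t("ja", "helloMessage")); // こんにちは世界
console.log(t("fr", "helloMessage")); // Hello World (no "fr" resources, so English is used)

// Pile-of-poo style check: complex characters must pass through unchanged.
// "💩".length is 2 because the emoji is a UTF-16 surrogate pair,
// while spreading the string counts a single code point.
for (const name of ["💩", "Iñtërnâtiônàlizætiøn☃💩"]) {
  const message = t("ja", "greeting", { name });
  console.log(message, message.includes(name), name.length, [...name].length);
}
```

In the Japanese template the name sits mid-sentence before an honorific, which is exactly the kind of word-order change that string concatenation cannot express.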
Tip number four is to wrap all of your numbers, especially dates, times, and currencies, in an object for internationalization. A wonderful solution that comes out of the box with JavaScript is the Intl object. It's an API with a lot of really cool functionality that you get with JavaScript for free. Here are just a few examples of what's available in Intl for numbers, dates, and times. With NumberFormat, we can pass in a locale and a long number and display it correctly for that locale. In English there is a comma every three digits, German uses a period, and Russian uses a space. In Tamil and Hindi, the separators appear in different positions, because those locales use a different digit grouping system (after the first three digits, digits are grouped in twos), which I think is pretty cool.

And then there's the age-old question: what day is it today? It depends on where you're asking. A date written as 1/2 can mean January 2nd in some places and February 1st in others. Here's a fun visualization of how date formats change depending on where they're displayed. And here's an example of using the Intl DateTimeFormat to format a range of dates: if we pass in a start date and an end date, we can see how that range looks across locales, and I think it's really exciting to see all those locale-specific symbols. Unicode also has ICU (International Components for Unicode) to help you handle dates, which is another wonderful open source project.

Tip number five is to build flexible layouts to accommodate translated text. Different languages obviously produce text of different lengths. The Adobe Spectrum design system recommends allowing anywhere from 300% to 30% text expansion, depending on the length of your English text, and it's really important to test with these longer and shorter lengths of text.
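Going back to tip four, the Intl calls described there can be tried directly. This sketch assumes a runtime with full ICU locale data (modern browsers, and Node.js 13 or later by default); with reduced locale data the non-English results may fall back to English formatting. The numbers and dates are arbitrary examples, formatNumber is a hypothetical helper, and exact output (especially the spacing characters inside date ranges) can vary between engine versions, so only a few results are shown as comments.

```javascript
// Digit grouping and decimal separators differ by locale.
const bigNumber = 123456789.123;

// Hypothetical helper around Intl.NumberFormat with default options.
function formatNumber(locale, value) {
  return new Intl.NumberFormat(locale).format(value);
}

for (const locale of ["en-US", "de-DE", "ru-RU", "ta-IN", "hi-IN"]) {
  console.log(locale, formatNumber(locale, bigNumber));
}
// en-US 123,456,789.123
// de-DE 123.456.789,123
// ru-RU groups with non-breaking spaces; ta-IN and hi-IN group as 12,34,56,789.123

// The same calendar date written numerically in two English locales.
// timeZone: "UTC" keeps the result independent of the machine's time zone.
const janSecond = new Date(Date.UTC(2022, 0, 2)); // months are 0-based: 2 January 2022
for (const locale of ["en-US", "en-GB"]) {
  console.log(locale, new Intl.DateTimeFormat(locale, { timeZone: "UTC" }).format(janSecond));
}
// en-US 1/2/2022
// en-GB 02/01/2022

// Formatting a date range with Intl.DateTimeFormat.prototype.formatRange.
const start = new Date(Date.UTC(2022, 5, 20));
const end = new Date(Date.UTC(2022, 5, 24));
for (const locale of ["en-US", "de-DE", "ja-JP"]) {
  const fmt = new Intl.DateTimeFormat(locale, { dateStyle: "long", timeZone: "UTC" });
  console.log(locale, fmt.formatRange(start, end));
}
```

Because every result comes from a locale string, the same code serves all locales; only the locale code passed in changes.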
Also remember that italics, bold, and line height can work very differently across languages, so consider readability for users reading complex Japanese, Chinese, or Korean characters.

Those are my top five tips to keep in mind. There are also a lot of other important things in internationalization that I just don't have time to go over today, so here are a few more to keep in mind. Avoid concatenating strings, because word order changes from language to language. Handle sorting, filtering, and searching for non-English characters. Use consistent locale code handling: our recommendation is to use BCP 47 to define your locales and to make sure locale codes are handled consistently across the full app. Remember that keyboards and keyboard shortcuts vary by region, so if you have a keyboard shortcut, check it against the different regional keyboard layouts. And of course, right-to-left text changes the layout of your entire UI for users who read right to left, so if you support those users, keep that in mind. Each of my slides has links to further documentation, so please feel free to read more about how to handle all of these complicated cases. Unfortunately, I don't have time to talk about all the amazing things I'd love to cover today.

Let's move on to the cultural considerations to keep in mind in the big picture of internationalization. For example, if a user is searching for an image to represent a holiday, all of these different images are valid; these are true, wonderful holidays across the world. Acknowledging different cultures is so important.
The same goes for food: if a user is searching for delicious food, I think we'd all agree that having a variety of choices (spicy, sweet, and savory) is really great and important. Similar to having a variety of delicious food choices, having a variety of languages is culturally valuable and important. In this map, each dot represents a language that is currently at risk. By enabling different languages and letting users navigate our apps in their native mother tongues, we are truly innovating and building for all of humanity across the Earth. So please join me: avoid your assumptions, broaden your mind, and go out there and put the world in the World Wide Web. Thank you so much.
21 min
20 Jun, 2022
