Building JS Apps with Internationalization (i18n) in Mind

At Adobe we build products for the world. This talk will provide a high-level overview of internationalization (i18n), globalization (g11n), and localization (l10n) best practices: why they are important and how to implement them in design, in UX, and within any JS codebase, using vanilla JS examples and top open source library recommendations.

21 min
20 Jun, 2022

Video Summary and Transcription

This Talk discusses building JavaScript apps with internationalization in mind, addressing issues such as handling different name formats, using Unicode for compatibility, character encoding bugs, localization and translation solutions, testing in different languages, accommodating translated text in layouts, cultural considerations, and the importance of enabling different languages for users. The speaker also mentions various open source tools for internationalization. The Talk concludes with a reminder to avoid assumptions and embrace diversity in the World Wide Web.

1. Introduction to Internationalization

Short description:

The speaker introduces herself and her work on Adobe's globalization engineering team, outlines the agenda, and uses a name example to show that name order and reading direction vary across languages. Google asks for a first name and a last name as separate fields, while Twitter offers a single name field that users can fill in using their own script and name order.

Hello, JS Nation! Thanks for joining me today! I'm really excited to share with you all some ideas about building JavaScript apps with internationalization in mind. My name is Naomi Meyer, and I work on the globalization engineering team at Adobe, where I do internationalization engineering for a lot of different Adobe apps.

So, this is where you can find me online: on Twitter and on my website. If there's anything you're feeling passionate about, please let me know; I would love to continue this conversation online. Here's our agenda for today. I'm going to start with a name example and then move into some definitions of localization, internationalization, and globalization, just to level set and make sure that we're all on the same page. Then I'll move into my top five tips for avoiding the most common mistakes we find when internationalizing JavaScript, and end on culturalization. Overall, the aim of this presentation is to encourage you all to create experiences that are equally usable, relevant, and meaningful for users all across the globe, and to really amplify the voices of our global users. So I would like to invite and encourage you, my fellow JS coders, to go out there and really put the world in the World Wide Web. Let's talk about how we can do that.

Let's start with a name example. This is a very common string that we see in English, where folks introduce themselves with the syntax "Hello, my name is Naomi Meyer." My first name, or given name, is in blue, followed by my last name, family name, or surname. If we take this simple string and translate it into Japanese, Hindi, Hebrew, Arabic, Korean, and Chinese, we can see that the first name, last name order is sometimes switched to last name, first name, and that sometimes the text is read left to right and sometimes right to left. This is a very visual representation of how differently a user's name can be identified across locales, and it's the kind of high-level problem that we're trying to solve with internationalization. My name is a fairly simple American English name. Here are some other common names from Brazil and Portugal, Russia, and India that don't fit easily into this simple first name, last name paradigm, which shows some of the challenges we're facing. This is how Google handles the problem: when you create a Google account, you enter a first name and a last name as distinct data fields, so users from regions and languages that don't follow that paradigm are going to have trouble. Twitter, on the other hand, recently came out with a great solution: a single name field where users can enter their name in their native script and in their own order, whether first name, last name or last name, first name, and it is stored as one field. I think this is a really cool solution to the username internationalization problem. Now that our heads are thinking more deeply about internationalization, let's move on to some definitions.

2. Language Granularity and Unicode

Short description:

When it comes to language, there are different levels of granularity: translation, localization, internationalization, and globalization. Culture plays a significant role in how users interact with digital experiences. Expanding digital content into different languages is crucial. Tip number one is to use Unicode everywhere, ensuring compatibility across different systems and programming languages.

So what are we talking about here? At the most granular level, we have translation, where "hello" becomes "hola," "konnichiwa," or "bonjour." The next level of granularity is localization. In the United States, we spell "localization" with a Z, but in the United Kingdom it's spelt "localisation," with an S. Both are English, but they're different regional variations. That's the level of locale that we get into with localization.
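To make the locale level concrete, here is a minimal sketch using the built-in Intl.DisplayNames API (not mentioned in the talk; it assumes a runtime with full ICU data, such as a recent Node.js or browser). It shows that en-US and en-GB share a language but are distinct locales:

```javascript
// Intl.DisplayNames converts BCP 47 locale tags into human-readable names.
const languageNames = new Intl.DisplayNames(['en'], { type: 'language' });

// Same language subtag ("en"), different region subtags: two distinct locales.
const usName = languageNames.of('en-US');
const gbName = languageNames.of('en-GB');

console.log(usName); // "American English"
console.log(gbName); // "British English"
```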

The next level up is internationalization. This is more on the engineering side: we wrap the application in internationalization tools so that it can be shipped in translated forms. If software is a house, internationalization is where we reach into the pipes, change them out, and create a system that can be easily translated. The next level is globalization, and all of these fall under the umbrella of globalization, or G11N. It's important to note that these are numeric abbreviations (numeronyms): for globalization, we take the first character, G, followed by the number of characters in between (11), and then the last character, N. The more I think about these big ideas, the more I see that culture is deeply rooted in our thinking patterns and affects how our users interact with and benefit from digital experiences. So internationalization and globalization go way beyond translation. By acknowledging cultural characteristics and really celebrating the differences, we create with innovation and accessibility in mind and build products for the whole world of users.
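The numeronym rule described above (first letter, count of the letters in between, last letter) is easy to express in code. This is a small illustrative sketch; the numeronym function is hypothetical and assumes a single word of at least three characters:

```javascript
// Build a numeronym: first letter + number of letters in between + last letter.
function numeronym(word) {
  const chars = Array.from(word); // split by code point, not by UTF-16 code unit
  if (chars.length < 3) return word; // too short to abbreviate
  const first = chars[0].toLowerCase();
  const last = chars[chars.length - 1].toLowerCase();
  return `${first}${chars.length - 2}${last}`;
}

console.log(numeronym('internationalization')); // "i18n"
console.log(numeronym('globalization'));        // "g11n"
console.log(numeronym('localization'));         // "l10n"
```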

If we look at these two visualizations, we can see that the majority of people on earth do not speak English as their first language, yet the majority of digital content is currently in English. This is a real opportunity for us to expand digital content into different languages so that users all across the world can use it in their native mother tongues. So let's talk about my top five tips for how we can do that. Tip number one is to use Unicode everywhere. To start, what is Unicode? I'm sure we're all very familiar with seeing <meta charset="UTF-8"> in our HTML for the web. So what is UTF? UTF stands for Unicode Transformation Format. Here's how UTF represents the Latin character A and a Japanese character in 8, 16, and 32 bits. UTF-8 is the most common encoding on the web, UTF-16 is used by Java and Windows, and UTF-32 is used internally by Linux and various Unix systems. UTF is really cool because it's reversible: conversions between the forms are algorithmic and fast, and they allow lossless round-tripping. Many programming languages directly use one of these UTF encodings. But as JavaScript engineers, which UTF does JavaScript use? This is really important when we think about encoding, and whether you're in React, Angular, Vue, or Svelte, they're all encoded the same way under the hood.
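A quick way to see the difference between encoding forms in JavaScript is the standard TextEncoder/TextDecoder pair, which always encodes to and decodes from UTF-8. This sketch assumes the hiragana character あ as the Japanese example (the talk doesn't say which character the slide used), and also shows the lossless round trip:

```javascript
// TextEncoder always produces UTF-8; TextDecoder defaults to UTF-8.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

for (const ch of ['A', 'あ']) {
  const utf8 = encoder.encode(ch); // Uint8Array of UTF-8 bytes
  const hex = Array.from(utf8, (b) => b.toString(16).toUpperCase().padStart(2, '0')).join(' ');
  // ch.length counts UTF-16 code units, which is how JS strings measure length.
  console.log(`${ch}: UTF-8 bytes [${hex}], UTF-16 code units: ${ch.length}`);
}
// A: UTF-8 bytes [41], UTF-16 code units: 1
// あ: UTF-8 bytes [E3 81 82], UTF-16 code units: 1

// Decoding the encoded bytes returns exactly the original string.
const original = 'Hello, こんにちは';
const roundTripped = decoder.decode(encoder.encode(original));
console.log(roundTripped === original); // true
```

UTF-8 spends one byte on ASCII and three on this kana, while UTF-16 uses one 16-bit unit for both; either way the conversion is reversible.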

3. Character Encoding and Internationalization

Short description:

The ECMAScript spec defines how characters are interpreted, as either UCS-2 or UTF-16. Combining marks can cause bugs in JavaScript, and the normalize method can handle them to avoid localization bugs. Wrapping strings in an internationalization object helps with translation.

If we look at the spec, ECMAScript, the standardized version of JavaScript, defines how characters should be interpreted as either UCS-2 or UTF-16. UCS-2 is a two-byte Universal Character Set, whereas UTF-16 is a 16-bit Unicode Transformation Format. These are two contrasting systems: one always uses two bytes per character, while the other uses surrogate pairs for characters that don't fit in two bytes. Because of this, we see a lot of weird and interesting JavaScript bugs when dealing with character encoding.
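The surrogate-pair behavior is easy to observe with any character outside the Basic Multilingual Plane. A minimal sketch using only standard String methods, with the pile of poo emoji (U+1F4A9) as the example:

```javascript
const poo = '💩'; // U+1F4A9 PILE OF POO, outside the Basic Multilingual Plane

console.log(poo.length);                      // 2: length counts UTF-16 code units
console.log(poo.charCodeAt(0).toString(16));  // "d83d": high surrogate
console.log(poo.charCodeAt(1).toString(16));  // "dca9": low surrogate
console.log(poo.codePointAt(0).toString(16)); // "1f4a9": the actual code point
console.log([...poo].length);                 // 1: the string iterator walks code points
```

APIs like length and charCodeAt behave as if strings were UCS-2, while codePointAt and the string iterator understand UTF-16 surrogate pairs.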

Let's talk about some of the common bugs that we see. The first one is with combining marks. A combining mark is where we take a Unicode code point, such as LATIN SMALL LETTER A, and add a combining mark code point after it; here we have COMBINING RING ABOVE. That's how we get the Danish character å: the A and the little circle above it are two separate code points that combine to display as one. If we take these two code points and console.log them, what's returned is the A with a little circle above it. Great, but because these are still two separate code points, we can run into some problems.
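Here is a minimal sketch of that combining sequence next to the single precomposed code point for the same letter, showing why the two can surprise string code even though they render identically:

```javascript
// Decomposed: LATIN SMALL LETTER A (U+0061) + COMBINING RING ABOVE (U+030A)
const decomposed = 'a\u030A';
// Precomposed: LATIN SMALL LETTER A WITH RING ABOVE (U+00E5)
const precomposed = '\u00E5';

console.log(decomposed);                 // renders as "å"
console.log(decomposed.length);          // 2
console.log(precomposed.length);         // 1
console.log(decomposed === precomposed); // false, even though they look identical
```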

Similar to the A with a circle above it, we have the accented E in "café," which we see in Spanish. If we define a variable drink as C-A-F-E followed by a combining acute accent and console.log it, we see "café" returned. But if we check drink.length, the length is five, and if we split the string into an array, the final index of the array is the combining mark on its own. As I'm sure you can imagine, a lot of confusion arises in string manipulation when you forget that these code point sequences exist. So how can we handle that? One way is the normalize method that's built into JavaScript: String.prototype.normalize returns the Unicode Normalization Form of the string, which is really handy straight out of JavaScript. If we take the same drink variable, café with the combining mark, and normalize it, the length is now returned as four, and the final index of the split array is the e with the accent above it, so we don't get that extra index for the combining mark. This is really great to keep in mind to avoid localization bugs in string manipulation. But it's also really important from the human perspective. I saw this tweet recently: "Found on a US government website. Everyone, this is what systemic racism is." It's when folks are excluded. Raquel Vélez's last name is Spanish and has an accent over the e, but she can't enter it in the last name field of a US government form because of how the form handles that accented character. By fixing simple string-encoding issues, we can help solve these problems on the human level.
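The café example translates directly into code. This sketch uses String.prototype.normalize with its default form, NFC, which composes "e" plus U+0301 COMBINING ACUTE ACCENT into the single code point U+00E9:

```javascript
// "cafe" followed by COMBINING ACUTE ACCENT (U+0301)
const drink = 'cafe\u0301';
const parts = drink.split('');

console.log(drink.length);                          // 5
console.log(parts[parts.length - 1] === '\u0301');  // true: a bare combining mark

// NFC (the default form) composes e + U+0301 into é (U+00E9).
const normalized = drink.normalize('NFC');
const normalizedParts = normalized.split('');

console.log(normalized.length);                     // 4
console.log(normalizedParts[normalizedParts.length - 1]); // "é"

// Normalizing before comparing avoids false mismatches in search or validation.
console.log(normalized === 'caf\u00E9');            // true
```

A form field that normalizes input before validating or storing it would accept both spellings of the same name.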

Let's move on to tip number two: wrap all strings in an object for translation. As a high-level pseudocode example, take a hard-coded string, "Hello world." This is what not to do. Instead, we can take this string and wrap it in an internationalization object that holds a list of resources for each locale. Here, English has an object with the key helloMessage, whose value is "Hello world."
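A minimal vanilla JS sketch of that pattern follows. The resources object and the t() helper are hypothetical illustrations, not the API of any particular library; libraries such as i18next offer a much fuller lookup function:

```javascript
// Don't: hard-code user-facing text, e.g. const greeting = 'Hello world';

// Do: keep text in a resource object keyed by locale, then by message key.
const resources = {
  en: { helloMessage: 'Hello world' },
};

// Hypothetical lookup helper.
function t(key, locale = 'en') {
  const messages = resources[locale] || {};
  // Falling back to the key itself makes missing translations easy to spot.
  return key in messages ? messages[key] : key;
}

console.log(t('helloMessage'));   // "Hello world"
console.log(t('goodbyeMessage')); // "goodbyeMessage" (no such key yet)
```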

4. Localization and Translation Solutions

Short description:

We use helloMessage as the key and get "Hello world" back as the value. We can add more locales to our resource files, such as Japanese with "Konnichiwa." Best practice is to store each locale's resources in a separate file. There are many open source tools available, such as i18next, Globalize.js, Polyglot.js, and Localize. Let's focus on tool-agnostic solutions and high-level tips.

We then use helloMessage as the key, and "Hello world" is returned as the value. That's the simple example in English; next, we can add more locales to our resource files, such as Japanese, with "Konnichiwa." Best practice is to keep each locale's resources distinct and store them in separate files. There are lots of ways to do that, and a lot of really great open source tools are available. Here are some that I recommend: i18next, Globalize.js, Polyglot.js, and Localize. There are many more excellent systems for wrapping all of your strings in an object for translation. I'm not going to go into the deep guts of these individual implementations; instead, I want to focus on tool-agnostic solutions and big, high-level tips that you can use across all of these tools.
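Once each locale lives in its own bundle, a lookup usually needs a fallback chain, for example from ja-JP to ja to a default. This is a tool-agnostic sketch under stated assumptions: the in-memory Map stands in for separate files (say, hypothetical locales/en.json and locales/ja.json), and candidateLocales and translate are hypothetical helpers:

```javascript
// Each entry stands in for a separate per-locale resource file.
const bundles = new Map([
  ['en', { helloMessage: 'Hello world' }],
  ['ja', { helloMessage: 'こんにちは' }],
]);

// Try the full canonical tag, then the bare language, then the default locale.
function candidateLocales(tag, defaultLocale = 'en') {
  const locale = new Intl.Locale(tag);
  return [...new Set([locale.toString(), locale.language, defaultLocale])];
}

function translate(key, tag) {
  for (const locale of candidateLocales(tag)) {
    const bundle = bundles.get(locale);
    if (bundle && key in bundle) return bundle[key];
  }
  return key; // nothing found anywhere: surface the key
}

console.log(translate('helloMessage', 'ja-JP')); // "こんにちは" (from the "ja" bundle)
console.log(translate('helloMessage', 'fr-FR')); // "Hello world" (falls back to "en")
```

Real libraries implement richer negotiation, but the idea is the same: never fail silently to an empty string when a regional bundle is missing.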

5. Testing, Number Formatting, and Layouts

Short description:

Tip #3: Test in different languages and character sets using the pile of poo test. Tip #4: Wrap numbers, dates, times, and currencies in an object for internationalization. Tip #5: Build flexible layouts to accommodate translated text. Also, consider italics, bold, and line height sizing in different languages. Remember to avoid concatenating strings, handle sorting/filtering/searching in non-English characters, and use consistent locale code handling.

The next tip, tip number three, is to test in different languages and different character sets. A great way to do that is the pile of poo test, from Mathias Bynens. He suggests including a pile of poo emoji in your test strings: because of the underlying Unicode encoding in JavaScript, the emoji tests how you handle complex Unicode characters. You can also include a long internationalization string with piles of poo emoji to ensure encoding works correctly end to end in your JavaScript app. I highly recommend testing as many complex characters as possible.
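As a sketch of the pile of poo test, the string below (accented Latin letters, a snowman, and a pile of poo) is an assumption in the spirit of Bynens' examples, and survivesUtf8RoundTrip is a hypothetical helper. It relies on the fact that TextEncoder replaces a lone surrogate with U+FFFD, so a string cut in the middle of an emoji no longer round-trips:

```javascript
// Accented Latin letters, a BMP symbol (☃), and an astral emoji (💩).
const sample = 'Iñtërnâtiônàlizætiøn☃💩';

// True if the string survives UTF-8 encode/decode unchanged.
// TextEncoder turns a lone surrogate into U+FFFD, which breaks equality.
function survivesUtf8RoundTrip(str) {
  return new TextDecoder().decode(new TextEncoder().encode(str)) === str;
}

// Naive truncation by UTF-16 code units cuts the emoji in half...
const naive = sample.slice(0, sample.length - 1);
// ...while truncating by code points keeps every character whole.
const safe = Array.from(sample).slice(0, -1).join('');

console.log(survivesUtf8RoundTrip(sample)); // true
console.log(survivesUtf8RoundTrip(naive));  // false (ends in a lone high surrogate)
console.log(survivesUtf8RoundTrip(safe));   // true (ends in "☃")
```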

Tip number four is to wrap all of your numbers, especially dates, times, and currencies, in an object for internationalization. A wonderful solution that comes out of the box is the JavaScript Intl object, an API with a lot of really cool functionality that you get with JavaScript for free. Here are just a few examples of what's available in the Intl object. First, there's NumberFormat, where we pass in a locale and a long integer and it's displayed correctly for that locale. In English there's a comma every three digits, German uses a period, and Russian uses a space. In Tamil and Hindi, the commas fall in different positions within the integer because those languages use a different digit grouping system, which I think is pretty cool. Then there's the age-old question: what day is it today? It depends on where you're asking. The same numeric date can mean January 2nd in one place and February 1st in another. Here's a fun visualization of how dates change depending on where you display them. And here's an example of the Intl DateTimeFormat, where we can format a range of dates: given a start date and an end date, this is how the range looks across locales, and I think it's really exciting to see all those locale-specific symbols. Unicode also has ICU (International Components for Unicode) to help you handle dates, and that's another wonderful open source project.
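A sketch of both ideas with the built-in Intl APIs. The specific number, dates, and locales are illustrative assumptions; UTC is pinned so the output doesn't depend on the machine's time zone, and a runtime with full ICU data is assumed:

```javascript
const big = 123456789;

// NumberFormat applies each locale's digit-grouping rules.
for (const locale of ['en-US', 'de-DE', 'ru-RU', 'hi-IN']) {
  console.log(locale, new Intl.NumberFormat(locale).format(big));
}
// en-US 123,456,789
// de-DE 123.456.789
// ru-RU 123 456 789  (grouped with a no-break space)
// hi-IN 12,34,56,789 (lakh/crore grouping)

// "1/2" or "02/01"? The same day, written in US and British order.
const jan2 = new Date(Date.UTC(2022, 0, 2));
const opts = { timeZone: 'UTC' };
console.log(new Intl.DateTimeFormat('en-US', opts).format(jan2)); // "1/2/2022"
console.log(new Intl.DateTimeFormat('en-GB', opts).format(jan2)); // "02/01/2022"

// formatRange renders a start/end pair with locale-appropriate separators.
const start = new Date(Date.UTC(2022, 5, 20));
const end = new Date(Date.UTC(2022, 5, 24));
const rangeOpts = { year: 'numeric', month: 'short', day: 'numeric', timeZone: 'UTC' };
for (const locale of ['en-US', 'de-DE', 'ja-JP']) {
  console.log(locale, new Intl.DateTimeFormat(locale, rangeOpts).formatRange(start, end));
}
```

The exact range strings vary with the ICU version bundled in the runtime, so they are left for you to inspect rather than listed here.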

My tip number five is to build flexible layouts to accommodate translated text. Different languages obviously produce different lengths of text. The Adobe Spectrum design system recommends allowing anywhere from 300% to 30% text expansion, depending on the length of the English text, and it's really important to test with these longer and shorter lengths of text. Also remember that italics, bold, and line height can behave very differently in different languages, so consider readability for users reading complex Japanese, Chinese, or Korean characters. Those are my top five big tips, but there's a lot more that's important in internationalization that I just don't have time to go over today. A few more things to keep in mind: avoid concatenating strings, because word order changes across languages. Be sure to handle sorting, filtering, and searching with non-English characters. Use consistent locale code handling: our recommendation is to use BCP 47 to define your locales and to make sure the locale is defined consistently across the whole app. And remember that keyboards and keyboard shortcuts vary by region, so if you have a keyboard shortcut, check it against the different regional keyboard layouts.
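Three of those extra tips can be sketched with plain JavaScript and Intl. The messages object and format helper below are hypothetical (real libraries also handle plurals, gender, and escaping); Intl.Collator and Intl.getCanonicalLocales are standard APIs:

```javascript
// 1. Avoid concatenation: translate whole sentences with named placeholders,
//    because the name's position changes from language to language.
const messages = {
  en: 'Hello, {name}!',
  ja: 'こんにちは、{name}さん！',
};

// Hypothetical minimal interpolation helper.
function format(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, key) => (key in values ? String(values[key]) : match));
}

console.log(format(messages.en, { name: 'Naomi' })); // "Hello, Naomi!"
console.log(format(messages.ja, { name: 'Naomi' })); // "こんにちは、Naomiさん！"

// 2. Sort with a locale-aware collator instead of the default code-unit order.
const words = ['z', 'ä', 'a'];
console.log([...words].sort());                                // ["a", "z", "ä"]
console.log([...words].sort(new Intl.Collator('de').compare)); // ["a", "ä", "z"]
console.log([...words].sort(new Intl.Collator('sv').compare)); // ["a", "z", "ä"] (Swedish puts ä after z)

// 3. Canonicalize BCP 47 tags once, at the app boundary, so every layer agrees.
console.log(Intl.getCanonicalLocales(['EN-us', 'ja-jp'])); // ["en-US", "ja-JP"]
```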

6. Cultural Considerations and Language Variety

Short description:

Right-to-left languages change the layout of your UI for users who read right to left. Cultural considerations are important in internationalization. Having a variety of languages is culturally useful. Enabling different languages and having users navigate in their native tongues is innovative. Avoid assumptions and put the world in the World Wide Web.

And then, of course, right-to-left languages change the layout of your entire UI for users who read right to left. If you support those users, make sure to keep that in mind.
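One common approach is to derive the text direction from the locale and set it once at the document root. This is a sketch under stated assumptions: RTL_LANGUAGES is a hypothetical, deliberately partial list, and directionFor is a hypothetical helper (some newer runtimes expose direction through the Intl Locale Info proposal, but this sketch avoids relying on it):

```javascript
// Hypothetical, deliberately partial list of right-to-left language subtags.
const RTL_LANGUAGES = new Set(['ar', 'fa', 'he', 'ur']);

function directionFor(tag) {
  const { language } = new Intl.Locale(tag); // e.g. "ar-EG" -> "ar"
  return RTL_LANGUAGES.has(language) ? 'rtl' : 'ltr';
}

console.log(directionFor('ar-EG')); // "rtl"
console.log(directionFor('he'));    // "rtl"
console.log(directionFor('en-US')); // "ltr"

// In a browser, apply it at the root so layout and text flow follow, e.g.:
// document.documentElement.dir = directionFor(userLocale);
```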

On each of my slides, I have links to further documentation. So please feel free to go in and read that and learn more about how to handle all of these complicated cases. I just unfortunately don't have time to talk about all the amazing things I'd love to talk about today.

But let's move on to cultural considerations to keep in mind in the big picture of internationalization. If a user is searching for an image to represent a holiday, all of these different images are valid: they're true, wonderful holidays from across the world, and acknowledging different cultures is so important. The same goes for food: if a user is searching for delicious food, I think we would all agree that having variety, spicy, sweet, and savory, is really great and important. And just like having a variety of delicious food choices, having a variety of languages is culturally useful and important.

On this map, each dot represents a language that is currently at risk. By enabling different languages, so that users can really navigate our apps in their native mother tongues, we are truly innovating and building for the full world of humans across the earth. So please join me: avoid your assumptions, broaden your mind, and go out there and put the world in the World Wide Web. Thank you so much. Bye-bye.
