JSNation Live 2020

Emoji Encoding, � Unicode, & Internationalization

Naomi Meyer

Why does '👩🏿‍🎤'.length = 7? Is JavaScript UTF-8 or UTF-16? What happens under the hood when you set <meta charset="UTF-8">? Have you ever wondered how emoji and complex scripting languages are encoded to work correctly across browsers and devices, for billions of people around the world? Or how new emoji are introduced and approved? Have you ever seen one of these □ � “special” glyph characters and wanted more information on why they might appear and how to avoid them in the future? Let’s talk about Unicode encoding in JavaScript and across the world wide web! We’ll go over best practices and common pitfalls, and provide resources to learn more, even where to go if you want to submit a new emoji proposal! :)

FAQ

UTF-8 is commonly used in web development for encoding web pages. It ensures that text appears correctly across different platforms and devices, handling various character sets efficiently.

Unicode assigns a unique code point to every character, no matter the platform, program, or language, ensuring consistent representation across different systems. It supports over a million code points which cover a comprehensive range of characters and symbols.

The meta charset UTF-8 line in HTML specifies that the character encoding for the webpage is UTF-8. This is crucial for correctly displaying text that includes special characters and symbols from multiple languages.

In JavaScript, emojis might be encoded as multiple code units, especially those outside the Basic Multilingual Plane (BMP). This can make the length of a string containing emojis appear longer than the number of visible characters.

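Developers can use JavaScript's string normalization features, like the normalize() method, to handle combining characters consistently. This method converts equivalent sequences, such as a letter followed by a combining accent, into a single canonical form, so that lengths and comparisons behave predictably. It does not collapse an emoji sequence into one code unit; counting each emoji as a single character requires splitting the string into grapheme clusters, for example with Intl.Segmenter.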

ASCII was initially designed to encode 128 English characters, which was insufficient for languages with diacritics and other symbols. This limitation led to the development of extended ASCII and eventually Unicode for broader language support.

The zero-width joiner (ZWJ) is used in emoji sequences to combine multiple emojis into a single glyph. For instance, family emojis are created by combining individual human emojis with a ZWJ, resulting in a single, composite representation.

Surrogate pairs are used in UTF-16 encoding to represent characters outside the Basic Multilingual Plane. These characters require more than 16 bits and are encoded using two 16-bit code units.

UTF-8 is recommended for the web due to its efficiency in encoding a vast range of characters using 1 to 4 bytes, compatibility across different systems, and its ability to handle any Unicode character, making it ideal for international environments.

internationalization

Naomi Meyer

34 min

18 Jun, 2021

Video Summary and Transcription

This Talk explores the UTF-8 encoding and its relationship with emojis. It discusses the history of encoding, the birth of Unicode, and the importance of considering global usage when building software products. The Talk also covers JavaScript's encoding issues with Unicode and the use of the string.prototype.normalize method. It highlights the addition of emoji support in Unicode, the variation and proposal process for emojis, and the importance of transparency in emoji encoding. The Talk concludes with the significance of diverse emojis, the recommendation of UTF-8 for web development, and the need to understand encoding and decoding in app architecture.

Available in Español: Codificación de emojis, Unicode e internacionalización

1. Introduction to UTF-8 Encoding and Emojis

Short description:

I'm Naomi Meyer, a software development engineer at Adobe, and today I'll be talking about the UTF-8 encoding and how it relates to emojis. We'll also touch on the Unicode Consortium, the history of encoding, and the importance of considering global usage when building software products. Let's start by understanding how characters are interpreted by computers, and then delve into the first encodings, specifically ASCII.

Hi, thanks for that great introduction. I'm Naomi Meyer, and I work as a software development engineer at Adobe, where I do localization and internationalization engineering for creative products like Adobe Fonts and Adobe Portfolio. So here's where you can find me online, and if there's anything that you feel passionate about or have strong opinions on, please let me know. I'd love to continue this conversation online.

So I'm sure most of us have seen this many times in the head tag of our HTML markup, and we all know to add this meta charset UTF-8 line. But I've been fascinated lately by the underlying details of what this UTF-8 thing really means and does. So I'm excited to share some more details about what it is and why I think it's so cool. Also, this connects with this JavaScript funniness that we can see here with these seemingly single-character emojis, which as strings in JavaScript have a length a lot longer than expected. And feel free to try this yourself in your dev tools. I know I wanted to test it out when I first saw something like this. And part of my goal today is to talk about why this is, why familyEmoji.length is equal to 11. And to provide some more details about the underlying encoding happening here and what we can do to handle it correctly.

So speaking of emojis, I personally find them both delightful and intriguing from an engineering perspective, a linguistic perspective, a creative design perspective, a cultural sociological international perspective, and so much more. Our agenda for today is to start with a bit of encoding history to understand more of where we came from. Then we'll get into the Unicode Consortium and the UTF-8 algorithm. Then we'll talk about how this allows us to encode emojis and different languages across platforms, devices and operating systems. Overall, I think it's so important for us to keep these big ideas in mind when we build software products that are being used globally. This is kind of a timeline, a broad timeline, of what we'll touch on today. We've got a lot to cover in these 20 minutes. Let's jump right in with encoding to start.

Of course, when we're on our computers and we type an emoji, a letter, a character in any language, these are ultimately interpreted by the machine as zeros and ones. Let's get into a bit about how that works. In order to understand how it's working today, let's go back in the past to the 1960s of the first encodings. Back in the 60s, there were these big computers that filled a whole room. This is a picture of one from NASA. Engineers back then came up with a system. That system is called ASCII, the American Standard Code for Information Interchange. This image is from the first version that was published in 1963. ASCII was developed from telegraph code. It was originally built for more convenient sorting of lists, alphabetically by ascending, descending characters.

2. ASCII and the Birth of Unicode

Short description:

In 1963, ASCII encoded 128 English-only characters into 7-bit integers. Over time, bugs arose because non-English characters with diacritics and accents were not covered. The internet exacerbated the problem of conflicting encodings, leading to errors and question marks. In 1991, Unicode version 1 was introduced as a universal encoding standard to address this issue. Unicode's mission today is to enable everyone to use their own language on devices. Unicode version 1.0, published in October 1991, had 7,161 characters. Understanding Unicode requires a shift in thinking about abstract characters and code points.

In 1963, ASCII encoded 128 English-only characters into 7-bit integers. So, I think ASCII is pretty cool because it makes sense how it was built.

So first, we take a character, like the letter A, and we assign an ASCII decimal number, 65, which in binary is equal to 1, five zeros, 1, in the original 7-bit system. And then after 65 goes 66, which is B, and we continue alphabetically, all the way to Z, which is ASCII decimal 90. Then to go from uppercase to lowercase characters, we only change one bit: lowercase a is 32 later, which is ASCII decimal number 97. And that continues alphabetically. So I think it's a cool system. And shout out to Tom Scott for the video that he has that explains it really clearly.
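The bit arithmetic described here is easy to check in a console. This is a minimal sketch; the variable names are illustrative and not from the talk.

```javascript
// ASCII 'A' is decimal 65, which is 1000001 in seven bits.
const upperA = 'A'.charCodeAt(0);                            // 65
const bits = upperA.toString(2).padStart(7, '0');            // '1000001'

// Lowercase letters sit exactly 32 positions later, so switching case
// only flips the single bit whose value is 32 (0b0100000).
const lowerA = String.fromCharCode(upperA ^ 32);             // 'a' (code 97)
const upperZ = String.fromCharCode('z'.charCodeAt(0) ^ 32);  // 'Z' (code 90)

console.log(upperA, bits, lowerA, upperZ);
```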

So ASCII makes sense, and it kind of worked in America in the 60s. But over time, there were lots of bugs. And those bugs came because non-English characters, like those pictured here, include diacritics and additional accents. ASCII originally worked with seven bits. But then computers moved to eight bits, and we went from 128 characters to 256 characters. And different countries and different language systems added more characters with those extra 128. And different languages like Japanese, for example, did their own thing entirely. They had a separate multi-byte encoding system, you know, and Japanese, Russian, all these languages had a different encoding system. And that was fine when they worked independently. But then came the internet. And with the worldwide web, the internet kind of broke computers because there was this problem with no universal encoding system. When two different non-compatible encoding systems encountered one another, we got these types of errors like we see here with a lot of question marks and a lot of bugs.

So, in 1991, with the worldwide web, we got Unicode version 1. And Unicode was designed to be a universal encoding standard to solve this problem of conflicting character encodings. So, Unicode today is a non-profit whose mission is that everyone in the world should be able to use their own language on phones and computers. Unicode version 1.0 was published in October 1991 and had 7,161 characters. To understand and think about Unicode, you kind of have to make a mental shift of your assumptions about language and characters. So, there are three kind of big ideas to think about. Props to Dmitri Pavlutin, who has this great article I recommend called What Every JavaScript Developer Should Know About Unicode. So, the first idea to keep in mind is abstract characters. So, instead of thinking about letters in an alphabet, it's good to think about abstract characters, and Unicode deals with characters in these abstract terms. Second, we have code points.

3. Understanding Code Points and Unicode Planes

Short description:

A code point is a number assigned to a single character. Unicode character sets map each abstract character in the world to a unique number. Unicode divides over a million code points into 17 planes or groupings. The first plane, plane 0, is the basic multilingual plane, also known as the BMP. The BMP is four hexadecimal digits. Outside the BMP is the astral plane, or the supplementary planes. Computers translate from Unicode code points into physical bits using a character encoding translation, which takes the transformation from code point into physical bits. Unicode has this popular open source character encoding translation algorithm called the Unicode Transformation Format or UTF, which does this job for us.

A code point is a number assigned to a single character. So, this is an example of a code point where we have U+0041, an atomic unit of information. So, in a code point, we have 0041, which is a hexadecimal number, and then we have the prefix U+, where U stands for Unicode, because each code point number is given a meaning by the Unicode standard, and Unicode character sets map each abstract character in the world to a unique number.

So, U+0041, we look it up in Unicode, we get Latin capital letter A. Currently, the Unicode standard defines over a million code points. And they all have a one-to-one mapping, so that ensures that there's no collision between alphabets of different languages. The third point to keep in mind is a plane. So, basically, long story short, Unicode divides over a million code points into 17 planes or groupings. These planes are represented here.

So, the first plane, plane 0, is the basic multilingual plane, also known as the BMP. And that's the unification of all the prior character sets. So, that includes ASCII, and Chinese, Japanese, and Korean characters. And this is what the BMP looks like. And I think it's fascinating to see the breakdown of different scripts included. You know, you can see East Asian scripts and the South, the Chinese, Japanese, Korean characters include a lot of additional code points. So, the BMP is four hexadecimal digits. And then, outside the BMP, plane one is five hexadecimal digits. And plane 16 is six hexadecimal digits. And outside of the BMP is the astral plane, or the supplementary planes.
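As a rough illustration of code points and planes, the plane number is simply the code point divided by 0x10000 (65,536), rounded down. The helper names below (`planeOf`, `toUPlus`) are made up for this sketch.

```javascript
// The plane is the code point divided by 0x10000, rounded down.
function planeOf(char) {
  const codePoint = char.codePointAt(0);
  return Math.floor(codePoint / 0x10000);
}

// Format a character's code point in the usual U+XXXX notation.
function toUPlus(char) {
  return 'U+' + char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
}

console.log(toUPlus('A'), planeOf('A'));                  // U+0041 0  (the BMP)
console.log(toUPlus('\u{1F600}'), planeOf('\u{1F600}'));  // U+1F600 1 (a supplementary plane)
console.log(planeOf('\u{10FFFF}'));                       // 16, the last plane
```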

So, how does this relate to our head tag with UTF-8, the one that we're all used to seeing? So, we know about code points. Abstract characters, like U+0041 is A. And we know about code units, or physical bits, because, you know, computers, at the memory level, don't use code points or abstract characters. They need a physical way to represent Unicode code points. So, computers translate from Unicode code points into physical bits using a character encoding translation, which takes the transformation from code point into physical bits. And Unicode has this popular open source character encoding translation algorithm called the Unicode Transformation Format or UTF, which does this job for us. And popular encodings of UTF are UTF-8, UTF-16, and UTF-32. So, UTF is really cool. It's reversible, so the conversions between all of them are algorithmically based. That's hard to say.

4. JavaScript and Unicode

Short description:

They're fast and support lossless round-tripping. JavaScript engine encoding can have strange negative effects when working with Unicode. Understanding how strings are encoded reduces bugs. JavaScript can be strange with strings, as seen with combining marks and accents. The string.prototype.normalize method is a great tool to avoid localization bugs and build internationalized apps.

They're fast and support lossless round-tripping. UTF-8 is most common on the web. UTF-16 is used by Java and Windows. And both UTF-8 and UTF-32 are used by Linux and various Unix systems.
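To make the code point to bytes translation concrete, here is a small sketch using the standard `TextEncoder` and `TextDecoder` APIs (available in browsers and in recent Node.js versions). The `utf8Hex` helper is a name invented for this example.

```javascript
const encoder = new TextEncoder();          // always encodes to UTF-8
const decoder = new TextDecoder('utf-8');

// Render the UTF-8 bytes of a string as hex pairs.
function utf8Hex(text) {
  return Array.from(encoder.encode(text), (byte) =>
    byte.toString(16).padStart(2, '0')
  ).join(' ');
}

console.log(utf8Hex('A'));          // 41           (1 byte, same as ASCII)
console.log(utf8Hex('\u00E9'));     // c3 a9        (é, 2 bytes)
console.log(utf8Hex('\u20AC'));     // e2 82 ac     (€, 3 bytes)
console.log(utf8Hex('\u{1F600}'));  // f0 9f 98 80  (😀, 4 bytes)

// Lossless round trip: decoding the bytes gives back the original string.
const original = 'caf\u00E9 \u{1F600}';
const roundTripped = decoder.decode(encoder.encode(original));
console.log(roundTripped === original);   // true
```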

But JavaScript is weird. So, there are these two great articles I highly recommend from Mathias Bynens that go more into depth on JavaScript and Unicode. But basically, long story short, the ECMAScript engine itself exposes characters according to UCS-2, not UTF-16. So, JavaScript engine encoding can have these strange negative effects when JavaScript works with Unicode.

And this is really important to keep in mind because, you know, as developers, we do a lot of string manipulation. We do sorting, filtering, searching, lookups. A lot of business logic involves strings. So, it's great to understand how these strings are encoded to reduce bugs. This slide looks weird. Let's move to the next one.

So, something that's a good example of how JavaScript is kind of strange with strings is the example of a combining mark. So, in Danish, you have the letter A with a little circle over it that is, in Unicode, two separate characters, where the circle over it is a combining mark that is designed to modify the preceding character. So, if we console log these two Unicode code points, we get the A with the combining mark. This is one example with a combining mark.
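A quick way to see the combining mark in action is the sketch below. It assumes the slide used U+030A (COMBINING RING ABOVE) after a capital A; the exact code points on the slide are not in the transcript.

```javascript
// 'A' followed by U+030A COMBINING RING ABOVE renders as one glyph: Å
const decomposed = 'A\u030A';
// U+00C5 is the precomposed LATIN CAPITAL LETTER A WITH RING ABOVE.
const precomposed = '\u00C5';

console.log(decomposed, precomposed);      // both display as Å
console.log(decomposed.length);            // 2
console.log(precomposed.length);           // 1
console.log(decomposed === precomposed);   // false, the code units differ
```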

Another example we have here is cafe with the accented E. And this one is different because if we define, you know, this variable as drink with cafe with the E, we can see that drink.length, while it looks like it should be four characters, actually comes out in JavaScript as five characters. And then if we split this string into an array, we can see that the final element of the array is that final accent. And so a lot of confusion can arise when we kind of ignore the concept of code unit sequences. So this is really important to keep in mind to avoid string bugs like we see here in cafe with an accent.
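The cafe example can be reproduced as follows. This sketch assumes the string on the slide was written with a plain e followed by U+0301 (COMBINING ACUTE ACCENT), which is what produces the length of five.

```javascript
// 'e' followed by U+0301 COMBINING ACUTE ACCENT displays as é
const drink = 'cafe\u0301';

console.log(drink);               // café
console.log(drink.length);        // 5, not 4
const pieces = drink.split('');
console.log(pieces.length);       // 5 elements
console.log(pieces[4] === '\u0301');  // true, the last element is the lone accent
```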

So there's a great solution that we can use and that's string.prototype.normalize, which is a method that returns the Unicode normalization form of the string. And there's the link to the MDN docs about it. I think it's fascinating. So if we take that same example with cafe with an accent and we normalize it, we see that the length is now four. And if we split it into an array, we don't get that accent as the final element. So this is a great tool to avoid localization bugs and to really think deeply about how we're building our apps so that they can be internationalized.
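A minimal sketch of the normalization step, again assuming the decomposed spelling of the word. NFC is the default form for `normalize()`, and it composes the e and the accent into the single code point U+00E9.

```javascript
const decomposedDrink = 'cafe\u0301';                      // e + combining accent
const normalizedDrink = decomposedDrink.normalize('NFC');  // composed form

console.log(normalizedDrink.length);            // 4
console.log(normalizedDrink.split('').length);  // 4, the accent is no longer separate
console.log(normalizedDrink === 'caf\u00E9');   // true

// Normalizing both sides makes comparisons and lookups reliable.
const sameWord =
  'caf\u00E9'.normalize('NFC') === decomposedDrink.normalize('NFC');
console.log(sameWord);                          // true

// Note: normalization does not shorten surrogate pairs or emoji sequences.
console.log('\u{1F600}'.normalize('NFC').length);  // still 2
```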

5. Emoji Support and Character Length

Short description:

In 2010, Unicode added support for emoji. Most emoji characters are outside of the BMP, requiring two code units in UTF-16. This can lead to unexpected character lengths when splitting emoji strings. The zero-width joiner is used in emoji sequences to attach different code points together.

So remember that timeline. There's another important element in the timeline, and that's in 2010, in Unicode version 6.0, when Unicode added support for emoji.

So let's move on to talk about emoji. So remember the Unicode planes. Outside of the BMP is the astral plane, and code points in the astral plane, or the supplementary planes, have more than four hexadecimal digits.

So because most emoji are outside of the BMP, in the astral plane, they are five hexadecimal digits, which require up to 21 bits to save in memory. So UTF-16 requires two code units of 16 bits in order to handle that. So with this example of the smiley face, which is the Unicode code point U+1F600, we can see that we require two code units. So they're split into surrogate pairs where we have a high surrogate code unit and a low surrogate code unit. And this is the math to get the surrogate pair, where we input the astral code point of the smiley face and we return the two surrogates, high and low. So because of this, if we take the smiley face and split it, we can see that there are two characters there that are garbled. And the smiley face as a string has a .length of two, which is not what you would think. And this is true also for longer emoji.
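The surrogate pair math mentioned here follows the standard UTF-16 rule: subtract 0x10000, then split the remaining 20 bits into two halves of 10 bits. A sketch, with `toSurrogatePair` being a name chosen for this example:

```javascript
// Split an astral code point (above U+FFFF) into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;       // 20 bits remain
  const high = 0xD800 + (offset >> 10);     // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);    // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F600);
console.log(high.toString(16), low.toString(16));   // d83d de00

const smiley = '\u{1F600}';                          // 😀
console.log(smiley.length);                          // 2 code units
console.log(smiley.charCodeAt(0) === high);          // true
console.log(smiley.charCodeAt(1) === low);           // true
console.log(smiley.codePointAt(0).toString(16));     // 1f600
```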

Let me just quickly share a fun example. If we open up the console, and you know you can do this in your browser if you want. Let me close all these warnings. So, if we do a simple family, right? This is family. And on Mac you can do control command space to open up your emoji. We can see that family.length is 11. And if we try family.split, there are those 11 characters split into an array. And then if we try a different one, let's do like singer with dark skin, .length, we see seven. And if we do singer.split, again, we see seven. And you can try this with lots of emoji, and the lengths are surprising, which is fun. So let me pull up that slide again. So the question is, why are these seven? Why are they 11? And the answer is multifaceted. But one of the… Let me close the inspector here. There we go. Okay. So one of them is the zero-width joiner. So the zero-width joiner is not an emoji itself, but it's used in emoji sequences as glue to attach different emoji code points together and create compound emoji sequences.
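The length of 11 can be accounted for by building the family emoji by hand. This sketch assumes the man, woman, girl, boy family sequence; escapes are used instead of literal emoji so the code points stay visible.

```javascript
const ZWJ = '\u200D';  // ZERO WIDTH JOINER, one code unit

// man, woman, girl, boy: each is an astral code point (two code units)
const members = ['\u{1F468}', '\u{1F469}', '\u{1F467}', '\u{1F466}'];
const family = members.join(ZWJ);   // renders as one glyph where supported

console.log(family.length);         // 11 = 4 emoji x 2 code units + 3 joiners

// Array.from iterates by code point, not by code unit.
const codePoints = Array.from(
  family,
  (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase()
);
console.log(codePoints.length);     // 7 code points
console.log(codePoints.join(' '));
// U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466
```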

6. Emoji Variation and Proposal Process

Short description:

The variation selector stylizes plain text characters with colorful emoji representation. Skin tone modifiers were added in Unicode 8.0. Emoji have variation across platforms, like fonts. The placement of cheese in the hamburger emoji caused controversy. Legal experts discuss the permissibility of emoji as evidence. Anyone can propose new emoji to Unicode.

Another important code point to keep in mind is the variation selector, which takes a plain text representation of an emoji, like this heart that we see here, and stylizes it to display with its colorful emoji representation. So we take the heart character plus the variation selector, which is U+FE0F, to create the styled heart.
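A small sketch of the variation selector, assuming the heart on the slide is U+2764 (HEAVY BLACK HEART). Note that this particular character lives inside the BMP, so it is a single code unit before the selector is added. How the two strings actually render depends on the platform's fonts.

```javascript
const textHeart = '\u2764';          // may render as a plain text heart
const emojiHeart = '\u2764\uFE0F';   // plus VARIATION SELECTOR-16, requests emoji style

console.log(textHeart, emojiHeart);
console.log(textHeart.length);       // 1
console.log(emojiHeart.length);      // 2
console.log(emojiHeart.codePointAt(1).toString(16));  // fe0f
```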

Another emoji variant is the emoji modifier. And this is the skin tone modifier, and it's based on the Fitzpatrick scale, which is actually used in dermatology to describe differences in UV light exposure. So it's not really a race or ethnicity scale here. It's a dermatology scale. And this skin tone modifier was added in Unicode 8.0. So if we, again, open up that singer example, we can see that the combination of the singer is: we take the woman emoji, which again, with the high and low surrogate pairs, is a length of two, plus the skin tone modifier, plus the zero-width joiner, plus the microphone emoji. We get the total combination of the singer emoji equal to all of those combined, which I just think is so cool and awesome about emoji.
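The singer breakdown can be verified by concatenating the parts. A sketch with illustrative variable names:

```javascript
const woman = '\u{1F469}';         // 👩  2 code units
const darkSkinTone = '\u{1F3FF}';  // 🏿  2 code units (Fitzpatrick type 6 modifier)
const joiner = '\u200D';           // zero-width joiner, 1 code unit
const microphone = '\u{1F3A4}';    // 🎤  2 code units

const singer = woman + darkSkinTone + joiner + microphone;

console.log(singer);               // renders as one singer glyph where supported
console.log(singer.length);        // 7 = 2 + 2 + 1 + 2
console.log(Array.from(singer).length);  // 4 code points
```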

So let's end, you know, with emoji today. So emoji are like fonts and they have variation across platforms. Here we see in the browser in Apple, Google, Facebook, all these different emoji and how they can be represented differently just like fonts in the browser. And again with the hugging face, party face and face with tears of joy emoji, which is one of my favorites. And then I think this is funny because the cell phone emoji or the mobile phone emoji looks like the phone of the provider across different fonts. And the hamburger emoji, this was a big internet controversy because the placement of the cheese varied, whether it was on the meat or on the bun. And people were very upset about it. And then the gun emoji, this one actually has sparked a lot of discussion among legal experts on whether or not emoji can be permissible as evidence in a court trial. And the different providers actually changed it from looking like sort of a realistic handgun to instead looking like a toy water gun to try and avoid this controversy.

So with that in mind, I think it's really important to know that anyone can propose a new emoji for the next version of Unicode. You can go on this link and it is a process but it's possible. So for example, these four emoji that we see here were all sort of externally proposed to Unicode. So the blood drop emoji we see here, it's featured in the documentary Beyond the Emoji, which I recommend. And the goal of this one was to break the stigma of menstruation. The woman with the hijab headscarf emoji, there was someone who said, hey, I wear a headscarf. I don't have any emojis that look like me. I want to get one into Unicode. And they made that happen. And Jennifer 8. Lee, who's a really incredible emoji leader, she is from a Chinese background and she wanted a dumpling that was part of her culture. So she worked hard to get a dumpling included in emoji.

7. Importance of Transparency in Emoji Encoding

Short description:

The Unicode conference highlighted the importance of transparency in the emoji encoding process. It's not just about icons for lunch, but about cultural representation in our digital language. Understanding Unicode and proposing new emojis can help shape the future of digital communication.

So I've been to the Unicode conference, and while they're doing great work and, from an engineering perspective, the encoding is fascinating, the Unicode Consortium, the people who are determining emojis and making changes for future versions of Unicode, it's pretty male and pale. So I think it's really important to have more transparency about this process so that we can get more emoji out there and encode them into bits, because it's not just a matter of having kind of the right icon to describe what you ate for lunch. It's about digital acknowledgment of culture, about who gets to be represented in our future digital language. How do those representations take into account different things like ethnicity, religion, and more? I highly encourage people to learn more about Unicode, learn more about how they can propose a new emoji, and to get involved in the guts of the encoding process.

Q&A about Emoji Customization

Short description:

That's all I've got. Thank you so much. We do have quite a number of questions from the audience. The first question is from Nix, who wants to know if you can replace elements of compound emojis to produce something different. Another question is from Rin and Amir, asking about your favorite emoji and why.

That's all I've got. Thank you so much.

Awesome, thanks so much. Hey, how's it going? Hi, doing well. Good, good, good. As I was saying to some of my co-chefs what an informative presentation that was. I'm looking forward to having a chat with my friends tomorrow and randomly bringing up a question as to why the family emoji is 11 characters long. You're just going to be like, oh Chris, you're so smart. I'm like, yep, I am. Nice.

But yeah, we do have quite a number of questions from the audience that I'll get into. So first, the first question is from Nix. Nix wants to know, so because some emojis are like a compound, can you replace elements of them to produce something different? Oh, that's a really fun question. You know what? I would love to try that. I don't see why not. It makes sense. Explore, give it a try. I think it would have to be a valid sequence, and it would need a design in the font face that you've selected. So if you're on a Mac, you know, there would need to be an Apple design for that emoji. And if there is, then you can probably change them using the different Unicode characters if you want, instead of using the emoji picker. Cool.
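As a sketch of what swapping a component might look like: replacing the microphone in the singer sequence with the cooking emoji (U+1F373) yields the woman cook sequence. Whether the result renders as a single glyph depends on the platform's emoji font supporting that sequence, as the answer above points out.

```javascript
// Woman + dark skin tone + ZWJ + microphone = singer
const womanSinger = String.fromCodePoint(0x1F469, 0x1F3FF, 0x200D, 0x1F3A4);

const microphoneEmoji = '\u{1F3A4}';  // 🎤
const cookingEmoji = '\u{1F373}';     // 🍳

// Swap the last component to get woman + dark skin tone + ZWJ + cooking.
const womanCook = womanSinger.replace(microphoneEmoji, cookingEmoji);

console.log(womanSinger, womanCook);
console.log(womanCook.length);        // still 7 code units
```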

And then this question was from Rin and Amir. They wanted to know what is your favorite emoji and why? I love emoji because there's so many different ways that you can use them. And I often will use a wide variety. And I try and use the least, the less popular emojis. But I think my all time favorite is probably the the face with tears, the joy face with tears, just because it's happy and fun. Do you have one, Chris? Oh man, mine is usually, the best one is just this. You know, the sign just saying like, OK, that's mine. That's mine.

Emojis, Diversity, and Encoding

Short description:

Twitter counts each individual emoji as two characters. Emojis should represent the actual people using them, with different gender and skin tone modifiers. Learning about Unicode and the consortium behind it is useful to add more diverse emojis. UTF-8 is recommended for the web, while conflicts may arise with different encodings.

Because pretty much it's always a long thread and I just can't get through it. And I'm just like, all right, cool. Yeah, right. Get right back at you. Yeah, I like the thumbs up one too. Exactly. Exactly.

People understand, like, I'm not busy, I'm just too lazy to respond.

We have a question from Cynthia and the question is, so if someone is building a Twitter like application and then we were limiting the characters in the message using string length to calculate character amount, calculate character count, how could you make sure that an emoji counts as one character instead of sometimes seven? Yes, so this is really interesting. I was thinking about doing a Twitter demo because Twitter counts each individual emoji as two characters. So you can try it out yourself, you know, in Twitter you can have a string of all As and then if you add an emoji at the end, you'll see that one emoji is two characters. So that's something to keep in mind. And another thing I find fascinating about Twitter emojis is that certain hashtags, you know, include an emoji. So for example, I've been using that, you know, hashtag Black Lives Matter a lot lately. And at the end, Twitter includes the sort of fist emoji. And so I don't know the exact number of characters, but you'd think it would just be the ABC characters and not the emoji when you're counting for character count. But the emoji is included as two characters in the hashtags that have emojis, if that makes sense.
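One possible approach to the character-count question, not covered in the answer above, is to count grapheme clusters rather than code units. The sketch below uses the standard `Intl.Segmenter` API, which is available in current browsers and Node.js 16 or later; `countGraphemes` is a hypothetical helper name.

```javascript
// Count user-perceived characters (grapheme clusters) instead of code units.
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(text)).length;
}

const singerEmoji = String.fromCodePoint(0x1F469, 0x1F3FF, 0x200D, 0x1F3A4);
const message = 'aaa' + singerEmoji;

console.log(message.length);            // 10 code units
console.log(countGraphemes(message));   // 4 user-perceived characters

// A hypothetical length check for a message limit.
const MAX_CHARS = 280;
console.log(countGraphemes(message) <= MAX_CHARS);  // true
```

Real products may apply their own weighting rules on top of this, so the limit logic here is only illustrative.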

And then a question I also had as well, because I was quite fascinated by your talk, is I wanted to find out what role a diversity of culture and traditions plays in the design of emojis. A big role. That's something that I think is really important with emoji, is to try and keep them representative of the actual world of people who are using them. You know, these little tiny pictures exist on every phone, every computer, all across the world, all across different languages, so I think to be more accessible, they should represent the actual people using them. And that's why, you know, using different gender modifiers or different skin tone modifiers is really important to include and to handle in encoding, because then we can, you know, make emojis as diverse as the people using the emojis. And I think it's really useful for everyone to learn more about Unicode and the sort of quiet consortium that's working behind the scenes to add new emoji, because things like, you know, the woman with the hijab or the rickshaw or the dumpling emoji, these are representative of different cultures that maybe the original Unicode consortium didn't include, and I think it's really important to get more of those out there. For sure, for sure.

And then I have a question from Drum. And Drum wants to know, so can we use UTF-16 or UTF-32 in the web? Yeah, that's a good question. So it's recommended to use UTF-8 online, on the web. And it's important to remember that the web is encoded in UTF-8. So if you have a database or a backend process that's using UTF-32 or might be using ASCII or a different encoding altogether, keep that in mind. Because when a string gets encoded and decoded, there could be conflicts if they're different systems.

UTF-8 Encoding and Emojis Conclusion

Short description:

The benefit of UTF-8 is its compatibility with different systems, but JavaScript's UCS-2 encoding can cause issues. Understanding encoding and decoding in your app architecture is crucial for bug-free development. Emojis are widely used and represent individuals accurately, making this talk relevant and engaging. Naomi will continue in her own Zoom room due to the overwhelming number of questions. One final food-related question asks about her comfort food choice, which is matzo ball soup, a potential candidate for a new emoji. Thank you, Naomi, for the informative presentation.

So the benefit of UTF-8, UTF, is that it works if you're using UTF-32 or 16 or 8. But if you're using a different system, like we talked about, JavaScript under the hood has some UCS-2, that's where the problem arises. So always know at each point in your full app architecture what the encoding and decoding is so that you can remove any of those bugs.
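To illustrate the kind of conflict that appears when the encoding and decoding steps disagree, here is a sketch using `TextEncoder` and `TextDecoder`. The choice of windows-1252 as the mismatched encoding is an assumption for the example, and decoding it in Node.js requires a build with full ICU (the default in recent versions).

```javascript
// Encode as UTF-8, then decode with the wrong encoding: classic mojibake.
const utf8Bytes = new TextEncoder().encode('caf\u00E9');   // 63 61 66 c3 a9
const misread = new TextDecoder('windows-1252').decode(utf8Bytes);
console.log(misread);        // cafÃ©

// The reverse mistake: single-byte Latin-1 data decoded as UTF-8.
// 0xE9 is not a valid UTF-8 sequence on its own, so the decoder
// substitutes U+FFFD, the replacement character.
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xE9]);
const replaced = new TextDecoder('utf-8').decode(latin1Bytes);
console.log(replaced);                     // caf�
console.log(replaced.endsWith('\uFFFD'));  // true
```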

Yeah, for sure. And I feel like this talk, just based off of, one, the fact that emojis are something that we all utilize, and two, just the fact that they represent us as individuals quite accurately, a lot of them as well, I feel like this talk is one that could go on and on and on and on, even beyond.

But, guys, Naomi is not going, she's simply going to be moved over to her own Zoom room. So, there are so many questions still piling in that we just don't have time to carry on, to finish with here. But, yeah, I guess for myself I just had one final question and I promise this would be, this is only food-related because we are doing a pseudo chef show but I guess myself and the other co-chefs wanted to find out what is your comfort food choice.

That's a great question, and my family is Jewish and we have matzo ball soup, which is like the Jewish chicken noodle soup, and that's definitely my comfort food, and I was thinking that could actually be another emoji, a matzo ball soup emoji. There you go, there you go. You know, ideas matter, ideas matter, now you know, now you know. But Naomi, thank you very much for an informative presentation. You've definitely, you're definitely going to make me, you know, I'm going to be much smarter in front of my friends because of it, for at least the next week or so, because I'm just going to be relaying everything you just mentioned, and then some. And yeah, please, she will be in, she'll be going to her Zoom room, so feel free to keep the question coming, you know, peeps. And yeah, Naomi, till we speak again. It's been a pleasure. Great, thank you so much, Chris. Thank you.
