jealous markup
Software and language, mostly

Of emoji, Hanzi and alchemy

January 28, 2015 Software Chinese

I hate emoji. You know, those colorful icons: smiley faces, sad faces, thumbs up and their countless friends that we use all the time in short text messages and email. It’s not that I’m a purist who thinks emoji spell the end of literacy, particularly among those horrible teenagers. No, quite the contrary: I know full well that language and culture have been in constant decline since at least the time of the ancient Greeks; I don’t buy into this kind of stuff. I actually like icons in text.

It’s about typography. I hate emoji because in almost every single messaging app and word processor, they aggressively replace my painstakingly typed ASCII emoticons and ruin the aesthetics of the message. I created a couple of examples; for starters, here’s WhatsApp’s emoji crime, taken straight from the Windows Phone app. Check out that uneven line spacing because of the smiley face:


You could say, and you might be justified, that this is only my exaggerated sensitivity and I should seek counsel elsewhere, but read on.

Chinese/English hybrid text

Any Chinese-English dictionary will have a mix of English text and Chinese Hanzi in the “body” of entries, i.e., in the part that provides translations, paraphrases and other meta-information about each headword. There’s a post dedicated entirely to the presentation of CEDICT’s seemingly flat, structure-less entries, but let’s take a quick glance at a fairly typical example here:


That might seem easy enough, but in reality just about all the odds are against you if you want to make this entry look good. For starters, Latin text is aligned to the baseline, a horizontal line you could draw under a line of text that would just touch the bottom of each character. Chinese characters are built differently; they don’t have a baseline. There is a lot of typographical detail about Latin and Chinese typefaces in this post if you care to know.

What happens if you simply use the built-in text rendering routines and pass them a string of mostly Latin letters, with a few Hanzi thrown in? This is what happens – you can try it yourself; the screenshot is from Microsoft Word:


Does that look familiar? Yup. It’s the emoji effect. The Latin font in the text is 11pt Calibri; the Chinese font is 11pt Noto Sans Thin. The Hanzi are not really aligned to anything in particular; they’re sort of glued to the Latin font’s baseline, but not precisely. If you switch to a different font – say, MingLiU – the Hanzi slide down a bit below the baseline, and the extra line height is reduced, but not completely eliminated. Remember, this is still 11pt MingLiU; I’m keeping the font size constant for now.
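To see where the extra line height can come from, here is a minimal sketch of one common layout model: a line is as tall as the largest ascent plus the largest descent among its runs. The `RunMetrics` type, the `line_height` function and all the numbers are made up for illustration; they are not measurements of Calibri, Noto Sans or MingLiU, and I am not claiming Word computes it exactly this way.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RunMetrics:
    """Vertical extent of one text run, relative to the shared baseline."""
    ascent: float   # extent above the baseline, in points
    descent: float  # extent below the baseline, in points (positive value)


def line_height(runs: Iterable[RunMetrics]) -> float:
    """Height of a line: tallest ascent plus deepest descent over all runs."""
    runs = list(runs)
    return max(r.ascent for r in runs) + max(r.descent for r in runs)


# Hypothetical metrics: the Hanzi run sits lower than the Latin run.
latin = RunMetrics(ascent=9.0, descent=2.0)
hanzi = RunMetrics(ascent=8.0, descent=4.0)

latin_only = line_height([latin])
hanzi_only = line_height([hanzi])
mixed = line_height([latin, hanzi])
```

With these invented numbers the mixed line comes out taller than either font would need on its own, because the Latin run sets the ascent and the Hanzi run sets the descent. That is the uneven spacing you see whenever one line contains Hanzi and its neighbors don't.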


In reality, you need a somewhat larger typeface for Chinese text. An 11pt sans-serif typeface is perfectly fine on screen, but in my experience, you need about 1.3 times the size for readability with Hanzi. (In fact, the Chinese and Japanese editions of Windows XP were shipped with a higher base font size, which effectively meant a higher DPI setting – everything from text to window headers to icons was somewhat larger than on a typical 96 DPI English system. This was something we specifically had to deal with in the software I’m designing in my day job, but curiously, I cannot find any relevant links via Google now.)
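As a rough sketch of what that means in practice, the snippet below splits a string into Latin and Hanzi runs and assigns each run a point size, scaling the Hanzi by 1.3. This is not Zydeo's actual code: `split_runs` and `HANZI_SCALE` are names I made up here, and the regular expression only covers the basic CJK Unified Ideographs block, which is a simplifying assumption.

```python
import re
from typing import List, Tuple

# Empirical factor from the text above: Hanzi at about 1.3x the Latin size.
HANZI_SCALE = 1.3

# Assumption: only the basic CJK Unified Ideographs block counts as Hanzi.
_HANZI_RUN = re.compile(r"[\u4e00-\u9fff]+")


def split_runs(text: str, latin_pt: float = 11.0) -> List[Tuple[str, float]]:
    """Split text into (run, point size) pairs, enlarging the Hanzi runs."""
    hanzi_pt = round(latin_pt * HANZI_SCALE, 2)
    runs: List[Tuple[str, float]] = []
    pos = 0
    for match in _HANZI_RUN.finditer(text):
        if match.start() > pos:
            # Everything between two Hanzi runs is treated as Latin text.
            runs.append((text[pos:match.start()], latin_pt))
        runs.append((match.group(), hanzi_pt))
        pos = match.end()
    if pos < len(text):
        runs.append((text[pos:], latin_pt))
    return runs


runs = split_runs("see 你好 (hello)")
```

For an 11pt Latin font this puts the Hanzi at 14.3pt. Each run then still has to be measured and positioned separately, which is where the alignment trouble starts.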

But back to hybrid text in Zydeo. With a larger font size for Hanzi, the naïve and easy way of drawing text would give you this:


Reconciliation and magic constants

Let’s take an inventory of the issues you need to solve to render hybrid Chinese/English text so that both the Latin runs and the Hanzi are readable and the whole still looks balanced and pleasant:

There are several “magic constants” in this. I can’t explain scientifically why I feel 1.3 times the point size is right for Hanzi, nor why I think it’s most pleasant to align them against the Latin text just the way I described it. But this is a recurring experience, whether I’m dealing with the finer points of visual design or natural language processing: at the end of the day, the mix typically includes a handful of magic constants that you cannot explain “scientifically” – you just keep experimenting until you are satisfied with the results. If I’m pressed to be honest, then often what we designers and computer geeks are doing is as much alchemy as science.

In particular, finding out about the characteristics of each Chinese typeface I considered was a matter of trial and error, and I constantly had the feeling I was working against whatever system routines I had to rely on, not with them. But that, too, is a familiar feeling to anyone who’s been creating software for Windows.

One good word for Skype

I can’t refrain from returning to emoji at the end of this post. I wrote earlier how practically all messaging apps out there frustrate me with their disregard for typographic balance when they throw icons into the middle of running text. Skype was no exception, until the latest update this January. Then, much to my surprise, I discovered they had done something unexpected and dealt with exactly this peeve of mine. The screenshot below speaks volumes.