Machine translation and how it deals with Chinese characters

The inability of Google Translate, Bing Translator, Baidu Fanyi, and other translation services to correctly convert jī nián dàjí 鸡年大吉 (“may the / your year of the chicken be greatly auspicious!”) into various languages raises a vital point that we have long wanted to make, and now seems the perfect moment to do so.

The point is this: just as we could not expect these translation services to deal with Cantonese, Shanghainese, Taiwanese, etc. (unless they were specifically and separately programmed to do so), so we should not expect them to be able to handle Literary Sinitic* / Classical Chinese** (LS / CC) and the like.

These are all different languages, and electronic translation software, just like the human brain, cannot be programmed and trained so that it simultaneously translates material originating from different languages as if they were one.

The only exception is when fragments of these other languages have been embedded here and there into Modern Standard Mandarin*** (MSM) and regularized in such a way that they have for all intents and purposes been borrowed as part of MSM vocabulary, e.g., mǎidān 买单 / máidān 埋单 / màidān 卖单 (“pay the bill”) from Cantonese maai4daan1 埋单 (“call for the bill / check”).

Note that, even though MSM mǎidān 买单 / máidān 埋单 / màidān 卖单 (“pay the bill”) is written in three different ways with three separate pronunciations, translation software can deal with all of them because they occur in MSM with sufficiently high frequency to be recognized as an integral, naturalized part of MSM vocabulary.
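One way to picture this is as a lexicon in which each written variant is its own entry pointing to the same meaning. The sketch below is a toy illustration only: the `MSM_LEXICON` table and the `gloss` function are hypothetical names invented here, and real translation systems do not work from a hand-written table like this.

```python
# Hypothetical toy lexicon (an assumption for illustration, not a real system):
# each written variant naturalized in MSM is its own key with the same gloss.
MSM_LEXICON = {
    "买单": "pay the bill",
    "埋单": "pay the bill",
    "卖单": "pay the bill",
}


def gloss(term):
    """Return the gloss for a term, or None if the toy lexicon lacks it."""
    return MSM_LEXICON.get(term)


print(gloss("埋单"))
print(gloss("鸡年大吉"))
```

All three spellings resolve because each is listed; an expression that has not been entered, such as 鸡年大吉 in this toy table, simply returns nothing.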

Machine conversion between simplified and traditional Chinese characters also continues to advance. It is now possible to convert a simplified Chinese text into traditional Chinese at the touch of a button, using various widely available Microsoft Word add-ons. However, it is strongly recommended to have the text checked by a native reader of the target variety. The technology may detect and convert the characters correctly, but the sentence structure and vocabulary may not suit the target audience, and changes to make the text more culturally relevant will often be required.
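A minimal sketch of why a review is still needed, assuming a tiny hand-made mapping rather than any real add-on's data: the tables `CHAR_MAP`, `PHRASE_MAP`, and `REGIONAL_VOCAB` and the function `to_traditional` are hypothetical names introduced here. It shows two common pitfalls: one simplified character can correspond to more than one traditional character (发 is 發 in 发现 but 髮 in 头发), and a correctly converted word may still not be the one the target audience uses (软件 becomes 軟件, while Taiwan usage prefers 軟體).

```python
# Illustrative tables only (assumptions); a real converter needs far larger ones.
CHAR_MAP = {"头": "頭", "发": "發", "现": "現", "软": "軟"}
PHRASE_MAP = {"头发": "頭髮"}       # context decides: 发 becomes 髮 in "hair"
REGIONAL_VOCAB = {"軟件": "軟體"}   # same meaning, different word in Taiwan


def to_traditional(text, regional=False):
    """Convert simplified text, trying phrase matches before single characters."""
    longest = max(len(p) for p in PHRASE_MAP)
    out = []
    i = 0
    while i < len(text):
        # Try the longest known phrase starting at position i.
        for size in range(longest, 1, -1):
            chunk = text[i:i + size]
            if chunk in PHRASE_MAP:
                out.append(PHRASE_MAP[chunk])
                i += size
                break
        else:
            # No phrase matched: fall back to a per-character lookup.
            out.append(CHAR_MAP.get(text[i], text[i]))
            i += 1
    result = "".join(out)
    if regional:
        # Optional vocabulary pass for the target audience.
        for source_word, target_word in REGIONAL_VOCAB.items():
            result = result.replace(source_word, target_word)
    return result


print(to_traditional("头发"))
print(to_traditional("发现"))
print(to_traditional("软件"))
print(to_traditional("软件", regional=True))
```

Even with the phrase and vocabulary passes, the result is only as good as the tables behind it, which is why a check by a reader of the target variety remains advisable.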


*The Sinitic languages, often treated as synonymous with the group of Chinese varieties, are a family of Sino-Tibetan languages.

**Classical Chinese, also known as Literary Chinese, is the language of the classic literature from the end of the Spring and Autumn period through to the end of the Han Dynasty, a written form of Old Chinese. Literary Chinese is a traditional style of written Chinese that evolved from the classical language, making it different from any modern spoken form of Chinese.

***Standard Chinese, also known as Modern Standard Mandarin, Standard Mandarin, or simply Mandarin, is a standard variety of Chinese that is the sole official language of both China and Taiwan, and also one of the four official languages of Singapore. Its pronunciation is based on the Beijing dialect, its vocabulary on the Mandarin dialects, and its grammar on written vernacular Chinese.

Stéphane Choury