Category: Unicode

Khmer Unicode: 17D2 aka ជើង

7/2/2014

Every Khmer Unicode writer should know about the character code 17D2 (U+17D2, KHMER SIGN COENG).
It is the character we type before a consonant to turn that consonant into a subscript, called ជើង ("foot") in Khmer. Cambodians themselves regard these subscripts as distinct letter forms.
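As a quick illustration, here is a minimal Python sketch that lists the code points of the word ខ្មែរ ("Khmer") together with their official Unicode names. The word is written out with escape codes, and `describe` is just a helper name chosen for this example. The subscript ម is not a character of its own; it is stored as 17D2 followed by the ordinary consonant MO (U+1798).

```python
import unicodedata

def describe(text: str) -> list[str]:
    """Return each code point of `text` as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# ខ្មែរ spelled out: KHA, COENG, MO, vowel sign AE, RO.
word = "\u1781\u17d2\u1798\u17c2\u179a"

for line in describe(word):
    print(line)
```

The second line printed is `U+17D2 KHMER SIGN COENG`: the subscript ម is the pair at positions 2 and 3, not a single code point.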

The 17D2 code was introduced together with the rest of the Khmer script (used for Khmer, the official language of the Kingdom of Cambodia) in Unicode version 3.0, released in September 1999. It was a big release, bringing the total number of encoded characters to 49,259.
Khmer received 103 characters, including digits and various signs and symbols.

Technical drawbacks

There are 2 drawbacks that I can think of. The first is storage: a subscript letter in Khmer needs 2 code points (17D2 plus the consonant). In UTF-16 each Khmer code point takes 2 bytes, so one subscript takes 4 bytes; in UTF-8 each takes 3 bytes, so one subscript takes 6. If each subscript had its own single code point, every subscript would cost half as many bytes, saving space in documents and in data transferred across the internet.
The second drawback is processing: every NLP (Natural Language Processing) application, such as spell checking or word segmentation, has to examine 2 characters instead of 1 to decide whether a letter is a base consonant or a subscript. This adds processing time and memory overhead.
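The storage cost is easy to check. This minimal Python sketch encodes the subscript ្ម (COENG + MO) in UTF-16 and UTF-8 and counts the bytes; `byte_cost` is a helper name made up for this example. The bare consonant MO is used as a stand-in for a hypothetical single-code-point subscript, which Unicode does not actually provide.

```python
COENG = "\u17d2"  # KHMER SIGN COENG
MO = "\u1798"     # KHMER LETTER MO

# The subscript form of MO is two code points: COENG followed by MO.
subscript_mo = COENG + MO

def byte_cost(text: str) -> dict[str, int]:
    """Return how many bytes `text` occupies in two common encodings."""
    return {
        # UTF-16 without a BOM: 2 bytes per code point for Khmer (it lies in the BMP).
        "utf-16": len(text.encode("utf-16-le")),
        # UTF-8: code points U+0800..U+FFFF, which include Khmer, take 3 bytes each.
        "utf-8": len(text.encode("utf-8")),
    }

print(byte_cost(subscript_mo))  # {'utf-16': 4, 'utf-8': 6}
print(byte_cost(MO))            # {'utf-16': 2, 'utf-8': 3}, what a single-code-point subscript would cost
```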
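To show the extra look-ahead this requires, here is a minimal Python sketch that groups a 17D2 with the consonant after it, so each subscript becomes one unit. It assumes the consonants are exactly U+1780 to U+17A2 and only handles the COENG + consonant case; it is not a full Khmer syllable segmenter, and any other character (vowel signs, a stray COENG) is simply left as its own unit. The function names are hypothetical.

```python
COENG = "\u17d2"

def is_consonant(ch: str) -> bool:
    """True for Khmer consonants KA (U+1780) through QA (U+17A2)."""
    return "\u1780" <= ch <= "\u17a2"

def split_subscripts(text: str) -> list[str]:
    """Split text into units, treating COENG + consonant as one subscript unit."""
    units = []
    i = 0
    while i < len(text):
        # Look ahead one character: COENG only forms a subscript with a following consonant.
        if text[i] == COENG and i + 1 < len(text) and is_consonant(text[i + 1]):
            units.append(text[i:i + 2])
            i += 2
        else:
            units.append(text[i])
            i += 1
    return units

word = "\u1781\u17d2\u1798\u17c2\u179a"  # ខ្មែរ
print(split_subscripts(word))
```

For ខ្មែរ this yields four units (ខ, ្ម, ែ, រ) from five code points. Every position has to be checked together with its neighbor, which is the overhead described above.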

Bright side

Because Khmer has its own Unicode encoding, we can display and store Khmer text properly in digital form, without depending on legacy ASCII-based fonts (which map Khmer glyphs onto ASCII codes) any more.
