Surrogate Pairs


This page gives only a worked example of how to encode, in UTF-16, characters whose Unicode values are higher than 0xFFFF.  Read RFC 2781 for the authoritative description.

  1. Take the Unicode value of the character and subtract 0x10000 from it.
  2. Convert the difference from hexadecimal to binary form and pad the binary number with leading zeros to 20 digits if necessary.
  3. Fill in the x slots of the following template with the twenty digits, in order (the first ten digits go into the first 16-bit unit, the last ten into the second):
    110110xxxxxxxxxx 110111xxxxxxxxxx
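
The three steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name to_surrogate_pair is made up for this page, and the upper bound 0x10FFFF is the limit of the UTF-16 range as stated in RFC 2781, not something spelled out in the steps themselves. The template prefixes 110110 and 110111, followed by ten zero bits, are the constants 0xD800 and 0xDC00.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a Unicode value above 0xFFFF into two 16-bit UTF-16 units.

    Hypothetical helper written for illustration only.
    """
    # Assumption: 0x10FFFF is the highest value UTF-16 can represent.
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("value must be between 0x10000 and 0x10FFFF")

    # Step 1: subtract 0x10000; the result fits in twenty bits.
    offset = code_point - 0x10000

    # Steps 2 and 3: the first ten bits fill the 110110xxxxxxxxxx slot,
    # the last ten bits fill the 110111xxxxxxxxxx slot.
    first = 0xD800 | (offset >> 10)    # 0xD800 == 0b1101100000000000
    second = 0xDC00 | (offset & 0x3FF)  # 0xDC00 == 0b1101110000000000
    return first, second
```

Calling to_surrogate_pair(0x2A6D6) returns the two integers 0xD869 and 0xDED6.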

Example

  1. Say you want to encode

    U+2A6D6

    whose Unicode value is 0x2A6D6.  Subtracting 0x10000 from 0x2A6D6 gives 0x1A6D6.

  2. Convert 0x1A6D6 to a 20-digit binary number to get

    0001101001 1011010110

  3. Fill in the empty slots of the following template

    110110xxxxxxxxxx 110111xxxxxxxxxx

    with the twenty digits:

    1101100001101001 1101111011010110

    Converted back to hexadecimal, these two 16-bit units are the surrogate pair 0xD869 0xDED6.
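
The worked example can be cross-checked in Python. The sketch below prints the twenty binary digits from step 2, asks Python's built-in big-endian UTF-16 codec for the encoding of U+2A6D6, and then reverses the template to recover the original value. The helper name from_surrogate_pair is hypothetical, and the reverse direction is not covered on this page; it is included only as a sanity check.

```python
import struct


def from_surrogate_pair(first: int, second: int) -> int:
    """Recombine two UTF-16 surrogate units into the original Unicode value.

    Hypothetical helper written for illustration only.
    """
    if not (0xD800 <= first <= 0xDBFF and 0xDC00 <= second <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Strip the 110110 / 110111 prefixes, rejoin the twenty digits,
    # then add back the 0x10000 that was subtracted in step 1.
    return 0x10000 + (((first & 0x3FF) << 10) | (second & 0x3FF))


# Step 2 of the example: 0x1A6D6 as twenty binary digits, split ten and ten.
bits = format(0x2A6D6 - 0x10000, "020b")
print(bits[:10], bits[10:])   # 0001101001 1011010110

# Built-in codec, big-endian, no byte order mark: two 16-bit units.
first, second = struct.unpack(">2H", "\U0002A6D6".encode("utf-16-be"))
print(hex(first), hex(second))   # 0xd869 0xded6

print(hex(from_surrogate_pair(first, second)))   # 0x2a6d6
```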


Source of Information

Paul Hoffman & François Yergeau. 2000. UTF-16, an encoding of ISO 10646. RFC 2781.


© 2000-2002 Gyula Zsigri.  Last updated: December 22, 2002