Character Encoding

  • The character encoding used to store the XML Document.

  • Declared by the encoding pseudo-attribute of the XML Declaration, e.g. <?xml version="1.0" encoding="UTF-8"?>.

    Possible values include "US-ASCII", "UTF-8", "UTF-16", "ISO-10646-UCS-2", "ISO-10646-UCS-4" and "ISO-8859-1" ...

  • Problems:

    When a document is stored on the file system, its encoding is unknown until the XML Declaration has been read (XML tools can auto-detect most encodings from the first few bytes).

    Some tools do not save in the declared encoding, so the XML Declaration and the actual encoding of the bytes do not match.

    java.io.FileWriter, if no encoding is specified, uses the default Java encoding, which differs between HP, Sun and Windows platforms and (on Java versions before 18) is often not UTF-8.

    Notepad by default saves characters as ANSI (UTF-8 and UTF-16 options are available).

    TextPad can only display ANSI characters, and saves a '?' in place of any character it cannot display.
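
    One way to avoid the FileWriter mismatch described above is to pass the charset explicitly and derive the declaration from that same charset. The following is a minimal sketch, not a definitive implementation: the class name XmlEncodingWriter and its write method are hypothetical, and it assumes the charset's canonical Java name (for example "UTF-8") is also an acceptable XML encoding name.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: keeps the declared encoding and the byte encoding in sync.
final class XmlEncodingWriter {

    // Writes an XML Declaration followed by the body, both encoded with `charset`.
    // The declared name comes from the same Charset object, so the two cannot drift apart.
    static void write(Path target, String body, Charset charset) throws IOException {
        try (Writer out = new OutputStreamWriter(Files.newOutputStream(target), charset)) {
            out.write("<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n");
            out.write(body);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("encoding-demo", ".xml");
        // \u20AC is the euro sign, which is not representable in US-ASCII.
        write(file, "<price currency=\"\u20AC\">10</price>\n", StandardCharsets.UTF_8);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

    Because the charset is an explicit argument, the output does not depend on the platform default encoding of the machine that runs the code.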

  • Constraint:

    Each XML Document that is not accompanied by external encoding information and is not encoded in UTF-8 or UTF-16 must begin with an XML Declaration containing an encoding declaration, in which the first characters must be '<?xml'.
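
    The constraint above is what makes auto-detection possible: a reader can inspect the first bytes for a byte order mark or for '<?xml', then read the declared encoding name. The following is a simplified sketch of that idea, assuming only UTF-8, UTF-16 and ASCII-compatible encodings need to be recognised (UCS-4 and EBCDIC are ignored). The class name XmlEncodingSniffer and its sniff method are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: guesses the encoding of an XML document from its first bytes.
final class XmlEncodingSniffer {

    // Matches the encoding name inside an XML Declaration at the very start of the text.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static String sniff(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";          // UTF-8 byte order mark
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";             // big-endian byte order mark
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";             // little-endian byte order mark
        if (startsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE"; // "<?" without a byte order mark
        if (startsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE"; // "<?" without a byte order mark

        // Assumption: anything else is ASCII-compatible, so the declaration can be
        // read byte-for-byte as ISO-8859-1 without losing the encoding name.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(text);
        // With no declaration and no external information, UTF-8 is assumed.
        return m.find() ? m.group(1) : "UTF-8";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(sniff(latin1));                                   // prints ISO-8859-1
        System.out.println(sniff("<a/>".getBytes(StandardCharsets.UTF_8)));  // prints UTF-8
    }
}
```

    The sketch only reports what the document claims or what its first bytes imply; it cannot tell whether a tool saved the bytes in a different encoding from the one declared.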