(Inclusive) Canonicalization

  • The document is encoded in UTF-8

  • Line breaks are normalized to #xA on input, before parsing

  • Attribute values are normalized, as if by a validating processor

  • Character and parsed entity references are replaced with the characters or content they refer to

  • CDATA sections are replaced with their character content

  • The XML declaration and document type declaration (DTD) are removed

  • Empty elements are converted to start-end tag pairs

  • Whitespace outside of the document element and within start and end tags is normalized

  • All whitespace in character content is retained (excluding characters removed during line feed normalization)

  • Attribute value delimiters are set to quotation marks (double quotes)

  • Special characters in attribute values and character content are replaced by character references

  • Superfluous namespace declarations are removed from each element

  • Default attributes are added to each element

  • Lexicographic order is imposed on the namespace declarations and attributes of each element

  • Canonical XML Version 1.0
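
A few of the rules above can be observed with a short sketch. It assumes Python 3.8 or later and uses the standard library's xml.etree.ElementTree.canonicalize, which implements Canonical XML 2.0 rather than the 1.0 version listed here. The sample input is hypothetical and deliberately avoids namespaces, comments, and DTDs, so it only exercises rules the two versions share: dropping the XML declaration, expanding empty elements, replacing character references and CDATA sections, quoting attribute values with double quotes, and sorting attributes.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical input: XML declaration, single-quoted and unsorted attributes,
# an empty-element tag, a character reference, and a CDATA section.
raw = (
    '<?xml version="1.0"?>'
    "<doc b='2' a=\"1\"><empty/>&#65;<![CDATA[x < y]]></doc>"
)

# canonicalize() returns the canonical form as a string when no output
# stream is given. Note: this is C14N 2.0, not 1.0 (see the text above).
canonical = canonicalize(raw)
print(canonical)
# prints: <doc a="1" b="2"><empty></empty>Ax &lt; y</doc>
```

For inputs that use namespaces or rely on a DTD (default attributes, attribute value normalization), the output of this function may differ from what a Canonical XML 1.0 processor produces, so treat the sketch as an illustration of the shared rules only.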