OWASP Java HTML Sanitizer Change Log

  1. Fixed bug: tag balancer allowed </p> to close a table, so rewrote tag balancer to recognize scoping elements per HTML5.
  2. Fixed bug: missing bit in HTML schema led to text in <option> elements being elided even when the elements themselves were white-listed.
  3. Fixed bug: requireRelNoFollowOnLinks() was implicitly allowing the a element. Changed this to be consistent with the documentation: no elements are allowed that do not appear in a call to allowElements.
  4. Added methods to the policy builder to specify which elements are allowed to contain text, and changed the default to disallow text in CDATA elements, whose content is often not plain text. If custom element policies that change the element type fail, make sure the policy allows the output element type.
  5. Restricted where text nodes can validly appear in output per HTML5 rules, and changed the tag balancer to do better error recovery on misplaced phrasing content.
  6. Changed rendering to ensure that the output HTML is valid XML when the policy prohibits HTML raw text and RCDATA elements, as is almost always the case.
  7. Changed lexer to treat <?…> using the HTML5 bogus comment state grammar which agrees with XML's processing instruction production. Previously, the token ended at the first "?>" or end-of-file instead of the first ">".
  8. Fixed problem with URL protocol white-listing that caused legitimate URLs to be rejected.
  9. Cleaned up raw-text tag handling. XMP, LISTING, and PLAINTEXT are now handled by substitution in the renderer, and NOSCRIPT and friends are now treated the same way whether they are elided or present in the output. Added workaround for IE8 innerHTML weirdness.
  10. Prevented DoS of browsers via extremely deeply nested tags. In sanitized CSS, allowed the CSS property background-color and font sizes specified in px.
  11. Added convenient pre-packaged policies in Sanitizers. Fixed bug in how warnings are reported via the badHtml Handler.
  12. Better handling of supplementary codepoints to avoid UTF-16/UCS-2 confusion in browsers.
  13. Added new HTML5 URL attributes to the list used to safeguard URL attributes in HtmlPolicyBuilder.
  14. Changed HtmlSanitizer.sanitize to allow null as a valid value for the HTML snippet.
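
The lexer change in item 7 can be illustrated with a small sketch. This is not the sanitizer's lexer: the class and method names below are hypothetical, and treating end of input as the end of the token in the new behavior is an assumption based on the HTML5 bogus comment state. It only compares where a <?...> token would end under the two rules described in that entry.

```java
public final class BogusCommentDemo {
    private BogusCommentDemo() {}

    /** Old rule from item 7: the token ends after the first "?>" or at end of input. */
    static int oldTokenEnd(String html, int start) {
        int i = html.indexOf("?>", start);
        return i < 0 ? html.length() : i + 2;
    }

    /**
     * New rule from item 7: the token ends after the first ">".
     * Falling back to end of input when no ">" exists is an assumption.
     */
    static int newTokenEnd(String html, int start) {
        int i = html.indexOf('>', start);
        return i < 0 ? html.length() : i + 1;
    }

    public static void main(String[] args) {
        String html = "<?x><b>bold</b>";
        // Under the old rule there is no "?>", so the token swallows the rest of the input.
        System.out.println("old: " + html.substring(0, oldTokenEnd(html, 0)));
        // Under the new rule the token stops at the first ">", leaving <b>bold</b> as markup.
        System.out.println("new: " + html.substring(0, newTokenEnd(html, 0)));
    }
}
```

For input that is a well-formed processing instruction, such as <?xml version="1.0"?>, both rules end the token at the same position; they differ only when a ">" appears before any "?>".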
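
As background for item 12, the following sketch shows the general technique of walking a Java string by code point rather than by char, so that a supplementary character stored as a surrogate pair is emitted as a single numeric character reference. It is an illustration under stated assumptions, not the sanitizer's actual encoder: the class and method names are hypothetical, and the method deliberately does not escape markup characters such as < or &, so it is not an HTML escaper.

```java
public final class CodePointEscapeDemo {
    private CodePointEscapeDemo() {}

    /**
     * Replaces every non-ASCII code point with a decimal numeric character
     * reference. Iterating with codePointAt keeps surrogate pairs together,
     * so U+1F600 becomes one reference instead of two half-characters.
     */
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            // Advance by 1 char for BMP code points, 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\uD83D\uDE00" is the UTF-16 surrogate pair for U+1F600.
        System.out.println(escapeNonAscii("a\uD83D\uDE00"));
    }
}
```

A char-by-char loop over the same input would see two separate surrogate values, which is the kind of UTF-16/UCS-2 confusion the entry refers to.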