Parser that uses PHP 5's DOM extension (part of the core).

In PHP 5, the DOM XML extension was revamped into DOM and added to the core. It gives us a forgiving HTML parser, which we use to transform the HTML into a DOM, and then into tokens. It is blazingly fast (for large documents, it performs twenty times faster than HTMLPurifier_Lexer_DirectLex), and is the default choice for PHP 5.
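
The HTML-to-DOM-to-tokens pipeline described above can be sketched with nothing but the DOM extension. This is a minimal illustration, not the library's implementation: the functions `sketchTokenizeNode()` and `sketchTokenizeHtml()` and the array-based token shape are assumptions made for this example, and wrapping the fragment in a `div` is only one way to locate it again after parsing.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): flattens a DOM subtree
// into a list of simple token arrays.
function sketchTokenizeNode(DOMNode $node, array &$tokens): void
{
    if ($node instanceof DOMText) {
        $tokens[] = ['text', $node->data];
        return;
    }
    if ($node instanceof DOMComment) {
        $tokens[] = ['comment', $node->data];
        return;
    }
    if (!$node instanceof DOMElement) {
        return; // ignore other node types in this sketch
    }

    $attrs = [];
    foreach ($node->attributes as $attr) {
        $attrs[$attr->name] = $attr->value;
    }

    if (!$node->hasChildNodes()) {
        $tokens[] = ['empty', $node->tagName, $attrs];
        return;
    }

    $tokens[] = ['start', $node->tagName, $attrs];
    foreach ($node->childNodes as $child) {
        sketchTokenizeNode($child, $tokens);
    }
    $tokens[] = ['end', $node->tagName];
}

// Hypothetical entry point: parse an HTML fragment leniently, then walk it.
function sketchTokenizeHtml(string $html): array
{
    $wrapped = '<html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body><div>' . $html . '</div></body></html>';

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    $doc->loadHTML($wrapped);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper div is the first div in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $tokens = [];
    foreach ($wrapper->childNodes as $child) {
        sketchTokenizeNode($child, $tokens);
    }
    return $tokens;
}

$tokens = sketchTokenizeHtml('<p class="x">Hi <b>there</b></p>');
```

The recursion emits a start token, then the children, then an end token, so the flat list preserves the nesting that the DOM tree encodes.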

Public Properties

| Property | Type | Description | Defined By |
|---|---|---|---|
| $tracksLineNumbers | | Whether or not this lexer implements line-number/column-number tracking. | HTMLPurifier_Lexer |

Protected Properties

| Property | Type | Description | Defined By |
|---|---|---|---|
| $_special_entity2str | | Most common entity to raw value conversion table for special entities. | HTMLPurifier_Lexer |

Public Methods

| Method | Description | Defined By |
|---|---|---|
| __construct() | | HTMLPurifier_Lexer_DOMLex |
| callbackArmorCommentEntities() | Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them | HTMLPurifier_Lexer_DOMLex |
| callbackUndoCommentSubst() | Callback function for undoing escaping of stray angled brackets in comments | HTMLPurifier_Lexer_DOMLex |
| create() | Retrieves or sets the default Lexer as a Prototype Factory. | HTMLPurifier_Lexer |
| extractBody() | Takes a string of HTML (fragment or document) and returns the content | HTMLPurifier_Lexer |
| muteErrorHandler() | An error handler that mutes all errors | HTMLPurifier_Lexer_DOMLex |
| normalize() | Takes a piece of HTML and normalizes it by converting entities, fixing encoding, extracting bits, and other good stuff. | HTMLPurifier_Lexer |
| parseData() | Parses special entities into the proper characters. | HTMLPurifier_Lexer |
| tokenizeHTML() | | HTMLPurifier_Lexer_DOMLex |
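
An error handler that mutes all errors is useful around a lenient parse, because the DOM extension reports malformed markup as PHP warnings. The sketch below shows that general pattern under stated assumptions: `sketchMuteErrorHandler()` and `sketchLoadQuietly()` are hypothetical names, and whether the lexer installs its handler in exactly this way is not confirmed by this page.

```php
<?php
// Hypothetical stand-in for a muting handler: swallows every error it receives.
function sketchMuteErrorHandler(int $errno, string $errstr): bool
{
    return true; // returning true tells PHP the error has been handled
}

// Hypothetical helper: parse HTML while parser warnings are muted.
function sketchLoadQuietly(string $html): DOMDocument
{
    $doc = new DOMDocument();
    set_error_handler('sketchMuteErrorHandler');
    try {
        $doc->loadHTML($html); // warnings raised here go to the muting handler
    } finally {
        restore_error_handler(); // always put the previous handler back
    }
    return $doc;
}

$doc = sketchLoadQuietly('<p>Unclosed <b>bold<madeup>tag</p>');
```

Restoring the previous handler in a `finally` block keeps the muting scoped to the parse, so unrelated errors elsewhere in the application are still reported.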

Protected Methods

| Method | Description | Defined By |
|---|---|---|
| CDATACallback() | Callback function for escapeCDATA() that does the work. | HTMLPurifier_Lexer |
| createEndNode() | | HTMLPurifier_Lexer_DOMLex |
| createStartNode() | | HTMLPurifier_Lexer_DOMLex |
| escapeCDATA() | Translates CDATA sections into regular sections (through escaping). | HTMLPurifier_Lexer |
| escapeCommentedCDATA() | Special CDATA case that is especially convoluted for | |
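
The relationship between escapeCDATA() and CDATACallback() follows a common regular-expression pattern: one function finds each CDATA section, and a callback rewrites its contents. The following is a rough sketch of that idea, assuming an HTML-escaping callback; `sketchEscapeCdata()` and `sketchCdataCallback()` are hypothetical names, and the pattern used here is not taken from the library's source.

```php
<?php
// Hypothetical callback: escapes the captured CDATA body so it can live
// in a regular (non-CDATA) section.
function sketchCdataCallback(array $matches): string
{
    return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
}

// Hypothetical driver: replaces every <![CDATA[...]]> section with its
// escaped contents. The /s flag lets the body span multiple lines.
function sketchEscapeCdata(string $html): string
{
    return preg_replace_callback(
        '/<!\[CDATA\[(.*?)\]\]>/s',
        'sketchCdataCallback',
        $html
    );
}

$out = sketchEscapeCdata('a <![CDATA[<b> & co]]> z');
// $out is 'a &lt;b&gt; &amp; co z'
```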

