Class HTMLPurifier_Lexer_DOMLex
Inheritance | HTMLPurifier_Lexer_DOMLex » HTMLPurifier_Lexer |
---|---|
Subclasses | HTMLPurifier_Lexer_PH5P |
Parser that uses PHP 5's DOM extension (part of the core).
In PHP 5, the DOM XML extension was revamped into DOM and added to the core. It gives us a forgiving HTML parser, which we use to transform the HTML into a DOM and then into tokens. It is blazingly fast (for large documents, it performs twenty times faster than HTMLPurifier_Lexer_DirectLex) and is the default choice for PHP 5.
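A minimal sketch of the property DOMLex relies on: `DOMDocument::loadHTML()` accepts malformed markup and repairs it into a tree. It assumes the DOM extension is enabled. `parseForgiving()` is a hypothetical helper, not part of HTML Purifier.

```php
<?php
// Sketch: load malformed HTML with PHP's DOM extension (ext-dom) and let
// libxml's forgiving HTML parser repair it into a tree.
function parseForgiving(string $html): DOMDocument
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

// Usage: two unclosed <p> tags; the parser closes the first one implicitly.
$doc = parseForgiving('<p>one<p>two');
$paragraphs = $doc->getElementsByTagName('p');
echo $paragraphs->length, "\n";
```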
Public Properties
Property | Type | Description | Defined By |
---|---|---|---|
$tracksLineNumbers | bool | Whether or not this lexer implements line-number/column-number tracking. | HTMLPurifier_Lexer |
Protected Properties
Property | Type | Description | Defined By |
---|---|---|---|
$_special_entity2str | array | Most common entity to raw value conversion table for special entities. | HTMLPurifier_Lexer |
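To illustrate how a conversion table like this can be applied, the sketch below decodes special entities with `strtr()`. The table entries and the helper `decodeSpecialEntities()` are assumptions for illustration. The property's actual contents are defined in HTMLPurifier_Lexer.

```php
<?php
// Hypothetical stand-in for $_special_entity2str: entity => raw character.
$specialEntity2Str = [
    '&quot;' => '"',
    '&amp;'  => '&',
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&#39;'  => "'",
];

// strtr() with an array tries the longest keys first and never rescans
// text it has already replaced, so "&amp;lt;" becomes "&lt;", not "<".
function decodeSpecialEntities(string $s, array $table): string
{
    return strtr($s, $table);
}

echo decodeSpecialEntities('a &lt;b&gt; &amp;lt;', $specialEntity2Str), "\n";
// prints: a <b> &lt;
```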
Public Methods
Method | Description | Defined By |
---|---|---|
__construct() | | HTMLPurifier_Lexer_DOMLex |
callbackArmorCommentEntities() | Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them | HTMLPurifier_Lexer_DOMLex |
callbackUndoCommentSubst() | Callback function for undoing escaping of stray angled brackets in comments | HTMLPurifier_Lexer_DOMLex |
create() | Retrieves or sets the default Lexer as a Prototype Factory. | HTMLPurifier_Lexer |
extractBody() | Takes a string of HTML (fragment or document) and returns the content | HTMLPurifier_Lexer |
muteErrorHandler() | An error handler that mutes all errors | HTMLPurifier_Lexer_DOMLex |
normalize() | Takes a piece of HTML and normalizes it by converting entities, fixing encoding, extracting bits, and other good stuff. | HTMLPurifier_Lexer |
parseData() | Parses special entities into the proper characters. | HTMLPurifier_Lexer |
tokenizeHTML() | | HTMLPurifier_Lexer_DOMLex |
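extractBody() is inherited from HTMLPurifier_Lexer. The following is a rough sketch of the idea. It assumes a simple regular expression is enough to locate the body element. `extractBodySketch()` and its pattern are hypothetical and are not HTML Purifier's exact code.

```php
<?php
// Sketch: if the input is a full document, return what sits between
// <body ...> and </body>; otherwise return the fragment unchanged.
function extractBodySketch(string $html): string
{
    // s: let "." span newlines; i: match tag names case-insensitively.
    if (preg_match('#<body[^>]*>(.*)</body>#si', $html, $m)) {
        return $m[1];
    }
    return $html;
}
```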
Protected Methods
Method | Description | Defined By |
---|---|---|
CDATACallback() | Callback function for escapeCDATA() that does the work. | HTMLPurifier_Lexer |
createEndNode() | | HTMLPurifier_Lexer_DOMLex |
createStartNode() | | HTMLPurifier_Lexer_DOMLex |
escapeCDATA() | Translates CDATA sections into regular sections (through escaping). | HTMLPurifier_Lexer |
escapeCommentedCDATA() | Special CDATA case that is especially convoluted for | HTMLPurifier_Lexer |
removeIEConditional() | Special Internet Explorer conditional comments should be removed. | HTMLPurifier_Lexer |
tokenizeDOM() | Iterative function that tokenizes a node, putting it into an accumulator. | HTMLPurifier_Lexer_DOMLex |
transformAttrToAssoc() | Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array. | HTMLPurifier_Lexer_DOMLex |
wrapHTML() | Wraps an HTML fragment in the necessary HTML | HTMLPurifier_Lexer_DOMLex |
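escapeCDATA() and its callback, CDATACallback(), are inherited. The following sketch shows the technique. It assumes that each CDATA section should be replaced by its contents with HTML special characters escaped. The pattern, the flags, and the name `escapeCdataSketch()` are assumptions.

```php
<?php
// Sketch: turn every <![CDATA[...]]> section into ordinary, escaped text.
function escapeCdataSketch(string $html): string
{
    return preg_replace_callback(
        '/<!\[CDATA\[(.+?)\]\]>/s', // lazy match; "s" lets it span newlines
        function (array $m): string {
            // Escape <, >, & and double quotes in the section's contents.
            return htmlspecialchars($m[1], ENT_COMPAT, 'UTF-8');
        },
        $html
    );
}
```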
Method Details
public void __construct ( ) |
Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them
public string callbackArmorCommentEntities ( $matches ) | ||
$matches | array |
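The following sketch shows what the armor callback does when driven by `preg_replace_callback()`. The comment-matching pattern and the name `armorCommentEntities()` are assumptions modeled on the callback's `$matches` parameter. The pattern also accepts an unterminated comment at the end of the input.

```php
<?php
// Sketch: inside every comment, turn "&" into "&amp;" so that a later pass
// decoding entities in comments restores the original text exactly.
function armorCommentEntities(string $html): string
{
    return preg_replace_callback(
        '#<!--(.*?)(-->|\z)#s',
        function (array $m): string {
            // $m[1] = comment body, $m[2] = "-->" or "" at end of input
            return '<!--' . str_replace('&', '&amp;', $m[1]) . $m[2];
        },
        $html
    );
}
```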
Callback function for undoing escaping of stray angled brackets in comments
public string callbackUndoCommentSubst ( $matches ) | ||
$matches | array |
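The undo callback only makes sense as part of a pipeline. The sketch below assumes a three-step pipeline:

1. Armor ampersands inside comments.
2. Escape every `<` that is not followed by a letter, `!`, or `/`.
3. Undo `&amp;` and `&lt;` inside comments so that comment text comes back unchanged.

`fixStrayLt()`, `undoCommentSubst()`, and `COMMENT_RE` are hypothetical names.

```php
<?php
const COMMENT_RE = '#<!--(.*?)(-->|\z)#s';

// Undo step: strtr() makes a single pass, so an armored "&amp;lt;" becomes
// the literal "&lt;" the author wrote, not "<".
function undoCommentSubst(array $m): string
{
    return '<!--' . strtr($m[1], ['&amp;' => '&', '&lt;' => '<']) . $m[2];
}

function fixStrayLt(string $html): string
{
    // 1. Armor: "&" -> "&amp;" inside comments.
    $html = preg_replace_callback(COMMENT_RE, function (array $m): string {
        return '<!--' . str_replace('&', '&amp;', $m[1]) . $m[2];
    }, $html);
    // 2. Escape stray "<"; repeat because one pass skips the second "<" in "<<".
    do {
        $old = $html;
        $html = preg_replace('#<([^a-z!/])#i', '&lt;$1', $html);
    } while ($html !== $old);
    // 3. Undo the substitutions inside comments only.
    return preg_replace_callback(COMMENT_RE, 'undoCommentSubst', $html);
}
```

Outside comments the stray `<` stays escaped. Inside comments the text is restored, including any entity the author typed literally.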
protected void createEndNode ( $node, &$tokens ) | ||
$node | DOMNode | |
$tokens | HTMLPurifier_Token[] |
protected bool createStartNode ( $node, &$tokens, $collect ) | ||
$node | DOMNode | DOMNode to be tokenized. |
$tokens | HTMLPurifier_Token[] | Array-list of already tokenized tokens. |
$collect | bool | Says whether start and close tokens are collected; set to false at first recursion because it is the implicit DIV tag you're dealing with. |
return | bool | Whether the token needs an end token. |
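The following sketch shows the createStartNode() contract. It uses plain arrays as stand-in tokens instead of HTMLPurifier_Token objects, and `createStartNodeSketch()` is a hypothetical name. The function appends a token only when `$collect` is true. It returns true when the caller must later emit a matching end token.

```php
<?php
function createStartNodeSketch(DOMNode $node, array &$tokens, bool $collect): bool
{
    if ($node->nodeType === XML_TEXT_NODE) {
        $tokens[] = ['text', $node->nodeValue];
        return false;
    }
    if ($node->nodeType === XML_COMMENT_NODE) {
        $tokens[] = ['comment', $node->nodeValue];
        return false;
    }
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        return false; // other node types are ignored in this sketch
    }
    if (!$node->hasChildNodes()) {
        // No children: an empty token, so no end token is needed.
        if ($collect) {
            $tokens[] = ['empty', $node->nodeName];
        }
        return false;
    }
    // Has children: a start token now, an end token after the children.
    if ($collect) {
        $tokens[] = ['start', $node->nodeName];
    }
    return true;
}
```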
An error handler that mutes all errors
public void muteErrorHandler ( $errno, $errstr ) | ||
$errno | int | |
$errstr | string |
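The following sketch shows how a muting handler can be installed around a noisy call, such as `DOMDocument::loadHTML()` on bad markup, and then removed. `muteAllErrors()` and `withMutedErrors()` are hypothetical names. The handler returns true so that PHP's standard error handler is skipped.

```php
<?php
function muteAllErrors(int $errno, string $errstr): bool
{
    return true; // report the error as handled; nothing is printed
}

function withMutedErrors(callable $fn)
{
    set_error_handler('muteAllErrors');
    try {
        return $fn();
    } finally {
        // Always put the previous handler back, even if $fn throws.
        restore_error_handler();
    }
}
```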
Iterative function that tokenizes a node, putting it into an accumulator.
To iterate is human, to recurse divine - L. Peter Deutsch
protected HTMLPurifier_Token tokenizeDOM ( $node, &$tokens ) | ||
$node | DOMNode | DOMNode to be tokenized. |
$tokens | HTMLPurifier_Token[] | Array-list of already tokenized tokens. |
return | HTMLPurifier_Token | Tokens of the node, appended to the previously passed tokens. |
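An iterative walk avoids a call stack that grows with nesting depth. One way to do this is with an explicit stack, sketched below with strings as stand-in tokens. This is not necessarily the traversal or data structure that tokenizeDOM() uses. `tokenizeDomSketch()` is hypothetical, and only element and text nodes are handled.

```php
<?php
function tokenizeDomSketch(DOMNode $root): array
{
    $tokens = [];
    // Each entry is [node, isClosing]. Children are pushed in reverse so
    // they are popped, and therefore emitted, in document order.
    $stack = [[$root, false]];
    while ($stack) {
        list($node, $closing) = array_pop($stack);
        if ($closing) {
            $tokens[] = '</' . $node->nodeName . '>';
            continue;
        }
        if ($node->nodeType === XML_TEXT_NODE) {
            $tokens[] = $node->nodeValue;
            continue;
        }
        if ($node->nodeType !== XML_ELEMENT_NODE) {
            continue; // comments, CDATA, etc. are skipped in this sketch
        }
        if (!$node->hasChildNodes()) {
            $tokens[] = '<' . $node->nodeName . ' />';
            continue;
        }
        $tokens[] = '<' . $node->nodeName . '>';
        $stack[] = [$node, true]; // emit the end token after all children
        for ($i = $node->childNodes->length - 1; $i >= 0; $i--) {
            $stack[] = [$node->childNodes->item($i), false];
        }
    }
    return $tokens;
}
```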
public HTMLPurifier_Token[] tokenizeHTML ( $html, $config, $context ) | ||
$html | string | |
$config | HTMLPurifier_Config | |
$context | HTMLPurifier_Context |
Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.
protected array transformAttrToAssoc ( $node_map ) | ||
$node_map | DOMNamedNodeMap | DOMNamedNodeMap of DOMAttr objects. |
return | array | Associative array of attributes. |
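The following sketch shows the conversion. It relies on `DOMNamedNodeMap` being traversable, with each item a `DOMAttr`. `attrsToAssoc()` is a hypothetical name.

```php
<?php
function attrsToAssoc(DOMNamedNodeMap $nodeMap): array
{
    $assoc = [];
    // Each DOMAttr exposes its name and (entity-decoded) value.
    foreach ($nodeMap as $attr) {
        $assoc[$attr->name] = $attr->value;
    }
    return $assoc;
}
```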
Wraps an HTML fragment in the necessary HTML
protected string wrapHTML ( $html, $config, $context ) | ||
$html | string | |
$config | HTMLPurifier_Config | |
$context | HTMLPurifier_Context |
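The following sketch shows the wrapping idea. It assumes a minimal document with a UTF-8 `<meta>` tag, because without one `loadHTML()` treats the input as ISO-8859-1. It also adds a wrapper `<div>` that makes the fragment easy to find after parsing. Both function names and the exact markup are assumptions. A stray `</div>` inside the fragment would close the wrapper early.

```php
<?php
function wrapFragment(string $fragment): string
{
    return '<html><head>'
        . '<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />'
        . '</head><body><div>' . $fragment . '</div></body></html>';
}

// Usage: parse the wrapped fragment and return the wrapper <div>.
function loadFragment(string $fragment): DOMElement
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML(wrapFragment($fragment));
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $body = $doc->getElementsByTagName('body')->item(0);
    // The first <div> under <body>, in document order, is the wrapper.
    return $body->getElementsByTagName('div')->item(0);
}
```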