Class cebe\jssearch\tokenizer\StandardTokenizer
Inheritance | cebe\jssearch\tokenizer\StandardTokenizer |
---|---|
Implements | cebe\jssearch\TokenizerInterface |
StandardTokenizer
Public Properties
Property | Type | Description | Defined By |
---|---|---|---|
$delimiters | string | A list of characters that should be used as word delimiters. | cebe\jssearch\tokenizer\StandardTokenizer |
$stopWords | array | A list of stopwords to remove from the token list. | cebe\jssearch\tokenizer\StandardTokenizer |
Public Methods
Method | Description | Defined By |
---|---|---|
tokenize() | Tokenizes a string and returns an array of tokens, each with a weight value. | cebe\jssearch\tokenizer\StandardTokenizer |
tokenizeJs() | Returns a JavaScript equivalent of tokenize() that is used on the client side to tokenize the search query. | cebe\jssearch\tokenizer\StandardTokenizer |
Property Details
$delimiters
A list of characters that should be used as word delimiters.
$stopWords
A list of stopwords to remove from the token list.
public array $stopWords = ["a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]
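The following is a minimal sketch of how a delimiter string and a stopword list could work together. It is not the library's code. The function name splitAndFilter, the sample delimiter string, and the lowercasing step are assumptions made for this illustration only.

```php
<?php
// Hypothetical helper (not part of cebe\jssearch): splits text on whitespace
// and the given delimiter characters, then drops stopwords.
function splitAndFilter(string $text, string $delimiters, array $stopWords): array
{
    // Escape the delimiter characters so they are safe inside a regex character class.
    $class = preg_quote($delimiters, '/');

    // Assumption: words are lowercased before comparison with the stopword list.
    $words = preg_split('/[\s' . $class . ']+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // array_values() reindexes the array after filtering.
    return array_values(array_filter($words, function ($word) use ($stopWords) {
        return !in_array($word, $stopWords, true);
    }));
}

// Sample values chosen only for this illustration.
$result = splitAndFilter('The index, and the query.', '.,;:', ['the', 'and']);
print_r($result);
```

With these sample values, only the words index and query remain.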
Method Details
tokenize()
Tokenizes a string and returns an array of the following format:
[['t' => 'word', 'w' => 2], ['t' => 'other', 'w' => 1]]
where 't' is the token string and 'w' is its weight value.
Words listed in $stopWords are removed from the result.
public array tokenize ( $string )
Parameter | Type | Description |
---|---|---|
$string | string | The string to tokenize |
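To make the return format concrete, here is a small sketch that builds the same structure from a list of words. The function name toWeightedTokens is hypothetical, and the weighting rule is an assumption: the documentation above does not say how weights are computed, so this example simply counts occurrences.

```php
<?php
// Hypothetical sketch (not part of cebe\jssearch): turns a list of words into
// the documented [['t' => token, 'w' => weight], ...] structure.
// Assumption: the weight of a token is the number of times it occurs.
function toWeightedTokens(array $words): array
{
    $tokens = [];
    foreach (array_count_values($words) as $token => $count) {
        // Cast to string because PHP converts numeric array keys to integers.
        $tokens[] = ['t' => (string) $token, 'w' => $count];
    }
    return $tokens;
}

$tokens = toWeightedTokens(['word', 'other', 'word']);
print_r($tokens);
```

For this input the result matches the example format shown above: 'word' with weight 2, followed by 'other' with weight 1.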
tokenizeJs()
Returns a JavaScript equivalent of tokenize() that is used on the client side to tokenize the search query.
This ensures that the same tokenization is applied when building the index and when searching.
public string tokenizeJs ( ) |
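One way such a method could keep both sides in sync is to generate the JavaScript source from the same PHP configuration. The sketch below is an assumption about the general approach, not the library's implementation. The function name buildTokenizerJs, the generated JavaScript body, and the fixed weight of 1 are all hypothetical.

```php
<?php
// Hypothetical sketch (not part of cebe\jssearch): builds the source of a
// JavaScript function expression from a delimiter string and a stopword list.
function buildTokenizerJs(string $delimiters, array $stopWords): string
{
    // json_encode() produces valid JavaScript literals for strings and arrays.
    $delimitersJs = json_encode($delimiters);
    $stopWordsJs = json_encode(array_values($stopWords));

    // Assumption: every token gets the weight 1 on the client side.
    return <<<JS
function (query) {
    var delimiters = $delimitersJs;
    var stopWords = $stopWordsJs;
    var tokens = [];
    var word = '';
    var chars = query.toLowerCase() + ' ';
    for (var i = 0; i < chars.length; i++) {
        var c = chars.charAt(i);
        if (c.trim() === '' || delimiters.indexOf(c) !== -1) {
            if (word !== '' && stopWords.indexOf(word) === -1) {
                tokens.push({t: word, w: 1});
            }
            word = '';
        } else {
            word += c;
        }
    }
    return tokens;
}
JS;
}

// Sample values chosen only for this illustration.
$js = buildTokenizerJs('.,;:', ['the', 'and']);
echo $js, "\n";
```

The returned string is a function expression, so the page that embeds it would need to assign it to a variable or pass it to the search code before calling it.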