Class cebe\jssearch\tokenizer\StandardTokenizer

Inheritance | cebe\jssearch\tokenizer\StandardTokenizer
Implements | cebe\jssearch\TokenizerInterface

StandardTokenizer

Public Properties

Property | Type | Description | Defined By
$delimiters | string | A list of characters that should be used as word delimiters. | cebe\jssearch\tokenizer\StandardTokenizer
$stopWords | array | A list of stopwords to remove from the token list. | cebe\jssearch\tokenizer\StandardTokenizer

Public Methods

Method | Description | Defined By
tokenize() | Tokenizes a string and returns an array of tokens, each with a weight. | cebe\jssearch\tokenizer\StandardTokenizer
tokenizeJs() | Returns a JavaScript equivalent of tokenize() that is used on the client side to tokenize the search query. | cebe\jssearch\tokenizer\StandardTokenizer

Property Details

$delimiters public property

A list of characters that should be used as word delimiters.

public string $delimiters = '.,;:\\/[](){}'

$stopWords public property

A list of stopwords to remove from the token list.

public array $stopWords = ["a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"]

Method Details

tokenize() public method

Tokenizes a string and returns an array of the following format:

[['t' => 'word', 'w' => 2], ['t' => 'other', 'w' => 1]]

where 't' holds the token string and 'w' holds its weight value.

Words listed in $stopWords are removed from the token list.

public array tokenize ( $string )
$string | string | The string to tokenize
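
The documented output format can be illustrated with a small standalone sketch. It is not the library's implementation: sketchTokenize() is a hypothetical helper, and both the lower-casing of tokens and the reading of 'w' as an occurrence count are assumptions made for this example, since the description above does not say how the weight is derived.

```php
<?php
// Illustrative sketch only; sketchTokenize() is a hypothetical helper,
// not the library's implementation of tokenize().
function sketchTokenize(string $string, string $delimiters, array $stopWords): array
{
    // Split on runs of whitespace and of the configured delimiter characters.
    $pattern = '/[\s' . preg_quote($delimiters, '/') . ']+/';
    $words = preg_split($pattern, $string, -1, PREG_SPLIT_NO_EMPTY);

    $weights = [];
    foreach ($words as $word) {
        $word = strtolower($word); // assumption: tokens are lower-cased
        if (in_array($word, $stopWords, true)) {
            continue; // stop words never reach the token list
        }
        // Assumption: the weight is the number of occurrences in the input.
        $weights[$word] = ($weights[$word] ?? 0) + 1;
    }

    // Convert the word => weight map into the documented 't'/'w' format.
    $tokens = [];
    foreach ($weights as $token => $weight) {
        $tokens[] = ['t' => (string) $token, 'w' => $weight];
    }
    return $tokens;
}

$tokens = sketchTokenize('The word, and other/word.', '.,;:\\/[](){}', ['the', 'and']);
// $tokens is [['t' => 'word', 'w' => 2], ['t' => 'other', 'w' => 1]]
```

Under these assumptions the sample input reproduces the format shown above; the real class may weight tokens differently.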

tokenizeJs() public method

Returns a JavaScript equivalent of tokenize() that is used on the client side to tokenize the search query.

This ensures that the same tokenization is applied when building the index and when searching.

public string tokenizeJs ( )
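
One way such a method could keep both sides in sync is to generate the JavaScript source from the same delimiter and stop word settings. The sketch below only illustrates that idea: sketchTokenizeJs() is a hypothetical helper, the generated function body is an assumption (including lower-casing and occurrence-count weights), and none of it is taken from the library's actual output.

```php
<?php
// Illustrative sketch only; sketchTokenizeJs() is a hypothetical helper,
// not the library's tokenizeJs().
function sketchTokenizeJs(string $delimiters, array $stopWords): string
{
    // Nowdoc template: PHP performs no interpolation or escape processing here.
    $template = <<<'JS'
function (string) {
    var delimiters = __DELIMITERS__, stopWords = __STOPWORDS__;
    var counts = {}, order = [], word = '';
    for (var i = 0; i <= string.length; i++) {
        var ch = string.charAt(i);
        if (i < string.length && !/\s/.test(ch) && delimiters.indexOf(ch) === -1) {
            word += ch.toLowerCase();
            continue;
        }
        if (word !== '' && stopWords.indexOf(word) === -1) {
            if (!counts.hasOwnProperty(word)) { counts[word] = 0; order.push(word); }
            counts[word]++;
        }
        word = '';
    }
    return order.map(function (t) { return {t: t, w: counts[t]}; });
}
JS;

    // json_encode() turns the PHP values into valid JavaScript literals.
    return strtr($template, [
        '__DELIMITERS__' => json_encode($delimiters),
        '__STOPWORDS__' => json_encode(array_values($stopWords)),
    ]);
}

$js = sketchTokenizeJs('.,;:\\/[](){}', ['the', 'and']);
```

Because the delimiters and stop words are injected from the PHP side, a change to either setting would carry over to the generated client-side function without a second copy to maintain.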