The Zend_Search_Lucene_Analysis_Analyzer class is used by the
indexer to tokenize document text fields.
The Zend_Search_Lucene_Analysis_Analyzer::getDefault() and
Zend_Search_Lucene_Analysis_Analyzer::setDefault() methods are used
to get and set the default analyzer.
You can assign your own text analyzer or choose one from the set of predefined analyzers:
Zend_Search_Lucene_Analysis_Analyzer_Common_Text and
Zend_Search_Lucene_Analysis_Analyzer_Common_Text_CaseInsensitive
(default). Both of them interpret tokens as sequences of letters.
Zend_Search_Lucene_Analysis_Analyzer_Common_Text_CaseInsensitive
converts all tokens to lower case.
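To illustrate the difference between the two predefined analyzers, here is a standalone PHP approximation. It is not the Zend Framework implementation. It assumes that a token is a maximal run of ASCII letters (as in the default "C" locale), and the function name sketchTokenize() is hypothetical.

```php
<?php
// Standalone approximation, NOT the Zend Framework implementation:
// a token is a maximal run of ASCII letters (as in the default "C"
// locale); digits, punctuation and spaces only separate tokens.
// sketchTokenize() is a hypothetical name.
function sketchTokenize(string $text, bool $caseInsensitive = true): array
{
    preg_match_all('/[A-Za-z]+/', $text, $matches);
    $tokens = $matches[0];

    // The case-insensitive variant additionally lowercases every token.
    if ($caseInsensitive) {
        $tokens = array_map('strtolower', $tokens);
    }
    return $tokens;
}

echo implode(' ', sketchTokenize('Zend Framework 1.12, PHP5')), "\n";        // zend framework php
echo implode(' ', sketchTokenize('Zend Framework 1.12, PHP5', false)), "\n"; // Zend Framework PHP
```

Because tokens contain only letters, "1.12" produces no term at all and "PHP5" is reduced to "PHP" (or "php").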
To switch between analyzers:
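<?php
// use the case-preserving text analyzer from now on
Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Text());

...

// the document's text fields are tokenized with the new default analyzer
$index->addDocument($doc);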
The Zend_Search_Lucene_Analysis_Analyzer_Common class is designed
to be the ancestor of all user-defined analyzers. A user only needs to define the
reset() and nextToken() methods: reset() rewinds the token stream, and nextToken()
takes its string from the $_input member and returns tokens one by one (a
NULL value indicates the end of the stream).
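The following standalone sketch models that contract without Zend Framework. The Sketch* classes are hypothetical. Their setInput() and tokenize() helpers only approximate what the real base class is understood to do: store the string in $_input, call reset(), then pull nextToken() until it returns null. Tokens are plain strings here, not Zend_Search_Lucene_Analysis_Token objects.

```php
<?php
// Hypothetical, standalone model of the analyzer contract; the Sketch*
// names are illustrative and are not Zend Framework classes.
abstract class SketchAnalyzer
{
    /** Text currently being tokenized. */
    protected $_input = null;

    // New input always resets the subclass's scanning state.
    public function setInput(string $data): void
    {
        $this->_input = $data;
        $this->reset();
    }

    /** Collects tokens by pulling nextToken() until it returns null. */
    public function tokenize(string $data): array
    {
        $this->setInput($data);
        $tokens = [];
        while (($token = $this->nextToken()) !== null) {
            $tokens[] = $token;
        }
        return $tokens;
    }

    abstract public function reset();

    /** @return string|null the next token, or null at the end of the stream */
    abstract public function nextToken();
}

// Minimal subclass: a token is any run of non-space characters.
class SketchWhitespaceAnalyzer extends SketchAnalyzer
{
    private $_position;

    public function reset()
    {
        $this->_position = 0;
    }

    public function nextToken()
    {
        if ($this->_input !== null
            && preg_match('/\S+/', $this->_input, $m, PREG_OFFSET_CAPTURE, $this->_position)) {
            // $m[0] holds [matched text, byte offset of the match].
            $this->_position = $m[0][1] + strlen($m[0][0]);
            return $m[0][0];
        }
        return null; // end of stream
    }
}

echo implode('|', (new SketchWhitespaceAnalyzer())->tokenize('reset then nextToken')), "\n"; // reset|then|nextToken
```

Because setInput() calls reset(), the same analyzer object can be reused for several strings without leftover state.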
The nextToken() method should call the
normalize() method on each token. This will allow you to use
token filters with your analyzer.
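To show why every token should pass through normalize(), here is a standalone model of a token filter chain. The Sketch* names are hypothetical. In the library itself, filters are attached to a Zend_Search_Lucene_Analysis_Analyzer_Common instance (for example with its addFilter() method). This sketch only mimics the idea that each filter may rewrite a token or drop it by returning null.

```php
<?php
// Standalone model (hypothetical names, not Zend Framework classes) of
// the filter chain behind normalize(): each filter receives the token
// text and returns a (possibly changed) string, or null to drop it.
interface SketchTokenFilter
{
    public function normalize(string $text): ?string;
}

class SketchLowerCaseFilter implements SketchTokenFilter
{
    public function normalize(string $text): ?string
    {
        return strtolower($text);
    }
}

class SketchStopWordsFilter implements SketchTokenFilter
{
    private $stopWords;

    public function __construct(array $stopWords)
    {
        // Use the words as array keys for constant-time lookups.
        $this->stopWords = array_flip($stopWords);
    }

    public function normalize(string $text): ?string
    {
        return isset($this->stopWords[$text]) ? null : $text;
    }
}

// Runs the filters in the order given and stops at the first null.
function sketchNormalize(string $text, array $filters): ?string
{
    foreach ($filters as $filter) {
        $text = $filter->normalize($text);
        if ($text === null) {
            return null;
        }
    }
    return $text;
}

$filters = [new SketchLowerCaseFilter(), new SketchStopWordsFilter(['the', 'a'])];
var_dump(sketchNormalize('Index', $filters)); // string(5) "index"
var_dump(sketchNormalize('The', $filters));   // NULL
```

Filter order matters in this model. "The" is dropped only because it is lowercased before the stop-word check.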
Here is an example of a custom analyzer that accepts words containing digits as terms:
Example 686. Custom text Analyzer
<?php
/**
 * A custom text analyzer that treats words containing digits
 * as one term
 */
class My_Analyzer extends Zend_Search_Lucene_Analysis_Analyzer_Common
{
    private $_position;

    /**
     * Reset token stream
     */
    public function reset()
    {
        $this->_position = 0;
    }

    /**
     * Tokenization stream API
     * Get next token
     * Returns null at the end of stream
     *
     * @return Zend_Search_Lucene_Analysis_Token|null
     */
    public function nextToken()
    {
        if ($this->_input === null) {
            return null;
        }

        while ($this->_position < strlen($this->_input)) {
            // skip separators (any character that is not a letter or digit)
            while ($this->_position < strlen($this->_input) &&
                   !ctype_alnum($this->_input[$this->_position])) {
                $this->_position++;
            }

            $termStartPosition = $this->_position;

            // read token (a run of letters and digits)
            while ($this->_position < strlen($this->_input) &&
                   ctype_alnum($this->_input[$this->_position])) {
                $this->_position++;
            }

            // Empty token, end of stream.
            if ($this->_position == $termStartPosition) {
                return null;
            }

            $token = new Zend_Search_Lucene_Analysis_Token(
                         substr($this->_input,
                                $termStartPosition,
                                $this->_position - $termStartPosition),
                         $termStartPosition,
                         $this->_position);

            // apply the token filters attached to this analyzer
            $token = $this->normalize($token);
            if ($token !== null) {
                return $token;
            }
            // Continue if the token was skipped by a filter
        }

        return null;
    }
}

Zend_Search_Lucene_Analysis_Analyzer::setDefault(new My_Analyzer());
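To check which terms My_Analyzer would produce without installing Zend Framework, its scanning loop can be lifted into a plain function. This is a sketch: scanAlnumTokens() is a hypothetical name, token filters (normalize()) are omitted, and offsets are byte positions.

```php
<?php
// Standalone copy of the scanning loop in My_Analyzer::nextToken(),
// without Zend Framework: returns every maximal run of alphanumeric
// characters as [text, startOffset, endOffset]. Token filters are
// deliberately left out.
function scanAlnumTokens(string $input): array
{
    $tokens = [];
    $position = 0;
    $length = strlen($input);

    while ($position < $length) {
        // Skip separators (anything that is not a letter or digit).
        while ($position < $length && !ctype_alnum($input[$position])) {
            $position++;
        }
        $start = $position;

        // Read the token itself.
        while ($position < $length && ctype_alnum($input[$position])) {
            $position++;
        }
        if ($position === $start) {
            break; // only separators were left
        }
        $tokens[] = [substr($input, $start, $position - $start), $start, $position];
    }
    return $tokens;
}

foreach (scanAlnumTokens('Zend Framework 1.12, PHP5') as [$text, $start, $end]) {
    echo "$text [$start, $end)\n";
}
// Zend [0, 4)
// Framework [5, 14)
// 1 [15, 16)
// 12 [17, 19)
// PHP5 [21, 25)
```

"PHP5" stays a single term, while the letters-only predefined analyzers would reduce it to "PHP". On the other hand, "1.12" becomes two terms, "1" and "12", because the period is not alphanumeric.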