Token Filtering

The Zend_Search_Lucene_Analysis_Analyzer_Common analyzer also offers a token filtering mechanism.

The Zend_Search_Lucene_Analysis_TokenFilter class provides an abstract interface for such filters. Your own filters should extend this class either directly or indirectly.

Any custom filter must implement the normalize() method, which may transform the input token or signal that the current token should be skipped.
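
The normalize() contract can be illustrated with a small stand-alone sketch. SketchToken, SketchTokenFilter, PossessiveFilter and sketchApplyFilter() are hypothetical names invented here so the example runs without Zend Framework; a real filter would instead extend Zend_Search_Lucene_Analysis_TokenFilter. The sketch also assumes that returning null is how a filter signals that a token should be skipped.

```php
<?php
// Hypothetical stand-ins for the library's token and filter base classes,
// defined only so that this sketch runs without Zend Framework installed.
class SketchToken
{
    private $text;

    public function __construct($text)
    {
        $this->text = $text;
    }

    public function getTermText()
    {
        return $this->text;
    }
}

abstract class SketchTokenFilter
{
    // Return a (possibly new) token to keep it, or null to skip it.
    // Assumption: null is the "skip this token" signal.
    abstract public function normalize(SketchToken $srcToken);
}

// Example filter: skips purely numeric tokens and strips a trailing "'s".
class PossessiveFilter extends SketchTokenFilter
{
    public function normalize(SketchToken $srcToken)
    {
        $text = $srcToken->getTermText();

        if (preg_match('/^\d+$/', $text) === 1) {
            return null; // signal that this token should be skipped
        }

        if (substr($text, -2) === "'s") {
            return new SketchToken(substr($text, 0, -2)); // transformed token
        }

        return $srcToken; // token passes through unchanged
    }
}

// Runs every word through the filter and collects the tokens that survive.
function sketchApplyFilter(SketchTokenFilter $filter, array $words)
{
    $kept = array();
    foreach ($words as $word) {
        $token = $filter->normalize(new SketchToken($word));
        if ($token !== null) {
            $kept[] = $token->getTermText();
        }
    }
    return $kept;
}
```

Calling sketchApplyFilter(new PossessiveFilter(), array("zend's", "2010", "search")) keeps "zend" and "search" and drops "2010", showing both outcomes a filter can produce for a token.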

There are three filters already defined in the analysis subpackage:

  • Zend_Search_Lucene_Analysis_TokenFilter_LowerCase

  • Zend_Search_Lucene_Analysis_TokenFilter_ShortWords

  • Zend_Search_Lucene_Analysis_TokenFilter_StopWords

The Zend_Search_Lucene_Analysis_Analyzer_Common_Text_CaseInsensitive analyzer already applies the LowerCase filter by default.

The ShortWords and StopWords filters may be used with pre-defined or custom analyzers like this:

<?php
// Words listed here are removed from the token stream.
$stopWords = array('a', 'an', 'at', 'the', 'and', 'or', 'is', 'am');
$stopWordsFilter =
    new Zend_Search_Lucene_Analysis_TokenFilter_StopWords($stopWords);

// Attach the filter to an analyzer and make that analyzer the default.
$analyzer =
    new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive();
$analyzer->addFilter($stopWordsFilter);

Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);

<?php
// With no argument, the filter uses its default word length limit.
$shortWordsFilter =
    new Zend_Search_Lucene_Analysis_TokenFilter_ShortWords();

$analyzer =
    new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive();
$analyzer->addFilter($shortWordsFilter);

Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);

The Zend_Search_Lucene_Analysis_TokenFilter_StopWords constructor takes an array of stop words as input. Stop words may also be loaded from a file:

<?php
// $my_stopwords_file is expected to hold the path to the stop-words file.
$stopWordsFilter =
    new Zend_Search_Lucene_Analysis_TokenFilter_StopWords();
$stopWordsFilter->loadFromFile($my_stopwords_file);

$analyzer =
    new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive();
$analyzer->addFilter($stopWordsFilter);

Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);

This file should be a plain text file with one word per line. The '#' character marks a line as a comment.
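
To make the file format concrete, here is a minimal sketch of how such content could be read. sketchParseStopWords() is a hypothetical helper written for this illustration, not the library's loadFromFile() implementation. It assumes that blank lines are ignored and that a comment is a line whose first non-blank character is '#'.

```php
<?php
// Hypothetical helper, not part of Zend Framework: parses stop-word file
// contents in the one-word-per-line format described above.
function sketchParseStopWords($contents)
{
    $words = array();

    // Split on any common line ending so the sketch is platform-neutral.
    foreach (preg_split('/\r\n|\r|\n/', $contents) as $line) {
        $line = trim($line);

        // Assumption: blank lines are ignored and a leading '#' marks a comment.
        if ($line === '' || $line[0] === '#') {
            continue;
        }

        $words[] = $line;
    }

    return $words;
}

// Example file contents: one comment, three stop words, one blank line.
$exampleContents = "# English stop words\na\nan\n\nthe\n";
$exampleWords = sketchParseStopWords($exampleContents);
```

The resulting array holds only the words themselves, so it could equally be passed to the StopWords constructor instead of loading the file through the filter.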

The Zend_Search_Lucene_Analysis_TokenFilter_ShortWords constructor takes one optional argument: the word length limit, which defaults to 2.
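
The effect of the length limit can be sketched in isolation. sketchDropShortWords() is a hypothetical helper, not library code, and it rests on an assumption the text above does not spell out: that words strictly shorter than the limit are dropped, while words whose length equals the limit are kept.

```php
<?php
// Hypothetical helper, not part of Zend Framework: mimics a short-words
// filter over a plain array of words.
function sketchDropShortWords(array $words, $limit = 2)
{
    $kept = array();

    foreach ($words as $word) {
        // Assumption: only words strictly shorter than the limit are dropped.
        if (strlen($word) >= $limit) {
            $kept[] = $word;
        }
    }

    return $kept;
}

// Default limit of 2 versus a stricter limit of 3.
$withDefaultLimit = sketchDropShortWords(array('a', 'an', 'the'));
$withLimitThree   = sketchDropShortWords(array('a', 'an', 'the'), 3);
```

Under that assumption the default limit removes only single-character words, while a limit of 3 also removes two-character words. With the real filter, a different limit would be passed as the constructor argument.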
