Encoding

Zend_Search_Lucene works with UTF-8 strings internally, so all strings it returns are UTF-8 encoded.

You shouldn't be concerned with encoding if you work with pure ASCII data, but you should be careful if this is not the case.

Specifying the wrong encoding may cause error notices during encoding conversion, or loss of data.
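The following is a minimal sketch of how a wrong encoding silently corrupts data. It uses PHP's built-in iconv() function directly rather than Zend_Search_Lucene, and the sample string is invented for illustration:

```php
<?php
// "café" with "é" stored as the single ISO-8859-1 byte 0xE9.
$latin1 = "caf\xE9";

// Correct source encoding: 0xE9 becomes the UTF-8 sequence C3 A9 ("é").
$ok = iconv('ISO-8859-1', 'UTF-8', $latin1);

// Wrong source encoding (ISO-8859-5, Cyrillic): the same byte is read
// as a different character, and no error is reported.
$wrong = iconv('ISO-8859-5', 'UTF-8', $latin1);

echo bin2hex($ok), "\n";    // 636166c3a9
echo bin2hex($wrong), "\n"; // 636166d189
```

Both conversions succeed, but only the first one preserves the original text. This kind of mismatch does not raise a notice, so it is easy to miss.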

Zend_Search_Lucene offers a wide range of encoding possibilities for indexed documents and parsed queries.

Encoding may be explicitly specified as an optional parameter of field creation methods:

<?php
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('title',
                                              $title,
                                              'iso-8859-1'));
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents',
                                                  $contents,
                                                  'utf-8'));

This is the best way to avoid ambiguity in the encoding used.

If the optional encoding parameter is omitted, then the current locale is used. The current locale may contain character encoding data in addition to the language specification:

<?php
setlocale(LC_ALL, 'fr_FR');
...

setlocale(LC_ALL, 'de_DE.iso-8859-1');
...

setlocale(LC_ALL, 'ru_RU.UTF-8');
...
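To make the locale examples above concrete, here is a small sketch that extracts the character encoding part from a locale name. The helper charsetFromLocale() is hypothetical and is not part of Zend_Search_Lucene or PHP; it only illustrates which locale names carry encoding data and which do not:

```php
<?php
/**
 * Hypothetical helper: returns the charset part of a locale name,
 * or an empty string if the locale name does not contain one.
 */
function charsetFromLocale($locale)
{
    // Locale names have the shape language[_territory][.codeset][@modifier].
    $dot = strpos($locale, '.');
    if ($dot === false) {
        return '';
    }
    $charset = substr($locale, $dot + 1);

    // Drop an optional "@modifier" suffix such as "@euro".
    $at = strpos($charset, '@');
    return ($at === false) ? $charset : substr($charset, 0, $at);
}

var_dump(charsetFromLocale('fr_FR'));            // string(0) ""
var_dump(charsetFromLocale('de_DE.iso-8859-1')); // string(10) "iso-8859-1"
var_dump(charsetFromLocale('ru_RU.UTF-8'));      // string(5) "UTF-8"
```

A locale such as 'fr_FR' carries no encoding data, which is one more reason to pass the encoding explicitly when the input is not pure ASCII.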

The same approach is used to set query string encoding.

If encoding is not specified, then the current locale is used to determine the encoding.

Encoding may be passed as an optional parameter, if the query is parsed explicitly before search:

<?php
$query =
    Zend_Search_Lucene_Search_QueryParser::parse($queryStr, 'iso-8859-5');
$hits = $index->find($query);
...

The default encoding may also be specified with the setDefaultEncoding() method:

<?php
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('iso-8859-1');
$hits = $index->find($queryStr);
...

An empty string means that the current locale is used.
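The fallback order described in this section can be sketched as follows. The function resolveQueryEncoding() is hypothetical and only models the rules stated here (explicit parameter first, then the default encoding, then the current locale); it is not the library's implementation, and the locale charset is assumed to be already known:

```php
<?php
/**
 * Hypothetical illustration of the query encoding fallback order.
 *
 * $explicit      encoding passed when parsing the query, or null if omitted
 * $default       value set as the default encoding ('' means "current locale")
 * $localeCharset charset of the current locale (assumed to be known)
 */
function resolveQueryEncoding($explicit, $default, $localeCharset)
{
    // An explicitly passed encoding wins over the default.
    $encoding = ($explicit !== null) ? $explicit : $default;

    // An empty string falls back to the charset of the current locale.
    return ($encoding === '') ? $localeCharset : $encoding;
}

echo resolveQueryEncoding('iso-8859-5', 'iso-8859-1', 'UTF-8'), "\n"; // iso-8859-5
echo resolveQueryEncoding(null, 'iso-8859-1', 'UTF-8'), "\n";         // iso-8859-1
echo resolveQueryEncoding(null, '', 'UTF-8'), "\n";                   // UTF-8
```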

If the correct encoding is specified, the input can be processed correctly by the analyzer. The actual behavior depends on which analyzer is used. See the Character Set documentation section for details.
