PhpRiot

Document and Field Objects

Zend_Search_Lucene treats documents as the atomic units of indexing. A document is divided into named fields, and fields hold content that can be searched.

A document is represented by the Zend_Search_Lucene_Document class, and objects of this class contain instances of Zend_Search_Lucene_Field that represent the fields of the document.

It is important to note that any information can be added to the index. Application-specific information or metadata can be stored in the document fields, and later retrieved with the document during search.

It is the responsibility of your application to control the indexer. This means that data can be indexed from any source that is accessible by your application. For example, this could be the filesystem, a database, an HTML form, etc.

The Zend_Search_Lucene_Field class provides several static methods for creating fields with different characteristics:

<?php
require_once 'Zend/Search/Lucene.php';

$doc = new Zend_Search_Lucene_Document();

// Field is not tokenized, but is indexed and stored within the index.
// Stored fields can be retrieved from the index.
$doc->addField(Zend_Search_Lucene_Field::Keyword('doctype',
                                                 'autogenerated'));

// Field is neither tokenized nor indexed, but is stored in the index.
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('created',
                                                   time()));

// Binary string-valued field that is neither tokenized nor indexed,
// but is stored in the index. $iconData holds raw binary data, here
// read from a file (the path is hypothetical).
$iconData = file_get_contents('icon.png');
$doc->addField(Zend_Search_Lucene_Field::Binary('icon',
                                                $iconData));

// Field is tokenized and indexed, and is stored in the index.
$doc->addField(Zend_Search_Lucene_Field::Text('annotation',
                                              'Document annotation text'));

// Field is tokenized and indexed, but is not stored in the index.
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents',
                                                  'My document content'));

Each of these methods (except Zend_Search_Lucene_Field::Binary()) accepts an optional $encoding parameter that specifies the encoding of the input data.

The encoding may differ between documents, and also between fields within one document:

<?php
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('title',
                                              $title,
                                              'iso-8859-1'));
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents',
                                                  $contents,
                                                  'utf-8'));

If the encoding parameter is omitted, the current locale is used to determine it at processing time. For example:

<?php
setlocale(LC_ALL, 'de_DE.iso-8859-1');
// ...
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $contents));

Fields are always stored and returned from the index in UTF-8 encoding. Any required conversion to UTF-8 happens automatically.
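Zend_Search_Lucene performs this conversion for you. To show roughly what it amounts to for one value, the sketch below re-encodes an ISO-8859-1 string to UTF-8 with PHP's iconv(). Using iconv() here is an assumption made for illustration, and toUtf8() is a hypothetical helper, not part of the library.

```php
<?php
// Sketch only: converting a field value supplied in ISO-8859-1 to UTF-8
// with PHP's iconv(). Illustrative; this is not Zend_Search_Lucene's own code.

// Hypothetical helper: re-encode $value from $encoding into UTF-8.
function toUtf8($value, $encoding)
{
    if (strcasecmp($encoding, 'UTF-8') === 0) {
        return $value; // already UTF-8, nothing to convert
    }
    return iconv($encoding, 'UTF-8', $value);
}

$latin1 = "Caf\xE9";                    // "Café" as ISO-8859-1 bytes (é is 0xE9)
$utf8   = toUtf8($latin1, 'ISO-8859-1');

echo strlen($latin1), ' -> ', strlen($utf8), "\n"; // 4 -> 5
echo bin2hex($utf8), "\n";                          // 436166c3a9
```

The single byte 0xE9 becomes the two-byte UTF-8 sequence 0xC3 0xA9, which is why the stored value is one byte longer than the input.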

Text analyzers (see below) may also convert text to other encodings. In fact, the default analyzer converts text to the 'ASCII//TRANSLIT' encoding. Be careful, however: this transliteration may depend on the current locale.
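The locale sensitivity can be observed directly. The sketch below calls PHP's iconv() with the same 'ASCII//TRANSLIT' target under a few locales. transliterate() is a hypothetical helper, the locale names are assumptions about what is installed on your system, and no expected output is shown because it varies with the iconv implementation (glibc, GNU libiconv, musl) as well as with the locale.

```php
<?php
// Sketch: transliterating UTF-8 text to 'ASCII//TRANSLIT' with PHP's iconv().
// Results depend on LC_CTYPE and on the underlying iconv implementation.

// Hypothetical helper: transliterate $text under $locale.
// Returns null if the locale is unavailable, false if iconv() refuses.
function transliterate($text, $locale)
{
    $previous = setlocale(LC_CTYPE, 0);      // remember the current setting
    if (setlocale(LC_CTYPE, $locale) === false) {
        return null;
    }
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);
    setlocale(LC_CTYPE, $previous);          // restore the original setting
    return $ascii;
}

// "Grüße aus München" written as explicit UTF-8 bytes.
$text = "Gr\xC3\xBC\xC3\x9Fe aus M\xC3\xBCnchen";

foreach (array('C', 'de_DE.UTF-8', 'en_US.UTF-8') as $locale) {
    $result = transliterate($text, $locale);
    echo $locale, ': ', var_export($result, true), "\n";
}
```

Running this on different machines, or with different locales installed, may well print different ASCII strings for the same input, which is exactly the caveat above.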

Field names are entirely at your discretion; you choose them when you create the fields you pass to addField().

Java Lucene uses the 'contents' field as the default search field. Zend_Search_Lucene searches all fields by default, but this behavior is configurable. See the "Default search field" chapter for details.
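To make the difference concrete, here is a toy model in plain PHP. It is not Zend_Search_Lucene code: documents are simply arrays mapping field names to text, and the hypothetical findDocuments() helper either checks every field (Zend_Search_Lucene's default behavior) or only one named default field (as Java Lucene does with 'contents').

```php
<?php
// Toy model only: NOT how Zend_Search_Lucene is implemented. It mimics the
// difference between searching every field and searching one default field.

// Hypothetical helper: split text into lowercase alphanumeric tokens.
function tokenize($text)
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Return the ids of documents whose searched fields contain $term.
// $defaultField === null means "search all fields"; a field name
// restricts matching to that single field.
function findDocuments(array $docs, $term, $defaultField = null)
{
    $term = strtolower($term);
    $hits = array();
    foreach ($docs as $id => $fields) {
        if ($defaultField !== null) {
            $fields = isset($fields[$defaultField])
                ? array($defaultField => $fields[$defaultField])
                : array();
        }
        foreach ($fields as $text) {
            if (in_array($term, tokenize($text), true)) {
                $hits[] = $id;
                break; // one matching field is enough for this document
            }
        }
    }
    return $hits;
}

$docs = array(
    'doc1' => array('title' => 'Lucene primer', 'contents' => 'How indexing works'),
    'doc2' => array('title' => 'Release notes', 'contents' => 'Lucene port updated'),
);

print_r(findDocuments($docs, 'lucene'));             // doc1 and doc2
print_r(findDocuments($docs, 'lucene', 'contents')); // doc2 only
```

With all fields searched, doc1 matches through its title; once the search is restricted to 'contents', only doc2 matches.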

Zend Framework