PhpRiot

Creating A Fulltext Search Engine In PHP 5 With The Zend Framework's Zend Search Lucene

Indexing All The Articles On PhpRiot

Now that you’ve seen how to create a basic index, we will extend this script slightly so it can index all the documents in PhpRiot.

Additionally, we will extend the base Zend_Search_Lucene_Document class to simplify our code slightly. This also demonstrates one way to take advantage of the object-oriented design of Zend_Search_Lucene.

Extending Zend_Search_Lucene_Document

On the previous page, after we opened the index, we created a new instance of Zend_Search_Lucene_Document to hold the index data for a single document. Instead of using this class directly, we’re going to extend it so that the subclass also encapsulates all of the code that adds data to the document.

In other words, we’re going to move the calls to addField() into the constructor of our class, rather than calling addField() for each field after we create a Zend_Search_Lucene_Document object.

Listing 10 PhpRiotIndexedDocument.php
<?php
    class PhpRiotIndexedDocument extends Zend_Search_Lucene_Document
    {
        /**
         * Constructor. Creates our indexable document and adds all
         * necessary fields to it using the passed in document
         */
        public function __construct($document)
        {
            // Keyword: stored and indexed as a single term (not tokenized)
            $this->addField(Zend_Search_Lucene_Field::Keyword('document_id', $document->id));

            // UnIndexed: stored so it can be shown with results, but not searchable
            $this->addField(Zend_Search_Lucene_Field::UnIndexed('url',       $document->url));
            $this->addField(Zend_Search_Lucene_Field::UnIndexed('created',   $document->created));
            $this->addField(Zend_Search_Lucene_Field::UnIndexed('teaser',    $document->teaser));

            // Text: tokenized, indexed and stored
            $this->addField(Zend_Search_Lucene_Field::Text('title',          $document->title));
            $this->addField(Zend_Search_Lucene_Field::Text('author',         $document->author));

            // UnStored: tokenized and indexed, but the text itself is not stored
            $this->addField(Zend_Search_Lucene_Field::UnStored('content',    $document->body));
        }
    }
?>

Building the full index

Now that we have our class, we can create our index, loop over the documents, and then save our index:

Listing 11 listing-11.php
<?php
    require_once('Zend/Search/Lucene.php');
    require_once('PhpRiotIndexedDocument.php');
 
    // where to save our index
    $indexPath = '/var/www/phpriot.com/data/docindex';
 
    // fictional function used to retrieve data from the database
    $documents = GetAllDocuments();
 
    // create our index
    $index = Zend_Search_Lucene::create($indexPath);
 
    foreach ($documents as $document) {
        $index->addDocument(new PhpRiotIndexedDocument($document));
    }
 
    // write the index to disk
    $index->commit();
?>
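
Listing 11 relies on a fictional GetAllDocuments() function, and the PhpRiotIndexedDocument constructor reads seven properties from each object it returns (id, url, created, teaser, title, author and body). To try the script without a database, a stand-in along the following lines may be enough. This is only a sketch: the function body and every value in it are made up for illustration and are not taken from the PhpRiot codebase.

```php
<?php
    /**
     * Hypothetical stand-in for GetAllDocuments(). A real version would
     * query the database; this one returns hard-coded sample data.
     */
    function GetAllDocuments()
    {
        // made-up sample rows, one array per article
        $rows = array(
            array(
                'id'      => 1,
                'url'     => '/articles/example-one',
                'created' => '2006-04-27',
                'teaser'  => 'A short summary shown in search results.',
                'title'   => 'Example Article One',
                'author'  => 'Example Author',
                'body'    => 'The full text of the first example article.'
            ),
            array(
                'id'      => 2,
                'url'     => '/articles/example-two',
                'created' => '2007-12-17',
                'teaser'  => 'Another short summary.',
                'title'   => 'Example Article Two',
                'author'  => 'Example Author',
                'body'    => 'The full text of the second example article.'
            )
        );

        $documents = array();
        foreach ($rows as $row) {
            // casting an array to an object gives a stdClass with one
            // property per key, so $document->title and friends work
            $documents[] = (object) $row;
        }

        return $documents;
    }
?>
```

Any source of data will do, as long as each item exposes those seven properties under the names the constructor expects.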

The index has now been created! Building it can take some time if you have many documents or if each document has a large amount of content. We build ours by running the script with PHP on the command line, which lets us watch its progress in real time if we need to (our own version of the script also outputs the title and a status message as each document is indexed).
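
The status output mentioned above is not part of Listing 11. One way it might be added is sketched below. The function name indexWithProgress() and the $makeDocument callback are our own inventions for this example, not part of Zend_Search_Lucene; the only index methods the function relies on are addDocument() and commit(), both of which Listing 11 already uses.

```php
<?php
    /**
     * Adds each document to the index and prints a status line as it goes.
     *
     * $index        anything with addDocument() and commit(), such as the
     *               object returned by Zend_Search_Lucene::create()
     * $documents    array of objects that each have a title property
     * $makeDocument hypothetical callback that turns one of those objects
     *               into an indexable document
     */
    function indexWithProgress($index, $documents, $makeDocument)
    {
        $total = count($documents);
        $done  = 0;

        foreach ($documents as $document) {
            $index->addDocument(call_user_func($makeDocument, $document));
            $done++;

            // status message: position, total and title of the document
            printf("[%d/%d] Indexed: %s\n", $done, $total, $document->title);
        }

        // write the index to disk
        $index->commit();

        return $done;
    }
?>
```

With the $index and $documents variables from Listing 11, this could be called as indexWithProgress($index, $documents, 'makePhpRiotDocument'), where makePhpRiotDocument() is a hypothetical one-line function that returns new PhpRiotIndexedDocument($row). Passing the callback by name keeps the sketch usable on PHP 5 versions that do not support anonymous functions.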

Article History

Apr 27, 2006
Initial article version
Dec 17, 2007
Updated to use Zend Framework 1.0.3