PhpRiot

Creating A Fulltext Search Engine In PHP 5 With The Zend Framework's Zend Search Lucene

Extending Zend_Search_Lucene

There are several aspects of Zend_Search_Lucene that can be extended, allowing a fully customized search solution. These include:

  • A custom tokenizer for determining keywords in a document
  • Custom scoring algorithms to determine how well a document matches a search query
  • A custom storage method, so your index is stored however and wherever you please

A custom tokenizer

There are many reasons why a custom tokenizer can be useful. Here are some ideas:

  • PDF tokenizer – a tokenizer that can parse a PDF file and find all the keywords
  • Image tokenizer – a tokenizer that can perform Optical Character Recognition (OCR), thereby allowing you to index words in an image (you could also store the image itself, using the Binary field type)
  • HTML tokenizer – a tokenizer that can read HTML data, thereby knowing to index only the actual content and not the HTML tags. You could improve on this further, for example by finding all headings and giving them more weight than the rest of the content.
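
To make the HTML tokenizer idea concrete, here is a minimal sketch in plain PHP. The function name tokenizeHtml() is hypothetical and is not part of Zend_Search_Lucene; it only shows the keyword-extraction step, and wiring it into the framework's own analysis classes is not covered here. It assumes ASCII text and treats anything other than letters and digits as a separator.

```php
<?php
// Hypothetical helper for illustration only; not a Zend_Search_Lucene API.
function tokenizeHtml($html)
{
    // Drop <script> and <style> blocks entirely: their bodies are not content.
    $text = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);

    // Replace the remaining tags with spaces so adjacent words do not merge.
    $text = preg_replace('/<[^>]+>/', ' ', $text);

    // Turn entities such as &amp; back into the characters they stand for.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Lowercase (ASCII only) and split on anything that is not a letter or digit.
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

$html = '<h1>Hello World</h1><p>Fish &amp; chips</p><script>var x = 1;</script>';
print_r(tokenizeHtml($html));
```

Only the words from the heading and the paragraph come back as tokens; the tag names and the script body are never indexed.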

Custom scoring algorithms

Using a custom scoring algorithm, you can determine how favourably different fields in a document are looked upon. For example, you might want to treat matches in the ‘title’ field (if you have one) much more favourably than matches in the ‘content’ field.
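
As a rough illustration of field weighting, the sketch below scores a document by counting term matches per field and multiplying each count by a weight. The function scoreDocument() and the weight values are assumptions made up for this example; this is not how Zend_Search_Lucene computes scores internally, just the underlying idea in plain PHP.

```php
<?php
// Hypothetical scoring function for illustration; not a Zend_Search_Lucene API.
function scoreDocument(array $fields, $term, array $weights)
{
    $score = 0.0;
    $term  = strtolower($term);

    foreach ($fields as $name => $text) {
        // Split the field into lowercase words (ASCII only).
        $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

        // Count exact occurrences of the term in this field.
        $matches = count(array_keys($words, $term, true));

        // Fields without an explicit weight count once per match.
        $weight = isset($weights[$name]) ? $weights[$name] : 1.0;

        $score += $matches * $weight;
    }

    return $score;
}

// Assumed weights: a match in the title is worth three matches in the content.
$weights = array('title' => 3.0, 'content' => 1.0);

$doc = array(
    'title'   => 'PHP search',
    'content' => 'Searching with PHP is easy; PHP is everywhere.',
);

echo scoreDocument($doc, 'php', $weights), "\n";
```

Here the single title match contributes 3 and the two content matches contribute 2, so a document with the term in its title outranks one that only mentions it once or twice in the body.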

Custom storage method

You can change how indexes are written to disk by extending the Zend_Search_Lucene_Storage_Directory and Zend_Search_Lucene_Storage_File classes.

It may be possible to use this to store all indexed data in a database, but I haven’t tried it, so I’m not sure.
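
To give a feel for what a storage backend has to provide, here is a small sketch of a store that keeps index files in memory instead of on disk. The class InMemoryIndexStore and its method names are hypothetical and chosen for this example; the methods a real subclass of Zend_Search_Lucene_Storage_Directory must implement are listed in the manual linked below, and a database-backed version would replace the array with queries.

```php
<?php
// Hypothetical class for illustration; it does not extend any Zend class.
class InMemoryIndexStore
{
    // Maps file name => file contents.
    private $files = array();

    public function createFile($name)
    {
        $this->files[$name] = '';
    }

    public function fileExists($name)
    {
        return array_key_exists($name, $this->files);
    }

    public function append($name, $bytes)
    {
        if (!$this->fileExists($name)) {
            throw new RuntimeException("No such file: $name");
        }
        $this->files[$name] .= $bytes;
    }

    public function read($name)
    {
        if (!$this->fileExists($name)) {
            throw new RuntimeException("No such file: $name");
        }
        return $this->files[$name];
    }

    public function deleteFile($name)
    {
        unset($this->files[$name]);
    }

    public function fileList()
    {
        return array_keys($this->files);
    }
}

$store = new InMemoryIndexStore();
$store->createFile('segments');
$store->append('segments', 'abc');
$store->append('segments', 'def');
echo $store->read('segments'), "\n";
```

The point of the abstraction is that the indexing code only ever asks for named files, so where the bytes actually live is up to the storage class.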

More information on extending Zend_Search_Lucene can be found at http://framework.zend.com/manual/en/zend.search.lucene.extending.html.

Article History

Apr 27, 2006
Initial article version
Dec 17, 2007
Updated to use Zend Framework 1.0.3