Creating A Fulltext Search Engine In PHP 5 With The Zend Framework's Zend Search Lucene
Querying Our Index
On the previous page we looked at how to write queries to search the index. We learned how to include and exclude terms, and also how to search different fields in our indexed data.
Now we will look at actually pulling documents from our index using those queries.
There are essentially two ways to query the index: passing the raw query in and letting Zend_Search_Lucene parse it (ideal when you’re writing a search engine where you’re not sure what the user will enter), or manually building up the query with API function calls.
In either case, you use the find() method on the index. The find() method returns a list of matches from your index.
Firstly though, you must open your existing index. To do this we use the static open() method from the Zend_Search_Lucene class. Like the create() method, this takes the filesystem path of the index as the first argument.
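    <?php
    require_once('Zend/Search/Lucene.php');

    // Open the existing index (read-only use, so no second argument)
    $indexPath = '/var/www/phpriot.com/data/docindex';
    $index = Zend_Search_Lucene::open($indexPath);

    // Let Zend_Search_Lucene parse the raw query string
    $hits = $index->find('php +author:Quentin');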
Note: it is a good idea to wrap the call to open() in a try / catch.
This sample code searches our index for articles containing ‘php’, written by me. Note that when we opened our index, we did not pass the second parameter as we did when we created the index. This is because we are not writing the index, we are querying it.
We could also manually build this same query with function calls like so:
    <?php
    require_once('Zend/Search/Lucene.php');

    $indexPath = '/var/www/phpriot.com/data/docindex';
    $index = Zend_Search_Lucene::open($indexPath);

    $query = new Zend_Search_Lucene_Search_Query_MultiTerm();

    // 'php' in the default field, neither required nor prohibited
    $query->addTerm(new Zend_Search_Lucene_Index_Term('php'), null);
    // 'Quentin' in the author field, required
    $query->addTerm(new Zend_Search_Lucene_Index_Term('Quentin', 'author'), true);

    $hits = $index->find($query);
The second parameter of addTerm() determines whether or not a term is required. true means it is required (like putting a plus sign before the term), false means it is prohibited (like putting a minus sign before the term), and null means it is neither required nor prohibited.
The second parameter of Zend_Search_Lucene_Index_Term specifies the field to search. By default this is contents.
On the whole, it is easier to simply allow Zend_Search_Lucene to parse the query.
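To make the correspondence between the two approaches concrete, here is a minimal sketch that turns a list of term descriptions into the equivalent query string. The function buildQueryString() and the array keys text, field and sign are hypothetical names chosen for this illustration and are not part of Zend_Search_Lucene. The sketch does no escaping of special query characters, so treat it as a way to see the mapping rather than something to use on user input.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): build a query string
// from term descriptions, using the +/- and field:term syntax covered earlier.
function buildQueryString(array $terms)
{
    $parts = array();

    foreach ($terms as $term) {
        // 'sign' mirrors the second argument to addTerm():
        // true = required (+), false = prohibited (-), null = neither
        if ($term['sign'] === true) {
            $prefix = '+';
        } elseif ($term['sign'] === false) {
            $prefix = '-';
        } else {
            $prefix = '';
        }

        // 'field' mirrors the second argument to Zend_Search_Lucene_Index_Term;
        // null means "no explicit field", so no field: prefix is written
        $field = ($term['field'] !== null) ? $term['field'] . ':' : '';

        $parts[] = $prefix . $field . $term['text'];
    }

    return implode(' ', $parts);
}

$queryString = buildQueryString(array(
    array('text' => 'php',     'field' => null,     'sign' => null),
    array('text' => 'Quentin', 'field' => 'author', 'sign' => true),
));

echo $queryString, "\n"; // php +author:Quentin
```

The resulting string is the same one passed to find() in the first listing, which is why letting the parser do the work is usually the shorter route.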
Dealing with returned results
The results found from your query are returned in an array, meaning you can simply use count() on the array to determine the number of hits.
Each of the indexed fields is available as a property of the hit object.
So to loop over the results as we indexed them previously (with a title, author and teaser), we would do the following:
    <?php
    require_once('Zend/Search/Lucene.php');

    $query = 'php +author:Quentin';

    $indexPath = '/var/www/phpriot.com/data/docindex';
    $index = Zend_Search_Lucene::open($indexPath);

    $hits = $index->find($query);
    $numHits = count($hits); // find() returns an array of hits
    ?>
    <p>
        Found <?php echo $numHits ?> result(s) for query <?php echo $query ?>.
    </p>

    <?php foreach ($hits as $hit) { ?>
        <h3><?php echo $hit->title ?> (score: <?php echo $hit->score ?>)</h3>
        <p>
            By <?php echo $hit->author ?>
        </p>
        <p>
            <?php echo $hit->teaser ?><br />
            <a href="<?php echo $hit->url ?>">Read more...</a>
        </p>
    <?php } ?>
Here we also used an extra field called score. As mentioned previously, this is used as an indicator as to how well a document matched the query. Results with the highest score are listed first.
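The loop above echoes each field directly. If the indexed fields hold plain text rather than ready-made HTML (an assumption, since it depends on how you built your index), you may prefer to escape them before output. The following is a minimal sketch of that idea. The names escapeHtml() and renderHits() are hypothetical, and a stdClass object stands in for a real hit so the example runs without an index.

```php
<?php
// Hypothetical helper: escape a value for safe inclusion in HTML
function escapeHtml($value)
{
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: build the HTML for a list of hits.
// Each hit is expected to expose title, author, teaser, url and score
// as properties, as in the loop shown in the article.
function renderHits(array $hits)
{
    $html = '';

    foreach ($hits as $hit) {
        $html .= '<h3>' . escapeHtml($hit->title)
               . ' (score: ' . sprintf('%.2f', $hit->score) . ')</h3>' . "\n";
        $html .= '<p>By ' . escapeHtml($hit->author) . '</p>' . "\n";
        $html .= '<p>' . escapeHtml($hit->teaser) . '<br />'
               . '<a href="' . escapeHtml($hit->url) . '">Read more...</a></p>' . "\n";
    }

    return $html;
}

// Stand-in for a real hit, with made-up values for illustration only
$hit = new stdClass();
$hit->title  = 'PHP & <Lucene>';
$hit->author = 'Quentin';
$hit->teaser = 'Searching an index with "find"';
$hit->url    = '/articles/search?id=1&page=2';
$hit->score  = 0.8731;

$html = renderHits(array($hit));
echo $html;
```

With a real index you would pass the array returned by find() straight to renderHits().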
Creating a simple search engine
Using our code above, we can easily transform this into a simple site search engine. All we need to do is add a form and plug in the submitted query. Let’s assume this script is called search.php:
    <?php
    require_once('Zend/Search/Lucene.php');

    $query = isset($_GET['query']) ? $_GET['query'] : '';
    $query = trim($query);

    $indexPath = '/var/www/phpriot.com/data/docindex';
    $index = Zend_Search_Lucene::open($indexPath);

    // Only search when the user actually submitted something
    if (strlen($query) > 0) {
        $hits = $index->find($query);
        $numHits = count($hits);
    }
    ?>
    <form method="get" action="search.php">
        <input type="text" name="query" value="<?php echo htmlSpecialChars($query) ?>" />
        <input type="submit" value="Search" />
    </form>

    <?php if (strlen($query) > 0) { ?>
        <p>
            Found <?php echo $numHits ?> result(s) for query <?php echo htmlSpecialChars($query) ?>.
        </p>

        <?php foreach ($hits as $hit) { ?>
            <h3><?php echo $hit->title ?> (score: <?php echo $hit->score ?>)</h3>
            <p>
                By <?php echo $hit->author ?>
            </p>
            <p>
                <?php echo $hit->teaser ?><br />
                <a href="<?php echo $hit->url ?>">Read more...</a>
            </p>
        <?php } ?>
    <?php } ?>
Error handling
The one thing we haven’t dealt with yet is errors in the search. For instance, if we were to type in title: with no term after it, an error would occur. We handle this by catching Zend_Search_Lucene_Exception.
    <?php
    require_once('Zend/Search/Lucene.php');

    $query = isset($_GET['query']) ? $_GET['query'] : '';
    $query = trim($query);

    $indexPath = '/var/www/phpriot.com/data/docindex';
    $index = Zend_Search_Lucene::open($indexPath);

    try {
        $hits = $index->find($query);
    }
    catch (Zend_Search_Lucene_Exception $ex) {
        // Treat a failed search as an empty result set
        $hits = array();
    }

    $numHits = count($hits);
This means that if an error occurs in the search, we simply assume zero hits were returned, handling the error without indicating to the user that anything went wrong.
Of course, you could also choose to get the error message from the exception and output that instead ($ex->getMessage()).
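As a rough illustration of that second option, the sketch below wraps the search in a function that returns both the hits and any error message. The names searchWithErrors() and StubIndex are hypothetical, and the stub's error text is made up for this example. So that the sketch runs without the framework or a real index, it catches the base Exception class. With Zend_Search_Lucene loaded you would catch Zend_Search_Lucene_Exception as shown earlier.

```php
<?php
// Hypothetical wrapper: run a search and return any error message with the hits
function searchWithErrors($index, $query)
{
    try {
        return array('hits' => $index->find($query), 'error' => null);
    } catch (Exception $ex) {
        // Assumption: the base Exception is caught here only so the sketch
        // runs standalone; narrow this to Zend_Search_Lucene_Exception in real code
        return array('hits' => array(), 'error' => $ex->getMessage());
    }
}

// Stub used only to make this sketch runnable; it is not the real index class,
// and its error message is invented for illustration
class StubIndex
{
    public function find($query)
    {
        if (substr($query, -1) === ':') {
            throw new Exception('Field name given without a term');
        }
        return array();
    }
}

$result = searchWithErrors(new StubIndex(), 'title:');

if ($result['error'] !== null) {
    // Escape the message, since it may echo back part of the user's query
    echo '<p>Search error: ',
         htmlspecialchars($result['error'], ENT_QUOTES, 'UTF-8'),
         '</p>', "\n";
}
```

Whether to show the raw message is a design choice: it helps users fix a malformed query, but a generic message may be preferable on a public site.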




