Creating A Fulltext Search Engine In PHP 5 With The Zend Framework's Zend Search Lucene
How Querying Works In Zend Search Lucene
Now comes the most important part: actually finding stuff! There are quite a lot of options when it comes to querying your index, giving you and your users a lot of control over the returned results.
When we created the index, we added six fields, but only three of those were actually searchable: document title, document content, and the author.
Each of these items is stored separately for each indexed document, meaning you can search on them separately. The syntax used to search each field is somewhat like Google’s, in that you specify the field, followed by a colon, followed by the term (with no spaces).
So to search the author field for Quentin, the search query would be author:Quentin. (Note that the search is case-insensitive. To make it case-sensitive we would need to change some options when creating our index. For full details on this, please read the Zend_Search_Lucene manual section on Extensibility.)
Likewise, to search in the title field for php, we would use title:php.
Including and excluding terms
By default, all specified terms are searched on using a boolean 'or'. This means that a document is returned if it contains any one of the terms. To force results to have a particular term, the plus symbol is used. To force results to not have a particular term, the minus symbol is used. If you’re searching in a specific field, you can put the plus or minus either before the field name or before the term. In other words, +author:Quentin and author:+Quentin are identical.
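To make the plus and minus syntax concrete, here is a minimal sketch of a helper that assembles a query string from optional, required and excluded terms. The function name buildQuery and its parameters are hypothetical (they are not part of Zend_Search_Lucene), and the sketch assumes every term is a single word, optionally prefixed with a field name and a colon.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): assembles a query
// string from optional, required (+) and excluded (-) terms.
// Assumes each term is a single word, optionally prefixed with "field:".
function buildQuery(array $optional, array $required = array(), array $excluded = array())
{
    $parts = $optional;
    foreach ($required as $term) {
        $parts[] = '+' . $term; // results must contain this term
    }
    foreach ($excluded as $term) {
        $parts[] = '-' . $term; // results must not contain this term
    }
    return implode(' ', $parts);
}

echo buildQuery(array('php'), array(), array('author:quentin')), "\n";
// php -author:quentin
echo buildQuery(array(), array('title:mysql'), array('ajax')), "\n";
// +title:mysql -ajax
```

The resulting string is what you would hand to the index when searching. The helper puts the sign before the field name, which according to the rule above is equivalent to putting it before the term.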
Searching for phrases
It is possible to search for exact phrases with Zend_Search_Lucene, so if you wanted to search for the exact phrase “PHP Articles” you could. Because this is somewhat complicated to achieve, we will not be including it in our examples or implementation. However, there is a lot of information on this in the Zend_Search_Lucene manual section on query types.
Here are some queries you can pass to Zend_Search_Lucene and their meanings.
php // search the index for any article with the word php
php -author:quentin // find any article with the word php not written by me
author:quentin // find all the articles by me
php -ajax // find all articles with the word php that don't have the word ajax
title:mysql // find all articles with MySQL in the title
title:mysql -author:quentin // find all articles with MySQL in the title not by me
And so on. Hopefully you get the idea.
Scoring of results
All results returned from a search are assigned a score. This is a measure of how well the document matched the search term.
The results are ordered by their score, from highest to lowest.
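As a rough illustration of working with scores, the sketch below formats a list of hits as "score, then title" lines. To keep it runnable on its own it uses stand-in objects rather than real search results: the helpers makeHit and formatHits, the sample titles and the sample scores are all hypothetical. It assumes each hit exposes a score property (as Zend_Search_Lucene hit objects do, to the best of my knowledge) and a title property, which would only be available if the title field was stored in the index.

```php
<?php
// Stand-in hits for illustration only. Real hits come from searching the
// index; plain objects with assumed 'score' and 'title' properties are
// used here so the snippet runs without the Zend Framework.
function makeHit($title, $score)
{
    $hit = new stdClass();
    $hit->title = $title;
    $hit->score = $score;
    return $hit;
}

// Returns one line per hit, e.g. "0.75  Some title", in the order given.
function formatHits(array $hits)
{
    $lines = array();
    foreach ($hits as $hit) {
        $lines[] = sprintf('%.2f  %s', $hit->score, $hit->title);
    }
    return $lines;
}

// Hypothetical results, already ordered from highest to lowest score.
$hits = array(
    makeHit('Searching with PHP', 0.75),
    makeHit('MySQL basics', 0.25),
);
echo implode("\n", formatHits($hits)), "\n";
```

Because the results already arrive ordered by score, the sketch does no sorting of its own; it simply walks the list in the order it was given.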
You can customize the scoring algorithm (and hence the ordering of results). Please see the section later in the article on extending Zend_Search_Lucene.