The score of a document d for a query q
is defined as follows, where the sum runs over the terms t of q:
score(q,d) = sum( tf(t in d) * idf(t) * getBoost(t.field in d) *
lengthNorm(t.field in d) ) * coord(q,d) * queryNorm(q)
tf(t in d) - Zend_Search_Lucene_Search_Similarity::tf($freq) -
a score factor based on the frequency of a term or phrase in a document.
idf(t) - Zend_Search_Lucene_Search_Similarity::idf($input,
$reader) - a score factor for a simple term in the specified index.
getBoost(t.field in d) - the boost factor for the term's field.
lengthNorm(t.field in d) - Zend_Search_Lucene_Search_Similarity::lengthNorm($fieldName, $numTerms) - the normalization value for a field, given the total number of terms it contains. These values, together with field boosts, are stored in the index and multiplied into the scores for hits on each field by the search code.
Matches in longer fields are less precise, so implementations of this method usually return smaller values when $numTerms is large and larger values when $numTerms is small.
coord(q,d) - Zend_Search_Lucene_Search_Similarity::coord($overlap,
$maxOverlap) - a score factor based on the fraction of all query terms
that a document contains.
The presence of a large portion of the query terms indicates a better match with the query, so implementations of this method usually return larger values when the ratio of $overlap to $maxOverlap is large and smaller values when it is small.
queryNorm(q) - Zend_Search_Lucene_Search_Similarity::queryNorm($sumOfSquaredWeights) - the normalization value for a query, given the sum of the squared weights of each of the query terms. This value is then multiplied into the weight of each query term.
This does not affect ranking; it only attempts to make scores from different queries comparable.
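As a worked illustration of how these factors combine, the following standalone sketch evaluates the formula for two small, made-up documents. It does not use Zend_Search_Lucene: the function names (sketchTf, sketchIdf and so on) are hypothetical, the individual formulas are one common choice rather than confirmed library defaults, and treating each query term's weight as its idf when computing queryNorm is an assumption made only for this example.

```php
<?php
// Standalone sketch of score(q,d); no Zend classes are used.
// All function names are hypothetical, and the formulas are assumed for illustration.

function sketchTf($freq)
{
    return sqrt($freq);
}

function sketchIdf($docFreq, $numDocs)
{
    return log($numDocs / (float) ($docFreq + 1)) + 1.0;
}

function sketchLengthNorm($numTerms)
{
    return 1.0 / sqrt($numTerms);
}

function sketchCoord($overlap, $maxOverlap)
{
    return $overlap / (float) $maxOverlap;
}

function sketchQueryNorm($sumOfSquaredWeights)
{
    return 1.0 / sqrt($sumOfSquaredWeights);
}

/**
 * Score one field of one document against a list of query terms.
 *
 * $termFreqs maps term => occurrences in the field,
 * $docFreqs  maps term => number of documents containing the term.
 */
function sketchScore(array $queryTerms, array $termFreqs, array $docFreqs,
                     $numDocs, $numTerms, $boost = 1.0)
{
    $sum = 0.0;
    $overlap = 0;
    $sumOfSquaredWeights = 0.0;

    foreach ($queryTerms as $term) {
        $docFreq = isset($docFreqs[$term]) ? $docFreqs[$term] : 0;
        $idf = sketchIdf($docFreq, $numDocs);
        // Assumption: the weight of a query term is its idf.
        $sumOfSquaredWeights += $idf * $idf;

        if (empty($termFreqs[$term])) {
            continue; // term absent from the document: contributes nothing
        }
        $overlap++;
        $sum += sketchTf($termFreqs[$term]) * $idf * $boost
              * sketchLengthNorm($numTerms);
    }

    return $sum
         * sketchCoord($overlap, count($queryTerms))
         * sketchQueryNorm($sumOfSquaredWeights);
}

// Made-up collection statistics: 100 documents, 'lucene' occurs in 9, 'search' in 49.
$docFreqs = array('lucene' => 9, 'search' => 49);
$query    = array('lucene', 'search');

// Document A contains only 'lucene' (4 times); document B contains both terms.
// Both fields are 16 terms long.
$scoreA = sketchScore($query, array('lucene' => 4), $docFreqs, 100, 16);
$scoreB = sketchScore($query, array('lucene' => 4, 'search' => 1), $docFreqs, 100, 16);

printf("A: %.4f  B: %.4f\n", $scoreA, $scoreB);
```

Document B matches both query terms, so its coord factor is 1 instead of 0.5 and it scores higher than document A. Because queryNorm is the same for every document scored against one query, it scales both scores equally and leaves their order unchanged.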
The scoring algorithm can be customized by defining your own Similarity class. To do
this, extend the Zend_Search_Lucene_Search_Similarity class as
shown below, then call the
Zend_Search_Lucene_Search_Similarity::setDefault($similarity);
method to set it as the default.
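<?php
class MySimilarity extends Zend_Search_Lucene_Search_Similarity {
    // Shorter fields get a larger normalization value.
    public function lengthNorm($fieldName, $numTerms) {
        return 1.0/sqrt($numTerms);
    }

    // Makes scores from different queries comparable; does not affect ranking.
    public function queryNorm($sumOfSquaredWeights) {
        return 1.0/sqrt($sumOfSquaredWeights);
    }

    // The term frequency factor grows with the square root of the frequency.
    public function tf($freq) {
        return sqrt($freq);
    }

    /**
     * Not currently used. Computes the amount of a sloppy phrase match,
     * based on an edit distance.
     */
    public function sloppyFreq($distance) {
        return 1.0;
    }

    // Rarer terms (small $docFreq relative to $numDocs) get a larger factor.
    public function idfFreq($docFreq, $numDocs) {
        return log($numDocs/(float)($docFreq+1)) + 1.0;
    }

    // Fraction of the query terms that the document contains.
    public function coord($overlap, $maxOverlap) {
        return $overlap/(float)$maxOverlap;
    }
}

// Register the custom similarity as the default.
$mySimilarity = new MySimilarity();
Zend_Search_Lucene_Search_Similarity::setDefault($mySimilarity);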
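To get a feel for what overriding a single factor does to ranking, the following standalone sketch compares two length-normalization choices on made-up data. It deliberately avoids the library: rankByTf() is a hypothetical helper that scores each document as tf * lengthNorm for a single term and holds every other factor constant, so it only approximates what a real search would do.

```php
<?php
// Standalone sketch; rankByTf() is a hypothetical helper, not part of Zend_Search_Lucene.
// Each document is array('freq' => occurrences of the term, 'numTerms' => field length).

function rankByTf(array $docs, $lengthNorm)
{
    $scores = array();
    foreach ($docs as $name => $doc) {
        // Simplified score: tf * lengthNorm, all other factors held constant.
        $scores[$name] = sqrt($doc['freq'])
                       * call_user_func($lengthNorm, $doc['numTerms']);
    }
    arsort($scores); // highest score first, keys preserved
    return array_keys($scores);
}

// Made-up documents: a short field with one match, a long field with four.
$docs = array(
    'short' => array('freq' => 1, 'numTerms' => 4),
    'long'  => array('freq' => 4, 'numTerms' => 400),
);

// 1/sqrt(numTerms): penalizes long fields.
$withNorm = rankByTf($docs, function ($numTerms) {
    return 1.0 / sqrt($numTerms);
});

// Constant 1.0: field length is ignored.
$withoutNorm = rankByTf($docs, function ($numTerms) {
    return 1.0;
});

echo implode(', ', $withNorm), "\n";    // short, long
echo implode(', ', $withoutNorm), "\n"; // long, short
```

With 1/sqrt($numTerms) the short document scores 0.5 against 0.1 for the long one, so it ranks first. With a constant lengthNorm the scores become 1.0 and 2.0, and the order flips in favor of the document with more matches.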




