A Lucene index consists of many segments. Each segment is a completely independent set of data.
By design, Lucene index segment files cannot be updated: changing a segment requires a full reorganization of that segment. See the Lucene index file formats documentation for details (http://lucene.apache.org/java/2_3_0/fileformats.html). New documents are added to the index by creating a new segment.
An increasing number of segments reduces the quality of the index, but index optimization restores it. Optimization essentially merges several segments into a new one. This process does not update existing segments either: it generates one new, large segment and updates the segment list (the 'segments' file).
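Because segments are never modified, a merge can be pictured as a pure function: read the old segments, write one new segment, and publish a new segment list. The PHP sketch below models only that idea with plain arrays. It is a hypothetical illustration, not the Zend_Search_Lucene implementation; the function name mergeSegments() and the segment names '_0', '_1', ... are assumptions.

```php
<?php
// Hypothetical model (not the Zend_Search_Lucene API): the segment list is an
// ordered map "segment name => list of documents". Segment names such as '_0'
// are illustrative only.

/**
 * Merge the named segments into one new segment without modifying any of them.
 * Returns a new segment list in which the new segment replaces the merged ones.
 */
function mergeSegments(array $segmentList, array $namesToMerge, string $newName): array
{
    $merged = [];
    $result = [];
    foreach ($segmentList as $name => $docs) {
        if (in_array($name, $namesToMerge, true)) {
            // Copy the documents into the new segment; the source is only read.
            foreach ($docs as $doc) {
                $merged[] = $doc;
            }
        } else {
            // Segments that take no part in the merge are carried over as they are.
            $result[$name] = $docs;
        }
    }
    $result[$newName] = $merged;
    return $result;
}

$before = ['_0' => ['doc1', 'doc2'], '_1' => ['doc3'], '_2' => ['doc4']];
$after = mergeSegments($before, ['_0', '_1', '_2'], '_3');
// $after holds a single segment '_3' with doc1..doc4; $before is unchanged.
```

Since PHP arrays are passed by value, the original list stays intact, which mirrors how the old segment files remain untouched until the new segment list replaces them.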
Full index optimization can be triggered by calling the
Zend_Search_Lucene::optimize() method. It merges all index
segments into one new segment:
require_once 'Zend/Search/Lucene.php';
// Open existing index
$index = Zend_Search_Lucene::open('/data/my-index');
// Optimize index: merge all segments into one new segment
$index->optimize();
Automatic index optimization is performed to keep indexes in a consistent state.
Automatic optimization is an iterative process managed by several index options. It merges very small segments into larger ones, then merges these larger segments into even larger segments and so on.
MaxBufferedDocs is the minimum number of documents required before the buffered in-memory documents are written into a new segment.
MaxBufferedDocs can be retrieved or set by
Default value is 10.
MaxMergeDocs is the largest number of documents ever merged by addDocument(). Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.
MaxMergeDocs can be retrieved or set by
Default value is PHP_INT_MAX.
MergeFactor determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.
MergeFactor is a good estimate of the average number of segments merged by one auto-optimization pass. Values that are too large let many segments accumulate before they are merged into a new one, which may cause the "failed to open stream: Too many open files" error message. The open-file limit is system dependent.
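To see why a large MergeFactor leaves many segments open, consider an idealized model (an assumption, not a guarantee about the implementation): every flushed segment holds exactly B = MaxBufferedDocs documents, M = MergeFactor segments of equal size are merged as soon as they accumulate, and MaxMergeDocs never stops a merge. The index then behaves like a base-M counter of flushed segments, so after N documents the number of segments S is the sum of the base-M digits d_k of the number of flushes:

```latex
\[
\left\lfloor \frac{N}{B} \right\rfloor = \sum_{k=0}^{K} d_k M^k,
\qquad 0 \le d_k \le M - 1,
\qquad K = \left\lfloor \log_M \frac{N}{B} \right\rfloor
\]
\[
S = \sum_{k=0}^{K} d_k \le (M - 1)(K + 1) \qquad (N \ge B)
\]
```

For example, with B = 10 and N = 999,990 (so 99,999 flushes), M = 10 leaves 9 + 9 + 9 + 9 + 9 = 45 segments, while M = 100 (digits 9, 99, 99) leaves 207.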
MergeFactor can be retrieved or set by
Default value is 10.
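The way MaxBufferedDocs, MergeFactor and MaxMergeDocs interact during automatic optimization can be modelled in a few lines of plain PHP. This is a simplified sketch under stated assumptions: segments are reduced to their document counts, each flush writes exactly MaxBufferedDocs documents, and merging follows a classic level-by-level policy. The function names maybeMergeSegments() and simulateIndexing() are hypothetical, and the real Zend_Search_Lucene merge code may differ in detail.

```php
<?php
// Simplified model of auto-optimization (an assumption, not the actual
// Zend_Search_Lucene code). Segments are represented only by their document
// counts, oldest first.

/**
 * Merge the newest small segments level by level: first into segments of
 * about maxBufferedDocs * mergeFactor documents, then mergeFactor times larger,
 * and so on, never using a merge target above maxMergeDocs.
 */
function maybeMergeSegments(array $segments, int $maxBufferedDocs, int $mergeFactor, int $maxMergeDocs): array
{
    $target = $maxBufferedDocs * $mergeFactor;
    while ($target <= $maxMergeDocs) {
        // Walk back from the newest segment while segments are below the target size.
        $start = count($segments);
        $docs = 0;
        while ($start > 0 && $segments[$start - 1] < $target) {
            $start--;
            $docs += $segments[$start];
        }
        if ($docs < $target) {
            break; // Not enough small segments yet; nothing to merge at this level.
        }
        // Replace those small segments with one merged segment.
        $segments = array_merge(array_slice($segments, 0, $start), [$docs]);
        $target *= $mergeFactor;
    }
    return $segments;
}

/** Add $numDocs documents one by one, flushing and auto-merging as we go. */
function simulateIndexing(int $numDocs, int $maxBufferedDocs = 10, int $mergeFactor = 10, int $maxMergeDocs = PHP_INT_MAX): array
{
    $segments = [];
    $buffered = 0; // documents still held in memory
    for ($i = 0; $i < $numDocs; $i++) {
        $buffered++;
        if ($buffered >= $maxBufferedDocs) {
            $segments[] = $buffered; // write the buffer out as a new segment
            $buffered = 0;
            $segments = maybeMergeSegments($segments, $maxBufferedDocs, $mergeFactor, $maxMergeDocs);
        }
    }
    return ['segments' => $segments, 'buffered' => $buffered];
}

// With the default values, 250 documents leave segments of 100, 100, 10, 10, 10, 10, 10.
print_r(simulateIndexing(250)['segments']);
```

In this model, raising $mergeFactor lets more small segments pile up between merges, and lowering $maxMergeDocs stops merging once segments reach a given size (for instance, with $maxMergeDocs = 500, 1000 documents stay split into ten segments of 100).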
Lucene Java and Luke (Lucene Index Toolbox - http://www.getopt.org/luke/) can also
be used to optimize an index. The latest Luke release (v0.8) is based on Lucene v2.3 and is
compatible with the current implementation of the Zend_Search_Lucene
component (Zend Framework 1.6). Earlier versions of the
Zend_Search_Lucene implementation need other versions of the
Java Lucene tools to be compatible:
 The currently supported Lucene index file format is version 2.3 (starting from Zend Framework 1.6).