Zend_Search_Lucene offers an Excel 2007 parsing feature. Documents
can be created directly from an Excel 2007 file:
<?php
$doc = Zend_Search_Lucene_Document_Xlsx::loadXlsxFile($filename);
$index->addDocument($doc);
The Zend_Search_Lucene_Document_Xlsx class uses the
ZipArchive class and simplexml methods to parse the source
document. If the ZipArchive class (from the php_zip module) is not available,
Zend_Search_Lucene_Document_Xlsx is not available
for use with Zend Framework either.
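These prerequisites can be probed before any file is loaded. The sketch below assumes that checking for the ZipArchive class and the simplexml_load_string() function is a sufficient test. The helper name xlsxParsingAvailable() is hypothetical and not part of Zend Framework.

```php
<?php
/**
 * Hypothetical helper (not part of Zend Framework): reports whether the
 * PHP features the Xlsx parser relies on are present.
 */
function xlsxParsingAvailable()
{
    // Pass false so that no autoloader is triggered by the check.
    return class_exists('ZipArchive', false)
        && function_exists('simplexml_load_string');
}

if (!xlsxParsingAvailable()) {
    echo "Excel 2007 parsing is unavailable: enable the zip and SimpleXML extensions.\n";
}
```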
The Zend_Search_Lucene_Document_Xlsx class recognizes document
metadata and document text. Depending on the document contents, the metadata consists of filename,
title, subject, creator, keywords, description, lastModifiedBy, revision, modified, and
created.
The 'filename' field is the actual Excel 2007 file name.
The 'title' field is the actual document title.
The 'subject' field is the actual document subject.
The 'creator' field is the actual document creator.
The 'keywords' field contains the actual document keywords.
The 'description' field is the actual document description.
The 'lastModifiedBy' field is the name of the user who last modified the actual document.
The 'revision' field is the actual document revision number.
The 'modified' field is the actual document last modified date / time.
The 'created' field is the actual document creation date / time.
The 'body' field is the actual content of all cells in all worksheets of the Excel 2007 document.
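Which of these fields are present depends on the file, so it can help to read back only those that were actually set. The following is a minimal sketch: collectXlsxMetadata() is a hypothetical helper, and it assumes the parsed document exposes getFieldNames() and getFieldValue() as Zend_Search_Lucene_Document does.

```php
<?php
/**
 * Hypothetical helper: returns the metadata fields that are present on a
 * parsed document as a name => value array. The 'body' field is skipped.
 */
function collectXlsxMetadata($doc)
{
    $metadataFields = array(
        'filename', 'title', 'subject', 'creator', 'keywords', 'description',
        'lastModifiedBy', 'revision', 'modified', 'created',
    );

    // Only fields found in the source file exist on the document.
    $present = array_intersect($metadataFields, $doc->getFieldNames());

    $metadata = array();
    foreach ($present as $name) {
        $metadata[$name] = $doc->getFieldValue($name);
    }
    return $metadata;
}
```

For example, passing the result of Zend_Search_Lucene_Document_Xlsx::loadXlsxFile($filename) to collectXlsxMetadata() yields whatever metadata the file carried.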
The loadXlsxFile() method of the
Zend_Search_Lucene_Document_Xlsx class also accepts an optional second
argument. If it is set to TRUE, the body content is also stored
in the index and can be retrieved from it. By default, the body is tokenized and
indexed, but not stored.
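As an illustration of the stored body, the sketch below indexes a file with the second argument set to TRUE and reads the body text back from search hits. The function name and its parameters are hypothetical. It assumes that $index is an open Zend_Search_Lucene index and that Zend Framework is on the include path.

```php
<?php
/**
 * Hypothetical helper: indexes an Excel 2007 file with its body stored,
 * then returns the stored body text of every hit for $query.
 */
function indexAndFetchBodies($index, $filename, $query)
{
    // TRUE as the second argument stores the body in addition to indexing it.
    $doc = Zend_Search_Lucene_Document_Xlsx::loadXlsxFile($filename, true);
    $index->addDocument($doc);
    $index->commit();

    $bodies = array();
    foreach ($index->find($query) as $hit) {
        // Without the TRUE flag above, 'body' would not be retrievable here.
        $bodies[] = $hit->getDocument()->getFieldValue('body');
    }
    return $bodies;
}
```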
Parsed documents may be augmented by the programmer with any other field:
<?php
$doc = Zend_Search_Lucene_Document_Xlsx::loadXlsxFile($filename);
// Stored with the document but not searchable
$doc->addField(Zend_Search_Lucene_Field::UnIndexed(
    'indexTime',
    time()));
// Tokenized, indexed, and stored
$doc->addField(Zend_Search_Lucene_Field::Text(
    'annotation',
    'Document annotation text'));
$index->addDocument($doc);




