Zend_Search_Lucene offers a PowerPoint 2007 parsing feature.
Documents can be created directly from a PowerPoint 2007 file:
<?php
$doc = Zend_Search_Lucene_Document_Pptx::loadPptxFile($filename);
$index->addDocument($doc);
The Zend_Search_Lucene_Document_Pptx class uses the
ZipArchive class and SimpleXML methods to parse the source
document. If the ZipArchive class (from the php_zip module) is not available,
Zend_Search_Lucene_Document_Pptx cannot be used with Zend Framework either.
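Since a missing zip extension only shows up when the class is actually used, a script can check for the required extensions up front. This is a minimal sketch; the function name pptxParsingAvailable() is hypothetical and not part of Zend Framework:

```php
<?php
// Hypothetical helper (not part of Zend Framework): reports whether the
// extensions that Zend_Search_Lucene_Document_Pptx depends on are loaded.
function pptxParsingAvailable(): bool
{
    // ZipArchive comes from the zip extension (php_zip on Windows);
    // simplexml_load_string() comes from the SimpleXML extension.
    return class_exists('ZipArchive') && function_exists('simplexml_load_string');
}

if (!pptxParsingAvailable()) {
    echo "PowerPoint 2007 parsing is unavailable: enable the zip and SimpleXML extensions.\n";
}
```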
The Zend_Search_Lucene_Document_Pptx class recognizes document metadata
and document text. Depending on the document contents, the metadata consists of the
filename, title, subject, creator, keywords, description, lastModifiedBy, revision,
modified and created fields.
The 'filename' field is the actual PowerPoint 2007 file name.
The 'title' field is the document title.
The 'subject' field is the document subject.
The 'creator' field is the document creator.
The 'keywords' field contains the document keywords.
The 'description' field is the document description.
The 'lastModifiedBy' field is the name of the user who last modified the document.
The 'revision' field is the document revision number.
The 'modified' field is the date and time the document was last modified.
The 'created' field is the date and time the document was created.
The 'body' field is the content of all slides and slide notes in the PowerPoint 2007 document.
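To illustrate where these fields come from, the sketch below reads them straight from the .pptx package using ZipArchive and SimpleXML. It is not Zend Framework's own implementation: the helper names readPptxCoreProperties() and readPptxSlideText() are hypothetical, and the code assumes the standard Office Open XML layout (core properties in docProps/core.xml, slides in ppt/slides/, speaker notes in ppt/notesSlides/, text runs in a:t elements).

```php
<?php
// Hypothetical helpers (not part of Zend Framework) that approximate how the
// metadata fields and the 'body' text can be read from a .pptx package.

function readPptxCoreProperties(string $filename): array
{
    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) {
        throw new RuntimeException("Cannot open $filename");
    }
    // Core document properties are stored in docProps/core.xml.
    $xml = $zip->getFromName('docProps/core.xml');
    $zip->close();
    $core = ($xml === false) ? false : simplexml_load_string($xml);
    if ($core === false) {
        return [];
    }
    $core->registerXPathNamespace('dc', 'http://purl.org/dc/elements/1.1/');
    $core->registerXPathNamespace('cp', 'http://schemas.openxmlformats.org/package/2006/metadata/core-properties');
    $core->registerXPathNamespace('dcterms', 'http://purl.org/dc/terms/');

    // Field name => qualified element name inside core.xml.
    $fields = [
        'title' => 'dc:title', 'subject' => 'dc:subject', 'creator' => 'dc:creator',
        'keywords' => 'cp:keywords', 'description' => 'dc:description',
        'lastModifiedBy' => 'cp:lastModifiedBy', 'revision' => 'cp:revision',
        'modified' => 'dcterms:modified', 'created' => 'dcterms:created',
    ];
    $props = [];
    foreach ($fields as $name => $qname) {
        $nodes = $core->xpath($qname);
        // Only report fields the document actually contains.
        if (!empty($nodes)) {
            $props[$name] = (string) $nodes[0];
        }
    }
    return $props;
}

function readPptxSlideText(string $filename): string
{
    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) {
        throw new RuntimeException("Cannot open $filename");
    }
    $parts = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $entry = $zip->getNameIndex($i);
        // Slides live in ppt/slides/, speaker notes in ppt/notesSlides/.
        if (preg_match('#^ppt/(slides/slide|notesSlides/notesSlide)\d+\.xml$#', $entry)) {
            $parts[$entry] = $zip->getFromIndex($i);
        }
    }
    $zip->close();
    // Natural sort so that slide10.xml follows slide9.xml.
    ksort($parts, SORT_NATURAL);

    $text = [];
    foreach ($parts as $xml) {
        $doc = simplexml_load_string($xml);
        if ($doc === false) {
            continue;
        }
        $doc->registerXPathNamespace('a', 'http://schemas.openxmlformats.org/drawingml/2006/main');
        // Every run of visible text is an a:t element.
        foreach ($doc->xpath('//a:t') ?: [] as $t) {
            $text[] = (string) $t;
        }
    }
    return implode(' ', $text);
}
```

With a real presentation, readPptxCoreProperties($filename) returns an array keyed like the fields above, containing only the properties present in the file.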
The loadPptxFile() method of the
Zend_Search_Lucene_Document_Pptx class also accepts an optional second
argument. If it is set to TRUE, the body content is also stored
in the index and can be retrieved from it. By default, the body is tokenized and
indexed, but not stored.
Parsed documents may be augmented by the programmer with any other field:
<?php
$doc = Zend_Search_Lucene_Document_Pptx::loadPptxFile($filename);

// Stored but not indexed: returned with hits, but not searchable.
$doc->addField(Zend_Search_Lucene_Field::UnIndexed(
    'indexTime',
    time()));

// Tokenized, indexed and stored.
$doc->addField(Zend_Search_Lucene_Field::Text(
    'annotation',
    'Document annotation text'));

$index->addDocument($doc);