Word 2007 documents

Zend_Search_Lucene offers a Word 2007 parsing feature. Documents can be created directly from a Word 2007 file:
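A minimal sketch of that call, assuming Zend Framework 1 is installed with class autoloading enabled. The helper name indexDocxFile() is hypothetical and only wraps the two library calls, loadDocxFile() and addDocument():

```php
<?php
// Assumes Zend Framework 1 classes are autoloadable.
// indexDocxFile() is a hypothetical helper, not part of the framework.

/**
 * Parse a Word 2007 file and add the result to an open index.
 *
 * @param object $index    an open Zend_Search_Lucene index
 * @param string $filename path to a .docx file
 * @return Zend_Search_Lucene_Document_Docx the parsed document
 */
function indexDocxFile($index, $filename)
{
    // Build a document from the file's meta data and body text.
    $doc = Zend_Search_Lucene_Document_Docx::loadDocxFile($filename);

    // Hand the parsed document to the index.
    $index->addDocument($doc);

    return $doc;
}
```

It could be called as indexDocxFile(Zend_Search_Lucene::create($indexPath), $filename), where both variables are placeholders for your own paths.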


The Zend_Search_Lucene_Document_Docx class uses the ZipArchive class and SimpleXML functions to parse the source document. If the ZipArchive class (from the php_zip module) is not available, Zend_Search_Lucene_Document_Docx cannot be used either.
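Since the parser depends on these extensions, an application may want to check for them before attempting to load a file. The following is a sketch in plain PHP; the helper name docxParsingAvailable() is hypothetical and is not part of Zend Framework:

```php
<?php
/**
 * Report whether the prerequisites named above are present.
 * docxParsingAvailable() is a hypothetical helper name.
 *
 * @return bool TRUE if ZipArchive and SimpleXML can both be used
 */
function docxParsingAvailable()
{
    // ZipArchive is provided by the zip extension;
    // simplexml_load_string() by the SimpleXML extension.
    return class_exists('ZipArchive')
        && function_exists('simplexml_load_string');
}

if (!docxParsingAvailable()) {
    echo "Word 2007 parsing is unavailable: "
       . "the zip and SimpleXML extensions are both required.\n";
}
```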

The Zend_Search_Lucene_Document_Docx class recognizes document meta data and document text. Depending on the document's contents, the meta data consists of filename, title, subject, creator, keywords, description, lastModifiedBy, revision, modified and created.

The 'filename' field is the actual Word 2007 file name.

The 'title' field is the actual document title.

The 'subject' field is the actual document subject.

The 'creator' field is the actual document creator.

The 'keywords' field contains the actual document keywords.

The 'description' field is the actual document description.

The 'lastModifiedBy' field is the name of the user who last modified the document.

The 'revision' field is the actual document revision number.

The 'modified' field is the date and time the document was last modified.

The 'created' field is the date and time the document was created.

The 'body' field is the actual body content of the Word 2007 document. It includes only normal text; comments and revisions are not included.
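Because the set of meta data fields depends on what the file contains, code that reads them should check which fields are actually present on the parsed document. A sketch, assuming $doc came from loadDocxFile(); the helper name docxMetaData() is hypothetical, while getFieldNames() and getFieldValue() are methods of Zend_Search_Lucene_Document:

```php
<?php
// docxMetaData() is a hypothetical helper, not part of the framework.

/**
 * Collect the meta data fields that are present on a parsed document.
 *
 * @param object $doc a Zend_Search_Lucene_Document_Docx instance
 * @return array field name => field value, for populated fields only
 */
function docxMetaData($doc)
{
    // The meta data field names listed above.
    $wanted = array(
        'filename', 'title', 'subject', 'creator', 'keywords',
        'description', 'lastModifiedBy', 'revision', 'modified', 'created',
    );

    // Names of the fields this particular document carries.
    $present = $doc->getFieldNames();

    $meta = array();
    foreach ($wanted as $name) {
        // Skip fields the source file did not provide.
        if (in_array($name, $present, true)) {
            $meta[$name] = $doc->getFieldValue($name);
        }
    }

    return $meta;
}
```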

The loadDocxFile() method of the Zend_Search_Lucene_Document_Docx class also takes an optional second argument. If it is set to TRUE, the body content is also stored in the index and can be retrieved from it. By default, the body is tokenized and indexed, but not stored.
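A short sketch of the second argument in use, assuming Zend Framework 1 classes are autoloadable and $index is an open index. The helper name indexDocxWithStoredBody() is hypothetical:

```php
<?php
// indexDocxWithStoredBody() is a hypothetical helper, not part of the framework.

/**
 * Index a Word 2007 file and keep its body text in the index.
 *
 * @param object $index    an open Zend_Search_Lucene index
 * @param string $filename path to a .docx file
 * @return Zend_Search_Lucene_Document_Docx the parsed document
 */
function indexDocxWithStoredBody($index, $filename)
{
    // TRUE as the second argument: store the body as well as indexing it.
    $doc = Zend_Search_Lucene_Document_Docx::loadDocxFile($filename, true);

    $index->addDocument($doc);

    return $doc;
}
```

For a document indexed this way, the body text should be readable from a search hit as $hit->body. With the default setting the body can be searched but not retrieved.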

Parsed documents may be augmented by the programmer with any other field:

$doc->addField(Zend_Search_Lucene_Field::Text('annotation',
    'Document annotation text'));