News Archive
PhpRiot Newsletter
Your Email Address:

More information

PHPCR on Doctrine DBAL

Note: This article was originally published at Planet PHP on 6 August 2012.
Planet PHP

So I have noticed that people don't like it when I talk about all the cool stuff Jackrabbit can do. Many people are still scared of running Java stuff in production which I guess is to be expected since PHP shops tend to .. guess what .. PHP. So in this post I just want to talk about all the cool features we have ready to use in the pure PHP Doctrine DBAL based implementation of PHPCR. Just to say it again: PHP, no Java. So first up the implementation with all its features works with MySQL, PostgreSQL and SQLite. Given that we started with MySQL we ended up relying on few specific MySQL behaviors. These are all gone now, so adding another RDBMS is likely just a half days work, maybe a day if you look at the code base for the first time, then again the relevant code to edit are just a few places in two classes (Jackalope\Transport\DoctrineDBAL\Client and Jackalope\Transport\DoctrineDBAL\Query\QOMWalker). At any rate the implementation essentially gives you a tree document store on top of an RDBMS with support for references and so called node types allowing you to optionally add constraints to your document structure. Documents can have any number of properties which can choose from a wide set of datatypes like string, integer, date, url and binary. You also get both an OO and string based query API.

Lets briefly step back and discuss why the features I just mentioned are even defined in PHPCR. First up I think the fact that this API gives you native support for tree structures is very important. If all you are dealing with is a couple of pages in a flat structure then likely your needs are so trivial that you likely don't need a CMS to begin with though you might still choose to use a CMS due to expected future needs. But if you get to a half a dozen or more pages, quickly you will want to structure the pages into a tree structure. Furthermore a page itself might even be constructed of a tree structure of elements. Now you might be thinking great, but I want to reuse content pieces all over the site without having to duplicate the content. For this purpose PHPCR supports the concept of WEAKREFERENCEs and REFERENCEs. The difference is that the later also checks for referential integrity. So you can simple references documents to reuse documents inside your tree structure. This is supported in the implementation by all of the above listed RDBMS.

Now the next big feature is node types. In the opening paragraph I called this solution a "document store". I intentionally didn't call it a NoSQL database because while there is a quite a hype around them, followed by quite a bit of backlash. But what the term implies is more or less irrelevant to what people associate with NoSQL database. If CouchDB uses SQL or not isn't what makes it unique from an RDBMS, instead the fact that it doesn't have the concept of a fixed number of columns per table into which every document needs to fit in is what differentiates it. MongoDB came up with their own query syntax, but it could just as well been SQL. Well actually it gets worse, the departure from tables and columns is then also coined "schema less", which is even more confusing. MongoDB has the concept of collections, that is a schema. Most other "NoSQL" solutions do not require this, so at most one could say that they have no explicit schema, but they certainly have an implicit schema, ie. the structure of their real world content defines a de-facto schema. But actually many solutions even provide ways to define an explicit schema. For example in CouchDB you can add validation rules and you can define views etc. In PHPCR this feature is provided via node types. This optional feature can be used to prevent the content structure to get out of control. It can also be used to make it possible for tools introspecting the data structure to be able to rely on the availability of certain properties.

Lets talk about how to use this in practical terms. Your CMS might define a "page" node type. As children you might only allow nodes of type "widget" or type "media". You might define additional node types that extend from "media" like "image", "video" etc. This way people could use a generic browser on top of their content repository but could still be prevented from breaking the application by setting up a bogus content structure. Furthermore the type "media" would also define that certain properties must be set like "mime type", "creation time" etc. It is now also possible to wri

Truncated by Planet PHP, read more at the original (another 6240 bytes)