Symfony CMF: why, how, when (part II)
Continuing from my post the other day, I will try to answer some of the questions that keep recurring.
The first question I want to address is why JCR?
While we were making some decisions over the summer, it became clear fairly quickly that we wanted a clean separation of the storage and frontend layers. One big reason was to become database agnostic (ORM vs. ODM vs. ..), but also to ease scaling and deployment and, most importantly, to ease development (single server vs. cluster). So we needed a high level API to enable such an abstraction. JCR provides us with such an API, one that has matured over a period of a decade. Furthermore it comes with a reference implementation of the backend, called Jackrabbit. This way we can focus on the frontend and on implementing a simplified pure PHP backend, while giving power users a full featured, scalable solution out of the box.
But Java, wtf!?!
Indeed JCR was born in the Java world, though mainly because day.com wanted to leverage the JCP system to develop a truly open standard. To some extent they have failed there, since in the end this caused the standard to be mostly confined to the Java world, but the lack of scripting language support is something they want to remedy in the next version of the standard. Also, with Jackalope we have a compatibility layer that translates the deep, object heavy structures common in the Java world into the more agile, PHP friendly structures we defined for PHPCR, which make use of simpler arrays and iterators wherever possible. Furthermore, Java based Lucene solutions like Solr and ElasticSearch have been taking the PHP world by storm for fulltext searching. Installing a JVM as part of a PHP project isn't as insane as it used to be. And of course the plan is to eventually provide a pure PHP backend as an alternative, meaning in the end one can also run a Symfony CMF application without Java, and thanks to the standardized API it will be possible to move between different content repository implementations as needed!
Doesn't MongoDB/CouchDB ODM already offer all of that?
First up, it should be mentioned that all of these NoSQL solutions are still very young as well. So many people will be just as hesitant to adopt, for example, MongoDB as they would be with any other database they have not yet used. That being said, a NoSQL store does lend itself better as the storage layer of a CMS than an RDBMS. The reason is that a CMS has very specific requirements. One of the biggest clashes with an RDBMS is the need to handle mostly unstructured content. At the same time a CMS usually doesn't need reporting queries (aka GROUP BY or CUBE functions) run on its content. Bulk updates are also fairly rare. Where RDBMS are optimized for equality conditions or prefix searches, a CMS also has to excel at full text searching, which most NoSQL solutions do not fare too well with out of the box either. As a matter of fact MongoDB clearly admits this, and CouchDB uses Lucene based solutions to enable full text search. JCR takes a very similar approach to CouchDB by basically providing two search APIs: an SQL inspired, full text enabled search API, which Jackrabbit implements by integrating Lucene, and a PK/traversal API. The "traversal API" is very important to note, since most NoSQL solutions do not really offer anything equivalent when it comes to managing trees or even graphs of nodes and node references. Versioning is another feature that most NoSQL solutions do not offer out of the box, but JCR provides it.
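To make the difference between the two access styles a bit more concrete, here is a rough toy sketch in Python. It is not the JCR or PHPCR API: the `Node` class and the `get_node`, `walk` and `search` helpers are names I made up purely for illustration, and the "search" is a naive substring match standing in for what a real repository would hand off to something like Lucene.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy content node: a name, free-form properties, and named children."""
    name: str
    properties: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)

    def add(self, child):
        """Attach a child node under its own name and return it."""
        self.children[child.name] = child
        return child


def get_node(root, path):
    """Traversal-style access: walk an absolute path such as '/cms/blog/post-1'."""
    node = root
    for segment in path.strip("/").split("/"):
        if segment:  # an empty segment means the path was just '/'
            node = node.children[segment]
    return node


def walk(node, path=""):
    """Yield (path, node) for a node and all its descendants, depth first."""
    yield path or "/", node
    for child in node.children.values():
        yield from walk(child, f"{path}/{child.name}")


def search(root, term):
    """Search-style access: naive case-insensitive match over string properties."""
    term = term.lower()
    return [
        path
        for path, node in walk(root)
        if any(isinstance(v, str) and term in v.lower()
               for v in node.properties.values())
    ]


# Hypothetical content tree, only for illustration.
root = Node("")
cms = root.add(Node("cms"))
blog = cms.add(Node("blog", {"title": "Blog"}))
blog.add(Node("post-1", {"title": "Why JCR?", "body": "A content repository API."}))
blog.add(Node("post-2", {"title": "PHPCR ODM", "tags": ["odm", "phpcr"]}))

print(get_node(root, "/cms/blog/post-1").properties["title"])
print(search(root, "repository"))
```

The point of the sketch is only that path lookups and tree walks work without any query at all, while the search goes across the whole tree regardless of where a node lives. Note also that the two posts carry different properties, which is the "mostly unstructured content" case a CMS has to handle.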
In other but related news, community contributions to PHPCR ODM have been picking up. I cleaned up the unit tests yesterday. All the while we have been looking at a pull request implementing child nodes, which led to big changes in how we deal with ODM metadata and has now culminated in the contribution of a patch to Jackrabbit. Then yesterday a new pull request was opened to add lifecycle support. Awesome!