Symfony CMF: why, how, when (part III)
And now for the final post in this series, unless I start to get some serious questions :)
But if other NoSQL databases are not trustworthy, why are JCR implementations?
First up, plenty of people run very successful large scale, mission critical applications on top of MongoDB and friends. The interesting bit is that JCR is an API, and implementations are free to choose any database for final persistence. Jackrabbit, for example, can be configured to use any RDBMS or even the file system for storage. Our own PHP backend implementation could just as easily support MongoDB, CouchDB or any RDBMS.
Wait a second, so why couldn't we have gone with a NoSQL database again?
Again, JCR is an API intended to solve the CMS problem, and exactly that. Emphasis on API. Because it's an API, you can implement it on top of different storage layers, maintaining whatever preferences you might have. So if you also need an RDBMS to store reporting data, you might prefer to store the CMS content inside the same RDBMS. Likewise, if you are already using another NoSQL solution for some other content, you might appreciate using it for the CMS content as well. So with JCR we simply ensure that we have a high level API that enables this choice, while leveraging a common frontend code base that the entire community can share.
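To make the "emphasis on API" point concrete, here is a minimal Python sketch of the idea, not the JCR or PHPCR API. It assumes a hypothetical `ContentStore` interface with two interchangeable backends: `DictStore`, an in-memory stand-in for a document (NoSQL) database, and `SqliteStore`, which uses SQLite as a stand-in for an RDBMS. The hypothetical `render_title` function plays the role of the shared frontend code that never needs to know which backend is in use.

```python
import json
import sqlite3
from abc import ABC, abstractmethod


class ContentStore(ABC):
    """Hypothetical storage-agnostic API: callers only see paths and property dicts."""

    @abstractmethod
    def save(self, path: str, properties: dict) -> None: ...

    @abstractmethod
    def load(self, path: str) -> dict: ...


class DictStore(ContentStore):
    """Stand-in for a document (NoSQL) database: one JSON document per path."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def save(self, path, properties):
        self._docs[path] = json.dumps(properties)

    def load(self, path):
        # Raises KeyError for unknown paths, like SqliteStore below.
        return json.loads(self._docs[path])


class SqliteStore(ContentStore):
    """Stand-in for an RDBMS that may also hold other (e.g. reporting) tables."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS content (path TEXT PRIMARY KEY, props TEXT)")

    def save(self, path, properties):
        self._conn.execute(
            "INSERT OR REPLACE INTO content (path, props) VALUES (?, ?)",
            (path, json.dumps(properties)),
        )

    def load(self, path):
        row = self._conn.execute("SELECT props FROM content WHERE path = ?", (path,)).fetchone()
        if row is None:
            raise KeyError(path)
        return json.loads(row[0])


def render_title(store: ContentStore, path: str) -> str:
    """Shared 'frontend' code: it only talks to the ContentStore interface."""
    return store.load(path)["title"].upper()
```

Saving `{"title": "Welcome"}` under `/cms/home` in either store and then calling `render_title` returns `"WELCOME"` both times; swapping the storage layer does not touch the frontend code.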
Ok, so how do JCR implementations and databases play together?
Let's take Jackrabbit as an example. It basically uses two "databases": one for final persistence and the other for full text search. The choice of final persistence layer depends mostly on what sort of tools one requires. For example, transaction capabilities depend on the storage layer. The same goes for replication and clustering, and backup and failover also depend on the storage layer, at least to some extent. Again, JCR is an API (I know I am repeating myself). Now, Jackrabbit basically stores the contents of each node as a blob indexed by its primary key, along with some structures that provide efficient tree traversal. It delegates search to Lucene. Again, this model works well because content in a CMS isn't used in reporting queries. JCR implementations can use these assumptions to focus on efficiently solving the CMS problem rather than trying to be general purpose solutions.
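The storage model described above can be illustrated with a toy Python sketch. This is not Jackrabbit's actual schema or code: the table names, the helper functions and the word-based inverted index (standing in for Lucene) are all hypothetical. Node properties go into one opaque blob per primary key, a separate `children` table supports tree traversal without parsing any blob, and full text search is answered from a separate index.

```python
import json
import re
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
# Node properties are opaque to the database: one serialized blob per primary key.
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, blob TEXT NOT NULL)")
# A separate parent/child table makes tree traversal cheap without parsing blobs.
conn.execute("CREATE TABLE children (parent_id INTEGER, name TEXT, child_id INTEGER, "
             "PRIMARY KEY (parent_id, name))")

# Toy inverted index standing in for the separate full text engine (Lucene in Jackrabbit).
fulltext: dict[str, set[int]] = defaultdict(set)


def add_node(parent_id, name, properties):
    """Store the node blob, link it to its parent and index its string properties."""
    cur = conn.execute("INSERT INTO nodes (blob) VALUES (?)", (json.dumps(properties),))
    node_id = cur.lastrowid
    if parent_id is not None:
        conn.execute("INSERT INTO children VALUES (?, ?, ?)", (parent_id, name, node_id))
    for value in properties.values():
        if isinstance(value, str):
            for word in re.findall(r"\w+", value.lower()):
                fulltext[word].add(node_id)
    return node_id


def child(parent_id, name):
    row = conn.execute("SELECT child_id FROM children WHERE parent_id = ? AND name = ?",
                       (parent_id, name)).fetchone()
    return None if row is None else row[0]


def resolve(path, root_id):
    """Walk a path like '/news/launch' one segment at a time using only the children table."""
    node_id = root_id
    for segment in filter(None, path.split("/")):
        node_id = child(node_id, segment)
        if node_id is None:
            return None
    return node_id


def load(node_id):
    return json.loads(conn.execute("SELECT blob FROM nodes WHERE id = ?", (node_id,)).fetchone()[0])


def search(word):
    """Answer full text queries from the index, never from the node blobs."""
    return sorted(fulltext.get(word.lower(), set()))
```

The trade-off is the one the post describes: path lookups and full text search are cheap, but because the database only sees opaque blobs, a reporting query such as "sum the prices of all products" would have to load and decode every node.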
But I have user stats, orders and inventory that I do need to run reporting queries on! What now?
Simple: do not use the JCR API for this data. Simply store references where needed, and use the best tool for the job. So in a webshop you might use Symfony CMF to store the product information, while using an RDBMS to manage orders and inventory. Eventually we will likely be able to provide a solution that helps manage these references so that, for example, it would be possible to lazy load references in both directions.
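A minimal Python sketch of this split, assuming a plain dictionary (`cms_products`) stands in for the content repository and SQLite stands in for the shop's RDBMS. All names are hypothetical. Each order row stores only a reference (here the content path) to the product; reporting runs entirely in SQL, and the CMS is consulted only when product details are actually needed.

```python
import sqlite3

# Stand-in for the content repository: product pages keyed by a stable identifier.
cms_products = {
    "/cms/products/mug": {"title": "Coffee mug", "description": "Holds 300 ml"},
    "/cms/products/shirt": {"title": "T-shirt", "description": "100% cotton"},
}

shop = sqlite3.connect(":memory:")
# Orders live in the RDBMS; each row only stores a reference (the content path) to the product.
shop.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product_ref TEXT NOT NULL, "
             "quantity INTEGER NOT NULL)")
shop.executemany("INSERT INTO orders (product_ref, quantity) VALUES (?, ?)",
                 [("/cms/products/mug", 2), ("/cms/products/shirt", 1), ("/cms/products/mug", 3)])


def units_sold_per_product():
    """Reporting query: runs entirely in SQL, never touching the CMS."""
    rows = shop.execute("SELECT product_ref, SUM(quantity) FROM orders "
                        "GROUP BY product_ref ORDER BY product_ref")
    return dict(rows.fetchall())


def product_for_order(order_id):
    """Follow the stored reference into the CMS only when product details are needed."""
    (ref,) = shop.execute("SELECT product_ref FROM orders WHERE id = ?", (order_id,)).fetchone()
    return cms_products[ref]
```

Here `units_sold_per_product()` returns `{"/cms/products/mug": 5, "/cms/products/shirt": 1}` without reading any CMS content, while `product_for_order(2)` resolves the reference lazily to the T-shirt page.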
All dandy, but I really do not want to install Java
Fair enough. Right now, work hasn't started on a PHP backend. That being said, a simple implementation shouldn't take too long to write. Versioning might not make it into that initial implementation, though with CouchDB one could come up with a solution for simple versioning fairly easily. So if you need it, just start it; people will be ready to help. Liip, for example, doesn't have much of a need for this, so we haven't started it. Most of our projects use the Java-based Solr anyway, so there isn't any reason not to simply use Jackrabbit. But we do recognize that it's important for the long term success of the Symfony CMF to be as flexible as possible in terms of requirements. After all, this was one of the main reasons for choosing JCR in the first place!
So when will this all be ready?
I am hoping that we can have a first release this summer that provides a basic toolchain, and by the end of the year I hope we will have a decent PHP backend implementation. Then again, all of this could come sooner if we get more contributions from the community. The recent increase in progress on PHPCR ODM makes me quite hopeful.


