Creating Pluggable Applications Using Data Sourcing

Note: This article was originally published at Planet PHP on 12 July 2010.

The first versions of most projects are self-contained applications. They work as-is, without any connection to other applications. It often isn't until a later release that the focus shifts to interoperability: developers build import/export functionality into their applications or add web services that allow other applications to interact with them.

While these are important steps toward application interoperability, one step is often still missing. Most interoperable applications lack one final feature that allows fully seamless integration: data sourcing, or the ability to get the data they need from elsewhere.

With data sourcing, we are not just importing data into our applications; we are using outside sources as the source for that data, without creating redundancy. A simple example is the data sourcing of user information. Most applications have their own user table. Applications that feature data sourcing of users make it possible to tell the system to get the user data not from its internal database but from a different source, for example the database of another application, an LDAP server or a web service that provides the user data. If you have five applications that each have a database of users, it would be a lot simpler to integrate those applications if you could use one of them as the master source for the user data and configure the others to refer to it.
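As a rough illustration of the idea, the sketch below hides user lookups behind one small interface and lets configuration decide where the data comes from. It is written in Python for brevity; the same shape would apply in PHP. All names here (`UserSource`, `InternalUserSource`, `ExternalUserSource`, `build_user_source`, and the configuration keys) are hypothetical, and the `fetch` callable merely stands in for an LDAP query, a web service call or a query against another application's database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Protocol


@dataclass(frozen=True)
class User:
    id: str
    name: str
    email: str


class UserSource(Protocol):
    """Anything that can look up a user by id (hypothetical interface)."""

    def find(self, user_id: str) -> Optional[User]: ...


class InternalUserSource:
    """Stands in for the application's own user table."""

    def __init__(self, rows: Dict[str, User]) -> None:
        self._rows = rows

    def find(self, user_id: str) -> Optional[User]:
        return self._rows.get(user_id)


class ExternalUserSource:
    """Delegates lookups to another system through a fetch callable.

    The callable is a placeholder (an assumption of this sketch) for
    whatever transport the master system actually offers.
    """

    def __init__(self, fetch: Callable[[str], Optional[dict]]) -> None:
        self._fetch = fetch

    def find(self, user_id: str) -> Optional[User]:
        record = self._fetch(user_id)
        if record is None:
            return None
        return User(id=user_id, name=record["name"], email=record["email"])


def build_user_source(config: dict) -> UserSource:
    """Pick the user source named in the configuration (assumed shape)."""
    kind = config.get("user_source", "internal")
    if kind == "internal":
        return InternalUserSource(config.get("rows", {}))
    if kind == "external":
        return ExternalUserSource(config["fetch"])
    raise ValueError(f"unknown user source: {kind!r}")
```

The rest of the application only ever calls `find()`, so switching four of the five applications to the master source becomes a configuration change rather than a rewrite.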

The principle is applicable to more than just users. Groups come to mind (particularly groups within an organization that you may want to use within your applications), and friends are another common case (aren't you tired of befriending all your friends on every new social website?). In the case of ecommerce systems it would be great if you could use data sourcing to get the actual product data from different systems. Magento, the popular ecommerce application, has product import/export functionality, but there is no easy way to tell it to connect to a web service to get the data for products instead of looking at its own database. This makes it hard to plug Magento as an ecommerce module into a larger system; most implementations you will find have Magento at their core and other, more flexible, systems plugging into it.

Data sourcing can help applications such as Magento reach a wider audience, and especially help them be used in enterprise scenarios where many components together form one large system.

The concept of data sourcing is comparable to Dependency Injection; instead of hardwiring the dependencies within the software, you tell it its dependencies so that at runtime it can connect to the correct components and get its data.

Two Flavours of Data Sourcing

There are, in essence, two ways to implement data sourcing.


The first one is synchronisation. This means that the data is still local to the application, but it is (periodically or on the fly) synchronised with external applications. For the application being plugged into a system, this generally means that hardly any modifications are necessary; a script needs to be written that simply synchronises the data between the sources.

While this works in some situations, it is undesirable in most. If you have multiple applications you will end up with multiple copies of the data, and run the risk of having data out of sync between the sources. It can also lead to ownership or privacy issues. One application should be the owner of a data set; if that data set is exported to other applications then you lose a certain amount of control over it.
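To make the synchronisation approach concrete, here is a minimal sketch of such a script in Python. It assumes, purely for illustration, that both the master and the local user store can be treated as dictionaries keyed by user id; the function name `synchronise` and the shape of the summary are hypothetical.

```python
from typing import Dict


def synchronise(master: Dict[str, dict], local: Dict[str, dict]) -> Dict[str, int]:
    """Make the local user copy match the master and report what changed."""
    summary = {"added": 0, "updated": 0, "removed": 0}

    # Copy over every record that is new or differs from the master.
    for user_id, record in master.items():
        if user_id not in local:
            summary["added"] += 1
        elif local[user_id] != record:
            summary["updated"] += 1
        else:
            continue
        local[user_id] = dict(record)  # copy, so the two stores stay independent

    # Drop local records that no longer exist in the master.
    for user_id in list(local):
        if user_id not in master:
            del local[user_id]
            summary["removed"] += 1

    return summary
```

Note that the local store is only correct at the moment the script finishes. Any change made in the master afterwards stays invisible locally until the next run, which is exactly the out-of-sync risk described above.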


The better option is federation. Federation means that you get the data from its source when you need it. You could still cache for performance reasons, but there is no mass synchronisation going on. This is the method we will look at in the rest of this article: it is the more interesting of the two, but also the one that requires work in the applications that want to use it.
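A federated lookup with a small cache could look roughly like the Python sketch below. The class name `FederatedUserSource`, the `fetch` callable and the time-to-live parameter are assumptions made for this illustration; `fetch` stands in for whatever call reaches the system that owns the data.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FederatedUserSource:
    """Fetch users from the owning system on demand, with a short-lived cache."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[dict]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # hypothetical call into the owning system
        self._ttl = ttl_seconds  # how long a cached answer may be reused
        self._clock = clock      # injectable so the expiry logic can be tested
        self._cache: Dict[str, Tuple[float, Optional[dict]]] = {}

    def find(self, user_id: str) -> Optional[dict]:
        now = self._clock()
        cached = self._cache.get(user_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        # Cache miss or expired entry: go back to the source of truth.
        record = self._fetch(user_id)
        # "Not found" answers (None) are cached too, for the same TTL.
        self._cache[user_id] = (now, record)
        return record
```

Only the records that are actually requested ever reach the application, and each cached copy expires on its own, so the owning system keeps control of the data set.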

Implementing Data Sourcing

Imagine you have built an application that shows you a person's wishlist. You may have a query in there somewhere that joins the wishlist table with the user table and a few auxiliary tables with categorisation information. If the application is built like that, and you decide to install the application within an environment with multiple applications, it will be hard to make the application use user accounts from another application; you will have to not only rewrite all the queries, you will also have to find a way to connect the application data with the external user accounts. There are a few

Truncated by Planet PHP, read more at the original (another 15932 bytes)