PhpRiot
News Archive
PhpRiot Newsletter
Your Email Address:

More information

The EAV Data Model

Note: This article was originally published at Planet PHP on 21 October 2010.
Planet PHP

One problem many developers encounter while defining and analysing data requirements is the situation where a number of different attributes can be used to describe an object, but only few attributes actually apply to each one. One option is to create a table with a column representing each attribute; this is suitable for objects with a fixed number of attributes, where all or most attributes have values for a most objects. However, in our case we would end up with records where majority of columns would be empty, because attributes may be unknown or inapplicable.

To solve the above problem you can apply the EAV (Entity, Attribute, Value) model. This pattern is also known under several alternative names including aoobject-attribute-valuea model and aoopen schemaa. In the EAV data model only non-empty values are stored in database, where each attribute-value (or key-value) pair describes one attribute of a given entity. EAV tables are often characterized as aolong and skinnya; aolonga refers to multiple rows describing entity, and aoskinnya to the small number of columns used. In this article we will describe the EAV model, its implementation, and show applications of using it in the real world - including examining how Magento makes use of this pattern.

Object attributes are stored in a table with three columns: entity, attribute and value. The entity represents data item being described, for instance a product or a car. The attribute represents data that describes an entity, for instance a product will have a price, weight and many more characteristics. The value is the value of that attribute, for example our product might have an attribute price of A9.99. Additionally values can be segregated based on data type, so there would be separate EAV tables for strings, integer numbers, dates and long text. Splitting the types is done to help support indexing and let the database perform type validation checks where possible.

Sparseness of Attributes

In maths and computer science, if an object only has a few attributes from a potentially large number, we call that a aosparse matrixa. When we talk about the EAV model, we use the term aosparsea to describe attributes where most have no value.

To illustrate this, lets look at a receipt from a supermarket. The supermarket has thousands of products in stock, with new ones being introduced on daily basis and others withdrawn from sale. When customer buys 5 products the receipt only lists details of items actually purchased, one product per row. The receipt does not list every product in the store that customer could have purchased, so we say the customer's receipt is sparse.

In database terms the entity is the sales receipt, with information such as transaction id, date and time, store location, etc. Each detailed line in the receipt corresponds to a record in sale line table and stores an attribute and one or more values. In this scenario an attribute is a product purchased by the customer. The values are quantity, unit price, discount and total price.

The above example illustrates the sparseness of attributes (a customer buys only a selection of the available products) and introduces us to a new term: row modelling. The table is row modelled where the series of facts describing an entity are recorded as multiple rows. Every new set of facts is stored in database as additional rows rather than addtional columns. Row modelling is a standard data modelling technique in designing databases. It should only be used where following two conditions are met:

  • the data for a particular entity is sparse
  • the data is vulnerable to change

Row modelling is inappropriate where sparseness and volatility don't exist, in such case traditional column modelling should be used.

Example of Row Modelling

In this example we have three entities: Product, Customer and Invoice. Both Products and Customers are standard relational tables.

As mentioned earlier in this article, there are two conditions that need to be met for row modelling to be a good choice; the data for a particular entity is sparse, and it is vulnerable to change

We know that products are constantly fluctuating with new being introduced and old withdrawn. At the same time, the invoice table cannot have a column for each product as this would be impractical.

The invoice table contains main information about the event of sale; the customer, the date and the time, and the invoice id. Each invoice is then described by rows recorded in the invoice_lines table. Each row specifies which product was purchased, the price per unit, and the quantity.

"/

Truncated by Planet PHP, read more at the original (another 19708 bytes)