A Stitch in Time Saves Nine

Note: This article was originally published at Planet PHP on 5 December 2011.

The N+1 Problem

It is easiest to describe the N+1 problem by example. Take a look at the following bit of PHP in a hypothetical blog app:

    // 1 query to get the first 10 blog posts
    $stmt = 'SELECT * FROM posts LIMIT 10';
    $posts = $sql->fetchAll($stmt);

    // 10 queries (1 per blog post) to get
    // the comments for each post
    foreach ($posts as &$post) {
        $stmt = 'SELECT * FROM comments WHERE post_id = :post_id';
        $bind = array('post_id' => $post['id']);
        $post['comments'] = $sql->fetchAll($stmt, $bind);
    }

We have a master set of rows (the blog posts), but we need to additionally fetch many rows of comments for each blog post. To do that, we loop through the master set of rows and issue one query for each. This gets us the many rows of comments for that blog post.

This is the N+1 problem. In order to build up a list of 10 records, we issued 11 queries: 1 for the master set of 10 records, and then 10 more (N) queries to fetch the comments. If we also needed to get multiple tags for each post, that would be another 10 queries on top of these, for a total of 21 queries to build 10 records.

This can quickly become a performance problem, even for small sets of data with many relationships. It is especially bad when you need to fetch, for example, 5 additional sets of related data for a master set of 40,000 rows. There will be 1 query to get the master set of 40,000 rows, plus 5 queries per row for the related data, for 200,001 queries in total.

The Solution

There are at least two ways to solve the N+1 problem and reduce the number of queries dramatically.

One way is to write a query that joins the comments to the posts in a single massive result set. We will see repetition of the same blog posts over and over, so we would need to loop through the result set and pick out the distinct posts and retain the multiple comments for each one. There are several drawbacks:

  1. It's hard to do LIMIT/OFFSET on the master set of rows that we want to retrieve.

  2. We get back an oversized result from the database with lots of repeated information. To extract the related data, we need to loop through the result set in PHP and discard the repeated parts of rows, keeping track of which ones were repeated and which ones are new.

  3. If we have two or more "to-many" relationships, it becomes difficult to construct the query and then pick apart the result set into your domain objects. The repetition problem from point 2 becomes exponentially more troublesome.
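To make drawback 2 concrete, here is a rough sketch of picking apart a JOINed result set. The column aliases and sample rows are hypothetical stand-ins for what a query like `SELECT p.id, p.title, c.id AS comment_id, c.body FROM posts p LEFT JOIN comments c ON c.post_id = p.id` would return; in real code `$rows` would come from the database.

```php
<?php
// Sample rows as the single JOIN query would return them:
// each post repeats once per comment it has (drawback 2).
$rows = array(
    array('id' => 1, 'title' => 'First',  'comment_id' => 10,   'body' => 'Nice'),
    array('id' => 1, 'title' => 'First',  'comment_id' => 11,   'body' => 'Thanks'),
    array('id' => 2, 'title' => 'Second', 'comment_id' => null, 'body' => null),
);

// Loop through the oversized result set, keeping each
// distinct post once and collecting its comments.
$posts = array();
foreach ($rows as $row) {
    $id = $row['id'];
    if (! isset($posts[$id])) {
        $posts[$id] = array(
            'id'       => $id,
            'title'    => $row['title'],
            'comments' => array(),
        );
    }
    // A LEFT JOIN yields a null comment_id for posts
    // with no comments at all.
    if ($row['comment_id'] !== null) {
        $posts[$id]['comments'][] = array(
            'id'   => $row['comment_id'],
            'body' => $row['body'],
        );
    }
}
```

Note how much bookkeeping the loop needs just to undo the repetition, and this is with only one to-many relationship in play.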

Instead, I suggest a way to solve the N+1 problem that is easier to implement and more scalable across several to-many relationships. After we query for the master set of rows, we issue a single additional query to get all of the related rows in one shot, and then stitch them into the master set in a PHP loop. The following is a rewritten version of the earlier example using the query-and-stitch technique:

    // 1 query to get the first 10 blog posts
    $stmt = 'SELECT * FROM posts LIMIT 10';
    $rows = $sql->fetchAll($stmt);

    // Find the ID of each post and key a
    // $posts array on them.
    $posts = array();
    foreach ($rows as $post) {
        $id = $post['id'];
        $posts[$id] = $post;
    }

    // 1 additional query to get all
    // comments for all the posts.
    $stmt = 'SELECT * FROM comments WHERE post_id IN (:post_ids)';
    $bind = array('post_ids' => array_keys($posts));
    $rows = $sql->fetchAll($stmt, $bind);

    // Stitch the comment rows into the posts.
    // It is easy to find the right post for
    // the comment, because we have keyed the
    // posts on their ID.
    foreach ($rows as $comment) {
        $id = $comment['post_id'];
        $posts[$id]['comments'][] = $comment;
    }

We now have one additional loop in the PHP code compared to the original version. The tradeoff is that we have saved ourselves 9 queries overall; we have gone from 11 (1 for the posts and 10 for the comments) down to 2 (1 for the posts and 1 for the comments).
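One practical note: plain PDO will not expand an array bound to a single :post_ids placeholder, so the $sql helper used above is assumed to do that expansion for you. Without such a helper, you can build the IN list yourself. A minimal sketch, assuming integer primary keys (the intval() cast stands in for proper quoting):

```php
<?php
// Assume $posts is already keyed on post ID,
// as in the stitching example above.
$posts = array(
    1 => array('id' => 1),
    3 => array('id' => 3),
    7 => array('id' => 7),
);

// Build the IN (...) list by hand. Casting each key
// to int keeps the interpolation safe, assuming the
// IDs are integer primary keys.
$ids  = array_map('intval', array_keys($posts));
$stmt = 'SELECT * FROM comments WHERE post_id IN ('
      . implode(', ', $ids)
      . ')';
```

For string keys you would quote each value through your database layer instead of casting to int.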

Extending The Solution

We can add rows from as many relationships as we like using the same technique. For example, if we also needed to fetch multiple tags in a many-to-many relationship for each post, we would do something like this:
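The article's own code for this is truncated below, but as an illustration only, the stitch for tags could look like the following sketch. The tags and posts_tags tables are hypothetical, and the query result is inlined as sample data so the loop stands alone:

```php
<?php
// Sample result of 1 additional query against a
// hypothetical many-to-many schema, e.g.:
//   SELECT pt.post_id, t.name
//   FROM tags t
//   JOIN posts_tags pt ON pt.tag_id = t.id
//   WHERE pt.post_id IN (...)
$rows = array(
    array('post_id' => 1, 'name' => 'php'),
    array('post_id' => 1, 'name' => 'sql'),
    array('post_id' => 2, 'name' => 'performance'),
);

// $posts is keyed on post ID, exactly as in the
// comments example.
$posts = array(
    1 => array('id' => 1),
    2 => array('id' => 2),
);

// Stitch the tag rows into the posts: one extra query
// and one extra loop per relationship, regardless of
// how many posts there are.
foreach ($rows as $tag) {
    $id = $tag['post_id'];
    $posts[$id]['tags'][] = $tag['name'];
}
```

Each additional relationship costs 1 query and 1 loop, so fetching posts with comments and tags takes 3 queries instead of 21.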

Truncated by Planet PHP, read more at the original (another 4261 bytes)