Collecting Garbage: Cleaning Up
Part one is here.
In this second part of the three part column on the new garbage collecting mechanism in PHP 5.3, we'll dive into a solution to the problem with circular references. If we look quickly back, we found that by using code like the following, an in-request memory leak is created:
Traditionally, reference counting memory mechanisms such as PHP uses, fail to address those circular reference memory leaks. Back in 2007 while looking into this issue, somebody pointed me to a paper by David F. Bacon and V.T. Rajan titled "Concurrent Cycle Collection in Reference Counted Systems" . Although the paper was written with Java in mind, I started to play around with it to see if it was feasible to implement the synchronous algorithm as outlined in the paper in PHP. At that moment I didn't have a lot of time, but along came the Google Summer of Code and we put up the implementation of this paper as one of our ideas. Yiduo (David) Wang picked up this idea and started hacking on the first version as part of the Summer of Code project.
Explaining how the full algorithm works goes slightly too far for this column, but I will try to explain the basics. First of all we have to establish a few ground rules. If a refcount is increased, it's still in use and therefore not garbage. If the refcount is decreased and hits zero, the zval can be freed. This means that garbage cycles can only be created when a refcount argument is decreased to a non-zero value. Secondly, in a garbage cycle it is possible to discover which parts are garbage by checking whether it is possible to decrease their refcount by one, and then check which of the zvals have a refcount of zero.
To avoid having to call the checking of garbage cycles with every possible decrease of a refcount the algorithm instead puts all possible roots (zvals) in the "root buffer" (marking them "purple"). It also makes sure that each possible garbage root ends up in the buffer only once. Only when the root buffer is full, the collection mechanism starts for all the different zvals inside. This diagram shows this in step A.
In step B runs the algorithm a depth-first search on all possible roots to decrease the refcounts of each zval it finds by one, making sure not to decrease a refcount on the same zval twice (marking them as "grey"). In step C the algorithm runs again a depth-first search from each root node, to check the refcount of each zval again. If it finds that the refcount is zero, the zval is marked "white" (blue in the diagram). If it's larger than zero, it reverts the decreasing of the refcount by one with a depth-first search from that point on and they are marked "black" again. In the last step (D) the algorithm walks over the root buffer removing the zval roots from there, and in the mean while checks which zvals have been marked "white" in the previous step. Every zval marked as "white" will be freed.
Now that you have a slight understanding of how the algorithm works, we will look back on how this weaves in with PHP. By default, PHP's garbage collector is turned on. There is however a php.ini setting that allows you to change this: zend.enable_gc.
When the garbage collector is turned on, the cycle finding algorithm as mentioned above is executed whenever the root buffer runs full. The root buffer has a fixed size of 10.000 possible roots (although you can change this easily by re-compiling PHP and changing a constant). When the garbage collector is turned off, the cycle finding algorithm will never run. However, possible roots will always be recorded in the root buffer, no matter whether the garbage collection mechanism has been activated with this configuration setting. If the root buffer becomes full with possible roots while the garbage collection mechanism is turned off, further possible roots will simply not be recorded. Those possible roots that are not recorded
Truncated by Planet PHP, read more at the original (another 2268 bytes)