What is Mutation Testing?
Some time ago, in between working on Zend Framework, I booted up a couple of libraries that I really wanted to integrate into my workflow. Recently, I've been putting these through the mill so they can be properly released and supported for public consumption via PEAR. Just as Mockery fell out of older work on PHPMock, Mutagenesis will fall out of another project called MutateMe. This is a short introductory article about what Mutagenesis will do and why. In other words, what the heck is Mutation Testing?
First, some background.
The most common means of measuring confidence in a test suite is the Code Coverage metric. Code Coverage essentially checks, on a per-class basis, how many of the lines of code in the class are executed by a test suite and expresses this as a percentage. For example, a Code Coverage of 85% means 85% of the lines of code in a class were executed and 15% were not. The greater the number of lines of code executed, the more confidence one can presumably have that a test suite is doing its job, i.e. verifying class behaviour, preventing the introduction of bugs, supporting refactoring, and so on.
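To make the metric concrete, here's a minimal sketch using a hypothetical Price class (not from the article): the "test" at the bottom exercises one of its two methods, so a line-coverage tool such as PHPUnit's coverage reporter would count the lines of the other method as unexecuted.

```php
<?php
// Hypothetical class: only discount() is exercised by the check below,
// so a line-coverage report would flag surcharge()'s lines as uncovered.
class Price
{
    public function discount(float $amount, float $percent): float
    {
        return $amount - ($amount * $percent / 100);
    }

    public function surcharge(float $amount, float $percent): float
    {
        return $amount + ($amount * $percent / 100);
    }
}

$price = new Price();
// Executes discount() only; roughly half the class's executable
// lines never run, so coverage sits well below 100%.
assert($price->discount(100.0, 15.0) === 85.0);
```

A real project would gather this via PHPUnit's coverage options rather than by hand, but the counting principle is the same: lines executed divided by executable lines.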
I have a huge and insurmountable problem with Code Coverage. For starters, my average Code Coverage is closer to 80% than the 90% expected of projects such as Zend Framework. The gap is explained by me not testing what I call "braindead" functions, i.e. methods which are either ridiculously simple, where a malfunction would quickly become self-evident, or which are marginalised (on the borders of deprecation). So Code Coverage actually increases the amount of work I need to do for very little gain and a lot of frustration.
Secondly, Code Coverage is easy to spoof or misinterpret. Since it's a metric measuring the execution of source code, you need only, well, execute the source code. It's a simple matter to construct a series of wonderfully useless tests to do just that and obtain a high Code Coverage result - it's done all the time in my experience once someone's patience in writing quality unit tests runs out. It is particularly evident in cases where unit tests are written after the source code is completed - a still too common practice in PHP. The less villainous flipside is that certain nuggets of source code are fundamentally difficult to test. For example, a complex algorithm suffering from poor documentation may make composing a suitable unit test near impossible. The rollout of OAuth was filled with such examples.
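What does a coverage-spoofing test look like? A hypothetical sketch (the Totaliser class and its check are illustrations, not from the article): every line of the method runs, so coverage reports 100%, yet nothing about the actual result is verified.

```php
<?php
// Hypothetical "coverage-only" test: the method is fully executed,
// so coverage is perfect, but the assertion is wonderfully useless.
class Totaliser
{
    public function total(array $items): float
    {
        $sum = 0.0;
        foreach ($items as $item) {
            $sum += $item;
        }
        return $sum;
    }
}

$totaliser = new Totaliser();
$result = $totaliser->total([1.0, 2.0, 3.0]);

// Passes even if total() returned the wrong sum entirely -
// coverage cannot distinguish this from a meaningful check.
assert(is_float($result));
```

A tool reading only the coverage number sees this class as thoroughly tested; a human reading the assertion sees it is not tested at all.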
This leads into my opinion of Code Coverage. I view the venerable Code Coverage metric as a near pointless exercise. While it may tell how much source code a test suite exercises, it tells you nothing about the actual quality of those unit tests. They could be good tests, sort-of-good tests or absolutely horrendous tests - Code Coverage will never tell you either way. I say near pointless because there are precious few alternatives. We need something to give us a reason to trust and have confidence in test suites and Code Coverage is easy to implement and has been a part of PHPUnit since forever. So, by and large, we make do. We measure Code Coverage just to make certain some kind of unit testing was performed.
Is there nothing better?
A good unit test serves a simple purpose. It verifies a behaviour of an object. In PHP, we're more likely to verify umpteen million behaviours in a single test (count your assertions!) but we'll let that slide. Since a test verifies behaviour, it follows that a test should fail when that behaviour is changed. If a test does not fail when class behaviour is changed, it also follows that the original behaviour was not fully tested, i.e. there is a gaping hole in our test suite, whether due to a flawed or missing test, that could allow bugs entry into our application. So, to really stick unit test
Truncated by Planet PHP, read more at the original (another 6450 bytes)
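The principle laid out above - that a behaviour change which survives the test suite exposes a missing test - is what mutation testing automates. A minimal hand-rolled sketch (the add functions and the operator flip are hypothetical illustrations; a tool like Mutagenesis performs the mutation and test runs for you):

```php
<?php
// Original behaviour.
function add(int $a, int $b): int
{
    return $a + $b;
}

// A "mutant": one operator deliberately flipped from + to -.
// A trustworthy test suite should kill this mutant, i.e. fail against it.
function addMutant(int $a, int $b): int
{
    return $a - $b;
}

// A weak test: it passes against both the original AND the mutant,
// so the mutant survives - the behaviour was never really verified.
assert(add(0, 0) === 0);
assert(addMutant(0, 0) === 0);

// A stronger test: it passes against the original but would fail
// against the mutant - the mutation is detected and the mutant killed.
assert(add(2, 3) === 5);
assert(addMutant(2, 3) !== 5);
```

The surviving-mutant case is exactly the "gaping hole" described above: the code changed, the suite stayed green, and a bug of the same shape could walk straight in.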