Benford's Law

Note: This article was originally published at Planet PHP on 2 April 2011.

Benford's Law is not an exciting new John Nettles-based detective show, but an interesting observation about the distribution of the first digit in sets of numbers originating from various processes. It says, roughly, that in a big collection of data you should expect to see a number starting with 1 about 30% of the time, but starting with 9 only about 5% of the time. Precisely, the expected proportion for a given first digit can be worked out as:

function benford($num) {
    // $num is the leading digit, 1 to 9
    return log10(1+1/$num);
}

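To make the formula concrete, here is a minimal sketch that prints the expected share for each leading digit. The function name `benford_expected` is my own (hypothetical) choice so that the snippet stands alone; it computes the same expression as above. It should give roughly 30.1% for 1 and 4.6% for 9, and the nine shares add up to 1.

```php
<?php
// Expected share of numbers whose first digit is $digit (1 to 9).
// Hypothetical name; same formula as in the article.
function benford_expected($digit) {
    return log10(1 + 1 / $digit);
}

$sum = 0.0;
foreach (range(1, 9) as $digit) {
    $p = benford_expected($digit);
    $sum += $p;
    // Print the digit and its expected share as a percentage
    printf("%d  %5.1f%%%s", $digit, $p * 100, PHP_EOL);
}

// The nine proportions add up to 1, within floating-point error
printf("sum %.3f%s", $sum, PHP_EOL);
```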
Real data does tend to fit this pretty well. For example, grabbing a dataset more or less at random, in this case a list of spending by the Science and Technology Facilities Council, I can compare the first digits to Benford's expected ones (I grabbed the Amount column out of the April 2010 data and put it into a text file, one amount per line):

$fh = fopen("data.txt", 'r');
$score = array();
$total = 0;
$nums = range(1, 9);
// Count up appearances of digits
while(($data = fgets($fh)) !== false) {
    $digit = substr(trim($data), 0, 1);
    // Skip lines that do not start with a digit from 1 to 9
    if(!in_array($digit, $nums)) {
        continue;
    }
    if(!isset($score[$digit])) {
        $score[$digit] = 0;
    }
    $score[$digit]++;
    // Only count the lines that were actually scored
    $total++;
}
fclose($fh);
echo "# - Data - Benford", PHP_EOL;

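The rest of the original listing is not available here, so the following is only a rough, self-contained sketch of how such a comparison could be finished, not the author's code. The helper `leading_digit_counts` and the sample amounts are made up for illustration. It also swaps in a slightly different technique from the `substr` check above: a regular expression that picks the first non-zero digit, so that values like "0.045" or "-52.10" still get counted.

```php
<?php
// Hypothetical helper: tally the leading significant digit (1 to 9)
// of each entry in an array of amount strings.
function leading_digit_counts(array $lines) {
    $counts = array_fill(1, 9, 0);
    foreach ($lines as $line) {
        // The first non-zero digit is the leading significant digit;
        // entries without one (blank lines, headers) are skipped.
        if (preg_match('/[1-9]/', $line, $m)) {
            $counts[(int) $m[0]]++;
        }
    }
    return $counts;
}

// Made-up amounts, purely for illustration
$sample = array('120.50', '1899.00', '23.10', '310.00', '14.99', '9.50');
$counts = leading_digit_counts($sample);
$total = array_sum($counts);

echo "# - Data - Benford", PHP_EOL;
foreach ($counts as $digit => $count) {
    $observed = 100 * $count / $total;
    $expected = 100 * log10(1 + 1 / $digit);
    printf("%d - %4.1f%% - %4.1f%%%s", $digit, $observed, $expected, PHP_EOL);
}
```

With only six values the observed column will be nowhere near the Benford column; the fit only shows up on a reasonably large dataset, such as a full month of spending records.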
Truncated by Planet PHP, read more at the original (another 2343 bytes)