PhpRiot
News Archive
PhpRiot Newsletter
Your Email Address:

More information

Joshua Thijssen's Blog: Bloom Filters

Note: This article was originally published at PHPDeveloper on 10 April 2012.
PHPDeveloper

In this new post to his blog Joshua Thijssen describes something that can help when processing large amounts of data (like, in his example, the text of a book) to search through the information and find if a certain piece of data is in the set - a bloom filter.

Most of my co-workers never really heard of bloom filters, and I'm continuously need to explain what they are, what their purpose is and why it's a better solution than other ones. So let's do an introduction on bloom filters. [...] Bloom filters have the property of being exceptionally fast AND exceptionally small compared to other structures but it comes with a price: it MIGHT be possible that our bloom filter thinks that an element is inside our set, when it really isn't. Luckily, the reverse is not possible: when a bloom filter says something is NOT in the set, you are 100% sure that it isn't part of the set.

He explains how the filter works, noting how it's better for memory consumption and how it's possible for it to give a "maybe" response instead of ab absolute "yes" or "no". He also points out a PHP extension, bloomy that takes the hard work out of it for you.