PhpRiot

PHP Base-62 encoding

Note: This article was originally published at Planet PHP on 11 August 2011.

There's a really horrible bug (though they won't call it that!) in Apache's mod_rewrite: URL-encoded input captured by a rewrite pattern gets unescaped when it is substituted into the output pattern. The bug remains unfixed, though a workaround first appeared in Apache 2.2.12 (which wasn't all that long ago). Put it like this: if you're not using the [B] flag in your mod_rewrite rules, your site is probably only working due to blind luck.

With that in mind, a few years ago I spent ages looking for a base-62 encoder/decoder for PHP to work around mod_rewrite's broken urlencoding handling. Nobody seemed to have the slightest interest in writing one. Base-62 is interesting because it only uses [0-9a-zA-Z], so unlike base-64 (whose + and / characters need escaping in URLs) it can be made both URL- and DNS-safe. As a workaround for the above bug, I wanted to base-62 encode URLs for embedding in redirects. At the time I wrote something using bc_math, but it was very slow, so I eventually gave up and switched to base-64, which led to occasional URL corruption.

If you include hashes in URLs, keeping them in the default hex representation is quite wasteful and can cause line-length problems in email. Encoding hashes in base-62 is a nice way of shortening them.
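To put a number on "wasteful": each digit in base b carries log2(b) bits, so a 128-bit MD5 hash needs ceil(128 / log2(b)) digits. The sketch below compares hex, base-64 and base-62 for an MD5 hash; digits_needed() is a hypothetical helper written for illustration, not part of PHP.

```php
<?php
// Each digit in base $base carries log2($base) bits, so a $bits-bit value
// needs ceil($bits / log2($base)) digits in that base.
function digits_needed(int $bits, int $base): int
{
    return (int) ceil($bits / log($base, 2));
}

echo strlen(md5('example')), "\n";                       // 32: md5() returns 128 bits as hex
echo strlen(base64_encode(md5('example', true))), "\n";  // 24, including "==" padding
echo digits_needed(128, 62), "\n";                       // 22
```

So base-62 gets an MD5 hash down to the same 22 characters as unpadded base-64, without any characters that need escaping.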

There are a few posts on base-62 in PHP, notably this one and this one, but they assume you're converting an integer. A hash is a number too, but it's far too big for PHP to handle as an integer: an MD5 hash is 128 bits, while PHP integers are at most 64 bits. I wrote my own converter at the time, but it was very slow (and weirdly got ripped off by some dickhead and passed off as his own, despite the fact that I said it was crap!).
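Here is a small demonstration of why the integer-based approaches break down: once a hex value no longer fits in an int, hexdec() quietly returns a float, and a float only keeps 53 bits of precision. The two hash-sized values below are made up for illustration.

```php
<?php
// PHP integers are 64 bits on a 64-bit build (32 on a 32-bit one).
echo PHP_INT_SIZE * 8, "-bit integers, PHP_INT_MAX = ", PHP_INT_MAX, "\n";

// Two 128-bit (MD5-sized) hex values that differ only in their last digit.
$a = 'ffffffffffffffffffffffffffffff00';
$b = 'ffffffffffffffffffffffffffffff01';

// hexdec() cannot fit either value in an int, so it returns a float...
var_dump(is_float(hexdec($a)));      // bool(true)

// ...and with only 53 bits of precision the two distinct values collapse together.
var_dump(hexdec($a) == hexdec($b));  // bool(true)
```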

Since then, the gmp extension was improved in PHP 5.3.2: gmp_init() and gmp_strval() now (usefully) handle bases up to 62. So here's a simple function for getting a hash in base-62:

function base62hash($source) {
    // md5() returns the hash as 32 hex digits; re-express that number in base 62
    return gmp_strval(gmp_init(md5($source), 16), 62);
}

and for converting to and from base-16 hashes:

function hash16to62($hash) {
    // Parse the hex hash as one big integer and print it in base 62
    return gmp_strval(gmp_init($hash, 16), 62);
}

function hash62to16($hash) {
    // Note: leading zeros of the original hex hash are not restored
    return gmp_strval(gmp_init($hash, 62), 16);
}
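One catch with the round trip: gmp_strval() never emits leading zeros, so about one MD5 hash in sixteen (those whose hex form starts with 0) comes back from hash62to16() shorter than 32 characters. A minimal fix, assuming you know the hash's width, is to left-pad the result; hash62to16_padded() and its $length parameter are hypothetical names for this sketch, and hash16to62() is repeated so the block stands alone.

```php
<?php
// Requires the gmp extension; bases above 36 need PHP 5.3.2 or later.
function hash16to62($hash) {
    return gmp_strval(gmp_init($hash, 16), 62);
}

// Hypothetical variant of hash62to16(): gmp_strval() drops leading zeros,
// so left-pad the hex result back to the hash's known width (32 for MD5).
function hash62to16_padded($hash, $length = 32) {
    return str_pad(gmp_strval(gmp_init($hash, 62), 16), $length, '0', STR_PAD_LEFT);
}
```

For SHA-1 hashes, which are 40 hex digits long, pass 40 as $length.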

I could still use a proper base-62 encoder for longer arbitrary strings, but at least it should now be simpler to write something iterative, since gmp has (ahem) its bases covered.
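For what it's worth, here is one way such an iterative encoder could look. It's a sketch, not a tested library: it treats the input bytes as one big base-256 number and converts it to base 62 by schoolbook long division, so it needs neither gmp nor bc_math. The function names, the 0-9A-Za-z digit order (the order GMP uses for bases above 36), and the convention of writing each leading NUL byte as a leading '0' (similar to how Base58 handles leading zero bytes) are all assumptions made for illustration. It targets PHP 7 or later (intdiv() and scalar type declarations) and runs in quadratic time, which is fine for URL-sized strings.

```php
<?php
// Hypothetical pure-PHP base-62 codec for arbitrary binary strings.
const BASE62_ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Convert a big number, given as digits in $fromBase (most significant first),
// into digits in $toBase by repeated long division.
function convert_digits(array $digits, int $fromBase, int $toBase): array
{
    $result = [];
    while (count($digits) > 0) {
        $quotient = [];
        $remainder = 0;
        foreach ($digits as $digit) {
            $accumulator = $remainder * $fromBase + $digit;
            $q = intdiv($accumulator, $toBase);
            $remainder = $accumulator % $toBase;
            // Skip leading zeros in the quotient.
            if ($q > 0 || count($quotient) > 0) {
                $quotient[] = $q;
            }
        }
        $result[] = $remainder;  // each pass yields the next least significant digit
        $digits = $quotient;
    }
    return array_reverse($result);
}

function base62_encode(string $bytes): string
{
    // Leading NUL bytes would vanish in the numeric conversion, so emit one '0' each.
    $zeros = strspn($bytes, "\0");
    $rest = substr($bytes, $zeros);
    $digits = $rest === '' ? [] : array_values(unpack('C*', $rest));

    $out = str_repeat('0', $zeros);
    foreach (convert_digits($digits, 256, 62) as $d) {
        $out .= BASE62_ALPHABET[$d];
    }
    return $out;
}

function base62_decode(string $text): string
{
    // Each leading '0' stands for one NUL byte.
    $zeros = strspn($text, '0');
    $rest = substr($text, $zeros);
    $digits = [];
    for ($i = 0, $n = strlen($rest); $i < $n; $i++) {
        $value = strpos(BASE62_ALPHABET, $rest[$i]);
        if ($value === false) {
            throw new InvalidArgumentException("Invalid base-62 character: {$rest[$i]}");
        }
        $digits[] = $value;
    }

    $out = str_repeat("\0", $zeros);
    foreach (convert_digits($digits, 62, 256) as $b) {
        $out .= chr($b);
    }
    return $out;
}
```

For example, base62_encode("\0\0a") gives "001Z": two '0's for the NUL bytes, then 97 written as 1 × 62 + 35. Passing that back through base62_decode() restores the original three bytes.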