Open Letter to Gareth Heyes: Regex HTML Sanitisation Doesn't Work

Note: This article was originally published at Planet PHP on 18 April 4120.

Dear Gareth Heyes,

I thank you for your response, which claims that Regex HTML Sanitisation can work.

However, I should clarify that my article, Regex HTML Sanitisation: Off With Its Head!, was written in the context of using Perl-compatible regular expressions in PHP to both parse and filter HTML. Your challenge to test HTMLReg was unusual, since HTMLReg is written in JavaScript, operates as a client-side library, and uses the browser DOM to avoid parsing HTML with regular expressions.

As such, HTMLReg and your article title fall outside the context of my original article. I do, however, applaud the concept of using the browser DOM. While I cannot comment on the efficacy of client-side filtering for cross-site scripting (XSS), the use of a DOM is a reliable strategy to bypass parsing problems. A similar approach accounts for the success of HTMLPurifier. Obviously, I do not begrudge some minimal use of regular expressions on pre-parsed, normalised input.
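
To make the distinction concrete, here is a minimal sketch of the two strategies. It is not code from HTMLReg, HTMLPurifier, or my earlier article, and it is written in Python only because its standard library ships an HTML tokeniser that can stand in for a DOM. The names naive_regex_strip, AllowlistSanitiser, sanitise, and the ALLOWED_TAGS list are all hypothetical, and the result is an illustration rather than a production-ready filter.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist, chosen only for this illustration.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}


def naive_regex_strip(markup: str) -> str:
    """Single-pass regex filter: delete <script>...</script> blocks."""
    pattern = r"<script\b[^>]*>.*?</script>"
    return re.sub(pattern, "", markup, flags=re.I | re.S)


class AllowlistSanitiser(HTMLParser):
    """Rebuild output from parser events instead of editing the raw string.

    Only allowlisted tags are re-emitted, always without attributes.
    All text is escaped, so no markup survives unless the parser saw it
    as an allowlisted tag.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes are dropped

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))


def sanitise(markup: str) -> str:
    parser = AllowlistSanitiser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


# A classic nested payload: removing the inner tags assembles the outer one.
payload = "<scr<script></script>ipt>alert(1)</scr<script></script>ipt>"

print(naive_regex_strip(payload))  # the regex pass leaves a working script tag
print(sanitise(payload))           # the parser-based pass emits no script tag
print(sanitise("<p onclick='x()'>hi <b>there</b></p>"))
```

The regex version edits the raw string and so never sees the tag that its own deletions assemble. The parser-based version emits only what it has positively recognised, which is why a pre-parsed representation sidesteps this class of problem.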

This did, however, prompt me to ponder whether such an inapplicable challenge appearing on Planet PHP undermines my argument anyway, by its mere existence and blunt title, in a world populated by A.D.D. sufferers. I believed it might, and so I found myself determined to crack your JavaScript library over a cup of coffee and a biscuit.

The result of this quick examination cannot be published here, as that would be poor disclosure practice. Therefore, I will report the resulting security vulnerability by email. You now have six weeks from today's date in which to release a fixed version of HTMLReg and publicly disclose this vulnerability. I trust you will ensure that all similar or related potential vulnerabilities are also fixed. It would also, optionally, be interesting to see a blog post on the effectiveness of a client-side JavaScript filter.