James Morris' Blog: Parsing HTML with DOMDocument and DOMXPath::Query
In the latest post to his blog, James Morris looks at using DOMXPath's query() method to locate pieces of data in an HTML document.
The other day I needed to do some html scraping to trim out some repeated data stuck inside nested divs and produce a simplified array of said data. My first port of call was SimpleXML which I have used many times. However this time, the son of a bitch just wouldn't work with me and kept on throwing up parsing errors. I lost my patience with it and decided to give DomDocument and DOMXpath a go which I'd heard of but never used.
He includes a code example (and a sample XML document) showing how to extract content from an HTML structure: grabbing each of the images inside a div and associating it with its description content.
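For readers who have not used these classes, here is a minimal sketch of the general approach. It is not the code from James' post. The markup, the `gallery` id, and the `item` and `description` class names are all made up for illustration. It loads an HTML string into DOMDocument, then uses DOMXPath::query() to pair each image with its description.

```php
<?php
// Hypothetical markup: the ids and class names here are assumptions,
// not the structure used in the original post.
$html = <<<HTML
<div id="gallery">
  <div class="item">
    <img src="/images/one.jpg" alt="One">
    <p class="description">First image</p>
  </div>
  <div class="item">
    <img src="/images/two.jpg" alt="Two">
    <p class="description">Second image</p>
  </div>
</div>
HTML;

// Collect libxml warnings internally instead of emitting them,
// since real-world HTML is rarely well-formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$results = [];

// Select each item div, then run relative queries inside it.
foreach ($xpath->query('//div[@id="gallery"]/div[@class="item"]') as $item) {
    $img  = $xpath->query('.//img', $item)->item(0);
    $desc = $xpath->query('.//p[@class="description"]', $item)->item(0);

    if ($img === null) {
        continue; // skip items without an image
    }

    $results[] = [
        'src'         => $img->getAttribute('src'),
        'description' => $desc !== null ? trim($desc->textContent) : '',
    ];
}

print_r($results);
```

Passing the current node as the second argument to query() keeps each lookup scoped to that item, which is what ties an image to its own description rather than to the first one in the document.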