Spatial Indexes: Fetching Data/SQLite
In a previous article I introduced 'The Flat Earth Model' and the 'The spherical Earth model'. In this article we're going to have a look at fetching a data set and importing them into a SQLite database to query from PHP. What better data set is there to import than all of the UK's pubs?
Getting the Data
In order to get a suitable data set we are going to use the data from the OpenStreetMap project. This project is mainly concerned with making an open map of the entire globe, but it also contains a vast database of points-of-interest (POI). OpenStreetMap contributes seems to be a big fan of pubs, and hence they are mapped really well. POIs can either be stored as a node (a single point with a geographical location) such as The Long Acre or as a closed way (an ordered collection of nodes where the first and last node are the same), such as Brondes Age.
I will demonstrate two methods to fetch the pubs in an XML format containing nodes and ways. The first method is with XAPI. XAPI is an interface to the OpenStreetMap database to allow users to query and filter items. In order to fetch data through it, you specify a bounding box and a predicate. A bounding box specifies the most Eastern, Southern, Western and Northern coordinates. The UK has roughly as bounding box 9.05AW, 48.77AN, 2.19AE, 58.88AN, or short: -9.05, 48.77, 2.19, 58.88. Fetching the data can simply be done by querying the server with wget:wget -O pubs.xml 'http://xapi.openstreetmap.org/api/0.6/*[amenity=pub][bbox=-9.05,48.77,2.19,58.88]'
This is going to take a long time; and will most likely just not work or time-out. The current XAPI server, written in an obscure programming language called MUMPS, is extremely unreliable and is really slow. A new version of XAPI build in Java is on the way, but right now it's limited to 10 square degrees.
Luckily, there is an alternative in the form of parsing (an extract of) the planet file. The planet file is an enormous database dump of OpenStreetMap's data. The people at Geofabrik have extracts for specific parts of the world at http://download.geofabrik.de/osm/. From them I downloaded the europe/great_britain.osm.pbf file (274 MB) for my example. With a tool called Osmosis we can then filter out all pubs into a similar formatted XML file. I am only giving the command to do our task at hand, but a detailed usage guide for Osmosis is available too. The command runs in about four minutes and looks like:./osmosis-0.38/bin/osmosis -v 5 \ \ --read-pbf file=great_britain.osm.pbf \ --tf reject-relations \ --tf accept-nodes amenity=pub,bar \ --tf reject-ways \ outPipe.0=POI \ \ --read-pbf file=great_britain.osm.pbf \ --tf reject-relations \ --tf accept-ways amenity=pub,bar \ --used-node \ outPipe.0=area \ \ --merge inPipe.0=POI inPipe.1=area \ --write-xml file=great_britain_pubs.osm
Importing the Data
The resulting XML file has two important elements: nodes and ways. The XML for The Long Acre looks like:
Important here are the latitude (51.51AN) and longitude (0.12AW) and of course, the name of the pub in the tag element.
And the XML for Brondes Age (a way) is a bit more complex, and looks like:"/
Truncated by Planet PHP, read more at the original (another 5739 bytes)