Spatial Indexes: Solr
In two previous articles I introduced the spherical Earth model and used first SQLite and then MySQL as a geographical data store. In this article we're going to have a look at importing the data into something other than a relational database: the search platform Solr. (Yes, I know I've skipped PostgreSQL, but I'll come back to that.)
Solr is "the popular, blazing fast open source enterprise search platform from the Apache Lucene project." Since version 3.1, Solr has supported spatial search, including geospatial search. It can store coordinates in two different field types: solr.PointType for n-dimensional points, and solr.LatLonType for a two-dimensional latitude/longitude point used in geospatial search. The main difference is that solr.PointType does its calculations according to the flat Earth model, whereas solr.LatLonType uses the spherical Earth model, which is just what we need.
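To make the difference between the two models concrete, here is a minimal Python sketch. It is not Solr's own code: the function names and the mean Earth radius of 6371 km are my assumptions, and the flat model is reduced to a plain Euclidean distance on the raw degree values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def flat_distance(lat1, lon1, lat2, lon2):
    """Flat model: treat latitude/longitude as plain x/y values (result in degrees)."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


def haversine_km(lat1, lon1, lat2, lon2):
    """Spherical model: great-circle distance in kilometres (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude, measured at the equator and at 60 degrees north.
flat_equator = flat_distance(0, 0, 0, 1)
flat_north = flat_distance(60, 0, 60, 1)
sphere_equator = haversine_km(0, 0, 0, 1)
sphere_north = haversine_km(60, 0, 60, 1)
```

In the flat model both measurements come out identical, while on the sphere the step at 60 degrees north covers roughly half the distance of the step at the equator. That is why the flat model is a poor fit for "what is near me?" queries away from the equator.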
In this example we will use the default Solr configuration, unless noted otherwise. First we download Solr 3.1 and untar it with tar -xvzf apache-solr-3.1.0.tgz. Then we edit example/solr/conf/solrconfig.xml to comment out the section on the Query Elevation Component so that it looks like:
See here for the reason.
Setting Up Solr
Before we can start using Solr, we need to define a schema. This schema is quite analogous to a database schema, except that Solr only has one table. By default Solr comes with a schema defining lots of fields in example/solr/conf/schema.xml. I've changed the whole section to look like:
I've defined specific fields for the type of source (1=node, 2=way/area), the name of the amenity (name of the pub, hospital, etc.), what sort of amenity it is (cafe, bank, etc.), the address and postcode, the phone number, the type of cuisine, and the location. For the location I've also defined two sub-fields: location_0_coordinate and location_1_coordinate. Solr requires these for multi-dimensional types such as solr.LatLonType, because each dimension of the point is indexed in a field of its own.
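As a rough illustration of what those two extra fields are for, the Python sketch below mimics how a single "latitude,longitude" value could be split over them. This is not Solr's implementation: the split_lat_lon helper is hypothetical, the "lat,lon" input format and the latitude-first ordering are assumptions on my part, and the _coordinate suffix is simply read off the field names above.

```python
def split_lat_lon(field_name, value, suffix="_coordinate"):
    """Split a "lat,lon" string into two per-dimension sub-field values.

    Hypothetical helper for illustration only; it assumes latitude is
    dimension 0 and longitude is dimension 1.
    """
    lat_text, lon_text = value.split(",")
    lat, lon = float(lat_text), float(lon_text)
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range: %r" % lat)
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range: %r" % lon)
    return {
        "%s_0%s" % (field_name, suffix): lat,
        "%s_1%s" % (field_name, suffix): lon,
    }


# A made-up point in central London.
sub_fields = split_lat_lon("location", "51.5074,-0.1278")
```

Here sub_fields ends up with the keys location_0_coordinate and location_1_coordinate, one number each, which matches the two field names defined in the schema.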
The types for each field are also defined in schema.xml. text is for a string that is analysed and tokenized (broken down into smaller parts) so that you can search on parts of the text, string is for a string of characters that is not analysed or tokenized, and location contains our latitude/longitude point. Each of those types is associated with a specific Java class with an entry such as:
The subFieldSuffix in this line configures the suffix in the fields location_0_coordinate