After my last post, I decided to do some benchmarking. For this benchmark I used the US data from Geonames.org. I inserted all the data (1,886,420 records) and searched for a big area around new york (between 41.3665028663272, -72.41912841796875 and 40.113789191575236, -75.83038330078125). We're expecting to get 38259 records back for this query.
Test 1: Selecting on longitude, latitude
- SELECT SQL_NO_CACHE lat,lng FROM geotest WHERE
- lat < 41.3665028663272 AND
- lng < -72.41912841796875 AND
- lat > 40.113789191575236 AND
- lng > -75.83038330078125;
No index 1.73s. With B-Tree index on latitude 0.72s.
Test 2: Using spatial extensions and POINT field
- SET @rect = 'POLYGON((41.3665028663272 -72.41912841796875,41.3665028663272 -75.83038330078125,40.113789191575236 -75.83038330078125,40.113789191575236 -72.41912841796875,41.3665028663272 -72.41912841796875))';
- SELECT SQL_NO_CACHE astext(location) from geotest where intersects(location,GeomFromText(@rect));
Time taken without index: 9.52s. With a spatial index: 0.73s.
Test 3: Using morton number
- SELECT SQL_NO_CACHE lat,lng FROM geotest WHERE
- morton > 3667198027933142835 AND morton < 3671111582099533095 AND
- lat < 41.3665028663272 AND
- lng < -72.41912841796875 AND
- lat > 40.113789191575236 AND
- lng > -75.83038330078125;
Time taken without index: 0.78s, with index on on morton: 0.65s.
Conclusion
In the table below 'small' is around times square, 'medium' is new york city and 'large' is about 2/3rd of the US. I didn't bother doing all benchmarks for the ones I knew were slower.
| method | small | medium | large |
|---|---|---|---|
| plain select | 1.73s | ||
| index on latitude | 0.72s | ||
| using point field | 9.52s | ||
| using point field + spatial index | 0.00s | 0.73s | 18.82s |
| using morton number | 0.78s | ||
| index on morton | 0.00s | 0.65s | 3.23s |
So it seems like using the morton number is a bit faster than using the spatial index, but there's not a huge difference considering this relatively large dataset. Using the spatial index has a number of benefits, the biggest being that you're easily able to select on much more complex queries (polygons and such). The major benefit of the morton number methodology is that it's significantly faster, especially as your dataset grows and you're able to use InnoDB, which can be much better performing if you're expecting a lot of updates.
Early update: my coworker kevin mentions the spatial queries are likely slowed down because 'astext' is called for every row. I'll have to do these again with separate lat/lng fields.
Update 2: Adding a lat and lng field and selecting on those is actually even slower (consistently 0.91s).
Update 3: With a smaller resultset both the spatial index and the morton index are both pegged at 0.00s. With a much larger resultset (big chunk of the US) I got 18.82s for the spatial index, and 3.23s for the morton index.

In your Test 3: Using morton number, do you really need to include the boundary box in your query? What use is the motron if your continuing to include the boundary box?
You do,
Using just the morton number can also include items from outside the box, because the type of indexing is a Z-curve.
The absolute worst case scenario is getting 50% of all items on the map, if you are looking at the exact center of the map (regardless of your 'zoom level').
Hi, I came across your blog when doing some searches on indexing geodata... Your posts have been extremely insightful, but I have a few questions (which may/may not be stupid)
My current situation is that I'm storing lat/lng as regular float columns in mysql. I'm searching this data based on a center point and a radius.
1. My query uses the spherical law of cosines to calculate distance from the center point
2. In my application code I calculate a bounding box based on the earth's radius/center point/search radius
3. Not surprisingly, the query rarely uses my lat/lng indexes
The bounding box is essentially useless in my current code, but after reading about your experiments with the Morton number I wonder if it would be useful if I converted those coordinates to Morton numbers and used that instead of the bounding box (or in addition to)?
My understanding of this and the math behind it is quite slim, and just enough to get my application working so bear with me ;)
Also, PHP is a bit of a pain when it comes to dealing with unsigned integers, any tips for calculating the Morton number in this case?
Thank you!
@Daniel,
The biggest benefit of this technique is speeding up the bounding-box index. So yes, I do believe you will benefit from this.
You do need to include the bounding box on top of this though, as I mentioned in my first comment, you will definitely retrieve items outside your box.