Indexing geo-data 3: In practice

Since my last post I've found out that using the 'morton' number to index spatial number is also referred to as the Z-order.

To index using this order, you can use this stored function:

  1. CREATE FUNCTION getGeoMorton(lat DOUBLE, lng DOUBLE) RETURNS BIGINT UNSIGNED DETERMINISTIC
  2. BEGIN
  3.  
  4. -- 11930464 is round(maximum value of a 32bit integer / 360 degrees)
  5.  
  6. DECLARE bit, morton, pos BIGINT UNSIGNED DEFAULT 0;
  7.  
  8. SET @lat = CAST((lat + 90) * 11930464 AS UNSIGNED);
  9. SET @lng = CAST((lng + 180) * 11930464 AS UNSIGNED);
  10. SET bit = 1;
  11.  
  12. WHILE bit <= @lat || bit <= @lng DO
  13.  
  14. IF(bit & @lat) THEN SET morton = morton | ( 1 << (2 * pos + 1)); END IF;
  15. IF(bit & @lng) THEN SET morton = morton | ( 1 << (2 * pos)); END IF;
  16.  
  17. SET pos = pos + 1;
  18.  
  19. SET bit = 1 << pos;
  20.  
  21. END WHILE;
  22.  
  23. RETURN morton;
  24. END;

Some caveats

  • Since the function is using floating-point numbers, there will be rounding errors. These are generally very small, and for us well within the acceptable margin of error.
  • More significantly, this function assumes euclidean geometry (i.e.: a flat surface). The earth obviously isn't, so as you get closer to the poles you might get back results from outside your rectangle, or miss results from within.
  • It's not recommended to use this function directly in your WHERE clause. Even though the function is marked deterministic (i.e.: it will always yield the same results for the same arguments), the MySQL query optimizer currently ignored that modifier. So, set it in a temporary variable first (SET @number =) and use the variable in your where clause.
  • Don't forget dealing with queries spanning over the -180°, 180° longitude line.
  • Also store your actual longitude and latitude values in the DB. This function is not intended to be exact.
  • When you do your queries, select both on the morton number, and longitude and latitude.

A better way to do it?

In my research I've found the Hilbert Curve to be an even better algorithm, but haven't yet gone through the effort of trying to express it in SQL.


5 Responses to Indexing geo-data 3: In practice

  1. 941 Roland Bouman 2009-03-22 12:17 pm

    Hi!

    I just read your posts on geo indexing, and from there i ended up reading wikipedia on the matter. Interesting stuff, thanks!

    I just want to point out there is an error in your MySQL getGeoMorton function. The line

    RETURN @lng;

    Should not be there. I also think there are a number of things that could be improved in your implementation. For example, there is no need to use the user-defined varibles (the @ variables). You will find that these are quite a bit slower than proper local variables.

    There are more performance optimizations possible (and with MySQL stored functions, you need all the speed you can get). Here's what I came up with:

    DELIMITER go
    DROP FUNCTION IF EXISTS f_morton;
    go

    CREATE FUNCTION f_morton(
    p_longitude DOUBLE
    , p_latitude DOUBLE
    )
    RETURNS BIGINT UNSIGNED
    DETERMINISTIC
    BEGIN
    DECLARE v_bit BIGINT UNSIGNED
    DEFAULT 1;
    DECLARE v_shift BIGINT UNSIGNED
    DEFAULT 1;
    DECLARE v_morton BIGINT UNSIGNED
    DEFAULT 0;
    DECLARE v_latitude BIGINT UNSIGNED
    DEFAULT CAST((p_latitude + 90) * 11930464 AS UNSIGNED);
    DECLARE v_longitude BIGINT UNSIGNED
    DEFAULT CAST((p_longitude + 180) * 11930464 AS UNSIGNED);

    WHILE v_bit <= v_latitude="" ||="" v_bit="" <="v_longitude" do="" if="" &="" v_longitude="" then="" set="" v_morton="" :="v_bit" |="" v_shift;="" end="" if;="" v_shift="" <<="" 1;="" while;="" return="" v_morton;="" end;="" go="" delimiter="" ;="" some="" testing="" indicated="" this="" is="" about="" 20-25%="" faster.="" i="" hope="" it="" helps!="" cheers,="" roland="">

  2. 942 Evert 2009-03-22 4:02 pm

    Dropped that return line. I must have left it in after debugging; Thanks for that Roland!

  3. 943 Evert 2009-03-23 1:52 am

    jbland,

    From what I read there is that geohashing uses the exact same concept for encoding lat/long and shares the same limitations. The difference is that I encode to a 64bit integer, geohashing encodes this number further as a string.

    Using the integer will give you better performance.

    Evert

  4. 944 jbland 2009-05-12 4:44 pm

    Evert,
    im thinking of going with morton numbers to help with clustering, but im unsure about how to dial up/down precision... any pointers

  5. 945 Evert 2009-05-12 5:51 pm

    It's not really easy to calculate range, what you could do is simply drop bits off the bottom, like: morton XOR 0x1 to drop the last bit, morton XOR 0x2 for the 2 last bits, etc..

    That way you might not get the best possible clusters, but it could definitely push you in the right direction.



About

My name is Evert, and I've been writing semi-regularly on this blog since 2006.

I'm currently available for contract work.

more info.

Subscribe

Dropbox

Dropbox is a simple cross-platform online backup and sync application. The first 2GB of space is free, and both you and me get an extra 250MB extra space if you sign up through this link.