Home > MySQL > Fast MySQL Geo-Referencing IPs with MaxMind GeoIP Database

Fast MySQL Geo-Referencing IPs with MaxMind GeoIP Database

The fastest way I found when performing “ip to country” lookups using MySQL and MaxMind GeoIP database and others is what I describe on this post. Read on.

The most important thing is the index. Add a primary or unique index to (`ip_to`, `ip_from`, `country`), assuming our table has at least the 3 most needed fields (`ip_from`, `ip_to`, `country`).

This is the CREATE TABLE for our table:

CREATE TABLE `ip2country` (
  `ip_from` int unsigned NOT NULL,
  `ip_to` int unsigned NOT NULL,
  `country` char(2) NOT NULL,
  PRIMARY KEY (`ip_to`, `ip_from`, `country`)
) ENGINE=MYISAM;
 

Note that the index has to start with `ip_to` and not `ip_from`… and by adding both `ip_from` and `country` together with `ip_to`, MySQL will do the “Using index” optimization. In order to use the “Using index” optimization, the index has to be on all the 3 fields (`ip_to`, `ip_from`, `country`).

And the query to use is the following:

SELECT `country` FROM `ip2country` WHERE 123456789 BETWEEN `ip_from` AND `ip_to` LIMIT 1;

The other trick is to add the LIMIT 1. This tells MySQL to stop searching once it finds the first match, if we don’t MySQL will keep searching for more matching rows. We know there is only one possible match, the index is even unique, but MySQL doesn’t see it that way so we need to tell MySQL LIMIT 1. MySQL doesn’t stop on the first match automatically because it assumes there are other possible matches, there could be overlapping ranges. Only we know there can’t be overlapping IP ranges, MySQL doesn’t.

This is what EXPLAIN tells us when executing our query:

mysql> EXPLAIN SELECT `country` FROM `ip2country` WHERE 123456789 BETWEEN `ip_from` AND `ip_to` LIMIT 1;
+----+-------------+------------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+-------------+------------+-------+---------------+---------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | ip2country | range | PRIMARY       | PRIMARY | 4       | NULL | 134807 | Using where; Using index |
+----+-------------+------------+-------+---------------+---------+---------+------+--------+--------------------------+
 

An added bonus is to use SQL_CACHE assuming we have query_cache_type=2 in the my.cnf file. So the query would look like this:

SELECT SQL_CACHE `country` FROM `ip2country` WHERE 123456789 BETWEEN `ip_from` AND `ip_to` LIMIT 1;

Using MEMORY tables for even faster queries

For even faster queries we can use the MEMORY engine. Because MEMORY doesn’t take advantage of the “Using index” optimization we only need an index on `ip_to`. MEMORY tables are faster than MYISAM even without that optimization, but make sure the index is of type BTREE and not HASH as that’s the default for MEMORY tables.

The only disadvantage of MEMORY tables is that once MySQL is restarted or the server goes down all data in them is lost. Not a problem, the solution is the following:

  1. To have two tables, one MYISAM named `ip2country_disk`, serving as backup, and another MEMORY named `ip2country_mem`.
  2. Create a simple text file with the following content INSERT INTO `ip2country_mem` SELECT * FROM `ip2country_disk`;. This is known as a MySQL initialization file.
  3. Add init-file=/path/to/filename.ini (use your own path here) to my.cnf under the [mysqld] section.

This way we never have to worry about losing the data in the MEMORY table, but both tables will have to maintained when updating the data.

 

This is the CREATE TABLE for the MYISAM table:

CREATE TABLE `ip2country_disk` (
  `ip_from` int unsigned NOT NULL,
  `ip_to` int unsigned NOT NULL,
  `country` char(2) NOT NULL,
  PRIMARY KEY (`ip_to`, `ip_from`, `country`)
) ENGINE=MYISAM;
* The index on the `ip2country_disk` table is not really necessary if we are going to query only the `ip2country_mem` table. We can save a few bytes by avoiding the index here.

And this for the MEMORY table:

CREATE TABLE `ip2country_mem` (
  `ip_from` int unsigned NOT NULL,
  `ip_to` int unsigned NOT NULL,
  `country` char(2) NOT NULL,
  PRIMARY KEY (`ip_to`) USING BTREE
) ENGINE=MEMORY;
 

This is what EXPLAIN tells us:

mysql> EXPLAIN SELECT `country` FROM `ip2country_disk` WHERE 123456789 BETWEEN `ip_from` AND `ip_to` LIMIT 1;
+----+-------------+-----------------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table           | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+-------------+-----------------+-------+---------------+---------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | ip2country_disk | range | PRIMARY       | PRIMARY | 4       | NULL | 134807 | Using where; Using index |
+----+-------------+-----------------+-------+---------------+---------+---------+------+--------+--------------------------+

mysql> EXPLAIN SELECT `country` FROM `ip2country_mem` WHERE 123456789 BETWEEN `ip_from` AND `ip_to` LIMIT 1;
+----+-------------+----------------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table          | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
+----+-------------+----------------+-------+---------------+---------+---------+------+--------+-------------+
|  1 | SIMPLE      | ip2country_mem | range | PRIMARY       | PRIMARY | 4       | NULL | 134722 | Using where |
+----+-------------+----------------+-------+---------------+---------+---------+------+--------+-------------+
 

You can skip the whole MEMORY stuff and just use the MYISAM disk based table together with SQL_CACHE which is already pretty good.

All queries were tested with MySQL 5.1.47, you may get different results with older versions.

Hope this helps someone seeking the best performance out of MySQL for “ip to country” lookups. Feel free to leave comments below.

Advertisements
  1. September 22, 2010 at 5:22 pm

    João,

    You claim that the above method is the “fastest”, but I don’t see that you’ve done any latency (ms) or throughput (qps) measurements. Have you?

    Regards,

    Jeremy

    • September 22, 2010 at 5:49 pm

      Hi Jeremy,

      Thanks for stopping by. I did do all kinds of tests and it’s the method I’m currently using on the websites I develop. It is the fastest for me, not necessarly the fastest possible 🙂 At least on my environment it works the best.

      I don’t have data on the tests to post, not like you did on your site which was pretty well done. As you can see the only diference from your post when using BETWEEN is that you should also use LIMIT 1 and have the index in ip_to, not ip_from.

      I think it’s also better than what Andy Skelton posted on his and on your site because you don’t have to fill the gaps nor do you have to check for the result after the row is retrieved. Also you don’t have to use ORDER BY! However his solution’s performance may be similar to mine, my tests indicated that.

      Would be great if you updated your post with this method because you do good benchmarking 🙂

      João

  2. September 22, 2010 at 6:43 pm

    João,

    It’s been several years since I ran those benchmarks, so I’m not really in a position to re-run them easily. However, I did test nearly every possible variant of BETWEEN, and the benchmark numbers I included were from the fastest variant I could find. It did have an index, and was using it well (although I don’t remember what exactly it was on). I do remember though, that at the time, using LIMIT actually made the query performance worse–I’m not sure if that’s still true.

    In any case, it’s not that interesting to talk about “fastest” unless you do actual benchmarking, as in terms of human time they are all too fast to humanly measure. It’s when you put them under load or you try to process a lot of IP addresses at once that one would falter and the other succeed. What will happen in those cases is not terribly obvious from the single query performance or the EXPLAIN output — that’s why I did the benchmark runs in the first place.

    Regards,

    Jeremy

    • September 22, 2010 at 7:39 pm

      Hi Jeremy,

      I don’t really have a test bench like you did on your blog, but the method I’m using currently in production is indeed faster than what I was using before.
      The LIMIT 1 does make a difference but only together with the index on ip_to, it won’t perform well if the index is on ip_from. I’m using MySQL 5.1.47, perhaps the version you used at the time of your tests was not well optimized for this kind of query as you said LIMIT just made it worse.

      Regards,
      João

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: