Fast MySQL Geo-Referencing IPs with MaxMind GeoIP Database

September 22, 2010 4 comments

The fastest way I found when performing “ip to country” lookups using MySQL and MaxMind GeoIP database and others is what I describe on this post. Read on.

The most important thing is the index. Add a primary or unique index to (`ip_to`, `ip_from`, `country`), assuming our table has at least the 3 most needed fields (`ip_from`, `ip_to`, `country`).

This is the CREATE TABLE for our table:

CREATE TABLE `ip2country` (
  `ip_from` int unsigned NOT NULL,
  `ip_to` int unsigned NOT NULL,
  `country` char(2) NOT NULL,
  PRIMARY KEY (`ip_to`, `ip_from`, `country`)
) ENGINE=MYISAM;
 

Note that the index has to start with `ip_to` and not `ip_from`… and by adding both `ip_from` and `country` together with `ip_to`, MySQL will do the “Using index” optimization. In order to use the “Using index” optimization, the index has to be on all the 3 fields (`ip_to`, `ip_from`, `country`).

And the query to use is the following:

SELECT `country` FROM `ip2country` WHERE 123456789 BETWEEN `ip_from` AND `ip_to` LIMIT 1;

The other trick is to add the LIMIT 1. This tells MySQL to stop searching once it finds the first match, if we don’t MySQL will keep searching for more matching rows. We know there is only one possible match, the index is even unique, but MySQL doesn’t see it that way so we need to tell MySQL LIMIT 1. MySQL doesn’t stop on the first match automatically because it assumes there are other possible matches, there could be overlapping ranges. Only we know there can’t be overlapping IP ranges, MySQL doesn’t.

This is what EXPLAIN tells us when executing our query:

mysql> EXPLAIN SELECT `country` FROM `ip2country` WHERE 123456789 BETWEEN `ip_from` AND `ip_to` LIMIT 1;
+----+-------------+------------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table      | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+-------------+------------+-------+---------------+---------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | ip2country | range | PRIMARY       | PRIMARY | 4       | NULL | 134807 | Using where; Using index |
+----+-------------+------------+-------+---------------+---------+---------+------+--------+--------------------------+
 

An added bonus is to use SQL_CACHE assuming we have query_cache_type=2 in the my.cnf file. So the query would look like this:

SELECT SQL_CACHE `country` FROM `ip2country` WHERE 123456789 BETWEEN `ip_from` AND `ip_to` LIMIT 1;

Using MEMORY tables for even faster queries

For even faster queries we can use the MEMORY engine. Because MEMORY doesn’t take advantage of the “Using index” optimization we only need an index on `ip_to`. MEMORY tables are faster than MYISAM even without that optimization, but make sure the index is of type BTREE and not HASH as that’s the default for MEMORY tables.

The only disadvantage of MEMORY tables is that once MySQL is restarted or the server goes down all data in them is lost. Not a problem, the solution is the following:

  1. To have two tables, one MYISAM named `ip2country_disk`, serving as backup, and another MEMORY named `ip2country_mem`.
  2. Create a simple text file with the following content INSERT INTO `ip2country_mem` SELECT * FROM `ip2country_disk`;. This is known as a MySQL initialization file.
  3. Add init-file=/path/to/filename.ini (use your own path here) to my.cnf under the [mysqld] section.

This way we never have to worry about losing the data in the MEMORY table, but both tables will have to maintained when updating the data.

 

This is the CREATE TABLE for the MYISAM table:

CREATE TABLE `ip2country_disk` (
  `ip_from` int unsigned NOT NULL,
  `ip_to` int unsigned NOT NULL,
  `country` char(2) NOT NULL,
  PRIMARY KEY (`ip_to`, `ip_from`, `country`)
) ENGINE=MYISAM;
* The index on the `ip2country_disk` table is not really necessary if we are going to query only the `ip2country_mem` table. We can save a few bytes by avoiding the index here.

And this for the MEMORY table:

CREATE TABLE `ip2country_mem` (
  `ip_from` int unsigned NOT NULL,
  `ip_to` int unsigned NOT NULL,
  `country` char(2) NOT NULL,
  PRIMARY KEY (`ip_to`) USING BTREE
) ENGINE=MEMORY;
 

This is what EXPLAIN tells us:

mysql> EXPLAIN SELECT `country` FROM `ip2country_disk` WHERE 123456789 BETWEEN `ip_from` AND `ip_to` LIMIT 1;
+----+-------------+-----------------+-------+---------------+---------+---------+------+--------+--------------------------+
| id | select_type | table           | type  | possible_keys | key     | key_len | ref  | rows   | Extra                    |
+----+-------------+-----------------+-------+---------------+---------+---------+------+--------+--------------------------+
|  1 | SIMPLE      | ip2country_disk | range | PRIMARY       | PRIMARY | 4       | NULL | 134807 | Using where; Using index |
+----+-------------+-----------------+-------+---------------+---------+---------+------+--------+--------------------------+

mysql> EXPLAIN SELECT `country` FROM `ip2country_mem` WHERE 123456789 BETWEEN `ip_from` AND `ip_to` LIMIT 1;
+----+-------------+----------------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table          | type  | possible_keys | key     | key_len | ref  | rows   | Extra       |
+----+-------------+----------------+-------+---------------+---------+---------+------+--------+-------------+
|  1 | SIMPLE      | ip2country_mem | range | PRIMARY       | PRIMARY | 4       | NULL | 134722 | Using where |
+----+-------------+----------------+-------+---------------+---------+---------+------+--------+-------------+
 

You can skip the whole MEMORY stuff and just use the MYISAM disk based table together with SQL_CACHE which is already pretty good.

All queries were tested with MySQL 5.1.47, you may get different results with older versions.

Hope this helps someone seeking the best performance out of MySQL for “ip to country” lookups. Feel free to leave comments below.