![]() In SQL/GIS terms, each IP range is represented by a 5-point rectangular POLYGON like this one, representing the IP range of 3.0.0.0 – 4.17.135.31: The way this works is that each IP range of ( ip_from, ip_to) is represented as a rectangular polygon from ( ip_from, -1) to ( ip_to, +1) as illustrated here: Geo-referencing an IP address to a country boils down to “find which range or ranges this item belongs to”, and this can be done quite efficiently using spatial R-tree indexes in MySQL’s GIS implementation. While it probably isn’t the first thing that would come to mind, MySQL’s GIS support is actually perfect for this task. In fact I haven’t been able to meaningfully improve on the performance at all. The reason for this is that it’s an open-ended range, and it is impossible to close the range by adding anything to the query. Unfortunately, while simple and natural, this construct is extremely inefficient, and can’t effectively use indexes (although it can use them, it isn’t efficient). WHERE INET_ATON("4.2.2.1") BETWEEN ip_from AND ip_to A simple query to search for the IP 4.2.2.1 would be: The natural thing that would come to mind (and in fact the solution offered by MaxMind themselves 2) is BETWEEN. For example, one row from the GeoIP data (without the redundant columns) looks like: ip_from ![]() Once the data has been loaded into MySQL (which will be explained in depth later), there will be a have a table with a range (a lower and upper bound), and some metadata about that range. This is redundant with the country code if you have a lookup table of country codes (including MaxMind’s non-ISO codes), or if you make one from the GeoIP data. country name - The full country name of the same.country code - The 2-letter ISO country code for the country to which this IP address has been assigned, or in some cases other strings, such as “A2” meaning “Satellite Provider”.ip from, ip to (integer) - The same start and end IP addresses as 32-bit integers 1, e.g.This is a handy way for a human to read an IP address, but a very inefficient way for a computer to store and handle IP addresses. ip from, ip to (text) - The start and end IP addresses as text in dotted-quad human readable format, e.g.GeoIP Country is available as a CSV file containing the following fields: In this article I will refer to both GeoIP and GeoLite as “GeoIP” for simplicity. The free GeoLite versions are normally good enough, at about 98% accurate, but the for-pay GeoIP versions in theory are more accurate. This allows you look up nearly any IP and find out which country (hopefully) its user resides in. The most popular of their databases that I’ve seen used is GeoLite Country. They release both a commercial (for-pay, but affordable) product called GeoIP, and a free version of the same databases, called GeoLite. MaxMind is a great company that produces several geo-referencing databases. This is actually OK for the most part, as the geo-referencing functionality most people need doesn’t really need transactional support, and since the data tables are basically read-only (monthly replacements are published), the likelyhood of corruption in MyISAM due to any server failures isn’t very high. Unfortunately, R-tree (spatial) indexes have not been added to InnoDB yet, so the tricks in this entry only work efficiently with MyISAM tables (although they should work with InnoDB, they will perform poorly). After working with a Proven Scaling customer, I recently did some thinking and some performance testing on this problem, so I thought I would publish some hard data and advice for everyone. There is a lot of questionable advice on forums, blogs, and other sites out there on this topic. ![]() ![]() This is a very common task, but I have never actually seen it done efficiently in MySQL in the wild. There are a lot of reasons you may want to geo-reference IP addresses to country, city, etc., such as in simple ad targeting systems, geographic load balancing, web analytics, and many more applications. Geo-referencing IPs is, in a nutshell, converting an IP address, perhaps from an incoming web visitor, a log file, a data file, or some other place, into the name of some entity owning that IP address. ![]()
0 Comments
Leave a Reply. |