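Commonly used address databases
I looked into IP address databases. The ones in common use today are:
- Chunzhen (纯真) database: completely free but not very precise; the installer can be downloaded from www.cz88.net/soft/setup.zip;
- IPIP database: the best IP address database made in China, though the free edition is only just adequate;
- GeoIP: the free edition has mediocre city-level accuracy for China and the paid edition is more accurate; the data also includes latitude and longitude;
- Ip2Location: I tried it and it works well, but place names are in pinyin or English, so the data needs post-processing if you want Chinese characters.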
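About IPv6
For why IPv6 is needed, see "协议森林04 地址耗尽危机 (IPv4与IPv6地址)" (Protocol Forest 04: The Address Exhaustion Crisis).
An IPv6 address is 128 bits long, four times the length of an IPv4 address. The dotted-decimal format used for IPv4 is therefore no longer practical, and hexadecimal is used instead. An IPv6 address can be written in three ways: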
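- Colon-hexadecimal notation: the format is X:X:X:X:X:X:X:X, where each X is 16 bits of the address written in hexadecimal, for example ABCD:EF01:2345:6789:ABCD:EF01:2345:6789. Leading zeros in each group may be omitted, for example:
  2001:0DB8:0000:0023:0008:0800:200C:417A → 2001:DB8:0:23:8:800:200C:417A
- Zero-compression notation: an IPv6 address may contain a long run of zero groups, which can be compressed to "::". To keep parsing unambiguous, "::" may appear only once in an address, for example:
  FF01:0:0:0:0:0:0:1101 → FF01::1101
  0:0:0:0:0:0:0:1 → ::1
- Embedded-IPv4 notation: for IPv4/IPv6 interoperability, an IPv4 address can be embedded in an IPv6 address, which is then usually written as X:X:X:X:X:X:d.d.d.d. The first 96 bits use colon-hexadecimal notation and the last 32 bits use IPv4 dotted decimal, for example ::192.168.0.1 and ::FFFF:192.168.0.1. Zero compression still applies to the first 96 bits.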
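The three notations can be checked with Python's standard `ipaddress` module. This is a small sketch added for illustration, not part of the original code; the addresses are the ones from the examples above.

```python
import ipaddress

# Leading zeros are dropped; a lone zero group stays as "0".
full = ipaddress.IPv6Address("2001:0DB8:0000:0023:0008:0800:200C:417A")
print(full.compressed)     # 2001:db8:0:23:8:800:200c:417a

# A compressed address expands back to eight 16-bit groups.
short = ipaddress.IPv6Address("FF01::1101")
print(short.exploded)      # ff01:0000:0000:0000:0000:0000:0000:1101

# An IPv4-mapped address carries the IPv4 address in its low 32 bits.
mapped = ipaddress.IPv6Address("::FFFF:192.168.0.1")
print(mapped.ipv4_mapped)  # 192.168.0.1
```

Note that `ipaddress` prints hexadecimal digits in lower case, so the output differs from the upper-case examples only in letter case.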
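Further reading:
- IPv6 lookup tool: https://ipv6.xinyunan.cn/index.html
- IPv6 address allocation table for China's three major carriers: https://ipv6.xinyunan.cn/i6.html
As for IPv6 address databases, I tried the free data from Ip2Location and found its accuracy mediocre. I eventually came across ZX's IPDB, whose coverage is good enough for everyday study and research.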
Data analysis
Chunzhen database (IPv4)
The Chunzhen installer includes an extraction tool that converts qqwry.dat into a txt file. The converted data looks like this:
0.0.0.0 0.255.255.255 IANA 保留地址
1.0.0.0 1.0.0.0 美国 亚太互联网络信息中心(CloudFlare节点)
1.0.0.1 1.0.0.1 美国 APNIC&CloudFlare公共DNS服务器
1.0.0.2 1.0.0.255 美国 亚太互联网络信息中心(CloudFlare节点)
1.0.1.0 1.0.3.255 福建省 电信
1.0.4.0 1.0.7.255 澳大利亚 墨尔本Goldenit有限公司
The first column is the start IP, the second is the end IP, the third is the region, and the fourth is the ISP information.
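As a rough sketch of how one such line could be read, the snippet below splits the four columns and converts the two addresses to integers. The function name `parse_line` is made up for this illustration, and it assumes the columns are separated by runs of whitespace as in the sample above.

```python
import ipaddress

def parse_line(line):
    """Split one exported line into (start, end, region, isp)."""
    # The ISP column may itself contain spaces, so split at most three times.
    parts = line.split(None, 3)
    start = int(ipaddress.IPv4Address(parts[0]))
    end = int(ipaddress.IPv4Address(parts[1]))
    region = parts[2] if len(parts) > 2 else ""
    isp = parts[3].strip() if len(parts) > 3 else ""
    return start, end, region, isp

print(parse_line("1.0.1.0 1.0.3.255 福建省 电信"))
# (16777472, 16778239, '福建省', '电信')
```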
IPDB (IPv6)
ZX's IPDB takes more work: no extraction tool is provided, so the data format has to be worked out by hand. I found a project by Rhilip on GitHub and modified it:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Copyright (c) 2017-2020 Rhilip <rhilipruan@gmail.com>
import re
import os
dir = os.path.dirname(__file__)
v4db_path = os.path.join(dir, 'db/qqwry.dat')
v6db_path = os.path.join(dir, 'db/ipv6wry.db')
v6ptn = re.compile(r'^[0-9a-f:.]{3,51}$')
v4ptn = re.compile(r'.*((25[0-5]|2[0-4]\d|[0-1]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[0-1]?\d\d?)$')


def parseIpv4(ip):
    # For an embedded IPv4 address, keep only the part after the last ':'.
    sep = ip.rfind(':')
    if sep >= 0:
        ip = ip[sep + 1:]
    if v4ptn.match(ip) is None:
        return -1
    v4 = 0
    for sub in ip.split('.'):
        v4 = v4 * 0x100 + int(sub)
    return v4


def parseIpv6(ip):
    if v6ptn.match(ip) is None:
        return -1
    count = ip.count(':')
    if count >= 8 or count < 2:
        return -1
    # Expand '::' so that the address splits into all of its groups.
    ip = ip.replace('::', '::::::::'[0:8 - count + 1], 1)
    if ip.count(':') < 6:
        return -1
    # Only the first four groups (the high 64 bits) are used as the index.
    v6 = 0
    for sub in ip.split(':')[0:4]:
        if len(sub) > 4:
            return -1
        if len(sub) == 0:
            v6 = v6 * 0x10000
        else:
            v6 = v6 * 0x10000 + int(sub, 16)
    return v6


def parseIp(ip):
    ip = ip.strip()
    ip = ip.replace('*', '0')
    v4 = parseIpv4(ip)
    v6 = parseIpv6(ip)
    # 6to4 (2002::/16): the IPv4 address follows the 16-bit prefix.
    v2002 = v6 >> (3 * 16)
    if v2002 == 0x2002:
        v4 = (v6 >> 16) & 0xffffffff
    # Teredo (2001:0::/32): the last 32 bits hold the bit-inverted IPv4 address.
    v2001 = v6 >> (2 * 16)
    if v2001 == 0x20010000:
        v4 = ~int(''.join(ip.split(':')[-2:]), 16)
        v4 = int(bin(((1 << 32) - 1) & v4)[2:], 2)
    return v4, v6


class IpDb(object):
    except_raw = 0x19
    osLen = ipLen = dLen = dbAddr = size = None

    def __init__(self, db_path):
        with open(db_path, 'rb') as f:
            db = f.read()
        self.db = db
        # The IPv6 database starts with the magic bytes 'IPDB';
        # anything else is treated as qqwry.dat (IPv4).
        if db[0:4] != 'IPDB'.encode():
            self.type = 4
            self._init_v4db()
        else:
            self.type = 6
            self._init_v6db()

    def _init_v4db(self):
        self.osLen = 3
        self.ipLen = 4
        self.dLen = self.osLen + self.ipLen
        self.dbAddr = int.from_bytes(self.db[0:4], byteorder='little')
        endAddr = int.from_bytes(self.db[4:8], byteorder='little')
        self.size = (endAddr - self.dbAddr) // self.dLen

    def _init_v6db(self):
        self.osLen = self.db[6]  # 3
        self.ipLen = self.db[7]  # 8
        self.dLen = self.osLen + self.ipLen
        self.dbAddr = int.from_bytes(self.db[0x10: 0x18], byteorder='little')  # 50434
        self.size = int.from_bytes(self.db[8:0x10], byteorder='little')  # 140045

    def getSize(self):
        return self.size

    def getData(self, index):
        # Return the start value of the range stored at the given index.
        self.checkIndex(index)
        addr = self.dbAddr + index * self.dLen
        ip = int.from_bytes(self.db[addr: addr + self.ipLen], byteorder='little')
        return ip

    def checkIndex(self, index):
        if index < 0 or index >= self.getSize():
            raise Exception

    def getLoc(self, index):
        self.checkIndex(index)
        addr = self.dbAddr + index * self.dLen
        # ip = int.from_bytes(self.db[addr: addr + self.ipLen],
        #                     byteorder='little')
        lAddr = int.from_bytes(self.db[addr + self.ipLen: addr + self.dLen], byteorder='little')
        # print('ip_addr: %d ip: %d lAddr:%d' % (addr, ip, lAddr))
        if self.type == 4:
            lAddr += 4
        loc = self.readLoc(lAddr, True)
        if self.type == 4:
            loc = loc.decode('cp936')
            loc = loc.replace('CZ88.NET', '')
        if self.type == 6:
            loc = loc.decode('utf-8')
        return loc

    def readRawText(self, start):
        # Read a NUL-terminated byte string starting at `start`.
        if self.type == 4 and start == self.except_raw:
            # Return bytes (not a list) so that callers can decode the result.
            return b''
        bs = []
        while self.db[start] != 0:
            bs += [self.db[start]]
            start += 1
        return bytes(bs)

    def readLoc(self, start, isTwoPart=False):
        # A leading byte of 1 or 2 marks a redirect to another offset.
        jType = self.db[start]
        if jType == 1 or jType == 2:
            start += 1
            offAddr = int.from_bytes(self.db[start:start + self.osLen], byteorder='little')
            if offAddr == 0:
                # bytes, so that getLoc can decode it like any other result
                return b'Unknown address'
            loc = self.readLoc(offAddr, True if jType == 1 else False)
            nAddr = start + self.osLen
        else:
            loc = self.readRawText(start)
            nAddr = start + len(loc) + 1
        if isTwoPart and jType != 1:
            partTwo = self.readLoc(nAddr)
            if loc and partTwo:
                loc += b' ' + partTwo
        return loc

    def searchIp(self, val):
        index = self.binarySearch(val)
        if index < 0:
            return "Unknown address"
        if index > self.getSize() - 2:
            index = self.getSize() - 2
        return self.getLoc(index)

    def binarySearch(self, key, lo=0, hi=None):
        # Find the index of the last record whose start value is <= key.
        if not hi:
            hi = self.getSize() - 1
        while lo <= hi:
            if hi - lo <= 1:
                if self.getData(lo) > key:
                    return -1
                elif self.getData(hi) <= key:
                    return hi
                else:
                    return lo
            mid = (lo + hi) // 2
            data = self.getData(mid)
            if data is not None and data > key:
                hi = mid - 1
            elif data is not None and data < key:
                lo = mid
            else:
                return mid
        return -1


class IpQuery(object):
    def __init__(self):
        self.v6db = IpDb(v6db_path)
        self.v4db = IpDb(v4db_path)

    def searchIp(self, ip):
        ret = ''
        err = None
        try:
            v4, v6 = parseIp(ip)
            # print('v4: %d v6: %d' % (v4, v6))
            if v6 >= 0:
                print(v6)
                ret += self.v6db.searchIp(v6)
            if v4 >= 0:
                if ret != '':
                    ret += ' > '
                ret += self.v4db.searchIp(v4)
        except Exception as e:
            err = "Internal server error"
        return {
            "ip": ip,
            "loc": ret if ret else None,
            "stats": err or ("Can't Format IP address." if ret == '' else "Success")
        }


if __name__ == '__main__':
    # ipquery = IpQuery()
    # ip = '2001:250:230::'
    # ip = '42.156.139.1'
    # ip = '182.117.109.0'
    # ip = '114.242.248.*'
    # ip = None
    # result = ipquery.searchIp(ip)
    v6db = IpDb(v6db_path)
    # Each record stores only a range start; the range end is taken as
    # the next record's start minus 1.
    with open('ipv6.csv', 'w', encoding='utf-8') as fs:
        for i in range(v6db.size - 1):
            fs.write(str(v6db.getData(i)) + "," + str(v6db.getData(i + 1) - 1) +
                     "," + v6db.getLoc(i) + "\n")
The exported data looks like this:
0,28428538856079359,IANA保留地址
28428538856079360,28428538856079360,IANA特殊地址 包含v4地址的v6地址
28428538856079361,28428538856144895,IANA保留地址
28428538856144896,28428538856210431,IANA特殊地址 包含v4地址的v6地址
28428538856210432,72057594037927935,IANA保留地址
72057594037927936,72057594037927936,IANA特殊地址 仅用于丢弃的地址
72057594037927937,2306124484190404607,IANA保留地址
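The first and second fields are the start and end of the range, each computed from the first four 16-bit groups (the high 64 bits) of the IPv6 address; the third field is the location.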
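To make the index value concrete, here is a minimal sketch that computes it with the standard `ipaddress` module instead of the hand-written parser. The helper name `ipv6_index` is hypothetical. The second sample row above starts at 28428538856079360, which is the value this yields for addresses under 64:ff9b::.

```python
import ipaddress

def ipv6_index(ip):
    """Return the high 64 bits (first four groups) of an IPv6 address."""
    return int(ipaddress.IPv6Address(ip)) >> 64

print(ipv6_index("64:ff9b::1"))           # 28428538856079360
print(hex(ipv6_index("2001:250:230::")))  # 0x2001025002300000
```

Unlike `parseIpv6`, this version accepts upper-case hexadecimal digits and raises `ValueError` on malformed input rather than returning -1.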
Lookup code
To speed up loading and keep the code consistent, the IPv4 database is converted into the same format as the IPv6 one. The conversion code is:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import ipaddress

# ip.txt is the text export of qqwry.dat. It is opened with the platform's
# default encoding, as in the original script.
with open('ip.txt', 'r') as fr, open('ipv4.txt', 'w', encoding='utf-8') as fw:
    for line in fr:
        larr = line.replace('CZ88.NET', '').strip('\n').split(' ')
        larr = [sval for sval in larr if sval != '']
        start = int(ipaddress.IPv4Address(larr[0]))
        end = int(ipaddress.IPv4Address(larr[1]))
        # Join the region and any remaining columns into one location string.
        address = ''.join(larr[2:])
        print(start, end, address)
        fw.write(str(start) + ',' + str(end) + ',' + address + "\n")
print('over')
Lookups use binary search:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Numerics;
using System.Text;

namespace Trail.Common
{
    /// <summary>
    /// IP address database tool.
    /// </summary>
    public class IPLocationTool
    {
        private const string IPv4Path = "ipv4.txt";
        // Note: the export script writes ipv6.csv; rename that file or adjust this path.
        private const string IPv6Path = "ipv6.txt";
        private const string UnKnowIP = "未知地址";
        private static IPv4LocInfo[] _IPv4Infos = null;
        private static IPv6LocInfo[] _IPv6Infos = null;

        /// <summary>
        /// Loads the address database data.
        /// </summary>
        public static void Load()
        {
            //IPv4
            using (var sr = new StreamReader(IPv4Path, Encoding.UTF8))
            {
                string line;
                var ipv4LocInfos = new List<IPv4LocInfo>();
                while (!string.IsNullOrEmpty(line = sr.ReadLine()))
                {
                    var lineArr = line.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries);
                    IPv4LocInfo ipv4LocInfo = new IPv4LocInfo()
                    {
                        Start = Convert.ToUInt32(lineArr[0]),
                        End = Convert.ToUInt32(lineArr[1]),
                        Address = lineArr[2]
                    };
                    ipv4LocInfos.Add(ipv4LocInfo);
                }
                _IPv4Infos = ipv4LocInfos.ToArray();
            }
            //IPv6
            using (var sr = new StreamReader(IPv6Path, Encoding.UTF8))
            {
                string line;
                var ipv6LocInfos = new List<IPv6LocInfo>();
                while (!string.IsNullOrEmpty(line = sr.ReadLine()))
                {
                    var lineArr = line.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries);
                    IPv6LocInfo ipv6LocInfo = new IPv6LocInfo()
                    {
                        Start = BigInteger.Parse(lineArr[0]),
                        End = BigInteger.Parse(lineArr[1]),
                        Address = lineArr[2]
                    };
                    ipv6LocInfos.Add(ipv6LocInfo);
                }
                _IPv6Infos = ipv6LocInfos.ToArray();
            }
        }

        /// <summary>
        /// Looks up an IP address by binary search.
        /// </summary>
        /// <param name="ip">The IP address.</param>
        /// <returns>The location.</returns>
        public static string BinSearch(string ip)
        {
            //IPv6
            if (ip.Contains(":"))
            {
                var ipNum = IPv6ToIndex(ip);
                // The last valid index is Length - 1; starting at Length can read past the array.
                int high = _IPv6Infos.Length - 1;
                for (int low = 0; low <= high;)
                {
                    var point_index = (high + low) / 2;
                    if (ipNum < _IPv6Infos[point_index].Start)
                    {
                        high = point_index - 1;
                        continue;
                    }
                    else if (ipNum > _IPv6Infos[point_index].End)
                    {
                        low = point_index + 1;
                        continue;
                    }
                    return _IPv6Infos[point_index].Address;
                }
            }
            //IPv4
            else
            {
                // Convert to a number.
                var ipNum = IPv4ToNumber(ip);
                int high = _IPv4Infos.Length - 1;
                for (int low = 0; low <= high;)
                {
                    var point_index = (high + low) / 2;
                    if (ipNum < _IPv4Infos[point_index].Start)
                    {
                        high = point_index - 1;
                        continue;
                    }
                    else if (ipNum > _IPv4Infos[point_index].End)
                    {
                        low = point_index + 1;
                        continue;
                    }
                    return _IPv4Infos[point_index].Address;
                }
            }
            return UnKnowIP;
        }

        /// <summary>
        /// Converts an IPv4 address to a number.
        /// </summary>
        /// <param name="ip">The IPv4 address.</param>
        /// <returns>The numeric value.</returns>
        public static long IPv4ToNumber(string ip)
        {
            var ipArr = ip.Split(new char[] { '.' });
            return long.Parse(ipArr[0]) * 16777216 + long.Parse(ipArr[1]) * 65536 + long.Parse(ipArr[2]) * 256 + long.Parse(ipArr[3]);
        }

        /// <summary>
        /// Converts an IPv6 address to a number.
        /// </summary>
        /// <param name="ip">The IPv6 address.</param>
        /// <returns>The numeric value.</returns>
        private static BigInteger IPv6ToNumber(string ip)
        {
            IPAddress address;
            BigInteger ipnum;
            if (IPAddress.TryParse(ip, out address))
            {
                byte[] addrBytes = address.GetAddressBytes();
                if (BitConverter.IsLittleEndian)
                {
                    List<byte> byteList = new List<byte>(addrBytes);
                    byteList.Reverse();
                    addrBytes = byteList.ToArray();
                }
                if (addrBytes.Length > 8)
                {
                    //IPv6
                    ipnum = BitConverter.ToUInt64(addrBytes, 8);
                    ipnum <<= 64;
                    ipnum += BitConverter.ToUInt64(addrBytes, 0);
                }
                else
                {
                    //IPv4
                    ipnum = BitConverter.ToUInt32(addrBytes, 0);
                }
                return ipnum;
            }
            return 0;
        }

        /// <summary>
        /// Converts an IPv6 address to its index value (the database indexes IPv6 ranges by the first four groups).
        /// </summary>
        /// <param name="ip">The IPv6 address.</param>
        /// <returns>The numeric value.</returns>
        private static BigInteger IPv6ToIndex(string ip)
        {
            // Expand "::" so that every group is present.
            int count = ip.ToCharArray().Count(p => p.Equals(':'));
            ip = ip.Replace("::", ":::::::".Substring(0, 8 - count + 1));
            if (ip.ToCharArray().Count(p => p.Equals(':')) < 6)
                return -1;
            BigInteger v6 = 0;
            var ipArr = ip.Split(new string[] { ":" }, StringSplitOptions.None);
            for (int i = 0; i < 4; i++)
            {
                if (string.IsNullOrEmpty(ipArr[i]))
                    v6 = v6 * 0x10000;
                else
                {
                    v6 = v6 * 0x10000 + Int64.Parse(ipArr[i], System.Globalization.NumberStyles.HexNumber);
                }
            }
            return v6;
        }
    }

    /// <summary>
    /// IPv4 address information.
    /// </summary>
    public class IPv4LocInfo
    {
        /// <summary>
        /// Start of the range.
        /// </summary>
        public uint Start { get; set; }
        /// <summary>
        /// End of the range.
        /// </summary>
        public uint End { get; set; }
        /// <summary>
        /// Location.
        /// </summary>
        public string Address { get; set; }
    }

    /// <summary>
    /// IPv6 address information.
    /// </summary>
    public class IPv6LocInfo
    {
        /// <summary>
        /// Start of the range.
        /// </summary>
        public BigInteger Start { get; set; }
        /// <summary>
        /// End of the range.
        /// </summary>
        public BigInteger End { get; set; }
        /// <summary>
        /// Location.
        /// </summary>
        public string Address { get; set; }
    }
}
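For comparison, the same range lookup can be sketched in a few lines of Python with the standard `bisect` module. This is only an illustration under simple assumptions: the records are sorted by start value and do not overlap, and the names `find_location`, `starts` and `records` are invented for the example.

```python
import bisect

def find_location(starts, records, key):
    """Return the location whose [start, end] range contains key, or None.

    starts  -- sorted list of range start values
    records -- list of (start, end, location) tuples in the same order
    """
    # Index of the last range whose start is <= key.
    i = bisect.bisect_right(starts, key) - 1
    if i < 0:
        return None
    start, end, location = records[i]
    return location if key <= end else None

records = [(0, 9, "A"), (10, 19, "B"), (30, 39, "C")]
starts = [r[0] for r in records]
print(find_location(starts, records, 15))  # B
print(find_location(starts, records, 25))  # None
```

Because Python integers are unbounded, the same function works for both the 32-bit IPv4 values and the 64-bit IPv6 index values.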
The test code is as follows:
/// <summary>
/// Test.
/// </summary>
/// <param name="args">Arguments.</param>
static void Main(string[] args)
{
    try
    {
        IPLocationTool.Load();
        var beginTime = DateTime.Now;
        // IPv4 tests (expected results in the trailing comments)
        Console.WriteLine(IPLocationTool.BinSearch("61.152.197.155")); //上海市网友
        Console.WriteLine(IPLocationTool.BinSearch("211.143.205.140")); //福建省漳州市
        Console.WriteLine(IPLocationTool.BinSearch("218.57.116.146")); //山东省青岛市
        Console.WriteLine(IPLocationTool.BinSearch("121.35.180.254")); //广东省深圳市
        Console.WriteLine(IPLocationTool.BinSearch("112.13.166.125")); //浙江省丽水市
        Console.WriteLine(IPLocationTool.BinSearch("61.181.236.137")); //天津市宝坻区
        Console.WriteLine(IPLocationTool.BinSearch("1.65.212.143")); //香港
        // IPv6 tests
        Console.WriteLine(IPLocationTool.BinSearch("2409:8a00::")); //中国北京市东城区
        Console.WriteLine(IPLocationTool.BinSearch("2408:8410:47ff:ffff:1155:658:1254:632")); //中国天津市红桥区
        Console.WriteLine(IPLocationTool.BinSearch("2409:8a0c:1200::")); //中国山西省太原市娄烦县
        Console.WriteLine(IPLocationTool.BinSearch("2409:8a15:9400::")); //中国辽宁省辽阳市灯塔市
        Console.WriteLine("Done, elapsed {0}ms", (DateTime.Now - beginTime).TotalMilliseconds);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message + "\n" + ex.StackTrace);
    }
    Console.WriteLine("Over");
    Console.Read();
}