Building an OSM web server on Linux, part 4 (translating and searching place names in many languages)

Translating and searching place names in many languages

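After the debugging in the previous three posts, we now have a complete map to browse, and the painful process of downloading and importing the worldwide data is over. One reminder: given typical network speeds, do not download the planet-latest file. It is replaced every week, so if the download does not finish within a week, the partial file is useless.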

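Of course, once the import is done, don't forget to set the timestamp file, whose modification time mod_tile uses to decide which tiles are out of date:

```shell
sudo touch /var/lib/mod_tile/planet-import-complete
sudo chown www-data /var/lib/mod_tile/planet-import-complete
```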

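After the import, only China and Japan have some Chinese characters; everywhere else the names are in foreign scripts that a Chinese reader cannot read, so they have to be translated into Chinese. A quick count in PostgreSQL shows far too many rows with a non-null name field for an online translation API to be practical. Instead, we download a dictionary and do the matching and translation locally. The dictionary is available at http://download.csdn.net/detail/goldenhawking/4556453. After import, the table holding about 170,000 place-name translations looks like this:

[Figure: the place_name translation table]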

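Because the entries in place_name are not written consistently (former names in parentheses, equivalent names separated by commas, and so on), the place-name table cannot be matched directly against the name field of planet_osm_roads, planet_osm_polygon, planet_osm_line and planet_osm_point with LIKE or =. Even a regular-expression match has to cope with cases such as XXXX versus XX'XX (YYYY), where the original name already contains both an Arabic and an English form.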
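On the dictionary side, the clean-up can be sketched as follows. This assumes each raw place_name entry is a UTF-8 std::string in which former names sit in ASCII parentheses and equivalent names are separated by ASCII commas (full-width brackets and commas would need the same treatment). SplitDictionaryName is a hypothetical helper, not part of the original program: it drops the parenthesized parts and turns one messy entry into several clean keys that can all map to the same Chinese translation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Trim ASCII whitespace from both ends.
static std::string Trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: "Mumbai (Bombay), Bombaim" -> {"Mumbai", "Bombaim"}.
// Parenthesized former names are dropped; commas separate equivalent names.
// Safe on UTF-8 input because '(' ')' ',' never occur inside multi-byte sequences.
std::vector<std::string> SplitDictionaryName(const std::string& raw)
{
    // 1. Remove everything inside (possibly nested) parentheses.
    std::string flat;
    int depth = 0;
    for (char c : raw)
    {
        if (c == '(') { ++depth; continue; }
        if (c == ')') { if (depth > 0) --depth; continue; }
        if (depth == 0) flat += c;
    }
    // 2. Split on commas, trim each piece and skip empty pieces.
    std::vector<std::string> keys;
    std::size_t start = 0;
    while (start <= flat.size())
    {
        std::size_t comma = flat.find(',', start);
        if (comma == std::string::npos) comma = flat.size();
        std::string key = Trim(flat.substr(start, comma - start));
        if (!key.empty()) keys.push_back(key);
        start = comma + 1;
    }
    return keys;
}
```

Each key should then go through the same character filter as the OSM names, so that both sides of the match are normalized identically.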

To handle this, I wrote a program that normalizes the place names before matching them. The algorithm is:

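Read the names in planet_osm_roads, planet_osm_polygon, planet_osm_line and planet_osm_point where name is not null and longer than one character, simplify them by removing parentheses and every character that is neither Latin nor Cyrillic, and then match them against the normalized place_name entries. To keep a separate Chinese field, a text column trans_name_chs is appended to each of the four tables; it stores the plain Chinese place name for searching.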
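The simplification step can be sketched as a code-point filter. This assumes names have already been decoded from UTF-8 into std::u32string, and the exact Unicode blocks treated as "Latin" and "Cyrillic" here (Basic Latin letters, Latin-1 letters, Latin Extended-A/B and Additional, Cyrillic and Cyrillic Supplement) are my choice, not taken from the original program. Apostrophes and brackets are dropped without leaving a gap, which is what turns XX'XX (YYYY) into XXXX when YYYY is written in Arabic script.

```cpp
#include <string>

// True for letters in the Unicode blocks treated as Latin or Cyrillic.
// The block list is an assumption; extend it if your data needs more.
bool IsLatinOrCyrillic(char32_t c)
{
    return (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z')
        || (c >= 0x00C0 && c <= 0x024F && c != 0x00D7 && c != 0x00F7) // Latin-1 letters, Latin Extended-A/B
        || (c >= 0x1E00 && c <= 0x1EFF)                                 // Latin Extended Additional
        || (c >= 0x0400 && c <= 0x052F);                                // Cyrillic and Cyrillic Supplement
}

// Simplified variant of the pseudocode's FilterChar with the language set
// hard-coded: keep Latin/Cyrillic letters, drop everything else (brackets,
// apostrophes, digits, Arabic, CJK...), keep single spaces between words
// and trim both ends.
std::u32string FilterChar(const std::u32string& name)
{
    std::u32string out;
    bool pendingSpace = false;
    for (char32_t c : name)
    {
        if (IsLatinOrCyrillic(c))
        {
            if (pendingSpace && !out.empty()) out += U' ';
            pendingSpace = false;
            out += c;
        }
        else if (c == U' ' || c == U'\t')
        {
            pendingSpace = true;   // emitted only if another letter follows
        }
        // any other character is dropped without producing a space
    }
    return out;
}
```

Spaces are kept (collapsed to one) because dictionary stems such as Shanghai City contain them, and the length ratio used later counts them.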

```sql
ALTER TABLE planet_osm_point ADD COLUMN trans_name_chs text;
ALTER TABLE planet_osm_line ADD COLUMN trans_name_chs text;
ALTER TABLE planet_osm_polygon ADD COLUMN trans_name_chs text;
ALTER TABLE planet_osm_roads ADD COLUMN trans_name_chs text;
```
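The algorithm, written as C++-style pseudocode (the database helpers are only declared):

```cpp
#include <string>
#include <vector>

// One OSM row and one dictionary row; dictionary stems are normalized the same way as OSM names.
struct Record    { long long id; std::u32string name; };
struct DictEntry { std::u32string stem; std::u32string chs; };

// Database/text helpers named after the original pseudocode; signatures are illustrative.
std::vector<Record>    SelectNames(const std::string& table);     // name IS NOT NULL AND length(name) > 1
std::u32string         TrimSpaces(const std::u32string& s);
std::u32string         FilterChar(const std::u32string& s);       // keep Latin and Cyrillic only
std::vector<DictEntry> DatabaseSearch(const std::u32string& stem); // stem LIKE '%' || stem || '%'
void UpdateTable(const std::string& table, long long id,
                 const std::u32string& name, const std::u32string& transNameChs);

void Match(const std::string& tableName)
{
    for (const Record& record : SelectNames(tableName))
    {
        // Trim leading and trailing spaces
        const std::u32string placeName = TrimSpaces(record.name);

        // Keep only the two character classes, selected by Unicode code-point range
        const std::u32string stem = FilterChar(placeName);
        if (stem.empty())
            continue;   // nothing Latin/Cyrillic left; LIKE '%%' would match everything

        // Look up possible translations in the normalized dictionary
        const std::vector<DictEntry> candidates = DatabaseSearch(stem);

        // Rank every candidate containing the stem by length ratio, e.g.
        // "Shanghai" vs "Shanghai City" is 8:13, vs "Shanghai" 1:1, so "Shanghai" wins.
        // Use floating point: integer division would only ever give 0 or 1.
        const DictEntry* best = nullptr;
        double bestFactor = 0.0;
        for (const DictEntry& candidate : candidates)
        {
            const double factor = double(stem.size()) / double(candidate.stem.size());
            if (factor <= 0.6 || factor <= bestFactor)
                continue;
            best = &candidate;
            bestFactor = factor;
            if (bestFactor == 1.0)
                break;  // exact match, stop searching
        }

        // Write back: name = "Chinese(original)", trans_name_chs = Chinese only
        if (best != nullptr)
            UpdateTable(tableName, record.id,
                        best->chs + U"(" + placeName + U")", best->chs);
    }
}
```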
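The ranking step of the algorithm can be tried out without a database. In this sketch a std::vector stands in for the rows that the LIKE query would return, std::u32string::find emulates LIKE '%stem%', and DictEntry and BestMatch are hypothetical names; the 0.6 threshold and the early exit on a 1:1 ratio follow the author's pseudocode.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one row of the normalized dictionary table.
struct DictEntry
{
    std::u32string stem; // normalized Latin/Cyrillic key, e.g. U"Shanghai City"
    std::u32string chs;  // Chinese translation
};

// Returns the index of the best entry for `stem`, or -1 if none qualifies.
// A candidate must contain the stem, len(stem)/len(candidate) must exceed 0.6,
// and the highest ratio wins; a ratio of exactly 1 ends the search early.
int BestMatch(const std::u32string& stem, const std::vector<DictEntry>& dict)
{
    int best = -1;
    double bestFactor = 0.0;
    if (stem.empty()) return best;
    for (std::size_t i = 0; i < dict.size(); ++i)
    {
        const std::u32string& cand = dict[i].stem;
        if (cand.find(stem) == std::u32string::npos) continue; // like '%stem%'
        double factor = static_cast<double>(stem.size()) / static_cast<double>(cand.size());
        if (factor <= 0.6 || factor <= bestFactor) continue;
        best = static_cast<int>(i);
        bestFactor = factor;
        if (bestFactor == 1.0) break;
    }
    return best;
}
```

With the entries Shanghai City and Shanghai, BestMatch(U"Shanghai", dict) picks Shanghai (ratio 1) over Shanghai City (8/13, about 0.62); a short fragment such as Sha matches neither, because 3/8 is below the threshold.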
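Matching takes roughly one to two days. When it finishes, the translated names are stored in the name field (with the plain Chinese name in trans_name_chs). Render some tiles and take a look: the major place names all come out fine.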

[Figure: Germany]

Finally, create indexes on these columns:

```sql
CREATE INDEX idx_planet_osm_roads_name ON planet_osm_roads USING btree ("name") where name is not null;
CREATE INDEX idx_planet_osm_roads_trans_name_chs ON planet_osm_roads USING btree ("trans_name_chs") where trans_name_chs is not null;
CREATE INDEX idx_planet_osm_polygon_name ON planet_osm_polygon USING btree ("name") where name is not null;
CREATE INDEX idx_planet_osm_polygon_trans_name_chs ON planet_osm_polygon USING btree ("trans_name_chs") where trans_name_chs is not null;
CREATE INDEX idx_planet_osm_line_name ON planet_osm_line USING btree ("name") where name is not null;
CREATE INDEX idx_planet_osm_line_trans_name_chs ON planet_osm_line USING btree ("trans_name_chs") where trans_name_chs is not null;
CREATE INDEX idx_planet_osm_point_name ON planet_osm_point USING btree ("name") where name is not null;
CREATE INDEX idx_planet_osm_point_trans_name_chs ON planet_osm_point USING btree ("trans_name_chs") where trans_name_chs is not null;
```
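Once everything is done, run VACUUM ANALYZE so that dead rows left by the mass update are cleaned up and the planner's statistics are fresh; the new indexes are then used by queries right away. With that in place, implementing place-name search over FastCGI is straightforward. For testing, though, I simply wrote a plain CGI program in C.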

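The program does two things: given a place name, it finds the GIS objects around it; given a coordinate, it finds the nearest place names. It uses the PostGIS ST_CoveredBy family of functions. CGI code: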
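The author's CGI source is omitted, so the following is only a hedged sketch of the request-handling side. It assumes hypothetical query parameters, name for the place-name search and lon/lat for the coordinate lookup; the SQL against the four planet_osm_* tables (with ST_CoveredBy) is not reproduced. A CGI program receives GET parameters in the QUERY_STRING environment variable, which this code decodes and dispatches on.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Decode one form-urlencoded component: '+' becomes a space, %XX becomes one byte.
std::string UrlDecode(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] == '+')
            out += ' ';
        else if (in[i] == '%' && i + 2 < in.size()
                 && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
                 && std::isxdigit(static_cast<unsigned char>(in[i + 2])))
        {
            out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
            i += 2;
        }
        else
            out += in[i];
    }
    return out;
}

// Split "a=1&b=2" into a key/value map (a later duplicate key overwrites an earlier one).
std::map<std::string, std::string> ParseQuery(const std::string& qs)
{
    std::map<std::string, std::string> params;
    std::size_t start = 0;
    while (start < qs.size())
    {
        std::size_t amp = qs.find('&', start);
        if (amp == std::string::npos) amp = qs.size();
        std::string pair = qs.substr(start, amp - start);
        std::size_t eq = pair.find('=');
        if (!pair.empty())
        {
            if (eq == std::string::npos) params[UrlDecode(pair)] = "";
            else params[UrlDecode(pair.substr(0, eq))] = UrlDecode(pair.substr(eq + 1));
        }
        start = amp + 1;
    }
    return params;
}

// Which of the two lookups the request asks for (parameter names are hypothetical).
enum class Mode { kByName, kByCoordinate, kInvalid };

Mode ChooseMode(const std::map<std::string, std::string>& params)
{
    if (params.count("name") && !params.at("name").empty()) return Mode::kByName;
    if (params.count("lon") && params.count("lat")) return Mode::kByCoordinate;
    return Mode::kInvalid;
}
```

In main(), read std::getenv("QUERY_STRING") (it is null when the program runs outside a web server), write a Content-Type header followed by an empty line, and then run either the name query (which can use the trans_name_chs and name indexes) or the coordinate query, depending on the mode.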

(omitted)

Reposted from: http://blog.csdn.net/goldenhawking/article/details/7952303
Original URL: https://www.cnblogs.com/BigFishFly/p/6337361.html