Elasticsearch Series (3): Understanding Databases and Tables

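First of all, ES has no concept of databases and tables. It only has index, type, and document (for the detailed terminology, see part one of this ES series: http://www.cnblogs.com/ulysses-you/p/6736926.html). To get oriented faster, you can map these loosely onto a typical relational database.

Below is my understanding of these concepts.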

Index

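1. Each shard of an ES index corresponds to a Lucene index, and every Lucene index carries a fixed overhead in disk space, memory, and file descriptors, so do not create ES indices carelessly. One index holding a lot of data is more efficient than many small indices, which is why using multiple types instead of multiple indices reduces the number of Lucene indices ES has to manage.

2. Avoid searching many indices at once. During a search, ES has to collect results from every shard of every index being searched, so this is very resource-hungry.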
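
To make the fan-out in point 2 concrete, here is a minimal sketch that counts how many shards a search has to visit. The index names and shard counts are made up for illustration, and `shards_touched` is a hypothetical helper, not an ES API.

```python
# Hypothetical cluster layout: index name -> number of primary shards.
# These names and counts are invented for illustration only.
shards_per_index = {
    "logs-2017-01": 5,
    "logs-2017-02": 5,
    "logs-2017-03": 5,
    "users": 1,
}


def shards_touched(indices, layout):
    """Return how many shards a search over `indices` must visit.

    Each shard is a separate Lucene index, so a search fans out to
    every shard of every index named in the request.
    """
    return sum(layout[name] for name in indices)


single = shards_touched(["users"], shards_per_index)
many = shards_touched(
    ["logs-2017-01", "logs-2017-02", "logs-2017-03"], shards_per_index
)
print(single, many)
```

Under these assumed numbers, searching the three log indices together visits fifteen shards, while searching the single-shard index visits one.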

Type

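1. Within one index, searching several types costs no more resources than searching a single type.

2. Fields must stay consistent: if two fields in the same index have the same name but belong to different types, their properties must be identical.

3. Keep fields as dense as possible (HBase tables, by contrast, are sparse). A field that exists in some documents still consumes resources for documents that do not have it. This is a general Lucene problem.

· Sparse fields lower compression efficiency.

· Each document reserves a fixed amount of disk space so that values can be addressed efficiently.

4. Because statistics are index-wide, the scores of documents in one type are influenced by documents in other types.

5. One sparse index does more harm than splitting that index into several indices.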
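
Points 2 and 3 can be checked before putting several types into one index. The following is a rough sketch under stated assumptions: the mappings, type names, and document counts are invented, both helpers are hypothetical, and every document of a type is assumed to populate every field of that type.

```python
from collections import defaultdict

# Hypothetical mappings for two types in one index: type -> {field: properties}.
mappings = {
    "user": {"name": {"type": "keyword"}, "created": {"type": "date"}},
    "order": {"name": {"type": "text"}, "amount": {"type": "double"}},
}


def conflicting_fields(type_mappings):
    """Return sorted names of fields whose properties differ between types."""
    seen = {}
    conflicts = set()
    for fields in type_mappings.values():
        for field, props in fields.items():
            if field in seen and seen[field] != props:
                conflicts.add(field)
            seen.setdefault(field, props)
    return sorted(conflicts)


def field_density(type_mappings, doc_counts):
    """Return field -> fraction of all documents in the index that have it.

    Assumes every document of a type populates every field of that type.
    """
    total = sum(doc_counts.values())
    docs_with_field = defaultdict(int)
    for type_name, fields in type_mappings.items():
        for field in fields:
            docs_with_field[field] += doc_counts[type_name]
    return {field: count / total for field, count in docs_with_field.items()}


doc_counts = {"user": 1000, "order": 9000}  # made-up document counts
print(conflicting_fields(mappings))
print(field_density(mappings, doc_counts))
```

In this invented example, `name` is reported as a conflict because its properties differ between the two types, and `created` is present in only a tenth of the documents, which is the kind of sparsity point 3 warns about.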

Summary

Questions to ask yourself when choosing a storage structure
- Are you using parent/child? If yes this can only be done with two types in the same index.
- Do your documents have similar mappings? If no, use different indices.
- If you have many documents for each type, then the overhead of Lucene indices will be easily amortized so you can safely use indices, with fewer shards than the default of 5 if necessary.
- Otherwise you can consider putting documents in different types of the same index. Or even in the same type.
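
The four questions above can be read as a decision procedure evaluated in order. A minimal sketch, assuming three yes/no answers as input; `choose_layout` and its parameter names are hypothetical and only restate the list:

```python
def choose_layout(uses_parent_child, similar_mappings, many_docs_per_type):
    """Walk the four questions in order and return a suggested layout."""
    if uses_parent_child:
        # Parent/child requires two types in the same index.
        return "two types in the same index"
    if not similar_mappings:
        # Dissimilar mappings would make a shared index sparse.
        return "separate indices"
    if many_docs_per_type:
        # The per-index Lucene overhead is amortized over many documents.
        return "separate indices, possibly with fewer shards than the default of 5"
    return "types in the same index, or a single type"


print(choose_layout(False, True, False))
```

The order of the checks matters: parent/child forces a shared index regardless of the other answers.
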
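Common rules of thumb

1. One index containing 5 types is almost equivalent to 5 indices with a single shard each.

2. If the documents have different mappings, create more indices.

3. In general, scenarios that call for multiple types are rare.

4. If you are after high write throughput, increase the number of shards; if you are after high read throughput, reduce it.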
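
The shard count in rule 4 is set when an index is created, through the `number_of_shards` index setting. The sketch below only builds the request body that would be sent with `PUT /<index_name>`; the shard values are hypothetical placeholders, not recommendations, and `index_settings` is an illustrative helper.

```python
import json


def index_settings(number_of_shards, number_of_replicas=1):
    """Build the request body for creating an index (PUT /<index_name>)."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


write_heavy = index_settings(number_of_shards=10)  # hypothetical value
read_heavy = index_settings(number_of_shards=1)    # hypothetical value
print(json.dumps(write_heavy, indent=2))
print(json.dumps(read_heavy, indent=2))
```

The right numbers depend on data volume and hardware, so treat these two bodies as the two ends of the trade-off rather than as defaults.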

References

// Official comparison of index and type

https://www.elastic.co/blog/index-vs-type

// A very detailed ES blog post by an overseas author

https://blog.insightdatascience.com/anatomy-of-an-elasticsearch-cluster-part-i-7ac9a13b05db
New blog address: http://ixiaosi.art/ Visitors welcome : )
Original article: https://www.cnblogs.com/ulysses-you/p/6858033.html