Elasticsearch bulk insert: skip documents that already exist

When bulk-inserting data into ES with elasticsearch-py, we usually use the bulk function from its helpers module. It is used as follows:
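
    from elasticsearch import helpers, Elasticsearch

    es = Elasticsearch(xxx)  # xxx is a placeholder for the connection settings


    def generator():
        datas = [1, 2, 3]
        for data in datas:
            # Each action carries the document _id and its body in _source.
            yield {
                '_id': "xxx",
                '_source': {
                    'age': data
                }
            }


    # The actions iterable must come before the keyword arguments.
    # doc_type is only accepted by older versions; mapping types were removed in Elasticsearch 8.
    helpers.bulk(es, generator(), index='xxx', doc_type='doc')
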
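This approach has a catch, though: the default operation type is index, which behaves much like an upsert. If a document with that _id already exists in ES, it is overwritten (the whole document is replaced, not merged). If no document with that _id exists, it is inserted.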

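What if I want to insert when the document does not exist and skip it when it does? In that case, add _op_type to each action to set the operation type to create: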

    from elasticsearch import helpers, Elasticsearch

    es = Elasticsearch(xxx)  # xxx is a placeholder for the connection settings


    def generator():
        datas = [1, 2, 3]
        for data in datas:
            yield {
                '_op_type': 'create',  # insert only; the item fails if the _id already exists
                '_id': "xxx",
                '_source': {
                    'age': data
                }
            }


    helpers.bulk(es, generator(), index='xxx', doc_type='doc')

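Now, if no document with that _id exists in ES, it is inserted normally; if ES already holds a document with that _id, that item fails with a conflict (HTTP status 409). Because bulk sends 500 documents per request by default, suppose 2 of them already exist: the other 498 are inserted normally. helpers.bulk then raises BulkIndexError and the program exits, telling you that two writes failed because the documents already exist.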
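
To make this partial-success behaviour concrete, here is a toy in-memory model of a single bulk request whose actions all use create. It does not talk to Elasticsearch: the function name simulate_bulk_create, the dict-based store, and the simplified error shape are assumptions made purely for illustration.

```python
def simulate_bulk_create(store, actions):
    """Toy model of `create` semantics for one bulk request (not the real client).

    `store` is a plain dict standing in for an index, keyed by document _id.
    """
    created = 0
    errors = []
    for action in actions:
        doc_id = action["_id"]
        if doc_id in store:
            # Existing document: the item fails and the stored copy is untouched.
            errors.append({"create": {"_id": doc_id, "status": 409}})
        else:
            store[doc_id] = action["_source"]
            created += 1
    return created, errors


store = {"2": {"age": 99}}  # pretend document "2" is already indexed
actions = [
    {"_op_type": "create", "_id": str(n), "_source": {"age": n}}
    for n in (1, 2, 3)
]
created, errors = simulate_bulk_create(store, actions)
print(created, len(errors), store["2"])  # 2 1 {'age': 99}
```

Documents "1" and "3" are written, while "2" is reported as a failure and keeps its original content. This is the same outcome the real create operation gives for a mixed batch.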

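If you do not want the program to stop on these errors, add two parameters:

    helpers.bulk(es, generator(), index='xxx', doc_type='doc', raise_on_exception=False, raise_on_error=False)

Here raise_on_exception=False means that exceptions raised by the underlying bulk call are not propagated; the affected items are simply reported as failed. raise_on_error=False means that BulkIndexError is not raised when individual items fail; the errors are collected and returned instead.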
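
With raise_on_error=False, helpers.bulk returns a tuple of the success count and a list of per-item errors, so "already exists" failures can be told apart from genuine ones. The sketch below assumes each error entry has the form {op_type: {..., 'status': <HTTP status>}} and that an existing document is reported with status 409. The helper name split_conflicts and the sample data are hypothetical.

```python
def split_conflicts(errors):
    """Split bulk error items into 'already exists' conflicts and other failures."""
    conflicts, failures = [], []
    for item in errors:
        # Each item is assumed to be a one-key dict: {op_type: details}.
        op_type, details = next(iter(item.items()))
        if op_type == "create" and details.get("status") == 409:
            conflicts.append(details.get("_id"))
        else:
            failures.append(item)
    return conflicts, failures


# Hypothetical errors list, shaped like the second element of the tuple
# returned by helpers.bulk(..., raise_on_error=False).
sample_errors = [
    {"create": {"_id": "a", "status": 409,
                "error": {"type": "version_conflict_engine_exception"}}},
    {"create": {"_id": "b", "status": 400,
                "error": {"type": "mapper_parsing_exception"}}},
]
conflicts, failures = split_conflicts(sample_errors)
print(conflicts)      # ids skipped because they already exist
print(len(failures))  # errors that still need attention
```

In real use the list would come from something like `success, errors = helpers.bulk(es, generator(), index='xxx', raise_on_error=False)`. Separating the two groups keeps expected conflicts quiet while failures such as mapping errors can still be logged.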

Reposted from: https://mp.weixin.qq.com/s?src=11&timestamp=1579108111&ver=2098&signature=ZXtHL4GJONIJr9lN3KD*vHKfeujxkmmrWRnFl3Pfyu0DENxKPlybBsPaIlcjfiy5woHNz-v8oWES6FQP5e8j3yTKJWCL2qLRbCRtWb6NLlHvLjyJvELSPyG0dXhv1sR6&new=1

Original article: https://www.cnblogs.com/tjp40922/p/12203625.html