    Cross-type joins in Elasticsearch

    http://rore.im/posts/elasticsearch-joins

    December 31, 2014

    When modeling data in Elasticsearch, a common question is how to design the data to capture relationships between entities, to allow at least some level of “joins”.

    Elasticsearch has a good guide about data modeling. One of the options provided for expressing relationships is the parent-child model.

    A parent-child relationship in Elasticsearch is a way to express a one-to-many relationship (a parent with many children). The parent and child are separate Elasticsearch types, bound only by specifying the parent type in the child mapping and by giving the parent ID for every child index operation (this is used to route the child to the parent’s shard).

    It’s a useful model when a parent has many children and when the child update pattern is different from that of the parent: since every child is a separate document, updating a child does not require re-indexing the parent.

    But this model also provides an interesting (if limited) way to capture relationships between sibling types.

    Let’s consider the following data:

    Bill has two children - Adam and Eve, and a dog (Apple).
    Bob has no children or pets (ah, freedom!).
    Mary has a little newborn child called Lamb.
    Jane has a boy named Xander, a cat (Buffy) and a dog (Willow).

    Let’s create this data in Elasticsearch.
    We will have a parent type - “person”, and two child types - “children” and “pets”.
    First we’ll create the mapping for the child types.

        #!/bin/bash
        
        export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
        
        # Create the index, declaring "person" as the parent of both child types
        
        curl -XPUT "$ELASTICSEARCH_ENDPOINT/es-joins" -d '{
            "mappings": {
                "children": {
                    "_parent": {
                        "type": "person"
                    }
                },
                "pets": {
                    "_parent": {
                        "type": "person"
                    }
                }
            }
        }' 
    

    Next, index all the documents - parents, children and pets.

        # Index documents
        curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
        {"index":{"_index":"es-joins","_type":"person","_id":1}}
        {"name":"Bill","gender":"male"}
        {"index":{"_index":"es-joins","_type":"person","_id":2}}
        {"name":"Bob","gender":"male"}
        {"index":{"_index":"es-joins","_type":"person","_id":3}}
        {"name":"Mary","gender":"female"}
        {"index":{"_index":"es-joins","_type":"person","_id":4}}
        {"name":"Jane","gender":"female"}
        {"index":{"_index":"es-joins","_type":"children","_parent":1,"_id":1}}
        {"name":"Adam","gender":"male"}
        {"index":{"_index":"es-joins","_type":"children","_parent":1,"_id":2}}
        {"name":"Eve","gender":"female"}
        {"index":{"_index":"es-joins","_type":"children","_parent":3,"_id":3}}
        {"name":"Lamb","gender":"male"}
        {"index":{"_index":"es-joins","_type":"children","_parent":4,"_id":4}}
        {"name":"Xander","gender":"male"}
        {"index":{"_index":"es-joins","_type":"pets","_parent":1,"_id":1}}
        {"name":"Apple","type":"dog"}
        {"index":{"_index":"es-joins","_type":"pets","_parent":4,"_id":2}}
        {"name":"Buffy","type":"cat"}
        {"index":{"_index":"es-joins","_type":"pets","_parent":4,"_id":3}}
        {"name":"Willow","type":"dog"}
        '
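
    The parent ID can also be supplied when indexing a single child document, which is what makes it possible to update a child without touching its parent. Here is a minimal sketch, assuming the same 1.x-era API as the rest of this post, where the parent is passed as a "parent" URL parameter. The child_url helper is hypothetical, and the curl call is left commented out so the snippet runs without a cluster.

    ```shell
    ELASTICSEARCH_ENDPOINT="${ELASTICSEARCH_ENDPOINT:-http://localhost:9200}"

    # Hypothetical helper: build the index URL for one child document.
    # $1 = child type, $2 = child _id, $3 = parent _id
    child_url() {
        printf '%s/es-joins/%s/%s?parent=%s' "$ELASTICSEARCH_ENDPOINT" "$1" "$2" "$3"
    }

    url=$(child_url children 2 1)
    echo "$url"

    # Re-index Eve under Bill (parent 1); the "person" document is not touched.
    # Uncomment to send it to a running cluster:
    # curl -XPUT "$url" -d '{"name":"Eve","gender":"female"}'
    ```

    The parent value is what routes the child to the shard of its parent, so it has to be repeated on every write to that child.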
    

    Now we can do some searches on it.
    The usual example is searching for a parent by its children. Let’s find all the parents who have a girl. We expect to get back only Bill.

        curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/person/_search?pretty" -d '
        {
            "query": {
                "filtered": {
                    "filter": {
                        "and": [
                            {
                                "has_child": {
                                    "type": "children",
                                    "query": {
                                        "term": {
                                            "gender": "female"
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        }
        '
    

    We can also combine conditions on multiple child types.
    Let’s find parents that have a boy and a dog. This time we expect to get back both Bill and Jane.

        curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/person/_search?pretty" -d '
        {
            "query": {
                "filtered": {
                    "filter": {
                        "and": [
                            {
                                "has_child": {
                                    "type": "children",
                                    "query": {
                                        "term": {
                                            "gender": "male"
                                        }
                                    }
                                }
                            },
                            {
                                "has_child": {
                                    "type": "pets",
                                    "query": {
                                        "term": {
                                            "type": "dog"
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        }
        '
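
    Since every condition on a child type has the same shape, the request body can be assembled from small pieces. The sketch below uses two hypothetical helpers, has_child_clause and and_query, which are not part of Elasticsearch or of the original post. It assumes type names, field names and values contain no characters that need JSON escaping.

    ```shell
    ELASTICSEARCH_ENDPOINT="${ELASTICSEARCH_ENDPOINT:-http://localhost:9200}"

    # Hypothetical helper: print one has_child filter clause as compact JSON.
    # $1 = child type, $2 = field, $3 = value
    has_child_clause() {
        printf '{"has_child":{"type":"%s","query":{"term":{"%s":"%s"}}}}' "$1" "$2" "$3"
    }

    # Hypothetical helper: wrap comma-separated clauses in a filtered "and" query.
    and_query() {
        printf '{"query":{"filtered":{"filter":{"and":[%s]}}}}' "$1"
    }

    # Parents that have a boy and a dog.
    body=$(and_query "$(has_child_clause children gender male),$(has_child_clause pets type dog)")
    echo "$body"

    # Uncomment to run it against the index created earlier:
    # curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/person/_search?pretty" -d "$body"
    ```

    Each extra clause is another parent-child join, so the performance cost grows with the number of clauses.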
    

    Another commonly used option is finding children by their parents.
    But a more interesting possibility is finding children by their siblings.
    Let’s look up all boys that have a dog. To do that we search on the “children” type with a has_parent filter that contains a has_child filter on the “pets” type.
    This time we expect to get back the children - Adam and Xander.

        curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/children/_search?pretty" -d '
        {
            "query": {
                "filtered": {
                    "filter": {
                        "and": [
                            {
                                "has_parent": {
                                    "parent_type": "person",
                                    "filter": {
                                        "has_child": {
                                            "type": "pets",
                                            "query": {
                                                "term": {
                                                    "type": "dog"
                                                }
                                            }
                                        }
                                    }
                                }
                            },
                            {
                                "term": {
                                    "gender": "male"
                                }
                            }
                        ]
                    }
                }
            }
        }
        '
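
    To sanity-check the expected result without a cluster, the same sibling join can be worked by hand over the sample data. This is only a plain-shell cross-check of the logic (child, then parent, then sibling), not how Elasticsearch executes the query. The pipe-separated record layout is an assumption made for this sketch.

    ```shell
    # One record per child document: type|parent _id|name|gender-or-species
    # (values copied from the bulk request above).
    boys_with_dog=$(printf '%s\n' \
        'children|1|Adam|male' 'children|1|Eve|female' \
        'children|3|Lamb|male' 'children|4|Xander|male' \
        'pets|1|Apple|dog' 'pets|4|Buffy|cat' 'pets|4|Willow|dog' |
    awk -F'|' '
        # has_child on "pets": remember which parents own a dog
        $1 == "pets" && $4 == "dog"      { has_dog[$2] = 1 }
        # term filter on "children": collect the boys and their parent
        $1 == "children" && $4 == "male" { n++; boy[n] = $3; parent[n] = $2 }
        # has_parent: keep the boys whose parent owns a dog
        END { for (i = 1; i <= n; i++) if (parent[i] in has_dog) print boy[i] }
    ')
    echo "$boys_with_dog"
    ```

    Lamb is dropped because his parent (Mary) has no dog, which leaves Adam and Xander.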
    

    Of course, our data model here is a bit simplified as it allows only a single parent. If we were to extend it, we would create a “family” parent type, with child types - “parents”, “children” and “pets”.
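
    A rough sketch of what the mapping for that extended model might look like, following the same pattern as the mapping earlier in this post. The index name “es-families” is made up for illustration, and the curl call is commented out so the snippet runs without a cluster.

    ```shell
    ELASTICSEARCH_ENDPOINT="${ELASTICSEARCH_ENDPOINT:-http://localhost:9200}"

    # Give every member type the same "family" parent.
    mappings=""
    for child in parents children pets; do
        mappings="${mappings:+$mappings,}\"$child\":{\"_parent\":{\"type\":\"family\"}}"
    done
    body="{\"mappings\":{$mappings}}"
    echo "$body"

    # "es-families" is a hypothetical index name; uncomment to create it:
    # curl -XPUT "$ELASTICSEARCH_ENDPOINT/es-families" -d "$body"
    ```

    With this layout, parents, children and pets are all siblings under one family document, so every cross-type search goes through has_parent and has_child as in the sibling query.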

    Currently, in order to get the details of the “joined” entity, another query is needed. For example, when searching “all boys that have a dog”, if we want the details of the dogs we need a second search for “all dogs with parents that have children with _id=…” (and the _ids of the children from the first search).
    This will change with the upcoming inner hits feature, which will allow getting the data of the inner entities in a single query.
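
    The second search described here could look roughly like the sketch below. It assumes the first search returned the children with _id 1 (Adam) and 4 (Xander), and it uses an ids query inside has_child to select them. The json_ids helper is hypothetical, and the curl call is commented out so the snippet runs without a cluster.

    ```shell
    ELASTICSEARCH_ENDPOINT="${ELASTICSEARCH_ENDPOINT:-http://localhost:9200}"

    # Hypothetical helper: render its arguments as quoted, comma-separated JSON strings.
    json_ids() {
        out=""
        for id in "$@"; do
            out="${out:+$out,}\"$id\""
        done
        printf '%s' "$out"
    }

    # _ids of the children found by the "boys that have a dog" search.
    child_ids=$(json_ids 1 4)

    # Dogs whose parent has one of those children.
    body=$(printf '{
        "query": { "filtered": { "filter": { "and": [
            { "has_parent": {
                "parent_type": "person",
                "filter": { "has_child": {
                    "type": "children",
                    "query": { "ids": { "values": [%s] } }
                } }
            } },
            { "term": { "type": "dog" } }
        ] } } }
    }' "$child_ids")
    echo "$body"

    # Uncomment to run it against the index created earlier:
    # curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/pets/_search?pretty" -d "$body"
    ```

    With the sample data this should match Apple and Willow. Note that it is the second multi-join query needed to answer a single question.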

    One should note that this method is not exactly recommended by Elasticsearch. Because of the memory requirements and performance hit, the official recommendation is: “Avoid using multiple parent-child joins in a single query”. So as always, test, measure and choose your modeling wisely.

  • Original address: https://www.cnblogs.com/freebird92/p/6340043.html