zoukankan      html  css  js  c++  java
  • [转]Cross-type joins in Elasticsearch

    Cross-type joins in Elasticsearch

    http://rore.im/posts/elasticsearch-joins

    December 31, 2014

    When modeling data in Elasticsearch, a common question is how to design the data to capture relationships between entities, to allow at least some level of “joins”.

    Elasticsearch has a good guide about data modeling. One of the options provided for expressing relationships is the parent-child model.

    A parent-child relationship in Elasticsearch is a way to express a one-to-many relationship (a parent with many children). The parent and child are separate Elasticsearch types, bounded only by specifying the parent type on the child mapping, and by giving the parent ID for every child index operation (this is used for routing the child to the shard of the parent).

    It’s a useful model when a parent has many children and when the child update pattern is different from that of the parent. (Since every child is a separate document, updating the child does not require re-indexing the parent).

    But this model also provides an interesting (if limited) way to capture relationships between sibling types.

    Lets consider the following data:

    My helpful screenshot

    Bill has two children - Adam and Eve, and a Dog (Apple).
    Bob has no children or pets (ah, freedom!).
    Mary has a little newborn child called Lamb.
    Jane has a boy named Xander, a cat (Buffy) and a dog (Willow).

    Lets create this data in Elasticsearch.
    We will have a parent type - “person”, and two child types - “children” and “pets”.
    First we’ll create the mapping for the child types.

        #!/bin/bash
        
        export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
        
        # Create indexes
        
        curl -XPUT "$ELASTICSEARCH_ENDPOINT/es-joins" -d '{
            "mappings": {
                "children": {
                    "_parent": {
                        "type": "person"
                    }
                },
                "pets": {
                    "_parent": {
                        "type": "person"
                    }
                }
            }
        }' 
    

    Next, index all the documents - parents, children and pets.

        # Index documents
        curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
        {"index":{"_index":"es-joins","_type":"person","_id":1}}
        {"name":"Bill","gender":"male"}
        {"index":{"_index":"es-joins","_type":"person","_id":2}}
        {"name":"Bob","gender":"male"}
        {"index":{"_index":"es-joins","_type":"person","_id":3}}
        {"name":"Mary","gender":"female"}
        {"index":{"_index":"es-joins","_type":"person","_id":4}}
        {"name":"Jane","gender":"female"}
        {"index":{"_index":"es-joins","_type":"children","_parent":1,"_id":1}}
        {"name":"Adam","gender":"male"}
        {"index":{"_index":"es-joins","_type":"children","_parent":1,"_id":2}}
        {"name":"Eve","gender":"female"}
        {"index":{"_index":"es-joins","_type":"children","_parent":3,"_id":3}}
        {"name":"Lamb","gender":"male"}
        {"index":{"_index":"es-joins","_type":"children","_parent":4,"_id":4}}
        {"name":"Xander","gender":"male"}
        {"index":{"_index":"es-joins","_type":"pets","_parent":1,"_id":1}}
        {"name":"Apple","type":"dog"}
        {"index":{"_index":"es-joins","_type":"pets","_parent":4,"_id":2}}
        {"name":"Buffy","type":"cat"}
        {"index":{"_index":"es-joins","_type":"pets","_parent":4,"_id":3}}
        {"name":"Willow","type":"dog"}
        '
    

    Now we can do some searches on it.
    The usual example will be searching a parent by its children. Lets find all the parents that has a girl. We expect to get back only Bill.

        curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/person/_search?pretty" -d '
        {
            "query": {
                "filtered": {
                    "filter": {
                        "and": [
                            {
                                "has_child": {
                                    "type": "children",
                                    "query": {
                                        "term": {
                                            "gender": "female"
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        }
        '
    

    We can also combine conditions on multiple child types.
    Lets find parents that have a boy and a dog. This time we expect to get back both Bill and Jane.

        curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/person/_search?pretty" -d '
        {
            "query": {
                "filtered": {
                    "filter": {
                        "and": [
                            {
                                "has_child": {
                                    "type": "children",
                                    "query": {
                                        "term": {
                                            "gender": "male"
                                        }
                                    }
                                }
                            },
                            {
                                "has_child": {
                                    "type": "pets",
                                    "query": {
                                        "term": {
                                            "type": "dog"
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        }
        '
    

    Another commonly used option is finding children by their parents.
    But a more interesting possibility is finding children by their siblings.
    Lets lookup all boys that have a dog. To do that we’re searching on the “children” type, and doing a has_parent filter that contains a has_child filter on the “pets” type.
    This time we expect to get back the children - Adam and Xander.

        curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/children/_search?pretty" -d '
        {
            "query": {
                "filtered": {
                    "filter": {
                        "and": [
                            {
                                "has_parent": {
                                    "parent_type": "person",
                                    "filter": {
                                        "has_child": {
                                            "type": "pets",
                                            "query": {
                                                "term": {
                                                    "type": "dog"
                                                }
                                            }
                                        }
                                    }
                                }
                            },
                            {
                                "term": {
                                    "gender": "male"
                                }
                            }
                        ]
                    }
                }
            }
        }
        '
    

    Of course, our data model here is a bit simplified as it allows only a single parent. If we were to extend it, we would create a “family” parent type, with child types - “parents”, “children” and “pets”.

    Currently, in order to get the details of the “joined” entity, another query is needed. For example, when searching “all boys that have a dog”, if we want the details of the dogs we need a second search for “all dogs with parents that have children with _id=…” (and the _ids of the children from the first search).
    This will change with the new upcoming inner hits feature that will allow getting the data of the inner entities in a single query.

    One should note that this method is not exactly recommended by Elasticsearch. Because of the memory requirements and performance hit, the official recommendation is: “Avoid using multiple parent-child joins in a single query”. So as always, test, measure and choose your modeling wisely.

  • 相关阅读:
    zookeeper03
    微服务网关Zuul
    微服务网关概述
    服务熔断Hystrix高级
    服务熔断Hystrix入门
    微服务架构的高并发问题
    服务注册和发现总结
    服务调用Feign高级
    服务调用Feign入门
    负载均衡Ribbon高级
  • 原文地址:https://www.cnblogs.com/freebird92/p/6340043.html
Copyright © 2011-2022 走看看