  • ElasticSearch 2 (5)

    ElasticSearch 2.1.1 (5) - Document APIs

    This section describes the following CRUD APIs:

    Single document APIs

    Index API

    Query:

    $ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }'  
    

    Result:

    {
        "_shards" : {
            "total" : 10,
            "failed" : 0,
            "successful" : 10
        },
        "_index" : "twitter",
        "_type" : "tweet",
        "_id" : "1",
        "_version" : 1,
        "created" : true
    }
    
    • total -

      Indicates how many shard copies (primary and replica shards) the index operation should be executed on.

    • successful -

      Indicates the number of shard copies the index operation succeeded on.

    • failures -

      An array that contains replication related errors in the case an index operation failed on a replica shard.

    Replica shards may not all be started when an indexing operation successfully returns (by default, a quorum is required). In that case, total will be equal to the total shards based on the index replica settings and successful will be equal to the number of shards started (primary plus replicas). As there were no failures, the failed will be 0.
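
    As a rough illustration of how the _shards header can be read, the sketch below parses the response shown above and reports whether every copy acknowledged the write. The helper name summarize_shards is hypothetical and not part of any Elasticsearch client.

```python
import json

# Response body copied from the index example above.
response = json.loads("""
{
    "_shards": {"total": 10, "failed": 0, "successful": 10},
    "_index": "twitter", "_type": "tweet", "_id": "1",
    "_version": 1, "created": true
}
""")


def summarize_shards(resp):
    """Return (fully_replicated, pending) for an index response.

    Hypothetical helper for illustration only.
    """
    shards = resp["_shards"]
    # Copies that neither succeeded nor failed had not been started yet.
    pending = shards["total"] - shards["successful"] - shards["failed"]
    fully_replicated = shards["failed"] == 0 and pending == 0
    return fully_replicated, pending


print(summarize_shards(response))
```

    A response with successful lower than total and failed equal to 0 is not an error: it only means some replica copies were not started when the operation returned.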

    Automatic Index Creation

    • Disable automatic index creation

        action.auto_create_index => false
      
    • Disable automatic mapping (type) creation

        index.mapper.dynamic => false
      
    • Black/white list for index creation (a comma-separated list of patterns)

        action.auto_create_index => +aaa*,-bbb*,+ccc*,-*
      
      • +: allowed
      • -: disallowed
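
    The pattern list can be pictured as an ordered rule set. The sketch below assumes that the first matching pattern decides and that a name matching no pattern is rejected; treat that evaluation order as an assumption to verify against your cluster. The function may_auto_create is hypothetical, and Python's fnmatch is only a stand-in for Elasticsearch's simple * matching.

```python
from fnmatch import fnmatchcase


def may_auto_create(index_name, setting):
    """Evaluate an action.auto_create_index pattern list (illustrative only)."""
    for raw in setting.split(","):
        pattern = raw.strip()
        if not pattern:
            continue
        # A leading "-" disallows; "+" (or no prefix) allows.
        allowed = not pattern.startswith("-")
        glob = pattern[1:] if pattern[0] in "+-" else pattern
        if fnmatchcase(index_name, glob):
            return allowed  # assumed: first matching pattern wins
    return False  # assumed: no pattern matched, so creation is refused


setting = "+aaa*,-bbb*,+ccc*,-*"
for name in ("aaa-logs", "bbb-logs", "ccc", "other"):
    print(name, may_auto_create(name, setting))
```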

    Versioning

    • Optimistic concurrency control

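    A typical use is a transactional read-then-update: pass the version obtained from the initial read, so that the write fails if the document changed in the meantime. When reading in order to update, it is recommended to set preference to _primary.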

    curl -XPUT 'localhost:9200/twitter/tweet/1?version=2' -d '{
        "message" : "elasticsearch now has versioning support, double cool!"
    }'    
    

    NOTE: versioning is completely real time, and is not affected by the near real time aspects of search operations. If no version is provided, then the operation is executed without any version checks.

    • Version types

      • internal

        only index the document if the given version is identical to the version of the stored document.

      • external or external_gt

        only index the document if the given version is strictly higher than the version of the stored document or if there is no existing document. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.

      • external_gte

        only index the document if the given version is equal or higher than the version of the stored document. If there is no existing document the operation will succeed as well. The given version will be used as the new version and will be stored with the new document. The supplied version must be a non-negative long number.

      • force

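        The document is indexed regardless of the version of the stored document, or if there is no existing document. The given version is used as the new version and stored with the new document. This version type is typically used for correcting errors.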
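
    The rules for the version types can be condensed into one predicate. This is a sketch of the checks as described in the list above, not Elasticsearch's actual implementation; the function name and the use of None for "no stored document" are assumptions made for the example.

```python
def version_check_passes(version_type, given, stored):
    """Return True if a write with `given` version would be accepted.

    stored is the version of the existing document, or None if there is none.
    """
    if version_type == "internal":
        # Must be identical to the stored version.
        return stored is not None and given == stored
    if version_type in ("external", "external_gt"):
        # Strictly higher, or no existing document.
        return stored is None or given > stored
    if version_type == "external_gte":
        # Equal or higher, or no existing document.
        return stored is None or given >= stored
    if version_type == "force":
        # Always accepted, regardless of the stored version.
        return True
    raise ValueError("unknown version type: %r" % version_type)


print(version_check_passes("external", 3, 3))      # same version is rejected
print(version_check_passes("external_gte", 3, 3))  # same version is accepted
```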

    Operation Type

    • op_type

        $ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
            "user" : "kimchy",
            "post_date" : "2009-11-15T14:12:12",
            "message" : "trying out Elasticsearch"
        }'	
      
    • create

        $ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
            "user" : "kimchy",
            "post_date" : "2009-11-15T14:12:12",
            "message" : "trying out Elasticsearch"
        }'        
      

    Automatic ID Generation

    POST used instead of PUT (op_type will automatically be set to create)

    $ curl -XPOST 'http://localhost:9200/twitter/tweet/' -d '{
        "user" : "kimchy",
        "post_date" : "2009-11-15T14:12:12",
        "message" : "trying out Elasticsearch"
    }'
    

    automatic ID generation (6a8ca01c-7896-48e9-81cc-9f70661fcb32)

    {
        "_index" : "twitter",
        "_type" : "tweet",
        "_id" : "6a8ca01c-7896-48e9-81cc-9f70661fcb32",
        "_version" : 1,
        "created" : true
    }    
    

    Routing

    • default

      Hash of the document’s id value

    • explicit control

      The value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter

        $ curl -XPOST 'http://localhost:9200/twitter/tweet?routing=kimchy' -d '{
            "user" : "kimchy",
            "post_date" : "2009-11-15T14:12:12",
            "message" : "trying out Elasticsearch"
        }'
      

      When setting up explicit mapping, the _routing field can be optionally used to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.
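
    The effect of the routing parameter can be sketched as "hash the routing value, or the id when no routing is given, and take it modulo the number of primary shards". The code below uses zlib.crc32 purely as a stand-in hash, so the shard numbers it prints will not match a real cluster; pick_shard is a hypothetical helper.

```python
import zlib


def pick_shard(doc_id, num_primary_shards, routing=None):
    """Choose a shard number for a document (illustrative stand-in hash)."""
    # Default: the document id feeds the hash. Explicit routing replaces it.
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards


# Two different documents routed with the same value land on the same shard.
print(pick_shard("1", 5, routing="kimchy") == pick_shard("2", 5, routing="kimchy"))
```

    This is also why a document indexed with a routing value has to be fetched or deleted with the same value: without it, the request may be sent to a different shard.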

    Parents & Children

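    When indexing a child document, specify its parent with the parent parameter. The routing value is then automatically set to that of the parent, unless routing is given explicitly.

    $ curl -XPUT 'localhost:9200/blogs/blog_tag/1122?parent=1111' -d '{
        "tag" : "something"
    }'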
    

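    Timestamp (Deprecated in 2.0.0-beta2.)

    Use a regular date field in the document instead.
    

    TTL (time to live) (Deprecated in 2.0.0-beta2.)

    The current _ttl implementation is deprecated and will be replaced by a different implementation in a future version.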
    

    Distributed

    The index operation is directed to the primary shard based on its route (see the Routing section above) and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.

    Primary shard => Replicas
    

    Write Consistency

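    • Quorum

      By default, the index operation only returns on the primary shard if a quorum (>replicas/2+1) of active shards is available. This guards against writing to the "wrong" side of a network partition.

    • action.write_consistency

      • one
      • quorum
      • all
    • Overriding the default

      • node-by-node: the action.write_consistency setting
      • per-operation: the consistency request parameter
    • sync replication

      The index operation only returns after all active shards within the replication group have indexed the document.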
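
    To make the three consistency levels concrete, the sketch below computes how many active shard copies each level would require. It assumes the quorum formula int((primary + number_of_replicas) / 2) + 1 and that the quorum is only enforced when there is more than one replica; both are assumptions recalled from Elasticsearch documentation of this era and worth checking for your version. The function name is hypothetical.

```python
def required_active_copies(number_of_replicas, consistency="quorum"):
    """Number of active shard copies needed before a write proceeds."""
    total_copies = 1 + number_of_replicas  # one primary plus its replicas
    if consistency == "one":
        return 1
    if consistency == "all":
        return total_copies
    if consistency == "quorum":
        if number_of_replicas <= 1:
            # Assumed: with 0 or 1 replica, the primary alone is enough.
            return 1
        return total_copies // 2 + 1
    raise ValueError("unknown consistency level: %r" % consistency)


for replicas in (0, 1, 2, 3):
    print(replicas, required_active_copies(replicas))
```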

    Refresh

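    • refresh=true refreshes the relevant shard (not the whole index) right after the operation, so the document appears in search results immediately
    • Set it to true only after careful thought: it can lead to poor performance, from both an indexing and a search standpoint
    • The Get API is realtime and is not affected by refresh (it does not require one)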

    Noop Updates

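    • When a document is updated through the Index API, a new version is always created, even if the document has not changed. If that is not acceptable, use the _update API with detect_noop

      • true: compare the document content and skip the write if nothing changed
      • false: ignore the document content and always write
    • There is no hard and fast rule about when noop updates are not acceptable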

    Timeout

    • default - 1 min

    • explicit

        $ curl -XPUT 'http://localhost:9200/twitter/tweet/1?timeout=5m' -d '{
            "user" : "kimchy",
            "post_date" : "2009-11-15T14:12:12",
            "message" : "trying out Elasticsearch"
        }'
      

    Get API

    Get:

    curl -XGET 'http://localhost:9200/twitter/tweet/1'
    

    Result:

    {
        "_index" : "twitter",
        "_type" : "tweet",
        "_id" : "1",
        "_version" : 1,
        "found": true,
        "_source" : {
            "user" : "kimchy",
            "post_date" : "2009-11-15T14:12:12",
            "message" : "trying out Elasticsearch"
        }
    }  
    

    Check Exists:

    curl -XHEAD -i 'http://localhost:9200/twitter/tweet/1'
    

    Result (404 in this run because the twitter index did not exist):

    HTTP/1.1 404 Not Found
    es.resource.type: index_expression
    es.resource.id: twitter
    es.index: twitter
    Content-Type: text/plain; charset=UTF-8
    Content-Length: 0    
    

    Realtime

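    • To disable

        action.get.realtime => false
      
      or pass realtime=false on the request.

    • fields (Good Practice)

      • BECAUSE: with realtime GET there is no notion of stored fields, at least for a period of time (basically, until the next flush), so fields are extracted from the source itself
      • THEREFORE: assume fields will be loaded from source when using realtime GET, even if the fields are stored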

    Optional Type

    • _type: optional

    • _all: set the type to _all to fetch the first document matching the id across all types

    Source filtering

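    • Default: the get operation returns the contents of the _source field

    • To disable

      • use the fields parameter

      • set _source to false

          curl -XGET 'http://localhost:9200/twitter/tweet/1?_source=false'
        
    • Large document: fetch only part of the source, which can save network overhead

      • _source_include
      • _source_exclude
    • Both parameters accept

      • a comma-separated list of fields
      • wildcard expressions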
    • Example

        curl -XGET 'http://localhost:9200/twitter/tweet/1?_source_include=*.id&_source_exclude=entities'
      
    • Short notation (when only includes are needed)

        curl -XGET 'http://localhost:9200/twitter/tweet/1?_source=*.id,retweeted'
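
    The include/exclude behaviour can be approximated offline. The sketch below flattens a document into dotted paths, keeps the paths selected by the include patterns and drops those selected by the exclude patterns. It is an approximation under stated assumptions (a pattern selects a path or any of its parent objects, arrays are treated as leaves, fnmatch stands in for Elasticsearch's wildcard matching), and filter_source is a hypothetical helper. The sample tweet is made up for the example.

```python
from fnmatch import fnmatchcase


def _flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from _flatten(value, path + ".")
        else:
            yield path, value


def _selects(pattern, path):
    """True if the pattern matches the path or one of its parent objects."""
    parts = path.split(".")
    return any(fnmatchcase(".".join(parts[:i]), pattern)
               for i in range(1, len(parts) + 1))


def filter_source(source, include=(), exclude=()):
    """Approximate _source_include / _source_exclude on a dict."""
    result = {}
    for path, value in _flatten(source):
        if include and not any(_selects(p, path) for p in include):
            continue
        if any(_selects(p, path) for p in exclude):
            continue
        # Rebuild the nested structure for the surviving leaf.
        node = result
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


tweet = {
    "id": 1,
    "user": {"id": 7, "name": "kimchy"},
    "entities": {"id": 3, "urls": []},
    "retweeted": False,
}
print(filter_source(tweet, include=["*.id"], exclude=["entities"]))
```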
      

    Fields

    • Example

      curl -XGET 'http://localhost:9200/twitter/tweet/1?fields=title,content'

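    • Backward compatibility

      If the requested fields are not stored, they will be fetched from the _source (parsed and extracted).

      This functionality has been replaced by source filtering.

    • Field values

      • Document fields (always returned as an array)

      • Metadata fields (never returned as an array)

        _routing

        _parent

    • Leaf/Object

      • Leaf fields: can be returned via the fields option
      • Object fields: cannot be returned, the request fails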

    Generated fields

    • No refresh occurred between indexing and the get

      GET will access the transaction log to fetch the document

    • Some fields are generated ONLY when indexing, so they are not available from the transaction log

      • default

          requesting such a field fails with an error
        
      • ignore

          ignore_errors_on_generated_fields=true
        
        

    Getting the _source directly

    • Direct

        curl -XGET 'http://localhost:9200/twitter/tweet/1/_source'
      
    • Source filtering

        curl -XGET 'http://localhost:9200/twitter/tweet/1/_source?_source_include=*.id&_source_exclude=entities'
      
    • Existence

        curl -XHEAD -i 'http://localhost:9200/twitter/tweet/1/_source'
      

    Routing

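    Get (if routing was used when indexing, the same routing value must be provided to get the document):

        curl -XGET 'http://localhost:9200/twitter/tweet/1?routing=kimchy'
    

    Error (in this run the twitter index did not exist):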

    {
        "error" : {
            "root_cause" : [
                {
                    "type" : "index_not_found_exception",
                    "reason" : "no such index",
                    "resource.type" : "index_expression",
                    "resource.id" : "twitter",
                    "index" : "twitter"
                }
            ],
            "type" : "index_not_found_exception",
            "reason" : "no such index",
            "resource.type" : "index_expression",
            "resource.id" : "twitter",
            "index" : "twitter"
        },
        "status" : 404
    }
    

    Preference

    Controls a preference of which shard replicas to execute the get request on

    • Default

      Random

    • Preference

      • _primary

        The operation will go and be executed only on the primary shards.

      • _local

        The operation will prefer to be executed on a local allocated shard if possible.

    • Custom(string) value

      A custom value will be used to guarantee that the same shards will be used for the same custom value. This can help with "jumping values" when hitting different shards in different refresh states. A sample value can be something like the web session id, or the user name.
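
    The idea behind a custom preference value can be sketched as a deterministic choice among the copies of a shard: the same string always maps to the same copy, while no preference means a random copy. The code is a simplified model, not Elasticsearch's selection logic. The function choose_copy, the copy names, and the crc32 stand-in hash are all assumptions, and _local is left out because it depends on which node receives the request.

```python
import random
import zlib


def choose_copy(copies, preference=None, rng=random):
    """Pick one shard copy for a GET.

    copies: list of copy names with the primary first (an assumption
    of this sketch).
    """
    if not copies:
        raise ValueError("no shard copies available")
    if preference is None:
        return rng.choice(copies)  # default: random copy
    if preference == "_primary":
        return copies[0]
    if preference.startswith("_"):
        raise NotImplementedError("only _primary is modelled in this sketch")
    # Custom string: the same value always selects the same copy.
    return copies[zlib.crc32(preference.encode("utf-8")) % len(copies)]


copies = ["primary", "replica-1", "replica-2"]
print(choose_copy(copies, "session-42") == choose_copy(copies, "session-42"))
```

    Pinning a web session to one copy this way is what avoids "jumping values" when copies are in different refresh states.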

    Refresh

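    The refresh parameter can be set to true to refresh the relevant shard before the get operation and make it searchable. Set it to true only after careful thought and verification that it does not cause a heavy load on the system (and slow down indexing).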

    Distributed

    1. The get operation gets hashed into a specific shard id.

    2. It then gets redirected to one of the replicas within that shard id and returns the result.

    The replicas are the primary shard and its replicas within that shard id group. This means that the more replicas we have, the better GET scaling we get.

    Versioning support

    You can use the version parameter to retrieve the document only if its current version is equal to the specified one. Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version of the document doesn’t disappear immediately, although you won’t be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.

    Delete API

    Delete:

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/1'
    

    Result:

    {
        "_shards" : {
            "total" : 10,
            "failed" : 0,
            "successful" : 10
        },
        "found" : true,
        "_index" : "twitter",
        "_type" : "tweet",
        "_id" : "1",
        "_version" : 2
    }
    

    Versioning

    Each document indexed is versioned. When deleting a document, the version can be specified to make sure the relevant document we are trying to delete is actually being deleted and it has not changed in the meantime. Every write operation executed on a document, deletes included, causes its version to be incremented.

    Routing

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/1?routing=kimchy'
    

    Note that issuing a delete without the correct routing will cause the document not to be deleted.

    Many times, the routing value is not known when deleting a document. For those cases, when specifying the _routing mapping as required, and no routing value is specified, the delete will be broadcast automatically to all shards.

    Parent

    Note that deleting a parent document does not automatically delete its children.

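    • delete all children of a parent

      Use the delete-by-query plugin with a query on the automatically generated (and indexed) _parent field, which has the format parent_type#parent_id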

    Automatic index creation

    • Automatically creates an index if it has not been created before

    • Automatically creates a dynamic type mapping for the specific type if it has not been created before

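    Distributed

    The delete operation gets hashed into a specific shard id. It then gets redirected to the primary shard within that id group and replicated (if needed) to the shard replicas within that group.

    Write Consistency

    Same values as for the index operation: one, quorum, all.

    Refresh

    Set refresh to true to refresh the relevant primary and replica shards after the delete operation so that the change is searchable.

    Timeout

    By default, the delete operation waits up to 1 minute for the primary shard to become available. The timeout parameter changes this: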

    $ curl -XDELETE 'http://localhost:9200/twitter/tweet/1?timeout=5m'
    

    Update API

    The examples below assume the following document has been indexed first:

    curl -XPUT localhost:9200/test/type1/1 -d '{
        "counter" : 1,
        "tags" : ["red"]
    }'
    

    Scripted updates

    • Increment the counter

        curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
            "script" : {
                "inline": "ctx._source.counter += count",
                "params" : {
                    "count" : 4
                }
            }
        }'
      
    • Add a tag (no duplication check)

        curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
            "script" : {
                "inline": "ctx._source.tags += tag",
                "params" : {
                    "tag" : "blue"
                }
            }
        }'
      
    • ctx map

      • _source
      • _index
      • _type
      • _id
      • _version
      • _routing
      • _parent
      • _timestamp
      • _ttl
    • Add field

        curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
            "script" : "ctx._source.name_of_new_field = \"value_of_new_field\""
        }'
      
    • Remove field

        curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
            "script" : "ctx._source.remove(\"name_of_field\")"
        }'
      
    • Condition

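      The script can also change the operation that is executed. This example deletes the document if the tags field contains blue, and otherwise does nothing (noop):

        curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
            "script" : {
                "inline": "ctx._source.tags.contains(tag) ? ctx.op = \"delete\" : ctx.op = \"none\"",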
                "params" : {
                    "tag" : "blue"
                }
            }
        }'
      

    Updates with a partial document

    • Merge

      Simple recursive merge, inner merging of objects, replacing core "keys/values" and arrays

        curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
            "doc" : {
                "name" : "new_name"
            }
        }'
      

      If both doc and script are specified, doc is ignored.
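
    The merge rule for partial documents (objects are merged recursively, while core values and arrays are replaced) can be sketched as follows. This is a model of the described behaviour written for illustration, with merge_partial as a hypothetical name.

```python
import copy


def merge_partial(source, doc):
    """Merge a partial document into the stored _source (illustrative)."""
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Inner objects are merged recursively.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Core values and arrays are replaced as a whole.
            merged[key] = copy.deepcopy(value)
    return merged


stored = {"counter": 1, "tags": ["red"]}
print(merge_partial(stored, {"name": "new_name"}))
```

    Note that an array in the partial document replaces the stored array, so sending "tags" : ["blue"] would drop "red" rather than append to it.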

    Detecting noop updates

    By default the document is only reindexed if the new _source differs from the old one. Setting detect_noop to false makes Elasticsearch always update the document, even if it has not changed.

    curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
        "doc" : {
            "name" : "new_name"
        },
        "detect_noop": false
    }'
    

    Upserts

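    • upsert: if the document does not already exist, the contents of the upsert element are inserted as a new document; if it exists, the script is executed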

        curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
            "script" : {
                "inline": "ctx._source.counter += count",
                "params" : {
                    "count" : 4
                }
            },
            "upsert" : {
                "counter" : 1
            }
        }'
      
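    • scripted_upsert: run the script whether or not the document exists, so that the script (rather than the upsert element alone) initializes the document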

        curl -XPOST 'localhost:9200/sessions/session/dh3sgudg8gsrgl/_update' -d '{
            "scripted_upsert":true,
            "script" : {
                "id": "my_web_session_summariser",
                "params" : {
                    "pageViewEvent" : {
                        "url":"foo.com/bar",
                        "response":404,
                        "time":"2014-01-01 12:32"
                    }
                }
            },
            "upsert" : {}
        }'
      
    • doc_as_upsert: use the contents of doc as the upsert value, instead of sending a partial doc plus an upsert doc

        curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
            "doc" : {
                "name" : "new_name"
            },
            "doc_as_upsert" : true
        }'
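
    The three upsert variants differ only in what happens when the document is missing. The sketch below models that decision in memory. It is a simplified reading of the options above, not the server's code: apply_update and add_four are hypothetical names, scripts are modelled as Python callables that mutate the source, and the doc merge is shallow for brevity.

```python
import copy


def apply_update(stored, doc=None, script=None, upsert=None,
                 doc_as_upsert=False, scripted_upsert=False):
    """Return the new _source after an update request (illustrative)."""
    if stored is None:
        # Document does not exist yet.
        if doc_as_upsert and doc is not None:
            return copy.deepcopy(doc)
        if upsert is None:
            raise KeyError("document missing and no upsert given")
        new_source = copy.deepcopy(upsert)
        if scripted_upsert and script is not None:
            script(new_source)  # the script also runs on the upsert doc
        return new_source
    # Document exists: the script wins over doc if both are given.
    new_source = copy.deepcopy(stored)
    if script is not None:
        script(new_source)
    elif doc is not None:
        new_source.update(copy.deepcopy(doc))  # shallow merge for brevity
    return new_source


def add_four(source):
    """Stand-in for the 'ctx._source.counter += count' script with count=4."""
    source["counter"] = source.get("counter", 0) + 4


print(apply_update(None, script=add_four, upsert={"counter": 1}))
print(apply_update({"counter": 1}, script=add_four, upsert={"counter": 1}))
```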
      

    Parameters

    • retry_on_conflict

      In between the get and indexing phases of the update, it is possible that another process might have already updated the same document. By default, the update will fail with a version conflict exception. The retry_on_conflict parameter controls how many times to retry the update before finally throwing an exception.

    • routing

      Routing is used to route the update request to the right shard and sets the routing for the upsert request if the document being updated doesn’t exist. Can’t be used to update the routing of an existing document.

    • parent

      Parent is used to route the update request to the right shard and sets the parent for the upsert request if the document being updated doesn’t exist. Can’t be used to update the parent of an existing document.

    • timeout

      Timeout waiting for a shard to become available.

    • consistency

      The write consistency of the index/delete operation.

    • refresh

      Refresh the relevant primary and replica shards (not the whole index) immediately after the operation occurs, so that the updated document appears in search results immediately.

    • fields

      Return the relevant fields from the updated document. Specify _source to return the full updated source.

    • version & version_type

      The update API uses Elasticsearch’s versioning support internally to make sure the document doesn’t change during the update. You can use the version parameter to specify that the document should only be updated if its version matches the one specified. By setting version type to force you can force the new version of the document after update (use with care! with force there is no guarantee the document didn’t change). Version types external & external_gte are not supported.
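
    The interplay between the get phase, the indexing phase, and retry_on_conflict can be sketched with a small in-memory model. Everything here is hypothetical (the Store class, the update function, the before_index hook used to simulate a concurrent writer); it only illustrates the retry loop described above.

```python
class VersionConflict(Exception):
    """Raised when the stored version changed between get and index."""


class Store:
    """A one-document store with internal versioning (illustrative)."""

    def __init__(self, source):
        self.source = dict(source)
        self.version = 1

    def get(self):
        return dict(self.source), self.version

    def index(self, source, expected_version):
        if expected_version != self.version:
            raise VersionConflict("expected %d, found %d"
                                  % (expected_version, self.version))
        self.source = dict(source)
        self.version += 1


def update(store, change, retry_on_conflict=0, before_index=None):
    """Get, modify, and index with retries on version conflicts."""
    attempts = retry_on_conflict + 1
    for attempt in range(attempts):
        source, version = store.get()       # get phase
        change(source)
        if before_index is not None:
            before_index(attempt)           # hook to simulate another writer
        try:
            store.index(source, version)    # indexing phase
            return store.version
        except VersionConflict:
            if attempt == attempts - 1:
                raise                       # retries exhausted


store = Store({"counter": 1})


def add_four_to_counter(source):
    source["counter"] += 4


def rival_writer(attempt):
    # Another process updates the document during the first attempt only.
    if attempt == 0:
        source, version = store.get()
        source["counter"] += 1
        store.index(source, version)


update(store, add_four_to_counter, retry_on_conflict=1, before_index=rival_writer)
print(store.source, store.version)
```

    With retry_on_conflict=0 the same scenario raises VersionConflict, which corresponds to the default behaviour described above.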

    Multi-document APIs

    • mget

        curl 'localhost:9200/_mget' -d '{
            "docs" : [
                {
                    "_index" : "test",
                    "_type" : "type",
                    "_id" : "1"
                },
                {
                    "_index" : "test",
                    "_type" : "type",
                    "_id" : "2"
                }
            ]
        }'
      
    • against index

        curl 'localhost:9200/test/_mget' -d '{
            "docs" : [
                {
                    "_type" : "type",
                    "_id" : "1"
                },
                {
                    "_type" : "type",
                    "_id" : "2"
                }
            ]
        }'
      
    • against type

        curl 'localhost:9200/test/type/_mget' -d '{
            "docs" : [
                {
                    "_id" : "1"
                },
                {
                    "_id" : "2"
                }
            ]
        }'
      
    • ids

        curl 'localhost:9200/test/type/_mget' -d '{
            "ids" : ["1", "2"]
        }'
      

    Multi Get API

    Optional Type

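    • omitted type: if several types contain a document with the same _id, only the first match is returned, so this request returns the same document twice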

        curl 'localhost:9200/test/_mget' -d '{
            "ids" : ["1", "1"]
        }'
      
    • explicit: specify _type per document to fetch both

        GET /test/_mget/
        {
          "docs" : [
                {
                    "_type":"typeA",
                    "_id" : "1"
                },
                {
                    "_type":"typeB",
                    "_id" : "1"
                }
            ]
        }
      

    Source filtering

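    The URL parameters _source, _source_include and _source_exclude set defaults, which are used when there are no per-document instructions. The _source element can also be set per document: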

    curl 'localhost:9200/_mget' -d '{
        "docs" : [
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "1",
                "_source" : false
            },
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "2",
                "_source" : ["field3", "field4"]
            },
            {
                "_index" : "test",
                "_type" : "type",
                "_id" : "3",
                "_source" : {
                    "include": ["user"],
                    "exclude": ["user.location"]
                }
            }
        ]
    }'
    

    Fields

    • example

        curl 'localhost:9200/_mget' -d '{
            "docs" : [
                {
                    "_index" : "test",
                    "_type" : "type",
                    "_id" : "1",
                    "fields" : ["field1", "field2"]
                },
                {
                    "_index" : "test",
                    "_type" : "type",
                    "_id" : "2",
                    "fields" : ["field3", "field4"]
                }
            ]
        }'
      
    • specify default

        curl 'localhost:9200/test/type/_mget?fields=field1,field2' -d '{
            "docs" : [
                {
                    "_id" : "1" 
                },
                {
                    "_id" : "2",
                    "fields" : ["field3", "field4"] 
                }
            ]
        }'
      

      result:

        "_id" : "1"  => returns field1 and field2
      
        "_id" : "2"  => returns field3 and field4 
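
    The default-versus-override rule shown in the result above can be written in a few lines: a document's own fields entry wins, otherwise the URL default applies. The helper effective_fields is hypothetical and only mirrors that rule.

```python
def effective_fields(default_fields, docs):
    """Map each requested _id to the fields that will be returned."""
    return {doc["_id"]: doc.get("fields", default_fields) for doc in docs}


docs = [
    {"_id": "1"},
    {"_id": "2", "fields": ["field3", "field4"]},
]
print(effective_fields(["field1", "field2"], docs))
```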
      

    Generated fields

    Fields are generated only when indexing.

    Routing

    • example

        curl 'localhost:9200/_mget?routing=key1' -d '{
            "docs" : [
                {
                    "_index" : "test",
                    "_type" : "type",
                    "_id" : "1",
                    "_routing" : "key2"
                },
                {
                    "_index" : "test",
                    "_type" : "type",
                    "_id" : "2"
                }
            ]
        }'
      

      result:

        test/type/1 => key2
        test/type/2 => key1
      

    Security

    URL-based access control

    Bulk API

    The bulk API makes it possible to perform many index/delete operations in a single API call.

    • Increase the indexing speed

    • Client support for bulk requests

      • Perl
      • Python
    • Endpoint /_bulk

        action_and_meta_data
        optional_source
        action_and_meta_data
        optional_source
        ....
        action_and_meta_data
        optional_source
      
      
    • Actions

        index
        create
        delete
        update
      
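    • Curl

      • text file input: use --data-binary

      • plain -d

        The latter doesn't preserve newlines, which the bulk format depends on (every line, including the last one, must end with a newline)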

          $ cat requests
          { "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
          { "field1" : "value1" }
          $ curl -s -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
          {"took":7,"items":[{"create":{"_index":"test","_type":"type1","_id":"1","_version":1}}]}		
        
    • Example

        { "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
        { "field1" : "value1" }
        { "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
        { "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
        { "field1" : "value3" }
        { "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
        { "doc" : {"field2" : "value2"} }
      
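    • Notes on the example above

      • the doc of the update action is a partial document that is merged with the already stored document

      • endpoints (the index, or index and type, in the URL are used as defaults for items that do not provide them explicitly)

        /_bulk

        /{index}/_bulk

        /{index}/{type}/_bulk

      • only action_meta_data is parsed on the receiving node side, which keeps processing fast

      • response - a large JSON structure with the individual result of each action, in the same order as the actions in the request

      • number of actions - there is no "correct" number per bulk call; it should be optimized per workload

      • HTTP API - make sure the client does not send HTTP chunks, as this slows things down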
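
    Building the request body by hand is error-prone, mostly because of the newline rules. The sketch below assembles the example above into a newline-delimited body using only the standard library. build_bulk_body is a hypothetical helper, and sending the body (for instance with --data-binary as shown earlier) is left out.

```python
import json


def build_bulk_body(actions):
    """Build a bulk body from (action, metadata, source) tuples.

    source must be None for delete, which has no source line.
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if action == "delete":
            continue
        if source is None:
            raise ValueError("%s requires a source line" % action)
        lines.append(json.dumps(source))
    # Every line, including the last one, ends with a newline.
    return "\n".join(lines) + "\n"


actions = [
    ("index", {"_index": "test", "_type": "type1", "_id": "1"}, {"field1": "value1"}),
    ("delete", {"_index": "test", "_type": "type1", "_id": "2"}, None),
    ("create", {"_index": "test", "_type": "type1", "_id": "3"}, {"field1": "value3"}),
    ("update", {"_id": "1", "_type": "type1", "_index": "index1"}, {"doc": {"field2": "value2"}}),
]
body = build_bulk_body(actions)
print(body, end="")
```

    Because json.dumps never emits raw newlines, each action and each source is guaranteed to stay on a single line.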

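    Versioning

    Each bulk item can include the version value using the _version/version field; version_type is supported as well.

    Routing

    Each bulk item can include the routing value using the _routing/routing field.

    Parent

    Each bulk item can include the parent value using the _parent/parent field.

    Timestamp

    Deprecated in 2.0.0-beta2.

    TTL

    Deprecated in 2.0.0-beta2.

    Write Consistency

    The consistency parameter requires a minimum number of active shards in the partition. Allowed values: one, quorum, all.

    Refresh

    The refresh parameter can be set to true to refresh the relevant primary and replica shards immediately after the bulk operation.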

    Update

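    • With the update action, _retry_on_conflict can be used as a field in the action itself (not in the extra payload line) to specify how many times an update should be retried in the case of a version conflict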

    • Supports

      • doc (partial document)
      • upsert
      • doc_as_upsert
      • script
      • params (for script)
      • lang (for script)
      • fields
    • Example

        { "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
        { "doc" : {"field" : "value"} }
        { "update" : { "_id" : "0", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
        { "script" : { "inline": "ctx._source.counter += param1", "lang" : "js", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
        { "update" : {"_id" : "2", "_type" : "type1", "_index" : "index1", "_retry_on_conflict" : 3} }
        { "doc" : {"field" : "value"}, "doc_as_upsert" : true }
        { "update" : {"_id" : "3", "_type" : "type1", "_index" : "index1", "fields" : ["_source"]} }
        { "doc" : {"field" : "value"} }
        { "update" : {"_id" : "4", "_type" : "type1", "_index" : "index1"} }
        { "doc" : {"field" : "value"}, "fields": ["_source"]}
      

    Security

    Term Vectors

    https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html

    Multi termvectors API

    https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-multi-termvectors.html

    Reference

    https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html

  • Original article: https://www.cnblogs.com/richaaaard/p/5168308.html