  • Importing a PostgreSQL table into Elasticsearch

    1. Inspect the PostgreSQL table structure and data

    edbstore=# \d customers
                                              Table "edbstore.customers"
            Column        |         Type          |                           Modifiers                            
    ----------------------+-----------------------+----------------------------------------------------------------
     customerid           | integer               | not null default nextval('customers_customerid_seq'::regclass)
     firstname            | character varying(50) | not null
     lastname             | character varying(50) | not null
     address1             | character varying(50) | not null
     address2             | character varying(50) | 
     city                 | character varying(50) | not null
     state                | character varying(50) | 
     zip                  | integer               | 
     country              | character varying(50) | not null
     region               | smallint              | not null
     email                | character varying(50) | 
     phone                | character varying(50) | 
     creditcardtype       | integer               | not null
     creditcard           | character varying(50) | not null
     creditcardexpiration | character varying(50) | not null
     username             | character varying(50) | not null
     password             | character varying(50) | not null
     age                  | smallint              | 
     income               | integer               | 
     gender               | character varying(1)  | 
    Indexes:
        "customers_pkey" PRIMARY KEY, btree (customerid)
        "ix_cust_username" UNIQUE, btree (username)
    Referenced by:
        TABLE "cust_hist" CONSTRAINT "fk_cust_hist_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE CASCADE
        TABLE "orders" CONSTRAINT "fk_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE SET NULL
    
    edbstore=# select count(1) from customers;
     count 
    -------
     20000
    (1 row)

    2. Use PostgreSQL's row_to_json function to export the table rows and save them as JSON

    edbstore=# \t
    Tuples only is on.
    edbstore=# \o customer.json
    edbstore=# select row_to_json(r) from customers as r;
    edbstore=# \q
    
    [postgres@sht-sgmhadoopcm-01 dba]$ ls -lh customer.json 
    -rw-r--r-- 1 postgres appuser 7.7M Dec  7 22:37 customer.json
    
    $ head -1 customer.json 
     {"customerid":1,"firstname":"VKUUXF","lastname":"ITHOMQJNYX","address1":"4608499546 Dell Way","address2":null,"city":"QSDPAGD","state":"SD","zip":24101,"country":"US","region":1,"email":"ITHOMQJNYX@dell.com","phone":"4608499546","creditcardtype":1,"creditcard":"1979279217775911","creditcardexpiration":"2012/03","username":"user1","password":"password","age":55,"income":100000,"gender":"M"}
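    Before going further, it can be worth confirming that every non-blank line of the export is a standalone JSON object and that the number of documents matches the row count from step 1. The sketch below is one possible check, not part of the original procedure; the helper name check_json_lines is hypothetical, and the sample lines are shortened versions of the output shown above.

```python
import json

def check_json_lines(lines):
    """Return (document_count, bad_line_numbers) for an iterable of text lines."""
    count = 0
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()          # psql prefixes each row with a space
        if not text:
            continue                # ignore blank lines
        try:
            doc = json.loads(text)
        except json.JSONDecodeError:
            bad.append(lineno)
            continue
        if isinstance(doc, dict):
            count += 1
        else:
            bad.append(lineno)      # valid JSON, but not an object
    return count, bad

# Small in-memory sample; for the real file, pass an open file handle instead,
# e.g. check_json_lines(open("customer.json", encoding="utf-8")).
sample = [
    ' {"customerid":1,"firstname":"VKUUXF"}\n',
    '\n',
    ' {"customerid":2,"firstname":"HQNMZH"}\n',
]
count, bad = check_json_lines(sample)
print(count, "documents; malformed lines:", bad)
```

    For the real export the document count should equal 20000; a different number would mean the id file generated in step 3 no longer lines up with the data.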

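    The customers table has now been dumped to a JSON file, but that file cannot be imported into Elasticsearch directly. Trying to do so fails with the following error: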

    $ curl -H "Content-Type: application/json" -XPOST "172.16.101.55:9200/bank/_bulk?pretty&refresh" --data-binary "@customer.json"
    {
      "error" : {
        "root_cause" : [
          {
            "type" : "illegal_argument_exception",
            "reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_NUMBER]"
          }
        ],
        "type" : "illegal_argument_exception",
        "reason" : "Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_NUMBER]"
      },
      "status" : 400
    }

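    According to the documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html, each document in a bulk request must be preceded by an action/metadata line, which is also where the document's unique _id is specified. Our JSON data contains no such lines.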
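    The error message is consistent with Elasticsearch reading the first data row as if it were an action/metadata line: it expects something shaped like {"index":{...}}, but the value of "customerid" is a number rather than an object. The sketch below illustrates that shape check in Python. The function is_action_line is a hypothetical helper written for this explanation; it is not how Elasticsearch itself validates requests.

```python
import json

# Action names accepted by the Bulk API.
BULK_ACTIONS = {"index", "create", "update", "delete"}

def is_action_line(line):
    """True if the line has the shape {"<action>": {...metadata...}}."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict) or len(obj) != 1:
        return False
    action, meta = next(iter(obj.items()))
    return action in BULK_ACTIONS and isinstance(meta, dict)

print(is_action_line('{"index":{"_id":"1"}}'))                    # True
print(is_action_line(' {"customerid":1,"firstname":"VKUUXF"}'))   # False
```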

    3. Add an id to each row of the JSON data

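    We saw earlier that the customers table has 20000 rows, so we need to generate 20000 matching id values. We do this with Python: create a file named build_id.py with the content below. Note that the upper bound is 20001. Python's range includes its start value but excludes its end value, so range(1,20000) would only produce 1 through 19999, whereas range(1,20001) produces 1 through 20000.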

    # Emit one bulk action line per row, with ids 1 through 20000
    for i in range(1, 20001):
        print('{"index":{"_id":"%s"}}' % i)

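    Run the script with Python and redirect its output to a file (no executable permission is needed when the script is passed to the interpreter this way):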

    $ python build_id.py > build_id.txt
    
    $ head -3 build_id.txt 
    {"index":{"_id":"1"}}
    {"index":{"_id":"2"}}
    {"index":{"_id":"3"}}

    Use the Linux paste command to merge the id file and the table file, interleaving their lines:

    $ paste -d'\n' build_id.txt customer.json > customer_new.json
    
    $ head -4 customer_new.json 
    {"index":{"_id":"1"}}
     {"customerid":1,"firstname":"VKUUXF","lastname":"ITHOMQJNYX","address1":"4608499546 Dell Way","address2":null,"city":"QSDPAGD","state":"SD","zip":24101,"country":"US","region":1,"email":"ITHOMQJNYX@dell.com","phone":"4608499546","creditcardtype":1,"creditcard":"1979279217775911","creditcardexpiration":"2012/03","username":"user1","password":"password","age":55,"income":100000,"gender":"M"}
    {"index":{"_id":"2"}}
     {"customerid":2,"firstname":"HQNMZH","lastname":"UNUKXHJVXB","address1":"5119315633 Dell Way","address2":null,"city":"YNCERXJ","state":"AZ","zip":11802,"country":"US","region":1,"email":"UNUKXHJVXB@dell.com","phone":"5119315633","creditcardtype":1,"creditcard":"3144519586581737","creditcardexpiration":"2012/11","username":"user2","password":"password","age":80,"income":40000,"gender":"M"}
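    As an alternative to generating the ids separately and merging with paste, the action lines can be produced in a single pass, taking the _id from each row's own customerid. This is a sketch of a different technique from the one used in this article, under the assumption that customerid is present in every row (it is the primary key in the table definition above). The function name to_bulk_lines is hypothetical.

```python
import json

def to_bulk_lines(json_lines, id_field="customerid"):
    """Yield an action line, then the source line, for each non-blank row."""
    for raw in json_lines:
        text = raw.strip()
        if not text:
            continue                      # skip blank lines
        doc = json.loads(text)
        action = {"index": {"_id": str(doc[id_field])}}
        yield json.dumps(action, separators=(",", ":"))
        yield text

rows = [
    ' {"customerid":1,"firstname":"VKUUXF"}',
    ' {"customerid":2,"firstname":"HQNMZH"}',
]
for line in to_bulk_lines(rows):
    print(line)
# {"index":{"_id":"1"}}
# {"customerid":1,"firstname":"VKUUXF"}
# {"index":{"_id":"2"}}
# {"customerid":2,"firstname":"HQNMZH"}
```

    When writing the result to a file, terminate every line with a newline, including the last one, since the Bulk API requires the final line of the request body to end with a newline character. Deriving the id from the row itself also avoids a mismatch if the two files ever differ in length.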

    4. The processed JSON file can now be imported into Elasticsearch. Test it:

    $ curl -H "Content-Type: application/json" -XPOST "172.16.101.55:9200/customer/_bulk?pretty&refresh" --data-binary "@customer_new.json"
    $ curl http://172.16.101.55:9200/_cat/indices?v
    health status index    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    yellow open   customer DvLoM7NjSYyjTwD5BSkK3A   1   1      20000            0       10mb           10mb
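    A bulk request can return HTTP 200 even when individual documents fail, so besides checking docs.count it may be useful to inspect the response body: it carries a top-level "errors" flag and an "items" array with one entry per action. The sketch below scans such a response for failed items. The helper name failed_items is hypothetical, and the sample response is invented for illustration; it is not output from the run above.

```python
import json

def failed_items(bulk_response):
    """Return (doc_id, status, error_type) for each item that reports an error."""
    failures = []
    if not bulk_response.get("errors"):
        return failures                   # fast path: nothing failed
    for item in bulk_response.get("items", []):
        for result in item.values():      # one key per item: index/create/...
            if "error" in result:
                failures.append(
                    (result.get("_id"), result.get("status"), result["error"].get("type"))
                )
    return failures

# Invented sample response, shortened to the fields used here.
sample = json.loads("""
{"took": 5, "errors": true, "items": [
  {"index": {"_id": "1", "status": 201}},
  {"index": {"_id": "2", "status": 400,
             "error": {"type": "mapper_parsing_exception", "reason": "..."}}}
]}
""")
print(failed_items(sample))
```

    To apply this to a real run, the curl output from the import command would first need to be saved to a file and loaded with json.load.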
  • Original article: https://www.cnblogs.com/ilifeilong/p/12003888.html