  • pg_bulkload usage notes

    I used pg_bulkload to import data a long time ago and ran comparison tests at the time. Another project now needs it, so here are my notes:

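    1. The RPM packages are fairly old: after downloading one I found it only supports up to PostgreSQL 9.4. I am currently on PostgreSQL 10, so I gave up on the RPM.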

    2. Download the source and build it:

    git clone https://github.com/ossc-db/pg_bulkload.git

    cd pg_bulkload

    make && make install

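    -- The build calls pg_config to find the PostgreSQL installation settings, so the pg_config on your PATH must belong to the PostgreSQL version you are targeting.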
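    Because the build follows whatever pg_config it finds first, it can be worth checking which PostgreSQL version that pg_config reports before running make. The following is a minimal sketch; the helper name pg_major_from_version is hypothetical, and it assumes the usual "PostgreSQL X.Y" output format of `pg_config --version`.

```shell
#!/bin/sh
# pg_major_from_version "PostgreSQL 10.5" -> 10
# (hypothetical helper; expects the string printed by `pg_config --version`)
pg_major_from_version() {
    ver=${1#PostgreSQL }
    case "$ver" in
        # Before PostgreSQL 10 the major version had two components (e.g. 9.4).
        8.*|9.*) echo "$ver" | cut -d . -f1,2 ;;
        *)       echo "$ver" | cut -d . -f1 ;;
    esac
}

# Only report when pg_config is actually on the PATH.
if command -v pg_config >/dev/null 2>&1; then
    echo "pg_config on PATH targets PostgreSQL $(pg_major_from_version "$(pg_config --version)")"
fi
```

    If the reported version is not the one you expect, put the right bin directory first on PATH before running make && make install.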

    3. In the database where you want to use it, run:

    create extension pg_bulkload;

    4. Import a CSV file:

    pg_bulkload -i c_xxx.csv -O c_xxx -l c_xxx_load.log -d xxx -o "TYPE=CSV" -o "WRITER=PARALLEL"

    5. Import a compressed file:

    zcat c_xxx.gz |pg_bulkload -i stdin -O c_xxx -l c_xxx_load.log -d xxx -o "TYPE=CSV" -o "WRITER=PARALLEL"
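    Steps 4 and 5 differ only in how the file reaches pg_bulkload, so they can be folded into one wrapper. This is a sketch, not part of pg_bulkload: reader_for and load_file are hypothetical names, and the log file name simply follows the <table>_load.log pattern used above.

```shell
#!/bin/sh
# Pick the command that streams a data file to stdout, based on its extension.
reader_for() {
    case "$1" in
        *.gz) echo "zcat" ;;
        *)    echo "cat" ;;
    esac
}

# load_file FILE TABLE DBNAME
# Streams FILE (plain or gzip-compressed CSV) into TABLE through stdin,
# using the same pg_bulkload options as the two commands above.
load_file() {
    "$(reader_for "$1")" "$1" |
        pg_bulkload -i stdin -O "$2" -l "${2}_load.log" -d "$3" \
            -o "TYPE=CSV" -o "WRITER=PARALLEL"
}
```

    For example, load_file c_xxx.gz c_xxx xxx behaves like the zcat pipeline in step 5, while load_file c_xxx.csv c_xxx xxx streams the uncompressed file through cat instead.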

    6. The options accepted by -o are not listed in the help output, but the log written by a load shows which parameters can be configured:

    pg_bulkload 3.1.14 on 2018-09-28 11:31:12.641693+08
    
    INPUT = stdin
    PARSE_BADFILE = /var/lib/pgsql/pg10/data/pg_bulkload/20180928113112_sgdw_public_c_xxx.prs
    LOGFILE = /var/lib/pgsql/sgdw/data/c_xxx_load.log
    LIMIT = INFINITE
    PARSE_ERRORS = 0
    ENCODING = UTF8
    CHECK_CONSTRAINTS = NO
    TYPE = CSV
    SKIP = 0
    DELIMITER = ,
    QUOTE = """
    ESCAPE = """
    NULL =
    OUTPUT = public.c_xxx
    MULTI_PROCESS = YES
    VERBOSE = NO
    WRITER = DIRECT
    DUPLICATE_BADFILE = /var/lib/pgsql/pg10/data/pg_bulkload/20180928113112_sgdw_public_c_xxx.dup.csv
    DUPLICATE_ERRORS = 0
    ON_DUPLICATE_KEEP = NEW
    TRUNCATE = YES
    
    
      0 Rows skipped.
      29423400 Rows successfully loaded.
      0 Rows not loaded due to parse errors.
      0 Rows not loaded due to duplicate errors.
      0 Rows replaced with new rows.
    
    Run began on 2018-09-28 11:31:12.641693+08
    Run ended on 2018-09-28 11:39:48.835205+08
    
    CPU 2.63s/399.05u sec elapsed 516.19 sec
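    The row count and elapsed time in this log give the load throughput: 29423400 rows / 516.19 sec is roughly 57,000 rows per second. A small sketch that pulls the same figure out of a log file follows; rows_per_sec is a hypothetical helper, and it assumes the log keeps the "Rows successfully loaded" and "elapsed ... sec" wording shown above.

```shell
#!/bin/sh
# rows_per_sec [LOGFILE...]
# Reads a pg_bulkload log (from the given files, or stdin if none) and
# prints loaded rows divided by elapsed seconds, truncated to an integer.
rows_per_sec() {
    awk '
        /Rows successfully loaded/ { rows = $1 }
        /elapsed/ {
            # The elapsed time is the field right after the word "elapsed".
            for (i = 1; i < NF; i++)
                if ($i == "elapsed") secs = $(i + 1)
        }
        END { if (secs > 0) printf "%d\n", rows / secs }
    ' "$@"
}
```

    Running rows_per_sec c_xxx_load.log on the log above should print 57001.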
    

    In theory, the parameters listed in the log above can all be configured. For example, to set VERBOSE to YES, append -o "VERBOSE=YES" to the command.

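    Also: by default the delimiter is a comma, values are quoted with double quotes, and the writer is DIRECT. If you forget the defaults, run one load with default settings and check the log.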
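    When the list of -o options grows, the same KEY = VALUE settings can be kept in a control file, which pg_bulkload accepts as a positional argument. The sketch below generates one; write_ctl is a hypothetical helper, the chosen settings are only an example, and the input path is assumed to be absolute.

```shell
#!/bin/sh
# write_ctl TABLE INPUT_PATH
# Prints a pg_bulkload control file to stdout, one KEY = VALUE per line,
# using the same parameter names that appear in the load log.
write_ctl() {
    cat <<EOF
OUTPUT = $1
INPUT = $2
TYPE = CSV
DELIMITER = ","
WRITER = PARALLEL
VERBOSE = YES
EOF
}
```

    A possible use, with placeholder names: write_ctl public.c_xxx /path/to/c_xxx.csv > c_xxx.ctl, then pg_bulkload c_xxx.ctl -d xxx -l c_xxx_load.log.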

    Here is a script for batch loading:

    -bash-4.1$ cat load.sh
    #!/bin/sh

    # $1: data file name (gzip-compressed CSV)

    file=$1

    if [ ! -f "$file" ]
    then
        echo "File does not exist"
        exit 1
    fi

    echo "-----------------------------------------------------------------"

    # The table name is the part of the file name before the first dot.
    tbname=$( echo "$file" | cut -d . -f1 )
    echo "Table name is : $tbname"

    # Decompress to stdout and stream the rows into pg_bulkload.
    zcat "$file" | pg_bulkload -i stdin -O "public.$tbname" -l "$tbname.log" -o "TYPE=CSV" -o "WRITER=PARALLEL" -d sgdw

    echo "load complete"
    echo "-----------------------------------------------------------------"
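    The script handles one file per call, and its cut -d . -f1 step expects a bare file name: a path such as ./c_xxx.gz yields an empty table name. A sketch of a multi-file variant that strips the directory first is shown below. table_name_for and load_all are hypothetical names, the database name sgdw is taken from the script above, and treating any non-zero exit status of pg_bulkload as a failure is an assumption.

```shell
#!/bin/sh
# table_name_for PATH
# Drops the directory and everything from the first dot onward,
# e.g. ./data/c_xxx.csv.gz -> c_xxx
table_name_for() {
    name=$(basename "$1")
    echo "${name%%.*}"
}

# load_all FILE...
# Loads each gzip-compressed CSV into public.<table>, one after another,
# and reports files that are missing or whose load did not succeed.
load_all() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "skipped (not a file): $f" >&2
            continue
        fi
        tb=$(table_name_for "$f")
        zcat "$f" | pg_bulkload -i stdin -O "public.$tb" -l "$tb.log" \
            -o "TYPE=CSV" -o "WRITER=PARALLEL" -d sgdw ||
            echo "load failed: $f" >&2
    done
}
```

    For example, load_all /data/dump/*.gz would load every dump in that (placeholder) directory and write one log per table in the current directory.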
  • Original post: https://www.cnblogs.com/kuang17/p/9717997.html