  • Using pigz to Quickly Compress TB-Scale Files

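    Background:

    Our lab needs to back up its self-hosted UCSC Genome Browser. The gbdb folder and the MySQL database folder together hold nearly 10 TB of data, so compressing them with plain tar and gzip is very time-consuming.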

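    Solution:

    Use pigz for the compression step. Its own description reads:

    A parallel implementation of gzip for modern multi-processor, multi-core machines.

    In other words, pigz is gzip with parallelism, so it can make full use of the lab machines' computing power by running several threads to speed up compression.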

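    I did not benchmark the software myself; the results reported by others are as follows:

    1. With default settings (the default thread count equals the number of logical CPUs), pigz is about 5.3 times faster than gzip, consumes about 8 times as much CPU, and achieves a comparable compression ratio;
    2. Going from 4 to 8 threads improves speed by 41.2%, from 8 to 16 threads by 27.9%, and from 16 to 32 threads by only 3%;
    3. pigz is a good fit when compression speed matters and a short-term spike in CPU usage is acceptable.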
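
The figures above come from someone else's hardware, so it may be worth repeating a small measurement locally before settling on a thread count. The following is only a rough sketch: the sample data, its default size, and the thread counts (1, 4, 8) are assumptions chosen for illustration, and the script measures gzip alone if pigz is not installed.

```shell
#!/bin/sh
# Rough local timing of gzip vs. pigz on generated sample data.
# The sample content, default size, and thread counts are illustrative assumptions.
set -eu

# Optional first argument: sample size in MiB. Use a larger value for timings
# that are measurable at the one-second resolution of `date +%s`.
size_mib=${1:-32}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
sample="$workdir/sample.txt"

# Highly repetitive text; real genome browser data will compress differently.
yes "chr1 10000 20000 example_feature" | head -c $((size_mib * 1024 * 1024)) > "$sample"

# run_case LABEL COMMAND...: time one compressor and report the output size.
run_case() {
    label=$1; shift
    start=$(date +%s)
    "$@" < "$sample" > "$workdir/$label.gz"
    end=$(date +%s)
    size=$(wc -c < "$workdir/$label.gz")
    echo "$label: $((end - start)) s, $size bytes"
}

run_case gzip gzip -c
if command -v pigz >/dev/null 2>&1; then
    for n in 1 4 8; do
        run_case "pigz-p$n" pigz -p "$n" -c
    done
else
    echo "pigz not installed; only gzip was measured"
fi
```

Because the test data is synthetic, the absolute numbers say little; the relative change between thread counts on the same machine is the useful part.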

    The exact commands are as follows:

    Compress:

    tar -cf - /data/rdata1/gb/gbdb/ | pigz -v -p 32 -c - > gbdb-backup.tar.pigz  
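
Before relying on a multi-terabyte backup, it may be worth confirming that the archive is actually readable. A minimal sketch follows, assuming a small throwaway directory in place of the real gbdb path; every path and the thread count of 4 are hypothetical. `pigz -t` tests the compressed stream, and listing the members with `tar -tf` confirms that the tar structure inside is intact.

```shell
#!/bin/sh
# Compress a directory through a pipe, then verify the result before trusting it.
# src and archive are hypothetical demo paths, not the paths from the article.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/gbdb-demo"
archive="$workdir/gbdb-demo.tar.gz"   # pigz writes ordinary gzip format by default

mkdir -p "$src/hg19"
echo "demo track data" > "$src/hg19/track.txt"

# Use pigz when available; otherwise fall back to gzip, which writes the same format.
if command -v pigz >/dev/null 2>&1; then
    gz="pigz -p 4"
else
    gz="gzip"
fi

# -C changes into $workdir first, so member names are stored relative to it.
tar -cf - -C "$workdir" gbdb-demo | $gz -c > "$archive"

# 1) Test the compressed stream. 2) Check that tar can list every member.
$gz -t "$archive"
$gz -dc "$archive" | tar -tf - > "$workdir/members.txt"
echo "archive OK, $(wc -l < "$workdir/members.txt") entries"
```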

    Decompress:

    pigz -dc gbdb-backup.tar.pigz | tar -xvf -
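
The archive can also be listed without unpacking it, or restored into a directory other than the current one. The sketch below assumes throwaway demo paths created under a temporary directory. Note that, according to the pigz documentation, decompression is not parallelized the way compression is, so a high `-p` value brings little benefit at this stage.

```shell
#!/bin/sh
# List an archive, then restore it into a separate directory.
# All paths are hypothetical demo paths created under a temp directory.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/src/gbdb-demo" "$workdir/restore"
echo "demo track data" > "$workdir/src/gbdb-demo/track.txt"

# pigz output is ordinary gzip data, so gzip can stand in if pigz is missing.
if command -v pigz >/dev/null 2>&1; then gz=pigz; else gz=gzip; fi

archive="$workdir/gbdb-demo.tar.gz"
tar -cf - -C "$workdir/src" gbdb-demo | $gz -c > "$archive"

# Dry run: show what the archive contains without writing any files.
$gz -dc "$archive" | tar -tvf -

# Restore under restore/ instead of the current directory.
$gz -dc "$archive" | tar -xf - -C "$workdir/restore"

# Compare the restored tree with the original.
diff -r "$workdir/src/gbdb-demo" "$workdir/restore/gbdb-demo" && echo "restore matches source"
```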

    pigz accepts the following options:

    Usage: pigz [options] [files ...]
      will compress files in place, adding the suffix '.gz'.  If no files are
      specified, stdin will be compressed to stdout.  pigz does what gzip does,
      but spreads the work over multiple processors and cores when compressing.
    
    Options:
      -0 to -9, -11        Compression level (11 is much slower, a few % better)
      --fast, --best       Compression levels 1 and 9 respectively
      -b, --blocksize mmm  Set compression block size to mmmK (default 128K)
      -c, --stdout         Write all processed output to stdout (won't delete)
      -d, --decompress     Decompress the compressed input
      -f, --force          Force overwrite, compress .gz, links, and to terminal
      -F  --first          Do iterations first, before block split for -11
      -h, --help           Display a help screen and quit
      -i, --independent    Compress blocks independently for damage recovery
      -I, --iterations n   Number of iterations for -11 optimization
      -k, --keep           Do not delete original file after processing
      -K, --zip            Compress to PKWare zip (.zip) single entry format
      -l, --list           List the contents of the compressed input
      -L, --license        Display the pigz license and quit
      -M, --maxsplits n    Maximum number of split blocks for -11
      -n, --no-name        Do not store or restore file name in/from header
      -N, --name           Store/restore file name and mod time in/from header
      -O  --oneblock       Do not split into smaller blocks for -11
      -p, --processes n    Allow up to n compression threads (default is the
                           number of online processors, or 8 if unknown)
      -q, --quiet          Print no messages, even on error
      -r, --recursive      Process the contents of all subdirectories
      -R, --rsyncable      Input-determined block locations for rsync
      -S, --suffix .sss    Use suffix .sss instead of .gz (for compression)
      -t, --test           Test the integrity of the compressed input
      -T, --no-time        Do not store or restore mod time in/from header
      -v, --verbose        Provide more verbose output
      -V  --version        Show the version of pigz
      -z, --zlib           Compress to zlib (.zz) instead of gzip format
      --                   All arguments after "--" are treated as files
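
A few of the options listed above can be tried safely on a throwaway file. This is a sketch only; the file name is a made-up example. It deliberately sticks to flags that gzip also accepts (`-k`, `-9`, `-l`, `-t`, `-d`, `-c`), so the fallback to gzip behaves the same way when pigz is not installed.

```shell
#!/bin/sh
# Exercise -k, -9, -l, -t and -dc on a small generated file.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
if command -v pigz >/dev/null 2>&1; then gz=pigz; else gz=gzip; fi

f="$workdir/tracks.txt"   # hypothetical example file
yes "chr1 10000 20000 example_feature" | head -n 100000 > "$f"

# -k keeps tracks.txt next to the new tracks.txt.gz; -9 is the same level as --best.
$gz -k -9 "$f"

# -l lists the compressed file; -t checks its integrity.
$gz -l "$f.gz"
$gz -t "$f.gz"

# -d with -c decompresses to stdout, leaving the .gz file in place.
$gz -dc "$f.gz" | cmp - "$f" && echo "round trip OK"
```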
  • Original article: https://www.cnblogs.com/lyhonk/p/4557137.html