  • Linux: Splitting and Merging Large Files [Repost]

    1. Splitting -- the split command

    split works in two modes: by line count or by byte size.

    (1) Split by line count

    $ split -l 300 large_file.txt new_file_prefix
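
    To see what the command above produces, here is a minimal sketch. It assumes GNU coreutils, and the 1000-line input generated with seq is a made-up stand-in for a real large file:

```shell
# Work in a scratch directory so the pieces do not clutter the current one.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 1000 numbered lines.
seq 1 1000 > large_file.txt

# 300 lines per piece; the default suffixes are aa, ab, ac, ...
split -l 300 large_file.txt new_file_prefix

# Line count of each piece; the last piece holds the remaining 100 lines.
wc -l new_file_prefix*
```

    The prefix is concatenated directly with the suffix, so the pieces are named new_file_prefixaa through new_file_prefixad. End the prefix with a separator such as "_" if you want more readable names.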

    Add -d to use numeric suffixes, and --verbose to print a message as each output file is created:

    $ split -l 50000 -d --verbose large_file.txt part_
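
    The effect of -d on the file names can be checked with a small sketch like the following. It assumes GNU coreutils; the 120000-line input is an invented example:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 120000 numbered lines.
seq 1 120000 > large_file.txt

# -d gives numeric suffixes (part_00, part_01, ...);
# --verbose reports each output file as it is opened.
split -l 50000 -d --verbose large_file.txt part_

# Three pieces: two of 50000 lines and one with the remaining 20000.
wc -l part_*
```

    Numeric suffixes are zero-padded, so the pieces still sort in the order they were written.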

    (2) Split by byte size

    $ split -b 10M -d large_file.log new_file_prefix
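
    As a rough illustration of byte-size splitting, the sketch below builds a hypothetical 25 MiB file of zero bytes and cuts it into 10 MiB pieces. It assumes GNU coreutils:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 25 MiB (25 * 1024 * 1024 = 26214400 bytes) of zero bytes.
head -c 26214400 /dev/zero > large_file.log

# 10M means 10 * 1024 * 1024 bytes per piece.
split -b 10M -d large_file.log new_file_prefix

# Sizes in bytes: two full 10 MiB pieces and a 5 MiB remainder.
wc -c new_file_prefix*
```

    Note that -b cuts at exact byte offsets, so a line of text may be divided between two pieces. When whole lines matter, -C SIZE (listed in the help output further down) keeps lines intact while staying within SIZE bytes per file.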

    2. Merging -- the cat command

    $ cat part_* > merge_file.txt
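
    It is worth confirming that the merged file matches the original. The following sketch shows one way to check, assuming GNU coreutils and an invented 100000-line input; cmp is used here as one possible verification step, and a checksum tool would serve equally well:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 100000 numbered lines.
seq 1 100000 > large_file.txt

split -l 30000 -d large_file.txt part_

# The shell expands part_* in sorted order (part_00, part_01, ...),
# which is the same order in which split wrote the pieces.
cat part_* > merge_file.txt

# cmp prints nothing and exits with status 0 when both files are identical.
if cmp large_file.txt merge_file.txt; then
    echo "merge OK"
else
    echo "merge differs from original" >&2
fi
```

    Choose an output name that the part_* pattern does not match. Otherwise the merged file could be picked up as one of its own inputs on a later run.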

    [Note] split command syntax:

    $ split --help
    Usage: split [OPTION]... [FILE [PREFIX]]
    Output pieces of FILE to PREFIXaa, PREFIXab, ...;
    default size is 1000 lines, and default PREFIX is 'x'.
    
    With no FILE, or when FILE is -, read standard input.
    
    Mandatory arguments to long options are mandatory for short options too.
      -a, --suffix-length=N   generate suffixes of length N (default 2)
          --additional-suffix=SUFFIX  append an additional SUFFIX to file names
      -b, --bytes=SIZE        put SIZE bytes per output file
      -C, --line-bytes=SIZE   put at most SIZE bytes of records per output file
      -d                      use numeric suffixes starting at 0, not alphabetic
          --numeric-suffixes[=FROM]  same as -d, but allow setting the start value
      -e, --elide-empty-files  do not generate empty output files with '-n'
          --filter=COMMAND    write to shell COMMAND; file name is $FILE
      -l, --lines=NUMBER      put NUMBER lines/records per output file
      -n, --number=CHUNKS     generate CHUNKS output files; see explanation below
      -t, --separator=SEP     use SEP instead of newline as the record separator;
                                '\0' (zero) specifies the NUL character
      -u, --unbuffered        immediately copy input to output with '-n r/...'
          --verbose           print a diagnostic just before each
                                output file is opened
          --help     display this help and exit
          --version  output version information and exit
    
    The SIZE argument is an integer and optional unit (example: 10K is 10*1024).
    Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).
    
    CHUNKS may be:
      N       split into N files based on size of input
      K/N     output Kth of N to stdout
      l/N     split into N files without splitting lines/records
      l/K/N   output Kth of N to stdout without splitting lines/records
      r/N     like 'l' but use round robin distribution
      r/K/N   likewise but only output Kth of N to stdout
    
    GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
    Full documentation at: <http://www.gnu.org/software/coreutils/split>
    or available locally via: info '(coreutils) split invocation'
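
    Two of the options listed above, -n (CHUNKS) and --filter, are less obvious than -l and -b. The sketch below tries both on an invented 1000-line file. It assumes a GNU coreutils version that supports these options, plus gzip and gunzip on the PATH:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 1000 numbered lines.
seq 1 1000 > large_file.txt

# -n l/4: four output files of roughly equal size, never breaking a line.
split -n l/4 -d large_file.txt chunk_
wc -l chunk_*

# --filter pipes each piece through a shell command; split sets $FILE to the
# name the piece would have had. Single quotes keep the current shell from
# expanding $FILE before split sees it.
split -l 400 -d --filter='gzip > "$FILE.gz"' large_file.txt zipped_

# Concatenated gzip streams decompress to the concatenated contents,
# so the compressed pieces can be merged and compared with the original.
cat zipped_*.gz | gunzip | cmp - large_file.txt && echo "round trip OK"
```

    Compressing through --filter avoids writing the uncompressed pieces to disk first, which can matter when the reason for splitting is limited space.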

    cat command syntax:

    $ cat --help
    Usage: cat [OPTION]... [FILE]...
    Concatenate FILE(s) to standard output.
    
    With no FILE, or when FILE is -, read standard input.
    
      -A, --show-all           equivalent to -vET
      -b, --number-nonblank    number nonempty output lines, overrides -n
      -e                       equivalent to -vE
      -E, --show-ends          display $ at end of each line
      -n, --number             number all output lines
      -s, --squeeze-blank      suppress repeated empty output lines
      -t                       equivalent to -vT
      -T, --show-tabs          display TAB characters as ^I
      -u                       (ignored)
      -v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB
          --help     display this help and exit
          --version  output version information and exit
    
    Examples:
      cat f - g  Output f's contents, then standard input, then g's contents.
      cat        Copy standard input to standard output.
    
    GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
    Full documentation at: <http://www.gnu.org/software/coreutils/cat>
    or available locally via: info '(coreutils) cat invocation'
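
    A few of the cat options above are handy when inspecting pieces before or after a merge. The following is a small sketch assuming GNU cat; sample.txt and with_header.txt are made-up file names:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample: a tab-separated line, three empty lines, then "c".
printf 'a\tb\n\n\n\nc\n' > sample.txt

# -s collapses the run of empty lines into one; -n numbers the lines that remain.
cat -sn sample.txt

# -A (same as -vET) marks each line end with $ and shows TAB as ^I.
shown=$(cat -A sample.txt | head -n 1)
echo "$shown"

# "-" stands for standard input, so a header line can be placed in front of a file.
echo "# header" | cat - sample.txt > with_header.txt
head -n 2 with_header.txt
```

    The "-" operand follows the same pattern as the "cat f - g" example in the help text, so it can also be combined with a part_* glob to prepend or append content while merging.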

    Reference

    Linux: Splitting and Merging Large Files ==> https://www.cnblogs.com/bymo/p/7571320.html

  • Original article: https://www.cnblogs.com/whatlonelytear/p/13811255.html