  • 15 Linux Split and Join Command Examples to Manage Large Files--reference

    by HIMANSHU ARORA on OCTOBER 16, 2012

    http://www.thegeekstuff.com/2012/10/15-linux-split-and-join-command-examples-to-manage-large-files/

    The Linux split and join commands are very helpful when you are manipulating large files. This article explains how to use both commands with descriptive examples.

    Join and split command syntax:

    join [OPTION]… FILE1 FILE2
    split [OPTION]… [INPUT [PREFIX]]

    Linux Split Command Examples

    1. Basic Split Example

    Here is a basic example of split command.

    $ split split.zip 
    
    $ ls
    split.zip  xab  xad  xaf  xah  xaj  xal  xan  xap  xar  xat  xav  xax  xaz  xbb  xbd  xbf  xbh  xbj  xbl  xbn
    xaa        xac  xae  xag  xai  xak  xam  xao  xaq  xas  xau  xaw  xay  xba  xbc  xbe  xbg  xbi  xbk  xbm  xbo

    So we see that the file split.zip was split into smaller files named x**, where ** is the two-character suffix that is added by default. Also, by default, each x** file contains 1000 lines.

    $ wc -l *
       40947 split.zip
        1000 xaa
        1000 xab
        1000 xac
        1000 xad
        1000 xae
        1000 xaf
        1000 xag
        1000 xah
        1000 xai
    ...
    ...
    ...

    So the output above confirms that by default each x** file contains 1000 lines.
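
    The chunks can be put back together with cat. The following is a minimal sketch, assuming GNU coreutils; sample.txt and rejoined.txt are illustrative names that do not appear in the original example.

```shell
# Work in a scratch directory so the x* glob only matches our chunks.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 2500-line sample file (a stand-in for split.zip).
seq 1 2500 > sample.txt

# Default split: 1000 lines per chunk, written to xaa, xab and xac.
split sample.txt

# The shell expands x* in sorted order, which matches the order of the chunks.
cat x* > rejoined.txt

# cmp prints nothing and exits with status 0 when the files are identical.
cmp sample.txt rejoined.txt && echo "files match"
```

    Because the suffixes sort alphabetically, a plain glob is enough to restore the original order.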

    2. Change the Suffix Length using -a option

    As discussed in example 1 above, the default suffix length is 2. But this can be changed by using -a option.

    As you see in the following example, it is using suffix of length 5 on the split files.

    $ split -a5 split.zip
    $ ls
    split.zip  xaaaac  xaaaaf  xaaaai  xaaaal  xaaaao  xaaaar  xaaaau  xaaaax  xaaaba  xaaabd  xaaabg  xaaabj  xaaabm
    xaaaaa     xaaaad  xaaaag  xaaaaj  xaaaam  xaaaap  xaaaas  xaaaav  xaaaay  xaaabb  xaaabe  xaaabh  xaaabk  xaaabn
    xaaaab     xaaaae  xaaaah  xaaaak  xaaaan  xaaaaq  xaaaat  xaaaaw  xaaaaz  xaaabc  xaaabf  xaaabi  xaaabl  xaaabo

    Note: Earlier we also discussed other file manipulation utilities: tac, rev, and paste.

    3. Customize Split File Size using -b option

    Size of each output split file can be controlled using -b option.

    In this example, the split files were created with a size of 200000 bytes.

    $ split -b200000 split.zip 
    
    $ ls -lart
    total 21084
    drwxrwxr-x 3 himanshu himanshu     4096 Sep 26 21:20 ..
    -rw-rw-r-- 1 himanshu himanshu 10767315 Sep 26 21:21 split.zip
    -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xad
    -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xac
    -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xab
    -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xaa
    -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xah
    -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xag
    -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xaf
    -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xae
    -rw-rw-r-- 1 himanshu himanshu   200000 Sep 26 21:35 xar
    ...
    ...
    ...
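
    GNU split also accepts a multiplier suffix on the size, such as K for 1024 bytes or M for 1024*1024 bytes. A small sketch, assuming GNU coreutils and a throwaway file named sample.bin (not part of the original example):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 2500-byte sample file filled with zero bytes.
head -c 2500 /dev/zero > sample.bin

# 1K means 1024 bytes, so the chunks should be 1024, 1024 and 452 bytes.
split -b 1K sample.bin

# Show the size in bytes of every chunk.
wc -c x*
```

    The last chunk simply holds whatever is left over, so it is usually smaller than the others.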

    4. Create Split Files with Numeric Suffix using -d option

    As seen in the examples above, the output files are named x**, where ** are letters. You can change the suffix to numbers using the -d option.

    Here is an example. This has numeric suffix on the split files.

    $ split -d split.zip
    $ ls
    split.zip  x01  x03  x05  x07  x09  x11  x13  x15  x17  x19  x21  x23  x25  x27  x29  x31  x33  x35  x37  x39
    x00        x02  x04  x06  x08  x10  x12  x14  x16  x18  x20  x22  x24  x26  x28  x30  x32  x34  x36  x38  x40
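
    The syntax line at the top also lists an optional PREFIX argument, which replaces the default x in the output file names. A hedged sketch, assuming GNU coreutils; sample.txt and the part_ prefix are made up for illustration, and -l 2 (two lines per chunk) is only used to keep the example small:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A five-line sample file.
seq 1 5 > sample.txt

# -l 2: two lines per chunk; -d: numeric suffixes; part_ is the PREFIX argument.
split -l 2 -d sample.txt part_

# Lists part_00, part_01 and part_02.
ls part_*
```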

    5. Customize the Number of Split Chunks using -n option

    To get control over the number of chunks, use the -n option.

    This example will create 50 chunks of split files.

    $ split -n50 split.zip
    $ ls
    split.zip  xac  xaf  xai  xal  xao  xar  xau  xax  xba  xbd  xbg  xbj  xbm  xbp  xbs  xbv
    xaa        xad  xag  xaj  xam  xap  xas  xav  xay  xbb  xbe  xbh  xbk  xbn  xbq  xbt  xbw
    xab        xae  xah  xak  xan  xaq  xat  xaw  xaz  xbc  xbf  xbi  xbl  xbo  xbr  xbu  xbx
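
    Plain -n N divides the file by bytes, so a chunk can end in the middle of a line. For text files, GNU split also offers the l/N form, which produces N chunks without splitting lines. A minimal sketch, assuming a recent GNU coreutils and an illustrative file named sample.txt:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Ten short lines, 21 bytes in total.
seq 1 10 > sample.txt

# l/3 asks for 3 chunks but only cuts at line boundaries.
split -n l/3 sample.txt

# Each chunk holds whole lines, so the per-file counts add up to 10.
wc -l x*
```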

    6. Avoid Zero Sized Chunks using -e option

    When splitting a relatively small file into a large number of chunks, it's good to avoid zero-sized chunks, as they do not add any value. This can be done using the -e option.

    Here is an example:

    $ split -n50 testfile
    
    $ ls -lart x*
    -rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xag
    -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaf
    -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xae
    -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xad
    -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xac
    -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xab
    -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:55 xaa
    -rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbx
    -rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbw
    -rw-rw-r-- 1 himanshu himanshu 0 Sep 26 21:55 xbv
    ...
    ...
    ...

    So we see that lots of zero-sized chunks were produced in the above output. Now, let's use the -e option and see the results:

    $ split -n50 -e testfile
    $ ls
    split.zip  testfile  xaa  xab  xac  xad  xae  xaf
    
    $ ls -lart x*
    -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaf
    -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xae
    -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xad
    -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xac
    -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xab
    -rw-rw-r-- 1 himanshu himanshu 1 Sep 26 21:57 xaa

    So we see that no zero sized chunk was produced in the above output.

    7. Customize Number of Lines using -l option

    Number of lines per output split file can be customized using the -l option.

    As seen in the example below, split files are created with 20000 lines.

    $ split -l20000 split.zip
    
    $ ls
    split.zip  testfile  xaa  xab  xac
    
    $ wc -l x*
       20000 xaa
       20000 xab
         947 xac
       40947 total

    Get Detailed Information using --verbose option

    To get a diagnostic message each time a new split file is opened, use the --verbose option as shown below.

    $ split -l20000 --verbose split.zip
    creating file `xaa'
    creating file `xab'
    creating file `xac'

    Linux Join Command Examples

    8. Basic Join Example

    By default, the join command pairs lines from the two input files whose first fields match.

    Here is an example:

    $ cat testfile1
    1 India
    2 US
    3 Ireland
    4 UK
    5 Canada
    
    $ cat testfile2
    1 NewDelhi
    2 Washington
    3 Dublin
    4 London
    5 Toronto
    
    $ join testfile1 testfile2
    1 India NewDelhi
    2 US Washington
    3 Ireland Dublin
    4 UK London
    5 Canada Toronto

    So we see that a file containing countries was joined with another file containing capitals on the basis of first field.

    9. Join works on Sorted List

    If either of the two files supplied to the join command is not sorted on the join field, join prints a warning and the out-of-order line is not joined.

    In this example, testfile1 is not sorted, so join displays a warning message.

    $ cat testfile1
    1 India
    2 US
    3 Ireland
    5 Canada
    4 UK
    
    $ cat testfile2
    1 NewDelhi
    2 Washington
    3 Dublin
    4 London
    5 Toronto
    
    $ join testfile1 testfile2
    1 India NewDelhi
    2 US Washington
    3 Ireland Dublin
    join: testfile1:5: is not sorted: 4 UK
    5 Canada Toronto
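
    The usual fix is to sort both inputs on the join field before joining. A minimal sketch, assuming GNU coreutils; the .sorted file names are illustrative, and LC_ALL=C is set so that sort and join agree on the ordering:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two input files from this example (testfile1 is unsorted).
printf '%s\n' '1 India' '2 US' '3 Ireland' '5 Canada' '4 UK' > testfile1
printf '%s\n' '1 NewDelhi' '2 Washington' '3 Dublin' '4 London' '5 Toronto' > testfile2

# Sort both inputs on the first field, which is the default join field.
LC_ALL=C sort -k 1,1 testfile1 > testfile1.sorted
LC_ALL=C sort -k 1,1 testfile2 > testfile2.sorted

# With sorted input, all five lines pair up and no warning is printed.
LC_ALL=C join testfile1.sorted testfile2.sorted
```

    Note that join compares keys as text, so the inputs should be sorted as text as well, not numerically.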

    10. Ignore Case using -i option

    When comparing fields, the difference in case can be ignored using -i option as shown below.

    $ cat testfile1
    a India
    b US
    c Ireland
    d UK
    e Canada
    
    $ cat testfile2
    a NewDelhi
    B Washington
    c Dublin
    d London
    e Toronto
    
    $ join testfile1 testfile2
    a India NewDelhi
    c Ireland Dublin
    d UK London
    e Canada Toronto
    
    $ join -i testfile1 testfile2
    a India NewDelhi
    b US Washington
    c Ireland Dublin
    d UK London
    e Canada Toronto

    11. Verify that Input is Sorted using --check-order option

    Here is an example. Because testfile1 is unsorted towards the end, join reports an error.

    $ cat testfile1
    a India
    b US
    c Ireland
    d UK
    f Australia
    e Canada
    
    $ cat testfile2
    a NewDelhi
    b Washington
    c Dublin
    d London
    e Toronto
    
    $ join --check-order testfile1 testfile2
    a India NewDelhi
    b US Washington
    c Ireland Dublin
    d UK London
    join: testfile1:6: is not sorted: e Canada

    12. Do not Check the Sort Order using --nocheck-order option

    This is the opposite of the previous example. The input is not checked for sort order, and no error message is displayed.

    $ join --nocheck-order testfile1 testfile2
    a India NewDelhi
    b US Washington
    c Ireland Dublin
    d UK London

    13. Print Unpairable Lines using -a option

    If the lines of the two input files cannot all be paired one to one, the -a FILENUM option also prints the unpairable lines from file FILENUM, where FILENUM is 1 or 2.

    In the following example, -a1 prints the last line of testfile1, which has no pair in testfile2.

    $ cat testfile1
    a India
    b US
    c Ireland
    d UK
    e Canada
    f Australia
    
    $ cat testfile2
    a NewDelhi
    b Washington
    c Dublin
    d London
    e Toronto
    
    $ join testfile1 testfile2
    a India NewDelhi
    b US Washington
    c Ireland Dublin
    d UK London
    e Canada Toronto
    
    $ join -a1 testfile1 testfile2
    a India NewDelhi
    b US Washington
    c Ireland Dublin
    d UK London
    e Canada Toronto
    f Australia
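
    Giving both -a1 and -a2 keeps the unpairable lines of both files, which behaves like a full outer join. The sketch below assumes GNU coreutils; the input data is shortened and partly made up (the "g Lima" line and the NONE placeholder are not from the original example):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Each file has one key that the other lacks (f and g).
printf '%s\n' 'a India' 'b US' 'f Australia' > testfile1
printf '%s\n' 'a NewDelhi' 'b Washington' 'g Lima' > testfile2

# -a1 -a2: keep unpairable lines from both files.
# -o 0,1.2,2.2: print the join field, then field 2 of each file.
# -e NONE: print NONE in place of a missing field.
join -a1 -a2 -e NONE -o 0,1.2,2.2 testfile1 testfile2
```

    The explicit -o format keeps the columns aligned, so a missing country or capital shows up as NONE in the correct position.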

    14. Print Only Unpaired Lines using -v option

    In the above example both paired and unpaired lines were produced in the output. But, if only unpaired output is desired then use -v option as shown below.

    $ join -v1 testfile1 testfile2
    f Australia

    15. Join Based on Different Columns from Both Files using -1 and -2 option

    By default, the first column of each file is used for comparison before joining. You can change this behavior using the -1 and -2 options.

    In the following example, the first column of testfile1 was compared with the second column of testfile2 to produce the join command output.

    $ cat testfile1
    a India
    b US
    c Ireland
    d UK
    e Canada
    
    $ cat testfile2
    NewDelhi a
    Washington b
    Dublin c
    London d
    Toronto e
    
    $ join -1 1 -2 2 testfile1 testfile2
    a India NewDelhi
    b US Washington
    c Ireland Dublin
    d UK London
    e Canada Toronto
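
    The same options work with delimited files when combined with -t, which sets the field separator. A minimal sketch, assuming GNU coreutils; countries.csv and capitals.csv are hypothetical files holding a shortened, comma-separated version of the data above:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Comma-separated versions of the two inputs, sorted on their join fields.
printf '%s\n' 'a,India' 'b,US' 'c,Ireland' > countries.csv
printf '%s\n' 'NewDelhi,a' 'Washington,b' 'Dublin,c' > capitals.csv

# -t , : use a comma as the field separator for input and output.
# -1 1 -2 2: join field 1 of the first file with field 2 of the second.
join -t , -1 1 -2 2 countries.csv capitals.csv
```

    The output uses the same separator, so the first line is a,India,NewDelhi.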
  • Original address: https://www.cnblogs.com/davidwang456/p/3715704.html