  • Awk Basics [2]: Awk Built-in Variables

    1. FS - Input Field Separator


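    When awk processes a file, the default field separator is whitespace (any run of spaces or tabs). If the fields in your input file are separated by something else, you can specify the separator with the -F option, as shown below: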

    awk -F ',' '{print $2, $3}' employee.txt

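    We can also set the separator through awk's built-in variable FS. Set it in the BEGIN block so that it is in effect before the first record is read: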

    awk 'BEGIN {FS=","} {print $2, $3}' employee.txt
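
    A third, equivalent way is to assign FS on the command line with the standard -v option, which takes effect before the BEGIN block runs. The sketch below is only an illustration: the input line is made-up sample data fed through printf, not the contents of employee.txt.

```shell
# Assumed sample record; employee.txt itself is not reproduced here.
printf '101,John Doe,CEO\n' | awk -v FS=',' '{ print $2, $3 }'
# prints: John Doe CEO
```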

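    We can also specify several field separators at once. Suppose, for example, that we have the following file, in which each record uses three different field separators: a comma, a colon, and a percent sign: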

    $ vi employee-multiple-fs.txt
    101,John Doe:CEO%10000
    102,Jason Smith:IT Manager%5000
    103,Raj Reddy:Sysadmin%4500
    104,Anand Ram:Developer%4500
    105,Jane Miller:Sales Manager%3000

    You can specify multiple field separators using a regular expression. For example, FS = "[,:%]" means that the field separator can be a comma, a colon, or a percent sign.

    So, the following example will print the name and the title from the employee-multiple-fs.txt file that contains different field separators.

    $ awk 'BEGIN {FS="[,:%]"} {print $2, $3}' \
    employee-multiple-fs.txt
    John Doe CEO
    Jason Smith IT Manager
    Raj Reddy Sysadmin
    Anand Ram Developer
    Jane Miller Sales Manager
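
    Because FS is a regular expression here, it can also absorb optional spaces around each separator. The following is a minimal sketch under that assumption; the record with stray spaces is made-up sample data, and the "|" in the output is only there to make the field boundaries visible.

```shell
# Assumed sample record with spaces around every separator.
# The record is passed as a printf argument because it contains a literal %.
printf '%s\n' '101 , John Doe : CEO % 10000' |
  awk 'BEGIN { FS = " *[,:%] *" } { print $2 "|" $3 "|" $4 }'
# prints: John Doe|CEO|10000
```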

    2. FIELDWIDTHS


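    By default, awk splits each record into fields using the character, string, or regular expression stored in FS. In gawk you can instead set FIELDWIDTHS to give the width of each column, and the input is then split by position. In the examples, $1=$1 makes awk rebuild $0 from the fields it found, joined by OFS, so the output shows exactly how the record was split: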

    $ echo abcdefghigk | awk 'BEGIN{FIELDWIDTHS="1 2"} {$1=$1;print $0}'
    a bc
    $ echo abcdefghigk | awk 'BEGIN{FIELDWIDTHS="1 2 3"} {$1=$1;print $0}'
    a bc def

    Reference: http://www.gnu.org/software/gawk/manual/html_node/Constant-Size.html
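
    FIELDWIDTHS is a gawk extension, so other awk implementations ignore it. As a rough portable sketch, the same fixed-width slicing can be written with the standard substr(string, start, length) function; the start positions 1, 2, and 4 are simply the running totals of the widths 1, 2, and 3.

```shell
# Same split as FIELDWIDTHS="1 2 3", written with substr().
echo abcdefghigk |
  awk '{ print substr($0, 1, 1), substr($0, 2, 2), substr($0, 4, 3) }'
# prints: a bc def
```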

    3. FPAT


    Suppose we have the following CSV (comma-separated values) file:

    $ cat addresses.csv
    Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA

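    Note that the address field ("1234 A Pretty Street, NE") itself contains a comma. If we split the input with FS=",", the address is broken into two fields:

    "1234 A Pretty Street
     NE"

    This is not the result we want.

    For cases like this, we can use the built-in variable FPAT (a gawk extension). The value of FPAT is a regular expression that describes the contents of each field, rather than the separator between fields.

    In the CSV file above, each field is either a string that contains no comma or a string enclosed in a pair of double quotes.

    So we can solve the problem like this: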

    $ cat simple-csv.awk
    BEGIN {
        FPAT = "([^,]+)|(\"[^\"]+\")"
    }

    {
        print "NF = ", NF
        for (i = 1; i <= NF; i++) {
            printf("$%d = <%s>\n", i, $i)
        }
    }
    $ gawk -f simple-csv.awk addresses.csv
    NF =  7
    $1 = <Robbins>
    $2 = <Arnold>
    $3 = <"1234 A Pretty Street, NE">
    $4 = <MyTown>
    $5 = <MyState>
    $6 = <12345-6789>
    $7 = <USA>

    As you can see, the address is now kept as a single field.

    Reference: http://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html
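
    Since FPAT is specific to gawk, here is a rough sketch of the same "describe the field, not the separator" idea using only the standard match() function with RSTART and RLENGTH. The variable names rest, field, and n are illustrative, and the loop is deliberately simple: it does not handle escaped quotes inside a field or an empty field at the very end of the line.

```shell
printf '%s\n' 'Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA' |
awk '{
    n = 0
    rest = $0
    while (rest != "") {
        # Try a double-quoted field first, then fall back to a run of non-commas.
        if (!match(rest, /^"[^"]*"/))
            match(rest, /^[^,]*/)
        field[++n] = substr(rest, RSTART, RLENGTH)
        # Skip past the field and the comma that follows it.
        rest = substr(rest, RLENGTH + 2)
    }
    print "n = " n
    for (i = 1; i <= n; i++)
        printf("field %d = <%s>\n", i, field[i])
}'
```

    For the sample line this reports seven fields, with the quoted address (including its comma) as field 3.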

    4. OFS - Output Field Separator


    OFS is the output field separator: the string awk prints between consecutive fields on output. Its default value is a single space.

    When you use a single print statement to print two variables by separating them with a comma (as shown below), it will print the values of those two variables separated by a space.

    $ awk -F ',' '{print $2, $3}' employee.txt
    John Doe CEO
    Jason Smith IT Manager
    Raj Reddy Sysadmin
    Anand Ram Developer
    Jane Miller Sales Manager

    The following print statement prints two variables ($2 and $3) separated by a comma; however, the output has a colon between them (instead of a space), because OFS is set to a colon.

    $ awk -F ',' 'BEGIN { OFS=":" } \
    { print $2, $3 }' employee.txt
    John Doe:CEO
    Jason Smith:IT Manager
    Raj Reddy:Sysadmin
    Anand Ram:Developer
    Jane Miller:Sales Manager

    When you specify a comma in the print statement between different print values, awk will use the OFS. In the following example, the default OFS is used, so you'll see a space between the values in the output.

    $ awk 'BEGIN { print "test1","test2" }'
    test1 test2

    When you don't separate values with a comma in the print statement, awk will not use the OFS; instead it will print the values with nothing in between.

    $ awk 'BEGIN { print "test1" "test2" }'
    test1test2
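
    OFS is also used when awk rebuilds $0 after a field has been assigned. A common idiom, sketched here on a made-up sample record, is to assign a field to itself so that the whole record is re-joined with the new separator:

```shell
# Assumed sample record; $1 = $1 changes nothing but forces $0 to be rebuilt with OFS.
printf '101,John Doe,CEO\n' |
  awk 'BEGIN { FS = ","; OFS = ":" } { $1 = $1; print }'
# prints: 101:John Doe:CEO
```

    Without the $1 = $1 assignment, print would output the record unchanged, commas included.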

    5. RS - Record Separator


    Suppose we have the following data file:

    $ vi employee-one-line.txt
    101,John Doe:102,Jason Smith:103,Raj Reddy:104,Anand Ram:105,Jane Miller

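    In this file, each record consists of two parts (an ID and a name). Records are separated by colons rather than newlines, and the two fields within each record are separated by a comma.

    By default awk uses the newline as the record separator, so if you try to print only the employee names, the following will not work: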

    $ awk -F, '{print $2}' employee-one-line.txt
    John Doe:102

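    This is because awk treats the whole line as a single record and the comma as the field separator, so the second field is "John Doe:102". To process the line as five records, you need to set the record separator explicitly: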

    $ awk -F, 'BEGIN { RS=":" } \
    { print $2 }' employee-one-line.txt
    John Doe
    Jason Smith
    Raj Reddy
    Anand Ram
    Jane Miller
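
    One detail to watch for with RS=":": if the file ends with a newline (as files saved from most editors do), that newline is no longer a record separator and stays attached to the last record, so the last name is followed by an extra blank line. A minimal sketch that strips it, using a shortened, made-up version of the data:

```shell
# sub() with no target works on $0; awk then re-splits the fields.
printf '101,John Doe:102,Jason Smith:103,Raj Reddy\n' |
  awk -F, 'BEGIN { RS = ":" } { sub(/\n$/, ""); print $2 }'
# prints:
# John Doe
# Jason Smith
# Raj Reddy
```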

    6. ORS - Output Record Separator


    By default, awk ends each output record with a newline. You can set the variable ORS to specify the output record separator explicitly:

    $ awk 'BEGIN { FS=","; ORS="\n---\n" } \
    {print $2, $3}' employee.txt
    John Doe CEO
    ---
    Jason Smith IT Manager
    ---
    Raj Reddy Sysadmin
    ---
    Anand Ram Developer
    ---
    Jane Miller Sales Manager
    ---
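
    ORS is read each time print runs, so it can also be changed from one record to the next. The following sketch joins every pair of input lines with a comma; the four-line input and the counter variable n are made up for illustration.

```shell
# Odd-numbered prints end with "," and even-numbered prints end with a newline.
printf '1\n2\n3\n4\n' |
  awk '{ ORS = (++n % 2 ? "," : "\n"); print }'
# prints:
# 1,2
# 3,4
```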

    7. NR - Number of Records


     

    NR is very helpful. When used in the body block, it gives the number of the current record (the line number, with the default record separator). When used in the END block, it gives the total number of records read.

    The following example shows how NR works in the body block and in the END block:

    $ awk 'BEGIN {FS=","} \
    {print "Emp Id of record number",NR,"is",$1;} \
    END {print "Total number of records:",NR}' employee.txt
    Emp Id of record number 1 is 101
    Emp Id of record number 2 is 102
    Emp Id of record number 3 is 103
    Emp Id of record number 4 is 104
    Emp Id of record number 5 is 105
    Total number of records: 5
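
    NR can also be used in a pattern to select records by position. As a small sketch, assuming made-up input whose first line is a header, the condition NR > 1 skips that header:

```shell
# Assumed sample data with a header line.
printf 'id,name\n101,John Doe\n102,Jason Smith\n' |
  awk -F, 'NR > 1 { print NR - 1, $2 }'
# prints:
# 1 John Doe
# 2 Jason Smith
```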

    8. FILENAME - Current File Name


     

    FILENAME is helpful when you are specifying multiple input-files to the awk program. This will give you the name of the file Awk is currently processing.

    $ awk '{ print FILENAME }' \
    employee.txt employee-multiple-fs.txt
    employee.txt
    employee.txt
    employee.txt
    employee.txt
    employee.txt
    employee-multiple-fs.txt
    employee-multiple-fs.txt
    employee-multiple-fs.txt
    employee-multiple-fs.txt
    employee-multiple-fs.txt
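
    FILENAME is also handy as an array key. The sketch below counts how many records were read from each input file; the two throwaway files and the array name count are made up for illustration, and the order in which for (f in count) visits the files is not guaranteed.

```shell
# Create two small throwaway files (made-up contents) in a temporary directory.
dir=$(mktemp -d)
printf '101\n102\n103\n' > "$dir/a.txt"
printf '201\n202\n' > "$dir/b.txt"

# Count the records read from each file, keyed by FILENAME.
awk '{ count[FILENAME]++ }
     END { for (f in count) print f ": " count[f] }' "$dir/a.txt" "$dir/b.txt"

rm -r "$dir"
```

    Each output line is the full path of a file followed by its record count (3 for a.txt and 2 for b.txt).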

    9. FNR - File "Number of Record"


     

    NR keeps growing across multiple files. When the body block starts processing the 2nd file, NR is not reset to 1; instead it continues from the last NR value of the previous file.

    FNR gives you the record number within the current file. So, when awk finishes executing the body block for the 1st file and starts the body block for the next file, FNR starts from 1 again.

    The following example shows both NR and FNR:

    $ vi fnr.awk
    BEGIN {
        FS=","
    }
    {
        printf "FILENAME=%s NR=%s FNR=%s\n", FILENAME, NR, FNR;
    }
    END {
        printf "END Block: NR=%s FNR=%s\n", NR, FNR
    }
    $ awk -f fnr.awk employee.txt employee-multiple-fs.txt
    FILENAME=employee.txt NR=1 FNR=1
    FILENAME=employee.txt NR=2 FNR=2
    FILENAME=employee.txt NR=3 FNR=3
    FILENAME=employee.txt NR=4 FNR=4
    FILENAME=employee.txt NR=5 FNR=5
    FILENAME=employee-multiple-fs.txt NR=6 FNR=1
    FILENAME=employee-multiple-fs.txt NR=7 FNR=2
    FILENAME=employee-multiple-fs.txt NR=8 FNR=3
    FILENAME=employee-multiple-fs.txt NR=9 FNR=4
    FILENAME=employee-multiple-fs.txt NR=10 FNR=5
    END Block: NR=10 FNR=5
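
    Because NR and FNR are equal only while the first file is being read, the test NR == FNR is a common way to tell the first file apart from the rest. The following is a minimal sketch of that idiom; ids.txt, staff.txt, and the array name wanted are made up for illustration, and the trick assumes the first file is not empty.

```shell
# Create two small throwaway files (made-up contents) in a temporary directory.
dir=$(mktemp -d)
printf '102\n104\n' > "$dir/ids.txt"
printf '101,John Doe\n102,Jason Smith\n103,Raj Reddy\n104,Anand Ram\n' > "$dir/staff.txt"

awk -F, '
    NR == FNR { wanted[$1] = 1; next }   # first file: remember each ID
    $1 in wanted { print $2 }            # later files: print names whose ID was seen
' "$dir/ids.txt" "$dir/staff.txt"
# prints:
# Jason Smith
# Anand Ram

rm -r "$dir"
```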
  • Original article: https://www.cnblogs.com/yangfengtao/p/3124100.html