1、Generic String Functions
Index Function
The index function can be used to get the index (location) of the given string (or character) in an input string.
You can also use index to check whether a given string (or character) is present in an input string. If the given string is not present, it will return the location as 0, which means the given string doesn't exist, as shown below.
$ cat index.awk
BEGIN {
  state="CA is California"
  print "String CA starts at location",index(state,"CA");
  print "String Cali starts at location",index(state,"Cali");
  if (index(state,"NY")==0)
    print "String NY is not found in:", state
}

$ awk -f index.awk
String CA starts at location 1
String Cali starts at location 7
String NY is not found in: CA is California
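Because index compares plain text rather than a regular expression, it can be handy for locating strings that contain characters such as "." literally. The following is a minimal sketch; the two input lines and the variable name found are invented for illustration and are not part of items.txt.

```shell
# index() treats "a.b" as literal text, so the line "axb" is not a match.
# The two input lines are made up for this illustration.
found=$(printf 'a.b\naxb\n' | awk 'index($0, "a.b") > 0 { print NR ": " $0 }')
echo "$found"
```

Only the first line should be printed, prefixed with its record number.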
Length Function
The length function returns the length of a string. In the following example, we print the total number of characters in each record of the items.txt file.
$ awk '{print length($0)}' items.txt
29
32
27
31
30
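The length function can also be used in a pattern to select records by size. The sketch below assumes two made-up input lines (not from items.txt) and prints only those longer than 10 characters, each prefixed by its length.

```shell
# Print only the lines longer than 10 characters, prefixed by their length.
# The sample lines are invented for illustration.
long_lines=$(printf 'short\na much longer line\n' | awk 'length($0) > 10 { print length($0) ": " $0 }')
echo "$long_lines"
```

With this input, only the second line qualifies, and it is reported as 18 characters long.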
Split Function
Syntax:
split(input-string,output-array,separator)
The split function splits a string into individual array elements and returns the number of elements created. It takes the following three arguments.
• input-string: This is the input string that needs to be split into multiple strings.
• output-array: This array will contain the split strings as individual elements, indexed from 1.
• separator: The separator that should be used to split the input-string. This parameter is optional. When it is not specified, awk uses the current value of FS.
For this example, the original items-sold.txt file is slightly changed to use different field delimiters: a colon separates the item number from the quantities sold, and the individual quantities are separated by commas.
So, to calculate the total number of items sold for a particular item, we take the 2nd field (all the quantities sold, delimited by commas), split it using a comma as the separator, store the substrings in an array, and then loop through the array to add up the quantities.
$ cat items-sold1.txt
101:2,10,5,8,10,12
102:0,1,4,3,0,2
103:10,6,11,20,5,13
104:2,3,4,0,6,5
105:10,2,5,7,12,6

$ cat split.awk
BEGIN {
  FS=":"
}
{
  split($2,quantity,",");
  total=0;
  for (x in quantity)
    total=total+quantity[x];
  print "Item", $1, ":", total, "quantities sold";
}

$ awk -f split.awk items-sold1.txt
Item 101 : 47 quantities sold
Item 102 : 10 quantities sold
Item 103 : 65 quantities sold
Item 104 : 20 quantities sold
Item 105 : 42 quantities sold
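The "for (x in quantity)" loop visits the elements in no guaranteed order, which is fine for a sum. When the order matters, one option is to use the value returned by split, since the elements are stored at indexes 1 through that count. A minimal sketch using the first record of items-sold1.txt (the variable names n and summary are chosen here for illustration):

```shell
# split() returns the element count n; indexes 1..n preserve the original order.
summary=$(echo "101:2,10,5,8,10,12" | awk -F: '{
  n = split($2, quantity, ",")
  printf "Item %s has %d entries; first=%s, last=%s\n", $1, n, quantity[1], quantity[n]
}')
echo "$summary"
```

For this record, the count is 6, the first quantity is 2, and the last is 12.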
Substr Function
Syntax:
substr(input-string, location, length)
The substr function extracts a portion of a given string. In the above syntax:
• input-string: The input string containing the substring.
• location: The starting location of the substring (the first character is at location 1).
• length: The total number of characters to extract from the starting location. This parameter is optional. When you don't specify it, substr extracts the rest of the characters from the starting location.
The following example starts from the 1st character of the 2nd field and prints 5 characters:
$ awk -F"," '{print substr($2,1,5)}' items.txt
HD Ca
Refri
MP3 P
Tenni
Laser
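To illustrate the optional length parameter, the sketch below calls substr once without a length and once with one, reusing the "CA is California" string from the index example. The variable name rest is chosen here for illustration.

```shell
# substr() with and without the optional length argument.
rest=$(awk 'BEGIN {
  state = "CA is California"
  print substr(state, 7)      # no length: everything from location 7 onward
  print substr(state, 1, 2)   # the first two characters
}')
echo "$rest"
```

The first call should print "California" and the second "CA".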
2、GAWK/NAWK String Functions
These string functions are available only in GAWK and NAWK flavors.
Sub Function
Syntax:
sub(original-string,replacement-string,string-variable)
• sub stands for substitution.
• original-string: This is the original string that needs to be replaced. This can also be a regular expression.
• replacement-string: This is the replacement string.
• string-variable: This acts as both input and output string variable. You have to be careful with this, as after the successful substitution, you lose the original value in this string-variable.
In the following example:
• original-string: This is the regular expression C[Aa], which matches either "CA" or "Ca"
• replacement-string: When the original-string is found, replace it with "KA"
• string-variable: Before executing the sub, the variable contains the input string. Once the replacement is done, the variable contains the output string.
Please note that sub replaces only the 1st occurrence of the match.
$ cat sub.awk
BEGIN {
  state="CA is California"
  sub("C[Aa]","KA",state);
  print state;
}

$ awk -f sub.awk
KA is California
The 3rd parameter string-variable is optional. When it is not specified, awk will use $0 (the current line), as shown below. This example replaces the first occurrence of "10" in each record with "20". Since every record in items.txt starts with "10", the item number 101 becomes 201, 102 becomes 202, etc.
$ awk '{ sub("10","20"); print $0; }' items.txt
201,HD Camcorder,Video,210,10
202,Refrigerator,Appliance,850,2
203,MP3 Player,Audio,270,15
204,Tennis Racket,Sports,190,20
205,Laser Printer,Office,475,5

When a successful substitution happens, the sub function returns 1; otherwise, it returns 0.
Print the record only when a successful substitution occurs:
$ awk '{ if (sub("HD","High-Def")) print $0; }' items.txt
101,High-Def Camcorder,Video,210,10
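Within the replacement-string, an unescaped & stands for the text that was actually matched, which is useful when the original-string is a regular expression. The following is a minimal sketch that wraps the match in brackets and also prints the return value of sub. The variable names changed and marked are chosen here for illustration.

```shell
# Wrap the first match of C[Aa] in brackets and show the return value of sub().
marked=$(awk 'BEGIN {
  state = "CA is California"
  changed = sub(/C[Aa]/, "[&]", state)   # & stands for the matched text
  print changed, state
}')
echo "$marked"
```

Only the first match ("CA") is wrapped, and the return value is 1.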
Gsub Function
gsub stands for global substitution. gsub is exactly the same as sub, except that all occurrences of original-string are changed to replacement-string, and the return value is the number of substitutions made.
In the following example, both "CA" and "Ca" are changed to "KA":
$ cat gsub.awk
BEGIN {
  state="CA is California"
  gsub("C[Aa]","KA",state);
  print state;
}

$ awk -f gsub.awk
KA is KAlifornia

As with sub, the 3rd parameter is optional. When it is not specified, awk uses $0, just as sub does.
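Since gsub can replace more than one occurrence, its return value is a count rather than just 0 or 1. A minimal sketch, reusing the string from gsub.awk (the variable names n and counted are chosen here for illustration):

```shell
# gsub() returns how many substitutions it made.
counted=$(awk 'BEGIN {
  state = "CA is California"
  n = gsub(/C[Aa]/, "KA", state)
  print n, state
}')
echo "$counted"
```

Both "CA" and "Ca" are replaced, so the count is 2.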
Match Function and the RSTART, RLENGTH Variables
The match function searches for a given string (or regular expression) in the input-string. It returns the starting location of the match (a positive value) when a match is found, and 0 otherwise.
Syntax:
match(input-string,search-string)
• input-string: This is the input-string that needs to be searched.
• search-string: This is the search-string that needs to be searched for in the input-string. This can also be a regular expression.
The following example searches for the string "Cali" in the state string variable. If present, it prints a successful message.
$ cat match.awk
BEGIN {
  state="CA is California"
  if (match(state,"Cali")) {
    print substr(state,RSTART,RLENGTH),"is present in:", state;
  }
}

$ awk -f match.awk
Cali is present in: CA is California
Match sets the following two special variables. The above example uses them in the substr function call to print the matched text in the success message.
• RSTART - The starting location of the matched text in the input-string (0 if there is no match).
• RLENGTH - The length of the matched text (-1 if there is no match).
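When the search-string is a regular expression, RLENGTH reflects the text that was actually matched rather than the length of the pattern. The sketch below shows both a successful and a failed match on the same string. The pattern /Cal.*/ and the variable name positions are chosen here for illustration.

```shell
# RSTART and RLENGTH after a successful match and after a failed one.
positions=$(awk 'BEGIN {
  state = "CA is California"
  if (match(state, /Cal.*/))
    print RSTART, RLENGTH, substr(state, RSTART, RLENGTH)
  match(state, /NY/)          # no match: RSTART is 0, RLENGTH is -1
  print RSTART, RLENGTH
}')
echo "$positions"
```

The first line should report location 7 and length 10 for "California"; the second line should be "0 -1".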
When subStr is plain text with no regular-expression metacharacters, index(string1, subStr) and match(string1, subStr) return the same location.
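The relationship between index and match only holds for plain text, because match interprets its second argument as a regular expression while index does not. A minimal sketch comparing the two on the same string (the search strings and the variable name compare are chosen here for illustration):

```shell
# index() does a literal search; match() treats its argument as a regular expression.
compare=$(awk 'BEGIN {
  state = "CA is California"
  print index(state, "Cali"), match(state, "Cali")   # plain text: same location
  print index(state, "C.l"),  match(state, "C.l")    # "." is a wildcard only for match
}')
echo "$compare"
```

The first line should show the same location twice (7 7), while the second shows 0 for index and 7 for match.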
3、GAWK String Functions
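tolower and toupper are available in Gawk. (They are also part of the POSIX awk specification, so most current awk implementations provide them, but the original awk does not.) As the names suggest, these functions convert the given string to lower case or upper case, as shown below.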
$ awk '{print tolower($0)}' items.txt
101,hd camcorder,video,210,10
102,refrigerator,appliance,850,2
103,mp3 player,audio,270,15
104,tennis racket,sports,190,20
105,laser printer,office,475,5

$ awk '{print toupper($0)}' items.txt
101,HD CAMCORDER,VIDEO,210,10
102,REFRIGERATOR,APPLIANCE,850,2
103,MP3 PLAYER,AUDIO,270,15
104,TENNIS RACKET,SPORTS,190,20
105,LASER PRINTER,OFFICE,475,5
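A common use of tolower is a case-insensitive search: convert the record to lower case first, then look for a lower-case string. The following is a minimal sketch; the three input lines and the variable name hits are invented for illustration.

```shell
# Case-insensitive search: lower-case the record before looking for "hd".
# The input lines are made up for this illustration.
hits=$(printf 'HD Camcorder\nRefrigerator\nhd radio\n' | awk 'index(tolower($0), "hd") > 0 { print }')
echo "$hits"
```

Both "HD Camcorder" and "hd radio" should be printed, and the original capitalization is kept because only the copy passed to index is converted.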