Delete a variable in awk - variables

I wonder if it is possible to delete a variable in awk. For an array, you can say delete a[2] and the index 2 of the array a[] will be deleted. However, for a variable I cannot find a way.
The closest I get is to say var="" or var=0.
But then, it seems that the default value of a non-existing variable is 0 or False:
$ awk 'BEGIN {if (b==0) print 5}'
5
$ awk 'BEGIN {if (!b) print 5}'
5
So I also wonder if it is possible to distinguish between a variable that is set to 0 and one that has not been set, because the two seem to compare equal:
$ awk 'BEGIN {a=0; if (a==b) print 5}'
5

There is no operation to unset/delete a variable. The only time a variable becomes unset again is at the end of a function call when it's an unused function argument being used as a local variable:
$ cat tst.awk
function foo( arg ) {
if ( (arg=="") && (arg==0) ) {
print "arg is not set"
}
else {
printf "before assignment: arg=<%s>\n",arg
}
arg = rand()
printf "after assignment: arg=<%s>\n",arg
print "----"
}
BEGIN {
foo()
foo()
}
$ awk -f tst.awk file
arg is not set
after assignment: arg=<0.237788>
----
arg is not set
after assignment: arg=<0.291066>
----
so if you want to perform some actions A then unset the variable X and then perform actions B, you could encapsulate A and/or B in functions using X as a local var.
Note though that the default value is zero or null (the empty string), not zero or false, since an unset variable behaves like a "numeric string": it compares equal to both 0 and "".
You test for an unset variable by comparing it to both null and zero:
$ awk 'BEGIN{ if ((x=="") && (x==0)) print "y" }'
y
$ awk 'BEGIN{ x=0; if ((x=="") && (x==0)) print "y" }'
$ awk 'BEGIN{ x=""; if ((x=="") && (x==0)) print "y" }'
If you NEED a variable that you can delete, you can always use a single-element array:
$ awk 'BEGIN{ if ((x[1]=="") && (x[1]==0)) print "y" }'
y
$ awk 'BEGIN{ x[1]=""; if ((x[1]=="") && (x[1]==0)) print "y" }'
$ awk 'BEGIN{ x[1]=""; delete x; if ((x[1]=="") && (x[1]==0)) print "y" }'
y
but IMHO that obfuscates your code.
What would be the use case for unsetting a variable? What would you do with it that you can't do with var="" or var=0?

An unset variable expands to "" or 0, depending on the context in which it is being evaluated.
For this reason, I would say that it's a matter of preference and depends on the usage of the variable.
Given that we use a + 0 (or the slightly controversial +a) in the END block to coerce the potentially unset variable a to a numeric type, I guess you could argue that the natural "empty" value would be "".
I'm not sure that there's too much to read into the cases that you've shown in the question, given the following:
$ awk 'BEGIN { if (!"") print 5 }'
5
("" is false, unsurprisingly)
$ awk 'BEGIN { if (b == "") print 5 }'
5
(unset variable evaluates equal to "", just the same as 0)
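A minimal sketch of why the + 0 coercion matters, assuming a counter n that is only incremented when a line matches (the name n and the sample input are made up for illustration). With no matches, n is still unset at END, so it prints as an empty string unless it is coerced to a number:

```shell
# n is never assigned because no input line matches /z/
printf 'a\nb\n' | awk '/z/ { n++ } END { print "[" n "]"; print n + 0 }'
```

The first line of output is [] (string context) and the second is 0 (numeric context).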

Related

How to replace all escape sequences with non-escaped equivalent with unix utilities (sed/tr/awk)

I'm processing a Wireshark config file (dfilter_buttons) for display filters and would like to print out the filter of a given name. The content of file is like:
Sample input
"TRUE","test","sip contains \x22Hello, world\x5cx22\x22",""
And the resulting output should have the escape sequences replaced, so I can use them later in my script:
Desired output
sip contains "Hello, world\x22"
My first pass is like this:
Current parser
filter_name=test
awk -v filter_name="$filter_name" 'BEGIN {FS="\",\""} ($2 == filter_name) {print $3}' "$config_file"
And my output is this:
Current output
sip contains \x22Hello, world\x5cx22\x22
I know I can handle these exact two escape sequences by piping to sed and matching them explicitly, but is there a generic way to substitute all escape sequences? Future filters I build may use more escape sequences than just \x22 (") and \x5c (\), and I would like to handle those scenarios too.
Using GNU awk you can do this with the split and strtonum functions (the 4th argument to split is a gawk extension):
awk -F '","' -v filt='test' '$2 == filt {n = split($3, subj, /\\x[0-9a-fA-F]{2}/, seps); for (i=1; i<n; ++i) printf "%s%c", subj[i], strtonum("0" substr(seps[i], 2)); print subj[i]}' file
sip contains "Hello, world\x22"
A more readable form:
awk -F '","' -v filt='test' '
$2 == filt {
n = split($3, subj, /\\x[0-9a-fA-F]{2}/, seps)
for (i=1; i<n; ++i)
printf "%s%c", subj[i], strtonum("0" substr(seps[i], 2))
print subj[i]
}' file
Explanation:
Using -F '","' we split each input line on the delimiter ","
$2 == filt selects the lines whose second field equals the filter name (here "test")
Using /\\x[0-9a-fA-F]{2}/ as the regex (it matches \x followed by 2 hex digits) we split $3, saving the pieces between matches in array subj and the matched separators in array seps
Using substr we drop the first character of each separator (the backslash) and prepend 0, giving e.g. 0x22
Using strtonum we convert that hex string to the corresponding character code
Using %c in printf we print the character with that code
The for loop joins $3 back together from subj and the decoded seps, and the final print emits the last piece
Using GNU awk for FPAT, gensub(), strtonum(), and the 3rd arg to match():
$ cat tst.awk
BEGIN { FPAT="([^,]*)|(\"[^\"]*\")"; OFS="," }
$2 == ("\"" filter_name "\"") {
gsub(/^"|"$/,"",$3)
while ( match($3,/(\\x[0-9a-fA-F]{2})(.*)/,a) ) {
printf "%s%c", substr($3,1,RSTART-1), strtonum(gensub(/./,0,1,a[1]))
$3 = a[2]
}
print $3
}
$ awk -v filter_name='test' -f tst.awk file
sip contains "Hello, world\x22"
The above assumes your escape sequences are always \x followed by exactly 2 hex digits. It isolates every \xHH string in the input, replaces \ with 0 in that string so that strtonum() can then convert the string to a number, then uses %c in the printf formatting string to convert that number to a character.
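If GNU awk is not available, the same idea can be sketched with POSIX awk features only. This is an illustration under the same assumption (escapes are always \x followed by exactly 2 hex digits); decode_hex_escapes and hex2dec are hypothetical helper names, not part of any library:

```shell
# Hypothetical helper: decode every \xHH escape on stdin, POSIX awk only.
decode_hex_escapes() {
    awk '
    # Convert a hex string such as "5c" to its numeric value.
    function hex2dec(h,   i, n) {
        h = tolower(h)
        for (i = 1; i <= length(h); i++)
            n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
        return n
    }
    {
        out = ""
        # Consume the line left to right so decoded text is never rescanned.
        while (match($0, /\\x[0-9a-fA-F][0-9a-fA-F]/)) {
            out = out substr($0, 1, RSTART - 1) sprintf("%c", hex2dec(substr($0, RSTART + 2, 2)))
            $0 = substr($0, RSTART + RLENGTH)
        }
        print out $0
    }'
}

printf '%s\n' 'sip contains \x22Hello, world\x5cx22\x22' | decode_hex_escapes
```

Because the already-decoded text is moved into out and never matched again, the \x5c that decodes to a backslash does not combine with the following x22 into a new escape.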
Note that GNU awk has a debugger (see https://www.gnu.org/software/gawk/manual/gawk.html#Debugger), so if you're ever unsure what any part of a program does you can run it in the debugger (-D) and trace it. In the following I set a breakpoint at line 1 of the script (b 1), start running (r), and then step (s) through the script, printing the value of $3 (p $3) at each line so I can see how it changes after the gsub():
$ awk -D -v filter_name='test' -f tst.awk file
gawk> b 1
Breakpoint 1 set at file `tst.awk', line 1
gawk> r
Starting program:
Stopping in BEGIN ...
Breakpoint 1, main() at `tst.awk':1
1 BEGIN { FPAT="([^,]*)|(\"[^\"]*\")"; OFS="," }
gawk> p $3
$3 = uninitialized field
gawk> s
Stopping in Rule ...
2 $2 == "\"" filter_name "\"" {
gawk> p $3
$3 = "\"sip contains \\x22Hello, world\\x5cx22\\x22\""
gawk> s
3 gsub(/^"|"$/,"",$3)
gawk> p $3
$3 = "\"sip contains \\x22Hello, world\\x5cx22\\x22\""
gawk> s
4 while ( match($3,/(\\x[0-9a-fA-F]{2})(.*)/,a) ) {
gawk> p $3
$3 = "sip contains \\x22Hello, world\\x5cx22\\x22"

read file and extract variables based on what is in the line

I have a file that looks like this:
$ cat file_test
garbage text A=one B=two C=three D=four
garbage text A= B=six D=seven
garbage text A=eight E=nine D=ten B=eleven
I want to go through each line and extract specific "variables" to use in the loop. And if a line doesn't have a variable then set it to an empty string.
So, for the above example, let's say I want to extract the variables A, B, and C; then for each line, the loop would have this:
garbage text A=one B=two C=three D=four
A = "one"
B = "two"
C = "three"
garbage text A= B=six D=seven
A = ""
B = "six"
C = ""
garbage text A=eight E=nine D=ten B=eleven
A = "eight"
B = "eleven"
C = ""
My original plan was to use sed but that won't work since the order of the "variables" is not consistent (the last line for example) and a "variable" may be missing (the second line for example).
My next thought is to go through line by line, then split the line into fields using awk and set variables based on each field but I have no clue where or how to start.
I'm open to other ideas or better suggestions.
The right answer depends on what you're going to do with the variables.
Assuming you need them as shell variables, here is a different approach:
$ while IFS= read -r line;
do A=""; B=""; C="";
source <(echo "$line" | grep -oP "(A|B|C)=\w*" );
echo "A=$A B=$B C=$C";
done < file
A=one B=two C=three
A= B=six C=
A=eight B=eleven C=
The trick is using source on the variable assignments that grep extracts from each line. Since assigned values carry over from one iteration to the next, you need to reset them before each new line.
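If you would rather not source text taken from the input, a rough alternative is to parse each word with a case statement in plain POSIX shell. This is only a sketch: extract_abc is a made-up name, and it assumes the values contain no whitespace, as in the sample file:

```shell
# Hypothetical helper: read lines on stdin and report A, B and C for each one.
# The function body runs in a subshell so that set -f (no globbing) stays local.
extract_abc() (
    set -f
    while IFS= read -r line; do
        A= B= C=                      # reset so values do not carry over
        for word in $line; do         # unquoted on purpose: split on whitespace
            case $word in
                A=*) A=${word#A=} ;;
                B=*) B=${word#B=} ;;
                C=*) C=${word#C=} ;;
            esac
        done
        printf 'A=%s B=%s C=%s\n' "$A" "$B" "$C"
    done
)

printf '%s\n' 'garbage text A=one B=two C=three D=four' \
              'garbage text A= B=six D=seven' \
              'garbage text A=eight E=nine D=ten B=eleven' | extract_abc
```

Nothing from the input is ever executed here, which is the main difference from the source-based loop.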
If Perl is an option, please try:
perl -ne 'undef %a; while (/([\w]+)=([\w]*)/g) {$a{$1}=$2;}
for ("A", "B", "C") {print "$_=\"$a{$_}\"\n";}' file_test
Output:
A="one"
B="two"
C="three"
A=""
B="six"
C=""
A="eight"
B="eleven"
C=""
It parses each line for assignments containing =, stores each key-value pair in the associative array %a, and finally prints the values for A, B and C.
I'm partial to the awk solution, e.g.
$ awk '{for (i = 1; i <= NF; i++) if ($i ~ /^[A-Za-z_][^=]*[=]/) print $i}' file
A=one
B=two
C=three
D=four
A=
B=six
D=seven
A=eight
E=nine
D=ten
B=eleven
Explanation
for (i = 1; i <= NF; i++) loops over each whitespace-separated field;
if ($i ~ /^[A-Za-z_][^=]*[=]/) checks whether the field begins with a character in [A-Za-z_], followed by any number of non-'=' characters and then an '='; if so
print $i prints the field.
In my first 3 solutions I am assuming that you need shell variables holding the values of A, B, and C rather than simply printing them. If that is the case, the following may help.
1st solution: this assumes that the variables A, B, and C always appear in the same field positions.
while read first second third fourth fifth sixth
do
echo $third,$fourth,$fifth ##Printing values here.
a_var=${third#*=}
b_var=${fourth#*=}
c_var=${fifth#*=}
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
It simply prints the variables' values for each line; since you have not said what you are going to do with them, adapt this to your use case.
2nd solution: this assumes the variables appear in the same order, but it also checks that A is in the 3rd field, B in the 4th, and C in the 5th, and sets each value accordingly.
while read first second third fourth fifth sixth
do
echo $third,$fourth,$fifth ##Printing values here.
a_var=$(echo "$third" | awk '$0 ~ /^A/{sub(/.*=/,"");print}')
b_var=$(echo "$fourth" | awk '$0 ~ /^B/{sub(/.*=/,"");print}')
c_var=$(echo "$fifth" | awk '$0 ~ /^C/{sub(/.*=/,"");print}')
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
3rd solution: this looks like the best fit for your requirement, though I am not sure how efficient it is (I am still looking into alternatives). This code does not depend on the order of A, B, or C in the line: it matches each one wherever it appears, assigns its value if found, and otherwise leaves the variable empty.
while read line
do
a_var=$(echo "$line" | awk 'match($0,/A=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
b_var=$(echo "$line" | awk 'match($0,/B=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
c_var=$(echo "$line" | awk 'match($0,/C=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
Output will be as follows.
Using new values of variables here....
NEW A=one
NEW B=two
NEW C=three
Using new values of variables here....
NEW A=
NEW B=six
NEW C=
Using new values of variables here....
NEW A=eight
NEW B=eleven
NEW C=
EDIT1: In case you simply want to print the values of A, B, and C, then try the following.
awk '{
for(i=1;i<=NF;i++){
if($i ~ /^[ABC]=/){
# store each value under its own name so a missing variable cannot shift the others
key=substr($i,1,1)
sub(/^[^=]*=/,"",$i)
a[key]=$i
}
}
print "A=" a["A"] ORS "B=" a["B"] ORS "C=" a["C"]
delete a
}' Input_file
Another Perl
perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() '
with the input file
$ perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() ' file_test
A = one
B = two
C = three
A =
B = six
C =
A = eight
B = eleven
C =
$
A generic, self-documented awk solution.
This assumes the separator between a variable's name and its value is =, and that = appears neither in the text before the variables nor in a variable's content.
awk 'BEGIN {
# load the list of variables, in the order to print them
VarSize = split( "A B C", aIdx )
# build a pattern that matches any of the wanted variable assignments
for ( Idx = 1; Idx <= VarSize; Idx++ ) VarEntry = ( VarEntry ? ( VarEntry "|^" ) : "^" ) aIdx[Idx] "="
}
{
# reset variable values
split( "", aVar )
# for each part of the line
for ( Fld=1; Fld<=NF; Fld++ ) {
# if the part is a variable assignment
if( $Fld ~ VarEntry ) {
# separate variable name and content into an array
split( $Fld, aTemp, /=/ )
# store the content under the corresponding variable name
aVar[aTemp[1]] = aTemp[2]
}
}
# print every variable (empty or not) for this line, in list order
# (a "for ( Idx in aIdx )" loop would visit the names in an unspecified order)
for ( Idx = 1; Idx <= VarSize; Idx++ ) printf( "%s = \042%s\042\n", aIdx[Idx], aVar[aIdx[Idx]] )
}
' YourFile
It's unclear whether you're trying to set awk variables or shell variables, but here's how to populate an associative awk array and then use that to populate an associative shell array:
$ cat tst.awk
BEGIN {
numKeys = split("A B C",keys)
}
{
delete f
for (i=1; i<=NF; i++) {
if ( split($i,t,/=/) == 2 ) {
f[t[1]] = t[2]
}
}
for (keyNr=1; keyNr<=numKeys; keyNr++) {
key = keys[keyNr]
printf "[%s]=\"%s\"%s", key, f[key], (keyNr<numKeys ? OFS : ORS)
}
}
$ awk -f tst.awk file
[A]="one" [B]="two" [C]="three"
[A]="" [B]="six" [C]=""
[A]="eight" [B]="eleven" [C]=""
$ while IFS= read -r out; do declare -A arr="( $out )"; declare -p arr; done < <(awk -f tst.awk file)
declare -A arr=([A]="one" [B]="two" [C]="three" )
declare -A arr=([A]="" [B]="six" [C]="" )
declare -A arr=([A]="eight" [B]="eleven" [C]="" )
$ echo "${arr["A"]}"
eight

Variable operator for numerical comparison in awk

Can awk use variable operators for numerical comparison? The following code works with a hard coded operator, but not with a variable operator:
awk -v o="$operator" -v c="$comparison" '$1 o c'
No, that cannot work. Awk's -v option defines actual Awk variables, and not token-level macro substitutions.
It doesn't work for the same reason that this doesn't work:
awk 'BEGIN { o = "+"; print 2 o 2 }' # hoping for 2 + 2
Awk is different from the POSIX shell and similar languages; it doesn't evaluate variables by means of textual substitution.
Since you're calling Awk from a shell command line, you can use the shell's substitution to generate the Awk syntax, thereby obtaining that effect:
awk -v c="$comparison" "\$1 $operator c"
We now need a backslash on the $1 because we switched to double quotes, inside of which $1 is now recognized by the shell itself.
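Since the operator text becomes part of the awk program itself, it may be worth checking it against a fixed list before interpolating it. A minimal sketch, where compare_filter is a hypothetical wrapper name and the accepted operators are just the six comparison operators:

```shell
# Hypothetical wrapper: print the input lines whose first field satisfies "$1 OP VALUE".
compare_filter() {
    op=$1
    rhs=$2
    case $op in
        '=='|'!='|'<'|'>'|'<='|'>=') ;;   # only known comparison operators get through
        *) echo "unsupported operator: $op" >&2; return 2 ;;
    esac
    # The shell expands $op into the program text; c stays an ordinary awk variable.
    awk -v c="$rhs" "\$1 $op c"
}

printf '1\n5\n7\n' | compare_filter '>' 3
```

With the input above this prints 5 and 7. Anything that is not one of the listed operators is rejected before awk ever sees it.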
An alternative to the approach proposed by Kaz is to define your own mapping function, which takes the two values as arguments along with the corresponding operator string o:
awk -v o="$operator" -v c="$comparison" '
function operator(arg1, arg2, op) {
if (op == "==") return arg1 == arg2
if (op == "!=") return arg1 != arg2
if (op == "<") return arg1 < arg2
if (op == ">") return arg1 > arg2
if (op == "<=") return arg1 <= arg2
if (op == ">=") return arg1 >= arg2
}
{ print operator($1,c,o) }'
This way you can also define your own operators.
No, but you have a couple of options, the simplest being to let the shell expand one of the variables to become part of the awk script before awk runs on it:
$ operator='>'; comparison='3'
$ echo 5 | awk -v c="$comparison" '$1 '"$operator"' c'
5
Otherwise you can write your own eval-style function, e.g.:
$ cat tst.awk
cmp($1,o,c)
function cmp(x,y,z, cmd,line,ret) {
cmd = "awk \047BEGIN{print (" x " " y " " z ")}\047"
if ( (cmd | getline line) > 0 ) {
ret = line
}
close(cmd)
return ret
}
$ echo 5 | awk -v c="$comparison" -v o="$operator" -f tst.awk
5
See https://stackoverflow.com/a/54161251/1745001. The latter would work even if your awk program was saved in a file while the former would not. If you want to mix a library of functions with command line scripts then here's one way with GNU awk for -i:
$ cat tst.awk
function cmp(x,y,z, cmd,line,ret) {
cmd = "awk \047BEGIN{print (" x " " y " " z ")}\047"
if ( (cmd | getline line) > 0 ) {
ret = line
}
close(cmd)
return ret
}
$ awk -v c="$comparison" -v o="$operator" -i tst.awk 'cmp($1,o,c)'
5

Can someone help me getting average of a column using awk with condition on other column

awk -F, '{if ($2 == 0) awk '{ total += $3; count++ } END { print total/count }' CLN_Tapes_LON; }' /tmp/CLN_Tapes_LON
awk: {if ($2 == 0) awk {
awk: ^ syntax error
bash: count++: command not found
Just for fun, let's look at what's wrong with your original version and transform it into something that works, step by step. Here's your initial version (I'll call it version 0):
awk -F, '{if ($2 == 0) awk '{ total += $3; count++ } END { print total/count }' CLN_Tapes_LON; }' /tmp/CLN_Tapes_LON
The -F, sets the field separator to be the comma character, but your later comment seems to indicate that the columns (fields) are separated by spaces. So let's get rid of it; whitespace-separation is what awk expects by default. Version 1:
awk '{if ($2 == 0) awk '{ total += $3; count++ } END { print total/count }' CLN_Tapes_LON; }' /tmp/CLN_Tapes_LON
You seem to be attempting to nest a call to awk inside your awk program? There's almost never any call for that, and this wouldn't be the way to do it anyway. Let's also get rid of the mismatched quotes while we're at it: note in passing that you cannot nest single quotes inside another pair of single quotes that way: you'd have to escape them somehow. But there's no need for them at all here. Version 2:
awk '{if ($2 == 0) { total += $3; count++ } END { print total/count } }' /tmp/CLN_Tapes_LON
This is close but not quite right: the END block is only executed when all lines of input are finished processing: it doesn't make sense to have it inside an if. So let's move it outside the braces. I'm also going to tighten up some whitespace. Version 3:
awk '{if ($2==0) {total+=$3; count++}} END{print total/count}' /tmp/CLN_Tapes_LON
Version 3 actually works, and you could stop here. But awk has a handy way of running a block of code only against lines that match a condition: 'condition {code}'. So yours can be written more simply as:
awk '$2==0 {total+=$3; count++} END{print total/count}' /tmp/CLN_Tapes_LON
... which, of course, is pretty much exactly what John1024 suggested.
$ awk '$2 == 0 { total += $3; count++;} END { print total/count; }' CLN_Tapes_LON
3
This assumes that your input file looks like:
$ cat CLN_Tapes_LON
CLH040 0 3
CLH041 0 3
CLH042 0 3
CLH043 0 3
CLH010 1 0
CLH011 1 0
CLH012 1 0
CLH013 1 0
CLH130 1 40
CLH131 1 40
CLH132 1 40
CLH133 1 40
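One thing to watch: if no line has the wanted value in column 2, count stays unset and total/count is a division by zero. A small sketch that guards against this, where avg_col3_where_col2_is is a made-up function name and the "no matching rows" message is just a placeholder:

```shell
# Hypothetical helper: average column 3 over the lines whose column 2 equals $1.
avg_col3_where_col2_is() {
    awk -v want="$1" '
        $2 == want { total += $3; count++ }
        END {
            if (count) print total / count      # at least one row matched
            else       print "no matching rows" # avoid dividing by zero
        }'
}

printf 'CLH040 0 3\nCLH041 0 3\nCLH010 1 0\nCLH130 1 40\n' | avg_col3_where_col2_is 0
```

For this four-line sample the call prints 3; asking for a value that never occurs in column 2 prints the placeholder message instead of failing.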
Thought I'd try to do this without awk. Awk is clearly the better choice, but it's still a one-liner.
bc<<<"($(grep ' 0 ' file|tee >(wc -l>i)|cut -d\ -f3|tr '\n' '+')0)/"$(<i)
3
It extracts lines with 0 in the second column with grep. This is passed to tee for wc -l to count the lines and to cut to extract the third column. tr replaces the new lines with "+" which is put over the number of lines (i.e., "12 / 4"). This is then passed to bc.

Why does awk "not in" array work just like awk "in" array?

Here's an awk script that attempts to compute the set difference of two files based on their first column:
BEGIN{
OFS=FS="\t"
file = ARGV[1]
while (getline < file)
Contained[$1] = $1
delete ARGV[1]
}
$1 not in Contained{
print $0
}
Here is TestFileA:
cat
dog
frog
Here is TestFileB:
ee
cat
dog
frog
However, when I run the following command:
gawk -f Diff.awk TestFileA TestFileB
I get the output just as if the script had contained "in":
cat
dog
frog
While I am uncertain about whether "not in" is correct syntax for my intent, I'm very curious about why it behaves exactly the same way as when I wrote "in".
I cannot find any doc about element not in array.
Try !(element in array).
My guess: awk sees not as an uninitialized variable, so not evaluates to an empty string and is simply concatenated onto $1:
$1 not == $1 "" == $1
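A quick way to see this, using a made-up array named seen and two sample lines. The first rule uses "not in" as in the question, the second uses the negated form:

```shell
printf 'cat\nee\n' | awk '
    BEGIN { seen["cat"] }                        # referencing the element creates it
    $1 not in seen { print "in:", $1 }           # parsed as ($1 not) in seen, i.e. $1 in seen
    !($1 in seen)  { print "not in:", $1 }       # the real negation
'
```

The first rule fires for cat (which is in the array) and the second for ee, so the output is "in: cat" followed by "not in: ee".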
I figured this one out. The ( x in array ) returns a value, so to do "not in array", you have to do this:
if ( x in array == 0 )
print "x is not in the array"
or in your example:
($1 in Contained == 0){
print $0
}
In my solution for this problem I use the following if-else statement:
if($1 in contained);else{print "Here goes your code for \"not in\""}
Not sure if this is anything like you were trying to do.
#!/bin/awk -f
# Reads the file named by the first argument (ARGV[1]) and builds a hash of the
# tokens found in column one. Then prints column one of every input line whose
# token is not already in the hash.
BEGIN{
OFS=FS="\t"
file = ARGV[1]
while ((getline < file) > 0)
Contained[$1] = $1
# delete ARGV[1] # I don't know what you were thinking here
# for(i in Contained) {print Contained[i]} # debugging, not just for sadists
close (ARGV[1])
}
}
{
if ($1 in Contained){} else { print $1 }
}
On the awk command line I use:
! ($1 in a)
where $1 is the value being tested and a is the array.
Example:
awk 'NR==FNR{a[$1];next}! ($1 in a) {print $1}' file1 file2