Reading Trace file from NS2 using gawk - awk

I am a new user of gawk. I am trying to read a trace file by putting a small script in a file and making that file executable. This is what I am trying to do:
#!/bin/sh
set i = 0
while ($i < 5)
awk 'int($2)=='$i' && $1=="r" && $4==0 {pkt += $6} END {print '$i'":"pkt}' out.tr
set i = `expr $i + 1`
end
After this I run the following command:
sh ./test.sh
and it says:
syntax error: word unexpected (expecting do)
Any help?

Assuming you are using bash (the (( )) construct below is not guaranteed to work in a plain sh, so run the script with bash test.sh):
The syntax of a while loop is:
while test-commands; do consequent-commands; done
To compare with the < operator you need double parentheses; see "Shell Arithmetic" and "Conditional Constructs" in the bash manual.
To assign a value to the variable used in your code, just write i=0.
To pass a shell variable to awk, use awk's -v option.
Thus your script might become something like this:
i=0
while ((i < 5))
do
awk -v k=$i 'int($2)==k && $1=="r" && $4==0 {pkt += $6} END {print k":"pkt}' out.tr
i=`expr $i + 1`
done
Here the awk variable k holds the value of the shell variable i.
Instead of i=`expr $i + 1` you can write i=$((i + 1)) or, shorter, ((i++)).
You can also use a for loop, which makes the code much cleaner:
for (( i=0; i < 5; i++ ))
do
awk -v k=$i 'int($2)==k && $1=="r" && $4==0 {pkt += $6} END {print k":"pkt}' out.tr
done
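As a rough sketch (not the answerer's code), the five awk invocations can also be collapsed into a single pass that buckets the byte counts by whole second. The sample trace lines below are made up for illustration; only the field positions ($1, $2, $4, $6) come from the question. Unlike the loop above, this prints 0 rather than an empty value for seconds with no matching packets.

```shell
#!/bin/sh
# Hypothetical sample trace (assumed columns: event, time, from, to, type, size).
trace=$(mktemp)
cat > "$trace" <<'EOF'
r 0.5 1 0 tcp 100
r 1.2 1 0 tcp 200
r 1.7 1 0 tcp 300
d 1.9 1 0 tcp 999
r 3.1 1 0 tcp 50
EOF

# One pass over the file: sum field 6 per whole second for received ("r")
# packets whose field 4 is 0, then print seconds 0..4 in the END block.
result=$(awk '
    $1 == "r" && $4 == 0 { pkt[int($2)] += $6 }
    END { for (i = 0; i < 5; i++) print i ":" (pkt[i] + 0) }
' "$trace")

echo "$result"
rm -f "$trace"
```

Reading the trace once instead of five times matters mostly for large trace files; for small ones the loop version is just as good.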

Related

How to evaluate or process if statements in data?

Background
I wrote a bash script that pulls simple user functions from a PostgreSQL database, uses awk to convert PL/pgSQL commands to SQL (such as PERFORM function() to SELECT function(), removing comments --.*, etc.), stores the SQL commands in a file (file.sql), and then reads and executes them in the database:
$ psql ... -f file.sql db
The functions are simple, mostly just calling other user defined functions. But how to "evaluate" or process an IF statement?:
IF $1 = 'customer1' THEN -- THESE $1 MEANS ARGUMENT TO PGPL/SQL FUNCTION
PERFORM subfunction1($1); -- THAT THIS IF STATEMENT IS IN:
ELSE -- SELECT function('customer1');
PERFORM subfunction2($1); -- $1 = 'customer1'
END IF;
Tl;dr:
IFs and such are not SQL so they should be pre-evaluated using awk. It's safe to assume that above is already processed into one record with comments removed:
IF $1 = 'customer1' THEN PERFORM subfunction1($1); ELSE PERFORM subfunction2($1); END IF;
After "evaluating", the above should be replaced with:
SELECT subfunction1('customer1');
if the awk to evaluate it was called:
$ awk -v arg1='customer1' -f program.awk file.sql
or if arg1 is anything else, for example for customer2:
SELECT subfunction2('customer2');
Edit
expr popped into my mind first thing when I woke up:
$ awk -v arg="'customer1'" '
{
gsub(/\$1/,arg) # replace func arg with string
n=split($0,a,"(IF|THEN|ELSE|ELSE?IF|END IF;)",seps) # seps to get ready for SQL CASE
if(seps[1]=="IF") {
# here should be while for ELSEIF
c="expr " a[2]; c|getline r; close(c) # use expr to solve
switch (r) { # expr has 4 return values
case "1": # match
print a[3]
break
case "0": # no match
print a[4]
break
default: # (*) see below
print r
exit # TODO
} } }' file.sql
(*) expr has four possible exit statuses (0, 1, 2 or 3); for a comparison it prints 1 on a match and 0 otherwise:
$ expr 1 = 1
1
$ expr 1 = 2
0
However, if you omit spaces:
$ expr 1=1
1=1
Without writing a full language parser, if you're looking for something cheap and cheerful then this might be a decent starting point:
$ cat tst.awk
{ gsub(/\$1/,"\047"arg1"\047") }
match($0,/^IF\s+(\S+)\s+(\S+)\s+(\S+)\s+THEN\s+(\S+)\s+(\S+)\s+ELSE\s+(\S+)\s+(\S+)\s+END\s+IF/,a) {
lhs = a[1]
op = a[2]
rhs = a[3]
trueAct = (a[4] == "PERFORM" ? "SELECT" : a[4]) FS a[5]
falseAct = (a[6] == "PERFORM" ? "SELECT" : a[6]) FS a[7]
if (op == "=") {
print (lhs == rhs ? trueAct : falseAct)
}
}
$ awk -v arg1='customer1' -f tst.awk file
SELECT subfunction1('customer1');
$ awk -v arg1='bob' -f tst.awk file
SELECT subfunction2('bob');
The above uses GNU awk for the 3rd arg to match(). Hopefully it's easy enough to understand that you can massage as needed to handle other constructs or other variations of this construct.
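For awks without the three-argument match(), roughly the same idea can be sketched with plain field positions. This assumes the record has exactly the shape shown in the question (whitespace-separated tokens, a single "=" comparison, no spaces inside the arguments); eval_if is a hypothetical helper name used only for this illustration, and the record is hard-coded rather than read from file.sql.

```shell
#!/bin/sh
# eval_if VALUE: substitute VALUE for the PL/pgSQL argument $1 in the sample
# record and print the branch that an "=" comparison selects.
eval_if() {
    printf '%s\n' "IF \$1 = 'customer1' THEN PERFORM subfunction1(\$1); ELSE PERFORM subfunction2(\$1); END IF;" |
    awk -v arg1="$1" '
        # Replace every $1 with the quoted argument; fields are re-split.
        { gsub(/\$1/, "\047" arg1 "\047") }
        # Expected layout: IF lhs = rhs THEN act fn ELSE act fn END IF;
        $1 == "IF" && $3 == "=" && $5 == "THEN" && $8 == "ELSE" {
            t = ($6 == "PERFORM" ? "SELECT" : $6) " " $7
            f = ($9 == "PERFORM" ? "SELECT" : $9) " " $10
            print ($2 == $4 ? t : f)
        }'
}

eval_if customer1
eval_if bob
```

Relying on field positions is more fragile than the regex version: any extra token in the condition shifts every field, so treat this only as a starting point.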

tcsh error: while loop

This is a basic program but since I'm a newbie, I'm not able to figure out the solution.
I have a file named rama.xvg in the following format:
-75.635 105.879 ASN-2
-153.704 64.7089 ARG-3
-148.238 -47.6076 GLN-4
-63.2568 -8.05441 LEU-5
-97.8149 -7.34302 GLU-6
-119.276 8.99017 ARG-7
-144.198 -103.917 SER-8
-65.4354 -10.3962 GLY-9
-60.6926 12.424 ARG-10
-159.797 -0.551989 PHE-11
65.9924 -48.8993 GLY-12
179.677 -7.93138 GLY-13
..........
...........
-70.5046 38.0408 GLY-146
-155.876 153.746 TRP-147
-132.355 151.023 GLY-148
-66.2679 167.798 ASN-2
-151.342 -33.0647 ARG-3
-146.483 41.3483 GLN-4
..........
..........
-108.566 0.0212432 SER-139
47.6854 33.6991 MET-140
47.9466 40.1073 ASP-141
46.4783 48.5301 SER-142
-139.17 172.486 LYS-143
58.9514 32.0602 SER-144
60.744 18.3059 SER-145
-94.0533 165.745 GLY-146
-161.809 177.435 TRP-147
129.172 -101.736 GLY-148
I need to extract all the lines containing "ASN-2" in one file all_1.dat and so on for all the 147 residues.
If I run the following command in the terminal, it gives the desired output for ASN-2:
awk '{if( NR%147 == 1 ) printf $0 "\n"}' rama.xvg > all_1.dat
To avoid doing it repeatedly for all the residues, I have written the following code.
#!/bin/tcsh
set i = 1
while ( $i < 148)
echo $i
awk '{if( NR%147 == i ) printf $0 "\n"}' rama.xvg > all_"$i".dat
@ i++
end
But this code prints the lines containing GLY-148 in all the output files.
Please let me know what is the error in this code. I think it is related to nesting.
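In your awk line, i is an awk variable, not the shell variable. An unset awk variable evaluates to 0, so NR%147 == i only matched lines whose number is a multiple of 147, which are the GLY-148 lines. To use the shell variable $i, pass it in with -v. Also note that NR%147 yields 0, never 147, for the last residue, so map the line number to 1..147 first:
awk -v i="$i" '{if( (NR-1)%147 + 1 == i ) print}' rama.xvg > all_"$i".dat
But I think it would be better to move the loop into awk and write each line straight to its file:
awk '{ i = (NR-1)%147 + 1; print > ("all_" i ".dat") }' rama.xvg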
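One detail worth checking on a small case: NR % 147 is 0, not 147, on the last line of each block, so an index that should run from 1 to 147 needs (NR - 1) % 147 + 1. The sketch below uses a made-up miniature input with 3 residues per frame instead of 147; the file names and the n and prefix variables are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical miniature input: two frames of 3 residues each.
dir=$(mktemp -d)
printf '%s\n' '1 1 ASN-2' '2 2 ARG-3' '3 3 GLN-4' \
              '4 4 ASN-2' '5 5 ARG-3' '6 6 GLN-4' > "$dir/rama.xvg"

# (NR - 1) % n + 1 maps line numbers onto 1..n, so the last residue of each
# frame goes to file n instead of being lost to a remainder of 0.
awk -v n=3 -v prefix="$dir/all_" '
    { print > (prefix ((NR - 1) % n + 1) ".dat") }
' "$dir/rama.xvg"

first=$(cat "$dir/all_1.dat")
last=$(cat "$dir/all_3.dat")
echo "$first"
echo "$last"
rm -rf "$dir"
```

With 147 output files open at once, some awk implementations may run into a limit on open files; GNU awk handles this internally, so this is likely only a concern with other awks.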

Delete a variable in awk

I wonder if it is possible to delete a variable in awk. For an array, you can say delete a[2] and the index 2 of the array a[] will be deleted. However, for a variable I cannot find a way.
The closest I get is to say var="" or var=0.
But then, it seems that the default value of a non-existing variable is 0 or False:
$ awk 'BEGIN {if (b==0) print 5}'
5
$ awk 'BEGIN {if (!b) print 5}'
5
So I also wonder if it is possible to distinguish between a variable that is set to 0 and a variable that has not been set, because it seems not to:
$ awk 'BEGIN {a=0; if (a==b) print 5}'
5
There is no operation to unset/delete a variable. The only time a variable becomes unset again is at the end of a function call when it's an unused function argument being used as a local variable:
$ cat tst.awk
function foo( arg ) {
if ( (arg=="") && (arg==0) ) {
print "arg is not set"
}
else {
printf "before assignment: arg=<%s>\n",arg
}
arg = rand()
printf "after assignment: arg=<%s>\n",arg
print "----"
}
BEGIN {
foo()
foo()
}
$ awk -f tst.awk file
arg is not set
after assignment: arg=<0.237788>
----
arg is not set
after assignment: arg=<0.291066>
----
so if you want to perform some actions A then unset the variable X and then perform actions B, you could encapsulate A and/or B in functions using X as a local var.
Note though that the default value is zero or null, not zero or false, since its type is "numeric string".
You test for an unset variable by comparing it to both null and zero:
$ awk 'BEGIN{ if ((x=="") && (x==0)) print "y" }'
y
$ awk 'BEGIN{ x=0; if ((x=="") && (x==0)) print "y" }'
$ awk 'BEGIN{ x=""; if ((x=="") && (x==0)) print "y" }'
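If that double comparison is needed in several places, it could be wrapped in a small function. This is only a sketch: isunset is a hypothetical name, not a built-in, and it relies on the same "equal to both null and zero" behaviour shown in the one-liners above.

```shell
#!/bin/sh
out=$(awk '
# True only for a value that compares equal to both "" and 0,
# which is the case for a variable that was never assigned.
function isunset(v) { return (v == "") && (v == 0) }
BEGIN {
    print isunset(x)            # x was never assigned
    y = 0;  print isunset(y)    # assigned the number 0
    z = ""; print isunset(z)    # assigned the empty string
}')
echo "$out"
```

The first call prints 1 and the other two print 0, matching the three one-liners.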
If you NEED a variable that you can delete, you can always use a single-element array:
$ awk 'BEGIN{ if ((x[1]=="") && (x[1]==0)) print "y" }'
y
$ awk 'BEGIN{ x[1]=""; if ((x[1]=="") && (x[1]==0)) print "y" }'
$ awk 'BEGIN{ x[1]=""; delete x; if ((x[1]=="") && (x[1]==0)) print "y" }'
y
but IMHO that obfuscates your code.
What would be the use case for unsetting a variable? What would you do with it that you can't do with var="" or var=0?
An unset variable expands to "" or 0, depending on the context in which it is being evaluated.
For this reason, I would say that it's a matter of preference and depends on the usage of the variable.
Given that we use a + 0 (or the slightly controversial +a) in the END block to coerce the potentially unset variable a to a numeric type, I guess you could argue that the natural "empty" value would be "".
I'm not sure that there's too much to read in to the cases that you've shown in the question, given the following:
$ awk 'BEGIN { if (!"") print 5 }'
5
("" is false, unsurprisingly)
$ awk 'BEGIN { if (b == "") print 5 }'
5
(unset variable evaluates equal to "", just the same as 0)

Repeat printf arguments

I've found some related posts, but nothing seems to work.
I want to repeat the same argument $i for the instances 03-12. I'm really trying to use some nco operators - but the printf statement is hanging me up.
#!/bin/csh
set i = 1
while ($i < 2)
`printf O3_BDBP_1979ghg.cam.h0.00{03,04,05,06,07,08,09,10,11,12}-%02d.nc $i`
@ i = $i + 1
end
The output is as follows, so it works for 03 but not for the rest:
printf: O3_BDBP_1979ghg.cam.h0.0004-%02d.nc: expected a numeric value
I've also tried this statement (per other posts)
`printf O3_BDBP_1979ghg.cam.h0.00{03,04,05,06,07,08,09,10,11,12}-%1$02d.nc $i`
Any suggestions would be greatly appreciated!
The braces produce multiple arguments for the printf command; only the first is treated as the format string, while the rest are treated as arguments for the %02d in that first string. In other words, you're getting
printf O3_BDBP_1979ghg.cam.h0.0003-%02d.nc O3_BDBP_1979ghg.cam.h0.0004-%02d.nc ... O3_BDBP_1979ghg.cam.h0.0012-%02d.nc $i
as the effective command line. Try a nested loop instead:
#!/bin/csh
set i = 1
while ($i < 2)
foreach j ( {03,04,05,06,07,08,09,10,11,12} )
printf "O3_BDBP_1979ghg.cam.h0.00%s-%02d.nc\n" $j $i
end
@ i = $i + 1
end
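For comparison, here is the same nested-loop idea as a minimal sketch in POSIX sh rather than csh (an assumption, since the question uses csh). The month is passed as a string with %s because values such as 08 and 09 are not valid octal numbers for %d.

```shell
#!/bin/sh
i=1
# Build one file name per month; j is already zero-padded, so print it as text.
names=$(for j in 03 04 05 06 07 08 09 10 11 12; do
    printf 'O3_BDBP_1979ghg.cam.h0.00%s-%02d.nc\n' "$j" "$i"
done)
echo "$names"
```

Each generated name is printed on its own line, which makes the list easy to feed to another command with xargs or a while read loop.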

awk output format for average

I am computing the average of many values and printing it with awk, using the following script.
for j in `ls *.txt`; do
for i in emptyloop dd cp sleep10 gpid forkbomb gzip bzip2; do
echo -n $j $i" "; cat $j | grep $i | awk '{ sum+=$2} END {print sum/NR}'
done;
echo ""
done
The problem is that it prints the value in a form like 1.2345e+05, which I do not want; I want it to print rounded values, but I cannot find where to pass the output format.
EDIT: using {print "average,%3d = ",sum/NR}' inplace of {print sum/NR}' is not helping, because it is printing "average,%3d 1.2345e+05".
You need printf instead of simply print. Print is a much simpler routine than printf is.
for j in *.txt; do
for i in emptyloop dd cp sleep10 gpid forkbomb gzip bzip2; do
awk -v "i=$i" -v "j=$j" '$0 ~ i {sum += $2; n++} END {if (n) printf "%s %s average %6d\n", j, i, sum/n}' "$j"
done
echo
done
You don't need ls - a glob will do.
Useless use of cat.
Quote all variables when they are expanded.
It's not necessary to use echo - AWK can do the job.
It's not necessary to use grep - AWK can do the job.
If you're getting numbers like 1.2345e+05 then %6d might be a better format string than %3d. Use printf in order to use format strings - print doesn't support them.
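To make the print/printf difference concrete, here is a small sketch with a made-up value. print formats non-integer numbers with OFMT, which defaults to "%.6g", so large values come out in exponent notation, while printf uses whatever format it is given.

```shell
#!/bin/sh
out=$(awk 'BEGIN {
    avg = 12345678.9          # hypothetical average, not from the question
    print avg                 # formatted with OFMT, "%.6g" by default
    printf "%d\n", avg        # integer part only
    printf "%.2f\n", avg      # fixed notation with two decimals
}')
echo "$out"
```

Setting OFMT (for example OFMT = "%.2f" in a BEGIN block) is another way to change what print produces, but an explicit printf keeps the format next to the value it applies to.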
The following all-AWK script might do what you're looking for and be quite a bit faster. Without seeing your input data I've made a few assumptions, primarily that the command name being matched is in column 1.
awk '
BEGIN {
cmdstring = "emptyloop dd cp sleep10 gpid forkbomb gzip bzip2";
n = split(cmdstring, cmdarray);
for (i = 1; i <= n; i++) {
cmds[cmdarray[i]]
}
}
$1 in cmds {
sums[$1, FILENAME] += $2;
counts[$1, FILENAME]++
files[FILENAME]
}
END {
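for (file in files) {
for (cmd in cmds) {
# skip commands that never appeared in this file (avoids dividing by zero)
if ((cmd, file) in counts)
printf "%s %s %6d\n", file, cmd, sums[cmd, file]/counts[cmd, file]
}
}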
}' *.txt