Repeat printf arguments - printf

I've found some related posts, but nothing seems to work.
I want to repeat the same argument $i for the instances 03-12. I'm really trying to use some nco operators - but the printf statement is hanging me up.
#!/bin/csh
set i = 1
while ($i < 2)
`printf O3_BDBP_1979ghg.cam.h0.00{03,04,05,06,07,08,09,10,11,12}-%02d.nc $i`
# i = $i + 1
end
The output is - so it gets it for 03 but not the rest.
printf: O3_BDBP_1979ghg.cam.h0.0004-%02d.nc: expected a numeric value
I've also tried this statement (per other posts)
`printf O3_BDBP_1979ghg.cam.h0.00{03,04,05,06,07,08,09,10,11,12}-%1$02d.nc $i`
Any suggestions would be greatly appreciated!

The braces produce multiple arguments for the printf command; only the first is treated as a format string, while the rest are treated as arguments for %1 in the first. In other words, you're getting
printf O3_BDBP_1979ghg.cam.h0.0003-%02d.nc O3_BDBP_1979ghg.cam.h0.0004-%02d.nc ... O3_BDBP_1979ghg.cam.h0.0012-%02d.nc $i
as the effective command line. Try a nested loop instead:
#!/bin/csh
set i = 1
while ($i < 2)
foreach j ( {03,04,05,06,07,08,09,10,11,12} )
printf O3_BDBP_1979ghg.cam.h0.00%02-%02d.nc $j $i
end
# i = $i + 1
end

Related

read file and extract variables based on what is in the line

I have a file that looks like this:
$ cat file_test
garbage text A=one B=two C=three D=four
garbage text A= B=six D=seven
garbage text A=eight E=nine D=ten B=eleven
I want to go through each line and extract specific "variables" to use in the loop. And if a line doesn't have a variable then set it to an empty string.
So, for the above example, lets say I want to extract the variables A, B, and C, then for each line, the loop would have this:
garbage text A=one B=two C=three D=four
A = "one"
B = "two"
C = "three"
garbage text A= B=six D=seven
A = ""
B = "six"
C = ""
garbage text A=eight E=nine D=ten B=eleven
A = "eight"
B = "eleven"
C = ""
My original plan was to use sed but that won't work since the order of the "variables" is not consistent (the last line for example) and a "variable" may be missing (the second line for example).
My next thought is to go through line by line, then split the line into fields using awk and set variables based on each field but I have no clue where or how to start.
I'm open to other ideas or better suggestions.
right answer depends on what you're going to do with the variables.
assuming you need them as shell variables, here is a different approach
$ while IFS= read -r line;
do A=""; B=""; C="";
source <(echo "$line" | grep -oP "(A|B|C)=\w*" );
echo "A=$A B=$B C=$C";
done < file
A=one B=two C=three
A= B=six C=
A=eight B=eleven C=
the trick is using source for variable declarations extracted from each line with grep. Since value assignments carry over, you need to reset them before each new line.
If perl is your option, please try:
perl -ne 'undef %a; while (/([\w]+)=([\w]*)/g) {$a{$1}=$2;}
for ("A", "B", "C") {print "$_=\"$a{$_}\"\n";}' file_test
Output:
A="one"
B="two"
C="three"
A=""
B="six"
C=""
A="eight"
B="eleven"
C=""
It parses each line for assignments with =, store the key-value pair in an assoc array %a, then finally reports the values for A, B and C.
I'm partial to the awk solution, e.g.
$ awk '{for (i = 1; i <= NF; i++) if ($i ~ /^[A-Za-z_][^=]*[=]/) print $i}' file
A=one
B=two
C=three
D=four
A=
B=six
D=seven
A=eight
E=nine
D=ten
B=eleven
Explanation
for (i = 1; i <= NF; i++) loop over each space separated field;
if ($i ~ /^[A-Za-z_][^=]*[=]/) if the field begins with at least one character that is [A-Za-z_] followed by an '='; then
print $i print the field.
On my first 3 solutions, I am considering that your need to use shell variables from the values of strings A,B,C and you do not want to simply print them, if this is the case then following(s) may help you.
1st Solution: It considers that your variables A,B,C are always coming in same field number.
while read first second third fourth fifth sixth
do
echo $third,$fourth,$fifth ##Printing values here.
a_var=${third#*=}
b_var=${fourth#*=}
c_var=${fifth#*=}
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
It is simply printing the variables values in each line since you have NOT told what use you are going to do with these variables so I am simply printing them you could use them as per your use case too.
2nd solution: This considers that variables are coming in same order but it does check if A is coming on 3rd place or not, B is coming on 4th place or not etc and prints accordingly.
while read first second third fourth fifth sixth
do
echo $third,$fourth,$fifth ##Printing values here.
a_var=$(echo "$third" | awk '$0 ~ /^A/{sub(/.*=/,"");print}')
b_var=$(echo "$fourth" | awk '$0 ~ /^B/{sub(/.*=/,"");print}')
c_var=$(echo "$fifth" | awk '$0 ~ /^C/{sub(/.*=/,"");print}')
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
3rd Solution: Which looks perfect FIT for your requirement, not sure how much efficient from coding vice(I am still analyzing more if we could do something else here too). This code will NOT look for A,B, or C's order in line it will match it let them be anywhere in line, if match found it will assign value of variable OR else it will be NULL value.
while read line
do
a_var=$(echo "$line" | awk 'match($0,/A=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
b_var=$(echo "$line" | awk 'match($0,/B=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
c_var=$(echo "$line" | awk 'match($0,/C=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file
Output will be as follows.
Using new values of variables here....
NEW A=one
NEW B=two
NEW C=three
Using new values of variables here....
NEW A=
NEW B=six
NEW C=
Using new values of variables here....
NEW A=eight
NEW B=eleven
NEW C=
EDIT1: In case you simply want to print values of A,B,C then try following.
awk '{
for(i=1;i<=NF;i++){
if($i ~ /[ABCabc]=/){
sub(/.*=/,"",$i)
a[++count]=$i
}
}
print "A="a[1] ORS "B=" a[2] ORS "C="a[3];count=""
delete a
}' Input_file
Another Perl
perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() '
with the input file
$ perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() ' file_test
A = one
B = two
C = three
A =
B = six
C =
A = eight
B = eleven
C =
$
a generic variable awk seld documented.
Assuming variable separator are = and not part of text before nor variable content itself.
awk 'BEGIN {
# load the list of variable and order to print
VarSize = split( "A B C", aIdx )
# create a pattern filter for variable catch in lines
for ( Idx in aIdx ) VarEntry = ( VarEntry ? ( VarEntry "|^" ) : "^" ) aIdx[Idx] "="
}
{
# reset varaible value
split( "", aVar )
# for each part of the line
for ( Fld=1; Fld<=NF; Fld++ ) {
# if part is a varaible assignation
if( $Fld ~ VarEntry ) {
# separate variable name and content in array
split( $Fld, aTemp, /=/ )
# put variable content in corresponding varaible name container
aVar[aTemp[1]] = aTemp[2]
}
}
# print all variable content (empty or not) found on this line
for ( Idx in aIdx ) printf( "%s = \042%s\042\n", aIdx[Idx], aVar[aIdx[Idx]] )
}
' YourFile
Its unclear whether you're trying to set awk variables or shell variables but here's how to populate an associative awk array and then use that to populate an associative shell array:
$ cat tst.awk
BEGIN {
numKeys = split("A B C",keys)
}
{
delete f
for (i=1; i<=NF; i++) {
if ( split($i,t,/=/) == 2 ) {
f[t[1]] = t[2]
}
}
for (keyNr=1; keyNr<=numKeys; keyNr++) {
key = keys[keyNr]
printf "[%s]=\"%s\"%s", key, f[key], (keyNr<numKeys ? OFS : ORS)
}
}
$ awk -f tst.awk file
[A]="one" [B]="two" [C]="three"
[A]="" [B]="six" [C]=""
[A]="eight" [B]="eleven" [C]=""
$ while IFS= read -r out; do declare -A arr="( $out )"; declare -p arr; done < <(awk -f tst.awk file)
declare -A arr=([A]="one" [B]="two" [C]="three" )
declare -A arr=([A]="" [B]="six" [C]="" )
declare -A arr=([A]="eight" [B]="eleven" [C]="" )
$ echo "${arr["A"]}"
eight

Awk average of n data in each column

"Using awk to bin values in a list of numbers" provide a solution to average each set of 3 points in a column using awk.
How is it possible to extend it to an indefinite number of columns mantaining the format? For example:
2457135.564106 13.249116 13.140903 0.003615 0.003440
2457135.564604 13.250833 13.139971 0.003619 0.003438
2457135.565067 13.247932 13.135975 0.003614 0.003432
2457135.565576 13.256441 13.146996 0.003628 0.003449
2457135.566039 13.266003 13.159108 0.003644 0.003469
2457135.566514 13.271724 13.163555 0.003654 0.003476
2457135.567011 13.276248 13.166179 0.003661 0.003480
2457135.567474 13.274198 13.165396 0.003658 0.003479
2457135.567983 13.267855 13.156620 0.003647 0.003465
2457135.568446 13.263761 13.152515 0.003640 0.003458
averaging values every 5 lines, should output something like
2457135.564916 13.253240 13.143976 0.003622 0.003444
2457135.567324 13.270918 13.161303 0.003652 0.003472
where the first result is the average of the first 1-5 lines, and the second result is the average of the 6-10 lines.
The accepted answer to Using awk to bin values in a list of numbers is:
awk '{sum+=$1} NR%3==0 {print sum/3; sum=0}' inFile
The obvious extension to average all the columns is:
awk 'BEGIN { N = 3 }
{ for (i = 1; i <= NF; i++) sum[i] += $i }
NR % N == 0 { for (i = 1; i <= NF; i++)
{
printf("%.6f%s", sum[i]/N, (i == NF) ? "\n" : " ")
sum[i] = 0
}
}' inFile
The extra flexibility here is that if you want to group blocks of 5 rows, you simply change one occurrence of 3 into 5. This ignores blocks of up to N-1 rows at the end of the file. If you want to, you can add an END block that prints a suitable average if NR % N != 0.
For the sample input data, the output I got from the script above was:
2457135.564592 13.249294 13.138950 0.003616 0.003437
2457135.566043 13.264723 13.156553 0.003642 0.003465
2457135.567489 13.272767 13.162732 0.003655 0.003475
You can make the code much more complex if you want to analyze what the output formats should be. I've simply used %.6f to ensure 6 decimal places.
If you want N to be a command-line parameter, you can use the -v option to relay the variable setting to awk:
awk -v N="${variable:-3}" \
'{ for (i = 1; i <= NF; i++) sum[i] += $i }
NR % N == 0 { for (i = 1; i <= NF; i++)
{
printf("%.6f%s", sum[i]/N, (i == NF) ? "\n" : " ")
sum[i] = 0
}
}' inFile
When invoked with $variable set to 5, the output generated from the sample data is:
2457135.565078 13.254065 13.144591 0.003624 0.003446
2457135.567486 13.270757 13.160853 0.003652 0.003472

Endless recursion in gawk-script

Please pardon me in advance for posting such a big part of my problem, but I just can't put my finger on the part that fails...
I got input-files like this (abas-FO if you care to know):
.fo U|xiininputfile = whatever
.type text U|xigibsgarnich
.assign U|xigibsgarnich
..
..Comment
.copy U|xigibswohl = Spaß
.ein "ow1/UWEDEFTEST.FOP"
.in "ow1/UWEINPUT2"
.continue BOTTOM
.read "SOemthing" U|xttmp
!BOTTOM
..
..
Now I want to recursivly follow each .in[put]/.ein[gabe]-statement, parse the mentioned file and if I don't know it yet, add it to an array. My code looks like this:
#!/bin/awk -f
function getFopMap(inputregex, infile, mandantdir, infiles){
while(getline f < infile){
#printf "*"
#don't match if there is a '
if(f ~ inputregex "[^']"){
#remove .input-part
sub(inputregex, "", f)
#trim right
sub(/[[:blank:]]+$/, "", f)
#remove leading and trailing "
gsub(/(^\"|\"$)/,"" ,f)
if(!(f in infiles)){
infiles[f] = "found"
}
}
}
close(infile)
for (i in infiles){
if(infiles[i] == "found"){
infiles[i] = "parsed"
cmd = "test -f \"" i "\""
if(system(cmd) == 0){
close(cmd)
getFopMap(inputregex, f, mandantdir, infiles)
}
}
}
}
BEGIN{
#Matches something like [.input myfile] or [.ein "ow1/myfile"]
inputregex = "^\\.(in|ein)[^[:blank:]]*[[:blank:]]+"
#Get absolute path of infile
cmd = "python -c 'import os;print os.path.abspath(\"" ARGV[1] "\")'"
cmd | getline rootfile
close(cmd)
infiles[rootfile] = "parsed"
getFopMap(inputregex, rootfile, mandantdir, infiles)
#output result
for(infile in infiles) print infile
exit
}
I call the script (in the same directory the paths are relative to) like this:
./script ow1/UWEDEFTEST.FOP
I get no output. It just hangs up. If I remove the comment before the printf "*" command, I'm seeing stars, without end.
I appreciate every help and hints how to do it better.
My awk:
gawk Version 3.1.7
idk it it's your only problem but you're calling getline incorrectly and consequently will go into an infinite loop in some scenarios. Make sure you fully understand all of the caveats at http://awk.info/?tip/getline and you might want to use the recursion example there as the starting point for your code.
The most important item initially for your code is that when getline fails it can return a negative value so then while(getline f < infile) will create an infinite loop since the failing getline will always be returning non-zero and will so continue to be called and continue to fail. You need to use while ( (getline f < infile) > 0) instead.

Reading Trace file from NS2 using gawk

I am new user of gawk. I am trying to read trace file by putting a small code in a file and then by making that file executable. Following is what I am trying to do.
#!/bin/sh
set i = 0
while ($i < 5)
awk 'int($2)=='$i' && $1=="r" && $4==0 {pkt += $6} END {print '$i'":"pkt}' out.tr
set i = `expr $i + 1`
end
after this I am running following command:
sh ./test.sh
and it says:
syntax error: word unexpected (expecting do)
any help?
Assuming you are using bash
Syntax of while loop:
while test-commands; do consequent-commands; done
more info
For comparison using < operator you need to use Double-Parentheses see Shell Arithmetic and Conditional Constructs.
To assign value to the variable you used in the code just write i=0.
To access a shell variable in awk use -v option of awk.
Thus your might be become like this:
i=0
while ((i < 5))
do
awk -v k=$i 'int($2)==k && $1=="r" && $4==0 {pkt += $6} END {print k":"pkt}' out.tr
i=`expr $i + 1`
done
Here the variable k in awk code, has the value of variable $i from shell.
Instead of expr $i + 1 you can use $((i + 1)) or shorter $((++i))
Also you can use for loop then your code becomes much cleaner:
for (( i=0; i < 5; i++ ))
do
awk -v k=$i 'int($2)==k && $1=="r" && $4==0 {pkt += $6} END {print k":"pkt}' out.tr
done

array over non-existing indices in awk

Sorry for the verbose question, it boils down to a very simple problem.
Assume there are n text files each containing one column of strings (denominating groups) and one of integers (denominating the values of instances within these groups):
# filename xxyz.log
a 5
a 6
b 10
b 15
c 101
c 100
#filename xyzz.log
a 3
a 5
c 116
c 128
Note that while the length of both columns within any given file is always identical it differs between files. Furthermore, not all files contain the same range of groups (the first one contains groups a, b, c, while the second one only contains groups a and c). In awk one could calculate the average of column 2 for each string in column 1 within each file separately and output the results with the following code:
NAMES=$(ls|grep .log|awk -F'.' '{print $1}');
for q in $NAMES;
do
gawk -F' ' -v y=$q 'BEGIN {print "param", y}
{sum1[$1] += $2; N[$1]++}
END {for (key in sum1) {
avg1 = sum1[key] / N[key];
printf "%s %f\n", key, avg1;
} }' $q.log | sort > $q.mean;
done;
Howerver, for the abovementioned reasons, the length of the resulting .mean files differs between files. For each .log file I'd like to output a .mean file listing the entire range of groups (a-d) in the first column and the corresponding mean value or empty spaces in the second column depending on whether this category is present in the .log file. I've tried the following code (given without $NAMES for brevity):
awk 'BEGIN{arr[a]="a"; arr[b]="b"; arr[c]="c"; arr[d]="d"}
{sum[$1] += $2; N[$1]++}
END {for (i in arr) {
if (i in sum) {
avg = sum[i] / N[i];
printf "%s %f\n" i, avg;}
else {
printf "%s %s\n" i, "";}
}}' xxyz.log > xxyz.mean;
but it returns the following error:
awk: (FILENAME=myfile FNR=7) fatal: not enough arguments to satisfy format string
`%s %s
'
^ ran out for this one
Any suggestions would be highly appreciated.
Will you ever have explicit zeroes or negative numbers in the log files? I'm going to assume not.
The first line of your second script doesn't do what you wanted:
awk 'BEGIN{arr[a]="a"; arr[b]="b"; arr[c]="c"; arr[d]="d"}
This assigns "a" to arr[0] (because a is a variable not previously used), then "b" to the same element (because b is a variable not previously used), then "c", then "d". Clearly, not what you had in mind. This (untested) code should do the job you need as long as you know that there are just the four groups. If you don't know the groups a priori, you need a more complex program (it can be done, but it is harder).
awk 'BEGIN { sum["a"] = 0; sum["b"] = 0; sum["c"] = 0; sum["d"] = 0 }
{ sum[$1] += $2; N[$1]++ }
END { for (i in sum) {
if (N[i] == 0) N[i] = 1 # Divide by zero protection
avg = sum[i] / N[i];
printf "%s %f\n" i, avg;
}
}' xxyz.log > xxyz.mean;
This will print a zero average for the missing groups. If you prefer, you can do:
awk 'BEGIN { sum["a"] = 0; sum["b"] = 0; sum["c"] = 0; sum["d"] = 0 }
{ sum[$1] += $2; N[$1]++ }
END { for (i in sum) {
if (N[i] == 0)
printf("%s\n", i;
else {
avg = sum[i] / N[i];
printf "%s %f\n" i, avg;
}
}
}' xxyz.log > xxyz.mean;
For each .log file I'd like to output a .mean file listing the entire
range of groups (a-d) in the first column and the corresponding mean
value or empty spaces in the second column depending on whether this
category is present in the .log file.
Not purely an awk solution, but you can get all the groups with this.
awk '{print $1}' *.log | sort -u > groups
After you calculate the means, you can then join the groups file. Let's say the means for your second input file look like this temporary, intermediate file. (I called it xyzz.tmp.)
a 4
c 122
Join the groups, preserving all the values from the groups file.
$ join -a1 groups xyzz.tmp > xyzz.mean
$ cat xyzz.mean
a 4
b
c 122
Here's my take on the problem. Run like:
./script.sh
Contents of script.sh:
array=($(awk '!a[$1]++ { print $1 }' *.log))
readarray -t sorted < <(for i in "${array[#]}"; do echo "$i"; done | sort)
for i in *.log; do
for j in "${sorted[#]}"; do
awk -v var=$j '
{
sum[$1]+=$2
cnt[$1]++
}
END {
print var, (var in cnt ? sum[var]/cnt[var] : "")
}
' "$i" >> "${i/.log/.main}"
done
done
Results of grep . *.main:
xxyz.main:a 5.5
xxyz.main:b 12.5
xxyz.main:c 100.5
xyzz.main:a 4
xyzz.main:b
xyzz.main:c 122
Here is a pure awk answer:
find . -maxdepth 1 -name "*.log" -print0 |
xargs -0 awk '{SUBSEP=" ";sum[FILENAME,$1]+=$2;cnt[FILENAME,$1]+=1;next}
END{for(i in sum)print i, sum[i], cnt[i], sum[i]/cnt[i]}'
Easy enough to push this into a file --