Repeat printf arguments with command line operators - printf

I want to repeat the same argument $i for instances 03 to 12. I'm trying to use some NCO operators, but the printf statement is tripping me up.
I'm trying to use a netCDF operator where the outputs of printf are the input files to the command. The printf statements work now, but their output is not being passed to the netCDF command, which has the form: ncea -v T,U inputfiles outputfile
#!/bin/csh
set i = 1
while ($i < 2)
ncea -v T,U
foreach j ( {3,4,6,7,8,9,10,11,12} )
`printf O3_BDBP_1979ghg.cam.h0.00%02d-%02d.nc $j $i `
end
O3_BDBP_1979.nc
# i = $i + 1
end
Other printf statements I've tried are
ncea -v T,U `printf O3_BDBP_1979ghg.cam.h0.00{03,04,05,06,07,08,09,10,11,12}-%02d.nc $i` O3_BDBP_1979.nc
ncea -v T,U `printf O3_BDBP_1979ghg.cam.h0.00{03,04,05,06,07,08,09,10,11,12}-%1$02d.nc $i` O3_BDBP_1979.nc
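The two one-line attempts go wrong because the shell expands the braces before printf runs: printf receives the first filename as its format string, and the remaining filenames plus $i as arguments. A minimal sketch, assuming a POSIX sh or bash shell instead of csh: format the member number once, then let printf reuse its format string for each instance number, which it does until all arguments are consumed. The function name `build_inputs` is hypothetical; the instance list 03 to 12 and the filename pattern come from the question.

```shell
#!/bin/sh
# Hypothetical helper: print the input filenames for member $1,
# one per line, for instances 03..12.
build_inputs() {
    suffix=$(printf '%02d' "$1")   # e.g. 1 -> 01
    # printf reapplies the format for every remaining argument,
    # so this prints one filename per instance number.
    printf "O3_BDBP_1979ghg.cam.h0.00%s-${suffix}.nc\n" 03 04 05 06 07 08 09 10 11 12
}

# Usage (requires NCO, so it is left as a comment here):
# ncea -v T,U $(build_inputs 1) O3_BDBP_1979.nc
```

The unquoted $(build_inputs 1) is split on newlines, so ncea receives each filename as a separate argument; this is safe only because the filenames contain no spaces.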

Awk output formatting

I have two .po files in which some words have two different meanings, and I want to use awk to turn them into a kind of translator.
For example, in .po file 1:
msgid "example"
msgstr "something"
and in .po file 2:
msgid "example"
msgstr "somethingelse"
I came up with this:
awk -F'"' 'match($2, /^example$/) {printf "%s", $2": ";getline; printf "%s", $2}' file1.po file2.po
The output is:
example:something example:somethingelse
How do I get it into this format instead?
example : something, somethingelse
Reformatting
example:something example:somethingelse
into
example : something, somethingelse
can be done with this one-liner:
awk -F":| " -v OFS=", " '{printf "%s : ", $1; for (i=1;i<=NF;i++) if (i % 2 == 0) printf("%s%s%s", ((i==2)?"":OFS), $i, ((i==NF)?"\n":""))}'
Testing:
$ echo "example:something example:somethingelse example:something3 example:something4" | \
awk -F":| " -v OFS=", " '{ \
printf "%s : ", $1; \
for (i=1;i<=NF;i++) \
if (i % 2 == 0) \
printf("%s%s%s", ((i==2)?"":OFS), $i, ((i==NF)?"\n":""))}'
example : something, somethingelse, something3, something4
Explanation:
$ cat tst.awk
BEGIN{FS=":| ";OFS=", "} # define field sep and output field sep
{ printf "%s : ", $1 # print the header "example : "
for (i=1;i<=NF;i++) # loop over all fields
if (i % 2 == 0) # we're only interested in the "even" fields
printf("%s%s%s", ((i==2)?"":OFS), $i, ((i==NF)?"\n":""))
}
But you could have done the whole thing in one go with something like this:
$ cat tst.awk
BEGIN{OFS=","} # set output field sep to ","
NF{ # if NF (i.e. number of fields) > 0
# - to skip empty lines -
if (match($0,/msgid "(.*)"/,a)) id=a[1] # if line matches 'msgid "something",
# set "id" to "something"
if (match($0,/msgstr "(.*)"/,b)) str=b[1] # same here for 'msgstr'
if (id && str){ # if both "id" and "str" are set
r[id]=(id in r)?r[id] OFS str:str # save "str" in array r with index "id".
# if index "id" already exists,
# add "str" preceded by OFS (i.e. "," here)
id=str=0 # after saving, reset "id" and "str"
}
}
END { for (i in r) printf "%s : %s\n", i, r[i] } # print array "r"
and call it with GNU awk, which provides the three-argument match():
gawk -f tst.awk *.po
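Alternatively, if each file contains only alternating msgid and msgstr lines, as in the example, a single command will do:
$ awk -F'"' 'NR%2{k=$2; next} NR==FNR{a[k]=$2; next} {print k" : "a[k]", "$2}' file1 file2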
example : something, somethingelse

Passing a shell variable to awk in gnuplot

I need to pass a shell variable to awk in gnuplot, but I get an error message.
The variable is set in the script and is called FILE; its value changes according to the date.
My code (in a gnuplot script):
plot FILE using 1:14 with points pointtype 7 pointsize 1 # this works fine
replot '< awk ''{y1 = y2; y2 = $14; if (NR > 1 && y2 - y1 >= 100) printf("\n") ; if (NR > 1 && y2 -y1 <= -100) printf("\n"); print}'' FILE' using 1:14 with linespoints
Error message:
awk: fatal: cannot open file `FILE' for reading (No such file or directory)
When I hard code the FILE path the replot works.
Could anyone clarify the code I need to pass this variable to awk? Am I on the right track with something like the following?
% environment_variable=FILE
% awk -v awk_variable="${environment_variable}" 'BEGIN { print awk_variable }'
Here is my gnuplot script, cobbled together mostly from other posts:
#FILE selection - we want to plot the most recent data file
FILE = strftime('/data/%Y-%m-%d.txt', time(0)) # this is correct
print "FILE is : " .FILE
#set file path variable for awk : (This is where my problem is)
awk -v var="$FILE" '{print var}'
awk '{print $0}' <<< "$FILE"
Thank you in advance
If FILE is a gnuplot variable that contains the path of the file, you can do this:
FILE = 'input'
plot '<awk ''1'' ' . FILE
This concatenates the value of the gnuplot variable FILE onto the end of the awk command. The resulting command is therefore awk '1' input (which just prints every line of the file); you can replace the '1' with whatever awk program you need.
By the way, your awk script can be simplified a little bit to this:
awk '{ y1 = y2; y2 = $14 } NR > 1 && (y2 - y1 >= 100 || y2 - y1 <= -100) { print "" } { print $1, $14 }'
It's not often that you need to use if in awk, because each block { } is executed conditionally (or, if no condition is specified, always). Assuming you haven't modified the output record separator (the ORS variable), print "" is the same as printf("\n"). Rather than specifying using 1:14 in gnuplot, you may as well print only the columns you are interested in, with print $1, $14.
So your replot line in gnuplot would be:
replot '<awk ''{ y1 = y2; y2 = $14 } NR > 1 && (y2 - y1 >= 100 || y2 - y1 <= -100) { print "" } { print $1, $14 }'' ' . FILE with linespoints
Of course, this line is getting a bit long. You might want to split it up a bit:
awk_cmd = '{ y1 = y2; y2 = $14 } NR > 1 && (y2 - y1 >= 100 || y2 - y1 <= -100) { print "" } { print $1, $14 }'
replot sprintf("<awk '%s' %s", awk_cmd, FILE) with linespoints
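The simplified filter can also be checked outside gnuplot by wrapping it in a shell function; `split_on_jumps` is a hypothetical name, and the threshold of 100 is the one from the question. It prints columns 1 and 14 and emits an empty line before any row whose column 14 differs from the previous row's by 100 or more, which gnuplot treats as a break in the line when plotting with linespoints.

```shell
#!/bin/sh
# Hypothetical wrapper around the simplified awk filter from the answer.
# Prints "col1 col14" per row, plus an empty line before each jump >= 100.
split_on_jumps() {
    awk '{ y1 = y2; y2 = $14 }
         NR > 1 && (y2 - y1 >= 100 || y2 - y1 <= -100) { print "" }
         { print $1, $14 }' "$@"
}
```

Usage: split_on_jumps /data/2024-01-01.txt (the path is only an example of the date-based name).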

gawk "not equal to", field separator

gawk -F ";" '!($2 == "-") {print $0}' file.csv > out.file
This is my file:
MYH1;Myosin-1;MYH1_HUMAN
CALML6;Calmodulin-like;protein
UNQ6494;-;-
This is the output I want:
MYH1;Myosin-1;MYH1_HUMAN
CALML6;Calmodulin-like;protein
I thought that I understood this, but I get this output (same as input):
MYH1;Myosin-1;MYH1_HUMAN
CALML6;Calmodulin-like;protein
UNQ6494;-;-
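Run on the sample exactly as shown, the command does drop the UNQ6494 line; the filter is plain POSIX awk, so any awk behaves the same. Identical output therefore suggests the real file differs from what is displayed, for example a look-alike dash character, extra whitespace in field 2, or a different field separator. That is a guess; a minimal sketch for checking it, with hypothetical function names:

```shell
#!/bin/sh
# Keep rows whose second ;-separated field is not exactly "-".
# $2 != "-" is equivalent to !($2 == "-").
drop_dash_rows() {
    awk -F ';' '$2 != "-"' "$@"
}

# Dump the raw bytes of field 2 to spot look-alike dashes,
# trailing spaces or carriage returns.
show_field2_bytes() {
    awk -F ';' '{ print $2 }' "$@" | od -c
}
```

Usage: drop_dash_rows file.csv > out.file, and show_field2_bytes file.csv if rows still get through.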

Compute average and standard deviation with awk

I have a 'file.dat' with 24 (rows) x 16 (columns) data.
I have already tested the following awk script, which computes the average of each column.
touch aver-std.dat
awk '{ for (i=1; i<=NF; i++) { sum[i]+= $i } }
END { for (i=1; i<=NF; i++ )
{ printf "%f \n", sum[i]/NR} }' file.dat >> aver-std.dat
The output 'aver-std.dat' has one column with these averages.
Similarly to the average computation, I would like to compute the standard deviation of each column of 'file.dat' and write it in a second column of the output file.
Namely, I would like an output file with the average in the first column and the standard deviation in the second column.
I have tried different approaches, like this one:
touch aver-std.dat
awk '{ for (i=1; i<=NF; i++) { sum[i]+= $i }}
END { for (i=1; i<=NF; i++ )
{std[i] += ($i - sum[i])^2 ; printf "%f %f \n", sum[i]/NR, sqrt(std[i]/(NR-1))}}' file.dat >> aver-std.dat
It writes values in the second column, but they are not the correct standard deviations; the computation of the deviation is somehow wrong.
I would very much appreciate any help.
Regards
The (population) standard deviation is
stdev = sqrt((1/N)*(sum of (value - mean)^2))
But there is another form of the formula which does not require you to know the mean beforehand. It is:
stdev = sqrt((1/N)*((sum of squares) - (((sum)^2)/N)))
(A quick web search for "sum of squares" formula for standard deviation will give you the derivation if you are interested)
To use this formula, you need to keep track of both the sum and the sum of squares of the values. So your awk script will change to:
awk '{for(i=1;i<=NF;i++) {sum[i] += $i; sumsq[i] += ($i)^2}}
END {for (i=1;i<=NF;i++) {
printf "%f %f \n", sum[i]/NR, sqrt((sumsq[i]-sum[i]^2/NR)/NR)}
}' file.dat >> aver-std.dat
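As a quick check of the sum-of-squares formula, here is a sketch that wraps the script above in a shell function (`col_stats` is a hypothetical name) and runs it on a two-column file. The first column, 2 4 4 4 5 5 7 9, has sum 40 and sum of squares 232, so the variance is (232 - 40^2/8)/8 = 4 and the population standard deviation is 2; the second column is constant, so its deviation is 0.

```shell
#!/bin/sh
# Hypothetical wrapper: one output line per column, "mean population_sd",
# using the sum and sum of squares so only one pass is needed.
col_stats() {
    awk '{ for (i = 1; i <= NF; i++) { sum[i] += $i; sumsq[i] += $i ^ 2 } }
         END { for (i = 1; i <= NF; i++)
                   printf "%f %f\n", sum[i] / NR, sqrt((sumsq[i] - sum[i] ^ 2 / NR) / NR) }' "$@"
}
```

For the sample standard deviation, divide by NR-1 instead of the final NR inside sqrt().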
To simply calculate the population standard deviation of a list of numbers, you can use a command like this:
awk '{x+=$0;y+=$0^2}END{print sqrt(y/NR-(x/NR)^2)}'
Or this calculates the sample standard deviation:
awk '{sum+=$0;a[NR]=$0}END{for(i in a)y+=(a[i]-(sum/NR))^2;print sqrt(y/(NR-1))}'
^ is in POSIX. ** is supported by gawk and nawk but not by mawk.
Here are some calculations I made on a Grinder data output file from a long soak test that had to be interrupted:
Standard deviation (biased) + average:
cat <grinder_data_file> | grep -v "1$" | awk -F ', ' '{ sum=sum+$5 ; sumX2+=(($5)^2)} END { printf "Average: %f. Standard Deviation: %f \n", sum/NR, sqrt(sumX2/(NR) - ((sum/NR)^2) )}'
Standard deviation (non-biased) + average:
cat <grinder_data_file> | grep -v "1$" | awk -F ', ' '{ sum=sum+$5 ; sumX2+=(($5)^2)} END { avg=sum/NR; printf "Average: %f. Standard Deviation: %f \n", avg, sqrt(sumX2/(NR-1) - 2*avg*(sum/(NR-1)) + ((NR*(avg^2))/(NR-1)))}'
Your script could instead take this form, which computes the average and (population) standard deviation of each row across its fields:
awk '{
sum = 0
for (i=1; i<=NF; i++) {
sum += $i
}
avg = sum / NF
avga[NR] = avg
sum = 0
for (i=1; i<=NF; i++) {
sum += ($i - avg) ^ 2
}
stda[NR] = sqrt(sum / NF)
}
END { for (i = 1; i in stda; ++i) { printf "%f %f \n", avga[i], stda[i] } }' file.dat >> aver-std.dat
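The script above works on rows. For per-column results, which is what the question asks for, each deviation has to be measured from its column's mean, and that mean is only known once the whole file has been read. The question's attempt goes wrong because, in the END block, $i refers at most to the last record's fields, and sum[i] is a total rather than a mean. A minimal two-pass sketch that keeps every value in memory (fine for a 24 x 16 file) and divides by N-1, as the question's attempt does; the function name `col_mean_sd` is hypothetical:

```shell
#!/bin/sh
# Hypothetical two-pass version: store all values, then for each column
# compute the mean and the sample standard deviation (divides by N-1).
col_mean_sd() {
    awk '
    { for (i = 1; i <= NF; i++) { v[NR, i] = $i; sum[i] += $i }
      if (NF > ncol) ncol = NF }
    END {
        for (i = 1; i <= ncol; i++) {
            mean = sum[i] / NR
            ss = 0
            for (r = 1; r <= NR; r++) ss += (v[r, i] - mean) ^ 2
            printf "%f %f\n", mean, (NR > 1 ? sqrt(ss / (NR - 1)) : 0)
        }
    }' "$@"
}
```

Usage: col_mean_sd file.dat > aver-std.dat. Subtracting the mean before squaring is also less prone to rounding error than the sum-of-squares shortcut when values are large and close together.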

shell script to return value

I have the shell script below, which works and produces this output:
RuleNum=$1
cat input.txt |awk -v var=$RuleNum '$1==var {out=$1; for(i=NF;i >=0;i--)if($i~/bps/){sub("bps","",$i);out=out" "$i} print out;out=""}'
./downup.sh 20
20 BW-IN:2560000 BW-OUT:2048000
I want the output to look like this instead:
./downup.sh 20
2560000 2048000
./downup.sh 36
2560000 2048000
Below is input.txt:
20 name:abc addr:203.45.247.247/255.255.255.255 WDW-THRESH:12 BW-OUT:10000000bps BW-IN:15000000bps STATSDEVICE:test247 STATS:Enabled (4447794/0) <IN OUT>
25 name:xyz160 addr:203.45.233.160/255.255.255.224 STATSDEVICE:test160 STATS:Enabled priority:pass-thru (1223803328/0) <IN OUT>
37 name:testgrp2 <B> WDW-THRESH:8 BW-BOTH:192000bps STATSDEVICE:econetgrp2 STATS:Enabled (0/0) <Group> START:NNNNNNN-255-0 STOP:NNNNNNN-255-0
62 name:blahblahl54 addr:203.45.225.54/255.255.255.255 WDW-THRESH:5 BWLINK:cbb256 BW-BOTH:256000bps STATSDEVICE:hellol54 STATS:Enabled (346918/77) <IN OUT>
Add sub("BW.*:", "", $i) after the existing sub().
And cat isn't necessary. Just put the filename at the end of the line:
awk ... input.txt
To eliminate the rule number from the output, remove out = $1;.
Here is the result with an addition to avoid printing a space at the beginning of each line:
awk -v var="$RuleNum" '$1==var {for (i = NF; i >= 1; i--) if ($i ~ /bps/) {sub("bps", "", $i); sub("BW.*:", "", $i); out = out delim $i; delim = OFS} print out; out = delim = ""}' input.txt
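Put together as a script, a minimal sketch might look like the following. It is written as a function so it can be tried directly; the function name `downup` and the optional second argument for the file name are assumptions. It edits a copy of each field, so $0 is never rebuilt, and anchors the bps match at the end of the field.

```shell
#!/bin/sh
# Hypothetical version of downup.sh: print the bandwidth values (without
# the "BW-...:" label and "bps" unit) for rule $1, last field first.
downup() {
    rule=$1
    file=${2:-input.txt}
    awk -v var="$rule" '
        $1 == var {
            out = ""; delim = ""
            for (i = NF; i >= 1; i--)
                if ($i ~ /bps$/) {
                    v = $i
                    sub(/bps$/, "", v)      # drop the unit
                    sub(/^[^:]*:/, "", v)   # drop the "BW-IN:"-style label
                    out = out delim v
                    delim = OFS
                }
            print out
        }' "$file"
}
```

Usage: downup 20, or downup 20 other.txt to read a different file.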