awk/gawk array concatenation into string fail - awk

I want to concatenate an array into a string in gawk.
I have the array a [ ABC, DEF, GHI ]
and want a string like "ABC DEF GHI".
Here I simplified the pipe feeding with 'echo' for demonstration purposes.
Test, working OK (no concatenation, only output):
#(echo ABC & echo DEF & echo GHI) | gawk '{a[$0]++} END { for( f in a ) print f; }'
Now trying to concatenate:
#(echo ABC & echo DEF & echo GHI) | gawk '{a[$0]++} END { for( f in a ) b=(b f); print b; }'
This just returns the first string in the array 'a'.
Any ideas on how to do this?

If you're trying to print the concatenated values of a Unix shell array such as created by:
$ a=(ABC DEF GHI)
then just do:
$ echo "${a[*]}"
ABC DEF GHI
If you want to concatenate multiple input lines as produced by:
$ printf '%s\n' "${a[@]}"
ABC
DEF
GHI
with awk for some reason then there are several options including:
$ printf '%s\n' "${a[@]}" | awk '{printf "%s%s", (NR>1 ? OFS : ""), $0} END{print ""}'
ABC DEF GHI
$ printf '%s\n' "${a[@]}" | awk '{b=(NR>1 ? b OFS : "") $0} END{print b}'
ABC DEF GHI
$ printf '%s\n' "${a[@]}" | awk '{a[NR]=$0} END{for (i=1; i<=NR; i++) b=(i>1 ? b OFS : "") a[i]; print b}'
ABC DEF GHI
$ printf '%s\n' "${a[@]}" | awk '{a[NR]=$0} END{for (i=1; i<=NR; i++) printf "%s%s", (i>1 ? OFS : ""), a[i]; print ""}'
ABC DEF GHI
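If a separator other than a space is needed, the same accumulate-then-print idea can be wrapped in a small shell function. This is only a sketch: the names join_lines and sep are made up for illustration and are not part of awk.

```shell
# join_lines SEP: join all lines read on stdin into one line, separated by SEP.
# Hypothetical helper. Note that awk interprets backslash escapes in -v values.
join_lines() {
    awk -v sep="$1" '
        { out = (NR > 1 ? out sep : "") $0 }   # prepend sep from the 2nd line on
        END { print out }
    '
}

printf '%s\n' ABC DEF GHI | join_lines ' '
```

Because the lines are appended in the order they are read, the result keeps the input order, which a for (f in a) loop does not guarantee.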
If none of that is what you're trying to do then please edit your question to clarify your requirements.

post-decrement of NF in awk

I'm a bit confused by the following:
$ echo foo bar baz | awk '{printf "%d:", NF--; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; $NF=""; NF -= 1; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; $(NF--)=""; print NF}'
3:3
I see the same behavior in awk version 20070501 (macos) and GNU Awk 4.0.2. Why does the post-decrement of NF in the 3rd case not apply? Is that behavior expected, mandated by a standard, or a quirk of the implementation?
EDIT by Ed Morton: FWIW I'd find the following a more compelling example:
$ echo foo bar baz | awk '{printf "%d:", NF; NF--; $NF=""; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; --NF; $NF=""; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; $NF=""; NF--; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; $NF=""; --NF; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; $(--NF)=""; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; $(NF--)=""; print NF}'
3:3
with the question being why does the last example (post-decrement with assignment) behave differently from all of the other cases, regardless of which one you think it should be equivalent to.
The value of a post-decrement is the value of the variable before it's decremented. Because of this, the last case is adding a new field after it decrements NF, which updates NF.
$(NF--) = "";
is equivalent to
temp = NF; # temp == 3
NF--; # NF == 2
$temp = ""; # adds a new field 3, so now NF == 3
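The two rules at play here can be checked separately. A small sketch (the function names are made up for illustration): the first shows that a post-decrement yields the old value, the second that assigning to a field beyond NF creates that field and raises NF.

```shell
# Post-decrement: the expression yields the old value, then the variable drops by one.
post_decrement_demo() {
    awk 'BEGIN { n = 3; old = n--; print old, n }'
}

# Assigning to a field past the current NF creates it, so NF grows to match.
field_growth_demo() {
    echo "foo bar" | awk '{ $3 = ""; print NF }'
}

post_decrement_demo   # prints: 3 2
field_growth_demo     # prints: 3
```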

Awk output formatting

I have two .po files in which the same word has two different translations,
and I want to use awk to turn them into some kind of translator.
For example
in .po file 1
msgid "example"
msgstr "something"
in .po file 2
msgid "example"
msgstr "somethingelse"
I came up with this
awk -F'"' 'match($2, /^example$/) {printf "%s", $2": ";getline; printf "%s", $2}' file1.po file2.po
The output is
example:something example:somethingelse
How do I turn it into this format?
example : something, somethingelse.
Reformatting
example:something example:somethingelse
into
example : something, somethingelse
can be done with this one-liner:
awk -F":| " -v OFS="," '{printf "%s:", $1; for (i=1;i<=NF;i++) if (i % 2 == 0)printf("%s%s%s", ((i==2)?"":OFS), $i, ((i==NF)?"\n":""))}'
Testing:
$ echo "example:something example:somethinelse example:something3 example:something4" | \
awk -F":| " -v OFS="," '{ \
printf "%s:", $1; \
for (i=1;i<=NF;i++) \
if (i % 2 == 0) \
printf("%s%s%s", ((i==2)?"":OFS), $i, ((i==NF)?"\n":""))}'
example:something,somethinelse,something3,something4
Explanation:
$ cat tst.awk
BEGIN{FS=":| ";OFS=","} # define field sep and output field sep
{ printf "%s:", $1 # print header line "example:"
for (i=1;i<=NF;i++) # loop over all fields
if (i % 2 == 0) # we're only interested in all "even" fields
printf("%s%s%s", ((i==2)?"":OFS), $i, ((i==NF)?"\n":""))
}
But you could have done the whole thing in one go with something like this (the three-argument match() is a GNU awk extension, so this needs gawk):
$ cat tst.awk
BEGIN{OFS=","} # set output field sep to ","
NF{ # if NF (i.e. number of fields) > 0
# - to skip empty lines -
if (match($0,/msgid "(.*)"/,a)) id=a[1] # if line matches 'msgid "something",
# set "id" to "something"
if (match($0,/msgstr "(.*)"/,b)) str=b[1] # same here for 'msgstr'
if (id && str){ # if both "id" and "str" are set
r[id]=(id in r)?r[id] OFS str:str # save "str" in array r with index "id".
# if index "id" already exists,
# add "str" preceded by OFS (i.e. "," here)
id=str=0 # after saving, reset "id" and "str"
}
}
END { for (i in r) printf "%s : %s\n", i, r[i] } # print array "r"
and call this like:
awk -f tst.awk *.po
$ awk -F'"' 'NR%2{k=$2; next} NR==FNR{a[k]=$2; next} {print k" : "a[k]", "$2}' file1 file2
example : something, somethingelse

Editing an output file to be delimited with semicolons when the input file is a CSV (Korn shell)

My input file is CSV
AED,E ,3.67295,20160105,20:10:00,UAE DIRHAM
ATS,E ,10.9814,20160105,20:10:00,AUSTRIAN SHILLINGS
AUD,A ,0.71525,20160105,20:10:00,AUSTRALIAN DOLLAR
I want to read it in to output it like so
EUR;1.127650;USD/EUR;EURO;Cash
JPY;124.335000;JPY/USD;JAPANESE YEN;Cash
GBP;1.538050;USD/GBP;BRITISH POUND;Cash
actual code :
cat $FILE2 | while read a b c d e f
do
echo $a $c $a/USD $f Cash \
| awk -F, 'BEGIN { OFS =";" } {print $1, $2, $3, $4, $5}' >> my_ratesoutput.csv
output:
Cash;;;;95 AED/USD UAE DIRHAM
Cash;;;;14 ATS/USD AUSTRIAN SHILLINGS
Cash;;;;25 AUD/USD AUSTRALIAN DOLLAR
Cash;;;;/USD BARBADOS DOLLAR
export IFS=","
semico=';'
FILE=rates.csv
FILE2=rateswork.csv
echo $FILE
rm my_ratesoutput.csv
cp -p $FILE $FILE2
sed 1d $FILE2 > temp.csv
mv temp.csv $FILE2
echo "Currency;Spot Rate;Terms;Name;Curve" >>my_ratesoutput.csv
cat $FILE2 |while read a b c d e f
do
echo $a$semico$c$semico$a/USD$semico$f${semico}Cash >> my_ratesoutput.csv
done
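One possible approach, sketched under assumptions, is to let awk do both the comma splitting and the semicolon joining so that no while read loop is needed. The function name csv_to_rates is made up for illustration. The first input line is assumed to be a header (the script above deletes it with sed 1d), and the Terms column follows the script above (currency/USD), since the rule that produces USD/EUR versus JPY/USD is not stated.

```shell
# csv_to_rates [FILE...]: convert the comma-separated rates file to the
# semicolon-separated layout. Assumes line 1 of the input is a header to drop.
csv_to_rates() {
    awk -F, -v OFS=';' '
        NR == 1 { print "Currency", "Spot Rate", "Terms", "Name", "Curve"; next }
        { print $1, $3, $1 "/USD", $6, "Cash" }   # code; rate; terms; name; curve
    ' "$@"
}
```

It could then be called as csv_to_rates rates.csv > my_ratesoutput.csv.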

awk improve command - Count & Sum

I would like your suggestions for improving this command and removing any unnecessary work, to cut down the run time.
I am trying to find the count of lines and the sum of $6, grouped by $2,substr($3,4,6),substr($4,4,6),$10,$8,$6.
The gzipped input file contains around 300 million lines.
Input.gz
2067,0,09-MAY-12.04:05:14,09-MAY-12.04:05:14,21-MAR-16,600,INR,RO312,20120321_1C,K1,,32
2160,0,26-MAY-14.02:05:27,26-MAY-14.02:05:27,18-APR-18,600,INR,RO414,20140418_7,K1,,30
2160,0,26-MAY-14.02:05:27,26-MAY-14.02:05:27,18-APR-18,600,INR,RO414,20140418_7,K1,,30
2160,0,26-MAY-14.02:05:27,26-MAY-14.02:05:27,18-APR-18,600,INR,RO414,20140418_7,K1,,30
2104,5,13-JAN-13.01:01:38,,13-JAN-17,4150,INR,RO113,CD1301_RC50_B1_20130113,K2,,21
I am using the command below and it works fine.
zcat Input.gz | awk -F"," '{OFS=","; print $2,substr($3,4,6),substr($4,4,6),$10,$8,$6}' | \
awk -F"," 'BEGIN {count=0; sum=0; OFS=","} {key=$0; a[key]++;b[key]=b[key]+$6} \
END {for (i in a) print i,a[i],b[i]}' >Output.txt
Output.txt
0,MAY-14,MAY-14,K1,RO414,600,3,1800
0,MAY-12,MAY-12,K1,RO312,600,1,600
5,JAN-13,,K2,RO113,4150,1,4150
Any suggestions to improve the above command are welcome.
This seems more efficient:
zcat Input.gz | awk -F, '{key=$2","substr($3,4,6)","substr($4,4,6)","$10","$8","$6;++a[key];b[key]=b[key]+$6}END{for(i in a)print i","a[i]","b[i]}'
Output:
0,MAY-14,MAY-14,K1,RO414,600,3,1800
0,MAY-12,MAY-12,K1,RO312,600,1,600
5,JAN-13,,K2,RO113,4150,1,4150
Uncondensed form:
zcat Input.gz | awk -F, '{
key = $2 "," substr($3, 4, 6) "," substr($4, 4, 6) "," $10 "," $8 "," $6
++a[key]
b[key] = b[key] + $6
}
END {
for (i in a)
print i "," a[i] "," b[i]
}'
You can do this with one awk invocation by redefining the fields according to the first awk script, i.e. something like this:
$1 = $2
$2 = substr($3, 4, 6)
$3 = substr($4, 4, 6)
$4 = $10
$5 = $8
No need to change $6 as that is the same field. Now if you base the key on the new fields, the second script will work almost unaltered. Here is how I would write it, moving the code into a script file for better readability and maintainability:
zcat Input.gz | awk -f parse.awk
Where parse.awk contains:
BEGIN {
FS = OFS = ","
}
{
$1 = $2
$2 = substr($3, 4, 6)
$3 = substr($4, 4, 6)
$4 = $10
$5 = $8
key = $1 OFS $2 OFS $3 OFS $4 OFS $5 OFS $6
a[key]++
b[key] += $6
}
END {
for (i in a)
print i, a[i], b[i]
}
You can of course still run it as a one-liner, but it will look more cryptic:
zcat Input.gz | awk '{ key = $2 FS substr($3,4,6) FS substr($4,4,6) FS $10 FS $8 FS $6; a[key]++; b[key]+=$6 } END { for (i in a) print i,a[i],b[i] }' FS=, OFS=,
Output in both cases:
0,MAY-14,MAY-14,K1,RO414,600,3,1800
0,MAY-12,MAY-12,K1,RO312,600,1,600
5,JAN-13,,K2,RO113,4150,1,4150
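Note that for (i in a) visits keys in an unspecified order, so the lines may not come out in the order shown. If a stable order matters, the result can be piped through sort. A minimal sketch, with the function name count_and_sum made up for illustration:

```shell
# count_and_sum [FILE...]: single-pass count and sum of $6 per group key,
# sorted afterwards because awk's for-in order is unspecified.
count_and_sum() {
    awk -F, '{
        key = $2 "," substr($3, 4, 6) "," substr($4, 4, 6) "," $10 "," $8 "," $6
        count[key]++          # lines per group
        sum[key] += $6        # total of field 6 per group
    }
    END { for (k in count) print k "," count[k] "," sum[k] }' "$@" | LC_ALL=C sort
}
```

It would be used as zcat Input.gz | count_and_sum. Only the grouped output is sorted, not the 300 million input lines, so the extra cost should stay small.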

shell script to return value

I have the shell script below, which produces the output shown after it.
RuleNum=$1
cat input.txt |awk -v var=$RuleNum '$1==var {out=$1; for(i=NF;i >=0;i--)if($i~/bps/){sub("bps","",$i);out=out" "$i} print out;out=""}'
./downup.sh 20
20 BW-IN:2560000 BW-OUT:2048000
I want the output to look like this instead:
./downup.sh 20
256000 2048000
./downup.sh 36
2560000 2048000
Below is input.txt:
20 name:abc addr:203.45.247.247/255.255.255.255 WDW-THRESH:12 BW-OUT:10000000bps BW-IN:15000000bps STATSDEVICE:test247 STATS:Enabled (4447794/0) <IN OUT>
25 name:xyz160 addr:203.45.233.160/255.255.255.224 STATSDEVICE:test160 STATS:Enabled priority:pass-thru (1223803328/0) <IN OUT>
37 name:testgrp2 <B> WDW-THRESH:8 BW-BOTH:192000bps STATSDEVICE:econetgrp2 STATS:Enabled (0/0) <Group> START:NNNNNNN-255-0 STOP:NNNNNNN-255-0
62 name:blahblahl54 addr:203.45.225.54/255.255.255.255 WDW-THRESH:5 BWLINK:cbb256 BW-BOTH:256000bps STATSDEVICE:hellol54 STATS:Enabled (346918/77) <IN OUT>
Add sub("BW.*:", "", $i) after the existing sub().
And cat isn't necessary. Just put the filename at the end of the line:
awk ... input.txt
To eliminate the rule number from the output, remove out = $1;.
Here is the result with an addition to avoid printing a space at the beginning of each line:
awk -v var=$RuleNum '$1==var {for(i = NF; i >= 0; i--) if ($i ~ /bps/) {sub("bps","",$i); sub("BW.*:", "", $i); out = out delim $i; delim = OFS} print out; out = delim = ""}'
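The same idea can be sketched as a self-contained function that works on a copy of each field, so the record itself is never modified. The function name bw_values and the variable names are made up for illustration. The loop stops at field 2 because field 1 is the rule number.

```shell
# bw_values RULE [FILE...]: print the numeric BW-* values of the matching rule,
# scanning from the last field backwards as in the command above.
bw_values() {
    rule=$1; shift
    awk -v rule="$rule" '
        $1 == rule {
            out = ""
            for (i = NF; i > 1; i--)
                if ($i ~ /^BW-.*bps$/) {
                    val = $i
                    sub(/^BW-[^:]*:/, "", val)   # drop the "BW-xxx:" label
                    sub(/bps$/, "", val)         # drop the unit
                    out = (out == "" ? val : out " " val)
                }
            print out
        }
    ' "$@"
}
```

With the rule 20 line from input.txt, bw_values 20 input.txt should print 15000000 10000000.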