post-decrement of NF in awk - awk

I'm a bit confused by the following:
$ echo foo bar baz | awk '{printf "%d:", NF--; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; $NF=""; NF -= 1; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; $(NF--)=""; print NF}'
3:3
I see the same behavior in awk version 20070501 (macos) and GNU Awk 4.0.2. Why does the post-decrement of NF in the 3rd case not apply? Is that behavior expected, mandated by a standard, or a quirk of the implementation?
EDIT by Ed Morton: FWIW I'd find the following a more compelling example:
$ echo foo bar baz | awk '{printf "%d:", NF; NF--; $NF=""; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; --NF; $NF=""; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; $NF=""; NF--; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; $NF=""; --NF; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; $(--NF)=""; print NF}'
3:2
$ echo foo bar baz | awk '{printf "%d:", NF; $(NF--)=""; print NF}'
3:3
with the question being why does the last example (post-decrement with assignment) behave differently from all of the other cases, regardless of which one you think it should be equivalent to.

The value of a post-decrement is the value of the variable before it's decremented. Because of this, the last case decrements NF to 2 and then assigns to the old field number (3), which adds field 3 back and updates NF to 3.
$(NF--) = "";
is equivalent to
temp = NF; # temp == 3
NF--; # NF == 2
$temp = ""; # adds a new field 3, so now NF == 3

Related

awk/gawk array concatenation into string fail

I want to concatenate an array into a string in gawk.
I have the array a [ ABC, DEF, GHI ]
and want a string like "ABC DEF GHI".
Here I simplified the pipe, feeding it with 'echo' for demonstration purposes.
Test... working OK (no concatenation, only output):
#(echo ABC & echo DEF & echo GHI) | gawk '{a[$0]++} END { for( f in a ) print f; }'
Now try to concatenate:
#(echo ABC & echo DEF & echo GHI) | gawk '{a[$0]++} END { for( f in a ) b=(b f); print b; }'
It just returns the 1st string in the array 'a'...
Any ideas on how to do this?
If you're trying to print the concatenated values of a Unix shell array such as created by:
$ a=(ABC DEF GHI)
then just do:
$ echo "${a[*]}"
ABC DEF GHI
If you want to concatenate multiple input lines as produced by:
$ printf '%s\n' "${a[@]}"
ABC
DEF
GHI
with awk for some reason, then there are several options, including:
$ printf '%s\n' "${a[@]}" | awk '{printf "%s%s", (NR>1 ? OFS : ""), $0} END{print ""}'
ABC DEF GHI
$ printf '%s\n' "${a[@]}" | awk '{b=(NR>1 ? b OFS : "") $0} END{print b}'
ABC DEF GHI
$ printf '%s\n' "${a[@]}" | awk '{a[NR]=$0} END{for (i=1; i<=NR; i++) b=(i>1 ? b OFS : "") a[i]; print b}'
ABC DEF GHI
$ printf '%s\n' "${a[@]}" | awk '{a[NR]=$0} END{for (i=1; i<=NR; i++) printf "%s%s", (i>1 ? OFS : ""), a[i]; print ""}'
ABC DEF GHI
If none of that is what you're trying to do then please edit your question to clarify your requirements.

print part of a field with awk

I run:
ss -atp | grep -vi state | awk '{ print $2" "$3" "$4" "$5" "$6 }'
output:
0 0 192.168.1.14:49254 92.222.106.156:http users:(("firefox-esr",pid=696,fd=95))
From the last column, I want to strip everything but firefox-esr (in this case); more precisely, I want to fetch only what's between the double quotes.
I have tried:
ss -atp | grep -vi state | awk '{ sub(/users\:\(\("/,"",$6); print $2" "$3" "$4" "$5" "$6 }'
0 0 192.168.1.14:49254 92.222.106.156:http firefox-esr",pid=696,fd=95))
There is still the last part to strip; the problem is that the pid and fd are not constant values and keep changing.
You might harness gensub's backreference ability for that. For simplicity, let file.txt's content be
users:(("firefox-esr",pid=696,fd=95))
then
awk '{print gensub(/.*"(.+)".*/,"\\1",1,$1)}' file.txt
outputs:
firefox-esr
Keep in mind that gensub does not alter the string it gets as its 4th argument, but returns a new string, so I print it.
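Applied to the original ss pipeline rather than to a one-field file, a sketch of the same idea might look like this (assumptions: GNU awk, since gensub is a gawk extension; the quoted process name is the only double-quoted part of field 6; and field 6 contains no embedded spaces):
ss -atp | grep -vi state | awk '{ $6 = gensub(/.*"(.+)".*/, "\\1", 1, $6); print $2" "$3" "$4" "$5" "$6 }'
For the sample line above this should print:
0 0 192.168.1.14:49254 92.222.106.156:http firefox-esr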
You can use
awk '{ gsub(/^[^\"]*\"|\".*/, "", $6); print $2" "$3" "$4" "$5" "$6 }'
Here, gsub(/^[^\"]*\"|\".*/, "", $6) takes field 6 as input and removes all chars from the start up to and including the first " (the ^[^\"]*\" part), and then the next " together with all text after it (the \".* part).
See this online awk demo:
s='0 0 0 192.168.1.14:49254 92.222.106.156:http users:(("firefox-esr",pid=696,fd=95))'
awk '{gsub(/^[^\"]*\"|\".*/, "",$6); print $2" "$3" "$4" "$5" "$6 }' <<< "$s"
# => 0 0 192.168.1.14:49254 92.222.106.156:http firefox-esr

Awk output formatting

I have 2 .po files in which some words have 2 different meanings,
and I want to use awk to turn them into some kind of translator.
For example
in .po file 1
msgid "example"
msgstr "something"
in .po file 2
msgid "example"
msgstr "somethingelse"
I came up with this
awk -F'"' 'match($2, /^example$/) {printf "%s", $2": ";getline; printf "%s", $2}' file1.po file2.po
The output will be
example:something example:somethingelse
How do I make it into this kind of format
example : something, somethingelse.
Reformatting
example:something example:somethingelse
into
example : something, somethingelse
can be done with this one-liner:
awk -F":| " -v OFS="," '{printf "%s:", $1; for (i=1;i<=NF;i++) if (i % 2 == 0)printf("%s%s%s", ((i==2)?"":OFS), $i, ((i==NF)?"\n":""))}'
Testing:
$ echo "example:something example:somethinelse example:something3 example:something4" | \
awk -F":| " -v OFS="," '{ \
printf "%s:", $1; \
for (i=1;i<=NF;i++) \
if (i % 2 == 0) \
printf("%s%s%s", ((i==2)?"":OFS), $i, ((i==NF)?"\n":""))}'
example:something,somethingelse,something3,something4
Explanation:
$ cat tst.awk
BEGIN{FS=":| ";OFS=","} # define field sep and output field sep
{ printf "%s:", $1 # print header line "example:"
for (i=1;i<=NF;i++) # loop over all fields
if (i % 2 == 0) # we're only interested in all "even" fields
printf("%s%s%s", ((i==2)?"":OFS), $i, ((i==NF)?"\n":""))
}
But you could have done the whole thing in one go with something like this:
$ cat tst.awk
BEGIN{OFS=","} # set output field sep to ","
NF{ # if NF (i.e. number of fields) > 0
# - to skip empty lines -
if (match($0,/msgid "(.*)"/,a)) id=a[1] # if line matches 'msgid "something"',
# set "id" to "something"
if (match($0,/msgstr "(.*)"/,b)) str=b[1] # same here for 'msgstr'
if (id && str){ # if both "id" and "str" are set
r[id]=(id in r)?r[id] OFS str:str # save "str" in array r with index "id".
# if index "id" already exists,
# add "str" preceded by OFS (i.e. "," here)
id=str=0 # reset "id" and "str" for the next pair
}
}
END { for (i in r) printf "%s : %s\n", i, r[i] } # print array "r"
and call this like:
awk -f tst.awk *.po
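Note that the three-argument form of match() used above is a GNU awk extension. If you need this to run on a non-gawk awk, a portable sketch along the same lines (assuming the msgid/msgstr lines look exactly like the samples above) could be:
BEGIN { OFS = "," }
/^msgid /  { id = $0; sub(/^msgid "/, "", id); sub(/"$/, "", id) }      # strip the surrounding quotes from the msgid
/^msgstr / { str = $0; sub(/^msgstr "/, "", str); sub(/"$/, "", str)    # same for the msgstr
             r[id] = (id in r) ? r[id] OFS str : str }                  # append this msgstr to the list for its msgid
END { for (i in r) printf "%s : %s\n", i, r[i] }
It is called the same way (awk -f tst.awk *.po) and produces the same "id : str1,str2" lines in END.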
$ awk -F'"' 'NR%2{k=$2; next} NR==FNR{a[k]=$2; next} {print k" : "a[k]", "$2}' file1 file2
example : something, somethingelse
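For readability, here is my reading of that one-liner spread out over multiple lines with comments (same logic, just uncondensed):
awk -F'"' '
NR%2    { k = $2; next }              # odd lines are msgid lines: remember the key
NR==FNR { a[k] = $2; next }           # even lines of the first file: save its msgstr under that key
        { print k " : " a[k] ", " $2 }  # even lines of the second file: print key plus both msgstr values
' file1 file2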

awk: aggregate several lines in only one based on a field value

I would like to aggregate values in a file based on a specific field value which acts as a kind of group attribute. The resulting file should have one line per group.
MWE:
$ head -n4 foo
X;Y;OID;ID;OQTE;QTE;OTYPE;TYPE;Z
603.311;800.928;930;982963;0;XTX;49;comment;191.299
603.512;810.700;930;982963;0;XTX;49;comment;191.341
604.815;802.475;930;982963;0;XTX;49;comment;191.393
601.901;858.701;122;982954;0;XTX;50;comment;194.547
601.851;832.317;122;982954;0;XTX;50;comment;193.733
There are two groups here: 982963 and 982954.
Target:
$ head -n2 bar
CODE;OID;ID;OQTE;QTE;OTYPE;TYPE
"FLW (603.311 800.928 191.299, 603.512 801.700 191.341, 604.815 802.475 191.393)";982963;0;XTX;49;comment
"FLW (601.901 858.701 194.547, 601.851 832.317 193.733)";982954;0;XTX;49;comment
The group field is the 4th field of the foo file. All others may vary.
The X, Y and Z values of each record composing the group should be stored within the FLW parentheses, in the same order as they appear in the input lines.
I've tried many things and, as I'm absolutely not an awk expert yet, this kind of code doesn't work at all:
awk -F ";" 'NR==1 {print "CODE;"$3";"$4";"$5";"$6";"$7";"$8}; NR>1 {a[$4]=a[$4]}END{for(i in a) { print "\"FLW ("$1","$2","$NF")\";"$3";"i""a[i]";"$5";"$6";"$7";"$8 }}' foo
Try:
$ awk -F ";" 'NR==1 {print "CODE;"$3";"$4";"$5";"$6";"$7";"$8}; NR>1 {a[$4]=$5";"$6";"$7";"$8; b[$4]=(b[$4]?b[$4]", ":"")$1" "$2" "$NF;}END{for(i in a) printf "\"FLW (%s)\";%s;%s\n", b[i], i, a[i]}' foo
CODE;OID;ID;OQTE;QTE;OTYPE;TYPE
"FLW (601.901 858.701 194.547, 601.851 832.317 193.733)";982954;0;XTX;50;comment
"FLW (603.311 800.928 191.299, 603.512 810.700 191.341, 604.815 802.475 191.393)";982963;0;XTX;49;comment
Or, as spread out over multiple lines:
awk -F ";" '
NR==1 {
print "CODE;"$3";"$4";"$5";"$6";"$7";"$8
}
NR>1 {
a[$4]=$5";"$6";"$7";"$8
b[$4]=(b[$4]?b[$4]", ":"")$1" "$2" "$NF
}
END{
for(i in a)
printf "\"FLW (%s)\";%s;%s\n", b[i], i, a[i]
}
' foo
Alternate styles
For one, we can replace ";" with FS:
awk -F";" 'NR==1 {print "CODE;"$3 FS $4 FS $5 FS $6 FS $7 FS $8}; NR>1 {a[$4]=$5 FS $6 FS $7 FS $8; b[$4]=(b[$4]?b[$4]", ":"")$1" "$2" "$NF;}END{for(i in a) printf "\"FLW (%s)\";%s;%s\n", b[i], i, a[i]}' foo
For another, the first print can also be replaced with a printf:
awk -F";" 'NR==1 {printf "CODE;%s;%s;%s;%s;%s;%s",$3,$4,$5,$6,$7,$8}; NR>1 {a[$4]=$5 FS $6 FS $7 FS $8; b[$4]=(b[$4]?b[$4]", ":"")$1" "$2" "$NF;}END{for(i in a) printf "\"FLW (%s)\";%s;%s\n", b[i], i, a[i]}' foo
Variation
If, as per the comments, the group field is the third, not the fourth, then:
awk -F";" 'NR==1 {print "CODE;"$3 FS $4 FS $5 FS $6 FS $7 FS $8}; NR>1 {a[$3]= $4 FS $5 FS $6 FS $7 FS $8; b[$3]=(b[$3]?b[$3]", ":"")$1" "$2" "$NF;}END{for(i in a) printf "\"FLW (%s)\";%s;%s\n", b[i], i, a[i]}'

awk improve command - Count & Sum

I would like your suggestions for improving this command; I want to remove unneeded processing to reduce the run time.
I am trying to find CountOfLines and SumOf$6 grouped by $2, substr($3,4,6), substr($4,4,6), $10, $8, $6.
The gzipped input file contains around 300 million rows.
Input.gz
2067,0,09-MAY-12.04:05:14,09-MAY-12.04:05:14,21-MAR-16,600,INR,RO312,20120321_1C,K1,,32
2160,0,26-MAY-14.02:05:27,26-MAY-14.02:05:27,18-APR-18,600,INR,RO414,20140418_7,K1,,30
2160,0,26-MAY-14.02:05:27,26-MAY-14.02:05:27,18-APR-18,600,INR,RO414,20140418_7,K1,,30
2160,0,26-MAY-14.02:05:27,26-MAY-14.02:05:27,18-APR-18,600,INR,RO414,20140418_7,K1,,30
2104,5,13-JAN-13.01:01:38,,13-JAN-17,4150,INR,RO113,CD1301_RC50_B1_20130113,K2,,21
I am using the command below and it works fine.
zcat Input.gz | awk -F"," '{OFS=","; print $2,substr($3,4,6),substr($4,4,6),$10,$8,$6}' | \
awk -F"," 'BEGIN {count=0; sum=0; OFS=","} {key=$0; a[key]++;b[key]=b[key]+$6} \
END {for (i in a) print i,a[i],b[i]}' >Output.txt
Output.txt
0,MAY-14,MAY-14,K1,RO414,600,3,1800
0,MAY-12,MAY-12,K1,RO312,600,1,600
5,JAN-13,,K2,RO113,4150,1,4150
Any suggestions to improve the above command are welcome.
This seems more efficient:
zcat Input.gz | awk -F, '{key=$2","substr($3,4,6)","substr($4,4,6)","$10","$8","$6;++a[key];b[key]=b[key]+$6}END{for(i in a)print i","a[i]","b[i]}'
Output:
0,MAY-14,MAY-14,K1,RO414,600,3,1800
0,MAY-12,MAY-12,K1,RO312,600,1,600
5,JAN-13,,K2,RO113,4150,1,4150
Uncondensed form:
zcat Input.gz | awk -F, '{
key = $2 "," substr($3, 4, 6) "," substr($4, 4, 6) "," $10 "," $8 "," $6
++a[key]
b[key] = b[key] + $6
}
END {
for (i in a)
print i "," a[i] "," b[i]
}'
You can do this with one awk invocation by redefining the fields according to the first awk script, i.e. something like this:
$1 = $2
$2 = substr($3, 4, 6)
$3 = substr($4, 4, 6)
$4 = $10
$5 = $8
No need to change $6 as that is the same field. Now if you base the key on the new fields, the second script will work almost unaltered. Here is how I would write it, moving the code into a script file for better readability and maintainability:
zcat Input.gz | awk -f parse.awk
Where parse.awk contains:
BEGIN {
FS = OFS = ","
}
{
$1 = $2
$2 = substr($3, 4, 6)
$3 = substr($4, 4, 6)
$4 = $10
$5 = $8
key = $1 OFS $2 OFS $3 OFS $4 OFS $5 OFS $6
a[key]++
b[key] += $6
}
END {
for (i in a)
print i, a[i], b[i]
}
You can of course still run it as a one-liner, but it will look more cryptic:
zcat Input.gz | awk '{ key = $2 FS substr($3,4,6) FS substr($4,4,6) FS $10 FS $8 FS $6; a[key]++; b[key]+=$6 } END { for (i in a) print i,a[i],b[i] }' FS=, OFS=,
Output in both cases:
0,MAY-14,MAY-14,K1,RO414,600,3,1800
0,MAY-12,MAY-12,K1,RO312,600,1,600
5,JAN-13,,K2,RO113,4150,1,4150