AWK: How to suppress default print
The following awk if statement always prints $0. How can I stop it from doing so?
( nodeComplete && count )
{
#print $0
#print count;
for (i = 0; i < count; i++) {print array1[i];};
nodeComplete=0;
count=0;
}
Welcome to SO. Try moving the opening brace { onto the same line as the condition, and let me know if this helps.
( nodeComplete && count ){
#print $0
#print count;
for (i = 0; i < count; i++) {print array1[i];};
nodeComplete=0;
count=0;
}
Explanation of above change:
The logic is simple: a { on the same line as the condition means the
statements in the block are executed only when the condition is true. If
you put the { on the next line, awk sees two separate rules: the condition
alone, with no action, and a block with no condition. A condition with no
action prints the complete line ($0) whenever it is TRUE, and a block with
no condition runs for every line.
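The difference can be seen with a small test. This is only a sketch with made-up input; the variable flag is hypothetical and stands in for the condition ( nodeComplete && count ).

```shell
# Brace on the next line: "flag" is a pattern with no action (default: print $0),
# and the block below it is a separate rule that runs for every record.
split_out=$(printf 'one\ntwo\n' | awk '
{ flag = (NR == 2) }
flag
{ print "block ran" }')

# Brace on the same line: the block runs only when the pattern is true,
# and nothing is printed by default.
joined_out=$(printf 'one\ntwo\n' | awk '
{ flag = (NR == 2) }
flag {
    print "block ran"
}')

printf '%s\n---\n%s\n' "$split_out" "$joined_out"
```

In the first form the line "two" is echoed and the block runs twice; in the second form the block runs once and no input line is echoed.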
I'm trying to compute some stuff in awk, and at the end print the result in the order of the input. For each line, I check if it has not been already seen. If not, I add it to the array and also store it in an order array.
{
if (! $0 in seen) {
seen[$0] = 1
order[o++] = $0
}
} END {
for (i=0; i<o; i++)
printf "%s\n", order[i]
}
You can try it with
printf 'a\nb\na\nc\nb\na\n' | awk script_above
It prints nothing. If I print the variable o at the end, it shows that its value is still 0. What am I doing wrong?
You just need to add parens to get the right operator precedence*:
# a.awk
{
if (!($0 in seen)) {
seen[$0] = 1
order[o++] = $0
}
}
END {
for (i=0; i<o; i++)
printf "%s\n", order[i]
}
Test:
$ awk -f a.awk file
a
b
c
* (The unary ! binds more tightly than the in operator: https://www.gnu.org/software/gawk/manual/html_node/Precedence.html)
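A quick way to see the effect of the precedence, as a sketch on made-up input (the counter n is only there for illustration): without parentheses the test is read as (!$0) in seen, which asks whether the index 0 exists in the array, so nothing is ever counted.

```shell
# Without parentheses: parsed as (!$0) in seen, never true for this input.
unparenthesized=$(printf 'a\nb\na\n' |
    awk '{ if (! $0 in seen) n++ } END { print n + 0 }')

# With parentheses: true the first time each distinct line appears.
parenthesized=$(printf 'a\nb\na\n' |
    awk '{ if (!($0 in seen)) { seen[$0] = 1; n++ } } END { print n + 0 }')

printf 'without parens: %s\nwith parens: %s\n' "$unparenthesized" "$parenthesized"
```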
You are doing this the shell way. awk has an idiom that checks whether an element has already been seen and records it in one step; try the following.
printf 'a\nb\na\nc\nb\na\n' | awk '
!seen[$0]++ {
order[o++] = $0
}
END {
for (i=0; i<o; i++)
printf "%s\n", order[i]
}'
Here !seen[$0]++ checks whether the count stored for the current line in the array seen is still zero, i.e. the line has NOT been seen before; if so, awk goes inside the BLOCK (where your next statements are provided). The ++ then increments that line's counter by 1, so that the next time the same line comes along the !seen[$0]++ condition is NOT TRUE.
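When the first occurrence of each line only needs to be printed, the condition can be used on its own, because a condition without a block prints the line by default. A minimal sketch using the same sample input:

```shell
# Print each distinct line once, in order of first appearance.
unique=$(printf 'a\nb\na\nc\nb\na\n' | awk '!seen[$0]++')
printf '%s\n' "$unique"
```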
Here's my program with everything unrelated to the issue taken out:
BEGIN{
count = 0
total = 0
FS = ","
}
{
for(i=1; i<10; i++)
count += $i;
total += count
count = 0
}
END{ print(total) }
The total, when it prints out, comes out as a very large negative number
-2519999999999999782145076764868608
when I'm expecting a positive number.
How would I go about fixing this? I don't think it's a concatenation issue because there are more values in the csv than in the printed out number.
Okay, I got it.
Instead of using += on the total, I'm doing
for(i=0; i<count; i++)
total++
It may not be the prettiest, but it gets the right answer!
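For comparison, here is a sketch of the same summation without the intermediate count variable, run on made-up CSV rows and assuming the first nine fields are plain integers. It does not explain the negative value above, which depends on the actual data.

```shell
total=$(printf '1,2,3,4,5,6,7,8,9\n10,20,30,40,50,60,70,80,90\n' | awk -F, '
{
    for (i = 1; i < 10; i++)
        total += $i          # add each of the first nine fields directly
}
END { printf "%d\n", total }')
printf '%s\n' "$total"
```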
I have seen numerous posts on how to do this for an individual field, but I am struggling to apply it to multiple fields separately.
input:
group1|apple|orange|lemon
group1|apple|kiwi|banana
group1|orange|cherry|lemon
group1|apple|orange|pear
(The real file has many more fields, so I need to use a loop to process each field.)
output:
Field|Fruit|Count
2|apple|3
2|orange|1
3|orange|2
3|kiwi|1
3|cherry|1
4|lemon|2
4|banana|1
4|pear|1
What I have tried so far, which returns the total count across all the fields:
awk '
BEGIN{FS=OFS="|"; print "Field|Fruit|Count"}
{
for(i=2; i<=NF; i++){
a[$i]=$i
count[$i]++
}
}
END{
for(j in count) print j OFS count[j]
}'
Use the field number as part of the key in the count array.
awk '
BEGIN{FS=OFS="|"; print "Field|Fruit|Count"}
{
for (i = 2; i <= NF; i++) {
count[i OFS $i]++;
}
}
END {
for (j in count) {
print j, count[j];
}
}'
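As a check, here is a sketch that runs this approach on the sample input (with the stray space before "lemon" removed). The order of for (j in count) is unspecified in awk, so the output is piped through sort here; that ordering is an addition for the demonstration, not part of the answer.

```shell
counts=$(printf '%s\n' \
    'group1|apple|orange|lemon' \
    'group1|apple|kiwi|banana' \
    'group1|orange|cherry|lemon' \
    'group1|apple|orange|pear' |
awk '
BEGIN { FS = OFS = "|" }
{
    for (i = 2; i <= NF; i++)
        count[i OFS $i]++      # key is "field|value", e.g. "2|apple"
}
END {
    for (key in count)
        print key, count[key]
}' | LC_ALL=C sort)
printf '%s\n' "$counts"
```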
awk 'BEGIN
{
INPUTFILE ='XXX'; iterator =0;
requestIterator =0;
storageFlag =T;
printFlag =F;
currentIteration =F;
recordCount =1;
while (getline < "'"$INPUTFILE"'")
{
requestArray[requestIterator]++;
requestIterator++;
}
}
if ($1 ~ /RequestId/)
{
FS = "=";
if($2 in requestArray)
{
storage[iterator] =$0;
printFlag =T;
next
}
else
{
storageFlag =F;
next
}
}
else
{
if((storageFlag =='T' && $0 != "EOE"))
{
storage[iterator]=$0; iterator++;
}
else {if(storageFlag == 'F')
{
next
}
else
{
if(printFlag == 'T')
{
for(details in storage)
{
print storage[details] >> FILE1;
delete storage[details];
}
printFlag =F;
storageFlag =T;
next
}
}'
I am getting some syntax errors in the above code. Could you please help me?
awk: BEGIN{INPUTFILE =XXXX;iterator =0;requestIterator =0;storageFlag =T;printFlag =F;currentIteration =F;recordCount =1;while (getline < ""){requestArray[requestIterator]++;requestIterator++;}}if ($1 ~ /RequestId/){FS = "=";if($2 in requestArray){storage[iterator] =$0;printFlag =T;next}else{storageFlag =F;next}}else{if((storageFlag ==T && $0 != EOE)){storage[iterator]=$0;iterator++;}else{if(storageFlag == F){next}else{if(printFlag == T){for(details in storage){print storage[details] >> XXXX;delete storage[details];}printFlag = F;storageFlag =T;next}}}}
awk: ^ syntax error
awk: ^ syntax error
Quotes are the problem. The first single quote in INPUTFILE ='XXX' is going to be parsed by the shell as matching the one before BEGIN, and from then on all the parsing is broken.
Either escape the quotes or just put the awk script into a separate file rather than writing it "inline".
# STARTING POINT - known bad
awk 'BEGIN { INPUTFILE ='XXX'; iterator =0; ... '
It has to be rewritten to remove all of the single quotes inside the outer pair (awk string literals use double quotes anyway):
awk 'BEGIN { INPUTFILE ="XXX"; iterator =0; ... '
Or use double quotes outside and escape the double quotes inside (the shell will then also expand any $ in the script, so those need escaping too):
awk "BEGIN { INPUTFILE =\"XXX\"; iterator =0; ... "
Or, if a literal single quote is really needed, close the outer quotes, add an escaped single quote, and reopen them; a backslash alone does not work inside single quotes:
awk 'BEGIN { INPUTFILE ="'\''XXX'\''"; iterator =0; ... '
All of your problems go away if you put the awk script into a separate file rather than inlining it in the shell. You can have whatever quotes you like and no one will care!
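Another way to keep shell quoting out of the script is to pass shell values in with awk's -v option instead of splicing them into the program text. A minimal sketch; the value requests.txt and the awk variable name inputfile are placeholders, not taken from the original script.

```shell
INPUTFILE='requests.txt'     # hypothetical value for illustration

# -v assigns the shell value to an awk variable before the program runs,
# so the awk program itself can stay inside one pair of single quotes.
result=$(awk -v inputfile="$INPUTFILE" 'BEGIN { print "reading from " inputfile }')
printf '%s\n' "$result"
```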
I have created an awk program to go through the columns of a file and count each distinct word and then output totals into separate files
awk -F"$delim" {Field_Arr1[$1]++; Field_Arr2[$2]++; Field_Arr3[$3]++; Field_Arr4[$4]++};
END{\
# output fields
out_field1="top_field1"
out_field2="top_field2"
out_field3="top_field3"
out_field4="top_field4"
for( i=1; i <= NF; i++)
{
for (element in Field_Arr$i)
{
print element"\t"Field_Arr$i[element] >>out_field$i;
}
}
}' inputfile
but I don't know the appropriate syntax, so that the for loop will iterate through Field_Arr1, Field_Arr2, Field_Arr3, Field_Arr4?
I have tried using: i, $i, ${i}, {i}, "$i", and "i".
Am I trying the wrong approach or is there a way to change Field_Arr$i to Field_Arr1..4?
Thanks for the advice.
awk variables don't work that way; you'll have to do them individually by name, or use fake multidimensional arrays and parse out the components, something along the lines of:
{Field_Arr[1, $1]++; Field_Arr[2, $2]++; Field_Arr[3, $3]++; Field_Arr[4, $4]++}
END {
for (elt in Field_Arr) {
split(elt, ec, SUBSEP)
print ec[2] "\t" Field_Arr[elt] >> ("top_field" ec[1])
}
}
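To see what the split on SUBSEP yields, here is a sketch of the same idea on made-up two-column input. It prints to standard output instead of the top_field files and sorts the result, since the order of for (elt in Field_Arr) is unspecified.

```shell
freq=$(printf 'x y\nx z\nw y\n' | awk '
{ Field_Arr[1, $1]++; Field_Arr[2, $2]++ }
END {
    for (elt in Field_Arr) {
        split(elt, ec, SUBSEP)     # ec[1] = column number, ec[2] = word
        print ec[1] "\t" ec[2] "\t" Field_Arr[elt]
    }
}' | LC_ALL=C sort)
printf '%s\n' "$freq"
```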
To count the frequencies for each column (3 in my example), try this
# Print list of word frequencies
function p_array(t,a) {
print t
for (i in a) {
print i, a[i]
}
}
{
c1[$1]++
c2[$2]++
c3[$3]++
}
END {
p_array("1st col",c1)
p_array("2nd col",c2)
p_array("3rd col",c3)
}