Number of fields returned by awk

Is there a way to get awk to return the number of fields a record splits into for a given field separator? Say, for instance, my file contains
a b c d
so, awk --field-separator=" " | <something> should return 4

The NF variable is set to the total number of fields in the input record. So:
echo "a b c d" | awk --field-separator=" " "{ print NF }"
will display
4
Note, however, that:
echo -e "a b c d\na b" | awk --field-separator=" " "{ print NF }"
will display:
4
2
Hope this helps, and happy awking
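Since NF is an ordinary variable, it also works in a pattern. As a small extra sketch, this keeps only the records that split into exactly four fields:
echo -e "a b c d\na b" | awk 'NF == 4 { print }'
which prints just a b c d.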

NF gives the number of fields for a given record:
[]$ echo "a b c d" | gawk '{print NF}'
4

If you want to know the set of distinct field counts in multi-line content, you can run:
X | awk '{print NF}' | sort -n | uniq
where X is a command that writes content to standard output (cat, echo, etc.). Example:
With file.txt:
a b
b c
c d
e t a
e u
The command cat file.txt | awk '{print NF}' | sort -n | uniq will print:
2
3
And with file2.txt:
a b
b c
c d
e u
The command cat file2.txt | awk '{print NF}' | sort -n | uniq will print:
2
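If you prefer to stay inside awk, the sort | uniq pair can be folded into the script itself. A sketch using an associative array (note that, unlike sort -n, the for (n in counts) iteration order is unspecified):
awk '{ counts[NF]++ } END { for (n in counts) print n }' file.txt
Equivalently, sort -n | uniq in the original pipeline can be shortened to sort -nu.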

awk(1) on FreeBSD does not recognize --field-separator. Use -v instead:
echo "a b c d" | awk -v FS=" " "{ print NF }"
It is a portable, POSIX way to define the field separator.
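For completeness, the short -F option is also specified by POSIX and is the more common spelling:
echo "a b c d" | awk -F' ' '{ print NF }'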

How can I use awk to calculate sum and replace column in file

I'm new to the site and to the programming world and I hope you have time to help me.
My problem is as follows: I have a file with several columns. In the 2nd column there are values. I'm trying to add a given number to each value and to replace the second column with a new column containing the results of the sum.
Here an example of my input:
A B C
x 1 t
y 2 u
z 3 v
I want to add 5 to the values in the B column and obtain an output like the one below:
A B C
x 6 t
y 7 u
z 8 v
The code I tried unsuccessfully is:
zcat my_file.vcf.gz | tail -n +49 | awk 'BEGIN{FS=OFS="\t"} {print $0, $2+5}' > my.output.vcf
Thanks in advance
We can avoid tail entirely, since printing from the 49th line onward can be handled within awk itself. You also need to assign the new value to the 2nd field and then print the whole line with the print command.
One important point: per the OP's sample, if the 2nd field contains alphabetic characters (the header row), 5 should NOT be added to it, so that condition is taken care of here too.
zcat my_file.vcf.gz |
awk '
BEGIN { FS = OFS = "\t" }
FNR >= 49 {
    $2 = ($2 ~ /[a-zA-Z]/ ? $2 : $2 + 5)
    print
}
' > my.output.vcf
You can use
awk 'BEGIN{FS=OFS="\t"} {$2+=5}1'
Here, $2+=5 adds 5 to the Field 2 value, and the 1 pattern triggers the default action of printing the record (row, line, same as print $0).
See an awk demo (the fields in $s are tab-separated):
#!/bin/bash
s='A B C
x 1 t
y 2 u
z 3 v'
awk 'BEGIN{FS=OFS="\t"} {$2+=5}1' <<< "$s"
Output (the header cell B becomes 5 because awk treats a non-numeric string as 0 in arithmetic):
A 5 C
x 6 t
y 7 u
z 8 v
Another form for clarity:
awk 'BEGIN{FS=OFS="\t"} {print $1, $2+5, $3}'
You can use:
awk 'BEGIN {FS=OFS="\t"} NR == 1 {print $0} NR > 1 {print $1,($2+5),$3;}'
output:
A B C
x 6 t
y 7 u
z 8 v
Maybe this can help you:
cat file | awk '{if (NR > 1 && $2 = ($2+5))
                     print $0;
                 else print $0;}'
Note that $2 = ($2+5) is an assignment inside the condition: for data rows the field is rewritten before printing, while for the header row NR > 1 fails first (short-circuit), so the assignment never runs and the else branch prints it unchanged.
Applied to your code:
zcat my_file.vcf.gz | tail -n +49 | awk '{if (NR > 1 && $2 = ($2+5)) print $0; else print $0;}' > my.output.vcf
cat boo
A B C
x 1 t
y 2 u
z 3 v
cat boo | awk 'BEGIN{FS=OFS="\t"} $2 ~ /^[0-9]+$/ {print $1, $2+5, $3} $2 !~ /^[0-9]+$/ {print} '
A B C
x 6 t
y 7 u
z 8 v

How to filter empty line with a 'cut' command?

I have a tab delimited file with a few fields:
f1 f2 f3
a b c
a c
d e
f g a
I want to extract the 3rd column with a 'cut' command:
cut -f3 t
This works. However, how can I filter out the empty lines in the output? As can be seen, the 2nd and 3rd lines are empty after extraction.
To remove empty output:
$ cut -f3 file | grep .
f3
c
a
Or:
$ awk -F'\t' '$3 {print $3}' file
f3
c
a
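Two more equivalent filters, for those who prefer sed or awk's NF test (a side note: the $3 {print $3} form above would also drop a field whose value is exactly 0, since awk treats that as false; the grep . form keeps it):
$ cut -f3 file | sed '/^$/d'
$ cut -f3 file | awk 'NF'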
To replace the missing output with a filler:
$ awk -F'\t' '{if ($3) print $3; else print "FILL"}' file
f3
c
FILL
FILL
a
Or, for people who like the more compact ternary statement:
$ awk -F'\t' '{print ($3?$3:"FILL")}' file
f3
c
FILL
FILL
a
Example with multiple words in field 3
$ cat file2
f1 f2 f3
f g a b c d
$ cut -f3 file2 | grep .
f3
a b c d
$ awk -F'\t' '$3 {print $3}' file2
f3
a b c d

need to rearrange and sum column in solaris command

I have the below data in a file named atp.csv:
Date_Time,M_ID,N_ID,Status,Desc,AMount,Type
2015-01-05 00:00:00 076,1941321748,BD9010423590206,200,Transaction Successful,2000,PRETOP
2015-01-05 00:00:00 077,1941323504,BD9010423590207,351,Transaction Successful,5000,PRETOP
2015-01-05 00:00:00 078,1941321743,BD9010423590205,200,Transaction Successful,1500,PRETOP
2015-01-05 00:00:00 391,1941323498,BD9010500000003,200,Transaction Successful,1000,PRETOP
I want to count by status using the below command:
cat atp.csv | awk -F',' '{print $4}' | sort | uniq -c
The output is like below:
3 200
1 351
But I want output like the below, and I also want to sum the amount column by status:
200,3,4500
351,1,5000
That is, the status comes first and then the count value. Please help.
AWK has associative arrays.
% cat atp.csv | awk -F, 'NR>1 {n[$4]+=1;s[$4]+=$6;} END {for (k in n) { print k "," n[k] "," s[k]; }}' | sort
200,3,4500
351,1,5000
In the above:
The first record (the header line) is skipped with NR>1.
n[k] is the number of occurrences of key k (so we add 1), and s[k] is the running sum of the values in field 6 (so we add $6).
Finally, after all records are processed (END), you iterate over the associative arrays by key (for (k in n) { ... }) and print each key together with its values from n and s.
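The same one-liner, expanded into a commented script for readability. This is just a sketch: save it as, say, sum.awk (the file name is for illustration) and run awk -F, -f sum.awk atp.csv | sort.
NR > 1 {            # skip the CSV header line
    n[$4] += 1      # per-status record count
    s[$4] += $6     # per-status running sum of the Amount field
}
END {
    for (k in n)
        print k "," n[k] "," s[k]
}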
You can also try this awk version:
awk -F',' '{print $4,",", a[$4]+=$6}' FileName | sort -r | uniq -cw 6 | sort -r
Output:
3 200 , 4500
1 351 , 5000
Another Way:
awk -F',' '{print $4,",", a[$4]+=$6}' FileName | sort -r | uniq -cw 6 |sort -r | sed 's/\([^ ]\+\).\([^ ]\+\).../\2,\1,/'
All in (g)awk
awk -F, 'NR>1{a[$4]++;b[$4]+=$6}
END{n=asorti(a,c);for(i=1;i<=n;i++)print c[i]","a[c[i]]","b[c[i]]}' file
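Note that asorti is a gawk extension. If only a stock awk is available (the question mentions Solaris), a portable route is to sort outside awk, for example:
awk -F, 'NR>1{n[$4]++; s[$4]+=$6} END{for (k in n) print k "," n[k] "," s[k]}' atp.csv | sort -t, -k1,1n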

linux/ubuntu awk match unique values (instead of bash "sort unique grep" unique values)

My command looks like this:
cut -f 1 dummy_FILE | sort | uniq -c | awk '{print $2}' | for i in $(cat -); do grep -w $i dummy_FILE |
awk -v VAR="$i" '{distance+=$3-$2} END {print VAR, distance}'; done
cat dummy_FILE
Red 13 14
Red 39 46
Blue 45 23
Blue 34 27
Green 31 73
I want to:
For every word in $1 dummy_FILE (Red, Blue, Green) - Calculate sum of differences between $3 and $2.
To get the output like this:
Red 8
Blue -29
Green 42
My questions are:
Is it possible to replace cut -f 1 dummy_FILE | sort | uniq -c | awk '{print $2}'?
I am using sort | uniq -c to extract every word from the dataset - is it possible to do it with awk?
How can I overcome useless cat in for i in $(cat -)?
grep -w $i dummy_FILE works fine, but I want to replace it with awk (should I?); If so how can I do this?
When I try awk -v VAR="$i" '/^VAR/ '{distance+=$3-$2} END {print VAR, distance}' I get "fatal: division by zero attempted".
I got it using:
awk '{a[$1] = a[$1] + $3 - $2;} END{for (x in a) {print x" "a[x];}}' dummy_FILE
Output:
Blue -29
Green 42
Red 8
If you want to sort the output, just append sort after the AWK command.
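A note on the last bullet of the question: the immediate error most likely comes from the unbalanced quotes, but even with the quoting fixed, /^VAR/ matches the literal text VAR, not the variable's value. To select rows on an awk variable, compare a field against it explicitly. A sketch for a single color (the name color is chosen here for illustration):
awk -v color="Red" '$1 == color { d += $3 - $2 } END { print color, d }' dummy_FILE
which prints Red 8.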
Here's one way using awk:
awk '{ a[$1]=a[$1] + $3 - $2 } END { for(i in a) print i, a[i] }' dummy
Results:
Red 8
Blue -29
Green 42
If you require sorted output, you could simply pipe into sort like arutaku suggests:
awk '{ a[$1]=a[$1] + $3 - $2 } END { for(i in a) print i, a[i] }' dummy | sort
You can, however, pipe print into sort within the awk statement, like this:
awk '{ a[$1]=a[$1] + $3 - $2 } END { for(i in a) print i, a[i] | "sort" }' dummy

How do I print a range of data in awk?

I am reviewing my access_logs with a statement like:
cat access_log | grep 16/Sep/2012:17 | awk '{print $12 $13 $14 $15 $16}' | sort | uniq -c | sort -n | tail -40
The purpose is to see the user agent of anyone that has been hitting my server for the last hour, sorted by number of hits. My server has unusual activity, so I want to stop any unwanted spiders, etc.
But instead of the part awk '{print $12 $13 $14 $15 $16}' I would much prefer something like awk '{print $12-through-end-of-line}', so that I could see the whole user agent, which is a different length for each one.
Is there a way to do this with awk?
Not extremely elegant, but this works:
grep 16/Sep/2012:17 access_log | awk '{for (i=12;i<=NF;++i) printf "%s ",$i;print ""}'
It has the side effect of condensing multiple spaces between fields down to one, and putting an extra one at the end of the line, though, which probably isn't critical.
I've never found one; in situations like this, I use cut (assuming I don't need awk's flexible handling of field separation):
# Assuming tab-separated fields, cut's default
grep 16/Sep/2012:17 access_log | cut -f12- | sort | uniq -c | sort -n | tail -40
# For space-separated fields (single spaces, not arbitrary amounts of whitespace)
grep 16/Sep/2012:17 access_log | cut -d' ' -f12- | sort | uniq -c | sort -n | tail -40
(Clarification: I've never found a good way. I've used twalberg's for-loop when necessary, but prefer using cut if possible.)
$ echo somefields:; cat somefields; echo from-to.awk:; \
cat from-to.awk; echo; awk -f from-to.awk somefields
somefields:
a b c d e f g h i j k l m n o p q r s t u v w x y z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
from-to.awk:
{ for (i=12; i<=NF; i++) { printf "%s ", $i }; print "" }
l m n o p q r s t u v w x y z
12 13 14 15 16 17 18 19 20 21
from man awk:
NF The number of fields in the current input record.
So you basically loop through fields (separated by spaces) from 12 to the last one.
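Another idiom that preserves the original spacing is to print everything from the position where $12 begins, using index and substr. A sketch, with the caveat that index finds the first occurrence of $12's text, so it can misfire if the same text also appears earlier in the line:
awk '{ print substr($0, index($0, $12)) }' access_log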
Why not
#!/bin/bash
awk "/$1/"'{for (i=12;i<=NF;i++) printf("%s ", $i) ;printf "\n"}' log | sort | uniq -c | sort -n | tail -40
in a script file.
Then you can call it like
myMonitor.sh 16/Sep/2012:17
I don't have a way to test this right now. Apologies for any formatting/syntax errors.
Hopefully you get the idea.
IHTH
awk '/16\/Sep\/2012:17/{for(i=1;i<12;i++){$i="";}print}' access_log | sort | uniq -c | sort -n | tail -40