Using awk minus is not working although sum works fine? - awk

Any idea what I'm doing wrong with this statement? For minus I replaced += with -=.
The idea is to sum or subtract matching rows. Sum works fine, but minus just puts a "-" sign before the value instead of subtracting.
awk '{for (i=2;i<=NF;i++) {a[$1][i]+=$i}} END{ for (j in a) {s=j; for (i=2;i<=NF;i++) {s=s" "a[j][i]}; print s}}'
awk '{for (i=2;i<=NF;i++) {a[$1][i]-=$i}} END{ for (j in a) {s=j; for (i=2;i<=NF;i++) {s=s" "a[j][i]}; print s}}'
input:
test 100 100 100 100
test2 100 90 80 0
test2 10 10 10 20
test 5 5 0 0
sum:
test2 110 100 90 20
test 105 105 100 100
minus:
test2 -110 -100 -90 -20
test -105 -105 -100 -100

Since there was no expected output, here is a guess at it:
$ awk '{
for(i=1;i<=NF;i++)
a[$1][i]=((a[$1][i]==""||i==1)?$i:a[$1][i]-$i)
}
END {
for(i in a)
for(j=1;j<=NF;j++)
printf "%s%s",a[i][j],(j==NF?ORS:OFS)
}' file
Output:
test2 90 80 70 -20
test 95 95 100 100
It's for GNU awk since I'm using two-dimensional arrays.
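Why the -= version only flips the sign: an unset array element counts as 0, so every row, including the first one for a key, is subtracted from 0 (0 - 100 - 5 = -105). For awks without arrays of arrays, a portable sketch of the same guess (the first row per key is the starting value, later rows are subtracted) might look like this. The function name subtract_rows and the helper arrays order, nf and val are illustrative and not from the original answers; output follows input order.

```shell
# Sketch only: POSIX awk, values stored as val[name, column].
subtract_rows() {
  awk '
    !($1 in nf) {                 # first row for this key: take values as-is
      order[++n] = $1
      nf[$1] = NF
      for (i = 2; i <= NF; i++) val[$1, i] = $i
      next
    }
    {                             # later rows: subtract column by column
      for (i = 2; i <= NF; i++) val[$1, i] -= $i
    }
    END {
      for (k = 1; k <= n; k++) {
        key = order[k]
        line = key
        for (i = 2; i <= nf[key]; i++) line = line OFS val[key, i]
        print line
      }
    }' "$@"
}

printf 'test 100 100 100 100\ntest2 100 90 80 0\ntest2 10 10 10 20\ntest 5 5 0 0\n' | subtract_rows
```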

Related

awk: skip number of lines and apply a script on the remaining

How do I skip the first 7 lines (rows) and apply a script on the remaining lines? I tried the below script but unfortunately it removes everything in the file and makes it empty.
awk 'NR > 7 { $1="XD"$1}' file> temp && mv temp file
You are not far off the mark. Just append 1 to your script; as an always-true pattern with no action, it causes $0 to be printed:
awk 'NR > 7 { $1="XD"$1} 1' file> temp && mv temp file
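A small sketch of what the trailing 1 does, using a three-line input and a skip count of 2 (both chosen here purely for illustration). Because 1 is always true and has no action of its own, awk applies its default action, print, to every line, so the leading lines are kept unchanged rather than dropped:

```shell
# Lines 1-2 pass through untouched; line 3 gets the prefix on its first field.
printf 'a 1\nb 2\nc 3\n' | awk 'NR > 2 { $1 = "XD" $1 } 1'
```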
"skip number of lines": use the next statement in an action to skip a line.
"apply a script on the remaining": your script does alter the field value, but it does not print it.
Let file.txt content be
1 10 100
2 20 200
3 30 300
4 40 400
5 50 500
6 60 600
7 70 700
8 80 800
9 90 900
then
awk 'NR<=7{next}{$1="XD"$1;print}' file.txt
gives output
XD8 80 800
XD9 90 900
Observe that pattern-action pair with next is before other actions.
(tested in gawk 4.2.1)
A fully parameterized awk-based solution:
# rows to skip: "__"
# string to prepend: "___"
gawk '(__<NR) * ($_=___$_)^_' FS='^$' __=7 ___='XD' # gawk/nawk
mawk '(__<NR) * ($!_=___$_)^_' FS='^$' __=7 ___='XD' # any mawk
XD8 80 800
XD9 90 900
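For readers who find the underscore variables hard to follow, here is a more readable sketch of the same parameterized idea, assuming the goal is to print only the lines past the skipped ones with the prefix prepended to the whole line. The helper name prefix_after and the variable names skip and prefix are hypothetical.

```shell
# Hypothetical helper: prefix_after SKIP PREFIX [FILE...]
prefix_after() {
  skip=$1
  prefix=$2
  shift 2
  # Print only lines past the first "skip" lines, with the prefix prepended.
  awk -v skip="$skip" -v prefix="$prefix" 'NR > skip { print prefix $0 }' "$@"
}

printf '1 10 100\n2 20 200\n3 30 300\n' | prefix_after 2 XD
```

Passing the values with -v keeps them out of the program text, so the same one-liner can be reused with other counts and prefixes.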

Sum column and count lines

I am trying to sum certain numbers grouped by the value in column 2, and that works with my code. But I also want to count how many times the same value in column 2 is repeated and print that count in the last column.
file1
36 2605 1 2
36 2605 1 2
36 2603 1 2
36 2605 1 2
36 2605 1 2
36 2605 1 2
36 2606 1 2
Output Desired
2603 36 1 2 1
2605 180 5 10 5
2606 36 1 2 1
I tried
awk '{a[$2]+=$1}{b[$2]+=$3}{c[$2]+=$4;count[$2]+=$2}END{for(i in a)print i,a[i],b[i],c[i],count[i]}' file1
Thanks in advance
Renamed the vars and added pretty print:
awk '
{
sum1[$2]+=$1
sum3[$2]+=$3
sum4[$2]+=$4
count[$2]++
len2=((l=length($2))>len2?l:len2)
len1=((l=length(sum1[$2]))>len1?l:len1)
len3=((l=length(sum3[$2]))>len3?l:len3)
len4=((l=length(sum4[$2]))>len4?l:len4)
len5=((l=length(count[$2]))>len5?l:len5)
}
END {
for(i in count) {
printf "%*d %*d %*d %*d %*d\n",
len2,i,len1,sum1[i],len3,sum3[i],len4,sum4[i],len5,count[i]
}
}' file
Output:
2603 36 1 2 1
2605 180 5 10 5
2606 36 1 2 1
Space chars are relatively inexpensive these days, you should really consider getting some for your code, especially if you want other people to read it to help you debug it! Here's the code you posted:
awk '{a[$2]+=$1}{b[$2]+=$3}{c[$2]+=$4;count[$2]+=$2}END{for(i in a)print i,a[i],b[i],c[i],count[i]}' file1
and here it is after having been run through a code beautifier (I used gawk -o):
{
a[$2] += $1
}
{
b[$2] += $3
}
{
c[$2] += $4
count[$2] += $2
}
END {
for (i in a) {
print i, a[i], b[i], c[i], count[i]
}
}
See how just by adding some white space it's now vastly easier to understand and so the bug in how count[$2] is being populated is glaringly obvious? Some meaningful variable names are always extremely useful too and I hear alphanumeric chars are on special right now!
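To make the bug concrete, this sketch runs both variants side by side on the question's sample rows. The array names wrong and right and the function name count_demo are made up for the illustration, and the output is piped through sort because for (k in ...) visits keys in no particular order.

```shell
count_demo() {
  awk '
    {
      wrong[$2] += $2   # adds the key value itself on every line
      right[$2]++       # adds 1 per line
    }
    END { for (k in right) print k, wrong[k], right[k] }
  ' "$@" | sort -n
}

printf '36 2605 1 2\n36 2605 1 2\n36 2603 1 2\n36 2605 1 2\n36 2605 1 2\n36 2605 1 2\n36 2606 1 2\n' | count_demo
```

For key 2605 the second column comes out as 13025 (5 x 2605) while the third is the intended count of 5.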
FWIW here's how I'd do this:
$ cat tst.awk
BEGIN { keyFldNr = 2 }
{
numOutFlds = 0
for (i=1; i<=NF; i++) {
if (i != keyFldNr) {
sum[$keyFldNr,++numOutFlds] += $i
}
}
cnt[$keyFldNr]++
}
END {
for (key in cnt) {
printf "%s%s", key, OFS
for (i=1; i<=numOutFlds; i++) {
printf "%s%s", sum[key,i], OFS
}
print cnt[key]
}
}
$ awk -f tst.awk file
2603 36 1 2 1
2605 180 5 10 5
2606 36 1 2 1
$ awk -f tst.awk file | column -t
2603 36 1 2 1
2605 180 5 10 5
2606 36 1 2 1
Notice that it'll work as-is no matter how many fields you have on each line and if you need to use a different field for the key that you count and sum on then you just change the value of keyFldNr in the BEGIN section from 2 to whatever you want it to be.
A non-awk approach, using the very useful GNU datamash, which is designed for tasks like this one:
$ datamash -Ws groupby 2 sum 1,3,4 count 2 < input.txt
2603 36 1 2 1
2605 180 5 10 5
2606 36 1 2 1
Read as: For each group of rows with the same value in column 2, display that value, the sums of columns 1, 3 and 4, and the number of rows in the group.
You've almost nailed it; you're just not incrementing count[$2] properly.
$ awk '{a[$2]+=$1;b[$2]+=$3;c[$2]+=$4;count[$2]++}
END{for(i in a) print i,a[i],b[i],c[i],count[i]}' file
2603 36 1 2 1
2605 180 5 10 5
2606 36 1 2 1
No external program is needed; this is faster (~21 ms), tested on pure GNU awk:
awk '{if($0~/^[A-Za-z0-9]/)a[NR]=$2" "$1" "$3" "$4}END{asort(a);$0="";for(;i++<NR;){split(a[i],b);if($1==""||b[1]==$1){$2+=b[2];$3+=b[3];$4+=b[4];$5++} else {print;$2=b[2];$3=b[3];$4=b[4];$5=1} $1=b[1]} print}' file1

AWK print next line of match between matches

Let's presume I have a file test.txt with the following data:
.0
41
0.0
42
0.0
43
0.0
44
0.0
45
0.0
46
0.0
START
90
34
17
34
10
100
20
2056
30
0.0
10
53
20
2345
30
0.0
10
45
20
875
30
0.0
END
0.0
48
0.0
49
0.0
140
0.0
With AWK, how would I print the lines after 10 and 20 between START and END?
So the output would be:
100
2056
53
2345
45
875
I was able to get the lines with 10 and 20 with
awk '/START/,/END/ {if($0==10 || $0==20) print $0}' test.txt
but how would I get the next lines?
I actually got what I wanted with
awk '/^START/,/^END/ {if($0==10 || $0==20) {getline; print} }' test.txt
Range in awk works fine, but is less flexible than using flags.
awk '/^START/ {f=1} /^END/ {f=0} f && /^(1|2)0$/ {getline;print}' file
100
2056
53
2345
45
875
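One way to see the extra flexibility of flags: the order of the rules decides whether the marker lines themselves are included, something a range pattern cannot express without extra conditions. A minimal sketch, with a made-up function name and a toy input:

```shell
# Print only the lines strictly between the markers.
between_markers() {
  awk '
    /^END/   { f = 0 }   # clear the flag before printing, so END is excluded
    f                    # default action: print while the flag is set
    /^START/ { f = 1 }   # set the flag after printing, so START is excluded
  ' "$@"
}

printf 'a\nSTART\n1\n2\nEND\nb\n' | between_markers
```

Moving the START rule above the bare f would include the START line; moving the END rule below it would include the END line.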
Don't use ranges as they make trivial things slightly briefer but require a complete rewrite or duplicate conditions when things get even slightly more complicated.
Don't use getline unless it's an appropriate application and you have read and fully understand http://awk.info/?tip/getline.
Just let awk read your lines as designed:
$ cat tst.awk
/START/ { inBlock=1 }
/END/ { inBlock=0 }
foundTgt { print; foundTgt=0 }
inBlock && /^[12]0$/ { foundTgt=1 }
$ awk -f tst.awk file
100
2056
53
2345
45
875
Feel free to use single-character variable names and cram it all onto one line if you find that useful:
awk '/START/{b=1} /END/{b=0} f{print;f=0} b&&/^[12]0$/{f=1}' file

updating a count depending on values in a file fulfilling criteria specified by a second file

I have two files and I want to update file A with a new column containing counts of how many times the number in $2 of file B fell within the range of $2 and $3 of file A, but only when $1 matches in both files.
file A
n01 2000 9000
n01 29000 41000
n01 60000 89000
n05 10000 15000
n80 5000 12000
n80 59000 68000
n80 100000 110000
file B
n01 6000
n01 6800
n01 35000
n05 14000
n80 65000
n80 104000
expected output
n01 2000 9000 2
n01 29000 41000 1
n01 60000 89000 0
n05 10000 15000 1
n80 5000 12000 0
n80 59000 68000 1
n80 100000 110000 1
awk '
FNR==NR{
A[$1,$2]
next
}
{
c = 0
for(i in A)
{
split(i,X,SUBSEP)
if(X[1] == $1)
{
if(X[2] >= $2 && X[2] <= $3)
{
c++
}
}
}
print $0,c
}
' fileB fileA
Not exactly strict awk, but you can help your script with some standard shell utilities like this:
join -a1 fileA fileB | awk '{ key=$1 " " $2 " " $3; if (! (key in array) ){array[key]=0} } $4>=$2 && $4<=$3{key=$1 " " $2 " " $3; array[key]=array[key] + 1; }END{ for(val in array){print val" "array[val]} }' | sort -k1,1 -k2,2n
First join both files with the join command (it expects both inputs to be sorted on the join field, as they are here). Then build an array in awk and add 1 each time the condition is met. Finally, sort the output, since for (val in array) visits elements in no particular order; sort -k1,1 -k2,2n orders by name and then numerically by the start of the range, which a plain sort -n would not do for keys that begin with a letter.

Awk act as a substitution

I have a table with several columns of information that I am filtering with awk.
number location length value
1 2 40 0.96
--- 5 45 0.97
4 5 47 0.96
--- 5 35 0.95
2 5 60 0.95
--- 3 55 0.96
awk '{if ($2=5 && $3 >= 40 && $3<=50 && $6>=0.96) print $0}' Infile.txt
It does give me the correct rows, --- 5 45 0.97 and 4 5 47 0.96.
However, if I want to add another condition, such as $1= ---, to only have the first output --- 5 45 0.97:
awk '{if ($2=5 && $3 >= 40 && $3<=50 && $6>=0.96 && $1="\-\-\-") print $0}' Infile.txt >List_position.txt
it acts as a substitution, returning the previous output as 1 5 45 0.97 and 1 5 47 0.96.
I tried $1=--- and $1='\-\-\-', and neither worked. If I try $1="---", it substitutes --- into $1.
I am new to awk and I really don't understand why it does a substitution. If " " is a substitution in awk, how can I put a condition on ---?
You've done an assignment = instead of comparison == (and the result of the assignment evaluates to true because it is neither 0 nor an empty string).
awk '{if ($2==5 && $3 >= 40 && $3<=50 && $6>=0.96 && $1=="---") print }' Infile.txt >List_position.txt
You also have the '= vs ==' problem with the $2=5 assignment in the condition, but you didn't notice it because you were expecting to see 5 there (and thanks to JS웃 for pointing that out).
You also don't need the backslashes.
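A small sketch of the difference, run on the sample rows from the question. One assumption to flag: the sample shows only four columns, so the value test here uses $4 rather than the $6 in the original command. The function name filter_rows is made up for the example.

```shell
# Assumption: the compared value is in column 4 of the sample rows.
filter_rows() {
  awk '$2 == 5 && $3 >= 40 && $3 <= 50 && $4 >= 0.96 && $1 == "---"' "$@"
}

printf '1 2 40 0.96\n--- 5 45 0.97\n4 5 47 0.96\n--- 5 35 0.95\n' | filter_rows

# With a single "=", the field is overwritten and the non-empty result is true.
printf '4 5 47 0.96\n' | awk '$1 = "---"'
```

The first command prints only the row that already has --- in $1; the second prints its input row with $1 replaced, which is the "substitution" effect described in the question.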