awk printing with and without {}?

I have this sample data:
Person Owe
John 1
John 2
John -1
John -10
John 5
John 9
John -4
John 9
John 2
John 3
I was using a script file to parse this; the command is:
awk -f script data
and the script content is:
NR == 2 { print "Starting Here"; x = $2; next}
{x+= $2} #This x+=$2
END{ print x }
Output:
Starting Here
16
But when I remove the {} from {x+=$2}:
Output:
Starting Here
John 2
John -1
John -10
John 5
John 9
John -4
John 9
John 2
John 3
16
Q) Why does awk print the lines when I don't put {} around the variable addition?

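Answering my own question:
An awk rule is a pattern followed by an action, and either part may be omitted. If the action is missing, the default action is to print the line ({ print $0 }) whenever the pattern is true.
Written on its own, x+=$2 is a pattern with no action: awk evaluates it for every line and prints the line whenever the new value of x is non-zero. (The header line Person Owe is not printed because "Owe" counts as 0, so x is still 0 after it.) With {x+=$2} the assignment is an action with no pattern, and with x+=$2{} it is a pattern with an empty action; in both cases nothing extra is printed.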
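The three forms can be compared side by side. This is a minimal sketch with made-up sample data (not the question's file), chosen so that the running total drops back to 0 on one line; sample is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical sample data (not the question's file): the running total
# drops back to 0 on the "John -3" line.
sample() {
  printf 'Person Owe\nJohn 1\nJohn 2\nJohn -3\nJohn 5\n'
}

# Pattern only: x += $2 is the condition, and the default action
# { print } runs whenever the new value of x is non-zero.
bare=$(sample | awk 'x += $2')

# Action only: runs on every line and prints nothing by itself.
action=$(sample | awk '{ x += $2 }')

# Pattern with an empty action: the condition is still evaluated,
# but the empty action replaces the default print.
empty=$(sample | awk 'x += $2 {}')

printf 'bare:\n%s\naction:\n%s\nempty:\n%s\n' "$bare" "$action" "$empty"
```

Here bare holds John 1, John 2 and John 5 (the header and the John -3 line leave x at 0), while action and empty are both empty.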

Related

Calculating cumulative sum and percent of total for columns grouped by row

I have a very large table of values that is formatted like this:
apple 1 1
apple 2 1
apple 3 1
apple 4 1
banana 25 4
banana 35 10
banana 36 10
banana 37 10
Column 1 has many different fruit, with varying numbers of rows for each fruit.
I would like to calculate the cumulative sum of column 3 for each type of fruit in column 1, and the cumulative percentage of that fruit's total at each row, and add these as new columns. So the desired output would be this:
apple 1 1 1 25.00
apple 2 1 2 50.00
apple 3 1 3 75.00
apple 4 1 4 100.00
banana 25 4 4 11.76
banana 35 10 14 41.18
banana 36 10 24 70.59
banana 37 10 34 100.00
I can get part way there with awk, but I am struggling with how to get the cumulative sum to reset at each new fruit. Here is my horrendous awk attempt for your viewing pleasure:
#!/bin/bash
awk '{cumsum += $3; $3 = cumsum} 1' fruitfile > cumsum.tmp
total=$(awk '{total=total+$3}END{print total}' fruitfile)
awk -v total=$total '{ printf ("%s\t%s\t%s\t%.5f\n", $1, $2, $3, ($3/total)*100)}' cumsum.tmp > cumsum.txt
rm cumsum.tmp
Could you please try the following, written and tested with the shown samples.
awk '
FNR==NR{
a[$1]+=$NF
next
}
{
cum[$1]+=$NF
printf "%s %s %.2f\n", $0, cum[$1], cum[$1]/a[$1]*100
}
' Input_file Input_file |
column -t
Output for the shown samples will be as follows.
apple 1 1 1 25.00
apple 2 1 2 50.00
apple 3 1 3 75.00
apple 4 1 4 100.00
banana 25 4 4 11.76
banana 35 10 14 41.18
banana 36 10 24 70.59
banana 37 10 34 100.00
Explanation: Adding a detailed explanation for the above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE only while Input_file is being read the first time.
a[$1]+=$NF ##Creating array a with index $1 and adding the last field to it, giving each fruit's total.
next ##next will skip all further statements from here.
}
{
cum[$1]+=$NF ##On the second read, keeping a running (cumulative) sum of the last field for each fruit.
printf "%s %s %.2f\n", $0, cum[$1], cum[$1]/a[$1]*100 ##Printing the current line, the cumulative sum, and its percentage of the fruit's total with 2 decimals.
}
' Input_file Input_file | ##Passing Input_file twice: the first read collects totals, the second prints.
column -t ##Sending awk output to column command for better look.
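The two-pass answer reads Input_file twice, once for the totals and once for the output. For a very large file whose rows are already grouped by fruit (as in the sample), a single pass is possible by buffering only the current fruit's rows. This is a sketch under that assumption; cumsum_pct is a hypothetical helper name, and it uses column 3 as in the question:

```shell
#!/bin/sh
# Sketch: single-pass cumulative sum and percentage of each fruit's total.
# Assumption: all rows of a fruit are contiguous, so only the current
# group is buffered instead of reading the file twice.
cumsum_pct() {
  awk '
  # Print the buffered group with its running sum and running percentage.
  function flush(   i, run) {
    run = 0
    for (i = 1; i <= n; i++) {
      run += val[i]
      printf "%s %s %.2f\n", line[i], run, (total ? run / total * 100 : 0)
    }
    n = 0
    total = 0
  }
  $1 != key { flush(); key = $1 }                 # a new fruit begins
  { line[++n] = $0; val[n] = $3; total += $3 }    # buffer the row
  END { flush() }
  ' "$@"
}

printf 'apple 1 1\napple 2 1\napple 3 1\napple 4 1\nbanana 25 4\nbanana 35 10\nbanana 36 10\nbanana 37 10\n' | cumsum_pct
```

Memory use is bounded by the largest group rather than the whole file; if the input is not grouped, it would need sorting on column 1 first.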

awk Standard Deviation

I'm trying to work out the standard deviation for a set of students' marks in different subjects. I'm a bit stuck on the last calculation I need to do, and I'm not sure what the issue is.
BEGIN {
i=0
printf("\nResults for form 6B\n")
}
$1=="SUBJECT" {
i++
subject[i]=$2
total[i]=0
count[i]=0
printf("\nList of %s Students\n",subject[i])
printf("Name Mark Pass/Fail\n")
printf("---- ---- ---------\n")
}
NF>2 { mark[i] = ($3+$4)/2
student=$2" "$1
total[i] = total[i]+mark[i]
count[i] = count[i]+1
if (mark[i]>49)
result="Pass"
else
result="Fail"
printf("%-14s%-3d%10s \n",student, mark[i], result)
}
END { top = i
printf("\nSubject Mean Standard Deviation\n")
printf("------- ---- ------------------\n")
var=0
for(i=1;i<=top;i++){
mean[i]=total[i] / count[i]
var+=((mark[i]-mean[i])^2) #Standard deviation not working#
stdev=sqrt(var/count[i])
printf("%-16s%-3d%12d \n",subject[i],mean[i],stdev)
}
}
I forgot to add the input file, "marks":
FORM 6B
SUBJECT Maths
Smith John 40 50
Evans Mike 50 80
SUBJECT Physics
Jones Tom 35 65
Evans Mike 46 76
Smith John 34 56
SUBJECT Chemistry
Jones Tom 50 60
Evans Mike 30 40
The output I'm getting is Maths 7, Physics 7, Chemistry 11.
The correct values are 10, 6 and 10.
Have a look at gawk's printf documentation. The following will illustrate what is happening:
$ awk 'BEGIN { printf "%%d:%d %%i:%i %%f:%f %%s:%s\n", 3.8, 3.8, 3.8, 3.8}'
%d:3 %i:3 %f:3.800000 %s:3.8
So %i and %d drop the fractional part (they truncate; they do not round). With %f you can control how the number is displayed using modifiers, for example %.2f for two decimal places.
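Separately from the display format, the END loop computes the wrong quantity: mark[i] only holds the last student's mark for subject i, and var is never reset between subjects, so each result is built from a single mark (for Maths, sqrt((65-55)^2/2) = 7.07, printed as 7). Below is a minimal sketch that keeps every mark per subject; it assumes a population standard deviation (dividing by the count, as the question does) and a mark equal to the average of the two score columns. subject_stats is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: per-subject mean and population standard deviation for the
# "marks" file layout shown in the question.
# Assumption: a student's mark is the average of the two score columns.
subject_stats() {
  awk '
  $1 == "SUBJECT" { s++; name[s] = $2; next }   # start a new subject
  NF > 2 && s {                                  # a student row
    m = ($3 + $4) / 2
    n[s]++
    mk[s, n[s]] = m                              # keep every mark, not just the last
    sum[s] += m
  }
  END {
    for (i = 1; i <= s; i++) {
      if (!n[i]) continue                        # subject with no students
      mean = sum[i] / n[i]
      var = 0                                    # reset for each subject
      for (j = 1; j <= n[i]; j++) {
        d = mk[i, j] - mean
        var += d * d
      }
      printf "%-10s %6.2f %8.2f\n", name[i], mean, sqrt(var / n[i])
    }
  }
  ' "$@"
}

printf 'FORM 6B\nSUBJECT Maths\nSmith John 40 50\nEvans Mike 50 80\nSUBJECT Physics\nJones Tom 35 65\nEvans Mike 46 76\nSmith John 34 56\nSUBJECT Chemistry\nJones Tom 50 60\nEvans Mike 30 40\n' | subject_stats
```

For the sample this gives standard deviations of 10.00, 6.68 and 10.00, which %d would show as the expected 10, 6 and 10.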

How do you "normalize" a list of records with awk

Say I have a tab-delimited list of records with two fields per record, like this:
bobby joe, jr a,b,c
sue smith b,d
Imagine there is a TAB character between the name column and the column with the series of single letters.
The goal is to "normalize" the data so it looks like this:
bobby joe, jr a
bobby joe, jr b
bobby joe, jr c
sue smith b
sue smith d
I would like to learn how to do this specifically with awk.
You can define one or more spaces or a comma as possible delimiters and then loop through the fields, printing the first field together with each of the others, like this:
$ awk -F" +|," '{for (i=2; i<=NF; i++) print $1, $i}' file
bob a
bob b
bob c
sue b
sue d
Given the updated question, with TAB-separated records, you can split the records like this:
$ awk -F"\t" '{n=split($2,a,","); for (i=1;i<=n;i++) print $1, a[i]}' file
bobby joe, jr a
bobby joe, jr b
bobby joe, jr c
sue smith b
sue smith d
Explanation
-F"\t" sets tab as field separator.
n=split($2,a,",") splits the 2nd field into pieces on the , separator. As split() returns the number of pieces, we store that number in n.
for (i=1;i<=n;i++) print $1, a[i] loops through the pieces and prints them together with the first field.
If you want pretty printing and the whole shebang:
$ printf 'bobby joe, jr\ta,b,c\nsue smith\tb,d\n' \
| awk -F"\t" '
BEGIN {MaxLength = 0}
{
a[NR] = $0;   # remember each record
if (length($1) > MaxLength) {   # track the widest name
MaxLength = length($1)
}
}
END {
for (i = 1; i <= NR; i++) {   # NR order keeps input order (for-in order is unspecified)
split(a[i], Fields);
n = split(Fields[2], Values, ",");   # split returns the number of pieces
for (j = 1; j <= n; j++) {
printf("%-"MaxLength"s\t%s\n", Fields[1], Values[j])
}
}
}'
bobby joe, jr a
bobby joe, jr b
bobby joe, jr c
sue smith b
sue smith d

AWK associative array

Suppose I have 2 files
File-1 map.txt
1 tony
2 sean
3 jerry
4 ada
File-2 relation.txt
tony sean
jerry ada
ada sean
Expected-Output result.txt
1 2
3 4
4 2
My code was:
awk 'FNR==NR{map[$1]=$2;next;} {$1=map[$1]; $2=map[$2]; print $0}' map.txt relation.txt > output.txt
But I got the left column only:
1
3
4
It seems that something is wrong near $2=map[$2].
I'd appreciate any help.
You've got the mapping creation the wrong way around; it needs to be:
map[$2] = $1
Your current script maps numbers to names whereas what you seem to be after is a map from names to numbers.
The following transcript shows the corrected script:
pax> cat m.txt
1 tony
2 sean
3 jerry
4 ada
pax> cat r.txt
tony sean
jerry ada
ada sean
pax> awk 'FNR==NR{map[$2]=$1;next;}{$1=map[$1];$2=map[$2];print $0}' m.txt r.txt
1 2
3 4
4 2
A shorter awk version prints the mapped values directly instead of reassigning the fields:
awk 'FNR==NR{map[$2]=$1;next;}{print map[$1], map[$2]}' m.txt r.txt
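Because map[$2] silently returns an empty string when a name is not a key, a missing or mismatched name is easy to overlook. Below is a sketch (map_ids is a hypothetical helper name) that marks unmapped names with a ? prefix. It also strips a trailing carriage return, on the assumption that files saved with Windows line endings might be involved, since a trailing \r would stop the last name on a line from matching:

```shell
#!/bin/sh
# Sketch: translate "name name" pairs into ids, flagging unknown names.
# Assumptions: map file lines are "id name", relation lines are "name name".
map_ids() {
  awk '
  { sub(/\r$/, "") }                     # drop a trailing CR, fields are re-split
  FNR == NR { id[$2] = $1; next }        # first file: name -> id
  {
    a = ($1 in id) ? id[$1] : "?" $1     # "?" prefix marks an unmapped name
    b = ($2 in id) ? id[$2] : "?" $2
    print a, b
  }
  ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '1 tony\n2 sean\n3 jerry\n4 ada\n' > "$tmp/map.txt"
printf 'tony sean\njerry ada\r\nada sean\nbob ada\n' > "$tmp/relation.txt"
map_ids "$tmp/map.txt" "$tmp/relation.txt"
rm -r "$tmp"
```

With this sample the last line comes out as ?bob 4, making the unknown name visible instead of printing an empty column.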

How to sum the numbers on each line, one by one, in Linux?

How do I sum the numbers on each individual line in Linux?
I have one file:
Course Name: Math
Credit: 4
12345 1 4 5 1 1 1 1 1 5 10 1 2 2 20
34567 2 3 4 1 10 5 3 2 5 5 10 20 5
Course Name: English
Credit: 4
12345 1 4 5 1 1 1 1 1 5 10 1 20
34567 4 1 10 5 3 2 5 5 10 20 5
The output should become:
Course Name: Math
Credit: 4
12345 55
34567 75
Course Name: English
Credit: 4
12345 51
34567 70
I tried this code:
awk '{for (i=2; i<=NF; i++) {tot += $i}; print $1 "\t" tot; tot =0}' file > file2
The output is like this:
Course Name: 0
Credit: 4
12345 55
34567 75
Course Name: 0
Credit: 4
12345 51
34567 70
I also need to display the course name (Math and English). I tried to fix it but couldn't. Can you please help?
Try:
awk '/^[0-9]/{for (i=2; i<=NF; i++) {tot += $i}; print $1 "\t" tot; tot =0} !/^[0-9]/'
This will only sum lines that start with a digit, and simply print those that don't.
Using just the shell:
while read line; do
case $line in
Course*|Credit*) echo "$line" ;;
*) set -- $line
id=$1
shift 1
sum=$(IFS=+; echo "$*" | bc)
printf "%s\t%d\n" $id $sum
;;
esac
done < filename
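If bc is not available, the same loop can be sketched with POSIX arithmetic expansion $(( )) instead. sum_lines is a hypothetical helper name, and the sketch assumes every data line holds an id followed by integer values:

```shell
#!/bin/sh
# Sketch: per-line sums with $(( )) instead of bc.
# Assumption: data lines are "id n1 n2 ..." with integer values only.
sum_lines() {
  while IFS= read -r line; do
    case $line in
      Course*|Credit*) printf '%s\n' "$line" ;;
      *)
        set -- $line                          # split on whitespace
        [ "$#" -gt 0 ] || continue            # skip blank lines
        id=$1
        shift
        expr=$(IFS=+; printf '%s' "$*")       # "1 4 5" becomes "1+4+5"
        printf '%s\t%d\n' "$id" "$(( $expr ))"
        ;;
    esac
  done
}

printf 'Course Name: Math\nCredit: 4\n12345 1 4 5 1 1 1 1 1 5 10 1 2 2 20\n34567 2 3 4 1 10 5 3 2 5 5 10 20 5\n' | sum_lines
```

To process the question's file, run sum_lines < filename.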
This might work for you too (GNU sed):
sed '/:/{s/.*/echo "&"/;b};s/ /+/2g;s/\(\S*\) \(.*\)/echo "\1\t\$((\2))"/' file | sh