awk Standard Deviation

awk Standard Deviation - awk

I'm trying to work out the standard deviation for a set of students marks in different subjects. I'm just a bit stuck on the last calculation I need to do and I'm just not sure what the issue is.
BEGIN {
i=0
printf("\nResults for form 6B\n")
}
$1=="SUBJECT" {
i++
subject[i]=$2
total[i]=0
count[i]=0
printf("\nLits of %s Students\n",subject[i])
printf("Name Mark Pass/Fail\n")
printf("---- ---- ---------\n")
}
NF>2 { mark[i] = ($3+$4)/2
student=$2" "$1
total[i] = total[i]+mark[i]
count[i] = count[i]+1
if (mark[i]>49)
result="Pass"
else
result="Fail"
printf("%-14s%-3d%10s \n",student, mark[i], result)
}
END { top = i
printf("\nSubject Mean Standard Deviation\n")
printf("------- ---- ------------------\n")
var=0
for(i=1;i<=top;i++){
mean[i]=total[i] / count[i]
var+=((mark[i]-mean[i])^2) #Standard deviation not working#
stdev=sqrt(var/count[i])
printf("%16-s%-3d%12d \n",subject[i],mean[i],stdev)
}
}
Forgot to add input file "marks"
FORM 6B
SUBJECT Maths
Smith John 40 50
Evans Mike 50 80
SUBJECT Physics
Jones Tom 35 65
Evans Mike 46 76
Smith John 34 56
SUBJECT Chemistry
Jones Tom 50 60
Evans Mike 30 40
Output I'm getting is Maths 7 Physics 7 Chemistry 11
The correct values are 10 6 10

Have a look at gawk's printf documentation. The following will illustrate what is happening:
$ awk 'BEGIN { printf "%%d:%d %%i:%i %%f:%f %%s:%s\n", 3.8, 3.8, 3.8, 3.8}'
%d:3 %i:3 %f:3.800000 %s:3.8
So, %i and %d floor the float. You can specify how the number look like in %f with some modifiers.

Related

How to get a specific column from a list with awk

I have these statistics of a table in netezza
/nz/support-IBM_Netezza-11.2.1.1-210825-0050/bin/nz_genstats OID_DB.OID_DB.OID_PAGOS_APLICADOS_FIJO
/nz/support-IBM_Netezza-11.2.1.1-210825-0050/bin/nz_get OID_DB.OID_DB.OID_PAGOS_APLICADOS_FIJO
Table: OID_PAGOS_APLICADOS_FIJO (276666)
Rowcount: 9,602,310
Dist Key: IDFAC
attnum Column Name Statistics Status Minimum Value Maximum Value # of Unique Values # of NULLs MaxLen AvgLen
------ ------------------------ ================== =============== =============== ==================== ==================== ====== ======
1 FECHA_PROCESO Express 2020-01-01 2022-08-01 940
2 DOCUMENTO Express 0011895954 9998147 2,235,478 12 10
3 USUARIO Express AAGARCIA ZRAMIREC 1,509 20 14
4 NOMBRE_USUARIO Express ABEL DAVID SARI ZOILA ROSA RAMI 1,525 71 23
5 FECHA_PAGO Express 2009-06-19 10:2 2022-08-01 20:2 308,032
6 FECHA_PAGO_CONTABLE Express 2009-06-19 10:2 2022-08-01 20:2 305,643
7 TIPO_DOC Express AJC VKA 50 5 5
8 DESCRIPCION_TIPO_DOC Express AJUSTE TRANSFERENCIA 48 92,138 34 18
9 CODIGO_BANCO Express 003 999 10 1,815,649 5 5
10 NOMBRE_BANCO Express BOLIVARIANO BAN TELMEX RRHH 9 1,817,818 23 19
11 CTA_CORRIENTE Express 0005046294 7621019 18 1,815,649 52 52
12 CODIGO_CLIENTE Express 00000005 20523352 516,577 10 10
13 IDENTIFICACION Express 077083801 h234573 516,384 17 12
14 TIPO_IDENTIDICACION Express CEDULA DE IDENT RUC 3 21 20
15 NOMBRE_CLIENTE Express BEIERSDORF S.A �USTA SELENA QU 518,080 112 31
16 SEGMENTO_MERCADO Express CARRIER RESIDENCIAL 9 4 24 13
17 GESTOR Express ANGEL GUILLERMO RRAMIREG 6 9,539,531 32 19
18 REF_LOTE Express 6926 78937 41 9,539,282
19 VALOR_RECIBIDO Express 0.0100 3237920.0000 43,192
20 ESTADO_RECIBO_NC Express A PAGADO TOTALMEN 4 21 4
21 SALDO Express -123.38 35197.12 5,795
22 IDFAC Express 0000000094 0067735776 8,687,120 648 12 12
23 TIPO_DOC_AFEC Express AJD NDI 13 648 5 5
24 FACTURA Express 000-000-0000001 999-999-0067722 2,260,744 651 20 18
25 FECHA_EMISION_FACTURA Express 2004-09-08 00:0 2023-03-15 00:0 4,196 648
26 MES_FACTURA Express 200409 202303 220 648 8 8
27 ID_CICLO Express 1 429 22 5,803,887
28 CICLO_DOC Express CICLO 2 MENSUAL CICLO VARIOS QU 22 5,803,887 31 17
29 VALOR_APLICADO Express 0.0020 381157.3100 37,738 2
30 FECHA_APLICACION Express 2020-01-01 00:0 2022-08-01 23:4 787,990 2
31 FORMAPAGO Express CHEQUE TRANSFERENCIAS 7 5,784,974 26 15
32 ESTADO_DOCUMENTO Express EMITIDO PAGADO TOTALMEN 3 93,703 21 19
33 FECHA_VENCIMIENTO Express 2004-09-23 00:0 2025-07-26 12:2 315,756 648
34 MES_VENCIMIENTO Express 200409 202507 251 648 8 8
35 PARROQUIA Express 12 DE MARZO ZONA NAVAL 1,010 1,603,596 41 14
36 CANTON Express 24 DE MAYO ZAMORA 103 1,603,596 29 9
37 CODIGO_SUCURSAL Express 0000000003 0018313083 560,976 22,723 12 12
38 ID_CANAL Express ASP VENT 5 4,750,391 6 6
39 DESC_CANAL Express Autoservicio Ventanilla 5 4,750,391 26 16
how can i get the columns attnum, column name and # of unique values
I have this Shell Script
table="OID_DB.OID_DB.OID_PAGOS_APLICADOS_FIJO"
gen_stats=$(/nz/support-IBM_Netezza-11.2.1.1-210825-0050/bin/nz_genstats $table)
get_stats=$(/nz/support-IBM_Netezza-11.2.1.1-210825-0050/bin/nz_get $table)
echo "$get_stats" | awk '/FECHA_PROCESO/, /DESC_CANAL/' | awk '{ print $1"|"$2"|"$6 }'
but the result obtained is
1|FECHA_PROCESO|940
2|DOCUMENTO|2,235,478
3|USUARIO|1,509
4|NOMBRE_USUARIO|SARI
5|FECHA_PAGO|2022-08-01
6|FECHA_PAGO_CONTABLE|2022-08-01
7|TIPO_DOC|50
8|DESCRIPCION_TIPO_DOC|48
9|CODIGO_BANCO|10
10|NOMBRE_BANCO|TELMEX
11|CTA_CORRIENTE|18
12|CODIGO_CLIENTE|516,577
13|IDENTIFICACION|516,384
14|TIPO_IDENTIDICACION|IDENT
15|NOMBRE_CLIENTE|�USTA
16|SEGMENTO_MERCADO|9
17|GESTOR|RRAMIREG
18|REF_LOTE|41
19|VALOR_RECIBIDO|43,192
20|ESTADO_RECIBO_NC|TOTALMEN
21|SALDO|5,795
22|IDFAC|8,687,120
23|TIPO_DOC_AFEC|13
24|FACTURA|2,260,744
25|FECHA_EMISION_FACTURA|2023-03-15
26|MES_FACTURA|220
27|ID_CICLO|22
28|CICLO_DOC|MENSUAL
29|VALOR_APLICADO|37,738
30|FECHA_APLICACION|2022-08-01
31|FORMAPAGO|7
32|ESTADO_DOCUMENTO|TOTALMEN
33|FECHA_VENCIMIENTO|2025-07-26
34|MES_VENCIMIENTO|251
35|PARROQUIA|MARZO
36|CANTON|MAYO
37|CODIGO_SUCURSAL|560,976
38|ID_CANAL|5
39|DESC_CANAL|5
How can I get the values of the # of Unique Values column

Using GNU awk for FIELDWIDTHS:
$ cat tst.awk
BEGIN { OFS="|" }
/^[-= ]+$/ {
inVals = 1
for ( i=1; i<=NF; i++ ) {
wids = wids " " (length($i) + 1)
}
FIELDWIDTHS = wids
$0 = prev
for ( i=1; i<=NF; i++ ) {
gsub(/^\s+|\s+$/,"",$i)
f[$i] = i
}
}
{ prev = $0}
inVals {
for ( i=1; i<=NF; i++ ) {
gsub(/^\s+|\s+$/,"",$i)
}
print $(f["attnum"]), $(f["Column Name"]), $(f["# of Unique Values"])
}
$ awk -f tst.awk file
attnum|Column Name|# of Unique Values
1|FECHA_PROCESO|940
2|DOCUMENTO|2,235,478
3|USUARIO|1,509
4|NOMBRE_USUARIO|1,525
5|FECHA_PAGO|308,032
6|FECHA_PAGO_CONTABLE|305,643
7|TIPO_DOC|50
8|DESCRIPCION_TIPO_DOC|48
9|CODIGO_BANCO|10
10|NOMBRE_BANCO|9
11|CTA_CORRIENTE|18
12|CODIGO_CLIENTE|516,577
13|IDENTIFICACION|516,384
14|TIPO_IDENTIDICACION|3
15|NOMBRE_CLIENTE|518,080
16|SEGMENTO_MERCADO|9
17|GESTOR|6
18|REF_LOTE|41
19|VALOR_RECIBIDO|43,192
20|ESTADO_RECIBO_NC|4
21|SALDO|5,795
22|IDFAC|8,687,120
23|TIPO_DOC_AFEC|13
24|FACTURA|2,260,744
25|FECHA_EMISION_FACTURA|4,196
26|MES_FACTURA|220
27|ID_CICLO|22
28|CICLO_DOC|22
29|VALOR_APLICADO|37,738
30|FECHA_APLICACION|787,990
31|FORMAPAGO|7
32|ESTADO_DOCUMENTO|3
33|FECHA_VENCIMIENTO|315,756
34|MES_VENCIMIENTO|251
35|PARROQUIA|1,010
36|CANTON|103
37|CODIGO_SUCURSAL|560,976
38|ID_CANAL|5
39|DESC_CANAL|5

My 2 cts to print fields of fixed width with awk
gawk 'BEGIN{OFS="|"}
{ if($0 ~ /\--|==/) {
print $0
for ( i=1; i<=NF; i++ ) {
if(i == 1){
fl[i]=length($i) + 1
} else {
fl[i]= fl[i - 1] + length($i) + 1
}
# fix double space at field 3
fl[3]=fl[3] + 1
}
}
if(NR >6){
print substr($0,1,fl[1]), substr($0,fl[1],fl[2] - fl[1]), substr($0,fl[5],fl[6] - fl[5])
}
}' test.txt | tr -d ' '
Result
1|FECHA_PROCESO|940
2|DOCUMENTO|2,235,478
3|USUARIO|1,509
4|NOMBRE_USUARIO|1,525
5|FECHA_PAGO|308,032
6|FECHA_PAGO_CONTABLE|305,643
7|TIPO_DOC|50
8|DESCRIPCION_TIPO_DOC|48
9|CODIGO_BANCO|10
10|NOMBRE_BANCO|9
11|CTA_CORRIENTE|18
12|CODIGO_CLIENTE|516,577
13|IDENTIFICACION|516,384
14|TIPO_IDENTIDICACION|3
15|NOMBRE_CLIENTE|518,080
....

With your shown samples and attempts only, please try following awk code, written and tested in GNU awk. Using match function of GNU awk here where I am mentioning regex ^\s+([0-9]+)\s+(\S+)\s+\S+\s+\S+\s+\S+\s+(\S+) which is further creating 2 capturing groups and as per match function giving array named arr which stores ONLY capturing group values into it. Since there are 3 capturing groups getting created so it will create 3 items into array arr starting from index 1 till total number of total capturing groups.
awk '
BEGIN{ OFS="|" }
match($0,/^\s+([0-9]+)\s+(\S+)\s+\S+\s+\S+\s+\S+\s+(\S+)/,arr){
print arr[1],arr[2],arr[3]
}
' Input_file
OR improving my own regex in above code, this will create 4 capturing groups out of which we need to print only 1st, 2nd and 4th values only as per requirement.
awk '
BEGIN{ OFS="|" }
match($0,/^\s+([0-9]+)\s+(\S+)(\s+\S+){3}\s+(\S+)/,arr){
print arr[1],arr[2],arr[4]
}
' Input_file
Explanation of regex: Adding detailed explanation for used regex in code.
^\s+ ##Matching spaces 1 or more occurrences from starting.
([0-9]+) ##Creating 1st capturing group which has 1 or more number of digits in it.
\s+ ##matching 1 or more spaces here.
(\S+) ##Creating 2nd capturing group which has 1 or more non-spaces here.
\s+\S+\s+ ##Matching 1 or more spaces followed by 1 or more non-spaces followed by 1 or more spaces.
\S+\s+\S+ ##Matching 1 or more non-spaces followed by 1 or more spaces followed by 1 or more non-spaces.
\s+ ##matching 1 or more spaces here.
(\S+) ##Creating 3rd capturing group which has 1 or more non-spaces here.

took me long enough :
mawk '
/^[ =-]+==[ =-]+$/ {
__=sprintf("%c",_+=___=(_+=_^=FS)*_)
___+=___
do {
sub("^",__,$_) } while(++_<___)
_=match($!_, __ (".+")__)
____=NR;___ = 3;__ = RLENGTH
} +____<NR {
$(___) = substr($!NF,_,__)
gsub("[^0-9]+","",$(NF =___)); print }' OFS='\f\r\t' ____=999
1
FECHA_PROCESO
940
2
DOCUMENTO
2235478
3
USUARIO
1509
4
NOMBRE_USUARIO
1525
5
FECHA_PAGO
308032
6
FECHA_PAGO_CONTABLE
305643
7
TIPO_DOC
50
8
DESCRIPCION_TIPO_DOC
48
9
CODIGO_BANCO
101
10
NOMBRE_BANCO
91
11
CTA_CORRIENTE
181
12
CODIGO_CLIENTE
516577
13
IDENTIFICACION
516384
14
TIPO_IDENTIDICACION
3
15
NOMBRE_CLIENTE
518080
16
SEGMENTO_MERCADO
9
17
GESTOR
69
18
REF_LOTE
419
19
VALOR_RECIBIDO
43192
20
ESTADO_RECIBO_NC
4
21
SALDO
5795
22
IDFAC
8687120
23
TIPO_DOC_AFEC
13
24
FACTURA
2260744
25
FECHA_EMISION_FACTURA
4196
26
MES_FACTURA
220
27
ID_CICLO
225
28
CICLO_DOC
225
29
VALOR_APLICADO
37738
30
FECHA_APLICACION
787990
31
FORMAPAGO
75
32
ESTADO_DOCUMENTO
3
33
FECHA_VENCIMIENTO
315756
34
MES_VENCIMIENTO
251
35
PARROQUIA
10101
36
CANTON
1031
37
CODIGO_SUCURSAL
560976
38
ID_CANAL
54
39
DESC_CANAL
54

awk printing with and without {}?

I have sample data
Person Owe
John 1
John 2
John -1
John -10
John 5
John 9
John -4
John 9
John 2
John 3
I was using script file to parse this and the command is
awk -f script data
, script content:
NR == 2 { print "Starting Here"; x = $2; next}
{x+= $2} #This x+=$2
END{ print x }
Output:
Starting Here
16
But when I remove {} from {x+=$2}
Output:
Starting Here
John 2
John -1
John -10
John 5
John 9
John -4
John 9
John 2
John 3
16
Q) Why does it print if I don't have a {} around the variable addition?

Answering own question:
A rule needs an action. If action is missing it defaults to printing the line.
Just mentioning x+=$2 is a rule, And if we give {x+=$2} or x+=$2{} those have/are action.

Awk script displaying incorrect output

I'm facing an issue in awk script - I need to generate a report containing the lowest, highest and average score for each assignment in the data file. The name of the assignment is located in column 3.
Input data is:
Student,Catehory,Assignment,Score,Possible
Chelsey,Homework,H01,90,100
Chelsey,Homework,H02,89,100
Chelsey,Homework,H03,77,100
Chelsey,Homework,H04,80,100
Chelsey,Homework,H05,82,100
Chelsey,Homework,H06,84,100
Chelsey,Homework,H07,86,100
Chelsey,Lab,L01,91,100
Chelsey,Lab,L02,100,100
Chelsey,Lab,L03,100,100
Chelsey,Lab,L04,100,100
Chelsey,Lab,L05,96,100
Chelsey,Lab,L06,80,100
Chelsey,Lab,L07,81,100
Chelsey,Quiz,Q01,100,100
Chelsey,Quiz,Q02,100,100
Chelsey,Quiz,Q03,98,100
Chelsey,Quiz,Q04,93,100
Chelsey,Quiz,Q05,99,100
Chelsey,Quiz,Q06,88,100
Chelsey,Quiz,Q07,100,100
Chelsey,Final,FINAL,82,100
Chelsey,Survey,WS,5,5
Sam,Homework,H01,19,100
Sam,Homework,H02,82,100
Sam,Homework,H03,95,100
Sam,Homework,H04,46,100
Sam,Homework,H05,82,100
Sam,Homework,H06,97,100
Sam,Homework,H07,52,100
Sam,Lab,L01,41,100
Sam,Lab,L02,85,100
Sam,Lab,L03,99,100
Sam,Lab,L04,99,100
Sam,Lab,L05,0,100
Sam,Lab,L06,0,100
Sam,Lab,L07,0,100
Sam,Quiz,Q01,91,100
Sam,Quiz,Q02,85,100
Sam,Quiz,Q03,33,100
Sam,Quiz,Q04,64,100
Sam,Quiz,Q05,54,100
Sam,Quiz,Q06,95,100
Sam,Quiz,Q07,68,100
Sam,Final,FINAL,58,100
Sam,Survey,WS,5,5
Andrew,Homework,H01,25,100
Andrew,Homework,H02,47,100
Andrew,Homework,H03,85,100
Andrew,Homework,H04,65,100
Andrew,Homework,H05,54,100
Andrew,Homework,H06,58,100
Andrew,Homework,H07,52,100
Andrew,Lab,L01,87,100
Andrew,Lab,L02,45,100
Andrew,Lab,L03,92,100
Andrew,Lab,L04,48,100
Andrew,Lab,L05,42,100
Andrew,Lab,L06,99,100
Andrew,Lab,L07,86,100
Andrew,Quiz,Q01,25,100
Andrew,Quiz,Q02,84,100
Andrew,Quiz,Q03,59,100
Andrew,Quiz,Q04,93,100
Andrew,Quiz,Q05,85,100
Andrew,Quiz,Q06,94,100
Andrew,Quiz,Q07,58,100
Andrew,Final,FINAL,99,100
Andrew,Survey,WS,5,5
Ava,Homework,H01,55,100
Ava,Homework,H02,95,100
Ava,Homework,H03,84,100
Ava,Homework,H04,74,100
Ava,Homework,H05,95,100
Ava,Homework,H06,84,100
Ava,Homework,H07,55,100
Ava,Lab,L01,66,100
Ava,Lab,L02,77,100
Ava,Lab,L03,88,100
Ava,Lab,L04,99,100
Ava,Lab,L05,55,100
Ava,Lab,L06,66,100
Ava,Lab,L07,77,100
Ava,Quiz,Q01,88,100
Ava,Quiz,Q02,99,100
Ava,Quiz,Q03,44,100
Ava,Quiz,Q04,55,100
Ava,Quiz,Q05,66,100
Ava,Quiz,Q06,77,100
Ava,Quiz,Q07,88,100
Ava,Final,FINAL,99,100
Ava,Survey,WS,5,5
Shane,Homework,H01,50,100
Shane,Homework,H02,60,100
Shane,Homework,H03,70,100
Shane,Homework,H04,60,100
Shane,Homework,H05,70,100
Shane,Homework,H06,80,100
Shane,Homework,H07,90,100
Shane,Lab,L01,90,100
Shane,Lab,L02,0,100
Shane,Lab,L03,100,100
Shane,Lab,L04,50,100
Shane,Lab,L05,40,100
Shane,Lab,L06,60,100
Shane,Lab,L07,80,100
Shane,Quiz,Q01,70,100
Shane,Quiz,Q02,90,100
Shane,Quiz,Q03,100,100
Shane,Quiz,Q04,100,100
Shane,Quiz,Q05,80,100
Shane,Quiz,Q06,80,100
Shane,Quiz,Q07,80,100
Shane,Final,FINAL,90,100
Shane,Survey,WS,5,5
awk script :
BEGIN {
FS=" *\\, *"
}
FNR>1 {
min[$3]=(!($3 in min) || min[$3]> $4 )? $4 : min[$3]
max[$3]=(max[$3]> $4)? max[$3] : $4
cnt[$3]++
sum[$3]+=$4
}
END {
print "Name\tLow\tHigh\tAverage"
for (i in cnt)
printf("%s\t%d\t%d\t%.1f\n", i, min[i], max[i], sum[i]/cnt[i])
}
Expected sample output:
Name Low High Average
Q06 77 95 86.80
L05 40 96 46.60
WS 5 5 5
Q07 58 100 78.80
L06 60 99 61
L07 77 86 64.80
When I run the script, I get a "Low" of 0 for all assignments which is not correct. Where am I going wrong? Please guide.

You can certainly do this with awk, but since you tagged this scripting as well, I'm assuming other tools are an option. For this sort of gathering of statistics on groups present in the data, GNU datamash often reduces the job to a simple one-liner. For example:
$ (echo Name,Low,High,Average; datamash --header-in -s -t, -g3 min 4 max 4 mean 4 < input.csv) | tr , '\t'
Name Low High Average
FINAL 58 99 85.6
H01 19 90 47.8
H02 47 95 74.6
H03 70 95 82.2
H04 46 80 65
H05 54 95 76.6
H06 58 97 80.6
H07 52 90 67
L01 41 91 75
L02 0 100 61.4
L03 88 100 95.8
L04 48 100 79.2
L05 0 96 46.6
L06 0 99 61
L07 0 86 64.8
Q01 25 100 74.8
Q02 84 100 91.6
Q03 33 100 66.8
Q04 55 100 81
Q05 54 99 76.8
Q06 77 95 86.8
Q07 58 100 78.8
WS 5 5 5
This says that for each group with the same value for the 3rd column (-g3, plus -s to sort the input (A requirement of the tool)) of simple CSV input (-t,) with a header (--header-in), display the minimum, maximum, and mean of the 4th column. It's all given a new header and piped to tr to turn the commas into tabs.

Your code works as-is with GNU awk. However, running it with the -t option to warn about non-portable constructs gives:
awk: foo.awk:6: warning: old awk does not support the keyword `in' except after `for'
awk: foo.awk:2: warning: old awk does not support regexps as value of `FS'
And running the script with a different implementation of awk (mawk in my case) does give 0's for the Low column. So, some tweaks to the script:
BEGIN {
FS=","
}
FNR>1 {
min[$3]=(cnt[$3] == 0 || min[$3]> $4 )? $4 : min[$3]
max[$3]=(max[$3]> $4)? max[$3] : $4
cnt[$3]++
sum[$3]+=$4
}
END {
print "Name\tLow\tHigh\tAverage"
PROCINFO["sorted_in"] = "#ind_str_asc" # gawk-ism for pretty output; ignored on other awks
for (i in cnt)
printf("%s\t%d\t%d\t%.1f\n", i, min[i], max[i], sum[i]/cnt[i])
}
and it works as expected on that other awk too.
The changes:
Using a simple comma as the field separator instead of a regex.
Changing the min conditional to setting to the current value on the first time this assignment has been seen by checking to see if cnt[$3] is equal to 0 (Which it will be the first time because that value is incremented in a later line), or if the current min is greater than this value.

another similar approach
$ awk -F, 'NR==1 {print "name","low","high","average"; next}
{k=$3; sum[k]+=$4; count[k]++}
!(k in min) {min[k]=max[k]=$4}
min[k]>$4 {min[k]=$4}
max[k]<$4 {max[k]=$4}
END {for(k in min) print k,min[k],max[k],sum[k]/count[k]}' file |
column -t
name low high average
Q06 77 95 86.8
L05 0 96 46.6
WS 5 5 5
Q07 58 100 78.8
L06 0 99 61
L07 0 86 64.8
H01 19 90 47.8
H02 47 95 74.6
H03 70 95 82.2

Calculate Average from a file and show records of students whose average is more than 90

I have a file named input.txt which contains students data in StudentName # ClassName #SchoolName # Subject1Marks # Subject2Marks format.
Shriii#First#ADCET#95#90
Chaitraliii#Second#ADCET#80#75
Shubhangi#First#ADCET#75#70
Tushar#Second#RIT#80#79
Prathamesh#First#RIT#88#63
The output should contains record of students whose average is more than 90. I have to show the average in new column.
The expected output is Shriii|First|ADCET|95|90|92.5 I tried various ways but was not able to generate expected output. I tried Link1 Link2

awk to the rescue!
$ awk -F# -v OFS='|' '{$(NF+1)=($(NF-1)+$NF)/2}1' file
Shriii|First|ADCET|95|90|92.5
Chaitraliii|Second|ADCET|80|75|77.5
Shubhangi|First|ADCET|75|70|72.5
Tushar|Second|RIT|80|79|79.5
Prathamesh|First|RIT|88|63|75.5
Sukrut|Second|KIT|91|90|90.5
or pipe to column to get pretty output
... | column -ts'|'
Shriii First ADCET 95 90 92.5
Chaitraliii Second ADCET 80 75 77.5
Shubhangi First ADCET 75 70 72.5
Tushar Second RIT 80 79 79.5
Prathamesh First RIT 88 63 75.5
Sukrut Second KIT 91 90 90.5

SAS column input skipping lines

I was looking at the SAS base exam questions and I came across this particular one:
data test;
input employee_name $ 1-4;
if employee_name = ‘Ruth’ then input idnum 10-11;
else input age 7-8;
datalines;
Ruth 39 11
Jose 32 22
Sue 30 33
John 40 44
;
run;
At first I thought the IDNum when the employee name is "Ruth" would be 11, but it seems it skips the Ruth row and jumps down to the second row, and inputs 22 instead. And why is Sue's age 40 instead of 30? Can someone explain why this is? Thank you.
Here is the result:
Name IDnum Age
Ruth 22
Sue 40

Without a trailing # or ## at the end of an input statement, any subsequent input statements in the same data step will skip the rest of the current line start reading from the beginning of the next line.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

awk Standard Deviation - awk

Have a look at gawk's printf documentation. The following will illustrate what is happening: $ awk 'BEGIN { printf "%%d:%d %%i:%i %%f:%f %%s:%s\n", 3.8, 3.8, 3.8, 3.8}' %d:3 %i:3 %f:3.800000 %s:3.8 So, %i and %d floor the float. You can specify how the number look like in %f with some modifiers.

Related

How to get a specific column from a list with awk

awk printing with and without {}?

Awk script displaying incorrect output

Calculate Average from a file and show records of students whose average is more than 90

SAS column input skipping lines

Categories

Resources