Rounding floating-point numbers using awk

I have a file b.xyz as,
-19.794325 -23.350704 -9.552335
-20.313872 -23.948248 -8.924463
-18.810708 -23.571757 -9.494047
-20.048543 -23.660052 -10.478968
I want to limit each of the entries to three decimal digits.
I tried this one
awk '{ $1=sprintf("%.3f",$1)} {$2=sprintf("%.3f",$2)} {$3=sprintf("%.3f",$3)} {print $1, $2, $3}' b.xyz
it works for three columns, but how can I extend it to apply to all n columns?

If you will always have three fields, then you can use:
$ awk '{printf "%.3f %.3f %.3f\n", $1, $2, $3}' file
-19.794 -23.351 -9.552
-20.314 -23.948 -8.924
-18.811 -23.572 -9.494
-20.049 -23.660 -10.479
For an arbitrary number of fields, you can do:
$ awk '{for (i=1; i<=NF; i++) printf "%.3f%s", $i, (i==NF?"\n":" ")}' file
-19.794 -23.351 -9.552
-20.314 -23.948 -8.924
-18.811 -23.572 -9.494
-20.049 -23.660 -10.479
It loops through all the fields and prints each one; the ternary (i==NF?"\n":" ") emits a newline after the last field and a space otherwise.
Or even (thanks Jotne!):
awk '{for (i=1; i<=NF; i++) printf "%.3f%s", $i, (i==NF?RS:FS)}' file
Example
$ cat a
-19.794325 -23.350704 -9.552335 2.13423 23 23223.23 23.23442
-20.313872 -23.948248 -8.924463
-18.810708 -23.571757 -9.494047
-20.048543 -23.660052 -10.478968
$ awk '{for (i=1; i<=NF; i++) printf "%.3f%s", $i, (i==NF?"\n":" ")}' a
-19.794 -23.351 -9.552 2.134 23.000 23223.230 23.234
-20.314 -23.948 -8.924
-18.811 -23.572 -9.494
-20.049 -23.660 -10.479
$ awk '{for (i=1; i<=NF; i++) printf "%.3f%s", $i, (i==NF?RS:FS)}' a
-19.794 -23.351 -9.552 2.134 23.000 23223.230 23.234
-20.314 -23.948 -8.924
-18.811 -23.572 -9.494
-20.049 -23.660 -10.479
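If the precision itself should be configurable, one sketch (the variable name p is arbitrary) is to build the format string from an awk variable by string concatenation:

```shell
# Precision passed in as a variable; "%." p "f%s" concatenates to "%.3f%s"
awk -v p=3 '{for (i=1; i<=NF; i++) printf "%." p "f%s", $i, (i==NF?RS:FS)}' file
```

Running this with p=3 on the sample data gives the same output as the fixed-format version; changing p to 2 would print two decimal places instead.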

Related

An awk script without hard-coded field information

We have the following awk script that extracts fields 6, 7 and 14 from a CSV file:
awk -F, '{for (i=1; i<=NF; i++) if (i in [6, 7, 14]) printf "%s,", $i; print ""}' $input_file
The script works beautifully, except that the information about the fields of interest is hard-coded. We would like to be able to pass this information as a single command line argument (or even a series of command line arguments), to make the script more versatile. We tried a few things, including the following, but we keep getting a syntax error:
awk -F, '{for (i=1; i<=NF; i++) if (i in ['$2']) printf "%s,", $i; print ""}' $input_file
awk -F, '{for (i=1; i<=NF; i++) if (i in [6, 7, 14]) printf "%s,", $i; print ""}' $input_file
is not valid awk syntax, which is one reason why
awk -F, '{for (i=1; i<=NF; i++) if (i in ['$2']) printf "%s,", $i; print ""}' $input_file
or any variation of it would also give you a syntax error.
This is probably what you're trying to do:
awk -F, -v vals="$2" '
BEGIN { split(vals,tmp); for (i in tmp) arr[tmp[i]] }
{ for (i=1; i<=NF; i++) if (i in arr) printf "%s,", $i; print "" }
' "$input_file"
assuming $2 contains a comma-separated string like 6,7,14 and your input file is a CSV with unquoted fields.
That would still print a trailing , on each line, and looping through all the fields just to discard the unwanted ones on every input line is inefficient. This solves both of those problems:
awk -v vals="$2" '
BEGIN { FS=OFS=","; n=split(vals,arr) }
{ for (i=1; i<=n; i++) printf "%s%s", $(arr[i]), (i<n ? OFS : ORS) }
' "$input_file"
Another option is to not use (g)awk, and use cut:
cut -d "," -f "6,7,14" inputfile
(or: a="6,7,14"; cut -d "," -f "$a" inputfile)
When input contains:
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,2,25,26
output should look like:
f,g,n
6,7,14
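As a runnable sketch of the split-into-array approach (the file name inputfile and the single-row sample line are made up for the demo):

```shell
# Build the sample CSV and select fields 6, 7 and 14 via a field list
# passed in as the awk variable "vals"
printf 'a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p\n' > inputfile
awk -F, -v vals='6,7,14' '
BEGIN { n = split(vals, arr, ",") }
{ for (i = 1; i <= n; i++) printf "%s%s", $(arr[i]), (i < n ? "," : "\n") }
' inputfile
# prints: f,g,n
```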

awk: transpose column header to first field of row

My input looks like this:
A|B|C
1|2|3
4|5|6
Using awk, I am trying to get:
A|1
B|2
C|3
A|4
B|5
C|6
My code:
gawk '
BEGIN{FS=OFS="|"}
NR==1{
for(i=1; i<=NF; i++){
x_i=$i
}
}
NR>1{
for(i=1; i<=NF; i++){
print x_i FS $i
}
}' input
But it keeps only the last iteration of the NR==1 block, even if I use the same loop in the NR>1 block:
C|1
C|2
C|3
C|4
C|5
C|6
Any trick?
EDIT
Thanks to Jose; I needed to change x_i to x[i].
What if, using the same input, I needed to output:
A;B;C|1|2|3
A;B;C|4|5|6
$ awk 'BEGIN{FS=OFS="|"} NR==1{split($0,h);next} {for (i=1;i<=NF;i++) print h[i], $i}' file
A|1
B|2
C|3
A|4
B|5
C|6
$ awk 'BEGIN{FS=OFS="|"} NR==1{gsub(/\|/,";");h=$0;next} {print h, $0}' file
A;B;C|1|2|3
A;B;C|4|5|6
Read Effective Awk Programming, 4th Edition, by Arnold Robbins.
you can try,
awk 'BEGIN{FS=OFS="|"}
NR==1{for(i=1; i<=NF; ++i) d[i]=$i; next}
{for(i=1; i<=NF; ++i) print d[i], $i}
' input
you get
A|1
B|2
C|3
A|4
B|5
C|6
Important note
Your logic is correct; just use x[i] instead of x_i:
gawk '
BEGIN{FS=OFS="|"}
NR==1{
for(i=1; i<=NF; i++){
x[i]=$i
}
}
NR>1{
for(i=1; i<=NF; i++){
print x[i] FS $i
}
}' input
Here is another using split and for:
$ awk 'NR==1 { split($0,a,"|") }
NR>1 { n=split($0,b,"|");
for(i=1;i<=n;i++)
print a[i] "|" b[i] }' file
A|1
B|2
C|3
A|4
B|5
C|6

Calculate average of each column in a file

I have a text file with n rows and columns (comma-separated), and I want to find the average of each column, excluding empty fields.
A sample input looks like:
1,2,3
4,,6
,7,
The desired output is:
2.5, 4.5, 4.5
I tried with
awk -F',' '{ for(i=1;i<=NF;i++) sum[i]=sum[i]+$i;if(max < NF)max=NF;};END { for(j=1;j<=max;j++) printf "%d\t",sum[j]/max;}' input
But it treats consecutive delimiters as one and mixes up the columns.
Any help is much appreciated.
You can use this one-liner:
$ awk -F, '{for(i=1; i<=NF; i++) {a[i]+=$i; if($i!="") b[i]++}}; END {for(i=1; i<=NF; i++) printf "%s%s", a[i]/b[i], (i==NF?ORS:OFS)}' foo
2.5 4.5 4.5
Otherwise, you can save this in a file script.awk and run awk -f script.awk your_file:
{
for(i=1; i<=NF; i++) {
a[i]+=$i
if($i!="")
b[i]++}
}
END {
for(i=1; i<=NF; i++)
printf "%s%s", a[i]/b[i], (i==NF?ORS:OFS)
}
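To try the script end to end, the same logic can be run against the question's sample input (the file name numbers.csv is arbitrary):

```shell
# Recreate the sample input and average each column, counting only
# non-empty fields in the divisor
printf '1,2,3\n4,,6\n,7,\n' > numbers.csv
awk -F, '{ for (i=1; i<=NF; i++) { a[i]+=$i; if ($i != "") b[i]++ } }
END { for (i=1; i<=NF; i++) printf "%s%s", a[i]/b[i], (i==NF ? ORS : OFS) }' numbers.csv
# prints: 2.5 4.5 4.5
```

Note that empty fields add 0 to the sum a[i] but are skipped when incrementing the count b[i], which is what makes the per-column divisor correct.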

AWK command to simulate full outer join and then compare

Hello guys, I need help building an awk command that simulates a full outer join and then compares values.
Say
cat File1
1|A|B
2|C|D
3|E|F
cat File2
1|A|X
2|C|D
3|Z|F
Assumptions
the first column in both files is the key field, so there are no duplicates
both files are expected to have the same structure
there is no limit on the number of fields
Now, If I run the awk command
awk -F'|' ........... File1 File2 > output
Output format
<Key>|<File1.column1>|<File2.column1>|<Matched/Mismatched>|<File1.column2>|<File2.column2>|<Matched/Mismatched>|<File1.column3>|<File2.column3>|<Matched/Mismatched>
cat output
1|A|A|MATCHED|B|X|MISMATCHED
2|C|C|MATCHED|D|D|MATCHED
3|E|Z|MISMATCHED|F|F|MATCHED
Thank You
$ awk -v OFS=\| -F\| 'NR==FNR{for(i=2;i<=NF;i++)a[$1][i]=$i;next}{printf "%s",$1;for(i=2;i<=NF;i++){printf"%s|%s|%s",a[$1][i],$i,a[$1][i]==$i?"matched":"mismatched"}printf"\n"}' file1 file2
1|A|A|matched|B|X|mismatched
2|C|C|matched|D|D|matched
3|E|Z|mismatched|F|F|matched
BEGIN {
OFS="|"; FS="|"
}
NR==FNR { # for the first file
for(i=2;i<=NF;i++) # fill array with "non-key" fields
a[$1][i]=$i;next # and use the "key" field as an index
}
{
printf "%s",$1
for(i=2;i<=NF;i++) { # use the key field to match and print
printf"|%s|%s|%s",a[$1][i],$i,a[$1][i]==$i?"matched":"mismatched"
}
printf"\n" # sugar on the top
}
perhaps easier with an assist from join:
$ join -t'|' file1 file2 |
awk -F'|' -v OFS='|' '{n="MIS"; m="MATCHED";
m1=($2!=$4?n:"")m;
m2=($3!=$5?n:"")m;
print $1,$2,$4,m1,$3,$5,m2}'
1|A|A|MATCHED|B|X|MISMATCHED
2|C|C|MATCHED|D|D|MATCHED
3|E|Z|MISMATCHED|F|F|MATCHED
for an unspecified number of fields, more awk is needed:
$ join -t'|' file1 file2 |
awk -F'|' '{c=(NF-1)/2; printf "%s", $1;
for(i=2;i<=c+1;i++) printf "|%s|%s|%s", $i,$(i+c),($i!=$(i+c)?"MIS":"")"MATCHED";
print ""}'
$ cat tst.awk
BEGIN { FS=OFS="|" }
NR==FNR {
for (i=2; i<=NF; i++) {
a[$1,i] = $i
}
next
}
{
printf "%s%s", $1, OFS
for (i=2; i<=NF; i++) {
printf "%s%s%s%s%s%s", a[$1,i], OFS, $i, OFS, (a[$1,i]==$i ? "" : "MIS") "MATCHED", (i<NF ? OFS : ORS)
}
}
$ awk -f tst.awk file1 file2
1|A|A|MATCHED|B|X|MISMATCHED
2|C|C|MATCHED|D|D|MATCHED
3|E|Z|MISMATCHED|F|F|MATCHED
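To check tst.awk end to end, the sample files can be recreated with printf and the script run directly (the here-document is used only for the demo):

```shell
# Recreate the question's sample files
printf '1|A|B\n2|C|D\n3|E|F\n' > file1
printf '1|A|X\n2|C|D\n3|Z|F\n' > file2
# Write the comparison script shown above and run it
cat > tst.awk <<'EOF'
BEGIN { FS=OFS="|" }
NR==FNR { for (i=2; i<=NF; i++) a[$1,i] = $i; next }
{
  printf "%s%s", $1, OFS
  for (i=2; i<=NF; i++)
    printf "%s%s%s%s%s%s", a[$1,i], OFS, $i, OFS, (a[$1,i]==$i ? "" : "MIS") "MATCHED", (i<NF ? OFS : ORS)
}
EOF
awk -f tst.awk file1 file2
# prints:
# 1|A|A|MATCHED|B|X|MISMATCHED
# 2|C|C|MATCHED|D|D|MATCHED
# 3|E|Z|MISMATCHED|F|F|MATCHED
```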

Subtracting every column in a row for similar row numbers in two separate files

If I have two files:
file1
2,3,1,4,5,2,1
1,2,4,6,3,1,3
1,2,1,1,1,1,1
file2
1,1,1,1,1,1,1
1,1,1,1,1,1,1
1,1,1,1,1,1,1
I want to subtract all the numbers from the same row numbers of each file. So all the numbers of row 1 from file 1 minus all the numbers of row 1 of file 2 and so forth.
Output:
1,2,0,3,4,1,0
0,1,3,5,2,0,2
0,1,0,0,0,0,0
$ paste -d, file1 file2 | awk -F, '{n=NF/2; s=""; for (i=1;i<=n;i++) {printf "%s%s", s, $i-$(i+n); s=",";}; print ""}'
1,2,0,3,4,1,0
0,1,3,5,2,0,2
0,1,0,0,0,0,0
How it works
paste -d, file1 file2
This combines the files, row by row.
n=NF/2; s=""; for (i=1;i<=n;i++) {printf "%s%s", s, $i-$(i+n); s=",";}
This subtracts and prints.
print ""
This prints a newline character at the end of each line.
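If paste is not available, a sketch of the same subtraction in awk alone, reading the matching row of file2 with getline (this assumes both files have the same number of rows and columns):

```shell
# Read the corresponding row of file2 with getline and subtract field-wise;
# assigning to $i rebuilds $0 with OFS="," before print
awk -F, -v OFS=, -v f2=file2 '{
    getline line < f2              # matching row from the second file
    split(line, b, ",")
    for (i = 1; i <= NF; i++) $i = $i - b[i]
    print
}' file1
```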
You could use two dimensional arrays with GNU Awk:
$ cat subtract_fields.awk
BEGIN {
FS=OFS=","
}
{
if(FNR==NR) {
for(i=1; i<=NF; i++)
a[FNR][i]=$i
} else {
for(i=1; i<=NF; i++)
$i=a[FNR][i]-$i
delete a[FNR]
print
}
}
$ awk -f subtract_fields.awk file1 file2
1,2,0,3,4,1,0
0,1,3,5,2,0,2
0,1,0,0,0,0,0