Why is awk's OFS only applied to the first line?

I'm doing some simple calculations on a file using awk, but cannot get the output formatting right. The OFS is for some reason only applied to the first line (i.e. only within the BEGIN block), and for other rows a single space is inserted between fields.
Input:
title c1 c2 c3 n
AA 14 6 3 40
BB 8 2 2 38
Oneliner:
cat file.txt | awk -F'\t' 'BEGIN {OFS="\t"; print "Title","Freq1","Freq2","Freq3","Total"}; NR>1{printf "%s %.3f %.3f %.3f %d\n", $1, $2/$5, $3/$5, $4/$5, $5;}' > file2.txt
I've tried removing the header from BEGIN but this does not make a difference, and neither does BEGIN{FS="\t";OFS="\t";...}. I'm using awk in cygwin.

printf never consults OFS: it prints exactly what the format string specifies, so the single spaces in your format are what end up in the output. If you want to incorporate OFS, it gets ugly:
NR>1 {printf "%s%s%.3f%s%.3f%s%.3f%s%d\n", $1, OFS, $2/$5, OFS, $3/$5, OFS, $4/$5, OFS, $5}
Or, don't use printf:
NR>1 {print $1, sprintf("%.3f",$2/$5), sprintf("%.3f",$3/$5), sprintf("%.3f",$4/$5), $5}
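A third option, sketched here as an assumption rather than something taken from the answer above: since printf emits exactly what its format says, the tab can simply be written into the format string. The function name freq_table is hypothetical, and the sample input is the question's data with tabs between the fields.

```shell
# freq_table is a hypothetical helper name; it reads tab-separated input on stdin.
freq_table() {
  awk 'BEGIN { FS = OFS = "\t"              # OFS only affects print, not printf
               print "Title", "Freq1", "Freq2", "Freq3", "Total" }
       NR > 1 { printf "%s\t%.3f\t%.3f\t%.3f\t%d\n", $1, $2/$5, $3/$5, $4/$5, $5 }'
}

# Sample data from the question, tab-separated.
printf 'title\tc1\tc2\tc3\tn\nAA\t14\t6\t3\t40\nBB\t8\t2\t2\t38\n' | freq_table
```

The header line gets its tabs from OFS (via print), the data lines get theirs from the format string.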

Related

Trying to read data from two files, and subtract values from both files using awk

I have two files
0.975301988947238963 1.75276754663189283 2.00584
0.0457467532388459441 1.21307648993841410 1.21394
-0.664000617674924687 1.57872850852906366 1.71268
-0.812129324498058969 4.86617859243825635 4.93348
and
1.98005959631337536 -3.78935155011290536 4.27549
-1.04468782080821154 4.99192849476267053 5.10007
-1.47203672235857397 -3.15493073343947694 3.48145
2.68001948430755244 -0.0630730371855307004 2.68076
I want to subtract the two values in column 3 of each file.
My first awk statement was
awk 'BEGIN {print "Test"} FNR>1 && FNR==NR { r[$1]=$3; next} FNR>1 { print $3, r[$1], (r[$1]-$3)}' zzz0.dat zzz1.dat
Test
5.10007 -5.10007
3.48145 -3.48145
2.68076 -2.68076
This suggests it does not recognize r[$1]=$3
I created an additional column xyz by
awk 'NR==1{$(NF+1)="xyz"} NR>1{$(NF+1)="xyz"}1' zzz0.dat
then
awk 'BEGIN {print "Test"} FNR>1 && FNR==NR { xyz[$4]=$3; next} FNR>1 { print $3, xyz[$4], (xyz[$4]-$3)}' zzz00.dat zzz11.dat
Test
5.10007 4.93348 -0.16659
3.48145 4.93348 1.45203
2.68076 4.93348 2.25272
This now shows three columns, but xyz[$4] prints only the value from the last line instead of building an array.
My real files have thousands of lines. How can I resolve this problem?

You can do it relatively easily using a numeric index for your array. For example:
awk 'NR==FNR {a[++n]=$3; next} o<n{++o; printf "%f - %f = %f\n", a[o], $3, a[o]-$3}' file1 file2
That way the records of the two files are paired by position. All awk arrays are associative, so keying on $1 as in your attempt only finds a match when the same column-1 value appears in both files, which is not the case here. A counter used as the index pairs line N of file1 with line N of file2.
Example Use/Output
With your files in file1 and file2 respectively, you would have:
$ awk 'NR==FNR {a[++n]=$3; next} o<n{++o; printf "%f - %f = %f\n", a[o], $3, a[o]-$3}' file1 file2
2.005840 - 4.275490 = -2.269650
1.213940 - 5.100070 = -3.886130
1.712680 - 3.481450 = -1.768770
4.933480 - 2.680760 = 2.252720
Let me know if that is what you intended or if you have any further questions. If I missed your intent, drop a comment and I will help further.
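A variant of the same idea, as a minimal sketch: use FNR itself as the index, so no separate counters are needed. The helper name col3_diff and the temporary file names are assumptions for illustration; only the first two rows of each of your files are used.

```shell
# col3_diff is a hypothetical helper; usage: col3_diff FILE1 FILE2
col3_diff() {
  awk 'NR == FNR  { a[FNR] = $3; next }      # first file: remember column 3 by line number
       (FNR in a) { print a[FNR], $3, a[FNR] - $3 }' "$1" "$2"
}

# Self-contained demo with the first two rows of each file from the question.
tmp=$(mktemp -d)
printf '%s\n' '0.975301988947238963 1.75276754663189283 2.00584' \
              '0.0457467532388459441 1.21307648993841410 1.21394' > "$tmp/file1"
printf '%s\n' '1.98005959631337536 -3.78935155011290536 4.27549' \
              '-1.04468782080821154 4.99192849476267053 5.10007' > "$tmp/file2"
col3_diff "$tmp/file1" "$tmp/file2"
rm -rf "$tmp"
```

Lines of the second file that have no partner in the first file are skipped by the (FNR in a) test.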

If the records are aligned in both files, the easiest way is
$ paste file1 file2 | awk '{print $3,$6,$3-$6}'
2.00584 4.27549 -2.26965
1.21394 5.10007 -3.88613
1.71268 3.48145 -1.76877
4.93348 2.68076 2.25272
If you're only interested in the difference, change it to print $3-$6.

Understanding how OFS works in AWK

This is a follow-up to my question to understand more about the OFS in AWK.
My understanding is, set it once in the beginning and it will be used in "print" to separate the fields. However, it didn't work as expected, as explained in my original question.
My File: someone.txt
LN_A,FN_A<aa#xyz.com>;
LN_B,FN_B<bb#xyz.com>;
Expected output:
FN_A,LN_A,aa
FN_B,LN_B,bb
I have tried the following:
awk -F'[,<#]' -v OFS=',' '{print $2 $1 $3}' someone.txt
awk -F'[,<#]' -v OFS=',' 'NF=3 {print $2 $1 $3}' someone.txt
awk -F'[,<#]' -v OFS=',' 'NF=3; {print $2 $1 $3}' someone.txt
awk -F'[,<#]' -v OFS=',' '{$1=$1} {print $2 $1 $3}' someone.txt
awk -F'[,<#]' -v OFS=',' '{$1=$1} {print $0}' someone.txt
Finally, I managed to get the required output with the following:
awk -F'[,<#]' '{print $2 "," $1 "," $3}' someone.txt

Consider these cases:
a) $ echo '1 2 3' | awk '{print}'
1 2 3
b) $ echo '1 2 3' | awk '{print $1, $2, $3}'
1 2 3
c) $ echo '1 2 3' | awk -v OFS=',' '{print}'
1 2 3
d) $ echo '1 2 3' | awk -v OFS=',' '{print $1, $2, $3}'
1,2,3
e) $ echo '1 2 3' | awk -v OFS=',' '{$1=$1; print}'
1,2,3
The above show OFS being used in "b" and "d" (when individual fields are being printed in a comma-separated list) and in "e" (when the record $0 is being reconstructed as a result of a value being assigned to a field before the record is printed).
Those are the only 2 times when OFS is used implicitly - when printing a comma-separated list of values and when reconstructing the record.
When you print the record (e.g. by print or print $0) as in "a" and "c" above or print any other string you are not using OFS. OFS may have been used earlier to reconstruct the record as in "e" above but the act of printing anything that's not a comma-separated list is not using OFS, it's just printing any old string which just happens to be $0 in this case.
Note:
1. Explicitly changing a field reconstructs $0 from the existing fields using OFS between the fields, it does not resplit $0 into fields again so FS is not used in this process. So $1=$1 or sub(/1/,2,$1) uses OFS but not FS.
2. Explicitly changing $0 (i.e. not implicitly as a result of 1 above) resplits $0 into fields using FS as the separator, it does not use OFS in any way. So $0=$0 or sub(/1/,2) uses FS but not OFS.
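The two points in the note can be seen side by side in a small sketch. The function name ofs_demo and the sample strings are made up for illustration.

```shell
# ofs_demo is a hypothetical name; it prints three lines.
ofs_demo() {
  # First point: assigning to a field rebuilds $0 with OFS between the fields.
  echo 'a b c' | awk -v OFS='-' '{ $2 = "X"; print }'
  # Second point: assigning to $0 resplits it with FS; OFS plays no part in the split.
  echo 'a b c' | awk -v OFS='-' '{ $0 = "p q"; print NF; print $2 }'
}

ofs_demo
```

The first command prints a-X-c; the second prints 2 and then q, because the new record "p q" was split on the default FS.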
Understanding how FS and OFS work together and how they affect assignments to fields and $0 is very important. If you can explain this behavior then you've got it:
f) $ echo 'a b' | awk -v OFS=',' '{print NF, $0, $1, $2}'
2,a b,a,b
g) $ echo 'a b' | awk -v OFS=',' '{$1=$1; print NF, $0, $1, $2}'
2,a,b,a,b
h) $ echo 'a b' | awk -v OFS=',' '{$1=$1; $0=$0; print NF, $0, $1, $2}'
1,a,b,a,b,
i) $ echo 'a b' | awk -v OFS=',' '{$1=$1; $0=$0; FS=OFS; print NF, $0, $1, $2}'
1,a,b,a,b,
j) $ echo 'a b' | awk -v OFS=',' '{$1=$1; $0=$0; FS=OFS; $1=$1; print NF, $0, $1, $2}'
1,a,b,a,b,
k) $ echo 'a b' | awk -v OFS=',' '{$1=$1; $0=$0; FS=OFS; $1=$1; $0=$0; print NF, $0, $1, $2}'
2,a,b,a,b
If not then feel free to ask questions.

It is simple: you set OFS="," at the start of your awk command, but you print the fields as one concatenated string (no commas between them in print, and no field assignment that would rebuild the line), so OFS never comes into play and the output has no separator.
awk -F'[,<#]' -v OFS=',' '{print $2,$1,$3}' Input_file
With the command above, where the fields in print are separated by commas, OFS is inserted between them.
If you want to see OFS at work another way, you could use this (the solution above is the better one; this is only for illustration):
awk -F'[,<#]' -v OFS=',' '{$0=$2 OFS $1 OFS $3} 1' Input_file
Example to understand OFS by printing whole line(s): let us look at printing the whole line with and without the effect of OFS.
Let us run this code:
awk -F'[,<#]' -v OFS=',' 'FNR==1{$1=$1} 1' Input_file
When the line number is 1, I reassign $1 to itself, which makes awk rebuild the line with OFS in every place where a field separator was found. This is done for the first line only; the rest of the lines are left untouched. Let us see the output:
LN_A,FN_A,aa,xyz.com>;
LN_B,FN_B<bb#xyz.com>;
See the difference? The first line has , in the output and the second line is printed as it is, because only on the first line did we assign to a field, so only there did OFS come into play.

As I just found an unused copy of Aho, Kernighan, Weinberger: The AWK Programming Language from 1988, I(t)'ll take you to the source (pages 35-36):
"Field Variables. The fields of the current input line are called $1, $2,
through $NF; $0 refers to the whole line. Fields share the properties of other
variables — they may be used in arithmetic or string operations, and may be
assigned to. - -
One can assign a new string to a field:
BEGIN { FS = OFS = "\t" }
$4 == "North America" { $4 = "NA" }
$4 == "South America" { $4 = "SA" }
{ print }
In this program, the BEGIN action sets FS, the variable that controls the input
field separator, and OFS, the output field separator, both to a tab. The print
statement in the fourth line prints the value of $0 after it has been modified by
previous assignments. This is important: when $0 is changed by assignment or
substitution, $1, $2, etc., and NF will be recomputed; likewise, when one of $1, $2, etc., is changed, $0 is reconstructed using OFS to separate fields."
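To watch the quoted program at work, here is a minimal runnable sketch. The function name and the two tab-separated sample rows are illustrative assumptions, not data taken from the quote.

```shell
# abbreviate_continents is a hypothetical name; input is tab-separated on stdin.
abbreviate_continents() {
  awk 'BEGIN { FS = OFS = "\t" }
       $4 == "North America" { $4 = "NA" }   # assignment to $4 rebuilds $0 using OFS
       $4 == "South America" { $4 = "SA" }
       { print }'
}

# Illustrative rows only.
printf 'Brazil\t3286\t134\tSouth America\nIndia\t1269\t746\tAsia\n' | abbreviate_continents
```

The first row comes out with SA in the fourth column; the second row matches neither pattern and is printed unchanged.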

awk to compare two file by identifier & output in a specific format

I have 2 large pipe-delimited files that I need to compare.
file 1
a||d||f||a
1||2||3||4
file 2
a||d||f||a
1||1||3||4
1||2||r||f
Now I want to compare the files and print the result as follows: any value updated in file 2 is printed as updated_value#oldvalue, and any new line added to file 2 is printed as well.
So the desired output is: (only the updated & new data)
1||1#2||3||4
1||2||r||f
What I have tried so far gets only the changed values:
awk -F '[||]+' 'NR==FNR{for(i=1;i<=NF;i++)a[NR,i]=$i;next}{for(i=1;i<=NF;i++)if(a[FNR,i]!=$i)print $i"#"a[FNR,i]}' file1 file2 >output
But I want to print the whole line. How can I achieve that?

I would say:
awk 'BEGIN{FS=OFS="|"}
     FNR==NR {for (i=1;i<=NF;i+=2) a[FNR,i]=$i; next}
     {for (i=1; i<=NF; i+=2)
         if (a[FNR,i] && a[FNR,i]!=$i)
             $i=$i"#"a[FNR,i]
     }1' f1 f2
This stores file1 in a matrix a[line number, column]. Then it compares each value with the corresponding one in file2.
Note that I am using the field separator | instead of || and looping in steps of two to pick up the actual data fields. This is because, for example, I ran gawk -F'||' '{print NF}' f1 and got just 1, meaning that FS wasn't interpreted as I expected. I would be grateful if someone could point out the error here!
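A likely explanation, offered as a sketch rather than a confirmed diagnosis of the gawk result above: an FS longer than one character is treated as a regular expression, and | is the alternation operator there, so '||' does not match two literal pipes. Putting each pipe in a bracket expression makes it literal. The helper name count_fields is hypothetical.

```shell
# count_fields is a hypothetical helper; it splits each line on a literal "||".
count_fields() {
  awk -F'[|][|]' '{ print NF }'
}

printf 'a||d||f||a\n' | count_fields
```

This prints 4, and with that separator the loop could step through the fields one at a time instead of in steps of two.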
Test
$ awk 'BEGIN{FS=OFS="|"} FNR==NR {for (i=1;i<=NF;i+=2) a[FNR,i]=$i; next} {for (i=1; i<=NF; i+=2) if (a[FNR,i] && a[FNR,i]!=$i) $i=$i"#"a[FNR,i]}1' f1 f2
a||d||f||b#a
1||1#2||3||4
1||2||r||f
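The question asked for only the updated and new lines, while the command above prints every line of the second file. A minimal sketch of one way to filter, assuming (as above) that lines are matched by line number; the names changed_lines, seen and changed are made up for illustration:

```shell
# changed_lines is a hypothetical helper; usage: changed_lines OLD NEW
changed_lines() {
  awk 'BEGIN { FS = OFS = "|" }
       # First file: store the data fields (odd positions, since "||" yields empty fields).
       FNR == NR { seen[FNR]; for (i = 1; i <= NF; i += 2) a[FNR, i] = $i; next }
       {
         changed = !(FNR in seen)            # a line number absent from OLD is new
         for (i = 1; i <= NF; i += 2)
           if (((FNR, i) in a) && a[FNR, i] != $i) {
             $i = $i "#" a[FNR, i]           # updated_value#oldvalue
             changed = 1
           }
         if (changed) print
       }' "$1" "$2"
}

# Self-contained demo with the data from the question.
tmp=$(mktemp -d)
printf 'a||d||f||a\n1||2||3||4\n' > "$tmp/f1"
printf 'a||d||f||a\n1||1||3||4\n1||2||r||f\n' > "$tmp/f2"
changed_lines "$tmp/f1" "$tmp/f2"
rm -rf "$tmp"
```

Testing with (FNR, i) in a rather than the truth value of a[FNR,i] also means an old value of 0 or an empty string is still compared.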

Internal piping with awk

Let's say I have this input line:
input:
{x:y} abc det uyt llu
How can I process it to get the expected output?
output:
{x:y} abc%det%uyt%llu
The question is how to concatenate fields 2 through the end of the line and, in that string, replace each space with %.
The separator is a space.
I need the first part {x:y} kept as is, with the replacement ("pipe") applied only to fields 2 through the end of the line.

Here is another awk
awk '{$1=$1;sub(/%/," ")}1' OFS="%" file
echo '{x:y} abc det uyt llu' | awk '{$1=$1;sub(/%/," ")}1' OFS="%"
{x:y} abc%det%uyt%llu
This changes all spaces to % (using OFS and $1=$1), then changes the first % back to a space.

You can use this awk:
s='{x:y} abc det uyt llu'
awk '{printf "%s%s", $1, OFS; for (i=2; i<=NF; i++) printf "%s%s", $i, (i==NF)?ORS:"%"}' <<< "$s"
{x:y} abc%det%uyt%llu
Another awk:
awk '{printf "%s%s", $1, OFS; OFS="%"; $1=""; print substr($0, 2)}' <<< "$s"
{x:y} abc%det%uyt%llu
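One caveat with the last command: it assigns OFS inside the action, and that assignment persists, so on multi-line input every line after the first would get a % after the first field as well. A minimal sketch that builds the output string by hand and leaves OFS alone; the name join_rest and the second sample line are assumptions for illustration:

```shell
# join_rest is a hypothetical name; keeps field 1, joins fields 2..NF with "%".
join_rest() {
  awk '{ out = $1; sep = " "                 # a space after field 1 only
         for (i = 2; i <= NF; i++) { out = out sep $i; sep = "%" }
         print out }'
}

printf '%s\n' '{x:y} abc det uyt llu' '{p:q} one two' | join_rest
```

Both lines keep the space after their first field: {x:y} abc%det%uyt%llu and {p:q} one%two.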

group of columns in awk

The following awk statement is working as expected.
awk '{print $1, $2, $3}' test.txt
But how do I say that I need all the columns after the second column?
awk '{print $1, $2, $3 to $NF}' test.txt
I need all columns from third column till end of that line. There can be 2 to 10 columns and all are considered as a part of the last column.

If you just want the $3-$NF fields, the standard way would be a loop (for/while),
but for your requirement, you could:
awk '{$1=$2="";}sub("^ *","")'
for example:
kent$ seq -s' ' 10|awk '{$1=$2="";}sub("^ *","")'
3 4 5 6 7 8 9 10
if you want to "group" 100 fields into 3 groups: 1,2, 3-100:
awk '{x=$0;sub($1FS$2,"",x);gsub(FS,"",x);print $1,$2,x}'
same example:
kent$ seq -s' ' 10|awk '{x=$0;sub($1FS$2,"",x);gsub(FS,"",x);print $1,$2,x}'
1 2 345678910
hope it is what you want.
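For reference, the loop mentioned at the top of this answer could look like the following minimal sketch. The name from_third is hypothetical; the fields are joined with OFS, which is a single space by default.

```shell
# from_third is a hypothetical name; prints fields 3..NF joined by OFS.
from_third() {
  awk '{ out = ""
         for (i = 3; i <= NF; i++) out = out (i > 3 ? OFS : "") $i
         print out }'
}

echo '1 2 3 4 5 6 7 8 9 10' | from_third
```

This prints 3 4 5 6 7 8 9 10. A line with fewer than three fields produces an empty output line.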

The intuitive way.
awk 'BEGIN{ORS=""} {for(i=3; i<=NF; i++) if(i != NF){print $i " "} else {print $i "\n"}}' test.txt

Some more:
awk '{$1=$2=x; $0=$0; $1=$1}1' file
awk '{$1=$1; sub($1 FS $2 FS,x)}1' file
To keep spacing intact:
awk 'sub($1 "[ \t]*" $2 "[ \t]*",x)' file
