What is the meaning of $0 = $0 in Awk?

While going through a piece of code, I saw the following command:
grep "r" temp | awk '{FS=","; $0=$0} { print $1,$3}'
The temp file contains lines like:
1. r,1,5
2. r,4,5
3. ...
I could not understand what the statement $0=$0 means in the awk command.
Can anyone explain what it means?

When you do $1=$1 (or any other assignment to a field) it causes record recompilation where $0 is rebuilt with every FS replaced with OFS but it does not change NF (unless there was no $1 previously and then NF would change from 0 to 1) or reevaluate the record in any other way.
When you do $0=$0 it causes field splitting where NF, $1, $2, etc. are repopulated based on the current value of FS but it does not change the FSs to OFSs or modify $0 in any other way.
Look:
$ echo 'a-b-c' |
awk -F'-+' -v OFS='-' '
function p() { printf "%d) %d: $0=%s, $2=%s\n", ++c,NF,$0,$2 }
{ p(); $2=""; p(); $1=$1; p(); $0=$0; p(); $1=$1; p() }
'
1) 3: $0=a-b-c, $2=b
2) 3: $0=a--c, $2=
3) 3: $0=a--c, $2=
4) 2: $0=a--c, $2=c
5) 2: $0=a-c, $2=c
Note in the above that even though setting $2 to null resulted in 2 consecutive -s, and an FS of -+ means that 2 -s form a single separator, they are not treated as such until $0=$0 causes the record to be re-split into fields, as shown in output step 4.
The code you have:
awk '{FS=","; $0=$0}'
is using $0=$0 as a kludge to work around the fact that it's not setting FS until AFTER the first record has been read and split into fields:
$ printf 'a,b\nc,d\n' | awk '{print NF, $1}'
1 a,b
1 c,d
$ printf 'a,b\nc,d\n' | awk '{FS=","; print NF, $1}'
1 a,b
2 c
$ printf 'a,b\nc,d\n' | awk '{FS=","; $0=$0; print NF, $1}'
2 a
2 c
The correct solution, of course, is instead to simply set FS BEFORE the first record is read:
$ printf 'a,b\nc,d\n' | awk -F, '{print NF, $1}'
2 a
2 c
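To tie this back to the question, here is a small sketch that runs both forms on data modeled on the temp file. It assumes the file really contains lines such as r,1,5 (the "1." and "2." look like list numbering rather than file content) and writes the data inline instead of using grep on temp:

```shell
#!/bin/sh
# Sample data modeled on the question's temp file (list numbers dropped).
data='r,1,5
r,4,5'

# Kludge: FS is set inside the action, so $0=$0 is needed to re-split line 1.
kludge=$(printf '%s\n' "$data" | awk '{FS=","; $0=$0} {print $1,$3}')

# Preferred: FS is set before the first record is read.
direct=$(printf '%s\n' "$data" | awk -F, '{print $1, $3}')

printf '%s\n' "$direct"
```

Both variables hold the same two lines of output; the -F form simply avoids re-splitting every record.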
To be clear: assigning any value to $0 causes field splitting but not record recompilation, while assigning any value to any field ($1, etc.) causes record recompilation but not field splitting:
$ echo 'a-b-c' | awk -F'-+' -v OFS='#' '{$2=$2}1'
a#b#c
$ echo 'a-b-c' | awk -F'-+' -v OFS='#' '{$0=$0}1'
a-b-c

$0 = $0 is used most often to re-evaluate the field splitting of a modified record. For example, if an assignment inserts extra separators into a field, NF and $NF only change after $0 = $0; until then they keep the values from when the line was read.
In this case, it changes the field separator to , on every line and (see @EdMorton's comment below for the struck-through part) re-parses the line with the current FS. awk -F ',' '{ print $1, $3 }' is much better code for the same idea, because it sets the field separator once, at the beginning, for all lines. Changing FS per line only makes sense when the separator can change during processing, for example depending on the content of a previous line.
ex:
echo "foo;bar" | awk '{print NF}{FS=";"; print NF}{$0=$0;print NF}'
1
1
2
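The "extra separators in a field" case can be sketched like this: assigning a value containing the separator rebuilds $0, but NF stays stale until $0=$0 forces a re-split. The input a,b and the value x,y are made up for illustration:

```shell
#!/bin/sh
# A field assignment that introduces an extra separator.
# NF is recorded before and after forcing a re-split with $0=$0.
out=$(echo 'a,b' | awk -F, -v OFS=, '{
  $2 = "x,y"          # $0 is rebuilt as "a,x,y", but NF is still 2
  before = NF
  $0 = $0             # re-split $0 with FS: now 3 fields
  printf "%d %d %s\n", before, NF, $3
}')
echo "$out"   # 2 3 y
```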
Based on @EdMorton's comment and the related post (What is the meaning of $0 = $0 in Awk):
echo "a-b-c" |\
awk ' BEGIN{ FS="-+"; OFS="-"}
function p(Ref) { printf "%12s) NF=%d $0=%s, $2=%s\n", Ref,NF,$0,$2 }
{
p("Org")
$2="-"; p( "S2=-")
$1=$1 ; p( "$1=$1")
$2=$2 ; p( "$2=$2")
$0=$0 ; p( "$0=$0")
$2=$2 ; p( "$2=$2")
$3=$3 ; p( "$3=$3")
$1=$1 ; p( "$1=$1")
} '
Org) NF=3 $0=a-b-c, $2=b
S2=-) NF=3 $0=a---c, $2=-
$1=$1) NF=3 $0=a---c, $2=-
$2=$2) NF=3 $0=a---c, $2=-
$0=$0) NF=2 $0=a---c, $2=c
$2=$2) NF=2 $0=a-c, $2=c
$3=$3) NF=3 $0=a-c-, $2=c
$1=$1) NF=3 $0=a-c-, $2=c

$0=$0 is for re-evaluating the fields.
For example:
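akshay@db-3325:~$ cat <<EOF | awk '/:/{FS=":"}/\|/{FS="|"}{print $2}'
1:2
2|3
EOF
# The above prints two empty lines: each line was split with the FS in effect before the action changed it
# Same with $0=$0, which forces awk to re-evaluate $0 and split it again with the current FS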
akshay@db-3325:~$ cat <<EOF | awk '/:/{FS=":"}/\|/{FS="|"}{$0=$0;print $2}'
1:2
2|3
EOF
2
3
# NF - gives you the total number of fields in a record
akshay@db-3325:~$ cat <<EOF | awk '/:/{FS=":"}/\|/{FS="|"}{print NF}'
1:2
2|3
EOF
1
1
# When we force the fields to be re-evaluated, we get the correct 2 fields
akshay@db-3325:~$ cat <<EOF | awk '/:/{FS=":"}/\|/{FS="|"}{$0=$0; print NF}'
1:2
2|3
EOF
2
2
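A different technique from the one shown above, sketched here only for comparison: when the separator varies per line, split() can parse each line into an array without changing FS or re-splitting $0. This assumes ":" and "|" are the only separators that occur:

```shell
#!/bin/sh
# Split each line on "|" or ":" into array f, leaving FS and $0 alone.
# Assumes those two characters are the only separators in the input.
out=$(printf '1:2\n2|3\n' | awk '{
  n = split($0, f, "[|:]")   # a multi-character separator string is used as an ERE
  print n, f[2]
}')
echo "$out"
```

Each output line shows the field count and the second field, matching the $0=$0 version's NF and $2.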

$ echo 'a-b-c' | awk -F'-+' -v OFS='#' '{$2=$2}1'
a#b#c
This can be slightly simplified to
mawk 'BEGIN { FS="[-]+"; OFS = "#"; } ($2=$2)'
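Rationale: the value of an assignment is the value assigned, so ($2=$2) used as a pattern is true whenever $2 is non-empty and not numerically zero, and the rebuilt record (fields joined with OFS) is then printed. If $2 is empty or 0, the pattern is false and the line is silently dropped, so {$2=$2}1 is the safer form.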
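A quick sketch of that pitfall, using a made-up record whose second field is empty (FS is a plain comma here just to make the empty field easy to produce):

```shell
#!/bin/sh
input='a,,c
a,b,c'
# Assignment used as a pattern: a record whose $2 is empty makes the
# pattern false, so that line is silently dropped.
as_pattern=$(printf '%s\n' "$input" | awk -F, -v OFS='#' '($2=$2)')
# Assignment in an action, followed by the always-true pattern 1.
as_action=$(printf '%s\n' "$input" | awk -F, -v OFS='#' '{$2=$2} 1')
printf '%s\n---\n%s\n' "$as_pattern" "$as_action"
```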

Related

Understanding how OFS works in AWK

This is a follow-up to my question to understand more about the OFS in AWK.
My understanding is, set it once in the beginning and it will be used in "print" to separate the fields. However, it didn't work as expected, as explained in my original question.
My File: someone.txt
LN_A,FN_A<aa#xyz.com>;
LN_B,FN_B<bb#xyz.com>;
Expected output:
FN_A,LN_A,aa
FN_B,LN_B,bb
I have tried the following:
awk -F'[,<#]' -v OFS=',' '{print $2 $1 $3}' someone.txt
awk -F'[,<#]' -v OFS=',' 'NF=3 {print $2 $1 $3}' someone.txt
awk -F'[,<#]' -v OFS=',' 'NF=3; {print $2 $1 $3}' someone.txt
awk -F'[,<#]' -v OFS=',' '{$1=$1} {print $2 $1 $3}' someone.txt
awk -F'[,<#]' -v OFS=',' '{$1=$1} {print $0}' someone.txt
Finally, I managed to get the required output with the following:
awk -F'[,<#]' '{print $2 "," $1 "," $3}' someone.txt
Consider these cases:
a) $ echo '1 2 3' | awk '{print}'
1 2 3
b) $ echo '1 2 3' | awk '{print $1, $2, $3}'
1 2 3
c) $ echo '1 2 3' | awk -v OFS=',' '{print}'
1 2 3
d) $ echo '1 2 3' | awk -v OFS=',' '{print $1, $2, $3}'
1,2,3
e) $ echo '1 2 3' | awk -v OFS=',' '{$1=$1; print}'
1,2,3
The above show OFS being used in "b" and "d" (when individual fields are being printed in a comma-separated list) and in "e" (when the record $0 is being reconstructed as a result of a value being assigned to a field before the record is printed).
Those are the only 2 times when OFS is used implicitly - when printing a comma-separated list of values and when reconstructing the record.
When you print the record (e.g. by print or print $0) as in "a" and "c" above or print any other string you are not using OFS. OFS may have been used earlier to reconstruct the record as in "e" above but the act of printing anything that's not a comma-separated list is not using OFS, it's just printing any old string which just happens to be $0 in this case.
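Applied to one line of someone.txt from the question (kept with its "#" exactly as shown there), the difference between juxtaposing fields and listing them with commas looks like this; a minimal sketch:

```shell
#!/bin/sh
line='LN_A,FN_A<aa#xyz.com>;'
# Juxtaposition concatenates the fields: OFS is not involved.
concat=$(printf '%s\n' "$line" | awk -F'[,<#]' -v OFS=',' '{print $2 $1 $3}')
# A comma-separated list in print inserts OFS between the values.
listed=$(printf '%s\n' "$line" | awk -F'[,<#]' -v OFS=',' '{print $2, $1, $3}')
printf '%s\n%s\n' "$concat" "$listed"
```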
Note:
1. Explicitly changing a field reconstructs $0 from the existing fields using OFS between the fields; it does not re-split $0 into fields again, so FS is not used in this process. So $1=$1 or sub(/1/,2,$1) uses OFS but not FS.
2. Explicitly changing $0 (i.e. not implicitly as a result of 1 above) re-splits $0 into fields using FS as the separator; it does not use OFS in any way. So $0=$0 or sub(/1/,2) uses FS but not OFS.
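The sub() examples in that note can be checked directly; a small sketch with a made-up input of 1 1:

```shell
#!/bin/sh
# sub() on a field: the field changes, so $0 is rebuilt with OFS.
on_field=$(echo '1 1' | awk -v OFS=',' '{sub(/1/, 2, $1); print}')
# sub() on $0 (the default target): $0 is re-split with FS; OFS is not used.
on_record=$(echo '1 1' | awk -v OFS=',' '{sub(/1/, 2); print}')
printf '%s\n%s\n' "$on_field" "$on_record"
```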
Understanding how FS and OFS work together and how they affect assignments to fields and $0 is very important. If you can explain this behavior then you've got it:
f) $ echo 'a b' | awk -v OFS=',' '{print NF, $0, $1, $2}'
2,a b,a,b
g) $ echo 'a b' | awk -v OFS=',' '{$1=$1; print NF, $0, $1, $2}'
2,a,b,a,b
h) $ echo 'a b' | awk -v OFS=',' '{$1=$1; $0=$0; print NF, $0, $1, $2}'
1,a,b,a,b,
i) $ echo 'a b' | awk -v OFS=',' '{$1=$1; $0=$0; FS=OFS; print NF, $0, $1, $2}'
1,a,b,a,b,
j) $ echo 'a b' | awk -v OFS=',' '{$1=$1; $0=$0; FS=OFS; $1=$1; print NF, $0, $1, $2}'
1,a,b,a,b,
k) $ echo 'a b' | awk -v OFS=',' '{$1=$1; $0=$0; FS=OFS; $1=$1; $0=$0; print NF, $0, $1, $2}'
2,a,b,a,b
If not then feel free to ask questions.
It is simple: you have set OFS="," at the beginning of your awk command, but you are just printing the fields concatenated together (NOTE: without modifying the line and without separating the fields with commas in print), so OFS never comes into the picture. That is why your output has no separator at all.
awk -F'[,<#]' -v OFS=',' '{print $2,$1,$3}' Input_file
If you use the above command, where I have put a , between the printed fields, you will see that OFS is now used; this is how it works.
Or, if you want to see another use of OFS, you could use this (the above solution is the BEST one, but I am adding this one too for your understanding):
awk -F'[,<#]' -v OFS=',' '{$0=$2 OFS $1 OFS $3} 1' Input_file
Example to understand OFS by printing whole lines: let us understand it more clearly by printing a whole line with and without the effect of OFS.
Let us run this code:
awk -F'[,<#]' -v OFS=',' 'FNR==1{$1=$1} 1' Input_file
What it does: when the line number is 1, I reassign $1 to its own value (as mentioned above) so that OFS comes into the picture, i.e. the value of OFS is placed wherever a field separator was matched. This is only done for the first line; nothing happens to the REST of the lines. Let us see the output now:
LN_A,FN_A,aa,xyz.com>;
LN_B,FN_B<bb#xyz.com>;
You see the difference? The first line has , in the output while the 2nd line is printed as it is, because only in the 1st line did we edit a field, so OFS came into the picture.
As I just found an unused copy of Aho, Kernighan, Weinberger: The AWK Programming Language from 1988, I'll take you to the source (pages 35-36):
"Field Variables. The fields of the current input line are called $1, $2,
through $NF; $0 refers to the whole line. Fields share the properties of other
variables — they may be used in arithmetic or string operations, and may be
assigned to. - -
One can assign a new string to a field:
BEGIN { FS = OFS = "\t" }
$4 == "North America" { $4 = "NA" }
$4 == "South America" { $4 = "SA" }
{ print }
In this program, the BEGIN action sets FS, the variable that controls the input
field separator, and OFS, the output field separator, both to a tab. The print
statement in the fourth line prints the value of $0 after it has been modified by
previous assignments. This is important: when $0 is changed by assignment or
substitution, $1, $2, etc., and NF will be recomputed; likewise, when one of $1, $2, etc., is changed, $0 is reconstructed using OFS to separate fields."
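The book's program can be tried on a couple of tab-separated rows; the data below is made up for illustration and is not the book's input file. The assignment to $4 is what makes print output a record rebuilt with OFS:

```shell
#!/bin/sh
# Made-up tab-separated rows: name, area, population, continent.
out=$(printf 'Canada\t3852\t24\tNorth America\nBrazil\t3286\t134\tSouth America\n' |
  awk 'BEGIN { FS = OFS = "\t" }
       $4 == "North America" { $4 = "NA" }
       $4 == "South America" { $4 = "SA" }
       { print }')
printf '%s\n' "$out"
```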

awk multiple row and printing results

I would like to print some specific parts of a result with awk, after multiple pattern selection.
What I have is (filetest):
A : 1
B : 2
I expect to have:
1 - B : 2
So, only the result of the first row, then the whole second row.
The dash was added by me.
I have this:
awk -F': ' '$1 ~ /A|B/ { printf "%s", $2 "-" }' filetest
Result:
1 -2 -
And I cannot get the full second row without breaking the part that shows just the result of the first one:
awk -F': ' '$1 ~ /A|B/ { printf "%s", $2 "$1" }' filetest
Result:
1 - A 2 - B
Is there any way to print in the same line, exactly the column/row that I need with awk?
In my case R1C2 - R2C1: R2C2?
Thanks!
This will do what you are expecting:
awk -F' : ' '/^A/{printf "%s - ", $2} /^B/{print}' filetest
$ awk -F: 'NR%2 {printf "%s - ", $2; next}1' filetest
1 - B : 2
You can try this
awk -F: 'NR%2==1{a=$2; } NR%2==0{print a " - " $0}' file
output
1 - B : 2
I'd probably go with @jas's answer as it's clear, simple, and not coupled to your data values, but just to show an alternative approach:
$ awk '{printf "%s", (NR%2 ? $3 " - " : $0 ORS)}' file
1 - B : 2
Tested with GNU awk:
awk -F':' 'NR==1{s=$2;next}{FS="";s=s" - "$0;print s}' filetest

Need to retrieve a value from an HL7 file using awk

In a Linux script program, I've got the following awk command for other purposes and to rename the file.
cat $edifile | awk -F\| '
{ OFS = "|"
print $0
} ' | tr -d "\012" > $newname.hl7
While this is happening, I'd like to grab the 5th field of the MSH segment and save it for later use in the script. Is this possible?
If no, how could I do it later or earlier on?
Example of the segment.
MSH|^~\&|business1|business2|/u/tmp/TR0049-GE-1.b64|routing|201811302126||ORU^R01|20181130212105810|D|2.3
What I want to do is retrieve the path and file name in MSH 5 and concatenate it to the end of the new file.
I've used this to capture the data but no luck. If fpth is getting set, there is no evidence of it and I don't have the right syntax for an echo within the awk phrase.
cat $edifile | awk -F\| '
{ OFS = "|"
{fpth=$(5)}
print $0
} ' | tr -d "\012" > $newname.hl7
any suggestions?
Thank you!
Try
filename=`awk -F'|' '{print $5}' $edifile | head -1`
You can skip the piping through head if the file is a single line
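A variant of the command above that selects the MSH segment by name and stops reading once it is found, so head is not needed. This sketch assumes each segment sits on its own newline-terminated line; HL7 v2 messages often separate segments with carriage returns instead, in which case the record separator would need adjusting. The temporary file is a hypothetical stand-in for $edifile:

```shell
#!/bin/sh
# Hypothetical stand-in for $edifile, holding the MSH segment from the question.
edifile=$(mktemp)
printf '%s\n' 'MSH|^~\&|business1|business2|/u/tmp/TR0049-GE-1.b64|routing|201811302126||ORU^R01|20181130212105810|D|2.3' > "$edifile"

# Print field 5 of the first MSH segment, then stop reading.
fpth=$(awk -F'|' '$1 == "MSH" { print $5; exit }' "$edifile")
echo "$fpth"
rm -f "$edifile"
```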
First of all, it must be mentioned that the awk command in your first piece of code is of no use at all:
$ cat $edifile | awk -F\| ' { OFS = "|"; print $0 }' | tr -d "\012" > $newname.hl7
This is totally equivalent to
$ cat $edifile | tr -d "\012" > $newname.hl7
because OFS is only used to rebuild $0 when you modify a field.
Example:
$ echo "a|b|c" | awk -F\| '{OFS="/"; print $0}'
a|b|c
$ echo "a|b|c" | awk -F\| '{OFS="/"; $1=$1; print $0}'
a/b/c
I understand that you have a hl7 file in which you have a single line starting with the string "MSH". From this line you want to store the 5th field: this is achieved in the following way:
fpth=$(awk -v outputfile="${newname}.hl7" '
BEGIN{FS="|"; ORS="" }
($1 == "MSH"){ print $5 }
{ print $0 > outputfile }' $edifile)
I have set ORS to the empty string, as that is equivalent to tr -d "\012". The above will work very nicely if you only have a single MSH segment in your file.

group of columns in awk

The following awk statement is working as expected.
awk '{print $1, $2, $3}' test.txt
But how do I say that I need all the columns after the second column?
awk '{print $1, $2, $3 to $NF}' test.txt
I need all columns from third column till end of that line. There can be 2 to 10 columns and all are considered as a part of the last column.
If you just want fields $3 to $NF, the standard way would be a loop (for/while),
but for your requirement you could use:
awk '{$1=$2="";}sub("^ *","")'
for example:
kent$ seq -s' ' 10|awk '{$1=$2="";}sub("^ *","")'
3 4 5 6 7 8 9 10
If you want to "group" 100 fields into 3 groups (1, 2, and 3-100):
awk '{x=$0;sub($1FS$2,"",x);gsub(FS,"",x);print $1,$2,x}'
same example:
kent$ seq -s' ' 10|awk '{x=$0;sub($1FS$2,"",x);gsub(FS,"",x);print $1,$2,x}'
1 2 345678910
hope it is what you want.
The intuitive way.
awk 'BEGIN{ORS=""} {for(i=3; i<=NF; i++) if(i != NF){print $i " "} else {print $i "\n"}}' test.txt
Some more:
awk '{$1=$2=x; $0=$0; $1=$1}1' file
awk '{$1=$1; sub($1 FS $2 FS,x)}1' file
To keep spacing intact:
awk 'sub($1 "[ \t]*" $2 "[ \t]*",x)' file
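For completeness, the "standard" loop mentioned in the first answer might look like the sketch below; from_third is a hypothetical helper name. It joins fields 3..NF with OFS and prints an empty line when a record has fewer than 3 fields:

```shell
#!/bin/sh
# Hypothetical helper: print fields 3..NF joined by OFS.
from_third() {
  awk '{
    out = ""; sep = ""
    for (i = 3; i <= NF; i++) { out = out sep $i; sep = OFS }
    print out
  }'
}
result=$(printf '1 2 3 4 5\n1 2\n' | from_third)
echo "$result"
```

Unlike the $1=$2="" approaches, this never modifies the record, so the leading separators never need to be stripped.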

Tab separated values in awk

How do I select the first column from the TAB separated string?
# echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -F'\t' '{print $1}'
The above will return the entire line and not just "LOAD_SETTLED" as expected.
Update:
I need to change the third column in the tab separated values.
The following does not work.
echo $line | awk 'BEGIN { -v var="$mycol_new" FS = "[ \t]+" } ; { print $1 $2 var $4 $5 $6 $7 $8 $9 }' >> /pdump/temp.txt
This however works as expected if the separator is comma instead of tab.
echo $line | awk -v var="$mycol_new" -F'\t' '{print $1 "," $2 "," var "," $4 "," $5 "," $6 "," $7 "," $8 "," $9 "}' >> /pdump/temp.txt
You need to set the OFS variable (output field separator) to be a tab:
echo "$line" |
awk -v var="$mycol_new" -F'\t' 'BEGIN {OFS = FS} {$3 = var; print}'
(make sure you quote the $line variable in the echo statement)
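A sketch of the effect of OFS = FS, using a made-up four-column tab-separated line and a hypothetical replacement value; without the BEGIN block the rebuilt record falls back to the default OFS, a single space:

```shell
#!/bin/sh
mycol_new='NEW'                   # hypothetical replacement value
line=$(printf 'a\tb\told\td')     # made-up tab-separated input
# OFS = FS makes the rebuilt record use tabs again after $3 is replaced.
out=$(printf '%s\n' "$line" |
  awk -v var="$mycol_new" -F'\t' 'BEGIN { OFS = FS } { $3 = var; print }')
# Same assignment without setting OFS: fields are rejoined with spaces.
no_ofs=$(printf '%s\n' "$line" |
  awk -v var="$mycol_new" -F'\t' '{ $3 = var; print }')
printf '%s\n%s\n' "$out" "$no_ofs"
```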
Make sure they're really tabs! In bash, you can insert a tab using C-v TAB
$ echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -F$'\t' '{print $1}'
LOAD_SETTLED
Use:
awk -v FS='\t' -v OFS='\t' ...
Example from one of my scripts.
I use the FS and OFS variables to manipulate BIND zone files, which are tab delimited:
awk -v FS='\t' -v OFS='\t' \
-v record_type=$record_type \
-v hostname=$hostname \
-v ip_address=$ip_address '
$1==hostname && $3==record_type {$4=ip_address}
{print}
' $zone_file > $temp
This is a clean and easy to read way to do this.
You can set the Field Separator:
... | awk 'BEGIN {FS="\t"}; {print $1}'
Excellent read:
https://docs.freebsd.org/info/gawk/gawk.info.Field_Separators.html
echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk -v var="test" 'BEGIN { FS = "[ \t]+" } ; { print $1 "\t" var "\t" $3 }'
If your fields are separated by tabs - this works for me in Linux.
awk -F'\t' '{print $1}' < tab_delimited_file.txt
I use this to process data generated by mysql, which generates tab-separated output in batch mode.
From awk man page:
-F fs
--field-separator fs
Use fs for the input field separator (the value of the FS predefined variable).
1st column only
— awk NF=1 FS='\t'
LOAD_SETTLED
First 3 columns
— awk NF=3 FS='\t' OFS='\t'
LOAD_SETTLED LOAD_INIT 2011-01-13
Except first 2 columns
— {g,n}awk NF=NF OFS= FS='^([^\t]+\t){2}'
— {m}awk NF=NF OFS= FS='^[^\t]+\t[^\t]+\t'
2011-01-13 03:50:01
Last column only
— awk '($!NF=$NF)^_' FS='\t', or
— awk NF=NF OFS= FS='^.*\t'
03:50:01
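For comparison, the same selections written with explicit field references instead of NF assignments or FS regex tricks. This sketch assumes the sample line really is four tab-separated columns, as the outputs above imply:

```shell
#!/bin/sh
row=$(printf 'LOAD_SETTLED\tLOAD_INIT\t2011-01-13\t03:50:01')

# 1st column only.
first=$(printf '%s\n' "$row" | awk -F'\t' '{ print $1 }')
# Last column only.
last=$(printf '%s\n' "$row" | awk -F'\t' '{ print $NF }')
# Everything except the first 2 columns, rejoined with tabs.
rest=$(printf '%s\n' "$row" | awk -F'\t' -v OFS='\t' '{
  out = ""; sep = ""
  for (i = 3; i <= NF; i++) { out = out sep $i; sep = OFS }
  print out
}')
printf '%s\n%s\n%s\n' "$first" "$last" "$rest"
```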
Should this not work?
echo "LOAD_SETTLED LOAD_INIT 2011-01-13 03:50:01" | awk '{print $1}'