awk sum addition math on fields & columns $ - strange results - awk
Please run this script. I wonder why I cannot assign $1, $2 and $3 in BEGIN in order to calculate and print them:
BEGIN {
OFS=FS=";" ;
# #include getopt.awk
geb = 4;
dis = 3;
$1 = 10;
$2 = 0.19;
$3 = 20;
summe = geb+dis+$1;
colsum = $1+$2+$3
}
{
print $1 FS $2 FS $3 FS "Fee" " "summe FS $1+$3 FS 3+4+$1 FS colsum}
For example, I hoped that
print $1+$3
would give me 30?!
Can't I assign new values to fields?
The BEGIN block happens before awk begins processing the file, so it doesn't make sense to assign to the individual fields, as they will be overwritten once the first record is read.
If you wish to perform calculations on the records that awk reads, this should be done in a normal block, like the one you are using to print.
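For instance, here is a minimal corrected sketch of the script from the question, moving the field calculations into the main block; the literal values 10, 0.19 and 20 are supplied as an input record instead of being assigned in BEGIN:

```shell
printf '10;0.19;20\n' | awk '
BEGIN { OFS = FS = ";"; geb = 4; dis = 3 }   # plain variables are fine in BEGIN
{
    summe  = geb + dis + $1                  # 4 + 3 + 10 = 17
    colsum = $1 + $2 + $3                    # 10 + 0.19 + 20 = 30.19
    print $1, $2, $3, "Fee " summe, $1 + $3, colsum
}'
```

Run this way, each record's fields are available in the main block, and $1+$3 yields 30 as hoped.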
BEGIN blocks are executed before any input is read (unless you use getline), so consequently none of the variables that refer to input, like NR, FNR, NF, and fields like $0, $1, $2 ... $10, will be defined in any of the BEGIN blocks.
In fact, the BEGIN block is for actions to execute before you read the first line
(unless you use getline, which is not recommended there).
akshay#db-3325:/tmp$ seq 1 5 >test
akshay#db-3325:/tmp$ cat test
1
2
3
4
5
# Default behavior: $1 is empty in BEGIN, so this prints an empty line
akshay#db-3325:/tmp$ awk 'BEGIN{print $1}' test
# if you use getline
akshay#db-3325:/tmp$ awk 'BEGIN{getline;print $1}' test
1
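A small demo of the overwriting itself, assuming the same test file of the numbers 1 through 5: a field assigned in BEGIN is simply replaced as soon as the first record is read:

```shell
seq 1 5 > test
# $1 is set to 99 in BEGIN, but reading each record overwrites it
awk 'BEGIN { $1 = 99 } { print $1 }' test
```

The assignment in BEGIN has no visible effect; the output is just 1 through 5.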
Related
assigning a var inside AWK for use outside awk
I am using ksh on AIX. I have a file with multiple comma-delimited fields. The value of each field is read into a variable inside the script. The last field in the file may contain multiple |-delimited values. I need to test each value and keep the first one that doesn't begin with R, then stop testing the values. Sample value of $principal_diagnosis0:
R65.20|A41.9|G30.9|F02.80
I've tried:
echo $principal_diagnosis0 | awk -F"|" '{for (i = 1; i<=NF; i++) {if ($i !~ "R"){echo $i; primdiag = $i}}}'
but I get this message: awk: Field $i is not correct.
My goal is to have a variable that I can use outside of the awk statement that gets assigned the first non-R code (in this case it would be A41.9).
echo $principal_diagnosis0 | awk -F"|" '{for (i = 1; i<=NF; i++) {if ($i !~ "R"){print $i}}}'
gets me the output of:
A41.9
G30.9
F02.80
So I know it's reading the values and evaluating properly. But I need to stop after the first match and be able to use that value outside of awk. Thanks!
To answer your specific question:
$ principal_diagnosis0='R65.20|A41.9|G30.9|F02.80'
$ foo=$(echo "$principal_diagnosis0" | awk -v RS='|' '/^[^R]/{sub(/\n/,""); print; exit}')
$ echo "$foo"
A41.9
The above will work with any awk; you can do it more briefly with GNU awk if you have it:
foo=$(echo "$principal_diagnosis0" | awk -v RS='[|\n]' '/^[^R]/{print; exit}')
You can make FS and OFS do all the hard work:
echo "${principal_diagnosis0}" | mawk NF=NF FS='^(R[^|]+[|])+|[|].+$' OFS=
A41.9

Another slightly different variation of the same concept, overwriting fields but leaving OFS as is:
gawk -F'^.*R[^|]+[|]|[|].+$' '$--NF=$--NF'
A41.9
This works because, when you break it out:
gawk -F'^.*R[^|]+[|]|[|].+$' '
  { print NF }
  $(_=--NF)=$(__=--NF)
  { print _, __, NF, $0 }'
3
1 2 1 A41.9
you'll notice you start with NF = 3, and the two subsequent decrements make it equivalent to $1 = $2; but since the final NF is now reduced to just 1, it prints correctly instead of printing 2 copies. Which means you can also make it $0 = $2, as such:
gawk -F'^.*R[^|]+[|]|[|].+$' '$-_=$--NF'
A41.9

A third variation, this time using RS instead of FS:
mawk NR==2 RS='^.*R[^|]+[|]|[|].+$'
A41.9

And if you REALLY don't wanna mess with FS/OFS/RS, use gsub() instead:
nawk 'gsub("^.*R[^|]+[|]|[|].+$",_)'
A41.9
Create awk script with 3 separated awk begin commands
I have these three awk commands in separate scripts, but I need to have them all in the same awk script, run with the following command. How should they be merged to work correctly with a while?
Command:
gawk -f sc.awk sh1.csv > sh2.csv
First awk script:
#Update id column
BEGIN{FS=OFS=","}FNR==1{print "new_id",$0;next}
{print FNR-1,$0}
Second awk:
#Extract year from date
BEGIN{FS=OFS=","}
NR==1{$4="year,date"; print $0}
NR!=1{sub(/[0-9]{4}/, "&,&", $4); print $0}
Third awk:
#Delete surnames
BEGIN{
  FS=","
  OFS=","
}{
  while(sub(/ [[:alpha:]]+$/,"",$3)) {}
}
{print}
Dataset:
id,name,date,manner_of_death,armed,age,gender,race,city,state,signs_of_mental_illness,threat_level,flee,body_camera,longitude,latitude,is_geocoding_exact
3,Tim Elliot,2015-01-02,shot,gun,53,M,A,Shelton,WA,True,attack,Not fleeing,False,-123.122,47.247,True
4,Lewis Lee Lembke,2015-01-02,shot,gun,47,M,W,Aloha,OR,False,attack,Not fleeing,False,-122.892,45.487,True
8,Matthew Hoffman,2015-01-04,shot,toy weapon,32,M,W,San Francisco,CA,True,attack,Not fleeing,False,-122.422,37.763,True
Output expected:
new_id,id,name,year,date,manner_of_death,armed,age,gender,race,city,state,signs_of_mental_illness,threat_level,flee,body_camera,longitude,latitude,is_geocoding_exact
1,3,Tim,2015,2015-01-02,shot,gun,53,M,A,Shelton,WA,True,attack,Not fleeing,False,-123.122,47.247,True
2,4,Lewis,2015,2015-01-02,shot,gun,47,M,W,Aloha,OR,False,attack,Not fleeing,False,-122.892,45.487,True
3,8,Matthew,2015,2015-01-04,shot,toy weapon,32,M,W,San Francisco,CA,True,attack,Not fleeing,False,-122.422,37.763,True
You can write all three scripts as a single awk script. As with any awk script, you simply take the changes you need to make one at a time and write a rule to accomplish each. In your case, you can rewrite sc.awk as follows:
BEGIN {                     # begin rule
    FS = OFS = ","
}
FNR == 1 {                  # first line rule
    $3 = "year,date"
    print "new_id", $0
    next
}
{                           # all other records
    year = $3               # save $3 as year
    sub(/-.*$/,"",year)     # remove - to end, leaving year
    sub(/ .*$/,"",$2)       # remove surname
    $3 = year "," $3        # update new $3 field
    print ++n, $0           # output new_id and record
}
Example Use/Output
With your sample input in file you would have:
$ awk -f sc.awk file
new_id,id,name,year,date,manner_of_death,armed,age,gender,race,city,state,signs_of_mental_illness,threat_level,flee,body_camera,longitude,latitude,is_geocoding_exact
1,3,Tim,2015,2015-01-02,shot,gun,53,M,A,Shelton,WA,True,attack,Not fleeing,False,-123.122,47.247,True
2,4,Lewis,2015,2015-01-02,shot,gun,47,M,W,Aloha,OR,False,attack,Not fleeing,False,-122.892,45.487,True
3,8,Matthew,2015,2015-01-04,shot,toy weapon,32,M,W,San Francisco,CA,True,attack,Not fleeing,False,-122.422,37.763,True
If you need to maintain the 3 separate awk scripts and you want to be able to run them 'together', one simple (though not very efficient) method:
awk -f sh1.awk sh1.csv | awk -f sh2.awk | awk -f sh3.awk > sh2.csv
This generates:
$ cat sh2.csv
new_id,id,name,year,date,manner_of_death,armed,age,gender,race,city,state,signs_of_mental_illness,threat_level,flee,body_camera,longitude,latitude,is_geocoding_exact
1,3,Tim,2015,2015-01-02,shot,gun,53,M,A,Shelton,WA,True,attack,Not fleeing,False,-123.122,47.247,True
2,4,Lewis,2015,2015-01-02,shot,gun,47,M,W,Aloha,OR,False,attack,Not fleeing,False,-122.892,45.487,True
3,8,Matthew,2015,2015-01-04,shot,toy weapon,32,M,W,San Francisco,CA,True,attack,Not fleeing,False,-122.422,37.763,True
On the other hand, if the intention is to merge the 3 separate scripts into one, you can reuse your current code by figuring out the correct order and adjusting the field references based on the original input file, e.g.:
$ cat sh_all.awk
BEGIN { FS=OFS="," }
FNR==1 { $3="year,date"; print "new_id",$0; next }
{ while(sub(/ [[:alpha:]]+$/,"",$2)) {}
  sub(/[0-9]{4}/, "&,&", $3)
  print FNR-1,$0
}
Taking it for a spin:
awk -f sh_all.awk sh1.csv > sh2.csv
This also generates:
$ cat sh2.csv
new_id,id,name,year,date,manner_of_death,armed,age,gender,race,city,state,signs_of_mental_illness,threat_level,flee,body_camera,longitude,latitude,is_geocoding_exact
1,3,Tim,2015,2015-01-02,shot,gun,53,M,A,Shelton,WA,True,attack,Not fleeing,False,-123.122,47.247,True
2,4,Lewis,2015,2015-01-02,shot,gun,47,M,W,Aloha,OR,False,attack,Not fleeing,False,-122.892,45.487,True
3,8,Matthew,2015,2015-01-04,shot,toy weapon,32,M,W,San Francisco,CA,True,attack,Not fleeing,False,-122.422,37.763,True
selecting columns in awk discarding corresponding header
How do I properly select columns in awk after some processing? My file:
$ cat foo
A;B;C
9;6;7
8;5;4
1;2;3
I want to add a first column with line numbers and then extract some columns of the result. For the example, let's get the new first (line-number) and third columns. This:
awk -F';' 'FNR==1{print "linenumber;"$0;next} {print FNR-1,$1,$3}' foo
gives me this unexpected output:
linenumber;A;B;C
1 9 7
2 8 4
3 1 3
but expected is (note B is now the third column, as we added linenumber as the first):
linenumber;B
1;6
2;5
3;2
To get your expected output, use:
$ awk 'BEGIN { FS=OFS=";" } { print (FNR==1?"linenumber":FNR-1),$(FNR==1?3:1) }' file
Output:
linenumber;C
1;9
2;8
3;1
To add a column with the line number and extract the first and last columns, use:
$ awk 'BEGIN { FS=OFS=";" } { print (FNR==1?"linenumber":FNR-1),$1,$NF }' file
Output this time:
linenumber;A;C
1;9;7
2;8;4
3;1;3
Why do you print $0 (the complete record) in your header? And, if you want only two columns in your output, why do you print 3 (FNR-1, $1 and $3)? Finally, the reason why your output field separators are spaces instead of the expected ; is simply that you did not specify the output field separator (OFS). You can do this with a command-line variable assignment (OFS=\;), as shown in the second and third versions below, but also with the -v option (-v OFS=\;) or in a BEGIN block (BEGIN {OFS=";"}), as you wish (there are differences between these 3 methods but they don't matter here).
[EDIT]: see a generic solution at the end.
If the field you want to keep is the second of the input file (the B column), try:
$ awk -F\; 'FNR==1 {print "linenumber;" $2; next} {print FNR-1 ";" $2}' foo
linenumber;B
1;6
2;5
3;2
or
$ awk -F\; 'FNR==1 {print "linenumber",$2; next} {print FNR-1,$2}' OFS=\; foo
linenumber;B
1;6
2;5
3;2
Note that, as long as you don't want to keep the first field of the input file ($1), you could as well overwrite it with the line number:
$ awk -F\; '{$1=FNR==1?"linenumber":FNR-1; print $1,$2}' OFS=\; foo
linenumber;B
1;6
2;5
3;2
Finally, here is a more generic solution to which you can pass the list of indexes of the columns of the input file you want to print (1 and 3 in this example):
$ awk -F\; -v cols='1;3' '
BEGIN { OFS = ";"; n = split(cols, c); }
{
  printf("%s", FNR == 1 ? "linenumber" : FNR - 1);
  for(i = 1; i <= n; i++) printf("%s", OFS $(c[i]));
  printf("\n");
}' foo
linenumber;A;C
1;9;7
2;8;4
3;1;3
Print every nth column of a file
I have a rather big file with 255 comma-separated columns and I need to print out every third column only. I was trying something like this:
awk '{ for (i=0;i<=NF;i+=3) print $i }' file
but that doesn't seem to be the solution, since it prints only one long column. Can anybody help? Thanks.
Here is one way to do this. The script prog.awk:
BEGIN {FS = ","}    # field separator
{for (i = 1; i <= NF; i += 3)
    printf ("%s%c", $i, i + 3 <= NF ? "," : "\n");}
Invocation:
awk -f prog.awk <input.csv >output.csv
Example input.csv:
1,2,3,4,5,6,7,8,9,10
11,12,13,14,15,16,17,18,19,20
Example output.csv:
1,4,7,10
11,14,17,20
It behaves like that because by default awk splits fields on whitespace. You have to tell it to split on commas instead, which is done with the FS variable or the -F switch. Besides that, the first field is number one; $0 is the whole line. So also change the initial value of the for loop:
awk -F',' '{ for (i=1;i<=NF;i+=3) print $i }' file
awk scripting for simulation
Hi friends, good evening to all. I am a beginner with awk, so I request your help. I want to print the total number of records (rows) in a text file. For that I use the print NR command. When I use this command in the BEGIN block it prints the record numbers instead of the total, but when I use it in the END block it returns the total number of records. For example, I have a text file with 5 lines and I tried:
BEGIN { print NR }
It returns:
1
2
3
4
5
I want to print the total number of records (rows) from the BEGIN block itself, so please give me the answer.
NR is populated/incremented as the files are read; BEGIN is executed before any file is opened, so what you're specifically asking for can't be done. A workaround is this:
awk 'BEGIN{ while ( (getline var < ARGV[1]) > 0 ) nr++; print nr }' file
but on UNIX there are simpler ways if your records are newline-separated, e.g.:
awk -v nr="$(wc -l < file)" 'BEGIN{ print nr }' file
Also, in awk alone you could get the output you want by reading the file twice:
awk 'NR!=FNR && FNR==1 { print NR - FNR }' file file
You will need to pass a variable in order to access the number of records/lines in the BEGIN block, because the BEGIN block is executed before the file is processed: awk -v count="$(wc -l < file.txt)" 'BEGIN { print count }' file.txt
The BEGIN block is executed before the file is processed line by line, so you cannot get the total number of records in the BEGIN block. It has to be done in a somewhat odd way, like below (note that FILENAME is not yet set in BEGIN, so use ARGV[1] instead):
awk 'BEGIN{ "wc -l " ARGV[1] | getline result; print result }' your_file
Here awk itself is doing almost nothing; it is actually the shell command wc that does most of the work.