Cannot use Awk with arguments - awk

I am trying to do argument processing in awk. It works fine if the script has no body:
PK#rhel8:~/tmp-> cat testARG.awk
#!/usr/bin/awk -f
BEGIN{
argc = ARGC ;
CmdName = ARGV[0] ;
FirstArg = ARGV[1]
printf("Argument count = %d; command name = %s; first argument = %s\n",argc,CmdName,FirstArg) ;
}
#{
# printf("Argument count = %d; command name = %s; first argument = %s\n",argc,CmdName,FirstArg) ;
# print $0
#}
PK#rhel8:~/tmp-> ./testARG.awk 1 2 3 4
Argument count = 5; command name = awk; first argument = 1
PK#rhel8:~/tmp->
However, when I uncomment the body, it doesn't like it at all:
PK#rhel8:~/tmp-> cat testARG.awk
#!/usr/bin/awk -f
BEGIN{
argc = ARGC ;
CmdName = ARGV[0] ;
FirstArg = ARGV[1]
printf("Argument count = %d; command name = %s; first argument = %s\n",argc,CmdName,FirstArg) ;
}
{
printf("Argument count = %d; command name = %s; first argument = %s\n",argc,CmdName,FirstArg) ;
print $0
}
PK#rhel8:~/tmp-> ./testARG.awk 1 2 3 4
Argument count = 5; command name = awk; first argument = 1
awk: ./testARG.awk:6: fatal: cannot open file `1' for reading (No such file or directory)
PK#rhel8:~/tmp->
Is there a different way to invoke awk so that it sees the arguments as arguments and not as file names?

You may want to get used to a more standard way of passing non-file args to awk. A common method is to define awk variable assignments on the command line. The general format:
awk -v awk_var1="OS val 1" -v awk_var2="OS val 2" '... awk script ...' [ optional list of filenames]
# or
./script.awk -v awk_var1="OS val 1" -v awk_var2="OS val 2" [ optional list of filenames]
If you won't be processing any files then all processing within the awk script will need to take place within the BEGIN{} block.
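For instance, a minimal BEGIN-only sketch of that form (output shown under the command; the variable names are the same placeholders as above):
$ awk -v awk_var1="OS val 1" -v awk_var2="OS val 2" 'BEGIN { printf "var1 = %s; var2 = %s\n", awk_var1, awk_var2 }'
var1 = OS val 1; var2 = OS val 2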
Since the current script is looking to count input args and print the 'first', I take it to mean the number of input args could be variable. One common approach would be to provide the args in a single delimited string, eg:
$ cat testARG.awk
#!/usr/bin/awk -f
BEGIN { n=split(inlist,vars,";") # split awk input variable "inlist" on a ";" delimiter and put results in the vars[] array; split() returns the number of entries in the array, we save this count in awk variable "n"
CmdName = ARGV[0]
printf "Argument count = %d; command name = %s; first argument = %s\n", n, CmdName, vars[1]
}
$ ./testARG.awk -v inlist="1;2;3;4" # define awk variable "inlist" as a ";"-delimited list of values
Argument count = 4; command name = awk; first argument = 1
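If the later processing needs every value rather than just the first, one possible extension (my sketch, not part of the script above) is to loop over the array that split() filled, still inside BEGIN:
$ awk -v inlist="1;2;3;4" 'BEGIN { n = split(inlist, vars, ";")     # vars[1..n] holds the individual values, in order
    for (i = 1; i <= n; i++)
        printf "argument %d = %s\n", i, vars[i] }'
argument 1 = 1
argument 2 = 2
argument 3 = 3
argument 4 = 4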

The fix is to add a -:
PK#rhel8:~/tmp-> ./testARG.awk - 1 2 3 4
Argument count = 6; command name = awk; first argument = -
howdy
Argument count = 6; command name = awk; first argument = -
howdy
^C
PK#rhel8:~/tmp->
Of course I have to move the arguments down by one to skip the -. However, it still gets confused if you use ^D instead of ^C:
PK#rhel8:~/tmp-> ./testARG.awk - 1 2 3 4
Argument count = 6; command name = awk; first argument = -
howdy
Argument count = 6; command name = awk; first argument = -
howdy
howdy there
Argument count = 6; command name = awk; first argument = -
howdy there
awk: ./testARG.awk:10: fatal: cannot open file `1' for reading (No such file or directory)
PK#rhel8:~/tmp->
Not sure how the awk developers intended it to be used. So it would appear the solution suggested by @markp-fuso is the better choice.

It's not entirely clear what you want to do since you're passing numbers to your Unix command and apparently don't want any of them treated as file names, but then your uncommented code relies on having an input file to process.
I'm assuming since you have a part of your script operating line by line printing $0 that you'll pass a file name for awk to work on as the final argument to your command but you want the command arguments before that last one to not be treated as files.
Given that, here's how to write a Unix command to see initial arguments to it as values to be passed to your awk script and not as files without changing the interface to your script and without tightly coupling it to being implemented in awk:
$ cat testArg
#!/usr/bin/env bash
args="${*:1:$#-1}"
shift $(( $# - 1 ))
awk -v args="$args" '
BEGIN{
argc = split(args,vals)
CmdName = ARGV[0] ;
FirstArg = vals[1]
printf("Argument count = %d; command name = %s; first argument = %s\n",argc,CmdName,FirstArg) ;
}
{
printf("Argument count = %d; command name = %s; first argument = %s\n",argc,CmdName,FirstArg) ;
print $0
}
' "${@:--}"
$ seq 2 > file
$ ./testArg 1 2 3 file
Argument count = 3; command name = awk; first argument = 1
Argument count = 3; command name = awk; first argument = 1
1
Argument count = 3; command name = awk; first argument = 1
2
Note that with this approach you don't need to modify your desired command line in any way, including having anything awk-specific on the command line when you call your script, and if you decide to replace awk with perl or a compiled C program or anything else inside your script you don't need to change the script's API nor any of the places it's called from.

read file and extract variables based on what is in the line

I have a file that looks like this:
$ cat file_test
garbage text A=one B=two C=three D=four
garbage text A= B=six D=seven
garbage text A=eight E=nine D=ten B=eleven
I want to go through each line and extract specific "variables" to use in a loop, and if a line doesn't have a variable then set it to an empty string.
So, for the above example, let's say I want to extract the variables A, B, and C; then for each line the loop would have this:
garbage text A=one B=two C=three D=four
A = "one"
B = "two"
C = "three"
garbage text A= B=six D=seven
A = ""
B = "six"
C = ""
garbage text A=eight E=nine D=ten B=eleven
A = "eight"
B = "eleven"
C = ""
My original plan was to use sed but that won't work since the order of the "variables" is not consistent (the last line for example) and a "variable" may be missing (the second line for example).
My next thought is to go through line by line, then split the line into fields using awk and set variables based on each field but I have no clue where or how to start.
I'm open to other ideas or better suggestions.
The right answer depends on what you're going to do with the variables.
Assuming you need them as shell variables, here is a different approach:
$ while IFS= read -r line;
do A=""; B=""; C="";
source <(echo "$line" | grep -oP "(A|B|C)=\w*" );
echo "A=$A B=$B C=$C";
done < file
A=one B=two C=three
A= B=six C=
A=eight B=eleven C=
The trick is using source for the variable declarations extracted from each line with grep. Since value assignments carry over, you need to reset them before each new line.
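To see exactly what gets sourced for a given line, you can run the grep by itself, eg for the first sample line:
$ echo "garbage text A=one B=two C=three D=four" | grep -oP "(A|B|C)=\w*"
A=one
B=two
C=three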
If perl is your option, please try:
perl -ne 'undef %a; while (/([\w]+)=([\w]*)/g) {$a{$1}=$2;}
for ("A", "B", "C") {print "$_=\"$a{$_}\"\n";}' file_test
Output:
A="one"
B="two"
C="three"
A=""
B="six"
C=""
A="eight"
B="eleven"
C=""
It parses each line for assignments with =, stores the key-value pairs in the assoc array %a, then finally reports the values for A, B and C.
I'm partial to the awk solution, e.g.
$ awk '{for (i = 1; i <= NF; i++) if ($i ~ /^[A-Za-z_][^=]*[=]/) print $i}' file
A=one
B=two
C=three
D=four
A=
B=six
D=seven
A=eight
E=nine
D=ten
B=eleven
Explanation
for (i = 1; i <= NF; i++) loop over each space separated field;
if ($i ~ /^[A-Za-z_][^=]*[=]/) if the field begins with at least one character that is [A-Za-z_] followed by an '='; then
print $i print the field.
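If you only care about A, B and C specifically, one possible variation (mine, not part of the answer above) is to tighten the pattern to just those names:
$ awk '{for (i = 1; i <= NF; i++) if ($i ~ /^[ABC]=/) print $i}' file
A=one
B=two
C=three
A=
B=six
A=eight
B=eleven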
In my first 3 solutions I am assuming that you need to use the values of A, B and C as shell variables and do not simply want to print them; if that is the case then the following may help you.
1st Solution: It assumes that your variables A, B, C always appear at the same field positions.
while read first second third fourth fifth sixth
do
echo $third,$fourth,$fifth ##Printing values here.
a_var=${third#*=}
b_var=${fourth#*=}
c_var=${fifth#*=}
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
It simply prints the variable values for each line, since you haven't said what you are going to do with these variables; you could use them as per your use case too.
2nd Solution: This assumes the variables come in the same order, but it checks whether A is actually in the 3rd field, B in the 4th field, etc., and prints accordingly.
while read first second third fourth fifth sixth
do
echo $third,$fourth,$fifth ##Printing values here.
a_var=$(echo "$third" | awk '$0 ~ /^A/{sub(/.*=/,"");print}')
b_var=$(echo "$fourth" | awk '$0 ~ /^B/{sub(/.*=/,"");print}')
c_var=$(echo "$fifth" | awk '$0 ~ /^C/{sub(/.*=/,"");print}')
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
3rd Solution: This looks like the best fit for your requirement, though I'm not sure how efficient it is (I'm still looking at whether it could be done better). This code does NOT depend on the order of A, B or C in the line; they can appear anywhere. If a match is found the variable gets that value, otherwise it is left empty.
while read line
do
a_var=$(echo "$line" | awk 'match($0,/A=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
b_var=$(echo "$line" | awk 'match($0,/B=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
c_var=$(echo "$line" | awk 'match($0,/C=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
Output will be as follows.
Using new values of variables here....
NEW A=one
NEW B=two
NEW C=three
Using new values of variables here....
NEW A=
NEW B=six
NEW C=
Using new values of variables here....
NEW A=eight
NEW B=eleven
NEW C=
EDIT1: In case you simply want to print the values of A, B and C then try the following.
awk '{
for(i=1;i<=NF;i++){
if($i ~ /[ABCabc]=/){
sub(/.*=/,"",$i)
a[++count]=$i
}
}
print "A="a[1] ORS "B=" a[2] ORS "C="a[3];count=""
delete a
}' Input_file
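With the sample file from the question, this should print:
A=one
B=two
C=three
A=
B=six
C=
A=eight
B=eleven
C=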
Another Perl
perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() '
with the input file
$ perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() ' file_test
A = one
B = two
C = three
A =
B = six
C =
A = eight
B = eleven
C =
$
A generic, self-documented awk for a variable list of variables.
It assumes the variable separator is = and that = appears neither in the preceding text nor in the variable content itself.
awk 'BEGIN {
# load the list of variables and the order in which to print them
VarSize = split( "A B C", aIdx )
# create a pattern filter to catch the variables in each line
for ( Idx in aIdx ) VarEntry = ( VarEntry ? ( VarEntry "|^" ) : "^" ) aIdx[Idx] "="
}
{
# reset the variable values
split( "", aVar )
# for each part of the line
for ( Fld=1; Fld<=NF; Fld++ ) {
# if this part is a variable assignment
if( $Fld ~ VarEntry ) {
# separate variable name and content in array
split( $Fld, aTemp, /=/ )
# put the variable content in the corresponding variable name slot
aVar[aTemp[1]] = aTemp[2]
}
}
# print all variable content (empty or not) found on this line
for ( Idx in aIdx ) printf( "%s = \042%s\042\n", aIdx[Idx], aVar[aIdx[Idx]] )
}
' YourFile
It's unclear whether you're trying to set awk variables or shell variables but here's how to populate an associative awk array and then use that to populate an associative shell array:
$ cat tst.awk
BEGIN {
numKeys = split("A B C",keys)
}
{
delete f
for (i=1; i<=NF; i++) {
if ( split($i,t,/=/) == 2 ) {
f[t[1]] = t[2]
}
}
for (keyNr=1; keyNr<=numKeys; keyNr++) {
key = keys[keyNr]
printf "[%s]=\"%s\"%s", key, f[key], (keyNr<numKeys ? OFS : ORS)
}
}
$ awk -f tst.awk file
[A]="one" [B]="two" [C]="three"
[A]="" [B]="six" [C]=""
[A]="eight" [B]="eleven" [C]=""
$ while IFS= read -r out; do declare -A arr="( $out )"; declare -p arr; done < <(awk -f tst.awk file)
declare -A arr=([A]="one" [B]="two" [C]="three" )
declare -A arr=([A]="" [B]="six" [C]="" )
declare -A arr=([A]="eight" [B]="eleven" [C]="" )
$ echo "${arr["A"]}"
eight

How to evaluate or process if statements in data?

Background
I wrote a bash script that pulls simple user functions from a PostgreSQL database, uses awk to convert plpgsql commands to SQL (like PERFORM function() to SELECT function(), removing comments --.*, etc.), stores the SQL commands in a file (file.sql) and reads and executes them in the database:
$ psql ... -f file.sql db
The functions are simple, mostly just calling other user defined functions. But how to "evaluate" or process an IF statement?:
IF $1 = 'customer1' THEN -- THESE $1 MEANS ARGUMENT TO PGPL/SQL FUNCTION
PERFORM subfunction1($1); -- THAT THIS IF STATEMENT IS IN:
ELSE -- SELECT function('customer1');
PERFORM subfunction2($1); -- $1 = 'customer1'
END IF;
Tl;dr:
IFs and such are not SQL so they should be pre-evaluated using awk. It's safe to assume that above is already processed into one record with comments removed:
IF $1 = 'customer1' THEN PERFORM subfunction1($1); ELSE PERFORM subfunction2($1); END IF;
After "evaluating" above should be replaced with:
SELECT subfunction1('customer1');
if the awk program that evaluates it were called like this:
$ awk -v arg1="customer1" -f program.awk file.sql
or if arg1 is anything else, for example for customer2:
SELECT subfunction2('customer2');
Edit
expr popped into my mind first thing when I woke up:
$ awk -v arg="'customer1'" '
{
gsub(/\$1/,arg) # replace func arg with string
n=split($0,a,"(IF|THEN|ELSE|ELSE?IF|END IF;)",seps) # seps to get ready for SQL CASE
if(seps[1]=="IF") {
# here should be while for ELSEIF
c="expr " a[2]; c|getline r; close(c) # use expr to solve
switch (r) { # expr has 4 return values
case "1": # match
print a[3]
break
case "0": # no match
print a[4]
break
default: # (*) see below
print r
exit # TODO
} } }' file.sql
(*) expr outputs 0,1,2 or 3:
$ expr 1 = 1
1
$ expr 1 = 2
0
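For instance, with the quoted argument from the question (the single quotes are part of the strings being compared here):
$ expr "'customer1'" = "'customer1'"
1
$ expr "'customer2'" = "'customer1'"
0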
However, if you omit spaces:
$ expr 1=1
1=1
Without writing a full language parser, if you're looking for something cheap and cheerful then this might be a decent starting point:
$ cat tst.awk
{ gsub(/\$1/,"\047"arg1"\047") }
match($0,/^IF\s+(\S+)\s+(\S+)\s+(\S+)\s+THEN\s+(\S+)\s+(\S+)\s+ELSE\s+(\S+)\s+(\S+)\s+END\s+IF/,a) {
lhs = a[1]
op = a[2]
rhs = a[3]
trueAct = (a[4] == "PERFORM" ? "SELECT" : a[4]) FS a[5]
falseAct = (a[6] == "PERFORM" ? "SELECT" : a[6]) FS a[7]
if (op == "=") {
print (lhs == rhs ? trueAct : falseAct)
}
}
$ awk -v arg1='customer1' -f tst.awk file
SELECT subfunction1('customer1');
$ awk -v arg1='bob' -f tst.awk file
SELECT subfunction2('bob');
The above uses GNU awk for the 3rd arg to match(). Hopefully it's easy enough to understand that you can massage as needed to handle other constructs or other variations of this construct.
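For example, one small variation (my sketch, not part of the answer above, saved here as tst2.awk) would be to handle SQL's <> inequality operator in the same if chain:
$ cat tst2.awk
{ gsub(/\$1/,"\047"arg1"\047") }
match($0,/^IF\s+(\S+)\s+(\S+)\s+(\S+)\s+THEN\s+(\S+)\s+(\S+)\s+ELSE\s+(\S+)\s+(\S+)\s+END\s+IF/,a) {
    lhs = a[1]; op = a[2]; rhs = a[3]
    trueAct  = (a[4] == "PERFORM" ? "SELECT" : a[4]) FS a[5]
    falseAct = (a[6] == "PERFORM" ? "SELECT" : a[6]) FS a[7]
    if      (op == "=" ) print (lhs == rhs ? trueAct : falseAct)
    else if (op == "<>") print (lhs != rhs ? trueAct : falseAct)    # <> means "not equal" in SQL, so swap the comparison
}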

perl6 Unable to initialize a state variable. Help needed

I want to use a one-liner to print a middle section of a file by using a state variable to indicate whether the current line is within the desired section of the file. But I am unable to initialize the state variable. Initialization is so simple, and I just cannot find what the problem is. Please help. Thanks.
The file is named testFile.txt and has the following lines:
section 0; state 0; not needed
= start section 1 =
state 1; needed
= end section 1 =
section 2; state 2; not needed
And my one-liner is
cat testFile.txt | perl6 -ne ' state $x = 0; say "$x--> "; if $_ ~~ m/ "start" / { $x=1; }; if $x == 1 { .say; }; if $_ ~~ m/ "end" / { $x = 2; }'
And the output shows that $x = 0 is not doing the initialization:
Use of uninitialized value $x of type Any in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
in block at -e line 1
-->
Use of uninitialized value of type Any in numeric context
in block at -e line 1
Use of uninitialized value $x of type Any in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
in block at -e line 1
-->
= start section 1 =
1-->
state 1; needed
1-->
= end section 1 =
2-->
2-->
This looks like a bug to me: Apparently, -n does not properly set up a lexical environment.
As a workaround, you can wrap the whole things in a block, eg by surrounding your code with do { ... } or even just { ... }.
Also note that depending on your use case, the whole thing can probably be simplified by using the flip-flop operator, eg
cat testFile.txt | perl6 -ne '.say if / "start" / ff / "end" /'
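With the sample testFile.txt above, that should print just the wanted section, marker lines included:
= start section 1 =
state 1; needed
= end section 1 =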

How to detect the last line in awk before END?

I'm trying to concatenate String values and print them, but if the last types are Strings and there is no change of type then the concatenation won't print:
input.txt:
String 1
String 2
Number 5
Number 2
String 3
String 3
awk:
awk '
BEGIN { tot=0; ant_t=""; }
{
t = $1; val=$2;
#if string, concatenate its value
if (t == "String") {
tot+=val;
nx=1;
} else {
nx=0;
}
#if type change, add tot to res
if (t != "String" && ant_t == "String") {
res=res tot;
tot=0;
}
ant_t=t;
#if string, go next
if (nx == 1) {
next;
}
res=res"\n"val;
}
END { print res; }' input.txt
Current output:
3
5
2
Expected output:
3
5
2
6
How can I detect if awk is reading last line, so if there won't be change of type it will check if it is the last line?
awk reads line by line, hence it cannot determine whether it is reading the last line or not. The END block can be used to perform actions once the end of the file has been reached.
To achieve what you expect:
awk '/String/{sum+=$2} /Number/{if(sum) print sum; sum=0; print $2} END{if(sum) print sum}'
will produce output as
3
5
2
6
What it does:
/String/ selects lines that match String; likewise /Number/ for Number lines
sum+=$2 accumulates the String line values. When a Number line occurs, the sum is printed and reset to zero
Like this maybe:
awk -v lines="$(wc -l < /etc/hosts)" 'NR==lines{print "LAST"};1' /etc/hosts
I am pre-calculating the number of lines (using wc) and passing that into awk as a variable called lines, if that is unclear.
Just change the last line to:
END { print res; print tot;}'
awk '$1~"String"{x+=$2;y=1}$1~"Number"{if (y){print x;x=0;y=0;}print $2}END{if(y) print x}' file
Explanation
y is used as a boolean, and in the END block I check whether the last type was a String and print the sum
You can actually use x as the boolean, as nu11p01n73R does, which is smarter
Test
$ cat file
String 1
String 2
Number 5
Number 2
String 3
String 3
$ awk '$1~"String"{x+=$2;y=1}$1~"Number"{if (y){print x;x=0;y=0;}print $2}END{if(y) print x}' file
3
5
2
6

AWK -- How to assign a variable's value from matching regex which comes later?

I have this awk script,
/regex2/{
var = $1}
/regex1/{
print var}
which I executed over this input file:
regex1
This should be assigned as variable regex2
I got no printed output. The desired output is for "This" to be printed.
My next thought was to utilize BEGIN:
BEGIN{
/regex2/
var = $1}
/regex1/{
print var}
But apparently BEGIN cannot accommodate regex matching against the input. Any suggestions?
This would achieve the desired result:
awk '/regex2/ { print $1 }'
Otherwise, you'll need to read the file twice and perform something like the following. It will store the last occurrence of /regex2/ in var. Upon re-reading the file, it will print var for each occurrence of /regex1/. Note that you'll get an empty line in the output and the keyword 'This' on a newline:
awk 'FNR==NR && /regex2/ { var = $1; next } /regex1/ { print var }' file.txt{,}
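The trailing file.txt{,} is just bash brace expansion naming the file twice, so awk reads it once for each pass; you can see what it expands to with echo:
$ echo file.txt{,}
file.txt file.txt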
Typically this sort of thing is done with a flag:
/regex1/ { f = 1 }
f && /regex2/ { var = $1; f = 0 }
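The snippet above only captures the value; a minimal completed sketch (the print and the file name are my additions), run against the sample input, would be:
$ awk '/regex1/ { f = 1 }  f && /regex2/ { var = $1; f = 0; print var }' file.txt
This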