I have a file that looks like this:
$ cat file_test
garbage text A=one B=two C=three D=four
garbage text A= B=six D=seven
garbage text A=eight E=nine D=ten B=eleven
I want to go through each line and extract specific "variables" to use in the loop. And if a line doesn't have a variable then set it to an empty string.
So, for the above example, let's say I want to extract the variables A, B, and C. Then, for each line, the loop would have this:
garbage text A=one B=two C=three D=four
A = "one"
B = "two"
C = "three"
garbage text A= B=six D=seven
A = ""
B = "six"
C = ""
garbage text A=eight E=nine D=ten B=eleven
A = "eight"
B = "eleven"
C = ""
My original plan was to use sed but that won't work since the order of the "variables" is not consistent (the last line for example) and a "variable" may be missing (the second line for example).
My next thought is to go through line by line, then split the line into fields using awk and set variables based on each field but I have no clue where or how to start.
I'm open to other ideas or better suggestions.
The right answer depends on what you're going to do with the variables.
Assuming you need them as shell variables, here is a different approach:
$ while IFS= read -r line;
do A=""; B=""; C="";
source <(echo "$line" | grep -oP "(A|B|C)=\w*" );
echo "A=$A B=$B C=$C";
done < file
A=one B=two C=three
A= B=six C=
A=eight B=eleven C=
The trick is using source to run the variable assignments that grep extracts from each line. Since assigned values carry over from one line to the next, you need to reset the variables before processing each new line.
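A minimal sketch of the same loop without `source` or a grep process per line, using bash's own `=~` operator instead (a swapped-in technique, not the one in this answer). It assumes bash 3.2 or later and, as in the sample, values made only of letters, digits and underscores; `parse_line` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: extract A, B and C from each line with bash's =~ operator.
# parse_line is a hypothetical helper; values are assumed to match [[:alnum:]_]*.
parse_line() {
    local line=$1 name re
    A="" B="" C=""                                   # reset so values never carry over
    for name in A B C; do
        re="(^|[[:space:]])$name=([[:alnum:]_]*)"    # NAME= at line start or after a blank
        if [[ $line =~ $re ]]; then
            printf -v "$name" '%s' "${BASH_REMATCH[2]}"   # assign to the variable named $name
        fi
    done
}

while IFS= read -r line; do
    parse_line "$line"
    echo "A=$A B=$B C=$C"
done <<'EOF'
garbage text A=one B=two C=three D=four
garbage text A= B=six D=seven
garbage text A=eight E=nine D=ten B=eleven
EOF
```

Requiring a blank or the line start before the name keeps something like DATA=x from being read as A=x.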
If Perl is an option, try:
perl -ne 'undef %a; while (/([\w]+)=([\w]*)/g) {$a{$1}=$2;}
for ("A", "B", "C") {print "$_=\"$a{$_}\"\n";}' file_test
Output:
A="one"
B="two"
C="three"
A=""
B="six"
C=""
A="eight"
B="eleven"
C=""
It parses each line for assignments with =, stores each key-value pair in the associative array %a, and then reports the values for A, B and C.
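The same idea as the Perl hash %a can be sketched with a bash associative array (bash 4 or later). This is an illustration under the assumption that NAME=value tokens are separated by whitespace; `print_abc` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: store every NAME=value token in an associative array, then look up A, B, C.
# print_abc is a hypothetical helper that reads lines on stdin (needs bash 4+).
print_abc() {
    local line w name
    local -a words
    local -A kv
    while IFS= read -r line; do
        kv=()                                  # empty the map for every line
        read -ra words <<< "$line"             # split the line on whitespace
        for w in "${words[@]}"; do
            if [[ $w == [A-Za-z_]*=* ]]; then  # token looks like NAME=value
                kv[${w%%=*}]=${w#*=}           # name before the first '=', value after it
            fi
        done
        for name in A B C; do
            printf '%s="%s"\n' "$name" "${kv[$name]:-}"   # absent names print as ""
        done
    done
}

printf '%s\n' 'garbage text A=one B=two C=three D=four' \
              'garbage text A= B=six D=seven' \
              'garbage text A=eight E=nine D=ten B=eleven' | print_abc
```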
I'm partial to the awk solution, e.g.
$ awk '{for (i = 1; i <= NF; i++) if ($i ~ /^[A-Za-z_][^=]*[=]/) print $i}' file
A=one
B=two
C=three
D=four
A=
B=six
D=seven
A=eight
E=nine
D=ten
B=eleven
Explanation
for (i = 1; i <= NF; i++) loop over each space separated field;
if ($i ~ /^[A-Za-z_][^=]*[=]/) if the field begins with a letter or underscore, followed by any number of non-'=' characters and then an '='; then
print $i print the field.
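If only A, B and C are wanted, and a missing one should still show up with an empty value, the same field loop can collect the values and print them once per line. A sketch, assuming single-letter names as in the sample; `abc_fields` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: keep only A, B and C, and print an empty value for any that are missing.
# abc_fields is a hypothetical helper; names are assumed to be one letter long.
abc_fields() {
    awk '{
        split("", seen)                          # portable way to empty the array
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[ABC]=/)                  # field starts with A=, B= or C=
                seen[substr($i, 1, 1)] = substr($i, 3)
        print "A=" seen["A"], "B=" seen["B"], "C=" seen["C"]
    }' "$@"
}

printf '%s\n' 'garbage text A=one B=two C=three D=four' \
              'garbage text A= B=six D=seven' \
              'garbage text A=eight E=nine D=ten B=eleven' | abc_fields
```

With a file argument (abc_fields file_test) awk reads the file instead of stdin.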
In my first 3 solutions, I assume that you need the values of A, B and C as shell variables rather than simply printing them; if that is the case, the following may help you.
1st Solution: this assumes that A, B and C always appear in the same field positions (the 3rd, 4th and 5th words).
while read -r first second third fourth fifth sixth
do
echo "$third,$fourth,$fifth" ##Printing values here.
a_var=${third#*=}
b_var=${fourth#*=}
c_var=${fifth#*=}
echo "Using new values of variables here...."
echo "NEW A=$a_var"
echo "NEW B=$b_var"
echo "NEW C=$c_var"
done < "Input_file"
This simply prints the variables' values for each line. Since you have NOT said what you are going to do with these variables, I am only printing them; you can use them however your use case requires.
2nd Solution: this also assumes the variables come in the same order, but it checks whether A really is the 3rd field, B the 4th, and so on, and sets each variable only if it is (otherwise it stays empty).
while read -r first second third fourth fifth sixth
do
echo "$third,$fourth,$fifth" ##Printing values here.
a_var=$(echo "$third" | awk '$0 ~ /^A=/{sub(/.*=/,"");print}')
b_var=$(echo "$fourth" | awk '$0 ~ /^B=/{sub(/.*=/,"");print}')
c_var=$(echo "$fifth" | awk '$0 ~ /^C=/{sub(/.*=/,"");print}')
echo "Using new values of variables here...."
echo "NEW A=$a_var"
echo "NEW B=$b_var"
echo "NEW C=$c_var"
done < "Input_file"
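The 2nd solution can also be sketched in plain bash: a case pattern checks the name and parameter expansion strips it, which avoids three command substitutions and awk processes per line. It keeps the same positional assumption (A, B and C may only be the 3rd, 4th and 5th words); `positional_abc` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch of the 2nd solution without awk: a word is used only if it starts
# with the expected NAME=, otherwise that variable stays empty.
# positional_abc is a hypothetical helper that reads lines on stdin.
positional_abc() {
    local first second third fourth fifth sixth a_var b_var c_var
    while read -r first second third fourth fifth sixth; do
        a_var="" b_var="" c_var=""
        case $third  in A=*) a_var=${third#A=} ;; esac
        case $fourth in B=*) b_var=${fourth#B=} ;; esac
        case $fifth  in C=*) c_var=${fifth#C=} ;; esac
        echo "A=$a_var B=$b_var C=$c_var"
    done
}

printf '%s\n' 'garbage text A=one B=two C=three D=four' \
              'garbage text A= B=six D=seven' | positional_abc
```

As with the original, the third sample line yields an empty B, because B=eleven is not the 4th word.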
3rd Solution: this looks like the best fit for your requirement, although it is probably not the most efficient approach, since it starts three awk processes per line. It does NOT care about the order of A, B and C on the line: each one is matched wherever it appears, and if there is no match the variable is set to an empty value.
while IFS= read -r line
do
a_var=$(echo "$line" | awk 'match($0,/(^| )A=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
b_var=$(echo "$line" | awk 'match($0,/(^| )B=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
c_var=$(echo "$line" | awk 'match($0,/(^| )C=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
echo "Using new values of variables here...."
echo "NEW A=$a_var"
echo "NEW B=$b_var"
echo "NEW C=$c_var"
done < "Input_file"
The output of the 3rd solution is as follows:
Using new values of variables here....
NEW A=one
NEW B=two
NEW C=three
Using new values of variables here....
NEW A=
NEW B=six
NEW C=
Using new values of variables here....
NEW A=eight
NEW B=eleven
NEW C=
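The 3rd solution starts three awk processes for every line. A sketch of the same order-independent lookup with a single awk process for the whole input: awk prints A|B|C for each line and read splits that back into shell variables. It assumes values never contain |; `abc_loop` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: one awk process for the whole input instead of three per line.
# abc_loop is a hypothetical helper; values are assumed not to contain '|'.
abc_loop() {
    awk '{
        split("", v)                             # clear the previous line
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")                  # position of the first "="
            if (eq > 1) v[substr($i, 1, eq - 1)] = substr($i, eq + 1)
        }
        print v["A"] "|" v["B"] "|" v["C"]       # empty string for missing names
    }' "$@" |
    while IFS='|' read -r a_var b_var c_var; do
        # a_var, b_var, c_var are usable here (the loop runs in a subshell because of the pipe)
        echo "NEW A=$a_var NEW B=$b_var NEW C=$c_var"
    done
}

printf '%s\n' 'garbage text A=one B=two C=three D=four' \
              'garbage text A= B=six D=seven' \
              'garbage text A=eight E=nine D=ten B=eleven' | abc_loop
```

A non-whitespace separator is used on purpose: with a tab or space as IFS, consecutive separators would collapse and an empty A would shift B into its place.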
EDIT1: In case you simply want to print the values of A, B and C, try the following.
awk '{
for(i=1;i<=NF;i++){
if($i ~ /^[ABC]=/){
split($i,kv,"=")     # kv[1] is the name, kv[2] the value
a[kv[1]]=kv[2]       # store by name, so the order on the line does not matter
}
}
print "A="a["A"] ORS "B="a["B"] ORS "C="a["C"]
delete a
}' Input_file
Another Perl approach:
perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() '
Run against the input file:
$ perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() ' file_test
A = one
B = two
C = three
A =
B = six
C =
A = eight
B = eleven
C =
$
A generic, self-documented awk version.
It assumes that = is used only as the separator between a variable name and its value, and appears neither in the leading text nor in the values themselves.
awk 'BEGIN {
# load the list of variables, in the order they should be printed
VarSize = split( "A B C", aIdx )
# build a pattern that catches the wanted assignments, e.g. ^A=|^B=|^C=
for ( Idx in aIdx ) VarEntry = ( VarEntry ? ( VarEntry "|^" ) : "^" ) aIdx[Idx] "="
}
{
# reset variable values
split( "", aVar )
# for each part of the line
for ( Fld=1; Fld<=NF; Fld++ ) {
# if part is a variable assignment
if( $Fld ~ VarEntry ) {
# separate variable name and content in array
split( $Fld, aTemp, "=" )
# put variable content in corresponding variable name container
aVar[aTemp[1]] = aTemp[2]
}
}
# print all variable contents (empty or not) found on this line;
# a numeric loop keeps the A, B, C order ("for (Idx in aIdx)" has no defined order)
for ( Idx=1; Idx<=VarSize; Idx++ ) printf( "%s = \042%s\042\n", aIdx[Idx], aVar[aIdx[Idx]] )
}
' YourFile
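The same generic approach can take the wanted names from the shell instead of hard-coding them in the BEGIN block. A sketch, assuming the names are passed as one space-separated string; `extract_vars` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: names to report are passed in with -v, in the order they should print.
# extract_vars is a hypothetical helper; usage: extract_vars "A B C" < file_test
extract_vars() {
    awk -v names="$1" '
    BEGIN { n = split(names, want, " ") }        # wanted names, in print order
    {
        split("", val)                           # clear values from the previous line
        for (f = 1; f <= NF; f++) {
            eq = index($f, "=")                  # first "=" separates name and value
            if (eq > 1) val[substr($f, 1, eq - 1)] = substr($f, eq + 1)
        }
        for (k = 1; k <= n; k++)                 # numeric loop keeps the requested order
            printf "%s = \"%s\"\n", want[k], val[want[k]]
    }'
}

printf '%s\n' 'garbage text A=eight E=nine D=ten B=eleven' | extract_vars "C A D"
```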
It's unclear whether you're trying to set awk variables or shell variables, but here's how to populate an associative awk array and then use that to populate an associative shell array:
$ cat tst.awk
BEGIN {
numKeys = split("A B C",keys)
}
{
delete f
for (i=1; i<=NF; i++) {
if ( split($i,t,"=") == 2 ) {
f[t[1]] = t[2]
}
}
for (keyNr=1; keyNr<=numKeys; keyNr++) {
key = keys[keyNr]
printf "[%s]=\"%s\"%s", key, f[key], (keyNr<numKeys ? OFS : ORS)
}
}
$ awk -f tst.awk file
[A]="one" [B]="two" [C]="three"
[A]="" [B]="six" [C]=""
[A]="eight" [B]="eleven" [C]=""
$ while IFS= read -r out; do declare -A arr="( $out )"; declare -p arr; done < <(awk -f tst.awk file)
declare -A arr=([A]="one" [B]="two" [C]="three" )
declare -A arr=([A]="" [B]="six" [C]="" )
declare -A arr=([A]="eight" [B]="eleven" [C]="" )
$ echo "${arr["A"]}"
eight
Background
I wrote a bash script that pulls simple user functions from a PostgreSQL database, uses awk to convert PL/pgSQL commands to SQL (e.g. PERFORM function() to SELECT function(), removing --.* comments, etc.), stores the SQL commands in a file (file.sql), and then reads and executes them in the database:
$ psql ... -f file.sql db
The functions are simple, mostly just calling other user-defined functions. But how do I "evaluate" or process an IF statement?
IF $1 = 'customer1' THEN      -- $1 is the argument of the PL/pgSQL function
PERFORM subfunction1($1);     -- that this IF statement is in, e.g. for
ELSE                          -- SELECT function('customer1');
PERFORM subfunction2($1);     -- $1 = 'customer1'
END IF;
TL;DR:
IFs and the like are not SQL, so they have to be pre-evaluated using awk. It's safe to assume that the above has already been processed into one record with comments removed:
IF $1 = 'customer1' THEN PERFORM subfunction1($1); ELSE PERFORM subfunction2($1); END IF;
After "evaluating", the above should be replaced with:
SELECT subfunction1('customer1');
if the awk program that evaluates it is called as:
$ awk -v arg1='customer1' -f program.awk file.sql
or if arg1 is anything else, for example for customer2:
SELECT subfunction2('customer2');
Edit
expr popped into my mind first thing when I woke up:
$ awk -v arg="'customer1'" '
{
gsub(/\$1/,arg) # replace func arg with string
n=split($0,a,"(IF|THEN|ELSE|ELSE?IF|END IF;)",seps) # seps to get ready for SQL CASE
if(seps[1]=="IF") {
# here should be while for ELSEIF
c="expr " a[2]; c|getline r; close(c) # use expr to solve
switch (r) { # for =, expr prints 1 (match) or 0 (no match)
case "1": # match
print a[3]
break
case "0": # no match
print a[4]
break
default: # (*) see below
print r
exit # TODO
} } }' file.sql
(*) expr's exit status can be 0, 1, 2 or 3, but getline reads what expr prints, which for = is 1 (match) or 0 (no match):
$ expr 1 = 1
1
$ expr 1 = 2
0
However, if you omit the spaces, expr receives a single string argument and just prints it back:
$ expr 1=1
1=1
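What expr prints and its exit status are easy to mix up, and getline only ever sees the printed value. A small loop over the three cases above shows both side by side; this is only an illustration, using the exit statuses POSIX specifies for expr.

```shell
#!/usr/bin/env bash
# Compare what expr prints with its exit status for the cases discussed above.
# Exit status: 0 if the result is neither null nor 0, 1 if it is null or 0,
# 2 for an invalid expression, 3 for another error.
for args in "1 = 1" "1 = 2" "1=1"; do
    # $args is left unquoted on purpose, so "1 = 1" becomes three arguments
    if out=$(expr $args); then status=0; else status=$?; fi
    printf 'expr %s -> prints %s, exit status %s\n' "$args" "$out" "$status"
done
```

This prints "expr 1 = 1 -> prints 1, exit status 0", "expr 1 = 2 -> prints 0, exit status 1" and "expr 1=1 -> prints 1=1, exit status 0".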
Without writing a full language parser, if you're looking for something cheap and cheerful then this might be a decent starting point:
$ cat tst.awk
{ gsub(/\$1/,"\047"arg1"\047") }
match($0,/^IF\s+(\S+)\s+(\S+)\s+(\S+)\s+THEN\s+(\S+)\s+(\S+)\s+ELSE\s+(\S+)\s+(\S+)\s+END\s+IF/,a) {
lhs = a[1]
op = a[2]
rhs = a[3]
trueAct = (a[4] == "PERFORM" ? "SELECT" : a[4]) FS a[5]
falseAct = (a[6] == "PERFORM" ? "SELECT" : a[6]) FS a[7]
if (op == "=") {
print (lhs == rhs ? trueAct : falseAct)
}
}
$ awk -v arg1='customer1' -f tst.awk file
SELECT subfunction1('customer1');
$ awk -v arg1='bob' -f tst.awk file
SELECT subfunction2('bob');
The above uses GNU awk for the 3rd arg to match() (and for \s and \S). Hopefully it's easy enough to understand that you can adapt it as needed to handle other constructs or other variations of this one.
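For awks without gawk's 3-argument match(), a rougher sketch can check the tokens by field position instead. It assumes exactly the one-line layout from the question and an arg1 value without spaces (or & characters, which are special in a gsub replacement); `eval_if` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch for non-GNU awks: after $1 is substituted, tokens are checked by field position.
# Assumes the exact one-line IF/THEN/ELSE/END IF layout and an arg1 without spaces.
# eval_if is a hypothetical helper name.
eval_if() {
    awk -v arg1="$1" '
    {
        gsub(/\$1/, "\047" arg1 "\047")          # $1 -> quoted argument; fields are re-split
        # expected fields: IF lhs op rhs THEN act call ELSE act call END IF;
        if ($1 == "IF" && $5 == "THEN" && $8 == "ELSE" && $11 == "END") {
            trueAct  = ($6 == "PERFORM" ? "SELECT" : $6) " " $7
            falseAct = ($9 == "PERFORM" ? "SELECT" : $9) " " $10
            if ($3 == "=") print ($2 == $4 ? trueAct : falseAct)
        }
    }'
}

stmt="IF \$1 = 'customer1' THEN PERFORM subfunction1(\$1); ELSE PERFORM subfunction2(\$1); END IF;"
printf '%s\n' "$stmt" | eval_if customer1
printf '%s\n' "$stmt" | eval_if bob
```

The two calls print SELECT subfunction1('customer1'); and SELECT subfunction2('bob');, matching the GNU awk version above.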
I want to use a one-liner to print the middle section of a file, using a state variable to indicate whether the current line is within the desired section. But I am unable to initialize the state variable. Initialization is so simple, and I just cannot find what the problem is. Please help. Thanks.
The file is named testFile.txt and has the following lines:
section 0; state 0; not needed
= start section 1 =
state 1; needed
= end section 1 =
section 2; state 2; not needed
And my one-liner is
cat testFile.txt | perl6 -ne ' state $x = 0; say "$x--> "; if $_ ~~ m/ "start" / { $x=1; }; if $x == 1 { .say; }; if $_ ~~ m/ "end" / { $x = 2; }'
And the output showed that $x=0 is not doing initialization:
Use of uninitialized value $x of type Any in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
in block at -e line 1
-->
Use of uninitialized value of type Any in numeric context
in block at -e line 1
Use of uninitialized value $x of type Any in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
in block at -e line 1
-->
= start section 1 =
1-->
state 1; needed
1-->
= end section 1 =
2-->
2-->
This looks like a bug to me: apparently, -n does not properly set up a lexical environment.
As a workaround, you can wrap the whole thing in a block, e.g. by surrounding your code with do { ... } or even just { ... }.
Also note that depending on your use case, the whole thing can probably be simplified by using the flip-flop operator, e.g.
cat testFile.txt | perl6 -ne '.say if / "start" / ff / "end" /'
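For comparison outside Raku: awk's range pattern start,end behaves much like ff here, printing from a line matching "start" through the next line matching "end", both inclusive. A sketch; `print_section` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: awk range pattern, inclusive at both ends, similar to Raku's ff.
# print_section is a hypothetical helper that reads lines on stdin.
print_section() {
    awk '/start/,/end/'
}

printf '%s\n' 'section 0; state 0; not needed' '= start section 1 =' \
              'state 1; needed' '= end section 1 =' \
              'section 2; state 2; not needed' | print_section
```

The question's state-variable approach also works directly in awk, where an unset variable already compares equal to 0, so no initialization is needed: awk '/start/{x=1} x==1{print} /end/{x=2}' testFile.txt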
I'm trying to sum ("concatenate") consecutive String values and print the totals, but if the last lines are Strings and there is no change of type, the last total is never printed:
input.txt:
String 1
String 2
Number 5
Number 2
String 3
String 3
awk:
awk '
BEGIN { tot=0; ant_t=""; }
{
t = $1; val=$2;
#if string, concatenate its value
if (t == "String") {
tot+=val;
nx=1;
} else {
nx=0;
}
#if type change, add tot to res
if (t != "String" && ant_t == "String") {
res=res tot;
tot=0;
}
ant_t=t;
#if string, go next
if (nx == 1) {
next;
}
res=res"\n"val;
}
END { print res; }' input.txt
Current output:
3
5
2
Expected output:
3
5
2
6
How can I detect whether awk is reading the last line, so that if the input ends without a change of type, the pending total still gets printed?
awk reads its input line by line, so it cannot tell whether the current line is the last one. The END block, however, runs once the end of the input has been reached, so it can print whatever is still pending.
To get the output you expect:
awk '/String/{sum+=$2} /Number/{if(sum) print sum; sum=0; print $2} END{if(sum) print sum}' input.txt
which produces:
3
5
2
6
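What it does:
/String/ selects lines that match String; likewise, /Number/ selects lines that match Number.
sum+=$2 adds up the values of consecutive String lines. When a Number line occurs, the pending sum (if any) is printed and reset to zero, and then the number itself is printed. The END block prints any sum still pending after the last line.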
Like this maybe:
awk -v lines="$(wc -l < /etc/hosts)" 'NR==lines{print "LAST"};1' /etc/hosts
I am pre-calculating the number of lines (using wc) and passing that into awk as a variable called lines, if that is unclear.
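Counting lines first means reading the input twice, which does not work on a pipe, and wc -l does not count a final line that lacks a trailing newline. A common alternative, sketched here, is to hold every line back by one record, so that the line still held when END runs is the last one; `mark_last` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print each line one record late; whatever is still held in END is the last line.
# mark_last is a hypothetical helper that reads lines on stdin.
mark_last() {
    awk 'NR > 1 { print prev }     # print the line held back from the previous record
         { prev = $0 }             # hold the current line
         END { if (NR) print "LAST: " prev }
    '
}

printf 'one\ntwo\nthree' | mark_last    # no trailing newline on purpose
```

This prints one, two and then LAST: three.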
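Just change the last line to:
END { print res; if (ant_t == "String") print tot }'
The ant_t check keeps it from printing a stray 0 when the input ends with a Number line.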
awk '$1~"String"{x+=$2;y=1}$1~"Number"{if (y){print x;x=0;y=0;}print $2}END{if(y) print x}' file
Explanation
y is used as a boolean: at the END I check whether the last lines were Strings and, if so, print the sum.
You could actually use x itself as the boolean, as nu11p01n73R does, which is smarter.
Test
$ cat file
String 1
String 2
Number 5
Number 2
String 3
String 3
$ awk '$1~"String"{x+=$2;y=1}$1~"Number"{if (y){print x;x=0;y=0;}print $2}END{if(y) print x}' file
3
5
2
6
I have this awk script:
/regex2/{
var = $1}
/regex1/{
print var}
which I executed over this input file:
regex1
This should be assigned as variable regex2
I got no printed output (just an empty line). The desired output is "This".
I then thought of using BEGIN:
BEGIN{
/regex2/
var = $1}
/regex1/{
print var}
But apparently a BEGIN block cannot do regex matching on the input lines. Any suggestions?
This would achieve the desired result:
awk '/regex2/ { print $1 }'
Otherwise, you'll need to read the file twice. On the first pass, the following stores the last occurrence of /regex2/ in var; on the second pass, it prints var for each occurrence of /regex1/. Keeping the whole first pass inside one FNR==NR block stops it from printing anything, so the output is just the keyword 'This' (file.txt{,} is bash brace expansion for file.txt file.txt):
awk 'FNR==NR { if (/regex2/) var = $1; next } /regex1/ { print var }' file.txt{,}
Typically this sort of thing is done with a flag that is set on a /regex1/ line and checked on the following lines:
/regex1/ { f = 1 }
f && /regex2/ { var = $1; print var; f = 0 }
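The flag technique can be sketched as a reusable command with both patterns passed in and matched dynamically with ~. This is an illustration only; `after_match` is a hypothetical helper name, and since -v processes backslash escapes, patterns containing backslashes may need doubling.

```shell
#!/usr/bin/env bash
# Sketch: the flag technique with both patterns passed in as dynamic regexes.
# after_match is a hypothetical helper: once a line matches r1, it prints the
# first word of the next line (or that same line) matching r2.
after_match() {
    awk -v r1="$1" -v r2="$2" '
        $0 ~ r1      { f = 1 }                  # r1 seen: raise the flag
        f && $0 ~ r2 { print $1; f = 0 }        # flag up and r2 matches: print, lower flag
    '
}

printf '%s\n' 'regex1' 'This should be assigned as variable regex2' | after_match regex1 regex2
```

With the sample input this prints This, and no empty line, because nothing is printed until the flag is up.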