perl6: Unable to initialize a state variable

I want to use a one-liner to print a middle section of a file by using a state variable to indicate whether the current line is within the desired section of the file. But I am unable to initialize the state variable. Initialization is so simple, and I just cannot find what the problem is. Please help. Thanks.
The file is named testFile.txt and has the following lines:
section 0; state 0; not needed
= start section 1 =
state 1; needed
= end section 1 =
section 2; state 2; not needed
And my one-liner is
cat testFile.txt | perl6 -ne ' state $x = 0; say "$x--> "; if $_ ~~ m/ "start" / { $x=1; }; if $x == 1 { .say; }; if $_ ~~ m/ "end" / { $x = 2; }'
And the output shows that $x = 0 is not initializing the variable:
Use of uninitialized value $x of type Any in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
in block at -e line 1
-->
Use of uninitialized value of type Any in numeric context
in block at -e line 1
Use of uninitialized value $x of type Any in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
in block at -e line 1
-->
= start section 1 =
1-->
state 1; needed
1-->
= end section 1 =
2-->
2-->

This looks like a bug to me: Apparently, -n does not properly set up a lexical environment.
As a workaround, you can wrap the whole thing in a block, e.g. by surrounding your code with do { ... } or even just { ... }.
Also note that depending on your use case, the whole thing can probably be simplified by using the flip-flop operator, eg
cat testFile.txt | perl6 -ne '.say if / "start" / ff / "end" /'
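For comparison only, the same "flag on at the start marker, off at the end marker" logic can be sketched in awk, where an unset variable evaluates as false, so no initialization step is needed. This is an analog in a different tool, not a fix for the perl6 behaviour, and the function name print_section is made up for the example:

```shell
# Hypothetical helper: print the lines from a "start" marker through the
# next "end" marker, both inclusive. Reads the named files, or stdin.
print_section() {
    awk '
        /start/ { inside = 1 }   # switch the flag on at the start marker
        inside  { print }        # print every line while the flag is set
        /end/   { inside = 0 }   # switch it off after the end marker is printed
    ' "$@"
}
```

Calling print_section testFile.txt on the sample file should print the three lines of section 1.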

Cannot use Awk with arguments

I am trying to do argument processing in awk. It works fine if the script has no main body:
PK#rhel8:~/tmp-> cat testARG.awk
#!/usr/bin/awk -f
BEGIN{
argc = ARGC ;
CmdName = ARGV[0] ;
FirstArg = ARGV[1]
printf("Argument count = %d; command name = %s; first argument = %s\n",argc,CmdName,FirstArg) ;
}
#{
# printf("Argument count = %d; command name = %s; first argument = %s\n",argc,CmdName,FirstArg) ;
# print $0
#}
PK#rhel8:~/tmp-> ./testARG.awk 1 2 3 4
Argument count = 5; command name = awk; first argument = 1
PK#rhel8:~/tmp->
However when I uncomment the body it doesn't like it at all:
PK#rhel8:~/tmp-> cat testARG.awk
#!/usr/bin/awk -f
BEGIN{
argc = ARGC ;
CmdName = ARGV[0] ;
FirstArg = ARGV[1]
printf("Argument count = %d; command name = %s; first argument = %s\n",argc,CmdName,FirstArg) ;
}
{
printf("Argument count = %d; command name = %s; first argument = %s\n",argc,CmdName,FirstArg) ;
print $0
}
PK#rhel8:~/tmp-> ./testARG.awk 1 2 3 4
Argument count = 5; command name = awk; first argument = 1
awk: ./testARG.awk:6: fatal: cannot open file `1' for reading (No such file or directory)
PK#rhel8:~/tmp->
Is there some different way I have to use awk to allow it to see the arguments as arguments and not files?
You may want to get used to a more standard way of passing non-file args to awk. A common method is to define awk variable assignments on the command line. The general format:
awk -v awk_var1="OS val 1" -v awk_var2="OS val 2" '... awk script ...' [ optional list of filenames]
# or
./script.awk -v awk_var1="OS val 1" -v awk_var2="OS val 2" [ optional list of filenames]
If you won't be processing any files then all processing within the awk script will need to take place within the BEGIN{} block.
Since the current script is looking to count input args and print the 'first', I take it to mean the number of input args could be variable. One common approach would be to provide the args in a single delimited string, eg:
$ cat testARG.awk
#!/usr/bin/awk -f
BEGIN { n=split(inlist,vars,";") # split awk input variable "inlist" on a ";" delimiter and put results in the vars[] array; split() returns the number of entries in the array, we save this count in awk variable "n"
CmdName = ARGV[0]
printf "Argument count = %d; command name = %s; first argument = %s\n", n, CmdName, vars[1]
}
$ ./testARG.awk -v inlist="1;2;3;4" # define awk variable "inlist" as a ";"-delimited list of values
Argument count = 4; command name = awk; first argument = 1
The fix is to add a -:
PK#rhel8:~/tmp-> ./testARG.awk - 1 2 3 4
Argument count = 6; command name = awk; first argument = -
howdy
Argument count = 6; command name = awk; first argument = -
howdy
^C
PK#rhel8:~/tmp->
Of course I have to move the arguments down by one to skip the -. However it still gets confused if you use ^D instead of ^C:
PK#rhel8:~/tmp-> ./testARG.awk - 1 2 3 4
Argument count = 6; command name = awk; first argument = -
howdy
Argument count = 6; command name = awk; first argument = -
howdy
howdy there
Argument count = 6; command name = awk; first argument = -
howdy there
awk: ./testARG.awk:10: fatal: cannot open file `1' for reading (No such file or directory)
PK#rhel8:~/tmp->
I'm not sure how the coders intended it to be used, so it would appear that the solution suggested by @markp-fuso is the better choice.
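The error after ^D arises because, once stdin (the -) is exhausted, awk moves on to the next ARGV element and tries to open 1 as a file. One further option, which is not taken from the answers in this thread, is to copy the arguments in BEGIN and then blank the ARGV entries, since POSIX awk skips ARGV elements that are empty. A minimal sketch, where the function name show_args and the output format are assumptions made for the example:

```shell
# Hypothetical helper: treat all command-line operands as plain values,
# then read the actual input from stdin.
show_args() {
    awk '
        BEGIN {
            argc  = ARGC - 1          # number of operands, not counting "awk"
            first = ARGV[1]
            for (i = 1; i < ARGC; i++)
                ARGV[i] = ""          # empty entries are skipped as input files
        }
        { printf "Argument count = %d; first argument = %s; line = %s\n", argc, first, $0 }
    ' "$@"
}
```

With every operand blanked, awk falls back to reading standard input, so something like echo howdy | show_args 1 2 3 4 should not try to open a file named 1.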
It's not entirely clear what you want to do since you're passing numbers to your Unix command and apparently don't want any of them treated as file names, but then your uncommented code relies on having an input file to process.
I'm assuming since you have a part of your script operating line by line printing $0 that you'll pass a file name for awk to work on as the final argument to your command but you want the command arguments before that last one to not be treated as files.
Given that, here's how to write a Unix command to see initial arguments to it as values to be passed to your awk script and not as files without changing the interface to your script and without tightly coupling it to being implemented in awk:
$ cat testArg
#!/usr/bin/env bash
args="${*:1:$#-1}"
shift $(( $# - 1 ))
awk -v args="$args" '
BEGIN{
argc = split(args,vals)
CmdName = ARGV[0] ;
FirstArg = vals[1]
printf("Argument count = %d; command name = %s; first argument = %s\n",argc,CmdName,FirstArg) ;
}
{
printf("Argument count = %d; command name = %s; first argument = %s\n",argc,CmdName,FirstArg) ;
print $0
}
' "${@:--}"
$ seq 2 > file
$ ./testArg 1 2 3 file
Argument count = 3; command name = awk; first argument = 1
Argument count = 3; command name = awk; first argument = 1
1
Argument count = 3; command name = awk; first argument = 1
2
Note that with this approach you don't need to modify your desired command line in any way, and nothing awk-specific appears on the command line when you call your script. If you decide to replace awk with perl, a compiled C program, or anything else inside your script, you don't need to change the script's API or any of the places it's called from.

How to return 0 if awk returns null from processing an expression?

I currently have an awk command that checks whether the output of an expression contains more than one line. If it does, it aggregates and prints the sum. For example:
someexpression=$'JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)'
might be the one-liner where it DOESN'T yield any information. Then,
echo "$someexpression" | awk '
NR>1 {a[$4]++}
END {
for (i in a) {
printf "%d\n", a[i]
}
}'
this will yield NULL or an empty return. Instead, I would like to have it return a numeric value of $0$ if empty. How can I modify the above to do this?
Nothing in UNIX "returns" anything (despite the unfortunately named keyword for setting the exit status of a function), everything (tools, functions, scripts) outputs X and exits with status Y.
Consider these 2 identical functions named foo(), one in C and one in shell:
C (x=foo() means set x to the return code of foo()):
foo() {
printf "7\n"; // this is outputting 7 from the full program
return 3; // this is returning 3 from this function
}
x=foo(); <- 7 is output on screen and x has value '3'
shell (x=$(foo) means set x to the output of foo()):
foo() {
printf "7\n"; # this is outputting 7 from just this function
return 3; # this is setting this function's exit status to 3
}
x=$(foo) <- nothing is output on screen, x has value '7', and '$?' has value '3'
Note that what the return statement does is vastly different in each. Within an awk script, printing and return codes from functions behave the same as they do in C but in terms of a call to the awk tool, externally it behaves the same as every other UNIX tool and shell script and produces output and sets an exit status.
So when discussing anything in UNIX avoid using the term "return" as it's imprecise and ambiguous and so different people will think you mean "output" while others think you mean "exit status".
In this case I assume you mean "output" BUT you should instead consider setting a non-zero exit status when there's no match like grep does, e.g.:
echo "$someexpression" | awk '
NR>1 {a[$4]++}
END {
for (i in a) {
print a[i]
}
exit (NR < 2)
}'
and then your code that uses the above can test for the success/fail exit status rather than testing for a specific output value, just like if you were doing the equivalent with grep.
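As a rough sketch of what such calling code could look like: the wrapper below runs the awk command from above and prints 0 when the exit status signals "no data lines". The function names job_counts and job_counts_or_zero are hypothetical, and printing 0 is an assumption about what the asker wants.

```shell
# The awk command from above, reading the listing from stdin.
# Exits with status 1 when there is no line after the header.
job_counts() {
    awk 'NR > 1 { a[$4]++ }
         END { for (i in a) print a[i]; exit (NR < 2) }'
}

# Hypothetical wrapper: print the counts on success, or a single 0 on failure.
job_counts_or_zero() {
    if counts=$(job_counts); then
        printf '%s\n' "$counts"
    else
        printf '0\n'
    fi
}
```

It would be used as echo "$someexpression" | job_counts_or_zero, so the caller never has to inspect the text of the output to detect the empty case.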
You can of course tweak the above to:
echo "$someexpression" | awk '
NR>1 {a[$4]++}
END {
if ( NR > 1 ) {
for (i in a) {
print a[i]
}
}
else {
print "$0$"
exit 1
}
}'
if necessary and then you have both a specific output value and a success/fail exit status.
You can set a flag inside the for loop to detect whether the loop body ran at all:
echo "$someexpression" |
awk 'NR>1 {
a[$4]++
}
END {
for (i in a) {
p = 1
printf "%d\n", a[i]
}
if (!p)
print "$0$"
}'
$0$

read file and extract variables based on what is in the line

I have a file that looks like this:
$ cat file_test
garbage text A=one B=two C=three D=four
garbage text A= B=six D=seven
garbage text A=eight E=nine D=ten B=eleven
I want to go through each line and extract specific "variables" to use in the loop. And if a line doesn't have a variable then set it to an empty string.
So, for the above example, let's say I want to extract the variables A, B, and C, then for each line, the loop would have this:
garbage text A=one B=two C=three D=four
A = "one"
B = "two"
C = "three"
garbage text A= B=six D=seven
A = ""
B = "six"
C = ""
garbage text A=eight E=nine D=ten B=eleven
A = "eight"
B = "eleven"
C = ""
My original plan was to use sed but that won't work since the order of the "variables" is not consistent (the last line for example) and a "variable" may be missing (the second line for example).
My next thought is to go through line by line, then split the line into fields using awk and set variables based on each field but I have no clue where or how to start.
I'm open to other ideas or better suggestions.
The right answer depends on what you're going to do with the variables.
Assuming you need them as shell variables, here is a different approach:
$ while IFS= read -r line;
do A=""; B=""; C="";
source <(echo "$line" | grep -oP "(A|B|C)=\w*" );
echo "A=$A B=$B C=$C";
done < file
A=one B=two C=three
A= B=six C=
A=eight B=eleven C=
The trick is to use source on the variable assignments that grep extracts from each line. Since assigned values carry over between iterations, you need to reset them before each new line.
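The same reset-then-assign idea can also be sketched without source, so that nothing read from the file is ever executed as shell code. This swaps in plain word splitting and a case statement for the grep step; the function name extract_abc is hypothetical:

```shell
# Hypothetical helper: read lines from stdin and report A, B and C per line.
# Runs in a subshell so that "set -f" does not leak into the caller.
extract_abc() (
    set -f                            # no globbing when $line is split into words
    while IFS= read -r line; do
        A='' B='' C=''                # reset, since values carry over otherwise
        for word in $line; do
            case $word in
                A=*) A=${word#A=} ;;
                B=*) B=${word#B=} ;;
                C=*) C=${word#C=} ;;
            esac
        done
        printf 'A=%s B=%s C=%s\n' "$A" "$B" "$C"
    done
)
```

It would be called as extract_abc < file_test; inside the loop, the printf line is where code that actually uses the three variables would go.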
If Perl is an option, try:
perl -ne 'undef %a; while (/([\w]+)=([\w]*)/g) {$a{$1}=$2;}
for ("A", "B", "C") {print "$_=\"$a{$_}\"\n";}' file_test
Output:
A="one"
B="two"
C="three"
A=""
B="six"
C=""
A="eight"
B="eleven"
C=""
It parses each line for assignments with =, stores each key-value pair in the associative array %a, then finally reports the values for A, B and C.
I'm partial to the awk solution, e.g.
$ awk '{for (i = 1; i <= NF; i++) if ($i ~ /^[A-Za-z_][^=]*[=]/) print $i}' file
A=one
B=two
C=three
D=four
A=
B=six
D=seven
A=eight
E=nine
D=ten
B=eleven
Explanation
for (i = 1; i <= NF; i++) loop over each space separated field;
if ($i ~ /^[A-Za-z_][^=]*[=]/) if the field begins with a letter or underscore, followed by any number of non-'=' characters and then an '='; then
print $i print the field.
My first three solutions assume that you need shell variables holding the values of A, B and C, rather than just printing them. If that is the case, the following may help.
1st Solution: This assumes that A, B and C always appear in the same field positions.
while read first second third fourth fifth sixth
do
echo $third,$fourth,$fifth ##Printing values here.
a_var=${third#*=}
b_var=${fourth#*=}
c_var=${fifth#*=}
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
This simply prints the variable values for each line, since you have not said what you intend to do with them; adapt it to your use case.
2nd Solution: This assumes the variables appear in the same order, but it checks whether A is in the 3rd field, B in the 4th and C in the 5th, and prints accordingly.
while read first second third fourth fifth sixth
do
echo $third,$fourth,$fifth ##Printing values here.
a_var=$(echo "$third" | awk '$0 ~ /^A/{sub(/.*=/,"");print}')
b_var=$(echo "$fourth" | awk '$0 ~ /^B/{sub(/.*=/,"");print}')
c_var=$(echo "$fifth" | awk '$0 ~ /^C/{sub(/.*=/,"");print}')
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
3rd Solution: This looks like the best fit for your requirement, though I am not sure how efficient it is (I am still looking at whether it can be done better). It does not depend on the order of A, B and C: it matches them anywhere in the line, assigns the value if a match is found, and otherwise leaves the variable empty.
while read line
do
a_var=$(echo "$line" | awk 'match($0,/A=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
b_var=$(echo "$line" | awk 'match($0,/B=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
c_var=$(echo "$line" | awk 'match($0,/C=[^ ]*/){val=substr($0,RSTART,RLENGTH);sub(/.*=/,"",val);print val}')
echo "Using new values of variables here...."
echo "NEW A="$a_var
echo "NEW B="$b_var
echo "NEW C="$c_var
done < "Input_file"
Output will be as follows.
Using new values of variables here....
NEW A=one
NEW B=two
NEW C=three
Using new values of variables here....
NEW A=
NEW B=six
NEW C=
Using new values of variables here....
NEW A=eight
NEW B=eleven
NEW C=
EDIT1: In case you simply want to print values of A,B,C then try following.
awk '{
for(i=1;i<=NF;i++){
if($i ~ /[ABCabc]=/){
sub(/.*=/,"",$i)
a[++count]=$i
}
}
print "A="a[1] ORS "B=" a[2] ORS "C="a[3];count=""
delete a
}' Input_file
Another Perl
perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() '
with the input file
$ perl -lne ' %x = /(\S+)=(\S+)/g ; for("A","B","C") { print "$_ = $x{$_}" } %x=() ' file_test
A = one
B = two
C = three
A =
B = six
C =
A = eight
B = eleven
C =
$
A generic, self-documented awk solution.
This assumes that = is the variable separator and appears neither in the text before the variables nor in the variable values themselves.
awk 'BEGIN {
    # load the list of variables, in the order to print them
    VarSize = split( "A B C", aIdx )
    # build a pattern that matches any of the wanted assignments
    for ( Idx = 1; Idx <= VarSize; Idx++ ) VarEntry = ( VarEntry ? ( VarEntry "|^" ) : "^" ) aIdx[Idx] "="
}
{
    # reset the variable values
    split( "", aVar )
    # for each part of the line
    for ( Fld=1; Fld<=NF; Fld++ ) {
        # if the part is a variable assignment
        if( $Fld ~ VarEntry ) {
            # separate variable name and content in an array
            split( $Fld, aTemp, /=/ )
            # store the content under the corresponding variable name
            aVar[aTemp[1]] = aTemp[2]
        }
    }
    # print all variable content (empty or not) found on this line;
    # a numeric loop is used because the order of "for (x in array)" is unspecified
    for ( Idx = 1; Idx <= VarSize; Idx++ ) printf( "%s = \042%s\042\n", aIdx[Idx], aVar[aIdx[Idx]] )
}
' YourFile
It's unclear whether you're trying to set awk variables or shell variables but here's how to populate an associative awk array and then use that to populate an associative shell array:
$ cat tst.awk
BEGIN {
numKeys = split("A B C",keys)
}
{
delete f
for (i=1; i<=NF; i++) {
if ( split($i,t,/=/) == 2 ) {
f[t[1]] = t[2]
}
}
for (keyNr=1; keyNr<=numKeys; keyNr++) {
key = keys[keyNr]
printf "[%s]=\"%s\"%s", key, f[key], (keyNr<numKeys ? OFS : ORS)
}
}
$ awk -f tst.awk file
[A]="one" [B]="two" [C]="three"
[A]="" [B]="six" [C]=""
[A]="eight" [B]="eleven" [C]=""
$ while IFS= read -r out; do declare -A arr="( $out )"; declare -p arr; done < <(awk -f tst.awk file)
declare -A arr=([A]="one" [B]="two" [C]="three" )
declare -A arr=([A]="" [B]="six" [C]="" )
declare -A arr=([A]="eight" [B]="eleven" [C]="" )
$ echo "${arr["A"]}"
eight

Endless recursion in gawk-script

Please pardon me in advance for posting such a big part of my problem, but I just can't put my finger on the part that fails...
I got input-files like this (abas-FO if you care to know):
.fo U|xiininputfile = whatever
.type text U|xigibsgarnich
.assign U|xigibsgarnich
..
..Comment
.copy U|xigibswohl = Spaß
.ein "ow1/UWEDEFTEST.FOP"
.in "ow1/UWEINPUT2"
.continue BOTTOM
.read "SOemthing" U|xttmp
!BOTTOM
..
..
Now I want to recursively follow each .in[put]/.ein[gabe]-statement, parse the mentioned file and, if I don't know it yet, add it to an array. My code looks like this:
#!/bin/awk -f
function getFopMap(inputregex, infile, mandantdir, infiles){
while(getline f < infile){
#printf "*"
#don't match if there is a '
if(f ~ inputregex "[^']"){
#remove .input-part
sub(inputregex, "", f)
#trim right
sub(/[[:blank:]]+$/, "", f)
#remove leading and trailing "
gsub(/(^\"|\"$)/,"" ,f)
if(!(f in infiles)){
infiles[f] = "found"
}
}
}
close(infile)
for (i in infiles){
if(infiles[i] == "found"){
infiles[i] = "parsed"
cmd = "test -f \"" i "\""
if(system(cmd) == 0){
close(cmd)
getFopMap(inputregex, f, mandantdir, infiles)
}
}
}
}
BEGIN{
#Matches something like [.input myfile] or [.ein "ow1/myfile"]
inputregex = "^\\.(in|ein)[^[:blank:]]*[[:blank:]]+"
#Get absolute path of infile
cmd = "python -c 'import os;print os.path.abspath(\"" ARGV[1] "\")'"
cmd | getline rootfile
close(cmd)
infiles[rootfile] = "parsed"
getFopMap(inputregex, rootfile, mandantdir, infiles)
#output result
for(infile in infiles) print infile
exit
}
I call the script (in the same directory the paths are relative to) like this:
./script ow1/UWEDEFTEST.FOP
I get no output. It just hangs up. If I remove the comment before the printf "*" command, I'm seeing stars, without end.
I appreciate any help or hints on how to do it better.
My awk:
gawk Version 3.1.7
I don't know if it's your only problem, but you're calling getline incorrectly and consequently will go into an infinite loop in some scenarios. Make sure you fully understand all of the caveats at http://awk.info/?tip/getline; you might want to use the recursion example there as the starting point for your code.
The most important point for your code is that getline returns a negative value when it fails, so while(getline f < infile) becomes an infinite loop: the failing getline keeps returning non-zero, so it keeps being called and keeps failing. You need to use while ( (getline f < infile) > 0) instead.
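A minimal sketch of the corrected loop, wrapped in a shell function so it can be tried on its own. The function name count_lines and the variable names are made up for the example; the point is only that the > 0 test ends the loop both at end of file (getline returns 0) and on failure (getline returns a negative value):

```shell
# Hypothetical helper: count the lines of the file named in $1 using
# getline from inside awk. An unreadable file yields 0 instead of a hang.
count_lines() {
    awk -v infile="$1" 'BEGIN {
        n = 0
        while ((getline line < infile) > 0)   # 0 = end of file, negative = error
            n++
        close(infile)
        print n
    }'
}
```

Calling count_lines on a path that does not exist should print 0 and return immediately, whereas the unparenthesised while(getline line < infile) would spin forever in that case.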

AWK -- How to assign a variable's value from matching regex which comes later?

While I have this awk script,
/regex2/{
var = $1}
/regex1/{
print var}
which I executed over input file:
regex1
This should be assigned as variable regex2
I got no printed output. The desired output is: "This" to be printed out.
I might then think to utilize BEGIN:
BEGIN{
/regex2/
var = $1}
/regex1/{
print var}
But apparently BEGIN cannot accommodate regex matching. Any suggestions?
This would achieve the desired result:
awk '/regex2/ { print $1 }'
Otherwise, you'll need to read the file twice and perform something like the following. It will store the last occurrence of /regex2/ in var. Upon re-reading the file, it will print var for each occurrence of /regex1/. Note that you'll get an empty line in the output and the keyword 'This' on a newline:
awk 'FNR==NR && /regex2/ { var = $1; next } /regex1/ { print var }' file.txt{,}
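A variant of the two-pass idea that avoids the stray empty line can be sketched as follows: the first pass only records var and skips everything else, so /regex1/ is tested on the second pass alone. The function name print_var_for_matches is hypothetical, and the file is named twice explicitly instead of relying on the shell's {,} brace expansion:

```shell
# Hypothetical helper: read the file in $1 twice. Pass 1 remembers the first
# field of the last line matching regex2; pass 2 prints it for each regex1 line.
print_var_for_matches() {
    awk '
        FNR == NR { if (/regex2/) var = $1; next }   # first pass: collect only
        /regex1/  { print var }                      # second pass: report
    ' "$1" "$1"
}
```

Note that the FNR == NR test assumes the file is not empty, and that reading twice requires a regular file rather than a pipe.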
Typically this sort of thing is done with a flag:
/regex1/ { f = 1 }
f && /regex2/ { var = $1; f = 0 }