How to define a dynamic array in a BEGIN statement with AWK - awk

I want to define an array in my BEGIN statement without a predefined size; how can I do this in AWK?
BEGIN {send_packets_0to1 = 0;rcvd_packets_0to1=0;seqno=0;count=0; n_to_n_delay[];};
I have a problem with n_to_n_delay[].

info gawk says, in part:
Arrays in 'awk' superficially resemble arrays in other programming
languages, but there are fundamental differences. In 'awk', it isn't
necessary to specify the size of an array before starting to use it.
Additionally, any number or string in 'awk', not just consecutive
integers, may be used as an array index.
In most other languages, arrays must be "declared" before use,
including a specification of how many elements or components they
contain. In such languages, the declaration causes a contiguous block
of memory to be allocated for that many elements. Usually, an index in
the array must be a positive integer.
However, if you want to "declare" a variable as an array so referencing it later erroneously as a scalar produces an error, you can include this in your BEGIN clause:
split("", n_to_n_delay)
which will create an empty array.
This can also be used to empty an existing array. While gawk has the ability to use delete for this, other versions of AWK do not.
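A minimal, runnable sketch of that "declaration", using the question's n_to_n_delay name: split("", arr) creates an empty array that can then be filled with arbitrary indices, and calling it again empties the array portably.

```shell
out=$(awk 'BEGIN {
  split("", n_to_n_delay)      # n_to_n_delay is now an empty array
  n_to_n_delay[4]    = 0.25    # indices need not be consecutive integers
  n_to_n_delay["ab"] = 0.5     # any string works as an index
  n = 0; for (k in n_to_n_delay) n++
  print n
  split("", n_to_n_delay)      # empty it again (portable alternative to: delete n_to_n_delay)
  n = 0; for (k in n_to_n_delay) n++
  print n
}')
echo "$out"
```

The element-counting loop is used instead of length(array) so the sketch stays portable across awk implementations.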

I don't think you need to define arrays in awk. You just use them as in the example below:
{
if ($1 > max)
max = $1
arr[$1] = $0
}
END {
for (x = 1; x <= max; x++)
print arr[x]
}
Notice how there's no separate definition. The example is taken from The AWK Manual.
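The manual's fragment can be run end to end like this (the three input lines are made up for illustration): arr is never declared anywhere, yet lines keyed by their first field come back in numeric order.

```shell
out=$(printf '2 two\n1 one\n3 three\n' | awk '{
  if ($1 > max) max = $1      # track the largest key seen
  arr[$1] = $0                # store each line under its first field
}
END {
  for (x = 1; x <= max; x++) print arr[x]
}')
echo "$out"
```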

Why is the order of evaluation of expressions used for concatenation undefined in Awk?

In GNU Awk User's Guide, I went through the section 6.2.2 String Concatenation and found interesting insights:
Because string concatenation does not have an explicit operator, it is often necessary to ensure that it happens at the right time by using parentheses to enclose the items to concatenate.
Then, I was quite surprised to read the following:
Parentheses should be used around concatenation in all but the most common contexts, such as on the righthand side of ‘=’. Be careful about the kinds of expressions used in string concatenation. In particular, the order of evaluation of expressions used for concatenation is undefined in the awk language. Consider this example:
BEGIN {
a = "don't"
print (a " " (a = "panic"))
}
It is not defined whether the second assignment to a happens before or after the value of a is retrieved for producing the concatenated value. The result could be either ‘don't panic’, or ‘panic panic’.
In particular, my GNU Awk 5.0.0 behaves like this, retrieving the old value of a before performing the assignment:
$ gawk 'BEGIN {a = "dont"; print (a " " (a = "panic"))}'
dont panic
However, I wonder: why isn't the order of evaluation of expressions defined? What are the benefits of having "undefined" outputs that may vary depending on the version of Awk you are running?
This particular example is about expressions with side-effects. Traditionally, in C and awk syntax (closely inspired by C), assignments are allowed inside expressions. How those expressions are then evaluated is up to the implementation.
In principle, leaving behaviour unspecified discourages people from using potentially confusing or ambiguous language constructs. But that assumes they are aware of the lack of specification.
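If the output must not depend on evaluation order, the usual fix is to remove the side effect from the expression entirely. A small sketch of that rewrite of the example above:

```shell
out=$(awk 'BEGIN {
  a = "dont"
  old = a          # capture the value before reassigning
  a = "panic"
  print (old " " a)   # no assignment inside the expression, so no ambiguity
}')
echo "$out"
```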

Raku control statement to make numeric strings interpreted as numeric

I have a large hash of arrays,
%qual<discordant> (approx. 13199 values like '88.23', '99.23', etc., ranging from 88 to 100), which are read in from text files,
and when I print %qual<discordant>.min and %qual<discordant>.max I can see the values are clearly wrong.
I can fix this by changing how the data is read in from the text files:
%qual{$type}.push: @line[5]
to
%qual{$type}.push: @line[5].Num
but this wasn't intuitive; it took me a few minutes to figure out why Raku/Perl6 was giving clearly incorrect answers. It would have been very easy to miss this error. In Perl 5, the default behavior would be to treat these strings as numbers anyway.
There should be some control statement to make this the default behavior, how can I do this?
The problem / feature is really that in Raku when you read lines from a file, they become strings (aka objects of type Str). If you call .min and .max on an array of Str objects, then string semantics will be used to determine whether something is bigger or smaller.
There are special values in Raku that act like values in Perl. In Raku these are called "allomorphs". They are Str, but also Num, or Rat, or Int, or Complex.
The syntax for creating an appropriate allomorph for a string in $_ is << $_ >>. So if you change the line that reads the words to:
my @line = $line.words.map: { << $_ >> }
then the values in @line will either be Str, or IntStr or RatStr, which should make .min and .max work like you expect.
However, if you are sure that only the 5th element of @line is going to be numeric, then it is probably more efficient to convert the Str to a number before pushing to the array. A shorter syntax for that would be to prefix a +:
%qual{$type}.push: +@line[5]
Although you might find that too line-noisy.
UPDATE: I had forgotten that there's actually a sub called val that takes a Str and creates an appropriate allomorph of it (or returns the original Str). So the code for creating @line could be written as:
my @line = $line.words>>.&val

awk print 4 columns with different colours - from a declared variable

I'm just after a little help pulling in a value from a variable. I'm writing a statement to print the contents of a file as a 4-column output on screen, colouring the 3rd column depending on the 4th column's value.
The file has contents as follows...
Col1=date(yymmdd)
Col2=time(hhmmss)
Col3=Jobname(test1, test2, test3, test4)
Col4=Value(null, 0, 1, 2)
Column 4 should be a value of null, 0, 1 or 2 and this is the value that will determine the colour of the 3rd column. I'm declaring the colour codes in a variable at the top of the script as follows...
declare -A colours
colours["0"]="\033[0;31m"
colours["1"]="\033[0;34m"
colours["2"]="\033[0;32m"
(note I don't have a colour for a null value, I don't know how to code this yet but I'm wanting it to be red)
My code is as follows...
cat TestScript.txt | awk '{ printf "%20s %20s %20s %10s\n", "\033[1;31m"$1,"\033[1;32m"$2,${colours[$4]}$3,"\033[1;34m"$4}'
But I get a syntax error and can't for the life of me figure a way around it no matter what I do.
Thanks for any help
Amended code below to show working solution.
I've removed the variable set originally which was done in bash, added an inline variable into the awk line...
cat TestScript.txt | awk 'BEGIN {
colours[0]="\033[0;31m"
colours[1]="\033[0;34m"
colours[2]="\033[0;32m"
}
{printf "%20s %20s %20s %10s\n","\033[1;31m"$1,"\033[1;32m"$2,colours[$4]$3,"\033[1;34m"$4}'
Just define the colours array in awk.
Either
BEGIN {
colours[0]="\033[0;31m"
colours[1]="\033[0;34m"
colours[2]="\033[0;32m"
}
or
BEGIN { split("\033[0;31m \033[0;34m \033[0;32m", colours) }
But with the second method, remember that the first index in the array is 1, not 0.
Then, in your printf statement, the use of the colours array must be changed to:
,colours[$4]$3,
But if you have defined the array using the second method, then a +1 is required:
,colours[$4+1]$3,
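A runnable sketch of the second method, with plain labels standing in for the escape codes so the mapping is easy to see (the two sample input rows are made up):

```shell
out=$(printf '190101 120000 test1 0\n190101 120001 test2 2\n' |
  awk 'BEGIN { split("RED BLUE GREEN", colours) }   # colours[1]..colours[3]
       { printf "%s %s\n", colours[$4+1], $3 }')    # $4 is 0-based, so +1
echo "$out"
```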
Best regards
In awk you can use the built-in ENVIRON hash to access the environment variables.
So instead of ${colours[$4]} (which is bash syntax, not awk) you can write ENVIRON["something"]. Unfortunately, arrays cannot be accessed this way, so instead of an exported colours array you should use individual variables such as colours_0, colours_1, colours_2, and then you can use ENVIRON["colours_" $4].
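A minimal sketch of that ENVIRON workaround (the variable names colours_0…colours_2, their plain-text values, and the sample row are illustrative): the colour is looked up by building the variable name inside awk.

```shell
# One exported scalar per colour code; awk inherits the environment.
export colours_0="red" colours_1="blue" colours_2="green"
out=$(printf '190101 120000 test1 1\n' |
  awk '{ printf "%s %s\n", ENVIRON["colours_" $4], $3 }')
echo "$out"
```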

How to pass a regular expression to a function in AWK

I do not know how to pass a regular expression as an argument to a function.
If I pass a string, it is OK,
I have the following awk file,
#!/usr/bin/awk -f
function find(name){
for(i=0;i<NF;i++)if($(i+1)~name)print $(i+1)
}
{
find("mysql")
}
I do something like
$ ./fct.awk <(echo "$str")
This works OK.
But when I call in the awk file,
{
find(/mysql/)
}
This does not work.
What am I doing wrong?
Thanks,
Eric J.
You cannot (or rather, should not) pass a regex constant to a user-defined function; you have to use a dynamic regex (a string) in this case, like find("mysql").
If you do find(/mysql/), what awk actually does is find($0 ~ /mysql/), so it passes a 0 or 1 to your find(...) function.
see this question for detail.
awk variable assignment statement explanation needed
also
http://www.gnu.org/software/gawk/manual/gawk.html#Using-Constant-Regexps
section: 6.1.2 Using Regular Expression Constants
warning: regexp constant for parameter #1 yields boolean value
The regex gets evaluated (matching against $0) before it's passed to the function. You have to use strings.
Note: make sure you do proper escaping: http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps
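A small demonstration of the string approach (the find() function mirrors the question's; the ^mysql anchor and the sample input line are made up):

```shell
out=$(echo "mysql-server nginx mysqld" | awk '
function find(name,  i) {          # i is a local variable
  for (i = 1; i <= NF; i++)
    if ($i ~ name) print $i        # name is a dynamic (string) regex
}
{ find("^mysql") }')
echo "$out"
```

Remember the doubled backslashes when the pattern itself needs them, e.g. find("mysql\\.log") to match a literal dot.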
If you use GNU awk, you can use a regular expression as a user-defined function parameter.
You have to write your regex as @/.../.
In your example, you would use it like this:
function find(regex){
for(i=1;i<=NF;i++)
if($i ~ regex)
print $i
}
{
find(@/mysql/)
}
This is called a strongly typed regexp constant, and it's available since GNU awk version 4.2 (Oct 2017).
Example here.
Use quotation marks and treat them as strings; this way it works for mawk, mawk2, and gnu-gawk. But you'll also need to double the backslashes, since string processing eats one layer of them right off the bat.
In your example, just find("mysql") will suffice.
You can actually get it to pass arbitrary regexes as you wish, and not be confined to just gnu-gawk, as long as you're willing to make them strings rather than the @/../ syntax others have mentioned. This is where the number of backslashes makes a difference.
You can even make a regex out of arbitrary bytes, preferably via octal codes: if you use "\342\234\234" as a regex, awk will convert that into the actual bytes before matching.
While there's nothing wrong with that approach, if you want to be 100% safe and prefer not to have arbitrary bytes flying around, write it as
"[\\342][\\234][\\234]" ----> ✜
Once initially read by awk to create an internal representation, it'll look like this:
[\342][\234][\234]
which will still match the identical objects you desire (in this case, a cross-looking dingbat). It will, however, spit out annoying warnings in the Unicode-aware mode of gawk, due to attempting to enclose non-ASCII bytes directly in square brackets. For that use case,
"\\342\\234\\234" ------(eqv to )---> /\342\234\234/
will keep gawk happy and quiet. Lately I've been filling the gaps in my own code, writing regexes that mimic the Unicode script classes that Perl enjoys.
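A runnable sketch of the octal-byte idea: the string "\342\234\234" is converted by awk's string processing into the three raw bytes that encode ✜ in UTF-8, and the resulting dynamic regex matches those bytes in the input.

```shell
out=$(printf 'x \342\234\234 y\n' |
  awk '$0 ~ "\342\234\234" { print "found" }')   # string escapes become raw bytes
echo "$out"
```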

Reorganizing named fields with AWK

I have to deal with various input files with a number of fields, arbitrarily arranged, but all consistently named and labelled with a header line. These files need to be reformatted such that all the desired fields are in a particular order, with irrelevant fields stripped and missing fields accounted for. I was hoping to use AWK to handle this, since it has done me so well when dealing with field-related dilemmata in the past.
After a bit of mucking around, I ended up with something much like the following (writing from memory, untested):
# imagine a perfectly-functional BEGIN {} block here
NR==1 {
fldname[1] = "first_name"
fldname[2] = "last_name"
fldname[3] = "middle_name"
maxflds = 3
# this is just a sample -- my real script went through forty-odd fields
for (i=1;i<=NF;i++) for (j=1;j<=maxflds;j++) if ($i == fldname[j]) fldpos[j]=i
}
NR!=1 {
for (j=1;j<=maxflds;j++) {
if (fldpos[j]) printf "%s",$fldpos[j]
printf "%s","\t"
}
print ""
}
Now this solution works fine. I run it, I get my output exactly how I want it. No complaints there. However, for anything longer than three fields or so (such as the forty-odd fields I had to work with), it's a lot of painfully redundant code which always has and always will bother me. And the thought of having to insert a field somewhere else into that mess makes me shudder.
I die a little inside each time I look at it.
I'm sure there must be a more elegant solution out there. Or, if not, perhaps there is a tool better suited for this sort of task. AWK is awesome in its own domain, but I fear I may be stretching its limits some with this.
Any insight?
The only suggestion that I can think of is to move the initial array setup into the BEGIN block and read the ordered field names from a separate template file in a loop. Then your awk program consists only of loops with no embedded data. Your external template file would be a simple newline-separated list.
BEGIN {while ((getline < "fieldfile") > 0) fldname[++maxflds] = $0}
You would still read the header line in the same way you are now, of course. However, it occurs to me that you could use an associative array and reduce the nested for loops to a single for loop. Something like (untested):
BEGIN {while ((getline < "fieldfile") > 0) fldname[$0] = ++maxflds}
NR==1 {
for (i=1;i<=NF;i++) if ($i in fldname) fldpos[fldname[$i]] = i
}