awk print 4 columns with different colours - from a declared variable

I'm just after a little help pulling in a value from a variable. I'm writing a statement to print the contents of a file as 4-column output on screen, colouring the 3rd column depending on the value of the 4th column.
The file has contents as follows...
Col1=date(yymmdd)
Col2=time(hhmmss)
Col3=Jobname(test1, test2, test3, test4)
Col4=Value(null, 0, 1, 2)
Column 4 should be a value of null, 0, 1 or 2 and this is the value that will determine the colour of the 3rd column. I'm declaring the colour codes in a variable at the top of the script as follows...
declare -A colours
colours["0"]="\033[0;31m"
colours["1"]="\033[0;34m"
colours["2"]="\033[0;32m"
(note I don't have a colour for a null value, I don't know how to code this yet but I'm wanting it to be red)
My code is as follows...
cat TestScript.txt | awk '{ printf "%20s %20s %20s %10s\n", "\033[1;31m"$1,"\033[1;32m"$2,${colours[$4]}$3,"\033[1;34m"$4}'
But I get a syntax error and can't for the life of me figure a way around it no matter what I do.
Thanks for any help
Amended code below to show working solution.
I've removed the variable that was originally set in bash and defined the array inline in the awk program instead...
cat TestScript.txt | awk 'BEGIN {
colours[0]="\033[0;31m"
colours[1]="\033[0;34m"
colours[2]="\033[0;32m"
}
{printf "%20s %20s %20s %10s\n","\033[1;31m"$1,"\033[1;32m"$2,colours[$4]$3,"\033[1;34m"$4}'

Just define the colours array in awk.
Either
BEGIN {
colours[0]="\033[0;31m"
colours[1]="\033[0;34m"
colours[2]="\033[0;32m"
}
or
BEGIN { split("\033[0;31m \033[0;34m \033[0;32m", colours) }
But with the second approach, remember that the first index in the array is 1, not 0.
Then, in your printf statement, the use of the colours array must be changed to:
,colours[$4]$3,
But if you have defined the array using the second method, then a +1 is required:
,colours[$4+1]$3,
Best regards
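A minimal sketch of the first approach as a complete command, also covering the empty 4th column the question asks about. The sample rows, the reset variable, and the choice to pad the text before adding colour codes are assumptions for illustration, not part of the original answer.

```shell
# Hypothetical sample data: date, time, job name, value (the last row has no 4th field).
sample='230101 120000 test1 0
230101 120500 test2 1
230101 121000 test3 2
230101 121500 test4'

coloured=$(printf '%s\n' "$sample" | awk '
BEGIN {
    colours[0] = "\033[0;31m"   # red
    colours[1] = "\033[0;34m"   # blue
    colours[2] = "\033[0;32m"   # green
    reset      = "\033[0m"      # restore the default colour
}
{
    # Fall back to red when field 4 is empty or is not a known key.
    c = ($4 in colours) ? colours[$4] : colours[0]
    # Pad the plain text inside the colour codes, so the escape
    # bytes are not counted against the column width.
    printf "%-8s %-8s %s%-10s%s %s\n", $1, $2, c, $3, reset, $4
}')
printf '%s\n' "$coloured"
```

Testing with ($4 in colours) avoids creating an empty array element for unknown values, and emitting the reset code after the job name stops the colour from leaking into the following column.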

In awk you can use the built-in ENVIRON hash to access the environment variables.
So instead of ${colours[$4]} (which is bash syntax, not awk syntax) you can write ENVIRON["something"]. Unfortunately, bash arrays cannot be accessed this way, because they are not exported to the environment. So instead of a colours array you should export separate variables colours_0, colours_1 and colours_2, and then use ENVIRON["colours_"$4].
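As a rough sketch of the ENVIRON approach: the variable names colours_0 to colours_2 and the sample rows below are hypothetical. One assumption worth stating is that awk does not interpret backslash escapes in ENVIRON values, so each variable has to hold a real ESC byte, which is why printf is used to build it.

```shell
# Hypothetical variable names; they must be exported to show up in ENVIRON.
# printf turns \033 into a real ESC byte before the value is stored.
export colours_0="$(printf '\033[0;31m')"
export colours_1="$(printf '\033[0;34m')"
export colours_2="$(printf '\033[0;32m')"

env_out=$(printf '%s\n' '230101 120000 test1 0' '230101 120500 test2 1' |
    awk '{
        # Build the variable name from field 4 and look it up in the environment.
        printf "%s %s %s%s\033[0m %s\n", $1, $2, ENVIRON["colours_" $4], $3, $4
    }')
printf '%s\n' "$env_out"
```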

Related

How do associative arrays work in awk?

I wanted to remove duplicate lines from a file based on a column. A quick search led me to this page, which had the following solution:
awk '!x[$1]++' filename
It works, but I am not sure how it works. I know it uses associative arrays in awk, but I can't work out anything beyond that.
Update:
Thanks everyone for the explanation. With my new knowledge, I have written a blog post with further explanation of how it works.
That awk script !x[$1]++ fills an array named x. Suppose the first word ($1 refers to the first word in a line of text) in a line of text is line1. It effectively results in this operation on the array:
x["line1"]++
The "index" (the key) of the array is the text encountered in the file (line1 in this example), and the value associated with that key is an integer that is incremented by 1.
When a key is encountered for the first time, its array element is unset and evaluates to zero; it is then post-incremented to 1. The not operator ! applied to that zero yields true, so the line is printed. The next time the same key is encountered, the value in the array is non-zero, so the not operation yields false and the line is not printed.
A less "clever" way of writing the same thing (but possibly more clear and less fun) would be this:
{
    if (x[$1] == 0)
        print
    x[$1]++
}
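To make the behaviour concrete, here is a small run of the one-liner on made-up input (the five sample lines are an assumption for illustration). Only the first line seen for each value of column 1 survives.

```shell
# Hypothetical input: column 1 repeats for "a" and "b".
dedup=$(printf '%s\n' 'a 1' 'b 2' 'a 3' 'c 4' 'b 5' | awk '!x[$1]++')
printf '%s\n' "$dedup"
# prints:
# a 1
# b 2
# c 4
```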

How to skip records that turn on/off the range pattern?

gawk '/<Lexer>/,/<\/Lexer>/' file
this works but it prints the first and last records, which I'd like to omit. How to do so?
It says: "The record that turns on the range pattern and the one that turns it off both match the range pattern. If you don't want to operate on these records, you can write if statements in the rule's action to distinguish them from the records you are interested in." but no example.
I tried something like
gawk '/<Lexer>/,/<\/Lexer>/' {1,FNR-1} file
but it doesn't work.
If you have a better way to do this, without using awk, say so.
You can do it with 2 separate match statements and a variable
gawk '/<Lexer>/{p=1; next} /<\/Lexer>/ {p=0} p==1 {print}' file
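This matches <Lexer>, sets p to 1, and then skips to the next line, so the opening line itself is never printed. While p is 1, the third rule prints the current line. When </Lexer> matches, p is set to 0 before the third rule is tested, so the closing line is not printed either, and printing stays suppressed from then on.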
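A quick check of the flag technique on invented input (the six sample lines are assumptions). It uses plain awk rather than gawk, since nothing here is gawk-specific, and shortens p==1 {print} to the equivalent bare pattern p.

```shell
# Hypothetical input with one <Lexer> ... </Lexer> block.
inner=$(printf '%s\n' 'before' '<Lexer>' 'rule1' 'rule2' '</Lexer>' 'after' |
    awk '/<Lexer>/ { p = 1; next }   # opening line: start printing from the next record
         /<\/Lexer>/ { p = 0 }       # closing line: stop before it is printed
         p                           # print while the flag is set')
printf '%s\n' "$inner"
# prints:
# rule1
# rule2
```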

Assigning value of a field (positional variable) into a user defined variable in gawk/awk

I am creating a variable called "size" and trying to assign a value to it from a gawk positional variable, but that does not seem to work. In the example below, I am trying to store the value of field 4 in the variable "size". When I print the variable size, the entire line is printed instead of just field 4.
How can I save the field value into a variable for later use?
prompt> echo "Live in a big city" | gawk '/Live/ {size=$4; print $size}'
The following is outputted:
Live in a big city
I would like to see just this:
big
Leave out the dollar sign. awk is like C rather than shell or Perl: you don't need any extra punctuation to dereference a variable. You only use a dollar sign to get the value of the n'th field on the current line.
echo "Live in a big city" | gawk '/Live/ {size=$4; print size}'
The reason you get the whole line printed is this: the awk variable size is assigned the value big. Then, in the print statement, awk dereferences the size variable and attempts print $big. The string "big" is interpreted as an integer and, as it does not begin with any digits, it is treated as the number 0. So you get print $0, and hence the complete line.
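The three cases can be compared side by side in one run. The variable n is a hypothetical addition, there to show the legitimate use of $ with a variable that holds a field number.

```shell
fields=$(echo "Live in a big city" | awk '{
    size = $4      # size now holds the string "big"
    n = 4          # n holds a field number
    print size     # the variable itself
    print $n       # field number n, i.e. $4
    print $size    # "big" converts to 0, so this is $0
}')
printf '%s\n' "$fields"
# prints:
# big
# big
# Live in a big city
```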

How to define dynamic array in Begin Statement with AWK

I want to define an array in my BEGIN statement with undefined index number; how can I do this in AWK?
BEGIN {send_packets_0to1 = 0;rcvd_packets_0to1=0;seqno=0;count=0; n_to_n_delay[];};
I have problem with n_to_n_delay[].
info gawk says, in part:
Arrays in 'awk' superficially resemble arrays in other programming
languages, but there are fundamental differences. In 'awk', it isn't
necessary to specify the size of an array before starting to use it.
Additionally, any number or string in 'awk', not just consecutive
integers, may be used as an array index.
In most other languages, arrays must be "declared" before use,
including a specification of how many elements or components they
contain. In such languages, the declaration causes a contiguous block
of memory to be allocated for that many elements. Usually, an index in
the array must be a positive integer.
However, if you want to "declare" a variable as an array so referencing it later erroneously as a scalar produces an error, you can include this in your BEGIN clause:
split("", n_to_n_delay)
which will create an empty array.
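This can also be used to empty an existing array. gawk and several other implementations also accept delete n_to_n_delay for this, but that form was historically an extension rather than standard awk, so split is the more portable choice.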
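A short sketch of both uses of split("", array): creating an empty array and clearing a populated one. The keys and values are made up, and the elements are counted with a for-in loop on the assumption that length(array) may not be available in every awk.

```shell
counts=$(awk 'BEGIN {
    split("", n_to_n_delay)            # n_to_n_delay is now an (empty) array
    n_to_n_delay["0->1"] = 0.25        # any string or number can be an index
    n_to_n_delay[7] = 1.5
    n = 0; for (k in n_to_n_delay) n++
    print n                            # number of elements after two assignments

    split("", n_to_n_delay)            # empty the array again
    n = 0; for (k in n_to_n_delay) n++
    print n                            # number of elements after clearing
}')
printf '%s\n' "$counts"
# prints:
# 2
# 0
```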
I don't think you need to define arrays in awk. You just use them as in the example below:
{
    if ($1 > max)
        max = $1
    arr[$1] = $0
}
END {
    for (x = 1; x <= max; x++)
        print arr[x]
}
Notice how there's no separate definition. The example is taken from The AWK Manual.

Reorganizing named fields with AWK

I have to deal with various input files with a number of fields, arbitrarily arranged, but all consistently named and labelled with a header line. These files need to be reformatted such that all the desired fields are in a particular order, with irrelevant fields stripped and missing fields accounted for. I was hoping to use AWK to handle this, since it has done me so well when dealing with field-related dilemmata in the past.
After a bit of mucking around, I ended up with something much like the following (writing from memory, untested):
# imagine a perfectly-functional BEGIN {} block here
NR==1 {
    fldname[1] = "first_name"
    fldname[2] = "last_name"
    fldname[3] = "middle_name"
    maxflds = 3
    # this is just a sample -- my real script went through forty-odd fields
    for (i=1;i<=NF;i++) for (j=1;j<=maxflds;j++) if ($i == fldname[j]) fldpos[j]=i
}
NR!=1 {
    for (j=1;j<=maxflds;j++) {
        if (fldpos[j]) printf "%s",$fldpos[j]
        printf "%s","\t"
    }
    print ""
}
Now this solution works fine. I run it, I get my output exactly how I want it. No complaints there. However, for anything longer than three fields or so (such as the forty-odd fields I had to work with), it's a lot of painfully redundant code which always has and always will bother me. And the thought of having to insert a field somewhere else into that mess makes me shudder.
I die a little inside each time I look at it.
I'm sure there must be a more elegant solution out there. Or, if not, perhaps there is a tool better suited for this sort of task. AWK is awesome in its own domain, but I fear I may be stretching its limits some with this.
Any insight?
The only suggestion that I can think of is to move the initial array setup into the BEGIN block and read the ordered field names from a separate template file in a loop. Then your awk program consists only of loops with no embedded data. Your external template file would be a simple newline-separated list.
BEGIN {while ((getline < "fieldfile") > 0) fldname[++maxflds] = $0}
You would still read the header line in the same way you are now, of course. However, it occurs to me that you could use an associative array and reduce the nested for loops to a single for loop. Something like (untested):
BEGIN {while ((getline < "fieldfile") > 0) fldname[$0] = ++maxflds}
NR==1 {
# record, for each wanted output position, the input column that holds it
for (i=1;i<=NF;i++) if ($i in fldname) fldpos[fldname[$i]] = i
}
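Putting the pieces together, a self-contained sketch of the template-file idea might look like this. The file names, the sample header and rows, and passing the template path with -v are all assumptions for illustration. A wanted field that is missing from the input (middle_name here) produces an empty column.

```shell
# Hypothetical template and data files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' first_name last_name middle_name > "$tmp/fieldfile"
printf '%s\n' 'last_name age first_name' 'Smith 42 Ann' 'Jones 37 Bob' > "$tmp/data.txt"

reordered=$(awk -v fieldfile="$tmp/fieldfile" '
BEGIN {
    # Map each wanted field name to its output position.
    while ((getline line < fieldfile) > 0) fldname[line] = ++maxflds
    close(fieldfile)
}
NR == 1 {
    # Header line: remember which input column feeds each output position.
    for (i = 1; i <= NF; i++)
        if ($i in fldname) fldpos[fldname[$i]] = i
    next
}
{
    # Data lines: emit the wanted fields in template order, tab-separated.
    for (j = 1; j <= maxflds; j++)
        printf "%s%s", ((j in fldpos) ? $(fldpos[j]) : ""), (j < maxflds ? "\t" : "\n")
}' "$tmp/data.txt")
printf '%s\n' "$reordered"
rm -rf "$tmp"
```

Adding or reordering output fields then only means editing the template file; the awk program itself stays unchanged.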