awk switch case with constant variable - gawk

I have some problems with gawk and the switch statement. When I use switch with a constant string as the case label, everything works fine, but when I use a variable that holds a constant value, it doesn't.
Two examples to illustrate.
This example works fine:
BEGIN {
...
}
END {
split($0,a,", ")
for (k in a)
{
switch (a[k])
{
case "COLUMN 1":
POSITION = k
print k,a[k]
break
default:
print "Error"
exit
break
}
}
}
This example gives me a Syntax Error:
BEGIN {
COLUMN_NAME = "COLUMN 1"
}
END {
split($0,a,", ")
for (k in a)
{
switch (a[k])
{
case COLUMN_NAME : #Syntax Error in this line
POSITION = k
print k,a[k]
break
default:
print "Error"
exit
break
}
}
}
I don't know whether awk treats COLUMN_NAME as a constant, and I did not find any way to force this.
I also tried if/else, which works fine in both cases.
Edit:
Here is an explanation of what the awk script should do. I have a CSV file that looks like this:
COLUMN 1, COLUMN 2, COLUMN 3, COLUMN 4
1, 2, 3, 4
5, 6, 7, 8
...
but the file can also look like this:
COLUMN 3, COLUMN 2, COLUMN 4, COLUMN 1
1, 2, 3, 4
5, 6, 7, 8
...
I know the names of the columns, but I don't know their positions. So I parse the column names with the split function and want to use a switch to find the right position.

Here is a way to sort it out using an array in awk:
awk -F, 'NR==1 {for (i=1;i<=NF;i++) {split($i,t," ");c[i]=t[2]}} NR>1 {for (j=1;j<i;j++) arr[(NR-1)FS c[j]]=$j+0} END {print arr[2 FS 1]}' file
The END block then prints the value in the second data row of column 1.
For the first file this gives 5,
and for the second file 8.
A more readable version:
awk -F, '
NR==1 { # get the column order
for (i=1;i<=NF;i++) { # loop through all fields
split($i,tmp," ") # get the column number
col[i]=tmp[2]} # store the column order in col
}
NR>1 { # for all data do:
for (j=1;j<i;j++) # loop through all elements
arr[(NR-1)FS col[j]]=$j+0} # store data in array arr
END {
print arr[2 FS 1]} # print data from row 2 column 1
' file

You can't use a variable the way you want in a switch statement.
Any switch statement can be expressed as a sequence of if/thens, so just use that instead, e.g.:
if (a[k] == COLUMN_NAME) {POSITION = k; print k,a[k]}
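The reason is that gawk's switch only accepts constants (numeric, string, or regexp constants) as case labels, so a variable there is rejected at parse time. As a rough sketch of the if-based approach applied to the CSV header from the question: the shell function name column_values and the sample data below are illustrative assumptions, not part of the original script.

```shell
# column_values NAME: print the values of the column whose header is NAME.
# (The function name and the sample data are illustrative, not from the question.)
column_values() {
    awk -F', ' -v COLUMN_NAME="$1" '
    NR == 1 {
        for (k = 1; k <= NF; k++)
            if ($k == COLUMN_NAME)   # ordinary comparison, so a variable is fine
                POSITION = k
        if (!POSITION) { print "Error"; exit 1 }
        next
    }
    { print $POSITION }'
}

printf 'COLUMN 3, COLUMN 2, COLUMN 4, COLUMN 1\n1, 2, 3, 4\n5, 6, 7, 8\n' |
    column_values "COLUMN 1"    # prints 4, then 8
```

This uses only POSIX awk features, so it should not depend on gawk.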

I just hit this problem... 🥴
I had to put an if statement outside of switch to handle the variable
I don't know if this happens in C as well... 🤔
🤪🔥💥💀

Related

awk does not get multiple matches in a line with match

AWK has the match(s, r [, a]) function, which according to the manual is capable of recording all occurring patterns in the array "a":
...If array a is provided, a is cleared and then elements 1 through n are filled with the portions of s that match the corresponding parenthesized subexpression in r. The 0'th element of a contains the portion of s matched by the entire regular expression r. Subscripts a[n, "start"], and a[n, "length"] provide the starting index in the string and length respectively, of EACH matching substring.
I expect that the following line:
echo 123412341234 | awk '{match($0,"1",arr); print arr[0] arr[1] arr[2]}'
prints 111
But in fact "match" ignores all other matches except the first one.
Could someone please tell me the proper syntax to populate "arr" with all occurrences of "1"?
match only finds the first match and stops there. You will have to run match in a loop, or else split the input on anything that is not 1:
echo '123412341234' | awk -F '[^1]+' '{print $1 $2 $3}'
111
Or using split in gnu-awk:
echo '123412341234' | awk 'split($0, a, /1/, m) {print m[1] m[2] m[3]}'
111
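For the "run match in a loop" option, here is a minimal sketch in portable awk. It relies on the standard RSTART and RLENGTH variables set by match; the shell function name all_matches and the variable names are only illustrative.

```shell
# all_matches REGEX: print every match of REGEX on each input line, concatenated.
# (Hypothetical helper name; the loop itself uses only POSIX awk.)
all_matches() {
    awk -v re="$1" '{
        rest = $0
        out = ""
        # Stop on an empty match as well, otherwise the loop would never advance.
        while (match(rest, re) && RLENGTH > 0) {
            out = out substr(rest, RSTART, RLENGTH)   # collect this match
            rest = substr(rest, RSTART + RLENGTH)     # continue after it
        }
        print out
    }'
}

echo 123412341234 | all_matches 1    # prints 111
```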
I would use GNU AWK's patsplit function for this task. Let the content of file.txt be
123412341234
then
awk '{patsplit($0,arr,"1");print arr[1] arr[2] arr[3]}' file.txt
gives output
111
Explanation: patsplit is a function that gives an effect similar to the FPAT variable. It finds all matches of the 3rd argument in the string given as the 1st argument and puts them into the array given as the 2nd argument (clearing it first if it is not empty). Observe that the 1st match is stored under key 1, the 2nd under 2, the 3rd under 3, and so on (there is nothing under 0).
(tested in GNU Awk 5.0.1)
If substitution is allowed, you can simply delete everything that is not 1. Try the following awk code:
awk '{gsub(/[^1]+/,"")} 1' Input_file
patsplit() is basically the same as wrapping every match of the desired regex in a custom pair of separators before splitting, which is what anysplit() emulates here, while being UTF-8 friendly.
echo "123\uC350abc:\uF8FF:|\U1F921#xyz" |
mawk2x '{ print ("\t\f"($0)"\n")>>(STDERR)
anysplit($_, reFLEX_UCode8 "|[[-_!-/3-?]",___=2,__)
OFS="\t"
for(_ in __) { if (!(_%___)) {
printf(" matched_items[ %2d ] = # %-2d = \42%s\42\n",
_,_/___,__[_])
} } } END { printf(ORS) }'
123썐abc::|🤡#xyz
matched_items[ 2 ] = # 1 = "3썐"
matched_items[ 4 ] = # 2 = "::"
matched_items[ 6 ] = # 3 = "🤡#"
Behind the scenes, anysplit() is nothing all that complicated either:
xs3pFS is a 3-byte string, \301\032\365, that I assumed would be extremely rare even in binary data.
gsub(patRE, xs3pFS ((pat=="&")?"\\":"") "&" xs3pFS,_)
gsub(xs3pFS "("xs3pFS")+", "",_)
return split(_, ar8, xs3pFS)
Splitting the input string this way puts all the desired items at even-numbered array indices, while the rest of the string is distributed along the odd-numbered indices.
This is somewhat similar to the seps array (the 4th argument) of gawk's split() and patsplit(), the difference being that the matches and the separators, whichever way you want to see them, are in the same array.
When you print out every cell in the array, you'll see:
_SEPS_[ 1 ] = # 1 = "123"
matched_items[ 2 ] = # 1 = "썐"
_SEPS_[ 3 ] = # 2 = "abc"
matched_items[ 4 ] = # 2 = "::"
_SEPS_[ 5 ] = # 3 = "|"
matched_items[ 6 ] = # 3 = "🤡#"
_SEPS_[ 7 ] = # 4 = "xyz"

How to return 0 if awk returns null from processing an expression?

I currently have an awk script that checks whether the output of an expression contains more than one line. If it does, it aggregates the lines and prints the sum. For example:
someexpression=$'JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)'
might be the one-line result where the expression DOESN'T yield any information. Then,
echo "$someexpression" | awk '
NR>1 {a[$4]++}
END {
for (i in a) {
printf "%d\n", a[i]
}
}'
this will produce no output at all. Instead, I would like it to return a numeric value of $0$ when the output would be empty. How can I modify the above to do this?
Nothing in UNIX "returns" anything (despite the unfortunately named keyword for setting the exit status of a function). Everything (tools, functions, scripts) outputs X and exits with status Y.
Consider these 2 identical functions named foo(), one in C and one in shell:
C (x=foo() means set x to the return code of foo()):
foo() {
printf "7\n"; // this is outputting 7 from the full program
return 3; // this is returning 3 from this function
}
x=foo(); <- 7 is output on screen and x has value '3'
shell (x=$(foo) means set x to the output of foo):
foo() {
printf "7\n"; # this is outputting 7 from just this function
return 3; # this is setting this function's exit status to 3
}
x=$(foo) <- nothing is output on screen, x has value '7', and '$?' has value '3'
Note that the return statement does vastly different things in each. Inside an awk script, printing and returning from functions behave as they do in C. A call to the awk tool itself, however, behaves like every other UNIX tool and shell script: it produces output and sets an exit status.
So when discussing anything in UNIX, avoid the term "return": it is imprecise and ambiguous, so some people will think you mean "output" while others think you mean "exit status".
In this case I assume you mean "output" BUT you should instead consider setting a non-zero exit status when there's no match like grep does, e.g.:
echo "$someexpression" | awk '
NR>1 {a[$4]++}
END {
for (i in a) {
print a[i]
}
exit (NR < 2)
}'
and then your code that uses the above can test for the success/fail exit status rather than testing for a specific output value, just like if you were doing the equivalent with grep.
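As a sketch of what such calling code could look like, the snippet below wraps the awk script in a shell function and prints 0 when the exit status signals "no data". The function name count_by_user and the header-only sample input are assumptions made for illustration.

```shell
# count_by_user: read squeue-style output on stdin and print one count per user.
# Exits with status 1 when there is nothing after the header line.
# (The function name is hypothetical; the awk body is the script shown above.)
count_by_user() {
    awk '
    NR > 1 { a[$4]++ }
    END {
        for (i in a)
            print a[i]
        exit (NR < 2)
    }'
}

header='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)'

# The caller decides what "empty" should look like, based on the exit status.
if counts=$(printf '%s\n' "$header" | count_by_user); then
    printf '%s\n' "$counts"
else
    echo 0
fi
```

With only the header line as input, the else branch runs and 0 is printed.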
You can of course tweak the above to:
echo "$someexpression" | awk '
NR>1 {a[$4]++}
END {
if ( NR > 1 ) {
for (i in a) {
print a[i]
}
}
else {
print "$0$"
exit 1
}
}'
if necessary and then you have both a specific output value and a success/fail exit status.
You can set a flag inside the for loop to detect whether the loop body has executed:
echo "$someexpression" |
awk 'NR>1 {
a[$4]++
}
END {
for (i in a) {
p = 1
printf "%d\n", a[i]
}
if (!p)
print "$0$"
}'
$0$

AWK: How can I print averages of consecutive numbers in a file, but skip over alphabetical characters/strings?

I've figured out how to get the average of a file that contains numbers in all lines such as:
Numbers.txt
1
2
4
8
Output:
Average: 3.75
This is the code I use for that:
awk '{ sum += $1; tot++ } END { print sum / tot; }' Numbers.txt
However, the problem is that this doesn't take into account possible strings that might be in the file. For example, a file that looks like this:
NumbersAndExtras.txt
1
2
4
8
Hello
4
5
6
Cat
Dog
2
4
3
For such a file I'd want to print the multiple averages of the consecutive numbers, ignoring the strings such that the result looks something like this:
Output:
Average: 3.75
Average: 5
Average: 3
I could devise some complicated code that might accomplish that with variables and 'if' statements and loops and whatnot, but I've been told it's easier than that given some of awk's features. I'd like to know what that might look like, along with an explanation of why it works.
BEGIN runs before the first line of the file is read. Set sum and count to 0 there.
awk 'BEGIN{ sum=0; count=0} {if ( /[a-zA-Z]/ ) { if (count > 0) {avg = sum/count; print avg;} count=0; sum=0} else { count++; sum += $1} } END{if (count > 0) {avg = sum/count; print avg}} ' NumbersAndExtras.txt
When a line contains a letter, calculate and print the average so far, then reset sum and count.
Do the same in the END block, which runs after the whole file has been processed.
Keep it simple:
awk '/^$/{next}
/^[0-9]+/{a+=$1+0;c++;next}
c&&a{print "Average: "a/c;a=c=0}
END{if(c&&a){print "Average: "a/c}}' input_file
Results:
Average: 3.75
Average: 5
Average: 3
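One caveat with the c&&a guard: a block whose values sum to 0 is not printed, and because the reset is skipped too, its count leaks into the next block. A minimal sketch that gates only on the count is shown below; the shell function name block_averages and the awk function name emit are illustrative choices, and it assumes numeric lines consist of digits only.

```shell
# block_averages: print the average of each run of consecutive numeric lines.
# (Hypothetical helper; assumes a numeric line is made of digits only.)
block_averages() {
    awk '
    # Print the pending average, if any, and start a new block.
    function emit() {
        if (c > 0)
            print "Average: " (s / c)
        s = c = 0
    }
    /^[0-9]+$/ { s += $1; c++; next }   # numeric line: accumulate
    { emit() }                          # anything else ends the block
    END { emit() }'
}

printf '0\n0\nx\n2\n4\n' | block_averages
# prints:
# Average: 0
# Average: 3
```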
Another one:
$ awk '
function avg(s, c) { print "Average:", s/c }
NF && !/^[[:digit:]]/ { if (count) avg(sum, count); sum = 0; count = 0; next}
NF { sum += $1; count++ }
END {if (count) avg(sum, count)}
' <file
Note: The value of this answer is in explaining the solution; other answers offer more concise alternatives.
Try the following:
Note that this is an awk command with a script specified as a multi-line shell string literal - you can paste the whole thing into your terminal to try it; while it is possible to cram this into a single line, it hurts readability and the ability to comment:
awk '
# Define output function that prints an average.
function printAvg() { print "Average:", sum/count }
# Skip blank lines
NF == 0 { next}
# Is the line non-numeric?
/[[:alpha:]]/ {
# If this line ends a numeric block, print its
# average now and reset the variables to start the next group.
if (count) {
printAvg()
sum = count = 0
}
# Skip to next line.
next
}
# Numeric line: add to the sum and increment the counter.
{ sum += $1; count++ }
# Finally:
END {
# If there is a group whose average has not been printed yet,
# do it now.
if (count) printAvg()
}
' NumbersAndExtras.txt
If we condense whitespace and strip the comments, we still get a reasonably readable solution, as long as we still use multiple lines:
awk '
function printAvg() { print "Average:", sum/count }
NF == 0 { next}
/[[:alpha:]]/ { if (count) { printAvg(); sum = count = 0 } next }
{ sum += $1; count++ }
END { if (count) printAvg() }
' NumbersAndExtras.txt

How to detect the last line in awk before END?

I'm trying to concatenate String values and print them, but if the last types are Strings and there is no change of type then the concatenation won't print:
input.txt:
String 1
String 2
Number 5
Number 2
String 3
String 3
awk:
awk '
BEGIN { tot=0; ant_t=""; }
{
t = $1; val=$2;
#if string, concatenate its value
if (t == "String") {
tot+=val;
nx=1;
} else {
nx=0;
}
#if type change, add tot to res
if (t != "String" && ant_t == "String") {
res=res tot;
tot=0;
}
ant_t=t;
#if string, go next
if (nx == 1) {
next;
}
res=res"\n"val;
}
END { print res; }' input.txt
Current output:
3
5
2
Expected output:
3
5
2
6
How can I detect that awk is reading the last line, so that the pending total is still printed when no change of type follows it?
awk reads its input line by line, so it cannot know whether the current line is the last one. The END block is the place to perform actions once the end of the input has been reached.
To get the output you expect:
awk '/String/{sum+=$2} /Number/{if(sum) print sum; sum=0; print $2} END{if(sum) print sum}' input.txt
produces the output
3
5
2
6
How does it work?
/String/ selects lines that match String; /Number/ does the same for Number.
sum+=$2 adds up the values on String lines. When a Number line occurs, the sum is printed and reset to zero.
Like this maybe:
awk -v lines="$(wc -l < /etc/hosts)" 'NR==lines{print "LAST"};1' /etc/hosts
I am pre-calculating the number of lines (using wc) and passing that into awk as a variable called lines, if that is unclear.
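If reading the input twice is not an option (for example when it comes from a pipe), a common one-pass alternative is to hold each line back until the next one arrives; whatever is still held in END is the last line. A minimal sketch, with the function name mark_last and the variable prev chosen only for illustration:

```shell
# mark_last: echo stdin, appending " LAST" to the final line.
# (Hypothetical helper; works by delaying output by one line.)
mark_last() {
    awk '
    NR > 1 { print prev }              # the held line was not the last one
    { prev = $0 }                      # hold the current line
    END { if (NR) print prev, "LAST" } # the line still held is the last one
    '
}

printf 'a\nb\nc\n' | mark_last
# prints:
# a
# b
# c LAST
```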
Just change the last line to:
END { print res; print tot; }' input.txt
awk '$1~"String"{x+=$2;y=1}$1~"Number"{if (y){print x;x=0;y=0;}print $2}END{if(y) print x}' file
Explanation
y is used as a boolean; at the END I check whether the last line was a String and, if so, print the sum.
You can actually use x itself as the boolean, like nu11p01n73R does, which is smarter.
Test
$ cat file
String 1
String 2
Number 5
Number 2
String 3
String 3
$ awk '$1~"String"{x+=$2;y=1}$1~"Number"{if (y){print x;x=0;y=0;}print $2}END{if(y) print x}' file
3
5
2
6

Declaring all elements of an associative array in a single statement - AWK

I am fairly new to awk and am trying to figure out how to declare all elements of an associative array in one go. For example, if I wanted to declare an associative array in Python (where it is effectively a dictionary), I would do this:
numbers = {'uno': 1, 'sero': 0}
Now, in awk is it possible to convert the two lines of code below into one?
numbers["uno"] = 1
numbers["sero"] = 0
AWK doesn't have array literals as far as I know, but this script demonstrates something you can do to get close:
BEGIN {
split("uno|1|sero|0",a,"|");
for (i = 1; i < 4; i += 2) {b[a[i]] = a[i+1];}
}
END {
print b["sero"];
print b["uno"];
}
Of course, you can always make a function that could be called like
newarray("uno", 1, "sero", 0);
or like
newarray("uno|1|sero|0");
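A rough sketch of the second form follows. Since awk functions cannot return arrays, this hypothetical newarray takes the target array as its first argument and fills it in place; the extra names after the wide gap in the parameter list are just local variables. The wrapper numbers_demo exists only to make the example runnable from the shell.

```shell
# numbers_demo: build an awk array from a "key|value|key|value" string.
# (newarray is a hypothetical helper, not a built-in awk function.)
numbers_demo() {
    awk '
    function newarray(arr, spec,    parts, n, i) {
        split("", arr)                  # clear the target array
        n = split(spec, parts, "|")     # parts[1]=key, parts[2]=value, ...
        for (i = 1; i < n; i += 2)
            arr[parts[i]] = parts[i + 1]
    }
    BEGIN {
        newarray(numbers, "uno|1|sero|0")
        print numbers["uno"], numbers["sero"]
    }'
}

numbers_demo    # prints: 1 0
```

The separator must not occur in any key or value, which is the main limitation of packing the data into a single string.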
No. Best you can do is:
$ awk 'BEGIN {
# populate the "numbers" array:
split("uno:1,sero:0",a,/[:,]/)
for (i=1;i in a;i+=2)
numbers[a[i]] = a[i+1]
# print the "numbers" array:
for (i in numbers)
print i, numbers[i]
}'
uno 1
sero 0