Declaring all elements of an associative array in a single statement - AWK - awk

I am fairly new to awk and am trying to figure out how to declare all elements of an associative array in one go. For example, if I wanted to declare an associative array in Python (where it is effectively a dictionary), I would do this:
numbers = {'uno': 1, 'sero': 0}
Now, in awk is it possible to convert the two lines of code below into one?
numbers["uno"] = 1
numbers["sero"] = 0

AWK doesn't have array literals as far as I know, but this script demonstrates something you can do to get close:
BEGIN {
    split("uno|1|sero|0", a, "|");
    for (i = 1; i < 4; i += 2) { b[a[i]] = a[i+1]; }
}
END {
    print b["sero"];
    print b["uno"];
}
Of course, you can always make a function that could be called like
newarray("uno", 1, "sero", 0);
or like
newarray("uno|1|sero|0");
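A minimal sketch of the second calling style, for illustration only. newarray() is a hypothetical helper, not an awk builtin, and here it takes the target array as its first argument because awk functions cannot return arrays. It assumes that neither keys nor values contain the "|" separator.

```shell
result=$(awk '
# Hypothetical helper: fill arr from a "key|value|key|value" string.
# Returns the number of pairs stored. parts, n and i are local variables.
function newarray(arr, packed,    parts, n, i) {
    split("", arr)                      # clear any previous contents
    n = split(packed, parts, "|")
    for (i = 1; i < n; i += 2)
        arr[parts[i]] = parts[i + 1]
    return int(n / 2)
}
BEGIN {
    count = newarray(numbers, "uno|1|sero|0")
    print count, numbers["uno"], numbers["sero"]
}')
echo "$result"
```

The array is filled in place because awk passes arrays to functions by reference, so numbers is usable in the caller as soon as newarray() returns.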

No. Best you can do is:
$ awk 'BEGIN {
    # populate the "numbers" array:
    split("uno:1,sero:0",a,/[:,]/)
    for (i=1;i in a;i+=2)
        numbers[a[i]] = a[i+1]
    # print the "numbers" array:
    for (i in numbers)
        print i, numbers[i]
}'
uno 1
sero 0

Related

Passing arrays in awk function

I want to write a function that accepts two arguments: one is a constant value and the other is an array. The function finds the index of the element in the array and returns it. I want to call this function with multiple arrays, as in my attempt below.
BEGIN{
a[1]=2;
a[2]=4;
a[3]=3;
b[1]=4;
b[2]=2;
b[3]=6;
c[1]=5;
c[2]=1;
c[3]=6;
arr[1]=a;
arr[2]=b;
arr[3]=c
}
function pos(val,ar[]) {
for (m=1;m<=length(ar);m++) { if (val == ar[m] )
return m;
else continue }
}
{for( k=1;k<=NF;k++) { for(l=1;l<=length(arr);l++) { print "pos=" pos($i,arr[l])} } }
but I am getting this error:
fatal: attempt to use array `a' in a scalar context
Looking at the code, can anyone tell me how I can achieve this using awk? The challenge is assigning an array as an element of another array, as in arr[1]=a, and passing that array as a parameter by referencing it through its index, as in pos($i,arr[l]). I don't know how to make these statements syntactically and functionally correct in awk.
The input is:
2 4 6
3 5 6
1 2 5
and in the output the code should return the position of each value read from the file if it is present in any of the arrays defined.
output:
1 1 3
6
2 1
In the first line of output, the indexes of the corresponding elements in arrays a, b, and c have been returned respectively: 1 is the index of 2 in a, 1 is the index of 4 in b, and 3 is the index of 6 in c, and so on for the following lines of the input file.
I truly don't understand what it is you're trying to do (especially why an input of 2 produces the index from a but not the index from b while an input of 4 does the reverse) but to create a multi-dimensional array arr[][] from a[], b[], and c[] with GNU awk (the only awk that supports true multi-dimensional arrays) would be:
for (i in a) arr[1][i] = a[i]
for (i in b) arr[2][i] = b[i]
for (i in c) arr[3][i] = c[i]
not just arr[1] = a, etc. Note that you're storing a copy of the contents of a[] in arr[1][], not a reference to a[], so if a[] changes then arr[1][] won't. What you might want to do instead (again GNU awk only) is store the sub-array names in arr[] and then access them through the builtin variable SYMTAB (see the man page), e.g.:
$ cat tst.awk
BEGIN{
    split("2 4 3",a)
    split("4 2 6",b)
    split("5 1 6",c)
    arr[1] = "a"
    arr[2] = "b"
    arr[3] = "c"
    prtArr(arr)
}
function prtArr(arr,    i,subArrName) {
    for (i=1; i in arr; i++) {
        subArrName = arr[i]
        printf "arr[%d] -> %s[] =\n", i, subArrName
        prtSubArr(SYMTAB[subArrName])
    }
}
function prtSubArr(subArr,    j) {
    for (j=1; j in subArr; j++) {
        print "\t" subArr[j]
    }
}
$ awk -f tst.awk
arr[1] -> a[] =
	2
	4
	3
arr[2] -> b[] =
	4
	2
	6
arr[3] -> c[] =
	5
	1
	6
Now arr[] is no longer a multi-dimensional array, it's just an array of array name strings, and the contents of a[] are only stored in 1 place (in a[]) and just referenced from SYMTAB[] indexed by the contents of arr[N] rather than copied into arr[N][].
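For the lookup itself, a plain array can be passed to a function directly, since awk passes arrays by reference. The sketch below is a guess at what pos() was meant to do (return the 1-based index of a value, or 0 if it is absent). It does not try to reproduce the expected output from the question, which is unclear. The parameter is declared as ar rather than ar[], and m is made local.

```shell
result=$(printf '2 4 6\n3 5 6\n1 2 5\n' | awk '
# Return the 1-based index of val in ar, or 0 if val is not present.
function pos(val, ar,    m) {
    for (m = 1; m in ar; m++)
        if (ar[m] == val)
            return m
    return 0
}
BEGIN {
    split("2 4 3", a)
    split("4 2 6", b)
    split("5 1 6", c)
}
{
    # Look up the first field of each input line in all three arrays.
    print $1 ":", pos($1, a), pos($1, b), pos($1, c)
}')
echo "$result"
```

This works in any POSIX awk; only the array-of-arrays and SYMTAB approaches need GNU awk.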

Print smallest integer from file using awk custom function?

My awk function looks like this, in a file named fun.awk:
{
    print small()
}
function small()
{
    a[NR]=$0
    smal=0
    for(i=1;i<=3;i++)
    {
        if( a[i]<a[i+1])
            smal=a[i]
        else
            smal=a[i+1]
    }
    return smal
}
The contents of awk.write:
1
23
32
The awk command is:
awk -f fun.awk awk.write
It gives me no result? Why?
I think you are going about this the wrong way. In awk, one approach might be:
NR == 1 {
    small = $0
}
$0 < small {
    small = $0
}
END {
    print small
}
which simply sets small to the smallest integer seen so far on each line and prints it at the end. (Note: you need to start by initializing small on the first line.)
A simpler approach might just be to sort the lines as numbers with sort, and pick the first one.
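A sketch of that sort-based approach, using the same three values as awk.write (fed in with printf here so the example is self-contained):

```shell
# Sort numerically, then keep only the first (smallest) line.
smallest=$(printf '1\n23\n32\n' | sort -n | head -n 1)
echo "$smallest"
```

With a real file the pipeline would be sort -n awk.write | head -n 1.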

Reinitialization of awk variables

I am struggling with resetting some awk variables. I have multiple lines of the form:
one two three ... ten
with various appearances of each word in every line. I am trying to count the number of times each word appears on each line, separately from the counts for the other lines.
This is what I have so far:
{ for(i=length(Num); i>0; i--)
    if( Num[i] == "one" )
    {
        oneCount++
    }
    else if( Num[i] == "two" )
    {
        twoCount++
    }
    else if( Num[i] == "three" )
    {
        threeCount++
    }
    ...
}
When I print out the count values, the counts don't reinitialize with each new line. How do I fix this?
Any help is much appreciated.
You seem very confused. To get a count of each field in a ;-separated line would be:
awk -F';' '{
    split("",cnt) # or "delete cnt" if using GNU awk.
    for (i=1;i<=NF;i++) {
        cnt[$i]++
    }
    for (word in cnt) {
        print word, cnt[word]
    }
}' file
Now is there anything else you need it to do?
Try initializing an array in the BEGIN portion to however many variables you'd like to count. You can run a loop in the main portion to clear the array at the beginning of every new line.
Alternatively, you could just reset the value of each variable to 0 or null in the portion of the program that executes every line, but I'm guessing you have many variables.

awk switch case with constant variable

I have some problems with gawk and the switch statement. When I use a case with a constant string everything works fine, but when I use a variable holding a constant it doesn't.
Two examples to explain this better.
This example works fine:
BEGIN {
...
}
END {
split($0,a,", ")
for (k in a)
{
switch (a[k])
{
case "COLUMN 1":
POSITION = k
print k,a[k]
break
default:
print "Error"
exit
break
}
}
This example gives me a Syntax Error:
BEGIN {
COLUMN_NAME = "COLUMN 1"
}
END {
split($0,a,", ")
for (k in a)
{
switch (a[k])
{
case COLUMN_NAME : #Syntax Error in this line
POSITION = k
print k,a[k]
break
default:
print "Error"
exit
break
}
}
I don't know if awk treats COLUMN_NAME as a constant, but I did not find any way to force this.
I even tried using if/else, and that works fine in both cases.
Edit:
Here is an explanation of what the awk script should do. I have a CSV file that looks like this:
COLUMN 1, COLUMN 2, COLUMN 3, COLUMN 4
1, 2, 3, 4
5, 6, 7, 8
...
but the file can even look like this:
COLUMN 3, COLUMN 2, COLUMN 4, COLUMN 1
1, 2, 3, 4
5, 6, 7, 8
...
I know the names of the columns, but I don't know their positions. So I parse the column names with the split function and want to use a switch to find the right position.
Here is a way to sort it out using an array in awk:
awk -F, 'NR==1 {for (i=1;i<=NF;i++) {split($i,t," ");c[i]=t[2]}} NR>1 {for (j=1;j<i;j++) arr[(NR-1)FS c[j]]=$j+0} END {print arr[2 FS 1]}' file
The END block prints row 2, column 1.
This gives 5 for the first file and 8 for the second.
A more readable version:
awk -F, '
NR==1 {                                 # get the column order
    for (i=1;i<=NF;i++) {               # loop through all fields
        split($i,tmp," ")               # get the column number
        col[i]=tmp[2]                   # store the column order in col
    }
}
NR>1 {                                  # for all data lines:
    for (j=1;j<i;j++)                   # loop through all elements
        arr[(NR-1)FS col[j]]=$j+0       # store data in array arr
}
END {
    print arr[2 FS 1]                   # print data from row 2, column 1
}
' file
You can't use a variable the way you want in a switch statement.
Any switch statement can be expressed as a sequence of if/thens, so just use that instead, e.g.:
if (a[k] == COLUMN_NAME) {POSITION = k; print k,a[k]}
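Applied to the CSV from the question, that if-based lookup might look like the sketch below. It assumes the fields are separated by exactly a comma and a space, and that the wanted header is passed in with -v. The input is the second sample file, supplied via printf so the example is self-contained.

```shell
result=$(printf 'COLUMN 3, COLUMN 2, COLUMN 4, COLUMN 1\n1, 2, 3, 4\n5, 6, 7, 8\n' |
awk -F', ' -v COLUMN_NAME="COLUMN 1" '
NR == 1 {
    # Find which field holds the wanted header.
    # A plain comparison accepts a variable, unlike a case label.
    for (k = 1; k <= NF; k++)
        if ($k == COLUMN_NAME)
            POSITION = k
    next
}
# Print the wanted column for every data line once its position is known.
POSITION { print $POSITION }')
echo "$result"
```

If the header is not found, POSITION stays unset and nothing is printed.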
I just hit this problem.
I had to put an if statement outside of the switch to handle the variable.
I don't know if this happens in C as well.

Awk Iterate through several Arrays in a for loop

I have created an awk program to go through the columns of a file and count each distinct word and then output totals into separate files
awk -F"$delim" '{Field_Arr1[$1]++; Field_Arr2[$2]++; Field_Arr3[$3]++; Field_Arr4[$4]++};
END{\
# output fields
out_field1="top_field1"
out_field2="top_field2"
out_field3="top_field3"
out_field4="top_field4"
for( i=1; i <= NF; i++)
{
for (element in Field_Arr$i)
{
print element"\t"Field_Arr$i[element] >>out_field$i;
}
}
}' inputfile
but I don't know the appropriate syntax, so that the for loop will iterate through Field_Arr1, Field_Arr2, Field_Arr3, Field_Arr4?
I have tried using: i, $i, ${i}, {i}, "$i", and "i".
Am I trying the wrong approach or is there a way to change Field_Arr$i to Field_Arr1..4?
Thanks for the advice.
awk variables don't work that way; you'll have to do them individually by name, or use fake multidimensional arrays and parse out the components, something along the lines of:
{Field_Arr[1, $1]++; Field_Arr[2, $2]++; Field_Arr[3, $3]++; Field_Arr[4, $4]++}
END {
    for (elt in Field_Arr) {
        split(elt, ec, SUBSEP)
        print ec[2] "\t" Field_Arr[elt] >> ("top_field" ec[1])
    }
}
To count the frequencies for each column (3 in my example), try this
# Print list of word frequencies
function p_array(t, a,    i) {    # i is a local loop variable
    print t
    for (i in a) {
        print i, a[i]
    }
}
{
    # count each word separately for its own column
    c1[$1]++
    c2[$2]++
    c3[$3]++
}
END {
    p_array("1st col",c1)
    p_array("2nd col",c2)
    p_array("3rd col",c3)
}