AWK function w/ a variable number of arguments - awk

How to define an AWK function w/ a variable number of arguments? I can emulate this w/ command line arguments:
awk 'BEGIN {for (i in ARGV) printf ARGV[i]" "}' 1 2 3
but BEGIN {for (i in ARGV) printf ARGV[i]" "} isn't a function (in AWK).
Currently I'm using MAWK (but can probably switch to GAWK if it would help)
NOTE: I cannot reveal the task (it's an exercise which I'm supposed to solve by myself)

There are no variadic functions in awk because they're not needed since you can just populate an array and pass that into a function:
$ cat tst.awk
BEGIN {
split("foo 17 bar",a)
foo(a)
}
function foo(arr, i,n) {
n = length(arr) # or loop incrementing n if length(arr) unsupported
for (i=1; i<=n; i++) {
printf "%s%s", arr[i], (i<n ? OFS : ORS)
}
}
$ awk -f tst.awk
foo 17 bar
or just define the function with a bunch of dummy argument names as #triplee mentions.

As per https://rosettacode.org/wiki/Variadic_function you can define a function with more arguments than you pass in; the ones you omit will come out as empty.
It's not clear from your example what you are actually trying to accomplish, but this is the answer to your actual question.
$ awk 'function handlemany(first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, tenth, eleventh, twelfth) {
> print first, second, third, fourth, fifth, sixt, seventh, eighth, ninth, tenth, eleventh, twelfth
> }
> BEGIN { handlemany("one", "two", "three") }'
one two three
This is less than ideal, of course, but there is no support for proper variadic functions / varargs in the Awk language.

Related

Extract columns by matching, rename, and assign value using AWK

I have a tab delimited csv file containing summary statistics for object lengths:
sampled. objs. obj. min. len. obj. mean. len. obj. max. len. obj. std.
50 22 60 95 5
I want the information about minimum and maximum lengths by searching matching column headers obj. min. len. and obj. max. len.. I then want to create a new csv file, comma-delimited with new column headers to get the result
object_minimum,object_maximum
22,95
I first print the new headers. Then I tried retrieving the indices of the match and then extracting from the second row using these indices:
#!/bin/awk -f
BEGIN {
cols="object_minimum:object_maximum"
FS="\t"
RS="\n"
col_count=split(cols, col_arr, ":");
for(i=1; i<=col_count; i++) printf col_arr[i] ((i==col_count) ? "\n" : ",");
}
{
for (i=1; i<=NF; i++) {
if(index($i,"obj. min. len.") !=0) {
data["object_minimum"]=i;
}
if(index($i,"obj. max. len.") !=0) {
data["object_maximum"]=i;
}
}
}
END NR==1 {
for (j=1; j<=col_count; j++) printf NF==data[j] ((i==col_count) ? "\n" : ",");
}
There could be more columns and in a different order so it is necessary to do the matching to find the position, and also I may have to select for more columns by changing cols and looking for more matches. I execute by running
awk -f awk_script.awk original.csv > new.csv
With awk:
awk 'BEGIN {FS="\t"; OFS=","}
NR==1 {for (i=1; i<=NF; i++){f[$i] = i}} # fill array with header
NR> 1 {print $(f["obj. min. len."]), $(f["obj. max. len."])}' file
Output:
22,95
Source: https://unix.stackexchange.com/a/359699/74329
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
here is one working prototype, add formatting and error checking...
$ awk -F'\t' -v OFS=, '
NR==1 {for(i=1;i<=NF;i++)
if($i=="obj. min. len.") min=i;
else if($i=="obj. max. len.") max=i;
print "min","max"}
NR==2 {print $min,$max; exit}' file
min,max
22,95
Could you please try following, completely based on your shown samples only, written and tested in GNU awk. Created an awk variable named sep="###" this could be changed as per need too.
awk -v sep="###" '
BEGIN{
OFS=","
}
FNR==1{
while(match($0,/ +obj\./)){
val=substr($0,RSTART,RLENGTH)
sub(/^ +/,"",val)
line=(line?line:"")substr($0,1,RSTART-1)sep val
$0=substr($0,RSTART+RLENGTH)
}
if(substr($0,RSTART+RLENGTH)!=""){
line=line substr($0,RSTART+RLENGTH)
}
num=split(line,arr,sep)
for(i=1;i<=num;i++){
if(arr[i]=="obj. min. len."){ min=i }
if(arr[i]=="obj. max. len."){ max=i }
}
print "object_minimum,object_maximum"
next
}
{
print $min,$max
}
' Input_file
Logical explanation: Working on the very first line of Input_file. Then using awk's match function to look for matches +obj\. in current line. In this creating a variable which has values of matched and before matched values. Once all searching of specific regex is done(means all occurrences of matched regex are found). Then splitting newly created variable(which has value of first line with separators ### assuming these are NOT present in your Input_file else change them to something else) into array. Finally going through all elements of that array and putting condition if a column is obj. min. len. then setting min variable value to that specific index number(which is actually field number for rest of the lines) and if value is obj. max. len. then setting max variable. After processing first line simply printing corresponding fields by doing $min,$max.

Find/replace within a line only if line does not contain a certain string (awk)

I'm trying to reproduce an awk command using different syntax. I have a file (test.txt) that looks like this:
>NAME_123_CONSENSUS
GACTATACA
ATACTAGA
>NAME2_48_TEST
ATAGCGA
and I'm hoping to replace all occurences of "A" with "1" using different syntax of awk. I can solve this using the following line:
awk '!/_/{gsub("A", "1"); 1' test.txt
However, I cannot get the same result using a for loop,
awk '{for(j=1; j<=NF; j++) if ($j ~ "_") print; else print gsub("A","1")}' test.txt
nor using the following input
awk '{ if ($0 ~ "_") print $0; else print gsub("A", "1"); }' test.txt
Both of these last commands give the following output. Why are they giving different output and what am I missing to make both of the last two commands give the desired output?
>NAME_123_CONSENSUS
4
4
5
>NAME2_48_TEST
3
You are incorrectly using the gsub() function here. The sub()/gsub() function return the number of substitutions made and not the modified string. You set the string to modify as the last argument and print it back
awk '{ for(j=1; j<=NF; j++) if ($j ~ "_") print; else { gsub("A","1",$0); print } }'
That said your first command is most efficient/terse way of writing this. Notice you were missing a } in the OP. It should been written as
awk '!/_/{ gsub("A", "1") }1'
Or use gensub() available in GNU Awk's that return the modified string that you can use to print. See more about it on String-Functions of GNU Awk
awk '{ for(j=1; j<=NF; j++) if ($j ~ "_") print; else print gensub(/A/, "1", "g") }'

Delete a variable in awk

I wonder if it is possible to delete a variable in awk. For an array, you can say delete a[2] and the index 2 of the array a[] will be deleted. However, for a variable I cannot find a way.
The closest I get is to say var="" or var=0.
But then, it seems that the default value of a non-existing variable is 0 or False:
$ awk 'BEGIN {if (b==0) print 5}'
5
$ awk 'BEGIN {if (!b) print 5}'
5
So I also wonder if it is possible to distinguish between a variable that is set to 0 and a variable that has not been set, because it seems not to:
$ awk 'BEGIN {a=0; if (a==b) print 5}'
5
There is no operation to unset/delete a variable. The only time a variable becomes unset again is at the end of a function call when it's an unused function argument being used as a local variable:
$ cat tst.awk
function foo( arg ) {
if ( (arg=="") && (arg==0) ) {
print "arg is not set"
}
else {
printf "before assignment: arg=<%s>\n",arg
}
arg = rand()
printf "after assignment: arg=<%s>\n",arg
print "----"
}
BEGIN {
foo()
foo()
}
$ awk -f tst.awk file
arg is not set
after assignment: arg=<0.237788>
----
arg is not set
after assignment: arg=<0.291066>
----
so if you want to perform some actions A then unset the variable X and then perform actions B, you could encapsulate A and/or B in functions using X as a local var.
Note though that the default value is zero or null, not zero or false, since its type is "numeric string".
You test for an unset variable by comparing it to both null and zero:
$ awk 'BEGIN{ if ((x=="") && (x==0)) print "y" }'
y
$ awk 'BEGIN{ x=0; if ((x=="") && (x==0)) print "y" }'
$ awk 'BEGIN{ x=""; if ((x=="") && (x==0)) print "y" }'
If you NEED to have a variable you delete then you can always use a single-element array:
$ awk 'BEGIN{ if ((x[1]=="") && (x[1]==0)) print "y" }'
y
$ awk 'BEGIN{ x[1]=""; if ((x[1]=="") && (x[1]==0)) print "y" }'
$ awk 'BEGIN{ x[1]=""; delete x; if ((x[1]=="") && (x[1]==0)) print "y" }'
y
but IMHO that obfuscates your code.
What would be the use case for unsetting a variable? What would you do with it that you can't do with var="" or var=0?
An unset variable expands to "" or 0, depending on the context in which it is being evaluated.
For this reason, I would say that it's a matter of preference and depends on the usage of the variable.
Given that we use a + 0 (or the slightly controversial +a) in the END block to coerce the potentially unset variable a to a numeric type, I guess you could argue that the natural "empty" value would be "".
I'm not sure that there's too much to read in to the cases that you've shown in the question, given the following:
$ awk 'BEGIN { if (!"") print }'
5
("" is false, unsurprisingly)
$ awk 'BEGIN { if (b == "") print 5 }'
5
(unset variable evaluates equal to "", just the same as 0)

Awk reverse both lines and words

I'm new to programming language and stuff
so I have to reverse with awk all the lines and as well all the words in those lines, from a file and print them out.
"File1" to reverse:
aa bb cc
foo do as
And the output printing of the "File1" should be this:
as do foo
cc bb aa
I tried this for word reverse in each line:
for (i=NF; i>1; i--) printf("%s ",$i); printf("%s\n",$1)
but if I want to print the reversed lines I have to do this
{a[NR]=$0
}END{for(i=NR; i; i--) print a[i]}
I need to work with two files with this command in terminal:
awk -f commandFile FileToBePrinted
The problem is I'm beginer in all this and I don't know how to combine those two.
Thanks!
Kev's solution looks to reverse the text in each word. You example output doesn't show that, but his key point is to use a function.
You have the code you need, you just need to rearrange it a little.
cat file1
aa bb cc
foo do as
cat commandFile
function reverse( line ) {
n=split(line, tmpLine)
for (j=n; j>0; j--) {
printf("%s ",tmpLine[j] )
}
}
# main loop
{ a[NR]=$0 }
# print reversed array
END{ for(i=NR; i>0; i--) printf( "%s\n", reverse(a[i]) ) }
Running
awk -f commandFile file1
output
as do foo
cc bb aa
There were a couple of minor changes I made, using n=split(line, tmpLine) ... print tmpLine[j], is a common method of parsing a line of input in a function to print it out. I don't think the $1 vars have scope from a value passed in from an array (your a[i] value), so I changed it to split..tmpLine[j]. I also found that the 'i' variable from END section was kept in scope in the function reverse, so I changed that to j to disambiguate the situation.
I had to figure out a few things, so below is the debug version that I used.
If you're going to have access to gawk, then you'll do well to learn how to use the debugger that is available. If you're using awk/gawk/nawk on systems without a debugger, then this is one method for understanding what is happening in your code. If you're redirecting your programs output to a file or pipe, AND if you system supports "/dev/stderr" notation, you could print the debug lines there, i.e.
#dbg print "#dbg0:line=" line > "/dev/stderr"
Some systems have other notations for accessing stderr, so if you'll be doing this much, it is worthwhile to find out what is available.
cat commandFile.debug
function reverse( line ) {
n=split(line, tmpLine)
#dbg print "#dbg0:line=" line
#dbg print "#dbg1:n=" n "\tj=" j "\ttmpLine[j]=" tmpLine[j]
for (j=n; j>0; j--) {
#dbg print "#dbg2:n=" n "\tj=" j "\ttempLine[j]=" tmpLine[j]
printf("%s ",tmpLine[j] )
}
}
# main loop
{ a[NR]=$0 }
# print reversed array
#dbg END{ print "AT END"; for(i=NR; i>0; i--) printf( "#dbg4:i=%d\t%s\n%s\n", i, a[i] , reverse(a[i])
) }
END{ for(i=NR; i>0; i--) printf( "%s\n", reverse(a[i]) ) }
I hope this helps.
awk '
{
x=reverse($0) (x?"\n":"") x
}
END{
print x
}
function reverse(s, p)
{
for(i=length(s);i>0;i--)
p=p substr(s,i,1)
return p
}' File1
use tac to reverse the lines of the file, then awk to reverse the words.
tac filename | awk '{for (i=NF; i>1; i--) printf("%s ",$i); printf("%s\n",$1)}'
You could also use something like Perl that has list-reversing functions built-in:
tac filename | perl -lane 'print join(" ", reverse(#F))'

Use Awk to Print every character as its own column?

I am in need of reorganizing a large CSV file. The first column, which is currently a 6 digit number needs to be split up, using commas as the field separator.
For example, I need this:
022250,10:50 AM,274,22,50
022255,11:55 AM,275,22,55
turned into this:
0,2,2,2,5,0,10:50 AM,274,22,50
0,2,2,2,5,5,11:55 AM,275,22,55
Let me know what you think!
Thanks!
It's a lot shorter in perl:
perl -F, -ane '$,=","; print split("",$F[0]), #F[1..$#F]' <file>
Since you don't know perl, a quick explanation. -F, indicates the input field separator is the comma (like awk). -a activates auto-split (into the array #F), -n implicitly wraps the code in a while (<>) { ... } loop, which reads input line-by-line. -e indicates the next argument is the script to run. $, is the output field separator (it gets set iteration of the loop this way, but oh well). split has obvious purpose, and you can see how the array is indexed/sliced. print, when lists as arguments like this, uses the output field separator and prints all their fields.
In awk:
awk -F, '{n=split($1,a,""); for (i=1;i<=n;i++) {printf("%s,",a[i])}; for (i=2;i<NF;i++) {printf("%s,",$i)}; print $NF}' <file>
I think this might work. The split function (at least in the version I am running) splits the value into individual characters if the third parameter is an empty string.
BEGIN{ FS="," }
{
n = split( $1, a, "" );
for ( i = 1; i <= n; i++ )
printf("%s,", a[i] );
sep = "";
for ( i = 2; i <= NF; i++ )
{
printf( "%s%s", sep, $i );
sep = ",";
}
printf("\n");
}
here's another way in awk
$ awk -F"," '{gsub(".",",&",$1);sub("^,","",$1)}1' OFS="," file
0,2,2,2,5,0,10:50 AM,274,22,50
0,2,2,2,5,5,11:55 AM,275,22,55
Here's a variation on a theme. One thing to note is it prints the remaining fields without using a loop. Another is that since you're looping over the characters in the first field anyway, why not just do it without using the null-delimiter feature of split() (which may not be present in some versions of AWK):
awk -F, 'BEGIN{OFS=","} {len=length($1); for (i=1;i<len; i++) {printf "%s,", substr($1,i,1)}; printf "%s", substr($1,len,1);$1=""; print $0}' filename
As a script:
BEGIN {FS = OFS = ","}
{
len = length($1);
for (i=1; i<len; i++)
{printf "%s,", substr($1, i, 1)};
printf "%s", substr($1, len, 1)
$1 = "";
print $0
}