include library of functions in awk - awk

There are many common functions (especially arithmetic/mathematics) that are not built into awk that I need to write myself all the time.
For example:
There is no c=min(a,b), so in awk I constantly write c=a<b?a:b
same for maximum, i.e. c=max(a,b)
same for absolute value, i.e. c=abs(a), so I constantly have to write c=a>0?a:-a
and so on....
Ideally, I could write these functions into an awk source file, and "include" it into all of my instances of awk, so I can call them at will.
I looked into the "#include" functionality of GNU's gawk, but it just executes whatever is in the included script - i.e. I cannot call functions.
I was hoping to write some functions in e.g. mylib.awk, and then "include" this whenever I call awk.
I tried the -f mylib.awk option to awk, but the script is executed - the functions therein are not callable.

With GNU awk:
$ ls lib
prims.awk
$ cat lib/prims.awk
function abs(num) { return (num > 0 ? num : -num) }
function max(a,b) { return (a > b ? a : b) }
function min(a,b) { return (a < b ? a : b) }
$ export AWKPATH="$PWD/lib"
$ awk -i prims.awk 'BEGIN{print min(4,7), abs(-3)}'
4 3
$ cat tst.awk
#include "prims.awk"
BEGIN { print min(4,7), abs(-3) }
$ awk -f tst.awk
4 3

You can have multiple -f program-file options, so one can be your common functions and the other can be a specific problem-solving awk script, which will have access to those functions.
awk -f common-funcs.awk -f specific.awk file-to-process.txt
I don't know if this is what you were looking for, but it's the best I've come up with. Here's an example:
$ cat common_func.awk
# Remove spaces from front and back of string
function trim(s) {
    gsub(/^[ \t]+/, "", s);
    gsub(/[ \t]+$/, "", s);
    return s;
}
$ cat specific.awk
{ print $1, $2 }
{ print trim($1), trim($2) }
$ cat file-to-process.txt
abc | def |
$ awk -F\| -f common_func.awk -f specific.awk file-to-process.txt
abc def
abc def
With regular awk (non-gnu) you can't mix the -f program-file option with an inline program. That is, the following won't work:
awk -f common_func.awk '{ print trim($1) }' file-to-process.txt # WRONG
As pointed out in the comments, however, with gawk you can use the -f option together with -e:
awk -f file.awk -e '{stuff}' file.txt
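For example, reusing the common_func.awk library above (a sketch; the -e option is gawk-specific, so the gawk name is used explicitly here):
$ gawk -F\| -f common_func.awk -e '{ print trim($1) }' file-to-process.txt
With the sample file-to-process.txt above, this prints the trimmed first field of each line.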

In case you can't use -i (i.e. your awk version is older than 4.1), which EdMorton suggested, try the following; it works with GNU Awk 3.1.7:
--source program-text
Provide program source code in the program-text. This option allows you to mix source code in files with source code that you enter on the command line. This is particularly useful when you have library functions that you want to use from your command-line programs.
$ awk --version
GNU Awk 3.1.7
Copyright (C) 1989, 1991-2009 Free Software Foundation.
$ cat primes.awk
function abs(num) { return (num > 0 ? num : -num) }
function max(a,b) { return (a > b ? a : b) }
function min(a,b) { return (a < b ? a : b) }
$ awk -f primes.awk --source 'BEGIN{print min(4,7), abs(-3)}'
4 3

With regular awk (non-GNU) you can still fake it a bit by using the shell to cat the file(s) into the 'code' (generally at the front, but it could go anywhere, since awk still processes it in its normal order):
> cat /tmp/delme.awk
function PrintIt( a) { printf( "#%s\n", a )}
> printf 'aze\nqsd\n' | awk "$( cat /tmp/delme.awk)"'{ sub( /./, ""); PrintIt( $0 )}'
#ze
#sd

With GNU awk you can use the -i command line option or from inside a script the #include directive, but if you want a POSIX solution then awk -f functions.awk -f script.awk file.txt is the way you need to go.

Related

Portable way to split an external variable containing newlines in awk?

Consider these awk commands:
#!/bin/bash
awk 'BEGIN { print split("X\nX",a,"\n") }'
awk -v s=$'X\nX' 'BEGIN { print split(s,a,"\n") }'
Results:
Linux:
2
2
macOS, FreeBSD:
2
/usr/bin/awk: newline in string X
X... at source line 1
Solaris:
2
/usr/xpg4/bin/awk: file "(null)": line 1: Newline in string
Context is:
>>> X
>>> <<<
Is there a way to work around that?
Edit:
There's not even a need to use an external variable; the following will also fail in all awk implementations but the GNU one:
awk 'BEGIN { s = "X\nX"; print split(s,a,"\n") }'
POSIX awk does not allow physical newlines in string values.
When you use the C/bash string notation $'a\nb', the shell expands it to a value containing a literal newline, so any POSIX-compliant awk implementation will fail.
Even with gnu-awk, when you enable the --posix option the following error is returned:
awk --posix -v s=$'X\nX' 'BEGIN { print split(s,a,"\n") }'
awk: fatal: POSIX does not allow physical newlines in string values
However, if you drop the $'...' notation (so awk itself expands the \n escape), the error goes away:
awk --posix -v s="X\nX" 'BEGIN { print split(s,a,"\n") }'
2
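If you need this to work across implementations, one workaround (a sketch, not part of the answer above) is to pass the value through the environment instead of -v, since ENVIRON values are plain data rather than awk string literals:
$ s=$'X\nX' awk 'BEGIN { print split(ENVIRON["s"], a, "\n") }'
ENVIRON is part of the POSIX awk specification, so this should print 2 on compliant implementations.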

Bash: how to split a file on empty lines with awk

I have a text file (A.in) and I want to split it into multiple files. The split should occur every time an empty line is found. The filenames should be sequential (A1.in, A2.in, ...).
I found this answer that suggests using awk, but I can't make it work with my desired naming convention
awk -v RS="" '{print $0 > $1".txt"}' file
I also found other answers telling me to use the command csplit -l, but I can't make it match empty lines. I tried matching the pattern '', but I am not that familiar with regexes, and I get the following:
bash-3.2$ csplit A.in ""
csplit: : unrecognised pattern
Input file:
A.in
4
RURDDD
6
RRULDD
KKKKKK
26
RRRULU
Desired output:
A1.in
4
RURDDD
A2.in
6
RRULDD
KKKKKK
A3.in
26
RRRULU
Another fix for the awk approach:
$ awk -v RS="" '{
    split(FILENAME,a,".")   # separate name and extension
    f=a[1] NR "." a[2]      # form the filename, using NR as the number
    print > f               # output to the file
    close(f)                # close each file to avoid running out of fds when there are MANY
}' A.in
In any normal case, the following script should work:
awk 'BEGIN{RS=""}{ print > ("A" NR ".in") }' file
The reason why this might fail is most likely due to some CRLF terminations (See here and here).
As mentioned by James, you can make it a bit more robust:
awk 'BEGIN{RS=""}{ f = "A" NR ".in"; print > f; close(f) }' file
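If CRLF line endings turn out to be the culprit, one option (a sketch, assuming the carriage returns are unwanted) is to strip them before the paragraph-mode split:
$ tr -d '\r' < file | awk 'BEGIN{RS=""}{ f = "A" NR ".in"; print > f; close(f) }'
With the \r characters removed, the blank lines are truly empty and RS="" splits on them as expected.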
If you want to use csplit, the following will do the trick:
csplit --suppress-matched -f "A" -b "%0.2d.in" A.in '/^$/' '{*}'
See man csplit for understanding the above.
Input file content:
$ cat A.in
4
RURDDD
6
RRULDD
KKKKKK
26
RRRULU
AWK file content:
BEGIN{
    n=1
}
{
    if(NF!=0){
        print $0 >> "A"n".in"
    }else{
        n++
    }
}
Execution:
awk -f ctrl.awk A.in
Output:
$ cat A1.in
4
RURDDD
$ cat A2.in
6
RRULDD
KKKKKK
$ cat A3.in
26
RRRULU
PS: One-liner execution without an AWK file:
awk 'BEGIN{n=1}{if(NF!=0){print $0 >> "A"n".in"}else{n++}}' A.in

How to avoid the return of system command with awk?

When I use awk with the system command like this:
awk 'BEGIN{ if ( system("wc -l file_1") == 0 ) {print "something"} }' text.txt >> file_1
the result of the system command is written into my file file_1:
0 file_1
something
How can I avoid that, or just redirect the output?
You appear to be under the impression that the return value of the system() function includes the stdout of the command it runs. It does not: system() returns the command's exit status, while the command's stdout goes to awk's stdout, which your shell redirection sends into file_1.
If you want to test only for the existence of a non-zero-sized file, you might do it using the test command (on POSIX systems); note that with the check below, "something" is printed only when file_1 is empty or missing:
awk '
  BEGIN {
    if ( system("test -s file_1") ) {   # a return value of 0 is "false" to awk
      print "something"
    }
  }' text.txt >> file_1
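If what you actually want is the command's output inside awk (rather than its exit status), read it from a pipe with getline; here is a sketch reusing the file_1 name from the question:
awk '
  BEGIN {
    cmd = "wc -l < file_1"   # redirect so wc prints only the count
    cmd | getline n          # n receives the stdout of the command, not its exit status
    close(cmd)
    if (n == 0) print "something"
  }' text.txt >> file_1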

Is it possible to set a global AWK separator

Is it possible to set a global separator for awk, e.g., in a .conf file or an environment variable?
I'm handling a lot of files that use a customized separator, and it doesn't feel right to set FS and OFS manually every time.
Thanks.
No. The best you can do is either export a shell variable and then set FS from that, e.g.:
$ cat tst.awk
BEGIN { FS=ENVIRON["AWK_FS"] }
{ print NF }
$ export AWK_FS=","
$ echo 'a,b,c' | awk -f tst.awk
3
or include a file that sets the FS:
$ cat setFS.awk
BEGIN{ FS="," }
$ cat tst.awk
#include "setFS.awk"
{ print NF }
$ echo 'a,b,c' | awk -f tst.awk
3
The "#include" construct is gawk-specific, see https://www.gnu.org/software/gawk/manual/gawk.html#Include-Files. You still have to put the same code into every script to include that file but at least the actual FS setting (and OFS and whatever else you commonly do) code will only be specified once in that included file.

awk won't print new line characters

I am using the code below to change an existing awk script so that I can add more and more cases with a simple command.
echo `awk '{if(/#append1/){print "pref'"$1"'=0\n" $0 "\n"} else{print $0 "\n"}}' tf.a`
Note that the first print is "pref'"$1"'=0\n", so it refers to the shell variable $1 in its environment, not to $1 in awk itself.
The command ./tfb.a "c" should change the code from:
BEGIN{
#append1
}
...
to:
BEGIN{
prefc=0
#append1
}
...
However, it gives me everything on one line.
Does anyone know why this is?
If you take awk right out of the equation you can see what's going on:
# Use a small test file instead of an awk script
$ cat xxx
hello
there
$ echo `cat xxx`
hello there
$ echo "`cat xxx`"
hello
there
$ echo "$(cat xxx)"
hello
there
$
Unquoted, the output of the backtick operator is split into shell "words" too soon, so the newlines are lost. You could play around with the $IFS variable in the shell (yikes), or you could just use double-quotes.
If you're running a modern sh (e.g. ksh or bash, not the "classic" Bourne sh), you may also want to use the $() syntax (it's easier to find the matching start/end delimiter).
Do it like this, passing the variable from the shell to awk properly using -v:
#!/bin/bash
toinsert="$1"
awk -v toinsert="$toinsert" '
/#append1/{
    $0="pref" toinsert "=0\n" $0
}
{print}
' file > temp
mv temp file
output
$ cat file
BEGIN{
#append1
}
$ ./shell.sh c
BEGIN{
prefc=0
#append1
}