How to avoid the return of system command with awk? - awk

When I use awk with the system command like this:
awk 'BEGIN{ if ( system("wc -l file_1") == 0 ) {print "something"} }' text.txt >> file_1
the output of the system command is written into my file file_1:
0 file_1
something
How can I avoid that, or at least redirect the output elsewhere?

You appear to be under the impression that the system() function returns the stdout of the command it runs. It does not: it returns only the command's exit status, while the command's stdout goes straight to the script's stdout, which you have redirected into file_1. That is why 0 file_1 ends up in the file.
If you want to test only for the existence of a non-zero-sized file, you might do it using the test command (on POSIX systems):
awk '
BEGIN {
    if ( system("test -s file_1") ) { # a return value of 0 is "false" to awk
        print "something"
    }
}' text.txt >> file_1
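If you actually need the command's output rather than just its exit status, the portable way is to read it from a command pipe with getline, since system() never captures stdout. A minimal sketch (the name file_1 is from the question; the sample contents are illustrative):

```shell
cd "$(mktemp -d)"
printf 'a\nb\n' > file_1          # sample file just for this demo
awk 'BEGIN {
    cmd = "wc -l < file_1"        # "<" keeps the filename out of wc output
    cmd | getline lines           # capture stdout; nothing leaks to awk stdout
    close(cmd)
    n = lines + 0                 # force numeric (BSD wc pads with spaces)
    if (n > 0) print "file_1 has " n " lines"
}'
```

The cmd | getline idiom is POSIX awk, so it works in mawk and BSD awk too, not just gawk.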

Related

How can I send the output of an AWK script to a file?

Within an AWK script, I need to send the output of the script to a file while also printing it to the terminal. Is there a nice and tidy way I can do this without duplicating every print with a redirect to the file?
I'm not particularly good at making SSCCE examples, but here's my attempt at demonstrating my problem:
BEGIN{
    print "This is an awk script"
    # I don't want to have to do this for every print
    print "This is an awk script" > "thisiswhack.out"
}
{
    # data manip. stuff here
    # ...
    printf "%s %s %s\n", blah, blah, blah
    # I don't want to have to do this for every print again
    printf "%s %s %s\n", blah, blah, blah >> "thisiswhack.out"
}
END{
    print "Yay we're done!"
    # Seriously, there has to be a better way to do this within the script
    print "Yay we're done!" >> "thisiswhack.out"
}
Surely there must be a way to send the entire output of the script to an output file within the script itself, right?
The command to duplicate streams is tee, and we can use it inside awk:
awk '
BEGIN {tee = "tee out.txt"}
{print | tee}' in.txt
This invokes tee with the file argument out.txt, and opens a stream to this command.
The stream (and therefore tee) remains open until awk exits, or close(tee) is called.
Every time print | tee is used, the data is printed to that stream. tee then appends this data both to the file out.txt, and stdout.
The print | command feature is POSIX awk. The tee variable isn't compulsory either; you can use the command string directly, as in print | "tee out.txt".
Of course, we can use tee outside awk too: awk ... | tee out.txt.
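A quick way to see this end to end (the file names are illustrative):

```shell
cd "$(mktemp -d)"
printf 'alpha\nbeta\n' > in.txt
# tee writes each record to out.txt and echoes it to stdout
awk 'BEGIN {tee = "tee out.txt"} {print | tee}' in.txt
cat out.txt    # same two lines again
```

When awk exits it closes the pipe, tee flushes, and out.txt ends up holding the same lines that appeared on the terminal.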
GNU AWK's Redirection allows sending output to a command rather than a file, so I suggest the following use of that feature:
awk 'BEGIN{command="tee output.txt"}{print tolower($0) | command}' input.txt
Note: I use tolower($0) for demonstration purposes. I redirect print into the tee command, which writes both to the named file and to standard output, so you should get a lowercase version of input.txt written to output.txt and to standard output.
If you are not confined to a single awk invocation, you can alternatively use tee outside of it, like so:
awk '{print tolower($0)}' input.txt | tee output.txt
awk '
function prtf(str) {
    printf "%s", str > "thisiswhack.out"
    printf "%s", str
    fflush()
}
function prt(str) {
    prtf( str ORS )
}
{
    # to print adding a newline at the end:
    prt( "foo" )
    # to print as-is without adding a newline:
    prtf( sprintf("%s, %s, %d", $2, "bar", 17) )
}
' file
In the above we are not spawning a subshell to call any other command, so it's efficient. We also call fflush() after every print to ensure both output streams (stdout and the extra file) don't get out of sync with respect to each other (e.g. stdout displaying less text than the file, or vice versa, if the command is killed).
The above always overwrites the contents of "thisiswhack.out" with whatever the script outputs. If you want to append instead then change > to >>. If you want the option of doing both, introduce a variable (which I've named prtappend below) to control it which you can set on the command line, e.g. change:
printf "%s", str > "thisiswhack.out"
to:
printf "%s", str >> "thisiswhack.out"
and add:
BEGIN {
if ( !prtappend ) {
printf "" > "thisiswhack.out"
}
}
then if you do awk -v prtappend=1 '...' it'll append to thisiswhack.out instead of overwriting it.
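Putting those pieces together, the append-controlled version of the script looks like this (a sketch; the main rule just echoes each input line to demonstrate the mechanism):

```shell
cd "$(mktemp -d)"
printf 'one\ntwo\n' > file
awk '
BEGIN {
    if ( !prtappend ) {
        printf "" > "thisiswhack.out"    # truncate unless -v prtappend=1
    }
}
function prtf(str) {
    printf "%s", str >> "thisiswhack.out"
    printf "%s", str
    fflush()
}
function prt(str) {
    prtf( str ORS )
}
{
    prt( $0 )    # send each line to both stdout and the file
}
' file
```

Run it again with awk -v prtappend=1 and "thisiswhack.out" grows instead of being overwritten.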
Of course, the better approach if you're on a Unix system is to have your awk script called from a shell script with its output piped to tee, e.g.:
#!/usr/bin/env bash
awk '
{
    print "foo"
    printf "%s, %s, %d\n", $2, "bar", 17
}
' "${@:--}" |
tee 'thisiswhack.out'
Note that this is one more example of why you should not call awk from a shebang.

how to check if file exist and not empty in awk using wildcard in filename or path

I am trying to create a small awk one-liner that should go through several paths and, in each path, find a specific file (hence the wildcard in the filename or path) that should not be empty. If the file is not found or is empty, it should print "NULL".
I did some searching in stackoverflow and other places but couldn't really make it work.
Example: path is /home/test[1..5]/test.json
awk -F"[{}]" '{ if (system("[ ! -f FILENAME ]") == 0 && NR > 0 && NF > 0) print $2; else print "NULL"}' /home/test*/test.txt
If test.txt is empty or does not exist, it should print "NULL"; when it is not empty, it should print $2.
In the above example it just skips the empty file and does not write "NULL"!
Example execution: /home/ has the paths test1, test2, test3, and each path has one test.txt (/home/test1/test.txt is empty).
The test.txt file in each /home/test* path is either empty or contains one line of the following kind of text:
{"test":1033}
# awk -F"[{}]" '{ if (system("[ ! -f FILENAME ]") == 0 && NR > 0 && NF > 0) print $2; else print "NULL"}' /home/test*/test.txt
"test":1033
"test":209
File examples:
/home/test0/test.txt (not empty -> {"test":1033})
/home/test1/test.txt (empty)
/home/test2/test.txt (not empty -> {"test":209})
/home/test3/test.txt (not exist)
But for ../test1/test.txt I would like to see "NULL" but instead I see nothing!
I would like to have a printout like the below:
"test":1033
NULL
"test":209
NULL
What am I doing wrong?
If I understand what you are asking correctly, there is no need for a system call. One can use ENDFILE to check whether a file was empty.
Try this:
awk -F"[{}]" '{print $2} ENDFILE{if(FNR==0)print "NULL"}' /home/test*/test.txt
FNR is the number of records read so far from the current file. If FNR is zero when ENDFILE is reached, that file had no records, and we print NULL.
Note: since this solution uses ENDFILE, as Ed Morton points out, GNU awk (sometimes called gawk) is required.
Example
Suppose that we have these three files:
$ ls -1 home/test*/test.txt
home/test1/test.txt
home/test2/test.txt
home/test3/test.txt
All are empty except home/test2/test.txt which contains:
$ cat home/test2/test.txt
first{second}
1st{2nd}
Our command produces the output:
$ awk -F"[{}]" '{print $2} ENDFILE{if(FNR==0)print "NULL"}' home/test*/test.txt
NULL
second
2nd
NULL
Test for non-existent files
for d in home/test*/; do [ -f "$d/test.txt" ] || echo "Missing $d/test.txt"; done
Sample output:
$ for d in home/test*/; do [ -f "$d/test.txt" ] || echo "Missing $d/test.txt"; done
Missing home/test4//test.txt
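If you pass explicit file names instead of a glob (a glob only expands to files that actually exist), GNU awk can also handle the missing-file case itself: inside a BEGINFILE rule, ERRNO is non-empty when the file could not be opened, and nextfile skips it instead of aborting. A sketch combining that with the ENDFILE check, recreating the question's four cases in a scratch directory:

```shell
cd "$(mktemp -d)"
mkdir -p home/test0 home/test1 home/test2
printf '{"test":1033}\n' > home/test0/test.txt   # non-empty
: > home/test1/test.txt                          # empty
printf '{"test":209}\n'  > home/test2/test.txt   # non-empty; test3 does not exist

gawk -F'[{}]' '
BEGINFILE { if (ERRNO != "") { print "NULL"; nextfile } }  # missing/unreadable
{ print $2 }
ENDFILE   { if (FNR == 0) print "NULL" }                   # present but empty
' home/test0/test.txt home/test1/test.txt home/test2/test.txt home/test3/test.txt
```

When nextfile is invoked inside BEGINFILE, gawk skips the ENDFILE rule for that file, so a missing file prints NULL only once.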
for dir in home/test*; do
    file="$dir/test.txt"
    if [ -s "$file" ]; then
        # exists and is non-empty
        val=$( awk -F'[{}]' '{print $2}' "$file" )
    else
        # does not exist or is empty
        val="NULL"
    fi
    printf '%s\n' "$val"
done

Bash how to split file on empty line with awk

I have a text file (A.in) and I want to split it into multiple files. The split should occur every time an empty line is found. The filenames should be progressive (A1.in, A2.in, ...).
I found this answer that suggests using awk, but I can't make it work with my desired naming convention
awk -v RS="" '{print $0 > $1".txt"}' file
I also found other answers telling me to use the command csplit, but I can't make it match empty lines. I tried matching the pattern '' but I am not that familiar with regex, and I get the following:
bash-3.2$ csplit A.in ""
csplit: : unrecognised pattern
Input file:
A.in
4
RURDDD
6
RRULDD
KKKKKK
26
RRRULU
Desired output:
A1.in
4
RURDDD
A2.in
6
RRULDD
KKKKKK
A3.in
26
RRRULU
Another take on the awk approach:
$ awk -v RS="" '{
    split(FILENAME,a,".")  # separate name and extension
    f = a[1] NR "." a[2]   # form the filename, using NR as the number
    print > f              # output to the file
    close(f)               # close in case there are MANY, to avoid running out of fds
}' A.in
In any normal case, the following script should work:
awk 'BEGIN{RS=""}{ print > ("A" NR ".in") }' file
The reason why this might fail is most likely due to some CRLF terminations (See here and here).
As mentioned by James, making it a bit more robust as:
awk 'BEGIN{RS=""}{ f = "A" NR ".in"; print > f; close(f) }' file
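To see the RS="" version end to end, recreating the sample A.in from the question in a scratch directory:

```shell
cd "$(mktemp -d)"
printf '4\nRURDDD\n\n6\nRRULDD\nKKKKKK\n\n26\nRRRULU\n' > A.in
# RS="" makes each blank-line-separated block one record; NR numbers them
awk 'BEGIN{RS=""}{ f = "A" NR ".in"; print > f; close(f) }' A.in
head A1.in A2.in A3.in
```

Each Ax.in ends up holding exactly one block, with the blank separator lines removed.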
If you want to use csplit, the following will do the trick:
csplit --suppress-matched -f "A" -b "%0.2d.in" A.in '/^$/' '{*}'
See man csplit for understanding the above.
Input file content:
$ cat A.in
4
RURDDD
6
RRULDD
KKKKKK
26
RRRULU
AWK file content:
BEGIN{
    n=1
}
{
    if(NF!=0){
        print $0 >> "A"n".in"
    }else{
        n++
    }
}
Execution:
awk -f ctrl.awk A.in
Output:
$ cat A1.in
4
RURDDD
$ cat A2.in
6
RRULDD
KKKKKK
$ cat A3.in
26
RRRULU
PS: One-liner execution without AWK file:
awk 'BEGIN{n=1}{if(NF!=0){print $0 >> "A"n".in"}else{n++}}' A.in

include library of functions in awk

There are many common functions (especially arithmetic/mathematics) that are not built into awk that I need to write myself all the time.
For example:
There is no c = min(a,b), so in awk I constantly write c = a<b ? a : b.
Same for the maximum, i.e. c = max(a,b).
Same for the absolute value, i.e. c = abs(a), so I have to constantly write c = a>0 ? a : -a.
and so on....
Ideally, I could write these functions into an awk source file, and "include" it into all of my instances of awk, so I can call them at will.
I looked into the "@include" functionality of GNU's gawk, but it just executes whatever is in the included script, i.e. I cannot call functions.
I was hoping to write some functions in e.g. mylib.awk, and then "include" this whenever I call awk.
I tried the -f mylib.awk option to awk, but the script is executed - the functions therein are not callable.
With GNU awk:
$ ls lib
prims.awk
$ cat lib/prims.awk
function abs(num) { return (num > 0 ? num : -num) }
function max(a,b) { return (a > b ? a : b) }
function min(a,b) { return (a < b ? a : b) }
$ export AWKPATH="$PWD/lib"
$ awk -i prims.awk 'BEGIN{print min(4,7), abs(-3)}'
4 3
$ cat tst.awk
@include "prims.awk"
BEGIN { print min(4,7), abs(-3) }
$ awk -f tst.awk
4 3
You can have multiple -f program-file options, so one can be your common functions and the other can be a specific problem solving awk script, which will have access to those functions.
awk -f common-funcs.awk -f specific.awk file-to-process.txt
I don't know if this is what you were looking for, but it's the best I've come up with. Here's an example:
$ cat common_func.awk
# Remove spaces from front and back of string
function trim(s) {
gsub(/^[ \t]+/, "", s);
gsub(/[ \t]+$/, "", s);
return s;
}
$ cat specific.awk
{ print $1, $2 }
{ print trim($1), trim($2) }
$ cat file-to-process.txt
abc | def |
$ awk -F\| -f common_func.awk -f specific.awk file-to-process.txt
abc def
abc def
With regular awk (non-gnu) you can't mix the -f program-file option with an inline program. That is, the following won't work:
awk -f common_func.awk '{ print trim($1) }' file-to-process.txt # WRONG
As pointed out in the comments, however, with gawk you can use the -f option together with -e:
awk -f file.awk -e '{stuff}' file.txt
In case you can't use -i (if your awk is older than version 4.1), which Ed Morton suggested, try the --source option below; it works with GNU Awk 3.1.7:
--source program-text
Provide program source code in the program-text. This option allows you to mix source code in files with source code that you enter
on the command line. This is particularly useful when you have library
functions that you want to use from your command-line programs.
$ awk --version
GNU Awk 3.1.7
Copyright (C) 1989, 1991-2009 Free Software Foundation.
$ cat primes.awk
function abs(num) { return (num > 0 ? num : -num) }
function max(a,b) { return (a > b ? a : b) }
function min(a,b) { return (a < b ? a : b) }
$ awk -f primes.awk --source 'BEGIN{print min(4,7), abs(-3)}'
4 3
On regular awk (non-GNU) you can still fake it a bit using the shell, by cat-ing the file(s) into the 'code' (generally in front, but it could go anywhere, since awk doesn't care where in the program functions are defined):
> cat /tmp/delme.awk
function PrintIt( a) { printf( "#%s\n", a )}
> printf 'aze\nqsd\n' | awk "$( cat /tmp/delme.awk)"'{ sub( /./, ""); PrintIt( $0 )}'
#ze
#sd
With GNU awk you can use the -i command line option or, from inside a script, the @include directive; but if you want a POSIX solution then awk -f functions.awk -f script.awk file.txt is the way you need to go.
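A minimal end-to-end check of the portable two -f approach (the file names here are illustrative):

```shell
cd "$(mktemp -d)"
cat > funcs.awk <<'EOF'
function abs(num) { return (num > 0 ? num : -num) }
function max(a,b) { return (a > b ? a : b) }
function min(a,b) { return (a < b ? a : b) }
EOF
cat > main.awk <<'EOF'
BEGIN { print min(4,7), max(4,7), abs(-3) }
EOF
# functions from the first file are callable in the second
awk -f funcs.awk -f main.awk </dev/null
# 4 7 3
```

This works in any POSIX awk, since multiple -f options simply concatenate the program sources.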

awk won't print new line characters

I am using the below code to change an existing awk script so that I can add more and more cases with a simple command.
echo `awk '{if(/#append1/){print "pref'"$1"'=0\n" $0 "\n"} else{print $0 "\n"}}' tf.a`
Note that the first print is "pref'"$1"'=0\n", so it is referring to the shell's $1 (the script's first argument), not awk's $1.
The command ./tfb.a "c" should change the code from:
BEGIN{
#append1
}
...
to:
BEGIN{
prefc=0
#append1
}
...
However, it gives me everything on one line.
Does anyone know why this is?
If you take awk right out of the equation you can see what's going on:
# Use a small test file instead of an awk script
$ cat xxx
hello
there
$ echo `cat xxx`
hello there
$ echo "`cat xxx`"
hello
there
$ echo "$(cat xxx)"
hello
there
$
The backtick operator expands the output into shell "words" too soon. You could play around with the $IFS variable in the shell (yikes), or you could just use double-quotes.
If you're running a modern sh (e.g. ksh or bash, not the "classic" Bourne sh), you may also want to use the $() syntax (it's easier to find the matching start/end delimiter).
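The word-splitting difference is easy to reproduce with a throwaway file (names are illustrative):

```shell
f=$(mktemp)
printf 'hello\nthere\n' > "$f"
echo $(cat "$f")      # unquoted: word-split, prints "hello there"
echo "$(cat "$f")"    # quoted: the newline is preserved
rm -f "$f"
```

The same rule applies to the backtick form, which is why quoting the substitution fixes the one-line output in the question.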
Do it like this: pass the variable from the shell to awk properly using -v.
#!/bin/bash
toinsert="$1"
awk -v toinsert="$toinsert" '
/#append1/{
    $0 = "pref" toinsert "=0\n" $0
}
{print}
' file > temp
mv temp file
output
$ cat file
BEGIN{
#append1
}
$ ./shell.sh c
BEGIN{
prefc=0
#append1
}