Add #y to the end of the string if it contains x - awk

I have a file with the following contents:
# cat.txt
AB-1 text: foo
AB-1 test3: test cat dog
AB-1 test4: abc
# cat2.txt
AB-4 test: qwerty
AB-5 test2: Foo bar
AB-6 abc: Dog
and I am trying to get the following from it: if a line contains x, then add #y to its end:
# cat.txt
AB-1 text: foo#foo #text
AB-1 test3: test cat dog#animal
AB-1 test4: abc
# cat2.txt
AB-4 test: qwerty
AB-5 test2: Foo bar#foo
AB-6 abc: Dog#animal
I ran into the following problems:
lines starting with # must be ignored
two keywords in one line are possible
matching must be case-insensitive
I solved the last two points, though I am not sure my approach is logical or proper. Anyway:
awk '{print $0 (tolower($0) ~ /foo/ ? "#foo " : "" ) (tolower($0) ~ /cat|dog/ ? "#animal " : "") (tolower($0) ~ /text/ ? "#text " : "")}' ./file.txt
Thus, at this point is the following result:
# cat.txt#animal
AB-1 text: foo#foo #text
AB-1 test3: test cat dog#animal
AB-1 test4: abc
# cat2.txt#animal
AB-4 test: qwerty
AB-5 test2: Foo bar#foo
AB-6 abc: Dog#animal
The result is pretty close to what is needed, but the following points still cause concern:
As you can see, I am using tolower($0) ~ to make the matching case-insensitive. I am on macOS and have not succeeded with the -v IGNORECASE=1 flag or with BEGIN{IGNORECASE = 1}; for some reason neither works in my script. Is it possible to improve this?
Is there any way to ignore strings that start with #?

You may use this awk:
awk '/^#/ {   # if line starts with #
   print      # print it
   next       # skip to next line
}
{
   lcr = tolower($0)
   print $0 (lcr ~ /foo/ ? "#foo " : "" ) \
      (lcr ~ /cat|dog/ ? "#animal " : "") (lcr ~ /text/ ? "#text " : "")
}' file
# cat.txt
AB-1 text: foo#foo #text
AB-1 test3: test cat dog#animal
AB-1 test4: abc
# cat2.txt
AB-4 test: qwerty
AB-5 test2: Foo bar#foo
AB-6 abc: Dog#animal

macOS and have not succeeded using -v IGNORECASE=1 flag or
BEGIN{IGNORECASE = 1}, for some reason it also does not work in my
concept. Is it possible to improve this?
IGNORECASE is a GNU AWK-specific feature, so if it does not work on your machine, you are not using GNU AWK. Run awk --version to find out which AWK you are actually using. macappstore.org suggests that gawk is available for macOS, but I am not able to test it.
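If installing gawk is not an option, one portable sketch is to keep lowercasing with tolower(), which POSIX awk provides, and to move the keyword/tag pairs into a small table so that each new keyword is one more entry rather than one more ternary. The function name tag_lines and the arrays re and tag are made up for illustration; the keywords are the ones from the question.

```shell
# tag_lines: hypothetical helper; reads lines on stdin, appends tags, prints the result.
# It relies only on POSIX awk features, so it should also run on the awk shipped with macOS.
tag_lines() {
  awk '
    BEGIN {
      # lowercase regexp and the tag to append (pairs taken from the question)
      n = 0
      re[++n] = "foo";     tag[n] = "#foo"
      re[++n] = "cat|dog"; tag[n] = "#animal"
      re[++n] = "text";    tag[n] = "#text"
    }
    /^#/ { print; next }            # leave header lines untouched
    {
      lc = tolower($0)              # match against a lowercased copy
      out = $0
      for (i = 1; i <= n; i++)
        if (lc ~ re[i]) out = out tag[i] " "
      print out
    }
  '
}

printf '%s\n' '# cat.txt' 'AB-1 text: foo' 'AB-6 abc: Dog' | tag_lines
```

Like the one-liner in the question, this leaves a trailing space after each tag.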

Related

awk if else condition when string is not present in the file

I am trying to use awk to convert a multi-line block to a single record and then run a search on it. I am running lspci -v as the input command, but I have mocked the data for this question.
Input data:
name: foobar
data: 123 bad

name: foozoo
data: 123 good

name: foozoo
data: 123 bad

name: zoobar
data: 123 good

name: barzpp
data: 123 bad
First I converted the input data that was in blocks into single-line records.
awk -v RS='' '{$1=$1}1' xx
name: foobar data: 123 bad
name: foozoo data: 123 good
name: foozoo data: 123 bad
name: zoobar data: 123 good
name: barzpp data: 123 bad
Now I am searching for a string "foozoo" and this gives me desired results. Here, I am first checking if foozoo is present on the line, and then I am checking if .*good is present in the same line. This works fine.
awk -v RS='' -v var=foozoo '{$1=$1}; {if(match($0,var)) if(match($0,var ".*good")) print var " is good"; else print var " is missing"}' xx
foozoo is good
foozoo is missing
Now, when I supply a non-existing string, awk returns nothing, which makes sense as there is no else block.
awk -v RS='' -v var=THIS_DOES_NOT_EXIST '{$1=$1}; {if(match($0,var)) if(match($0,var ".*good")) print var " is good"; else print var " is missing"}' xx
When I add an else block and search for a string that exists in the input, I get the output below, which I do not want. I only want the foozoo is good and foozoo is missing lines.
awk -v RS='' -v var=foozoo '{$1=$1}; {if(match($0,var)) {if(match($0,var ".*good")) print var " is good"; else print var " is missing"} else {print "NON-EXISTING_DATA_REQUESTED"}}' xx
NON-EXISTING_DATA_REQUESTED
foozoo is good
foozoo is missing
NON-EXISTING_DATA_REQUESTED
NON-EXISTING_DATA_REQUESTED
Similarly, when I search for non-existing data, I get the line NON-EXISTING_DATA_REQUESTED for every record. How can I print just one line saying the data does not exist?
awk -v RS='' -v var=monkistrying '{$1=$1}; {if(match($0,var)) {if(match($0,var ".*good")) print var " is good"; else print var " is missing"} else {print "NON-EXISTING_DATA_REQUESTED"}}' xx
NON-EXISTING_DATA_REQUESTED
NON-EXISTING_DATA_REQUESTED
NON-EXISTING_DATA_REQUESTED
NON-EXISTING_DATA_REQUESTED
NON-EXISTING_DATA_REQUESTED
Here's that last script above formatted legibly by gawk -o-:
{
    $1 = $1
}
{
    if (match($0, var)) {
        if (match($0, var ".*good")) {
            print var " is good"
        } else {
            print var " is missing"
        }
    } else {
        print "NON-EXISTING_DATA_REQUESTED"
    }
}
It sounds to me like you only want to print NON-EXISTING_DATA_REQUESTED if no matches (foozoo and good) are found, and then only once. If that is correct, one idea is to keep a count of the matches and, in an END{...} block, print a single NON-EXISTING_DATA_REQUESTED if that count is zero.
Found a match:
awk -v RS='' -v var=foozoo '
{ $1=$1 }
{ if(match($0,var)) {
# found++ # uncomment if both "is good" AND "is missing" should be considered as "found"
if(match($0,var ".*good"))
{ print var " is good"; found++ } # remove "found++" if the previous line is uncommented
else
{ print var " is missing" }
}
}
END { if (!found) print "NON-EXISTING_DATA_REQUESTED" }
' xx
foozoo is good
foozoo is missing
Found no matches:
awk -v RS='' -v var=monkistrying '
{ $1=$1 }
{ if(match($0,var)) {
# found++
if(match($0,var ".*good"))
{ print var " is good"; found++ }
else
{ print var " is missing" }
}
}
END { if (!found) print "NON-EXISTING_DATA_REQUESTED" }
' xx
NON-EXISTING_DATA_REQUESTED
There's no need to compress your records onto individual lines, that's just wasting time and potentially making the comparisons harder, and by using match() you're treating var as a regexp and doing a partial record comparison when it looks like you just want a string full-field comparison. Try match($0,var) when the input contains badfoozoohere and foozoo given -v var=foozoo to see one way in which the way you're using match() will fail (there are several others). Also since you aren't using RSTART or RLENGTH, using match($0,var) instead of $0 ~ var was inefficient anyway.
$ cat tst.awk
BEGIN { RS="" }
$2 == var {
print var, "is", ( $NF == "good" ? "good" : "missing" )
found = 1
}
END {
if ( !found ) {
print "NON-EXISTING_DATA_REQUESTED"
}
}
$ awk -v var='foozoo' -f tst.awk file
foozoo is good
foozoo is missing
$ awk -v var='monkistrying' -f tst.awk file
NON-EXISTING_DATA_REQUESTED
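The partial-match pitfall described above can be reproduced with a couple of made-up records. In this sketch, count_regexp and count_exact are hypothetical helper names and the badfoozoohere record is invented test data; the regexp test counts it as a hit while the field comparison does not.

```shell
# Both helpers read blank-line-separated records on stdin and print a hit count.
count_regexp() {   # treats var as a regexp matched anywhere in the record
  awk -v RS='' -v var="$1" '$0 ~ var { n++ } END { print n + 0 }'
}
count_exact() {    # compares the second field (the name) as a plain string
  awk -v RS='' -v var="$1" '$2 == var { n++ } END { print n + 0 }'
}

sample='name: badfoozoohere
data: 123 good

name: foozoo
data: 123 bad'

printf '%s\n' "$sample" | count_regexp foozoo   # counts both records
printf '%s\n' "$sample" | count_exact  foozoo   # counts only the real foozoo
```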
single-pass awk based solution w/o needing to transform the data:
bytes xC0 \300, xC1 \301, and xF9 \371 aren't UTF-8 valid,
so chances of them appearing in input data are absolutely minuscule
INPUT
name: foobar
data: 123 bad
name: foozoo
data: 123 good
name: foozoo
data: 123 bad
name: zoobar
data: 123 good
name: barzpp
data: 123 bad
CODE (gawk, mawk 1/2, or LC_ALL=C nawk)
{m,n~,g}awk '
BEGIN {
______ = "NON-EXISTING_DATA_REQUESTED\n"
FS = "((data|name): ([0-9]+ )?|\n)+"
RS = "^$" (ORS = _)
___ = "\300"
____ = "\371"
_____ =(_="\301")(__="foozoo")(\
OFS = _)
} ! ( NF *= /name: foozoo[ \n]/) ? $NF = ______\
: gsub(_____ "bad"_, (_)(___)_) + \
gsub(_____ "good"_,(_)(____)_) + gsub("[\1-~]+","")+\
gsub( ___, __ " is missing\n") + \
gsub(____, __ " is " "good\n") + gsub((___)"|"(_)("|")____,"")'
OUTPUT
foozoo is good
foozoo is missing

Awk: I want to use the input filename to generate an output file with same name different extension

I have a script that looks like this:
#! /bin/awk -f
BEGIN { print "start" }
{ print $0 }
END { print "end" }
Call the script like this: ./myscript.awk test.txt
Pretty simple - takes a file and adds "start" to the start and "end" to the end.
Now I want to take the input filename, let's call it test.txt, and print the output to a file called test.out.
So I tried to print the input filename:
BEGIN { print "fname: '" FILENAME "'" }
But that printed: fname: '' :(
The rest I think I can figure out. I have the following, which prints to a hard-coded filename:
#! /bin/awk -f
BEGIN { print "start" > "test.out" }
{ print $0 >> "test.out" }
END { print "end" >> "test.out" }
And that works great.
So the questions are:
how do I get the input filename?
Assuming somehow I get the input file name in a variable, e.g. FILENAME which contains "test.txt" how would I make another variable, e.g. OUTFILE, which contains "test.out"?
Note: I will be doing much more awk processing so please don't suggest to use sed or other languages :))
Try something like this:
#! /bin/awk -f
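BEGIN {
    # FILENAME is still empty here, so take the name from the first argument
    file = ARGV[1]
    sub(/\.txt$/, ".out", file)    # escape the dot and anchor at the end
    print "start" > file
}
{ print $0 > file }
END {
    print "end" > file
    close(file)
}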
I'd also suggest calling close() on the file in the END block. Good call by Sundeep for pointing out that FILENAME is empty in BEGIN.
$ echo 'foo' > ip.txt
$ awk 'NR==1{op=FILENAME; sub(/\.[^.]+$/, ".log", op); print "start" > op}
{print > op}
END{print "end" > op}' ip.txt
$ cat ip.log
start
foo
end
Save FILENAME to a variable, change the extension using sub and then print as required
From gawk manual
Inside a BEGIN rule, the value of FILENAME is "", because there are no input files being processed yet
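A quick way to see this for yourself is the minimal sketch below. The helper name show_filename is made up, and the behaviour in BEGIN is what the gawk manual describes; other awk implementations are expected, but not guaranteed, to behave the same way.

```shell
# show_filename FILE: print FILENAME as seen in BEGIN and on the first record.
show_filename() {
  awk 'BEGIN    { printf "BEGIN: [%s]\n", FILENAME }
       FNR == 1 { printf "main:  [%s]\n", FILENAME }' "$1"
}
```

For a non-empty file ip.txt, show_filename ip.txt should print an empty pair of brackets for BEGIN and [ip.txt] for the main rule.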
If you're using GNU awk (gawk), you can use the patterns BEGINFILE and ENDFILE
awk 'BEGINFILE{
    outfile=FILENAME;
    sub(/\.txt$/,".out",outfile);
    print "start" > outfile
}
ENDFILE{
    print "end" > outfile
    close(outfile)
}' file1.txt file2.txt
You can then use the variable outfile in your main {...} block.
This lets you process more than one file in a single awk command.
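Where BEGINFILE and ENDFILE are not available, a similar effect can be sketched in POSIX awk by switching output files whenever FNR resets to 1. The helper name wrap_files is hypothetical. Note one difference from gawk: an empty input file has no first record, so this version silently skips it.

```shell
# wrap_files FILE...: for each input NAME.ext write NAME.out containing
# "start", the file contents, and "end".
wrap_files() {
  awk '
    function finish() {                 # finish the file we were writing, if any
      if (out != "") { print "end" > out; close(out) }
    }
    FNR == 1 {                          # first record of a new input file
      finish()
      out = FILENAME
      sub(/\.[^.\/]+$/, "", out)        # drop the last extension, if any
      out = out ".out"
      print "start" > out
    }
    { print > out }
    END { finish() }
  ' "$@"
}
```

Called as wrap_files file1.txt file2.txt, this should leave file1.out and file2.out next to the inputs. Do not feed it a file that already ends in .out, since the output name would then be the same as the input name.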

How to evaluate or process if statements in data?

Background
I wrote a bash script that pulls simple user functions from a PostgreSQL database, uses awk to convert PL/pgSQL commands to SQL (for example PERFORM function() to SELECT function(), removing comments --.*, etc.), stores the SQL commands in a file (file.sql), and then reads and executes them in the database:
$ psql ... -f file.sql db
The functions are simple, mostly just calling other user defined functions. But how to "evaluate" or process an IF statement?:
IF $1 = 'customer1' THEN -- THESE $1 MEANS ARGUMENT TO PGPL/SQL FUNCTION
PERFORM subfunction1($1); -- THAT THIS IF STATEMENT IS IN:
ELSE -- SELECT function('customer1');
PERFORM subfunction2($1); -- $1 = 'customer1'
END IF;
Tl;dr:
IFs and such are not SQL, so they should be pre-evaluated using awk. It's safe to assume that the above has already been processed into one record with comments removed:
IF $1 = 'customer1' THEN PERFORM subfunction1($1); ELSE PERFORM subfunction2($1); END IF;
After "evaluating", the above should be replaced with:
SELECT subfunction1('customer1');
if the awk to evaluate it was called:
$ awk -v arg1='customer1' -f program.awk file.sql
or if arg1 is anything else, for example for customer2:
SELECT subfunction2('customer2');
Edit
expr popped into my mind first thing when I woke up:
$ awk -v arg="'customer1'" '
{
gsub(/\$1/,arg) # replace func arg with string
n=split($0,a,"(IF|THEN|ELSE|ELSE?IF|END IF;)",seps) # seps to get ready for SQL CASE
if(seps[1]=="IF") {
# here should be while for ELSEIF
c="expr " a[2]; c|getline r; close(c) # use expr to solve
switch (r) { # expr has 4 return values
case "1": # match
print a[3]
break
case "0": # no match
print a[4]
break
default: # (*) see below
print r
exit # TODO
} } }' file.sql
(*) expr outputs 0,1,2 or 3:
$ expr 1 = 1
1
$ expr 1 = 2
0
However, if you omit spaces:
$ expr 1=1
1=1
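One detail worth separating here: what expr prints and what it returns are different things. For a comparison it prints 1 or 0, while its exit status is 0 when the result is true, 1 when it is false or empty, and 2 or more on an error. A sketch that branches on the exit status rather than on the printed text follows; eval_condition is a made-up name, and the operands must be passed as separate arguments, as the spacing example shows.

```shell
# eval_condition LHS OP RHS: print "true", "false" or "error".
eval_condition() {
  status=0
  expr "$1" "$2" "$3" >/dev/null 2>&1 || status=$?
  case $status in
    0) echo true  ;;
    1) echo false ;;
    *) echo error ;;   # 2 = invalid expression, >2 = other failure
  esac
}

eval_condition customer1 = customer1   # true
eval_condition customer1 = customer2   # false
```

This is only a sketch for plain operands like the ones in the question; values that look like expr operators or keywords would need extra care.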
Without writing a full language parser, if you're looking for something cheap and cheerful then this might be a decent starting point:
$ cat tst.awk
{ gsub(/\$1/,"\047"arg1"\047") }
match($0,/^IF\s+(\S+)\s+(\S+)\s+(\S+)\s+THEN\s+(\S+)\s+(\S+)\s+ELSE\s+(\S+)\s+(\S+)\s+END\s+IF/,a) {
lhs = a[1]
op = a[2]
rhs = a[3]
trueAct = (a[4] == "PERFORM" ? "SELECT" : a[4]) FS a[5]
falseAct = (a[6] == "PERFORM" ? "SELECT" : a[6]) FS a[7]
if (op == "=") {
print (lhs == rhs ? trueAct : falseAct)
}
}
$ awk -v arg1='customer1' -f tst.awk file
SELECT subfunction1('customer1');
$ awk -v arg1='bob' -f tst.awk file
SELECT subfunction2('bob');
The above uses GNU awk for the 3rd arg to match(). Hopefully it's easy enough to understand that you can massage as needed to handle other constructs or other variations of this construct.

Redirect input for gawk to a system command

Usually a gawk script processes each line of its stdin. Is it possible to instead specify a system command in the script and have the rest of the script process each line of that command's output?
For example consider the following simple interaction:
$ { echo "abc"; echo "def"; } | gawk '{print NR ":" $0; }'
1:abc
2:def
I would like to get the same output without using pipe, specifying instead the echo commands as a system command.
I can of course use the pipe but that would force me to either use two different scripts or specify the gawk script inside the bash script and I am trying to avoid that.
UPDATE
The previous example is not quite representative of my usecase, this is somewhat closer:
$ { echo "abc"; echo "def"; } | gawk '/d/ {print NR ":" $0; }'
2:def
UPDATE 2
A shell script parallel would be as follows. Without the exec line the script would read from stdin; with the exec line it uses the output of the command on that line as input:
/tmp> cat t.sh
#!/bin/bash
exec 0< <(echo abc; echo def)
while read l; do
echo "line:" $l
done
/tmp> ./t.sh
line: abc
line: def
From all of your comments, it sounds like what you want is:
$ cat tst.awk
BEGIN {
if ( ("mktemp" | getline file) > 0 ) {
system("(echo abc; echo def) > " file)
ARGV[ARGC++] = file
}
close("mktemp")
}
{ print FILENAME, NR, $0 }
END {
if (file!="") {
system("rm -f \"" file "\"")
}
}
$ awk -f tst.awk
/tmp/tmp.ooAfgMNetB 1 abc
/tmp/tmp.ooAfgMNetB 2 def
but honestly, I wouldn't do it. You're munging what the shell is good at (creating/destroying files and processes) with what awk is good at (manipulating text).
I believe what you're looking for is getline:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ print line} }' <<< ''
abc
def
Adjusting the answer to your second example:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ counter++; if ( line ~ /d/){print counter":"line} } }' <<< ''
2:def
Let's break it down:
awk '{
    cmd = "echo abc; echo def"
    # the line below creates a line variable containing the output of cmd
    while ( ( cmd | getline line) > 0){
        # we need a counter because NR will not work for us
        counter++;
        # if the line contains the letter d
        if ( line ~ /d/){
            print counter":"line
        }
    }
}' <<< ''
2:def
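Since nothing useful is read from stdin in this approach, the same loop can also be sketched inside a BEGIN block, which removes the need for the empty here-string. The helper name numbered_matches is hypothetical, and the close(cmd) call is an addition that is not part of the answer above; it releases the pipe once the output has been consumed.

```shell
# numbered_matches: run a command from inside awk and print the matching
# lines with their line number (command and pattern are those from the question).
numbered_matches() {
  awk 'BEGIN {
    cmd = "echo abc; echo def"
    while ((cmd | getline line) > 0) {
      n++                          # own counter, independent of NR
      if (line ~ /d/) print n ":" line
    }
    close(cmd)
  }'
}

numbered_matches   # prints 2:def
```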

AWK - execute string as command?

This command prints:
$ echo "123456789" | awk '{ print substr ($1,1,4) }'
1234
Is it possible to execute a string as command? For example, this command:
echo "123456789" | awk '{a="substr"; print a ($1,1,4) }'
Result:
$ echo "123456789" | awk '{a="substr"; print a ($1,1,4) }'
awk: {a="substr"; print a ($1,1,4) }
awk: ^ syntax error
EDIT:
$ cat tst.awk
function my_substr(x,y,z) { return substr(x,y,z) }
{ a="my_substr"; print @a($1,1,4) }
bolek@bolek-desktop:~/Pulpit$ echo "123456789" | gawk -f tst.awk
gawk: tst.awk:3: { a="my_substr"; print @a($1,1,4) }
gawk: tst.awk:3: ^ invalid char '@' in expression
bolek@bolek-desktop:~/Pulpit$
I don't think it is possible to do that in awk directly, but you can get a similar effect by using the shell. Recall that the awk program is given as a string, and strings are concatenated in the shell just by writing them next to one another. Thus, you can do this:
a=substr
echo "123456789" | awk '{ print '"$a"'($1, 1, 4) }'
resulting in 1234.
You can call user-defined functions via variables in GNU awk using indirect function calls, see http://www.gnu.org/software/gawk/manual/gawk.html#Indirect-Calls
$ cat tst.awk
function foo() { print "foo() called" }
function bar() { print "bar() called" }
BEGIN {
    the_func = "foo"
    @the_func()
    the_func = "bar"
    @the_func()
}
$ gawk -f tst.awk
foo() called
bar() called
Unfortunately due to internal implementation issues, if you want to call builtin functions that way then you need to write a wrapper for each:
$ cat tst.awk
function my_substr(x,y,z) { return substr(x,y,z) }
{ a="my_substr"; print @a($1,1,4) }
$ echo "123456789" | gawk -f tst.awk
1234