Running awk from inside CMake execute_process

I am trying to run awk inside CMake execute_process and I am somehow failing to get the expected result. I think I must be running afoul of some escaping rules, but I am not sure what else I need to escape here. Let me demonstrate by way of an MWE.
Consider the following version.h (header guard left out for brevity):
#define LIB_MAJVER 1
#define LIB_MINVER 2
#define LIB_PATCH 3
#define LIB_BUILD 4
Now consider the following straightforward awk command:
awk '$1 ~ /^#define/ && $2 ~ /^LIB_(MAJVER|MINVER|PATCH|BUILD)$/ { print "set(${PRJNAME}_" $2 " " $3 ")" }' version.h
The output is expected to look like this (and does when executed from the shell):
set(${PRJNAME}_LIB_MAJVER 1)
set(${PRJNAME}_LIB_MINVER 2)
set(${PRJNAME}_LIB_PATCH 3)
set(${PRJNAME}_LIB_BUILD 4)
... which is what I intend to include() right after generating it from the version header.
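For reference, here is a minimal POSIX-shell sketch of the generation step on its own, outside CMake. It assumes a version.h containing the four #define lines above. The helper name gen_version_cmake and the temporary file are illustrative and not part of the original setup.

```shell
# Hypothetical helper: print one CMake set() line per LIB_* define in header "$1".
gen_version_cmake() {
    # The awk program is single-quoted, so the shell leaves $1, $2, $3 and
    # ${PRJNAME} alone; awk prints ${PRJNAME} verbatim for CMake to expand later.
    awk '$1 ~ /^#define/ && $2 ~ /^LIB_(MAJVER|MINVER|PATCH|BUILD)$/ { print "set(${PRJNAME}_" $2 " " $3 ")" }' "$1"
}

# Illustrative stand-in for version.h, written to a temporary file.
hdr=$(mktemp)
printf '#define LIB_MAJVER 1\n#define LIB_MINVER 2\n#define LIB_PATCH 3\n#define LIB_BUILD 4\n' > "$hdr"

gen_version_cmake "$hdr"
# set(${PRJNAME}_LIB_MAJVER 1)
# set(${PRJNAME}_LIB_MINVER 2)
# set(${PRJNAME}_LIB_PATCH 3)
# set(${PRJNAME}_LIB_BUILD 4)
```

Here the shell strips the single quotes before awk runs. execute_process starts the program directly, without an intermediate shell, so nothing performs that step there.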
Now consider the following CMakeLists.txt:
cmake_minimum_required(VERSION 3.1)
set(PRJNAME foobar)
project (${PRJNAME})
find_program(AWK awk mawk gawk)
if(AWK MATCHES ".+-NOTFOUND")
message(FATAL_ERROR "FATAL: awk (and mawk and gawk) could not be found (${AWK}).")
else()
execute_process(
COMMAND "${AWK}" '$1 ~ /^#define/ && $2 ~ /^LIB_(MAJVER|MINVER|PATCH|BUILD)$/\ { print "set(${PRJNAME}_" $2 " " $3 ")" }' "${CMAKE_SOURCE_DIR}/version.h"
RESULT_VARIABLE AWK_EXITCODE
OUTPUT_FILE "${CMAKE_CURRENT_BINARY_DIR}/version.cmake"
)
message(STATUS "Exit code from awk: ${AWK_EXITCODE}")
include(${CMAKE_CURRENT_BINARY_DIR}/version.cmake)
endif()
The output of execute_process is:
awk: cmd. line:1: '$1
awk: cmd. line:1: ^ invalid char ''' in expression
... which suggests that what I am trying to execute gets passed verbatim rather than arguments being stripped of their outer quotes. So I didn't even bother to escape the occurrences of $ in the awk command, which I expect have to be escaped.
Now one way to go about this would be to put the command into a shell script file (and, during development, verify that it works with the CMake versions we use). Another would be to use /bin/sh -c, but that raises the question of how to pass that one long command string ...
However, this seems cumbersome and redundant, given there should be a way to achieve what I want.
NB: I am using CMake 3.12.2 on Linux.
I should add that I tried replacing the COMMAND string with the following, to test my hypothesis that this can be done with /bin/sh -c.
COMMAND /bin/sh -c [[[ "${AWK}" '$1 ~ /^#define/ && $2 ~ /^LIB_(MAJVER|MINVER|PATCH|BUILD)$/\ { print "set(${PRJNAME}_" $2 " " $3 ")" }' "${CMAKE_SOURCE_DIR}/version.h" ]]]
The result was this error:
CMake Error at CMakeLists.txt:11:
Syntax Error in cmake code at column 180
Argument not separated from preceding token by whitespace.
-- Configuring incomplete, errors occurred!

The comment by Tsyvarev helped me figure out what was going on (reproduced here, since the actual comment may go away):
Unlike to double quote ("), single quote (') has no special meaning in CMake. This is why it separates words in '...' argument. Inside double quoted string, double quotes can be escaped with \".
So I rewrote the COMMAND as follows:
cmake_minimum_required(VERSION 3.1)
set(PRJNAME foobar)
project (${PRJNAME})
find_program(AWK awk mawk gawk)
if(AWK MATCHES ".+-NOTFOUND")
message(FATAL_ERROR "FATAL: awk (and mawk and gawk) could not be found (${AWK}).")
else()
execute_process(
COMMAND /bin/sh -c "\"${AWK}\" '$1 ~ /^#define/ && $2 ~ /^LIB_(MAJVER|MINVER|PATCH|BUILD)$/\ { print \"set(\${PRJNAME}_\" $2 \" \" $3 \")\" }' \"${CMAKE_SOURCE_DIR}/version.h\""
RESULT_VARIABLE AWK_EXITCODE
OUTPUT_FILE "${CMAKE_CURRENT_BINARY_DIR}/version.cmake"
)
message(STATUS "Exit code from awk: ${AWK_EXITCODE}")
include(${CMAKE_CURRENT_BINARY_DIR}/version.cmake)
endif()
The gist is that I pass the whole awk command to /bin/sh -c as a single double-quoted CMake argument. Every double quote inside that argument has to be escaped as \", and the $ in ${PRJNAME} has to be escaped as \$ so that CMake does not expand it itself.
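The same single-string idea can be tried in a plain shell before moving it into CMake. This is a sketch, not the author's exact command. The names prog and hdr are illustrative, and the header path is passed as a positional parameter of sh -c rather than spliced into the string, which removes one layer of quoting.

```shell
hdr=$(mktemp)
printf '#define LIB_MAJVER 1\n#define LIB_BUILD 4\n' > "$hdr"

# The awk program contains no single quotes, so it can be wrapped in single
# quotes inside the one command string that sh -c accepts.
prog='$1 ~ /^#define/ && $2 ~ /^LIB_(MAJVER|MINVER|PATCH|BUILD)$/ { print "set(${PRJNAME}_" $2 " " $3 ")" }'

# Inside the command string, "sh" becomes $0 and the header path becomes $1.
sh -c "awk '$prog' \"\$1\"" sh "$hdr"
# set(${PRJNAME}_LIB_MAJVER 1)
# set(${PRJNAME}_LIB_BUILD 4)
```

In the CMake version the shell sees the same kind of string, but CMake parses the argument first, which is why the \" and \$ escapes are needed there.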

Related

GNU awk on win-7 cmd, won't redirect output to file

If relevant: I have GNU awk 3.1.6, downloaded directly from the SourceForge location that GNU points to.
I am fetching a page of URLs using wget for Windows. After processing the incoming file, I reduce it to a single line, from which I have to extract a key value, which is quite a long string. The final line looks something like this:
<ENUM_TAG>content"href:e#5nUtw3Fc^b=tZjqpszvja$sb=Lp4YGH=+J_XuupctY9zE9=&KNWbphdFnM3=x4*A#a=W4YXZKV3TMSseQx66AHz9MBwdxY#B#&57t3%s6ZyQz3!aktRNzcWeUm*8^$B6L&rs5X%H3C3UT&BhnhXgAXnKZ7f2Luy*jYjRLLwn$P29WzuVzKVnd3nVc2AKRFRPb79gQ$w$Nea6cA!A5dGRQ6q+L7QxzCM%XcVaap-ezduw?W#YSz!^7SwwkKc"</ENUM_TAG>
I need the long string between the two " signs.
So I use this construct with awk
type processedFile | awk -F "\"" "{print $2}"
and I get the output as expected
href:e#5nUtw3Fc^b=tZjqpszvja$sb=Lp4YGH=+J_XuupctY9zE9=&KNWbphdFnM3=x4*A#a=W4YXZKV3TMSseQx66AHz9MBwdxY#B#&57t3%s6ZyQz3!aktRNzcWeUm*8^$B6L&rs5X%H3C3UT&BhnhXgAXnKZ7f2Luy*jYjRLLwn$P29WzuVzKVnd3nVc2AKRFRPb79gQ$w$Nea6cA!A5dGRQ6q+L7QxzCM%XcVaap-ezduw?W#YSz!^7SwwkKc
but when I run the same command with output redirected to a file, such as
type processedFile | awk -F "\"" "{print $2}" > tempDummy
I get this error message:
awk: cmd. line:1: fatal: cannot open file `>' for reading (Invalid argument)
I suspect the \" field separator is causing me some grief, making the last " character look like the start of an unclosed string, but I am not sure how to fix it. By the way, the same construct runs perfectly well on my CentOS box.
Any pointers are greatly appreciated. I have read all the readme files I could find, but none of them touches on output redirection.
Yes, you have problems with how the cmd parser decides where quoted areas start and end. What cmd sees is
awk -F "\"" "{print $2}" > tempDummy
       ^-^^-^          ^-------------
        1  2           3
that is, three quoted areas. As the > falls inside a quoted area, it is not handled as a redirection operator; it is an argument to the command on the right side of the pipe.
This can be solved by escaping one quote with ^ (cmd's general escape character), so that cmd pairs the remaining quotes correctly and the redirection is no longer part of the awk command:
type processedFile | awk -F ^"\"" "{print $2}" > tempDummy
                               ^^ ^..........^
Or you can reorder the command to place the redirection where it cannot interfere:
type processedFile | > tempDummy awk -F "\"" "{print $2}"
This works, but the approach may fail in other cases, because the awk code ({print $2}) now sits in an unquoted area.
There is a simpler way that avoids quote escaping altogether: instead of passing a literal quote as the argument, let awk's own string handling produce it from an escape sequence (\x22 is accepted by GNU awk; the strictly POSIX spelling is the octal \042):
type processedFile | awk -F "\x22" "{print $2}" > tempDummy
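The escape-sequence idea can be checked in a POSIX shell (an assumption here, since the question is about cmd.exe). The sketch uses the octal \042 and the -v FS= form, which POSIX defines as equivalent to -F; the sample line is a shortened, made-up stand-in for the real one.

```shell
# Shortened, illustrative version of the <ENUM_TAG> line.
line='<ENUM_TAG>content"href:abc$def"</ENUM_TAG>'

# awk turns \042 into a double quote when it processes the assignment,
# so no literal quote character has to survive the shell's parsing.
printf '%s\n' "$line" | awk -v 'FS=\042' '{ print $2 }'
# href:abc$def
```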
You were close. The issue here is that you are mixing awk redirection with cmd's.
For completeness' sake, I'm using the MSYS2 build of awk (the version should not matter for this issue):
awk --version
GNU Awk 4.2.1, API: 2.0 (GNU MPFR 4.0.1, GNU MP 6.1.2)
The Windows version is irrelevant here; this works on both Win7 and Win10.
Your command:
type processedFile | awk -F "\"" "{print $2}" > tempDummy
uses >, which you expect to be a cmd.exe redirection, but awk receives it as a file name, so you get the error: awk: cmd. line:1: fatal: cannot open file `>' for reading (Invalid argument)
1) Fixing the redirection
You can fix that by doing the redirection directly at awk:
type processedFile | awk -F "\"" "{ print $2 > "tempDummy"; }"
2) Using awk to read the file
The type command is superfluous here, as awk can read the file directly:
awk -F "\"" "{ print $2 > "tempDummy"; }" processedFile
Note: GNU utilities are case-sensitive, whereas the default file system settings on Windows are case-insensitive.
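For comparison, the awk-side redirection looks like this in a POSIX shell, where the single quotes keep the inner double quotes intact. The working directory and the file contents are illustrative; tempDummy and processedFile are the names from the question.

```shell
# Illustrative working directory and input file.
work=$(mktemp -d)
printf '%s\n' '<ENUM_TAG>content"href:abc$def"</ENUM_TAG>' > "$work/processedFile"

# awk itself opens "tempDummy" (relative to its working directory),
# so no shell-level redirection is involved.
(cd "$work" && awk -F '"' '{ print $2 > "tempDummy" }' processedFile)

cat "$work/tempDummy"
# href:abc$def
```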

Why is field separator taken into account differently if set before or after the expression?

The code print split("foo:bar", a) prints how many pieces split() produced when splitting on the field separator. Since the default field separator is a space and "foo:bar" contains none, the result is 1:
$ awk 'BEGIN{print split("foo:bar",a)}'
1
However, if the field separator is ":" then the result is obviously 2 ("foo" and "bar"):
$ awk 'BEGIN{FS=":"; print split("foo:bar", a)}'
2
$ awk -F: 'BEGIN{print split("foo:bar", a)}'
2
However, this is not the case if FS is set after the awk program:
$ awk 'BEGIN{print split("foo:bar", a)}' FS=":"
1
If I print it not in the BEGIN block but when processing a file, the FS is already taken into account:
$ echo "bla" > file
$ awk '{print split("foo:bar",a)}' FS=":" file
2
So it looks like FS set before the expression is already taken into account in the BEGIN block, while it is not if defined after.
Why is this happening? I could not find details on this in GNU Awk User's Guide → 4.5.4 Setting FS from the Command Line. I am working on GNU Awk 5.
This behaviour is not specific to GNU awk; it is specified by POSIX.
Calling convention:
The awk calling convention is the following:
awk [-F sepstring] [-v assignment]... program [argument...]
awk [-F sepstring] -f progfile [-f progfile]... [-v assignment]... [argument...]
In other words, every option (the flags -F, -v and -f) must come before the program text and any arguments. For example:
# this works
$ awk -F: '1' /dev/null
# this fails
$ awk '1' -F: /dev/null
awk: fatal: cannot open file `-F:' for reading (No such file or directory)
Field separators and assignments as options:
The Standard states:
-F sepstring: Define the input field separator. This option shall be equivalent to: -v FS=sepstring
-v assignment:
The application shall ensure that the assignment argument is in the same form as an assignment operand. The specified variable assignment shall occur prior to executing the awk program, including the actions associated with BEGIN patterns (if any). Multiple occurrences of this option can be specified.
source: POSIX awk standard
So, if you define a variable assignment or declare a field separator using the options, BEGIN will know them:
$ awk -F: -v a=1 'BEGIN{print FS,a}'
: 1
What are arguments?
The Standard states:
argument: Either of the following two types of argument can be intermixed:
file
A pathname of a file that contains the input to be read, which is matched against the set of patterns in the program. If no file operands are specified, or if a file operand is '-', the standard input shall be used.
assignment
An <snip: extremely long sentence to state varname=varvalue>, shall specify a variable assignment rather than a pathname. <snip: some extended details on the meaning of varname=varvalue> Each such variable assignment shall occur just prior to the processing of the following file, if any. Thus, an assignment before the first file argument shall be executed after the BEGIN actions (if any), while an assignment after the last file argument shall occur before the END actions (if any). If there are no file arguments, assignments shall be executed before processing the standard input.
source: POSIX awk standard
Which means that if you do:
$ awk program FS=val file
BEGIN will not know about the new definition of FS but any other part of the program will.
Example:
$ awk -v OFS="|" 'BEGIN{print "BEGIN",FS,a,""}END{print "END",FS,a,""}' FS=: a=1 /dev/null
BEGIN| ||
END|:|1|
$ awk -v OFS="|" 'BEGIN{print "BEGIN",FS,a,""}
{print "ACTION",FS,a,""}
END{print "END",FS,a,""}' FS=: a=1 <(echo 1) a=2
BEGIN| ||
ACTION|:|1|
END|:|2|
See also:
GNU awk manual, section "Other arguments", for an explanation of how GNU awk interprets the above.
Because you can set the variable individually for each file you process, and BEGIN happens before any of that.
bash$ awk '{ print NF }' <(echo "foo:bar") FS=: <(echo "foo:bar")
1
2
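A POSIX-shell variant of these examples, using a regular temporary file instead of bash process substitution (the file is illustrative). It contrasts -v, which takes effect before BEGIN, with an assignment operand, which takes effect only when awk reaches it in the argument list.

```shell
f=$(mktemp)
echo 'foo:bar' > "$f"

# -v is applied before BEGIN, so BEGIN sees the new FS.
awk -v FS=: 'BEGIN { print "[" FS "]" }'
# [:]

# BEGIN runs before any operand is processed, so FS still has its
# default value (a single space) there.
awk 'BEGIN { print "[" FS "]" }' FS=: "$f"
# [ ]

# Applied between files: the first copy is split on blanks, the second on ':'.
awk '{ print NF }' "$f" FS=: "$f"
# 1
# 2
```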

variable declaration in awk command

I am facing a problem with an awk command. I set a variable DELETION_COMMAND to rm -rf, and when I then execute the line below, it gives an error. If I set DELETION_COMMAND to just rm, it works fine.
awk '{print "'${DELETION_COMMAND}'"" ""'${COMPLETE_PATH}'""/"$1"/*"}' ${DB_FEED_FILE} > ${TEMP_FEED_FILE}
Error :
awk: {print "rm
awk: ^ unterminated string
Please suggest. Where am I wrong?
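The problem is the quoting. You put the awk program between single quotes and then close those quotes to splice in the shell variables, which leaves ${DELETION_COMMAND} unquoted. The shell therefore splits its value rm -rf at the space, and awk receives only the first piece as its program:
{print "rm
To solve this, write '"${DELETION_COMMAND}"' instead of '${DELETION_COMMAND}' (and likewise for COMPLETE_PATH), so the expansion stays one word; or, better, pass the values with -v.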
Use the -v flag to pass shell vars to awk, and don't forget the quoting:
awk -v com="${DELETION_COMMAND}" -v path="${COMPLETE_PATH}" '{ print com, path "/" $1 "/*" }' "${DB_FEED_FILE}" > "${TEMP_FEED_FILE}"
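A quick check of the -v approach with a value that contains a space. The path and directory names are made-up sample data, and the commands are only printed, never executed.

```shell
DELETION_COMMAND='rm -rf'
COMPLETE_PATH='/tmp/feeds'   # illustrative path; nothing is deleted

# Each value arrives in awk as one string, so the space in "rm -rf" is harmless.
# The comma inserts OFS (a space); the rest is plain string concatenation.
printf 'db1\ndb2\n' |
    awk -v com="$DELETION_COMMAND" -v path="$COMPLETE_PATH" '{ print com, path "/" $1 "/*" }'
# rm -rf /tmp/feeds/db1/*
# rm -rf /tmp/feeds/db2/*
```

One caveat: awk processes backslash escapes in -v values, so values containing backslashes may be altered; the POSIX ENVIRON array reads environment variables without that processing.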

awk use a command line variable

awk -F, -f awkfile.awk -v mysearch="search term"
I am trying to use the above command from the terminal, with mysearch as the search term in the awk program. My awk program runs perfectly fine when I assign the search term inside the program, but I am wondering how to get the variable mysearch to be used.
Example of the line where it is used: if($j ~ /mysearch/){. This does not use the variable as the search term; it searches for the literal string mysearch.
Just remove the slashes:
$j ~ mysearch
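A small sketch of the slash-free form together with -F and a search term supplied via -v. The input lines are made up, and since the original command names no input file, the sketch reads standard input.

```shell
printf 'a,foo bar,c\nx,y,z\n' |
    awk -F, -v mysearch='foo' '{
        # No slashes: the value of mysearch is used as a dynamic regular expression.
        for (j = 1; j <= NF; j++)
            if ($j ~ mysearch)
                print "field " j ": " $j
    }'
# field 2: foo bar
```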
This is not ideal, but I suggest writing a bash script that takes the search term, substitutes it into the awk script, and then runs the script. For example:
$ cat dosearch.sh
sed "s/XXX/$1/" awktemplate.awk > awkfile.awk
awk -f awkfile.awk data.txt
$ cat awktemplate.awk
{
j = 1
if ($j ~ /XXX/) {
# Do something, such as
print "Found:", $0
}
}
$ cat data.txt
foo here
bar there
xyz everywhere
$ ./dosearch.sh foo
Found: foo here
$ ./dosearch.sh bar
Found: bar there
In the above example, the awk template contains "XXX" as the search term; the bash script replaces it with the first parameter, then invokes awk on the modified script.
$ cat input
tinky-winky
dipsy
laa-laa
noo-noo
po
$ teletubby='po'
$ awk -v "regexp=$teletubby" '$0 ~ regexp' input
po
Note that anything could go into the shell variable, even a full-blown regexp, e.g. ^d.*y. Just make sure to use single quotes when assigning it, to prevent the shell from doing any expansion.
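For instance, the regular expression from the note can be passed straight through; the input file here is a temporary copy of the sample above. Because awk processes backslash escapes in -v values, a pattern that contains backslashes (such as a\.b) may not arrive unchanged.

```shell
input=$(mktemp)
printf 'tinky-winky\ndipsy\nlaa-laa\nnoo-noo\npo\n' > "$input"

teletubby='^d.*y'   # single quotes: the shell does not touch ^, . or *
awk -v "regexp=$teletubby" '$0 ~ regexp' "$input"
# dipsy
```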

awk won't print new line characters

I am using the below code to change an existing awk script so that I can add more and more cases with a simple command.
echo `awk '{if(/#append1/){print "pref'"$1"'=0\n" $0 "\n"} else{print $0 "\n"}}' tf.a`
Note that the first print is "pref'"$1"'=0\n", so it refers to the shell's $1 (the script's first argument), not to awk's $1.
The command ./tfb.a "c" should change the code from:
BEGIN{
#append1
}
...
to:
BEGIN{
prefc=0
#append1
}
...
However, it gives me everything on one line.
Does anyone know why this is?
If you take awk right out of the equation you can see what's going on:
# Use a small test file instead of an awk script
$ cat xxx
hello
there
$ echo `cat xxx`
hello there
$ echo "`cat xxx`"
hello
there
$ echo "$(cat xxx)"
hello
there
$
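Because the backtick substitution is unquoted, the shell splits its output into "words" at whitespace (newlines included), and echo then joins those words with single spaces. You could play around with the $IFS variable in the shell (yikes), or you could just use double quotes.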
If you're running a modern sh (e.g. ksh or bash, not the "classic" Bourne sh), you may also want to use the $() syntax (it's easier to find the matching start/end delimiter).
Do it like this: pass the variable from the shell to awk properly, using -v.
#!/bin/bash
toinsert="$1"
awk -v toinsert="$toinsert" '
/#append1/{
$0="pref"toinsert"=0\n"$0
}
{print}
' file > temp
mv temp file
output
$ cat file
BEGIN{
#append1
}
$ ./shell.sh c
$ cat file
BEGIN{
prefc=0
#append1
}