CMake function to convert string to C string literal

Is there a built-in function to convert a string to a C string literal? For example:
set(foo [[Hello\ World"!\]])
convert_to_cstring_literal(bar "${foo}")
message("${bar}") # Should print (including quotes): "Hello\\ World\"!\\"
I could do this with regexes at considerable effort, but a built-in function would be a lot nicer.

So, I actually gave up on this and used a different trick: C++ raw string literals. It's not 100% guaranteed of course, so don't use it on untrusted input (not sure why you would have any in CMake though). But it should be fine for most purposes.
set(foo "R\"#?#:#?#(${foo})#?#:#?#\"")

Turning my comment into an answer
By slightly modifying CMake's _cpack_escape_for_cmake function from CPack.cmake, I was able to successfully test the following:
cmake_minimum_required(VERSION 2.8)
project(CStringLiteral)
function(convert_to_cstring_literal var value)
string(REGEX REPLACE "([\\\$\"])" "\\\\\\1" escaped "${value}")
set("${var}" "\"${escaped}\"" PARENT_SCOPE)
endfunction()
set(foo [[Hello\ World"!\]])
convert_to_cstring_literal(bar "${foo}")
message("${bar}") # prints "Hello\\ World\"!\\"
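Note that the regex above also escapes $, which CMake needs but C does not: \$ is not a standard C escape sequence. Here is a minimal sketch of a variant that escapes only backslashes, double quotes, and newlines. The function name to_c_string_literal is hypothetical, and other control characters are left untouched.

```cmake
# Hypothetical helper: escapes only what a C string literal needs.
function(to_c_string_literal out_var value)
  # Double every backslash first, so later escapes are not doubled again.
  string(REPLACE "\\" "\\\\" escaped "${value}")
  # Escape double quotes.
  string(REPLACE "\"" "\\\"" escaped "${escaped}")
  # Turn embedded newlines into the two-character sequence \n.
  string(REPLACE "\n" "\\n" escaped "${escaped}")
  set("${out_var}" "\"${escaped}\"" PARENT_SCOPE)
endfunction()

set(foo [[Hello\ World"!\]])
to_c_string_literal(bar "${foo}")
message("${bar}") # prints "Hello\\ World\"!\\"
```

Using string(REPLACE) rather than a regex keeps each step literal, so there is only one level of escaping to reason about.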

Related

CMake match beginning of string

I am getting some compile definitions from an external library. Unfortunately, they provide a list that sometimes starts with a leading semi-colon. For example:
;-Dfoo;-Dbar
I think this is crashing the build command later in the process. I thought that I could simply remove potential leading semi-colons with this regex:
string(REGEX REPLACE "^;" "" stripped_defs ${defs})
but the problem is that CMake seems to be ignoring the caret ^, which signifies the start of the string, with the consequence that all semi-colons are deleted. That is, I am getting the output
-Dfoo-Dbar
when I want
-Dfoo;-Dbar
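As Sergei points out, the problem is that my defs variable was being interpreted as a list, not a string. Unquoted, ${defs} expands into one argument per list element (empty elements are dropped), and string(REGEX REPLACE) concatenates all of its input arguments before matching. The semicolons were therefore already gone before the regex ran. All I need to do to force the string interpretation is to add quotes. Specifically, instead of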
string(REGEX REPLACE "^;" "" stripped_defs ${defs})
I should have had
string(REGEX REPLACE "^;" "" stripped_defs "${defs}")
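The difference can be reproduced in a small script (run with cmake -P). This is only a sketch; the output variable names unquoted and quoted are made up for the illustration.

```cmake
set(defs ";-Dfoo;-Dbar")

# Unquoted: ${defs} expands to the separate arguments -Dfoo and -Dbar,
# which string() concatenates before applying the regex.
string(REGEX REPLACE "^;" "" unquoted ${defs})

# Quoted: the regex sees the whole string, including its semicolons.
string(REGEX REPLACE "^;" "" quoted "${defs}")

message("unquoted: ${unquoted}") # prints unquoted: -Dfoo-Dbar
message("quoted: ${quoted}")     # prints quoted: -Dfoo;-Dbar
```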
Rather than using a regular expression in this case, my preferred approach would be to use list operations to delete empty elements:
set(stripped_defs ${defs})
list(REMOVE_ITEM stripped_defs "")
This may involve one more command, but it's easier to understand what the snippet does.

CMake syntax: how to negate if(<constant>) and if(<variable|string>)

CMake's if command [1] supports several signatures, starting with
if(<constant>)
if(<variable|string>)
if(NOT <expression>)
How to negate the first two?
If the CMake documentation is correct (which in my experience is far from certain), then my question boils down to:
How to convert a constant, a variable, or a string X into an expression, with the additional requirement that X is to be evaluated as a boolean?
[1] https://cmake.org/cmake/help/latest/command/if.html
Actually, <expression> is just a placeholder for any parameter, which can be passed to if. Even the list of possible if constructions is titled as "Possible expressions are".
if(NOT <constant>) # Negates 'if(<constant>)'
if(NOT <variable|string>) # Negates 'if(<variable|string>)'
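As a small illustration of both negated forms (the variable name MY_FLAG is made up for this sketch):

```cmake
# Negating a constant: OFF is a false constant, so NOT OFF is true.
if(NOT OFF)
  message("NOT <constant>: branch taken")
endif()

# Negating a variable: an empty value is false, so NOT MY_FLAG is true.
set(MY_FLAG "")
if(NOT MY_FLAG)
  message("NOT <variable>: branch taken")
endif()
```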

How do I convert a CMake semicolon-separated list to newline-separated?

E.g:
set (txt "Hello" "There" "World")
# TODO
message ("${txt}") # Prints "Hello\nThere\nWorld" (i.e. each list item on a new line)
What do I put in place of TODO?
CMake's lists are semicolon-delimited. So "Hello" "There" "World" is internally represented as Hello;There;World. So a simple solution is to replace semicolons with newlines:
string (REPLACE ";" "\n" txt "${txt}")
This works in this example; however, let's try a more complicated example:
set (txt "" [[\;One]] "Two" [[Thre\;eee]] [[Four\\;rrr]])
The [[ ]] is a raw string so the \'s are passed through into CMake's internal representation of the list unchanged. The internal representation is: ;\;One;Two;Thre\;eee;Four\\;rrr. We'd expect it to print:
<blank line>
;One
Two
Thre;eee
Four\\;rrr
I'm not actually 100% sure about the Four\\;rrr one but I think it is right. Anyway with our naive implementation we actually get this:
<blank line>
\
One
Two
Thre\
eee
Four\\
rrr
That's because the plain replacement also converts semicolons that are escaped. The solution is to use a regex:
string (REGEX REPLACE "([^\\\\]);" "\\1\n" txt "${txt}")
I.e. only replace ; if it is preceded by a non-\ character (and capture that character so it can be put back in the replacement). This almost works, but it doesn't handle the first empty element because the semicolon isn't preceded by anything. The final answer is to allow the start of string too:
string (REGEX REPLACE "(^|[^\\\\]);" "\\1\n" txt "${txt}")
Oh and the \\\\ is because one level of escaping is removed by CMake processing the string literal, and another by the regex engine. You could also do this:
string (REGEX REPLACE [[(^|[^\\]);]] "\\1\n" txt "${txt}")
But I don't think that is clearer.
Maybe there is a simpler method than this but I couldn't find it. Anyway, that, Ladies and Gentlemen, is why you should never use strings as your only type, or do in-band string delimiting. Still, could have been worse - at least they didn't use spaces as a separator like Bash!
I just wanted to add some alternatives that rely on the fact that message() appends a newline by itself:
Using foreach() to iterate over the list:
set (txt "Hello" "There" "World")
foreach(line IN LISTS txt)
message("${line}")
endforeach()
A function()-based alternative I came up with looks more complicated:
function(message_cr line)
message("${line}")
if (ARGN)
message_cr(${ARGN})
endif()
endfunction()
set(txt "Hello" "There" "World")
message_cr(${txt})
The more generalized version of those approaches would look like:
foreach() with strings
set(txt "Hello" "There" "World")
foreach(line IN LISTS txt)
string(APPEND multiline "${line}\n")
endforeach()
message("${multiline}")
function() with strings
function(stringify_cr var line)
if (ARGN)
stringify_cr(${var} ${ARGN})
endif()
set(${var} "${line}\n${${var}}" PARENT_SCOPE)
endfunction()
set(txt "Hello" "There" "World")
stringify_cr(multiline ${txt})
message("${multiline}")
If you don't like the additional newline at the end add string(STRIP "${multiline}" multiline).
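For lists without escaped semicolons, CMake 3.12 and later also offer list(JOIN), which does the same in one command and adds no trailing newline. A minimal sketch, reusing the variable names from above:

```cmake
cmake_minimum_required(VERSION 3.12) # list(JOIN) was added in CMake 3.12

set(txt "Hello" "There" "World")

# Glue the list elements together with a newline between them.
list(JOIN txt "\n" multiline)

message("${multiline}")
```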

Strip filename (shortest) extension by CMake (get filename removing the last extension)

get_filename_component can be used to remove/extract the longest extension.
EXT = File name longest extension (.b.c from d/a.b.c)
NAME_WE = File name without directory or longest extension
I have a file with a dot in its name, so I need the shortest extension:
set(MYFILE "a.b.c.d")
get_filename_component(MYFILE_WITHOUT_EXT ${MYFILE} NAME_WE)
message(STATUS "${MYFILE_WITHOUT_EXT}")
reports
-- a
but I want
-- a.b.c
What is the preferred way to find the file name without the shortest extension?
I would do:
string(REGEX REPLACE "\\.[^.]*$" "" MYFILE_WITHOUT_EXT ${MYFILE})
The regular expression matches a dot (\\., see next paragraph), followed by any number of characters that is not a dot [^.]* until the end of the string ($), and then replaces it with an empty string "".
The dot metacharacter (which in a regular expression normally means "match any character") needs to be escaped with a \ to be interpreted as a literal dot. However, in CMake string literals (as in C string literals), \ is itself a special character and needs to be escaped as well. Therefore you obtain the odd-looking sequence \\.
Note that (almost all) metacharacters do not need to be escaped within a Character Class: therefore we have [^.] and not [^\\.].
Finally, note that this expression is safe also if there's no dot in the filename analyzed (the output corresponds to the input string in that case).
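A quick way to check this behaviour is to run the expression over a few sample names. The names below are made up for the illustration.

```cmake
foreach(name IN ITEMS "a.b.c.d" "archive.tar.gz" "README")
  # Remove the last dot and everything after it, if there is a dot.
  string(REGEX REPLACE "\\.[^.]*$" "" stem "${name}")
  message("${name} -> ${stem}")
endforeach()
# prints:
# a.b.c.d -> a.b.c
# archive.tar.gz -> archive.tar
# README -> README
```

Two caveats follow from the pattern itself. A name that starts with its only dot (such as .bashrc) is reduced to an empty string, and a full path whose last dot sits in a directory name (such as dir.d/file) is cut at that directory, so it is safer to apply the expression to the file name component only.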
As of CMake 3.14 it is possible to do this with get_filename_component directly.
NAME_WLE: File name without directory or last extension
set(INPUT_FILE a.b.c.d)
get_filename_component(OUTPUT_FILE_WE ${INPUT_FILE} NAME_WE)
get_filename_component(OUTPUT_FILE_WLE ${INPUT_FILE} NAME_WLE)
OUTPUT_FILE_WE would be set to a, and OUTPUT_FILE_WLE would be set to a.b.c.
I'd solve this with a simple regex:
string(REGEX MATCH "^(.*)\\.[^.]*$" dummy ${MYFILE})
set(MYFILE_WITHOUT_EXT ${CMAKE_MATCH_1})
In CMake 3.20 and greater, the cmake_path command now provides an elegant solution:
cmake_path(GET <path-var> STEM [LAST_ONLY] <out-var>)
where STEM refers to the portion of the filename before the extension. The cmake_path command supersedes the get_filename_component command.
So, in your example a.b.c.d, the following code grabs the stem of the filename:
cmake_path(GET MYFILE STEM MYFILE_WITHOUT_EXT)
message(STATUS ${MYFILE_WITHOUT_EXT})
which yields:
a
With LAST_ONLY, however, only the last extension is treated as the file extension:
cmake_path(GET MYFILE STEM LAST_ONLY MYFILE_WITHOUT_EXT)
message(STATUS ${MYFILE_WITHOUT_EXT})
which yields:
a.b.c

When should I wrap variables with ${...} in CMake?

I wonder why often variables in CMake are wrapped with a dollar sign and curly brackets. For example, I saw this call in a CMake tutorial.
include_directories(${PROJECT_BINARY_DIR})
But from what I tried, this does the same thing.
include_directories(PROJECT_BINARY_DIR)
When is the wrapping with ${...} needed and what does it mean? Why are variables often wrapped with this even if it makes no difference?
Quoting the CMake documentation:
A variable reference has the form ${variable_name} and is evaluated
inside a Quoted Argument or an Unquoted Argument. A variable reference
is replaced by the value of the variable, or by the empty string if
the variable is not set.
In other words, writing PROJECT_BINARY_DIR refers, literally, to the string "PROJECT_BINARY_DIR". Encapsulating it in ${...} gives you the contents of the variable with the name PROJECT_BINARY_DIR.
Consider:
set(FOO "Hello there!")
message(FOO) # prints FOO
message(${FOO}) # prints Hello there!
As you have probably guessed already, include_directories(PROJECT_BINARY_DIR) simply attempts to add a subdirectory literally named PROJECT_BINARY_DIR to the include directories. Most compilers silently ignore an include directory that does not exist, which might have given you the impression that it works as expected.
A popular source of confusion comes from the fact that if() does not require explicit dereferencing of variables:
set(FOO TRUE)
if(FOO)
message("Foo was set!")
endif()
Again the documentation explains this behavior:
if(<constant>)
True if the constant is 1, ON, YES, TRUE, Y, or a non-zero number. False if the constant is 0, OFF, NO, FALSE, N, IGNORE, NOTFOUND, the
empty string, or ends in the suffix -NOTFOUND. Named boolean constants
are case-insensitive. If the argument is not one of these constants,
it is treated as a variable.
if(<variable>)
True if the variable is defined to a value that is not a false constant. False otherwise. (Note macro arguments are not variables.)
In particular, one can come up with weird examples like:
unset(BLA)
set(FOO "BLA")
if(FOO)
message("if(<variable>): True")
else()
message("if(<variable>): False")
endif()
if(${FOO})
message("if(<constant>): True")
else()
message("if(<constant>): False")
endif()
This takes the TRUE branch in the first case and the FALSE branch in the second. In the first case, FOO is looked up as a variable, and its value BLA is not a false constant. In the second case, ${FOO} is expanded to BLA before if() sees it; since BLA is not a boolean constant, CMake then looks for a variable named BLA, which is not defined, hence we end up in the FALSE branch.
It's per-case and poorly defined. You just have to look it up.
There are other places besides if() where you don't have to use ${} to get at the contents of a variable. Yikes.
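A few such places, as a small sketch (the variable names items, count, and item are made up for the illustration): commands that operate on a variable take its name, while commands that expect a value need the explicit reference.

```cmake
set(items a b c)

# list() takes the name of the list variable, not its contents.
list(LENGTH items count)
message("count = ${count}") # prints count = 3

# foreach(... IN LISTS ...) also takes variable names.
foreach(item IN LISTS items)
  message("item: ${item}")
endforeach()

# if(DEFINED ...) checks a variable by name.
if(DEFINED items)
  message("items is defined")
endif()

# message() expects a value, so the reference is required here.
message("items = ${items}") # prints items = a;b;c
```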