CMake: what is the purpose of matching a variable against itself?

I saw the following piece of CMake code in the definition of the CHECK_C_SOURCE_COMPILES macro:
IF("${VAR}" MATCHES "^${VAR}$")
...
What is the purpose of this code and wouldn't it always succeed?

From the CMake mailing list:
This is definitely not always true.
The variable you are testing may contain an "un-evaluated" variable reference
or some special regex characters (*, ?, ...).
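The point about special regex characters can be illustrated with a minimal sketch, meant to be run in script mode with cmake -P. The values HAVE_FOO and a+b are made up for illustration and do not come from the CHECK_C_SOURCE_COMPILES macro itself.

```cmake
cmake_minimum_required(VERSION 3.15)

# A value without regex metacharacters matches itself.
set(VAR "HAVE_FOO")
if("${VAR}" MATCHES "^${VAR}$")
  message("HAVE_FOO matches itself")
endif()

# '+' is a regex quantifier, so the pattern ^a+b$ matches "ab", "aab", ...
# but not the literal string "a+b".
set(VAR "a+b")
if("${VAR}" MATCHES "^${VAR}$")
  message("a+b matches itself")
else()
  message("a+b does NOT match itself")
endif()
```

With the second value the else() branch is taken, so the test is not a tautology.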

Related

CMake - How does the if() command treat a symbol? As string or as variable?

I am not sure whether the CMake if() command treats a symbol in the condition clause as a variable or as a string literal, so I did some experiments.
Script1.cmake
cmake_minimum_required(VERSION 3.15)
set(XXX "YYY") #<========== HERE!!
if(XXX STREQUAL "XXX")
message("condition 1 is true") # If reach here, XXX is treated as string
elseif(XXX STREQUAL "YYY")
message("condition 2 is true") # If reach here, XXX is treated as variable
endif()
The output is:
condition 2 is true
So I came to conclusion 1 below.
For a symbol in the condition clause:
If the symbol is defined as a variable before, CMake will treat it as variable and use its value for evaluation.
If the symbol is not defined as a variable before, CMake will treat it literally as a string.
Then I did another experiment.
set(ON "OFF")
if(ON)
message("condition 3 is true") # If reach here, ON is treated as a constant.
else()
message("condition 4 is true") # If reach here. ON is treated as a variable.
endif()
The output is:
condition 3 is true
So, although ON is explicitly defined as a variable, the if() command still treats it as a constant with a TRUE value. This directly contradicts my conclusion 1.
So how can I know for sure whether the CMake if() command will treat a symbol as a string or as a variable?
ADD 1 - 11:04 AM 7/11/2019
It seems the if(<constant>) form takes precedence over the other forms of the if() statement. (src)
if(<constant>)
True if the constant is 1, ON, YES, TRUE, Y, or a non-zero number.
False if the constant is 0, OFF, NO, FALSE, N, IGNORE, NOTFOUND, the
empty string, or ends in the suffix -NOTFOUND. Named boolean constants
are case-insensitive. If the argument is not one of these specific
constants, it is treated as a variable or string and the following
signature is used.
So for now, I have to refer to the above rule first before applying my conclusion 1.
(This may be an answer, but I am not sure enough yet.)
Welcome to the wilderness of CMake symbol interpretation.
If the symbol exists as a variable, then the expression is evaluated with the value of the variable. Otherwise, the name of the variable (or literal, as you said) is evaluated instead.
The behavior becomes a little more consistent if you add the ${ and } sequences. Then the value of the variable is substituted before if() evaluates the condition. If the variable doesn't exist or has not been assigned a value, the reference expands to an empty string, which if() treats as false, just like the false constants you quoted in the latter part of your post.
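A minimal sketch of the difference between unquoted, quoted, and ${}-expanded arguments follows. It assumes policy CMP0054 is NEW (which cmake_minimum_required(VERSION 3.15) implies), and NOT_SET_ANYWHERE is a hypothetical name chosen because nothing defines it.

```cmake
cmake_minimum_required(VERSION 3.15)  # sets policy CMP0054 to NEW

set(XXX "YYY")

# Unquoted and defined as a variable: if() compares the variable's value.
if(XXX STREQUAL "YYY")
  message("unquoted XXX was dereferenced")
endif()

# Quoted: with CMP0054 NEW, if() compares the literal string "XXX".
if("XXX" STREQUAL "YYY")
  message("quoted XXX was dereferenced")
else()
  message("quoted XXX stayed a literal string")
endif()

# A reference to an undefined variable expands to an empty string.
if("${NOT_SET_ANYWHERE}" STREQUAL "")
  message("undefined variable expanded to an empty string")
endif()
```

Quoting both sides and expanding variables explicitly with ${} is one way to make the intent visible to the reader, at the cost of some extra typing.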
I believe this is done this way for backwards compatibility, which CMake is really good about. For most of the quirky things CMake does, it's usually in the name of backwards compatibility.
As for the inconsistent behavior you mentioned with the "ON" variable, this is probably due to the order in which CMake processes the command arguments. I would guess that the constants are parsed before the symbol lookup occurs.
So when it comes to knowing/predicting how an if statement will evaluate, my best answer is experience. The CMake source tree and logic is one magnificent, nasty beast.
There have been discussions about adding an alternative language (perhaps one with a functional paradigm), but it's quite a large undertaking.

Numeric only variable name in CMake

What was the reason for allowing numeric-only variable names in CMake?
It makes the following code frustrating (the if condition becomes true):
set(1 3)
set(2 3)
if (1 EQUAL 2)
MESSAGE( "hi there" )
endif()
An even more likely usage (the if condition also becomes true):
set(1 2)
... # later on, or even in the other file:
set(var1 1)
if (${var1} EQUAL 2)
MESSAGE( "hi there" )
endif()
PS: I understand why variable references without ${} are allowed inside IF/WHILE. But the possibility of numeric-only variable names makes using IF more error-prone...
Answer from Brad King on the CMake issue tracker:
For reference, variable names are arbitrary strings, e.g.
set(var "almost anything here")
set("${var}" value)
message(STATUS "${${var}}")
Allowing numeric-only names is a side effect of that.
Certainly they can be used in confusing ways. Disallowing them, even
if only for if() evaluation, would require a policy.
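As a hedged illustration of how quoting sidesteps the surprise in the question, the sketch below assumes policy CMP0054 is NEW (implied by the version requirement chosen here); under that policy, quoted arguments to if() are not looked up as variable names.

```cmake
cmake_minimum_required(VERSION 3.15)  # sets policy CMP0054 to NEW

set(1 3)
set(2 3)

# Unquoted: both 1 and 2 are defined variables, so this compares 3 with 3.
if(1 EQUAL 2)
  message("unquoted: 1 EQUAL 2 is true")
endif()

# Quoted: the arguments are taken literally, so this compares 1 with 2.
if("1" EQUAL "2")
  message("quoted: 1 EQUAL 2 is true")
else()
  message("quoted: 1 EQUAL 2 is false")
endif()
```

The same applies to the second example in the question: writing if("${var1}" EQUAL "2") keeps the expanded value from being treated as a variable name.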

Does the "@" symbol have a special meaning when surrounding a CMake variable?

While looking through the ITK source code I've come across a number of files like this, which have the suffix .cmake.in and which define a number of variables (strings?), where the value is identical to the variable name, but with @ symbols prepended/appended. For example:
set(ExternalData_OBJECT_STORES "@ExternalData_OBJECT_STORES@")
What is the purpose of these declarations? Does the @ symbol have a special meaning in this context? I tried searching for this in the CMake Language Syntax Wiki, but there were no occurrences of @ on the page.
Files with the suffix .in are usually intended for configuration via the configure_file command. All sequences of the form @NAME@ within such files are replaced with the value of the variable NAME.
Outside of configure_file / string(CONFIGURE), the @ symbol has no special meaning in CMake.
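The substitution can be tried without a template file by using string(CONFIGURE), which applies the same rules as configure_file to a string. CMake's placeholder syntax for this is @NAME@. In this small sketch the value /path/to/store is a made-up placeholder, not something taken from ITK.

```cmake
cmake_minimum_required(VERSION 3.15)

# Hypothetical value standing in for whatever the configuring project sets.
set(ExternalData_OBJECT_STORES "/path/to/store")

# The input string plays the role of one line of a .cmake.in template.
# @ONLY restricts substitution to @NAME@ and leaves ${NAME} untouched.
string(CONFIGURE
  "set(ExternalData_OBJECT_STORES \"@ExternalData_OBJECT_STORES@\")"
  configured_line
  @ONLY)

message("${configured_line}")
# Prints: set(ExternalData_OBJECT_STORES "/path/to/store")
```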

What's the difference between parenthesis $() and curly bracket ${} syntax in Makefile?

Are there any differences between invoking variables with the syntax ${var} and $(var)? For instance, in the way the variable will be expanded or anything?
There's no difference – they mean exactly the same (in GNU Make and in POSIX make).
I think that $(round brackets) look tidier, but that's just personal preference.
(Other answers point to the relevant sections of the GNU Make documentation, and note that you shouldn't mix the syntaxes within a single expression)
The Basics of Variable References section of the GNU make documentation states no difference:
To substitute a variable's value, write a dollar sign followed by the
name of the variable in parentheses or braces: either $(foo) or
${foo} is a valid reference to the variable foo.
As already correctly pointed out, there is no difference, but be wary of mixing the two kinds of delimiters, as it can lead to cryptic errors like the one in the GNU make example by unomadh.
From the GNU make manual on the Function Call Syntax (emphasis mine):
[…] If the arguments themselves contain other function calls or variable references, it is wisest to use the same kind of delimiters for all the references; write $(subst a,b,$(x)), not $(subst a,b,${x}). This is because it is clearer, and because only one type of delimiter is matched to find the end of the reference.
The ${} style lets you test the make rules in the shell, if you have the corresponding environment variables set, since that is compatible with bash.
Actually, it seems to be fairly different:
, = ,
list = a,b,c
$(info $(subst $(,),-,$(list))_EOL)
$(info $(subst ${,},-,$(list))_EOL)
outputs
a-b-c_EOL
md/init-profile.md:4: *** unterminated variable reference. Stop.
But so far I have only found this difference when the variable name inside ${...} itself contains a comma. I first thought ${...} was expanding the comma not as part of the value, but it turns out I'm not able to hack it this way. I still don't understand this... If anyone has an explanation, I'd be happy to know!
It makes a difference if the expression contains unbalanced brackets:
${info ${subst ),(,:-)}}
$(info $(subst ),(,:-)))
->
:-(
*** insufficient number of arguments (1) to function 'subst'. Stop.
For variable references, this makes a difference for functions, or for variable names that contain brackets (a bad idea).

Using CMake's include_directories command with white spaces

I am using CMake to build my project and I have the following line:
include_directories(${LLVM_INCLUDE_DIRS})
which, after evaluating LLVM_INCLUDE_DIRS, evaluates to:
include_directories(C:\Program Files\LLVM\include)
The problem is that this is being considered two include directories, "C:\Program" and "Files\LLVM\include".
Any idea how I can solve this problem? I tried using quotation marks, but it didn't work.
EDIT: It turned out that the problem is in the file llvm-3.0\share\llvm\cmake\LLVMConfig.cmake. I enclosed the following paths with quotation marks and the problem was solved:
set(LLVM_INSTALL_PREFIX C:/Program Files/LLVM)
set(LLVM_INCLUDE_DIRS ${LLVM_INSTALL_PREFIX}/include)
set(LLVM_LIBRARY_DIRS ${LLVM_INSTALL_PREFIX}/lib)
In CMake,
whitespace is a list separator (like ;),
evaluating variable names basically replaces the variable name with its content and
\ is an escape character (to get the symbol, it needs to be escaped as well)
So, in your example, include_directories(C:\\Program Files\\LLVM\\include) is the same as
include_directories( C:\\Program;Files\\LLVM\\include)
that is, a list with two items. To avoid this, either
escape the whitespace as well:
include_directories( C:\\Program\ Files\\LLVM\\include) or
surround the path with quotation marks:
include_directories( "C:\\Program Files\\LLVM\\include")
Obviously, the second option is the better choice as it is
simpler and easier to read and
can be used with variable evaluation like in your example (since the result of the evaluation is then surrounded by quotation marks and thus treated as a single item)
include_directories("${LLVM_INCLUDE_DIRS}")
This works as well, if LLVM_INCLUDE_DIRS is a list of multiple directories because the items in this list will then be explicitly separated by ; so that there is no need for unquoted whitespace as implicit list item separator.
Side note:
When using hard-coded path names (for whatever reason) in my CMake files, I usually use forward slashes as directory separators, as this works on Windows as well and avoids the need to escape all backslashes.
This is more likely to be an error at the point where LLVM_INCLUDE_DIRS is set rather than a problem with include_directories.
To check this, try calling include_directories("C:\\Program Files\\LLVM\\include") - it should work correctly.
The problem seems to be that LLVM_INCLUDE_DIRS was constructed without using quotation marks. Try for example running this:
set(LLVM_INCLUDE_DIRS C:\\Program Files\\LLVM\\include)
message("${LLVM_INCLUDE_DIRS}")
set(LLVM_INCLUDE_DIRS "C:\\Program Files\\LLVM\\include")
message("${LLVM_INCLUDE_DIRS}")
The output is:
C:\Program;Files\LLVM\include
C:\Program Files\LLVM\include
Note the semi-colon in the first output line. This is a list with 2 items.
So the way to fix this is to modify the way in which LLVM_INCLUDE_DIRS is created.
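A quick way to confirm how many list items a variable holds is list(LENGTH). The sketch below uses forward slashes to avoid backslash escaping; the variable names UNQUOTED_DIR and QUOTED_DIR are hypothetical and only serve the illustration.

```cmake
cmake_minimum_required(VERSION 3.15)

# Unquoted: the space splits the path into two arguments,
# which set() stores as the two-item list "C:/Program;Files/LLVM/include".
set(UNQUOTED_DIR C:/Program Files/LLVM/include)

# Quoted: the whole path is a single argument and a single list item.
set(QUOTED_DIR "C:/Program Files/LLVM/include")

list(LENGTH UNQUOTED_DIR unquoted_count)
list(LENGTH QUOTED_DIR quoted_count)

message("unquoted: ${unquoted_count} item(s)")  # unquoted: 2 item(s)
message("quoted: ${quoted_count} item(s)")      # quoted: 1 item(s)
```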