CPack: Ignoring files using regex

(Apologies: cross-posted from the CMake mailing list.)
I'm trying to get my head round CMake's regex implementation; I have a folder containing 4 folders and 2 text files as follows:
build/
projectA/
CMakeLists.txt
extrafiles/
README
temp/
One line of CMakeLists.txt is:
set(CPACK_SOURCE_IGNORE_FILES "[^projectA]$")
In the source package that is subsequently generated, build/, projectA/ and extrafiles/ are present, but temp/ and the 2 text files are not. I'm trying to get to a stage where the regex ignores everything in the folder except projectA/, README and CMakeLists.txt, but at the moment I can't work out how the regex I've supplied is giving those results.
I guess what this boils down to is how to match a whole string using a regex. I realise that the docs describe [^ ] as "Matches any character(s) not inside the brackets", which is where I guess I'm going wrong...
Further exploration
In trying to understand CMake's regex implementation, I thought I'd start from first principles and do some easy stuff.
If I do
set(CPACK_SOURCE_IGNORE_FILES projectA)
then the folder projectA doesn't appear in my source package (as expected); however, if I do
set(CPACK_SOURCE_IGNORE_FILES ^projectA$)
or
set(CPACK_SOURCE_IGNORE_FILES ^/projectA/$)
then projectA does appear. What is it about the ^ (beginning of line) and $ (end of line) that I'm not understanding?
Even more
As is probably obvious, projectA is not actually the name of my project, but everything above holds true when I physically rename my project folder to projectA. But when I replace
set(CPACK_SOURCE_IGNORE_FILES projectA)
with
set(CPACK_SOURCE_IGNORE_FILES <name of my project>)
and rename my actual project folder from projectA to its actual name, I end up with an empty tarball! Argh! I have absolutely no idea what strange tricks CMake is playing on me, but I just want to cry.
Any insight will be greatly appreciated!
SELF CONTAINED EXAMPLE
As requested by Fraser, here is a self-contained example showing 2 of the 'features' I've described. However, I do know that I'm running CMake in a slightly non-standard way, in order to keep everything to do with individual builds together, so if there's any proof that running CMake in a more standard way eliminates these problems, I'd be interested to see it.
Step 1: creating files
Create tree:
cd ~
mkdir projectA
cd projectA
mkdir projectA
Create a C file and save it as ~/projectA/projectA/helloworld.c:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    printf("!!!Hello World!!!\n"); /* prints !!!Hello World!!! */
    printf("!!!Hello CMake!!!\n"); /* prints !!!Hello CMake!!! */
    return 0;
}
Create a file that won't need compiling and save it as ~/projectA/test.sh:
# A non-compiled program
echo "Hello world!"
Create ~/projectA/CMakeLists.txt:
cmake_minimum_required (VERSION 2.6)
project (HelloWorld)
set(CMAKE_INSTALL_PREFIX "$ENV{HOME}/projectAinstall")
add_executable(helloworld projectA/helloworld.c)
install(TARGETS helloworld DESTINATION .)
include(InstallRequiredSystemLibraries)
set(CPACK_GENERATOR "TGZ")
set(CPACK_SOURCE_GENERATOR "TGZ")
include(CPack)
Step 2: compiling
In ~/projectA, run:
cmake -H. -Bbuild
then:
make -C build && make -C build package && make -C build package_source
This results in 2 tarballs in the build folder. Moving these somewhere else and untarring them shows helloworld in the binary tarball (as expected), and everything from ~/projectA/projectA in the source tarball, including test.sh, which won't get compiled (which Fraser seemed surprised about).
Step 3: random tests
Modifying CMakeLists.txt to include
set(CPACK_SOURCE_IGNORE_FILES "projectA")
and rerunning the CMake / Make commands above results in an empty source tarball, but the same binary tarball as above. I have now realised that changing the directory tree so that the top-level directory is testproject (and so different from its child folder) doesn't result in an empty source tarball, and only removes the files listed in CPACK_SOURCE_IGNORE_FILES.

I don't think you can achieve what you're after using CPACK_SOURCE_IGNORE_FILES (although I'm not certain). As you rightly noted, CMake's regex handling allows for excluding groups of characters, but I don't think it allows for negating whole patterns. [See updated answer at the end of the edits.]
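To see why "[^projectA]$" behaves oddly, it helps to remember that a bracket expression stands for a single character. The script below is a minimal sketch (run it with cmake -P; the paths are made up for illustration). It uses if(... MATCHES ...), which as far as I know shares CPack's regex syntax, to show that the pattern only asks whether a path ends in a character other than p, r, o, j, e, c, t or A:

```cmake
# Run with: cmake -P bracket_demo.cmake  (paths below are hypothetical)
cmake_minimum_required(VERSION 3.10)

# "[^projectA]" is ONE character that is not p, r, o, j, e, c, t or A;
# "$" pins that single character to the end of the string.
set(regex "[^projectA]$")

foreach(path
    "/home/chris/folder/build"      # ends in 'd'
    "/home/chris/folder/temp"       # ends in 'p'
    "/home/chris/folder/README"     # ends in 'E'
    "/home/chris/folder/projectA")  # ends in 'A'
  if(path MATCHES "${regex}")
    message(STATUS "match:    ${path}")
  else()
    message(STATUS "no match: ${path}")
  endif()
endforeach()
```

Because CPack tests every file path individually, whether a folder shows up in the package depends on the last character of each file inside it, which is likely why the results in the question look arbitrary.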
That being said, I guess you can list all the folders you wish to exclude in your install command. It's not as robust as excluding everything except "projectA", but here's the syntax anyway:
install(DIRECTORY .
        DESTINATION the_install_subdir
        # anchored to the last path component, so e.g. "test_builds" elsewhere in the path is not hit
        REGEX "/(build|extrafiles|temp)$" EXCLUDE)
Regarding the empty tarball, I suspect you have <name of my project> both as your project's root dir and as a subdir. So in your example, if you called your project "projectA", you'd have "projectA/build", "projectA/projectA", etc.
If so, the regex is applied to the full path, and since every file within your project contains projectA/ in its path, every file gets ignored.
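A quick way to check this, and to see why the ^ and $ anchors from the question never match, is to test a few absolute paths directly. This is a sketch for cmake -P with hypothetical paths under /home/chris/projectA, assuming CPack feeds absolute paths to the regexes:

```cmake
# Run with: cmake -P fullpath_demo.cmake  (paths are hypothetical)
cmake_minimum_required(VERSION 3.10)

set(paths
  "/home/chris/projectA/CMakeLists.txt"
  "/home/chris/projectA/test.sh"
  "/home/chris/projectA/projectA/helloworld.c")

# Expected counts: projectA -> 3, ^projectA$ -> 0, /projectA/projectA/ -> 1
foreach(regex "projectA" "^projectA$" "/projectA/projectA/")
  set(count 0)
  foreach(path IN LISTS paths)
    if(path MATCHES "${regex}")
      math(EXPR count "${count} + 1")
    endif()
  endforeach()
  message(STATUS "'${regex}' matches ${count} of 3 paths")
endforeach()
```

The bare name swallows every file (hence the empty tarball), the anchored form can never match a path that starts with "/", and only the slash-delimited form singles out the inner projectA folder.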
As for the crying... well, I can only advise you to get a grip and pull yourself together! :-)
Edit: In response to the comments, here's a quick example of using the install command to achieve the goal:
install(DIRECTORY projectA
DESTINATION the_install_subdir)
install(FILES CMakeLists.txt README DESTINATION the_install_subdir)
Further Edit:
OK, your example helps a lot - I had indeed misunderstood what you were doing. I hadn't picked up that you were actually making 2 different targets ("package" and "package_source"). I had thought you were creating the binary package by doing something like
cpack -G DEB
and that you were creating the other package by doing
cpack -G TGZ
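These both build the binary package; the source package comes from the package_source target (or, equivalently, cpack --config CPackSourceConfig.cmake). My mistake - I should have paid more attention. Sorry!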
As for your specific questions:
Question 1
It seems to me that installing files / directories that aren't compiled but are at the same level as the folder containing all the compiled files (i.e. bin), and then ignoring the bin folder using CPACK_SOURCE_IGNORE_FILES results in an empty tarball - is this correct?
I take this to mean: "Should doing set(CPACK_SOURCE_IGNORE_FILES "${CMAKE_BINARY_DIR}") result in an empty tarball?" The answer is probably not.
Because CPACK_SOURCE_IGNORE_FILES is a list of regexes, I'm sure there are cases where the resulting regex could match every file in the project, which would cause an empty tarball. However, I imagine that's fairly unlikely.
If, rather than using the full path to your bin dir via ${CMAKE_BINARY_DIR}, you were to give just the folder name, there would be a much greater chance of an empty tarball. Say you call your bin dir "build" and have set(CPACK_SOURCE_IGNORE_FILES "build"). If your project lived in, say, ~/test_builds/projectA, then the regex "build" would match every file in the project, since each path contains "test_builds", resulting in an empty tarball.
I think this is the crux of the issue each time you've generated an empty tarball: whatever the regex is trying to achieve, it actually ends up matching and excluding all files.
Question 2
It also seems that files in the CMAKE_SOURCE_DIR which aren't 'installed' don't end up in the binary tarball but do end up in the source tarball
Yes, "package_source" is indeed a different target from the binary "package" target. By default it contains all files in ${CMAKE_SOURCE_DIR}, whereas the "package" target contains only items added via install commands. Here, the term "source files" is probably a slight misnomer, since it means all files in the source tree, not just .c, .cc, .cxx, etc.
Original Question
I think there's a reasonably safe way to achieve your original aim after all! If you use file(GLOB ...) to generate a non-recursive list of all files/folders in your root, then remove those you wish to keep in the source package, you should be able to use the remaining list as the regex value of CPACK_SOURCE_IGNORE_FILES:
file(GLOB SourceIgnoreFiles "${CMAKE_SOURCE_DIR}/*")
set(SourceKeepFiles "${CMAKE_SOURCE_DIR}/projectA"
"${CMAKE_SOURCE_DIR}/CMakeLists.txt"
"${CMAKE_SOURCE_DIR}/README")
list(REMOVE_ITEM SourceIgnoreFiles ${SourceKeepFiles})
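# Escape any '.' characters (the backslash is doubled because the value is parsed again when cpack reads CPackSourceConfig.cmake)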
string(REPLACE "." "\\\\." SourceIgnoreFiles "${SourceIgnoreFiles}")
set(CPACK_SOURCE_IGNORE_FILES "${SourceIgnoreFiles}")
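On CMake 3.4 or newer, a variant of the snippet above can avoid the double escaping and also guard against other regex metacharacters in the path. The sketch below sets CPACK_VERBATIM_VARIABLES so that cpack receives the regexes exactly as written, escapes every character in the set []+.*()^$?| once, and anchors each entry to the start of the path. The anchoring, the escaped character set and the (/|$) suffix are my own choices, not something CPack requires:

```cmake
# Sketch for the top-level CMakeLists.txt, assuming CMake >= 3.4.
# Keeps projectA/, CMakeLists.txt and README; ignores every other
# top-level entry.
set(CPACK_VERBATIM_VARIABLES ON)  # hand CPACK_* values to cpack unchanged

file(GLOB SourceIgnoreFiles "${CMAKE_SOURCE_DIR}/*")
list(REMOVE_ITEM SourceIgnoreFiles
  "${CMAKE_SOURCE_DIR}/projectA"
  "${CMAKE_SOURCE_DIR}/CMakeLists.txt"
  "${CMAKE_SOURCE_DIR}/README")

set(CPACK_SOURCE_IGNORE_FILES "")
foreach(path IN LISTS SourceIgnoreFiles)
  # Backslash-escape regex metacharacters once; with verbatim variables
  # the value is not parsed a second time, so no doubling is needed.
  string(REGEX REPLACE "([][+.*()^$?|])" "\\\\\\1" escaped "${path}")
  # Anchor at the start and stop at "/" or end of string, so that
  # ".../build" does not also catch a sibling such as ".../build2".
  list(APPEND CPACK_SOURCE_IGNORE_FILES "^${escaped}(/|$)")
endforeach()

include(CPack)
```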
Hopefully this should now work for you. Sorry again for the misdirections.

CMake tends to use absolute paths except in contexts where there's a strong argument for using relative paths. So I'm pretty sure it's running each regex in CPACK_SOURCE_IGNORE_FILES against absolute paths of files (which should answer your question "What is it about the ^ (beginning of line) and $ (end of line) that I'm not understanding?"). Anything that isn't matched by any regex in CPACK_SOURCE_IGNORE_FILES is not ignored.
What you want is probably something like:
set(CPACK_SOURCE_IGNORE_FILES
/build/
/extrafiles/
/temp/
)
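Wrapping each name in slashes is what keeps these patterns from matching unrelated parts of the path, such as the "test_builds" case from the other answer. The cmake -P sketch below, with hypothetical paths, checks a few files against that list. Note also that setting CPACK_SOURCE_IGNORE_FILES replaces CPack's default list (which covers version-control directories such as .git), so you may want to add those entries back:

```cmake
# Run with: cmake -P slash_demo.cmake  (paths are hypothetical)
cmake_minimum_required(VERSION 3.10)

set(root "/home/chris/test_builds/projectA")
set(ignore_regexes "/build/" "/extrafiles/" "/temp/")

foreach(path
    "${root}/build/CMakeCache.txt"
    "${root}/temp/notes.txt"
    "${root}/projectA/helloworld.c"
    "${root}/README")
  set(ignored FALSE)
  foreach(regex IN LISTS ignore_regexes)
    if(path MATCHES "${regex}")
      set(ignored TRUE)
    endif()
  endforeach()
  message(STATUS "ignored=${ignored}  ${path}")
endforeach()

# By contrast, a bare "build" also hits "test_builds" in every path
set(readme "${root}/README")
if(readme MATCHES "build")
  message(STATUS "bare 'build' matches ${readme}")
endif()
```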

Related

How can I copy a directory that has symlinks in it with CMake during build time when using file(GLOB_RECURSE)?

The most straightforward way to copy a directory in CMake during build time is:
add_custom_target(my-copy-dir ALL COMMAND ${CMAKE_COMMAND} -E copy_directory
${INPUT_DIR} ${OUTPUT_DIR})
However, this approach is not desirable because the copy is executed every single time, even when everything is up to date, which slows things down. Therefore, the below approach is better:
file(
GLOB_RECURSE FILES
LIST_DIRECTORIES false
RELATIVE ${INPUT_DIR}/
${INPUT_DIR}/*)
foreach(FILE ${FILES})
add_custom_command(
OUTPUT ${OUTPUT_DIR}/${FILE}
DEPENDS ${INPUT_DIR}/${FILE}
COMMAND ${CMAKE_COMMAND} -E copy ${INPUT_DIR}/${FILE} ${OUTPUT_DIR}/${FILE})
list(APPEND ALL_OUTPUT_FILES ${OUTPUT_DIR}/${FILE})
endforeach()
add_custom_target(my-copy-dir ALL DEPENDS ${ALL_OUTPUT_FILES})
The copy will only be executed if there are changes to the files in the input directory. Most of the time this approach works perfectly; however, I recently encountered a situation where there are symlinks in the input folder, and the symlinks are ignored during the file glob, so unfortunately I have to use the first approach to copy this folder in order to preserve the symlinks. Is there a way to make the second approach work with a folder that contains symlinks?
For more context, the folder I am copying is a macOS framework. For example, assuming the name of the framework is Foo.framework, Foo.h is inside Foo.framework/Versions/A/Headers, and there's also a symlink to Foo.h in Foo.framework/Headers. If I use the second approach, I can see Foo.h in Foo.framework/Versions/A/Headers but not Foo.framework/Headers, whereas if I use the first approach I can see Foo.h in both, but I would like to make the second approach work in order to avoid copying every single time when everything is up-to-date. How can I achieve this?
Just use the FOLLOW_SYMLINKS argument of file(GLOB_RECURSE) :P
Also note that globbing always involves a tradeoff; see the CONFIGURE_DEPENDS argument. Either you don't get glob match updates at build time, or you do (in which case the glob matching has to run on every build). Take that into account in your statement "Therefore, the below approach is better".
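Putting both suggestions together, the loop from the question might look like the sketch below. It assumes CMake 3.12 or newer (for CONFIGURE_DEPENDS) and that INPUT_DIR and OUTPUT_DIR are set as in the question. One caveat worth checking: files reached through a symlinked directory should be copied as regular files, so Foo.framework/Headers/Foo.h would end up as an extra real copy of the header rather than as a link.

```cmake
# Sketch, assuming CMake >= 3.12 and INPUT_DIR / OUTPUT_DIR set as in the question.
file(GLOB_RECURSE FILES
  FOLLOW_SYMLINKS          # also descend into symlinked directories
  LIST_DIRECTORIES false
  RELATIVE "${INPUT_DIR}"
  CONFIGURE_DEPENDS        # re-check the glob at build time, re-configure if it changed
  "${INPUT_DIR}/*")

set(ALL_OUTPUT_FILES "")
foreach(FILE IN LISTS FILES)
  # One copy command per file, re-run only when that input file changes
  add_custom_command(
    OUTPUT "${OUTPUT_DIR}/${FILE}"
    DEPENDS "${INPUT_DIR}/${FILE}"
    COMMAND ${CMAKE_COMMAND} -E copy "${INPUT_DIR}/${FILE}" "${OUTPUT_DIR}/${FILE}")
  list(APPEND ALL_OUTPUT_FILES "${OUTPUT_DIR}/${FILE}")
endforeach()
add_custom_target(my-copy-dir ALL DEPENDS ${ALL_OUTPUT_FILES})
```

If the links themselves must be preserved, the copy_directory approach, or a custom command that recreates each link with cmake -E create_symlink, is probably still needed.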

Find a CMake file-generating add_custom_command example in which the DEPENDS option is necessary

I want a simple example to illustrate the DEPENDS option of a file-generating add_custom_command(OUTPUT ...); that is, if we comment out the DEPENDS part, the example should give different output or crash outright.
In the following example (there are files london and good.cpp in the current working directory), DEPENDS is dispensable:
cmake_minimum_required(VERSION 3.10)
project(Tutorial VERSION 1.0)
add_custom_command(OUTPUT foo
COMMAND cp london foo
#DEPENDS london
COMMENT "I'm testing the new method.")
add_executable(cake good.cpp foo)
I did read the documentation. I have little knowledge of build systems, either Make or CMake. The first sentence, "Specify files on which the command depends.", confuses me. I don't understand how a command depends on other files; in my casual example, the command line itself seems to locate everything. I want a CMake code example that shows how a command depends on other files, with the necessary help of DEPENDS.
The phrase in documentation
Specify files on which the command depends.
is better understandable as
Specify files on which content of the command's output file(s) depends.
As one might guess, the content of the output file of the command cp london foo depends only on london, so it is reasonable to specify the option DEPENDS london for add_custom_command.
As a build system, CMake uses the information in DEPENDS to decide whether to run the command or not. If:
OUTPUT file has already been created on previous run, and
since previous run the DEPENDS file has not been updated,
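then the command won't be run again. The reasoning is simple: there is no need to run the command if it would produce the same file(s). Conversely, without DEPENDS london CMake has no way of knowing that foo is derived from london, so after the first build, editing london would leave foo stale.
Taking into account the separation of the source (CMAKE_SOURCE_DIR) and build (CMAKE_BINARY_DIR) directories, the example could be rewritten as follows: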
cmake_minimum_required(VERSION 3.10)
project(Tutorial VERSION 1.0)
add_custom_command(
OUTPUT foo # relative path denotes a file in the build directory
COMMAND cp ${CMAKE_SOURCE_DIR}/london foo # The command is run from the build directory,
# so an absolute path is needed for the file in the source directory
DEPENDS london # relative path is resolved in the source directory,
# assuming that the corresponding file already exists
WORKING_DIRECTORY ${CMAKE_BINARY_DIR} # specifies the directory from which the COMMAND is run;
# the build directory is used by default, so the option can be omitted
COMMENT "I'm testing the new method."
)
add_executable(cake
good.cpp # relative path is resolved in the source directory,
# assuming that the corresponding file already exists
foo # because this file is absent from the source directory,
# the path is resolved relative to the build directory.
)
When the project is built for the first time, both foo and the executable are built.
When the project is built a second time (without changes to london), nothing is rebuilt.
When the london file is changed and the project is built again, foo is rebuilt (because it depends on london). As foo is rebuilt, the executable is rebuilt too, because it depends on foo.

Why does `install(DIRECTORY ... FILES_MATCHING PATTERN ...)` copy empty directories? How to exclude them?

I use a CMakeLists.txt with the following install command:
install(DIRECTORY ./ DESTINATION include FILES_MATCHING PATTERN "*.h")
It correctly installs all "./*.h" files, but also copies the "./.git" directory structure (without any files).
The problem happens when using CMake 3.14.0 and did not happen with CMake 3.11.1.
Did the command change or is this a CMake bug? Should I use an explicit exclude for ".git" or can I somehow keep the whitelist approach, that will e.g. keep working when I actually need to install subfolders?
As of now, there does not seem to be any straightforward solution other than explicitly excluding your directory. The behaviour is not new in version 3.14.0 and was the same in 3.11.1. The fact that your .git directory wasn't copied might be due to another command in your CMakeLists...
As you suggest and based on this post and this thread in the old CMake forum, a solution for you would be:
install(DIRECTORY ./ DESTINATION include FILES_MATCHING PATTERN "*.h" PATTERN ".git*" EXCLUDE)
There is a ticket to add a feature to not include empty directories when using install(DIRECTORY ...), so you might keep an eye on it for when it is finally implemented.
Alternatively, you may use nested file(GLOB ...) followed by install(FILES ...), with the inherent drawbacks of globbing (see the note in the documentation).
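A minimal sketch of that globbing alternative, assuming the headers live under the current source directory: each header is installed into the matching subdirectory of include/, so only directories that actually contain a header are created. Note that file(GLOB_RECURSE) also descends into .git and into an in-source build directory, so any .h files there would be picked up too.

```cmake
# Sketch: install only the *.h files, recreating their subdirectories, so
# no directory is created unless it actually holds a header.
file(GLOB_RECURSE headers
  RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}"
  "${CMAKE_CURRENT_SOURCE_DIR}/*.h")

foreach(header IN LISTS headers)
  # e.g. "sub/dir/foo.h" -> "sub/dir"; empty for top-level headers
  get_filename_component(subdir "${header}" DIRECTORY)
  install(FILES "${header}" DESTINATION "include/${subdir}")
endforeach()
```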

CPackRPM compresses Man pages and then cannot find them during packaging

While creating an RPM package containing a man page (whose installation path matches one of the known man page locations), CPack seems to compress it using GZip, but then complains that the original, uncompressed file cannot be found. How should that functionality be used, then?
Consider the following CMakeLists.txt project:
install(FILES test.1 DESTINATION /usr/share/man)
install(FILES test.2 DESTINATION /usr/share/man/man1)
set(CPACK_PACKAGE_NAME "CPackRPM_man_test")
set(CPACK_GENERATOR "RPM")
include(CPack)
When the line containing “test.2” is commented out, the make package operation succeeds; that is, simply packaging a file that is not destined for a true man page location does not cause any trouble. However, when the full project is processed, the following error message is output:
error: File not found: …/_CPack_Packages/Linux/RPM/CPackRPM_man_test-0.1.1-Linux/usr/share/man/man1/test.2
Indeed, that file simply does not exist:
$ cd …/_CPack_Packages/Linux/RPM/CPackRPM_man_test-0.1.1-Linux/usr/share/man
$ ls *
test.1
man1:
test.2.gz
It is worth noting that the DEB generator does not have this issue, simply because CPackDEB operates on original files only. While looking into the CPackRPM.cmake module, I was able to locate the code that messes with man page files, but not the code responsible for undoing the mess; the former code works unconditionally, and I could not spot any variable that would tell it not to compress man pages.
The only similar discussion I could find dates back to 2014 and was resolved by updating to a more current version of CMake. Since I am also using openSUSE Linux as the original reporter did (the actual release, of course), I tried to switch from the system-provided CMake 3.3.2 to a locally built 3.7.0-rc2, but that did not change a thing. Therefore, taking into account the popularity of CMake, RPM and man, I suppose that if some bug really existed, it should have been fixed long ago, so the one to blame is me. What am I missing here?
Update. Regarding the recommended use of wildcards, i.e. specifying “test.2*” instead of just “test.2”: CMake does not seem to support globbing in the install(FILES …) form. It literally searches for test.2* and fails before generating the RPM spec file. Several kinds of pattern matching are supported through another form, however:
install(DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR} DESTINATION /usr/share/man/man1 FILES_MATCHING PATTERN *.1*)
install(DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR} DESTINATION /usr/share/man/man2 FILES_MATCHING REGEX "^.+[.]2([.].+)?$")
Although that succeeds, it still does not answer the question of how to distribute man pages in uncompressed form, and it looks ugly and counterintuitive.
It is also worth showing how the RPM spec file is generated from the originally posted CMakeLists.txt:
%dir "/usr/share/man"
%dir "/usr/share/man/man1"
"/usr/share/man/man1/test.2*"
%config "/usr/share/man/test.1"
%config "/usr/share/man/man1/test.2"
As can be clearly seen, CPackRPM indeed adds an extra wildcarded version of the source file name when it belongs to a known man location; unfortunately, it still retains the original (uncompressed) file name, which leads to the failure. Perhaps CMake should either warn the user of the consequences or even fail early at the configuration stage rather than concealing the problem until later.
Using a wildcard after a man page entry in %files will package the file regardless of compression. All I'm saying is to type
/usr/share/man/man1/bash.1*
instead of
/usr/share/man/man1/bash.1.gz
in the %files section of a spec file.
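Another avenue is suggested by the spec excerpt in the question: the failing entries are the %config lines, which CPackRPM emits for files installed to absolute destinations. This is an untested sketch, assuming that relative destinations (which the RPM generator places under CPACK_PACKAGING_INSTALL_PREFIX, /usr by default) keep the man page out of the %config list and leave only the wildcarded entry:

```cmake
# Untested sketch: relative DESTINATIONs are installed under
# CPACK_PACKAGING_INSTALL_PREFIX (assumed to be the default /usr here),
# so neither file should be treated as an absolute-path %config file.
install(FILES test.1 DESTINATION share/man)
install(FILES test.2 DESTINATION share/man/man1)
set(CPACK_PACKAGE_NAME "CPackRPM_man_test")
set(CPACK_GENERATOR "RPM")
include(CPack)
```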

CMake follow symbolic links during install

Short question:
Is it possible to set CMake to follow symlinks when copying files during an install, and if so how is this done?
Details: I am using CMake to build and install LLVM. In the include directory of my LLVM source tree I have a symbolic link to another subproject that is being developed against LLVM. Everything seems to work, except that I noticed that when I ran "cmake install" it copied the include directory without following the symlinks. The problem is that my symlinks have relative paths (because they are inside a git repo), so when the symlinks are copied (instead of being followed and their contents copied) they no longer point to the correct files. For example, I have dsa -> ../../llvm-poolalloc/include/dsa/. I would like to copy the contents of this link when I do the install rather than just copying the link, but I haven't found a CMake flag for doing this yet.
I realize that this is probably not the ideal way to structure my project, but I am working with something that's already in place, and it would be preferable not to have to change too much of the directory structure, because other people I am working with expect it to be this way. So I think that being able to follow symlinks might solve my problem without having to restructure the whole build system. But I am open to other suggestions for better ways to accomplish what I am trying to do.
Note that I am working in Linux (Ubuntu 10.04) and using LLVM 2.6 (that I am compiling from source along with llvm-gcc). Also I am using CMake version 2.8.
Edit:
Here is the source code from the CMakeLists.txt file that is associated with the install instruction:
install(DIRECTORY include
DESTINATION .
PATTERN ".svn" EXCLUDE
PATTERN "*.cmake" EXCLUDE
PATTERN "*.in" EXCLUDE
PATTERN "*.tmp" EXCLUDE
)
install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/include
DESTINATION .
)
The directory listing for the include directory is:
dsa -> ../../llvm-poolalloc/include/dsa/
llvm
llvm-c
poolalloc -> ../../llvm-poolalloc/include/poolalloc
What I want is for the dsa and poolalloc directories to be copied rather than just copying the symbolic links. The reason that I don't use absolute paths in the symbolic links is that I have them checked into a git repo. So my absolute path would differ from someone else working on the project when they do a checkout from the repo.
Hmm, let's try this:
get_filename_component(ABS_DIR include REALPATH)
install(DIRECTORY ${ABS_DIR}
DESTINATION .
PATTERN ".svn" EXCLUDE
PATTERN "*.cmake" EXCLUDE
PATTERN "*.in" EXCLUDE
PATTERN "*.tmp" EXCLUDE
)
If that doesn't help, you can try to install the contents of the include dir rather than the dir itself. In your case, though, you would need to come up with a smart regex:
file(GLOB INCLUDES include/*) # filter out .svn and others here
install(FILES ${INCLUDES}
DESTINATION include
)
Finally, as a last resort, make the symlinks absolute.
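For the layout in the question, a further option is to resolve each symlink with REALPATH and install the directory it points to. The sketch below assumes that dsa and poolalloc are the only symlinked entries (as in the listing above) and that no other entry under include is named dsa or poolalloc, since a PATTERN matches on the name anywhere in the tree:

```cmake
# Sketch: install include/ as before, but skip the two symlinks...
install(DIRECTORY include
  DESTINATION .
  PATTERN ".svn" EXCLUDE
  PATTERN "*.cmake" EXCLUDE
  PATTERN "*.in" EXCLUDE
  PATTERN "*.tmp" EXCLUDE
  PATTERN "dsa" EXCLUDE          # installed from its real location below
  PATTERN "poolalloc" EXCLUDE)   # installed from its real location below

# ...and install the real directories they point to instead.
foreach(link dsa poolalloc)
  # Resolve the relative symlink to the directory it points at
  get_filename_component(real_dir
    "${CMAKE_CURRENT_SOURCE_DIR}/include/${link}" REALPATH)
  # No trailing slash: the last component of real_dir (dsa or poolalloc)
  # is appended to the destination, giving include/dsa, include/poolalloc
  install(DIRECTORY "${real_dir}"
    DESTINATION include
    PATTERN ".svn" EXCLUDE)
endforeach()
```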