CMake: build library used by multiple projects

I have a directory containing several tools which I use for independent projects, e.g.:
CommonTools
+ Tool A
+ Tool B
+ Tool C
Tool B depends on Tool A, but Tool A can be used independently from Tool B. I think I have three options:
I can install the tools under a system directory (e.g. C:\Program Files on Windows). This is not necessarily a good thing, given that some of my programs are meant to be used from the same directory as the one they are shipped in, because I don't have sufficient rights to write to a system directory. Besides, I still need to locate the header files to compile projects that use those tools.
I could use find_library to locate them. Then I run into the following problem: find_library(A) won't work until I've actually built A, so I can't cmake CommonTools (because Tool B requires Tool A). I could call cmake from make, but that looks rather convoluted...
I can put relative paths to Tool A in Tool B & only use find_library for other projects. Unfortunately, this relative path changes depending on whether I'm building CommonTools or Tool B.
What are your thoughts on this? Thanks!

As I wanted to be able to perform one-step builds, this is what I ended up doing.
I distinguish the submodules of the module I'm currently building from external dependencies & third-party tools. Each (sub)module is only responsible for building itself. This means that all external dependencies & third-party tools must be already installed or available in binary + header form from a server. As a corollary, it means that a missing dependency is a binary which should be available from a given server but isn't.
Submodules are added using add_subdirectory, which means that if any of them is not available, the configuration step will fail with an explicit message.
External dependencies & third-party tools are located using find_package. The HINTS location is an option which must be provided by the user performing the build (this also gives the user an indication of the module's dependencies). If any of them is not found, a binary is downloaded from a given location using ExternalProject_Add. The <module>_FOUND, <module>_LIBRARIES & <module>_INCLUDE_DIRS variables must be set manually in the CMakeLists.txt file, but given a proper directory layout on the server side (e.g. <module>-<version>-<platform>/include & <module>-<version>-<platform>/binaries), it can be done in a consistent way (e.g. using a macro). There again, if no binaries are found on the server, the configuration step will fail with an explicit message. A sketch of this pattern is shown below.
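Here is a minimal sketch of that pattern for a hypothetical dependency called ToolA; the TOOLA_HINT variable, the server URL and the library name are placeholders standing in for the real macro-driven logic:

# Locate ToolA via a user-supplied hint, falling back to a prebuilt
# binary downloaded from a server.
set(TOOLA_HINT "" CACHE PATH "Location of a ToolA installation")
find_package(ToolA QUIET HINTS "${TOOLA_HINT}")

if(NOT ToolA_FOUND)
  include(ExternalProject)
  ExternalProject_Add(ToolA_binaries
    URL "https://server.example.com/ToolA-1.0-${CMAKE_SYSTEM_NAME}.tar.gz"
    CONFIGURE_COMMAND "" BUILD_COMMAND "" INSTALL_COMMAND "")
  ExternalProject_Get_Property(ToolA_binaries SOURCE_DIR)
  # Set the usual result variables by hand, relying on the consistent
  # <module>-<version>-<platform>/{include,binaries} layout on the server.
  set(ToolA_FOUND TRUE)
  set(ToolA_INCLUDE_DIRS "${SOURCE_DIR}/include")
  set(ToolA_LIBRARIES "${SOURCE_DIR}/binaries/ToolA${CMAKE_STATIC_LIBRARY_SUFFIX}")
endif()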
All of this means that the continuous integration server will correctly detect any missing dependencies (i.e. components which should be on the server but aren't or submodules which are not under version control) at configuration time rather than at build time, while still allowing one-step builds.
I hope this can be of some use to others.
PS: as a side-note to Google Test users: "gtest must be recompiled for each module because every user needs to compile his tests using the same compiler flags used to compile the installed Google Test libraries; otherwise he may run into undefined behaviors. If you compile Google Test and your test code using different compiler flags, they may see different definitions of the same class/function/variable." This means you actually need (in my case) to run an ExternalProject_Add command in every module, because each module contains its own tests.
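For completeness, the per-module Google Test build looks roughly like the sketch below; the release URL is only an example, and forwarding CMAKE_CXX_FLAGS is the point of the exercise:

# Build Google Test inside this module so it is compiled with the same
# flags as the module's own tests.
include(ExternalProject)
ExternalProject_Add(gtest_for_this_module
  URL "https://github.com/google/googletest/archive/refs/tags/v1.14.0.tar.gz"
  CMAKE_ARGS
    "-DCMAKE_CXX_FLAGS=${CMAKE_CXX_FLAGS}"
    "-DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>")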

When should I use find_package?

I am learning CMake, and I find it hard to understand when I should use find_package.
For separate compilation, we need to let the compiler know where to find the header files, and this can be done with target_include_directories. For linking, we need to let the linker know where the implementation is, and this can be done with target_link_libraries. It seems like that is all we need to do to compile a project. Could anyone explain why and when we should use find_package?
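For reference, here is a minimal sketch of the "manual" approach described above; every path is a placeholder that would differ per machine:

# Pointing the compiler and linker at a library by hand.
add_executable(app main.cpp)
target_include_directories(app PRIVATE /opt/foo/include)
target_link_libraries(app PRIVATE /opt/foo/lib/libfoo.a)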
If a package you intend to use allows for the use of find_package, you should use it. If a package comes with a working configuration script, it'll encourage you to use the library the way it's intended to be used, and it'll likely come with a simple way to add the required include directories and dependencies.
When is it possible to use find_package?
There needs to be either a configuration script (<PackageName>Config.cmake or packagename-config.cmake) that gets installed with the package, or a find script (Find<PackageName>.cmake). The latter in some cases even comes with the CMake installation instead of the installed package; see CMake find modules.
Should you create missing scripts yourself?
There are several benefits in creating a package configuration script yourself, even if a package doesn't come with an existing configuration or find script (a minimal example follows this list):
The scripts separate the information about libraries from the logic used to create your own targets. The use of the two commands find_package and target_link_libraries is concise, and any logic you may need to collect and apply information like dependencies, include directories, minimal versions of the C++ standard to use, etc. would probably take up much more space in your CMakeLists.txt files, thus making it harder to understand.
It makes the library used easy to replace. Basically all it takes to go with a different version of the same package would be to modify CMAKE_PREFIX_PATH, CMAKE_MODULE_PATH or the package-specific <PackageName>_ROOT variables. If you ever want to try out different versions of the same library, this is incredibly useful.
The logic is reusable. If you need to use the same functionality in a different project, it takes little effort to reuse the same logic. Even if a library is only used within a single project, but in multiple places, the use of find_package can help keep the logic for "importing" a lib close to its use (see also the first bullet point).
There can be multiple versions of the same library with automatic selection of applicable ones. Note that this requires the use of a version file, but this file allows you to specify whether a version of the package is suitable for the current project. This allows for checking the target architecture, etc., which is helpful when cross compiling or when providing both 32 and 64 bit versions of a library on Windows: if a version file indicates a mismatch, the search for a suitable version simply continues with different paths instead of failing fatally on the first mismatch.
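As promised, a minimal hand-written find module might look like the sketch below; "Foo", foo.h and the library name foo are placeholders for whatever package you are wrapping, and the file would be saved as FindFoo.cmake somewhere on CMAKE_MODULE_PATH:

# FindFoo.cmake - minimal sketch of a hand-written find module.
find_path(Foo_INCLUDE_DIR NAMES foo.h)
find_library(Foo_LIBRARY NAMES foo)

# Handles the REQUIRED/QUIET arguments and sets Foo_FOUND.
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(Foo
  REQUIRED_VARS Foo_LIBRARY Foo_INCLUDE_DIR)

# Wrap the results in an imported target so consumers get the include
# directories and the library location in one go.
if(Foo_FOUND AND NOT TARGET Foo::Foo)
  add_library(Foo::Foo UNKNOWN IMPORTED)
  set_target_properties(Foo::Foo PROPERTIES
    IMPORTED_LOCATION "${Foo_LIBRARY}"
    INTERFACE_INCLUDE_DIRECTORIES "${Foo_INCLUDE_DIR}")
endif()

Consumers then only need find_package(Foo REQUIRED) followed by target_link_libraries(app PRIVATE Foo::Foo).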
You will probably find CMake's guide on using dependencies helpful. It describes find_package and alternatives, and when each one is relevant / useful. Here's an excerpt from the section on find_package (italics added):
A package needed by the project may already be built and available at some location on the user's system. That package might have also been built by CMake, or it could have used a different build system entirely. It might even just be a collection of files that didn't need to be built at all. CMake provides the find_package() command for these scenarios. It searches well-known locations, along with additional hints and paths provided by the project or user. It also supports package components and packages being optional. Result variables are provided to allow the project to customize its own behavior according to whether the package or specific components were found.
find_package requires that the package provide CMake support in the form of specific files that describe the package's contents to CMake. Some library authors provide this support (the most desirable scenario for you, the package consumer); some don't, but their packages are prominent enough that CMake itself comes with such files for them; or, in the worst case, there is no CMake support at all, in which case you can either do something to get either of the previous good outcomes, or perform some kludges to get the job done (i.e. define the targets yourself in your project's CMake config).
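That last kludge amounts to something like the following sketch, with made-up paths for a package here called Bar:

# Defining an imported target by hand when a package ships no CMake
# support at all; both paths are placeholders.
add_library(Bar::Bar SHARED IMPORTED)
set_target_properties(Bar::Bar PROPERTIES
  IMPORTED_LOCATION "/opt/bar/lib/libbar.so"
  INTERFACE_INCLUDE_DIRECTORIES "/opt/bar/include")
target_link_libraries(app PRIVATE Bar::Bar)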

Why are the source file names not human readable?

I installed Perl6 with rakudobrew and wanted to browse the installed files, only to see a list of hex filenames in ~/.rakudobrew/moar-2018.08/install/share/perl6/site/sources as well as ~/.rakudobrew/moar-2018.08/install/share/perl6/sources/.
E.g.
> ls ~/.rakudobrew/moar-2018.08/install/share/perl6/sources/
09A0291155A88760B69483D7F27D1FBD8A131A35 AAC61C0EC6F88780427830443A057030CAA33846
24DD121B5B4774C04A7084827BFAD92199756E03 C57EBB9F7A3922A4DA48EE8FCF34A4DC55942942
2ACCA56EF5582D3ED623105F00BD76D7449263F7 C712FE6969F786C9380D643DF17E85D06868219E
51E302443A2C8FF185ABC10CA1E5520EFEE885A1 FBA542C3C62C08EB82C1F4D25BE7B4696F41B923
522BE83A1D821D8844E8579B32BA04966BAB7B87 FE7156F9200E802D3DB8FA628CF91AD6B020539B
5DD1D8B49C838828E13504545C427D3D157E56EC
The files contain the source of packages, but this does not feel very accessible. What is the rationale for that?
In Perl 6, the mechanism for loading modules and caching their compilations is pluggable. Rakudo Perl 6 comes with two main mechanisms for this.
One is a file-system based repository, and it's used with things like -Ilib. This resolves modules simply using paths on disk. Whenever a module is loaded, it first has to check that the module's sources have not changed, in order to re-compile them if so. This is ideal for development; however, such checks take time. Furthermore, this doesn't allow for having multiple versions of the same module available and picking the one matching the specification in the use statement. Again, ideal for development, when you just want it to use your latest changes, but less so for installation of modules from the ecosystem.
The other is an installation repository. Here, specific versions of modules are installed and precompiled. It is expected that all interactions with such a repository will be done through the API or tools using the API (for example, zef locate Some::Module). It's assumed that once a specific version of a module has been installed, it is immutable. Thus, no checks need to be done against the source, and loading can go straight to the compiled version of the module.
Thus, the installation repository is not intended for direct human consumption. The SHA-1s are primarily an implementation convenience; an alternative scheme could have been used in return for a bit more effort (and may well be used in the future). However, the SHA-1s do also create the appearance of something that wasn't intended for direct manipulation - which is indeed the case: editing a source file in there will have no immediate effect, and probably confusing effects the next time the compiler is upgraded to a new version.

Why are there two buttons in the GUI, Configure and Generate, when the CLI does it all in one command?

I understand that cmake is a build generator. That means it can generate appropriate builds (makefiles, Visual Studio projects, etc.) based on instructions from CMakeLists.txt. But I do not understand two things, which I guess are related:
Why are there two buttons, "Configure" and "Generate", in cmake-gui? In the command line tutorials that I've read (e.g. this one), the usual process was done with one cmake command.
What is the cache in the cmake world? AFAIK it is the state after the "Configure" button was pressed but before the "Generate" button was pressed. But why is this useful? What do all those variables that pop up after pressing "Configure" mean? Why am I supposed to edit them? Isn't the only allowed configuration done via CMakeLists.txt?
Thanks
There are two stages when CMake is run, as reflected by the two buttons in the CMake GUI. The first stage is the configure step where the CMakeLists.txt file is read in. CMake builds up an internal representation of the project during this stage. After that, the second stage called generation occurs where the project files are written out based on that internal representation.
In CMake GUI, the two stages can be run separately. When you run the configure step, the GUI shows all cache variables (see below) which changed their values since the last time configure was run or since CMake GUI was started if this is the first configure run. Normal practice is to re-run the configure stage until no variables are highlighted red. Once configure leaves no variables in red, you can press the generate button and the build tool's native project files will be created and you are good to go starting your builds, etc.
The command line cmake tool doesn't allow you to separate out running the configure and generate steps individually. Rather, it always runs configure and then generate.
For simple projects, the distinction between configuration and generation is not all that important. Simple tutorials will often just lump the two together since the reader can get away without understanding the distinction for basic project arrangements. There are, however, some CMake features which rely on this distinction. In particular, generator expressions are a generation-time feature where decisions about certain aspects of the build are delayed to generation time rather than being fully handled at configure time. One example of this is configuration-specific content such as compiler flags, source files only compiled in for some configurations, etc. The build configuration isn't always known at CMake's configure step (e.g. Xcode and Visual Studio are multi configuration build tools, so there can be more than one and it is selected by the user at build time). The generation step will process generator expressions for each build type and the result can be different for each configuration. You might also find this answer informative regarding this particular example. For a more advanced example of a technique which takes advantage of the distinction between configure and generation stages, see this post, but be aware it is not a common technique.
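As an illustration of the configuration-specific content mentioned above, a per-configuration compile definition might look like the following sketch (the target and macro names are made up):

# The $<CONFIG:...> generator expression is evaluated at generation
# time, once per configuration for multi-config generators, rather
# than at configure time.
add_executable(my_app main.cpp)
target_compile_definitions(my_app PRIVATE
  $<$<CONFIG:Debug>:ENABLE_EXTRA_LOGGING>)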
Regarding your other question about what is the cache, CMake records information between runs in the variable cache. At the end of the run, it updates a file called CMakeCache.txt in the build directory. When you next run CMake, it reads in that cache to pre-populate various things so it doesn't have to recompute them (like finding libraries and other packages) and so that you don't have to supply custom options you want to override each time. You wouldn't typically edit CMakeCache.txt by hand (although it's okay to do so). Rather, you can modify the variables you want in CMake GUI and then re-run the configure step (don't forget to then also run generate to create updated project files). You can also define or modify cache variables at the cmake command line with the -D option.
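As a sketch, a project can also declare its own cache variable in CMakeLists.txt (the name here is made up); it then shows up in CMake GUI and persists in CMakeCache.txt:

# Declared in CMakeLists.txt; the value is stored in CMakeCache.txt and
# persists across runs until the user changes it (in the GUI or via -D).
set(MYPROJ_USE_FEATURE_X ON CACHE BOOL "Enable feature X")

At the command line, the same variable could be overridden with cmake -DMYPROJ_USE_FEATURE_X=OFF.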

How do you make it so that cpack doesn't add required libraries to an RPM?

I'm trying to convert our build system at work over to cmake and have run into an interesting problem with the RPMs that it generates (via cpack): It automatically adds all of the dependencies that it thinks your RPM has to its list of required libraries.
In general, that's great, but in my case, it's catastrophic. Unfortunately, the development packages that we build end up getting installed with one of our home-grown tools, which uses rpm to install them in a separate RPM database from the system one. It's stupid, but I can't change it. What this means is that all of the system libraries that any normal library will rely on (like libc or libpthread) aren't in the RPM database that is being used with our development packages. So, if an RPM for one of our development packages lists system libraries as being required, then we can't install it, as rpm will think that they're not installed (since they're listed in the normal database rather than the one that it's being told to use when installing our packages). Our current build stuff handles this just fine, because it doesn't list any system libraries as dependencies in the RPMs, but cpack automatically populates the RPM's list of required libraries and puts the system libraries in there. I need a way to stop it from doing so.
I tried setting CPACK_RPM_PACKAGE_REQUIRES to "", but that has no effect. The RPM cpack generates still ends up with the system libraries listed as being required. All I can think of doing at this point is to copy the RPM cpack generator and hack it up to do what I want and use that instead of the standard one, but I'd prefer to avoid that. Does anyone have any idea how I could get cpack to stop populating the RPM with required libraries?
See bottom of
http://www.rpm.org/max-rpm/s1-rpm-depend-auto-depend.html
The autoreqprov Tag — Disable Automatic Dependency Processing
There may be times when RPM's automatic dependency processing is not desired. In these cases, the autoreqprov tag may be used to disable it. This tag takes a yes/no or 0/1 value. For example, to disable automatic dependency processing, the following line may be used:
AutoReqProv: no
EDIT:
In order to set this in cmake, you need to do set(CPACK_RPM_PACKAGE_AUTOREQPROV " no"). The extra space seems to be required in front of (or behind) the no in order for it to work. It seems that the RPM module for cpack has a bug which won't let you set some of its variables to anything shorter than 3 characters long.
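In context, a minimal sketch (the generator selection and include(CPack) are the usual CPack boilerplate, not specific to this fix):

# Disable rpmbuild's automatic dependency processing; the leading space
# works around the length check mentioned above. Must be set before
# include(CPack).
set(CPACK_RPM_PACKAGE_AUTOREQPROV " no")
set(CPACK_GENERATOR "RPM")
include(CPack)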
To add to Mark Lakata's answer above, there's a snapshot of the "Maximum RPM" doc
http://www.rpm.org/max-rpm-snapshot/s1-rpm-depend-auto-depend.html
that also adds:
The autoreq and autoprov tags can be used to disable automatic processing of requirements or "provides" only, respectively.
And at least with my version of CPackRPM, there seem to be similar variables you can set, e.g.
set(CPACK_RPM_PACKAGE_AUTOREQ " no")
to only disable the automatic dependency processing of 'Requires'.

Determine all of the file dependencies in a build process that uses makefiles and ant scripts

I'm trying to understand the build process of a codebase. The project uses both autoconf (configure scripts that generate makefiles) and Maven.
I would like to be able to identify all of the file dependencies in the project, so that for any output file that ends up being generated by a build, I can identify how it was actually produced. Ultimately, I'd like to generate a diagram using something like graphviz to visualize the dependencies, but for now I just want to extract them.
Is there any automated way to do this? In other words, given some makefiles and Maven or ant XML files, and the name of the top-level target, is there a way to identify all of the files that will be generated, the programs used to generate them, and the input files associated with those programs?
Electric Accelerator and ClearCase are two systems that do this, by running the build and watching what it does (presumably by intercepting operating system calls). This has the advantage of working for any tool, and being unaffected by buggy makefiles (hint: they're all buggy).
That's probably the only reliable way for non-trivial makefiles, since they all do things like generating new make rules on the fly, or have behaviour that depends on the existence of files on disk that are not explicitly listed in rules.
I don't know about the Maven side, but once you've ./configured the project, you could grep through the output of make -pn (make --print-data-base --dry-run) to find the dependencies. This will probably be more annoying if it's based on recursive make, but still manageable.
Note that if you're using automake, it computes detailed dependencies as a side-effect of compilation, so you won't get all the dependencies on #included headers until you do a full build.