Can cmake configuration run in parallel? - cmake

I know that the cmake build stage can be done in parallel, but the initial configuration stage takes too long for large packages. Can that be parallelized?

No, the configuration stage cannot be parallelized.
Nonetheless, here are some things you could do:
Profile the Configuration Phase
CMake has --profiling-output/--profiling-format options, which emit a file that can be opened e.g. in Chromium-based browsers using about:tracing. This visualizes all commands that were processed by CMake, how long they took, how they are nested, etc. This information can be used to find bottlenecks in your configuration phase and possibly optimize them to run faster. This is especially useful if you have a lot of custom functions/macros in your CMake files. Example:
$ cmake -G Ninja --profiling-output ./profile.json --profiling-format google-trace ..
Note that profiling itself adds some work for CMake, so a profiled CMake run will take longer than usual.
Generate Fewer Configurations
Running CMake actually does two things: configure and generate. Configure means that all CMakeLists.txt and *.cmake files are read and processed. Generate means that the CMake-internal representation of the build is written to some buildsystem-specific format, e.g. Makefile, build.ninja, Visual Studio projects, etc. In case you use a multi-config generator like "Ninja Multi-Config", Visual Studio or Xcode, it might help to reduce the number of generated configurations. By default CMake generates build files for 4 configurations: Debug, Release, RelWithDebInfo and MinSizeRel. In case you only need a subset of those, you can specify the required ones using the CMAKE_CONFIGURATION_TYPES variable. From my experience (using Ninja Multi-Config) this can reduce the generation phase significantly for large projects.
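As a rough sketch of the CMAKE_CONFIGURATION_TYPES idea: the snippet below narrows the list to two configurations when a multi-config generator is in use. The project name and the choice of Debug and Release are placeholders, not something from the original project. Passing -DCMAKE_CONFIGURATION_TYPES="Debug;Release" on the cmake command line is another common way to get the same effect without editing CMakeLists.txt.

```cmake
cmake_minimum_required(VERSION 3.17)
project(Example LANGUAGES NONE)  # hypothetical project name

# GENERATOR_IS_MULTI_CONFIG is true for generators such as
# "Ninja Multi-Config", Visual Studio and Xcode.
get_property(is_multi_config GLOBAL PROPERTY GENERATOR_IS_MULTI_CONFIG)
if(is_multi_config)
  # Assumption: only Debug and Release are needed in this build tree.
  set(CMAKE_CONFIGURATION_TYPES "Debug;Release")
endif()
```

Guarding on GENERATOR_IS_MULTI_CONFIG keeps the variable untouched for single-config generators, where CMAKE_BUILD_TYPE is used instead.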
Superbuild
Use a superbuild structure with ExternalProject. Here you basically have one build that orchestrates the configuration and building of several other projects. This way, the sub-builds, including their configuration phases, can run in parallel. Note however that this has other issues, like the targets of the other projects not being available at configure time. In my experience, superbuilds are only a good choice for special use cases.
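A minimal sketch of such a superbuild, assuming two independent sub-projects that live in the hypothetical directories libfoo/ and libbar/ next to the top-level CMakeLists.txt:

```cmake
cmake_minimum_required(VERSION 3.16)
project(Superbuild LANGUAGES NONE)  # the superbuild itself compiles nothing

include(ExternalProject)

# Each sub-project is configured and built as a build step of the
# superbuild. Neither depends on the other, so a parallel build tool
# is free to run both configure steps at the same time.
ExternalProject_Add(libfoo
  SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/libfoo"  # hypothetical path
  INSTALL_COMMAND ""                               # skip the install step
)
ExternalProject_Add(libbar
  SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/libbar"  # hypothetical path
  INSTALL_COMMAND ""
)
```

The configuration of libfoo and libbar only happens when the superbuild is built, which is why their targets are not visible to the top-level CMakeLists.txt.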
Concurrent execute_process()
Someone managed to write a ray tracer using CMake. For the parallelization, he exploited the fact that execute_process() runs multiple COMMANDs concurrently. This could help if you have to do some heavy preprocessing that cannot be deferred to the build stage. But then again, each command runs as a separate process, so if you call CMake recursively this way, you don't have access to the targets, variables, etc. of that subprocess.
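To illustrate the mechanism, here is a small script-mode sketch. The sleep commands are stand-ins for real work, and the file name is arbitrary. execute_process() starts all listed COMMANDs together as a pipeline, with the standard output of each one connected to the standard input of the next, so the total run time should be close to that of the slowest command rather than the sum.

```cmake
# Run with: cmake -P parallel.cmake   (file name is arbitrary)
cmake_minimum_required(VERSION 3.10)

# All three commands are launched together as one pipeline.
execute_process(
  COMMAND "${CMAKE_COMMAND}" -E sleep 2
  COMMAND "${CMAKE_COMMAND}" -E sleep 2
  COMMAND "${CMAKE_COMMAND}" -E sleep 2
  RESULTS_VARIABLE exit_codes  # one exit code per COMMAND
)
message(STATUS "Exit codes: ${exit_codes}")
```

Because of the pipeline wiring, this only suits commands that do not care what arrives on their standard input.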
Do as much as possible in the Build Phase
I don't have insight into your build, but it often helps to do only what is strictly necessary in the configuration phase. Especially for execute_process() calls, one should always evaluate whether they could be replaced with add_custom_command()/add_custom_target(), because creating processes is relatively costly.
An example: I once had a large-scale project at work where a lot of source files were generated from XML/XSLT files. Since CMake doesn't have any knowledge about the dependencies between XML/XSLT files, we had a script that figured out those dependencies by reading the files and following includes recursively. At first, this script was called using execute_process() at configure time in order to pass the output to the DEPENDS option of add_custom_command(). Later, I optimized this by doing the dependency evaluation at build time and generating a .d dependency file (Makefile syntax) that could be passed to the DEPFILE option of add_custom_command() instead. The speedup in configuration was enormous, while the build time suffered only slightly, because that step ran in parallel.
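A sketch of what the DEPFILE variant can look like. The generator script generate.py, its --input/--output/--depfile options, and the file names are hypothetical and only stand in for the real XML/XSLT tooling. DEPFILE support depends on the generator and the CMake version (Ninja first, Makefile generators from CMake 3.20, Visual Studio and Xcode from 3.21).

```cmake
cmake_minimum_required(VERSION 3.20)
project(GeneratedSources LANGUAGES CXX)

find_package(Python3 REQUIRED COMPONENTS Interpreter)

set(generated_src "${CMAKE_CURRENT_BINARY_DIR}/model.cpp")

# generate.py (hypothetical) writes model.cpp and, at build time, a
# Makefile-syntax .d file listing every XML/XSLT file it actually read.
add_custom_command(
  OUTPUT  "${generated_src}"
  COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/generate.py"
          --input   "${CMAKE_CURRENT_SOURCE_DIR}/model.xml"
          --output  "${generated_src}"
          --depfile "${generated_src}.d"
  DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/generate.py"
          "${CMAKE_CURRENT_SOURCE_DIR}/model.xml"
  DEPFILE "${generated_src}.d"  # the build tool reads extra dependencies here
  COMMENT "Generating model.cpp from model.xml"
  VERBATIM
)

add_executable(app main.cpp "${generated_src}")
```

Nothing here runs at configure time: the dependency scan happens inside the build, where it can overlap with other build steps.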

Related

Generating compile_commands.json without generating build files

I'd like to generate a compile_commands.json file for use with the clangd language server. However, CMAKE_EXPORT_COMPILE_COMMANDS only works for the make and ninja build systems. When building a project that uses a different build system, it would be convenient to also be able to generate compile_commands.json files as if I were using make or ninja, without actually generating any build files that interfere with the build system that I'm using to perform the build.
What is the most convenient way to do this with cmake?
I think your only option here is to have a separate build folder that uses Ninja or Makefiles to generate the compile_commands.json, and a different build folder for your "actual" build.
The thing is, CMake is a generator, and it doesn't support mixed builds; and in fact, it should not. If it did, you would end up with random artifacts from different build systems inside the build folder that might eventually conflict with each other.
That being said, be aware that what you get in a Ninja-based compile_commands.json is not going to be fully relevant to the "actual" build system that you want to use. I can see it being useful, but it is not the same for sure.
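If the project's CMakeLists.txt can be edited, one option is to switch the export on there, so that the extra Ninja or Makefile build folder always writes the file. This is only a sketch: the project and target names are placeholders, and generators other than Makefile and Ninja simply ignore the variable.

```cmake
cmake_minimum_required(VERSION 3.16)

# Write compile_commands.json into the build directory.
# Only Makefile and Ninja generators honor this setting.
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

project(Example LANGUAGES CXX)       # hypothetical project
add_executable(example main.cpp)     # hypothetical target
```

The same variable can instead be passed as -DCMAKE_EXPORT_COMPILE_COMMANDS=ON when configuring the secondary build folder.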

How to handle autotools project with cmake dependency?

I have an autotools C project that needs to use another library that is built with CMake. Is there an equivalent to AC_CONFIG_SUBDIRS that will work with CMake?
I take it that you want to configure and build the CMake-based project as part of configuring and building the Autotools-based host project. This is possible, and there are several viable ways to do it, but I'm not aware of anything wholly pre-packaged like AC_CONFIG_SUBDIRS is for Autotools-based subprojects.
For configuration
Option 1 - config commands
Autoconf provides a group of macros by which you can specify custom commands for configure or the generated config.status script to run. You could use one of these -- probably AC_CONFIG_COMMANDS, but maybe AC_CONFIG_COMMANDS_POST -- to run cmake (and any wanted preparatory steps) in the subproject. Personally, I like this option best.
Option 2 - glue script
AC_CONFIG_SUBDIRS instructs configure to run configure scripts in the specified subdirectories, but those other configure scripts don't need to be Autotools-generated. You could conceivably write a custom wrapper script named "configure" in the subproject directory for the parent configure to run, but which itself performs an appropriate call to cmake. AC_CONFIG_SUBDIRS in the top-level configuration should then run that script at the right time.
Option 3 - custom code
I think Autoconf already provides sufficient support for what you seem to want, but if you think otherwise then you always have the option of writing whatever shell code you want into configure via configure.ac. You might find it worthwhile to write a custom macro for that, especially if you have multiple CMake subprojects, but that's not obligatory. Note that such commands are distinguished from those specified via AC_CONFIG_COMMANDS & co. by the timing of their execution.
For building
Presumably you'll be relying on recursive make during the build and installation steps. It shouldn't be hard to make that work, whether you're using an Automake-based Makefile.in or a hand-rolled one at the top level.
Option 1 - Automake + glue makefile
Use a SUBDIRS variable in your top-level Makefile.am to direct make to recurse into the CMake project's subdirectory, just as you would do into any other project's. Write a simple Makefile there that recurses into a build subdirectory (which you will have had to ensure is created and configured by configure). This should not collide with the subproject because it presupposes that a separate build directory is used. The glue makefile can adapt targets and make variables to the expectations of the subproject's build system.
The Automake documentation describes all the recursive targets that the top-level Autotools makefile might try to build recursively, and the glue makefile should provide all of them -- though there may be many that need only a dummy (but not empty) recipe.
Option 2 - hand-rolled top-level Makefile.in
If, on the other hand, you're using a hand-rolled top-level Makefile template then you have full control over your recursive make invocations. You could still use a glue makefile in the subproject in this case, but it's probably easier and cleaner to just adapt directly to the expected CMake-generated makefile.

When should I rerun cmake?

After running the cmake command once to generate a build system, when, if ever, should I rerun the cmake command?
The generated build systems can detect changes in the associated CMakeLists.txt files and behave accordingly. You can see the logic for doing so in generated Makefiles. The exact rules for when this will happen successfully are mysterious to me.
When should I rerun cmake? Does the answer depend on the generator used?
This blog post (under heading: "Invoking CMake multiple times") points out the confusion over this issue and states that the answer is actually 'never', regardless of generator, but I find that surprising. Is it true?
The answer is simple:
The cmake binary of course needs to re-run each time you make changes to any build setting, but by design you won't need to do that yourself; hence "never" is correct regarding commands you have to issue.
The build targets created by cmake automatically include checks for each file involved in or included while generating the current set of Makefiles/VS projects/whatever [= starting from the main CMakeLists.txt file]. When invoking make (assuming unix here), this automatically triggers a prior execution of cmake if necessary; so your generated projects include logic to invoke cmake itself! As all command-line parameters initially passed (e.g. cmake -DCMAKE_BUILD_TYPE=RELEASE ..) will be stored in the CMakeCache.txt, you don't need to re-specify any of those on subsequent invocations, which is why the projects also can just run cmake and know it still does what you intended.
Some more detail:
CMake generates book-keeping files containing all files that were involved in Makefile/Project generation, see e.g. these sample contents of my <binary-dir>/CMakeFiles/Makefile.cmake file using MSYS makefiles:
# The top level Makefile was generated from the following files:
set(CMAKE_MAKEFILE_DEPENDS
"CMakeCache.txt"
"C:/Program Files (x86)/CMake/share/cmake-3.1/Modules/CMakeCCompiler.cmake.in"
"C:/Program Files (x86)/CMake/share/cmake-3.1/Modules/RepositoryInfo.txt.in"
"<my external project bin dir>/release/ep_tmp/IRON-cfgcmd.txt.in"
"../CMakeFindModuleWrappers/FindBLAS.cmake"
"../CMakeFindModuleWrappers/FindLAPACK.cmake"
"../CMakeLists.txt"
"../CMakeScripts/CreateLocalConfig.cmake"
"../Config/Variables.cmake"
"../Dependencies.cmake"
"CMakeFiles/3.1.0/CMakeCCompiler.cmake"
"CMakeFiles/3.1.0/CMakeRCCompiler.cmake")
Any modification to any of these files will trigger another cmake run whenever you choose to start a build of a target. I honestly don't know how fine-grained the dependency tracking in CMake is, i.e. whether a target will just be built without the re-run if changes somewhere else don't affect that target's compilation. I wouldn't expect it, as this can get messy quite quickly, and repeated CMake runs (correctly using the cache capabilities) are very fast anyway.
The only case where you need to re-run cmake is when you change the compiler after you started a project(MyProject); but even this case is handled by newer CMake versions automatically now (with some yelling :-)).
Additional remark responding to comments:
There are cases where you will need to manually re-run cmake, and that is whenever you write your configure scripts so badly that cmake cannot possibly detect files/dependencies you're creating. A typical scenario would be that your first cmake run creates files using e.g. execute_process and you would then include them using file(GLOB ..). This is BAD style, and the CMake docs for file explicitly say:
Note: We do not recommend using GLOB to collect a list of source files from your source tree. If no CMakeLists.txt file changes when a source is added or removed then the generated build system cannot know when to ask CMake to regenerate.
Btw this comment also sheds light on the above explained self-invocation by the generated build system :-)
The "proper" way to treat this kind of situation, where you create source files during configure time, is to use add_custom_command(OUTPUT ...), so that CMake is "aware" of a file being generated and tracks changes correctly. If for some reason you can't/won't use add_custom_command, you can still let CMake know of your file generation using the source file property GENERATED. Any source file with this flag set can be hard-coded into target source files, and CMake won't complain about missing files at configure time (it expects this file to be generated some time during the (first!) cmake run).
While looking into this topic in order to read the version information from a debian/changelog file (generation phase), I ran into the issue that a cmake run should be triggered whenever debian/changelog is modified. So I needed to add debian/changelog to CMAKE_MAKEFILE_DEPENDS.
In my case, debian/changelog is read through execute_process. Unfortunately, execute_process offers no way to add the files it processes to CMAKE_MAKEFILE_DEPENDS. But I found that running configure_file will do it. Actually, I am really missing something like DEPENDENCIES in execute_process.
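A small sketch of both variants described above. The script make_version.cmake, the file names, and the target name are made up for illustration.

```cmake
cmake_minimum_required(VERSION 3.16)
project(GeneratedExample LANGUAGES CXX)

set(version_src "${CMAKE_CURRENT_BINARY_DIR}/version.cpp")

# Variant 1: let the build system produce the file, so CMake knows
# where it comes from and when it has to be regenerated.
# make_version.cmake is a hypothetical script that writes ${OUT}.
add_custom_command(
  OUTPUT  "${version_src}"
  COMMAND "${CMAKE_COMMAND}" "-DOUT=${version_src}"
          -P "${CMAKE_CURRENT_SOURCE_DIR}/make_version.cmake"
  DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/make_version.cmake"
  VERBATIM
)
add_executable(app main.cpp "${version_src}")

# Variant 2: the file is created by some other mechanism; marking it
# GENERATED stops CMake from complaining that it does not exist yet.
set_source_files_properties(
  "${CMAKE_CURRENT_BINARY_DIR}/other.cpp" PROPERTIES GENERATED TRUE)
```

Sources listed as OUTPUT of a custom command are marked GENERATED automatically, so the explicit property is only needed for the second variant.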
However, as I had the need to configure the debian/changelog file for my needs, the solution came implicitly to me.
I actually also found a hint about this in the official documentation of configure_file:
"If the input file is modified the build system will re-run CMake to re-configure the file and generate the build system again."
So using configure_file should be a safe way to trigger the re-run of cmake.
From a user perspective, I would expect other commands to extend CMAKE_MAKEFILE_DEPENDS, too. E.g. execute_process (on demand) but also file(READ) (implicitly like configure_file). Perhaps there are others. Each read file is likely to influence the generation phase. As an alternative it would be nice to have a command to just extend the dependency list (hint for the cmake developers, perhaps one comes along).
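For what it is worth, CMake does have a directory property, CMAKE_CONFIGURE_DEPENDS, that registers extra files whose modification triggers a re-run. A minimal sketch, assuming the changelog sits at debian/changelog below the current source directory and its first line has the usual "package (version) ..." shape; the variable names are made up for illustration:

```cmake
set(changelog "${CMAKE_CURRENT_SOURCE_DIR}/debian/changelog")

# Re-run CMake on the next build whenever the changelog is modified.
set_property(DIRECTORY APPEND PROPERTY CMAKE_CONFIGURE_DEPENDS "${changelog}")

# Read the first line, e.g. "mypkg (1.2.3-1) unstable; urgency=medium",
# and pull out the text between the parentheses.
file(STRINGS "${changelog}" changelog_head LIMIT_COUNT 1)
if(changelog_head MATCHES "\\(([^)]+)\\)")
  set(package_version "${CMAKE_MATCH_1}")
  message(STATUS "Version from changelog: ${package_version}")
endif()
```

This avoids both the extra process started by execute_process and the need for a configure_file call whose only purpose is the dependency.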

Is it possible to build binaries for different targets using CMake?

I'm considering using CMake for projects targeting a microcontroller. I found out how to create a toolchain file and invoke cmake -DCMAKE_TOOLCHAIN_FILE=Path/To/Toolchain.cmake to make CMake do cross-compiling.
However, most projects that I work on also have code that must be compiled for the host platform. These are often unit tests or other test tools, which share most of their code with the binary that will run on the microcontroller. A rare case might be a project that even has two processors with different instruction set architectures, thus needing a host compiler and two different cross compilers.
I'd like to have one build that rules them all. Is it possible to have a construction where I only need to call cmake /path/to/source && make, or is the only solution having multiple 'root' CMakeLists.txt files, one for each target?
Each cmake run will target one specific generator and thus one platform.
What you want can be achieved by having one hierarchy of CMakeLists files for each platform. You need to get to a point where doing a succession of cmake .. && make calls will build the whole project.
Then write a master CMakeLists that executes all of those separate build steps for you, e.g. through ExternalProject_Add or by using custom commands. Depending on the structure of your project it might make sense to have only the tools required for building being processed this way and add the sources for the actual project directly to the master CMakeLists instead.
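A master CMakeLists.txt along these lines could look roughly as follows. The directory names firmware/ and tests/ and the toolchain file path are assumptions for the sake of the example.

```cmake
cmake_minimum_required(VERSION 3.16)
project(MasterBuild LANGUAGES NONE)  # the master build compiles nothing itself

include(ExternalProject)

# Cross-compiled build for the microcontroller (hypothetical paths).
ExternalProject_Add(firmware
  SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/firmware"
  CMAKE_ARGS "-DCMAKE_TOOLCHAIN_FILE=${CMAKE_CURRENT_SOURCE_DIR}/cmake/Toolchain.cmake"
  INSTALL_COMMAND ""
)

# Native build of the unit tests with the host compiler.
ExternalProject_Add(host_tests
  SOURCE_DIR "${CMAKE_CURRENT_SOURCE_DIR}/tests"
  INSTALL_COMMAND ""
)
```

Each ExternalProject_Add call gets its own build tree and its own cmake run, which is what allows a different toolchain per sub-build. A second cross compiler would simply be a third call with another toolchain file.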

How exactly does CMake work?

I'm not asking this just for myself. I hope this question will be a reference for the many newbies who, like me, found it utterly perplexing what exactly was going on behind the scenes when for such a small CMakeLists.txt file
cmake_minimum_required (VERSION 2.6)
project(Tutorial)
add_executable(Tutorial tutorial.cpp)
and such a small tutorial.cpp
int main() { return 0; }
there are so many files generated
CMakeCache.txt cmake_install.cmake Makefile
CMakeLists.txt tutorial.cpp
and a CMakeFiles folder with so many files and folders
CMakeCCompiler.cmake CMakeOutput.log Makefile.cmake
cmake.check_cache CMakeSystem.cmake progress.marks
CMakeCXXCompiler.cmake CMakeTmp TargetDirectories.txt
CMakeDetermineCompilerABI_C.bin CompilerIdC Tutorial.dir
CMakeDetermineCompilerABI_CXX.bin CompilerIdCXX
CMakeDirectoryInformation.cmake Makefile2
Not understanding what was going on behind the scenes (i.e. why so many files had to be generated and what their purpose was) was the biggest obstacle to learning CMake.
If anyone knows, could you please explain it for the sake of posterity? What is the purpose of these files, and when I type cmake ., what exactly is cmake configuring and generating before it builds the project?
The secret is that you don't have to understand what the generated files do.
CMake introduces a lot of complexity into the build system, most of which only pays off if you use it for building complex software projects.
The good news is that CMake does a good job of keeping a lot of this messiness away from you: Use out-of-source builds and you don't even have to look at the generated files. If you didn't do this so far (which I guess is the case, since you wrote cmake .), please check them out before proceeding. Mixing the build and source directory is really painful with CMake and is not how the system is supposed to be used.
In a nutshell: Instead of
cd <source_dir>
cmake .
always use
cd <build_dir_different_from_source_dir>
cmake <source_dir>
I usually use an empty subfolder build inside my source directory as build directory.
To ease your pain, let me give a quick overview of the relevant files which CMake generates:
Project files/Makefiles - What you are actually interested in: The files required to build your project under the selected generator. This can be anything from a Unix Makefile to a Visual Studio solution.
CMakeCache.txt - This is a persistent key/value string storage which is used to cache values between runs. Values stored in here can be paths to library dependencies or whether an optional component is to be built at all. The list of variables is mostly identical to the one you see when running ccmake or cmake-gui. This can be useful to look at from time to time, but I would recommend using the aforementioned tools for changing any of the values if possible.
Generated files - This can be anything from autogenerated source files to export macros that help you re-integrate your built project with other CMake projects. Most of these are only generated on demand and will not appear in a simple project such as the one from your question.
Anything else is pretty much noise to keep the build system happy. In particular, I never needed to care about anything that is going on inside the CMakeFiles subdirectory.
In general you should not mess with any of the files that CMake generates for you. All problems can be solved from within CMakeLists.txt in one way or the other. As long as the result builds your project as expected, you are probably fine. Do not worry too much about the gory details, as this is what CMake was trying to spare you from in the first place.
As stated on its website:
Cmake is cross-platform, open-source build system for managing the build process of software using a compiler-independent method
In most cases it is used to generate project/make files. In your example it has produced a Makefile, which is used to build your software (mostly on Linux/Unix platforms).
CMake lets you provide cross-platform build files that generate the platform-specific project/make files for a particular compiler/platform.
For instance, if you want to compile your software on Windows with Visual Studio, then with the proper syntax in your CMakeLists.txt file you can launch
cmake .
inside your project's directory on the Windows platform, and CMake will generate all the necessary project/solution files (.sln etc.).
If you would like to build your software on a Linux/Unix platform, you would simply go to the source directory where you have your CMakeLists.txt file and trigger the same cmake . and it will generate all the files necessary for you to build the software via a simple make or make all.
Here is a very good presentation about key CMake functionality: http://www.elpauer.org/stuff/learning_cmake.pdf
EDIT
If you'd like to make platform-dependent library includes / variable definitions etc., you can use this syntax in your CMakeLists.txt file
if(WIN32)
  # ...do something...
else()
  # ...do something else...
endif()
There are also a lot of commands that let you keep the build from failing later and instead have CMake notify you that, for instance, you do not have the Boost libraries filesystem and regex installed on your system. To do that you can use the following syntax:
find_package(Boost 1.45.0 COMPONENTS filesystem regex)
Having checked that, it will generate the makefiles for the appropriate system/IDE/compiler.
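As a sketch of how such a check is typically acted upon: the target my_app and its source file are placeholders, and whether a missing Boost should be a hard error or only a warning is a project decision.

```cmake
cmake_minimum_required(VERSION 3.16)
project(BoostCheck LANGUAGES CXX)

add_executable(my_app main.cpp)  # hypothetical target

find_package(Boost 1.45.0 COMPONENTS filesystem regex)
if(Boost_FOUND)
  # Imported targets carry include directories and link flags.
  target_link_libraries(my_app PRIVATE Boost::filesystem Boost::regex)
else()
  # Stop at configure time with a readable message instead of letting
  # the compiler fail later on a missing header.
  message(FATAL_ERROR "Boost >= 1.45.0 with filesystem and regex was not found")
endif()
```

Adding REQUIRED to the find_package call gives a similar hard stop with a CMake-generated message.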
Exactly how CMake works is a question for the developers, so this question can't be answered here.
However we can give a touch of useful guidance as far as when you should use CMake and when you therefore need to worry about how it works. I'm not a fan of "oh it just works" answers either - because, especially in software, NOTHING ever "just works" and you ALWAYS have to get into the nitty-gritty details at some point.
CMake is an industrial-strength tool. It automates several VERY complex processes and takes into account many variables of which you may not be aware, especially as a fairly new developer, probably working with limited knowledge of all the operating systems and build tools CMake can handle. The reason so many files are generated and why things seem so complex is that all of those other systems are complex and must be accounted for and automated. Additionally, there are the issues of "caching" and other time-saving features of the tool. To understand everything in CMake would mean understanding everything in these build tools and OSes and all the possible combinations of these variables, which as you can imagine is impossible.
It's important to note that if you're not in charge of managing a large cross-platform build system, and your code base is a few KLOC, maybe up to 100 KLOC, using CMake seems a little bit like using a 100,000 dollar forestry tree removal machine to remove weeds from your 2 foot by 2 foot flower garden. (By the way, if you've never seen such a machine, you should look for one on YouTube, they're amazing.)
If your build system is small and simple it's likely to be better to just write your own makefiles by hand or script them yourself. When your makefiles become unwieldy or you need to build a version of your system on another platform, then you can switch over to CMake. At that point, you'll have lots of problems to solve and you can ask more focused questions about it. In the meantime, check out some of the great books that have been written about CMake, or even better, write one yourself! 8)