How to use Flambda with dune in OCaml?

I'm doing a project in which I need to optimize my code as much as I can because it takes hours to run with a normal compilation.
I was told to use Flambda, but I couldn't find anything on the internet about how to use it with dune.
I'd appreciate instructions to follow / resources to read.

Flambda is a configuration feature of the compiler, which is disabled by default (as of January 2022). To use it, you need to install a version of the compiler that enables this feature. It is very easy with opam, e.g.,
opam switch create myswitch ocaml-variants.4.13.1+options ocaml-option-flambda
This will create a new switch with version 4.13.1 of the OCaml compiler, built with the flambda feature enabled. The general syntax is,
opam switch create <switch-name> ocaml-variants.<version>+options <options>...
You can enable several options, see opam search ocaml-option for all the options.
After you have installed the flambda-enabled version of OCaml, it will apply flambda optimizations by default (but make sure that you're building dune's release profile, as the default dev profile disables some of the optimizations, including cross-module optimization). Also, as usual, do not forget to activate your switch with eval $(opam env).
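Putting the pieces together, the whole workflow might look like this (a minimal sketch: the switch name is arbitrary and the version is just the example from above):
# create a flambda-enabled switch, activate it, and build in release mode
opam switch create flambda ocaml-variants.4.13.1+options ocaml-option-flambda
eval $(opam env)
dune build --profile release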
The flambda compiler has numerous configuration options that are thoroughly described in the manual. You may experiment with them to find a good tradeoff between compilation time and the performance of your application. With dune, you can set the options using the ocamlopt_flags field, e.g., (ocamlopt_flags (:standard -rounds 5)); you can set the flags globally as well. Once you find the perfect set of flags, you can even use them to compile your upstream dependencies, using the OCAMLPARAM environment variable. With this environment variable you can change the optimization parameters without interfering with the build scripts of other packages, e.g.,
export OCAMLPARAM='_,rounds=5,O3=1,inline=100,inline-max-unroll=5'
# opam install your deps
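For completeness, passing flambda flags through dune might look like this in a library's dune file (a minimal sketch: the library name and flag values are illustrative, and ocamlopt_flags keeps native-only flags away from the bytecode compiler, which does not accept them):
; dune file for a hypothetical library with flambda tuning flags
(library
 (name mylib)
 (ocamlopt_flags (:standard -O3 -rounds 5 -inline 100)))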

Related

lookup and verify sha256 for kotlinc in bazel

I'm trying to migrate from Maven to Bazel and want to use a different (newer) version of kotlinc than the standard default.
I have started from the example given on https://github.com/bazelbuild/rules_kotlin#custom-kotlinc-distribution-and-version:
load("#io_bazel_rules_kotlin//kotlin:repositories.bzl", "kotlin_repositories", "kotlinc_version")
kotlin_repositories(
compiler_release = kotlinc_version(
release = "1.6.21", # just the numeric version
sha256 = "632166fed89f3f430482f5aa07f2e20b923b72ef688c8f5a7df3aa1502c6d8ba"
)
)
However, the link and the example explain only the syntax. It is nice that the Bazel configuration is so concise, but the downside is that it is hard to see what the effect of this rule is.
How can I know/verify which binary for kotlinc will be downloaded, and from where? Can I figure out which other kotlinc releases are available and, more importantly, what the correct sha256 should be?
(Since this seems to be an official part of Bazel (loading from @io_bazel…), maybe there is a public directory (like https://mvnrepository.com/ for mvn, https://www.npmjs.com/ for npm, etc.) to check?)
Both the hash and the download location for the kotlinc binary that is apparently used by Bazel's rules_kotlin can currently be found on GitHub:
https://github.com/JetBrains/kotlin/releases
The hash for each release is printed at the end of the respective release notes, next to kotlin-compiler-*.zip.
The download URL is listed under "Assets", which for older releases may be collapsed (not immediately visible).
(As for the last part of the question: a unified directory of dependencies (artifacts, plugins, toolchains) seems to be unfeasible for a general-purpose build automation tool such as Bazel. It works for specific-purpose tools like Maven, NPM and so on, but all these ecosystems use quite different approaches and conventions, so it's probably too much hassle to provide that in a unified way for a system like Bazel. At least some of these bounded ecosystems have already started to provide configuration snippets for Bazel.)
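If you want to double-check a hash yourself rather than read it off the release notes, you can download the compiler archive and hash it locally (a sketch: the URL follows the pattern of the 1.6.21 release used above):
# fetch the compiler zip for a given release and compute its sha256
curl -LO https://github.com/JetBrains/kotlin/releases/download/v1.6.21/kotlin-compiler-1.6.21.zip
sha256sum kotlin-compiler-1.6.21.zip   # compare against the value passed to kotlinc_version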

When should I use find_package

I am learning CMake, and I find it hard to understand when I should use find_package.
For separate compilation, we need to let the compiler know where to find the header files, and this can be done with target_include_directories. For linking, we need to let the linker know where the implementation is, and this can be done with target_link_libraries. It seems like that is all we need to compile a project. Could anyone explain why and when we should use find_package?
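For concreteness, the manual approach described above might look like this (a minimal sketch with hypothetical paths and target names):
# tell the compiler where the headers live
add_executable(app main.cpp)
target_include_directories(app PRIVATE /opt/foo/include)
# tell the linker which library file to use
target_link_libraries(app PRIVATE /opt/foo/lib/libfoo.a)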
If a package you intend to use allows for the use of find_package, you should use it. If a package comes with a working configuration script, it encourages you to use the library the way it's intended to be used, and it will likely come with a simple way to add the required include directories and dependencies.
When is it possible to use find_package?
There needs to be either a configuration script (<PackageName>Config.cmake or packagename-config.cmake) that gets installed with the package, or a find script (Find<PackageName>.cmake). In some cases the latter even comes with the CMake installation instead of the installed package; see CMake find modules.
Should you create missing scripts yourself?
There are several benefits in creating a package configuration script yourself, even if a package doesn't come with an existing configuration or find script:
The scripts separate the information about libraries from the logic used to create your own targets. The use of the two commands find_package and target_link_libraries is concise; any logic you may need to collect and apply information like dependencies, include directories, minimal versions of the C++ standard to use, etc. would probably take up much more space in your CMakeLists.txt files, making them harder to understand.
It makes the library used easy to replace. Basically, all it takes to go with a different version of the same package is modifying CMAKE_PREFIX_PATH, CMAKE_MODULE_PATH or the package-specific <PackageName>_ROOT variables. If you ever want to try out different versions of the same library, this is incredibly useful.
The logic is reusable. If you need the same functionality in a different project, it takes little effort to reuse the same logic. Even if a library is only used within a single project, but in multiple places, the use of find_package can help keep the logic for "importing" a lib close to its use (see also the first bullet point).
There can be multiple versions of the same library with automatic selection of applicable ones. Note that this requires the use of a version file, but this file allows you to specify whether a version of the package is suitable for the current project. This allows for checking the target architecture, etc., which is helpful when cross compiling or when providing both 32 and 64 bit versions of a library on Windows: if a version file indicates a mismatch, the search for a suitable version simply continues with different paths instead of failing fatally on the first mismatch.
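For comparison, the two-command pattern mentioned in the first bullet point might look like this (a sketch assuming a hypothetical package Foo that installs FooConfig.cmake and exports an imported target Foo::Foo):
# find the package and link against its imported target; include
# directories, compile definitions and transitive dependencies follow
find_package(Foo 1.2 REQUIRED)
target_link_libraries(app PRIVATE Foo::Foo)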
You will probably find CMake's guide on using dependencies helpful. It describes find_package and alternatives, and when each one is relevant or useful. Here's an excerpt from the section on find_package:
A package needed by the project may already be built and available at some location on the user's system. That package might have also been built by CMake, or it could have used a different build system entirely. It might even just be a collection of files that didn't need to be built at all. CMake provides the find_package() command for these scenarios. It searches well-known locations, along with additional hints and paths provided by the project or user. It also supports package components and packages being optional. Result variables are provided to allow the project to customize its own behavior according to whether the package or specific components were found.
find_package requires the package to provide CMake support in the form of specific files that describe the package's contents to CMake. Some library authors provide this support (the most desirable scenario for you, the package consumer); some don't, but their packages are prominent enough that CMake itself ships such files for them. In the worst case, there is no CMake support at all, in which case you can either do something to get one of the previous good outcomes, or perform some kludges to get the job done (i.e. define the targets yourself in your project's CMake config).
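The "define the targets yourself" fallback mentioned at the end might look roughly like this (a sketch: the library foo and its install location /opt/foo are hypothetical):
# locate the headers and library file of a package without CMake support
find_path(FOO_INCLUDE_DIR foo/foo.h HINTS /opt/foo/include)
find_library(FOO_LIBRARY foo HINTS /opt/foo/lib)
# wrap them in an imported target so consumers can simply link Foo::Foo
add_library(Foo::Foo UNKNOWN IMPORTED)
set_target_properties(Foo::Foo PROPERTIES
  IMPORTED_LOCATION "${FOO_LIBRARY}"
  INTERFACE_INCLUDE_DIRECTORIES "${FOO_INCLUDE_DIR}")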

Variable interpolation in -D option

As a packager for a Linux distribution, I want to install docs into a separate prefix. With CMake projects, the docs installation location is controlled by CMAKE_INSTALL_DOCDIR from the GNUInstallDirs module. Unfortunately, unlike the other directory variables, this one contains the project name, so I cannot just use cmake "-DCMAKE_INSTALL_DOCDIR=$myDocPrefix/doc".
With GNU Make, I would run make "DOCDIR=$myDocPrefix/doc/\$(PROJECT_NAME)" and have Make interpolate it, but the documentation of CMake’s -D option does not mention interpolation, and I understand that CMake uses a much more complex system of cache entries, where interpolation might be problematic (especially if the referenced variable is not yet in the cache).
I could pass a tailor-made CMAKE_INSTALL_DOCDIR to each CMake project, but that would be bothersome, as I would have to do it manually in every package definition; being able to define a configureCmakeProject function and have it take care of everything automatically would be better. When setting it manually, I would also want to make sure it matches the PROJECT_NAME of the respective CMake project. Well, I could give up on that and just use $packageName from the package definition instead, but keeping packages as close to upstream as possible is preferred.
Alternatively, I could try to grep CMakeLists.txt for the project command, but that seems fragile and might still result in misalignments. I doubt it is possible to extract the name using some CMake API, since the project is not configured at that point, and we actually need the value to configure the project.
Is there a way I can configure CMAKE_INSTALL_DOCDIR to use a custom prefix but still keep the project name set by the CMake project?
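For reference, GNUInstallDirs derives the default roughly like this (simplified from the module's actual caching logic), which shows why a plain -DCMAKE_INSTALL_DOCDIR=... override loses the project name:
# DATAROOTDIR defaults to "share"; DOCDIR embeds PROJECT_NAME
set(CMAKE_INSTALL_DATAROOTDIR "share")
set(CMAKE_INSTALL_DOCDIR "${CMAKE_INSTALL_DATAROOTDIR}/doc/${PROJECT_NAME}")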

Force CMake to install targets to architecture-specific directories?

I'm currently having this issue with the Google Protobuf Library, but it is a recurring problem and will likely occur with many if not all 3rd-party packages that I want to build and install from source.
I'm developing for Windows, and we need to be able to generate both 32-bit and 64-bit versions of our DLLs. It was relatively straightforward to get CMake to install our own modules to architecture-specific subdirectories, e.g. D:\libraries\bin\i686 and D:\libraries\lib\i686. But I'm having trouble achieving the same thing with 3rd-party libraries such as Protobuf.
I could, of course, use distinct CMAKE_INSTALL_PREFIX and CMAKE_PREFIX_PATH combinations (e.g. D:\libraries-i686 and D:\libraries-x86_64), and will probably end up doing just that, but it bothers me that there doesn't seem to be a better alternative. The docs for find_package() clearly show that the search procedure does attempt architecture-specific search paths, so why do the CMake files of popular libraries not generally seem to support installing to architecture-specific subdirectories?
Or could it be that it is just a matter of setting the right CMAKE_XXX variable?
Thanks to @arrowd for pointing me in the right direction, I now have my answer, though it is not exactly what I had hoped for.
CMAKE_LIBRARY_OUTPUT_DIRECTORY and CMAKE_RUNTIME_OUTPUT_DIRECTORY, however, specify the build output directories, not the install directories. As it turns out, though, there are variables for the install directories too, called CMAKE_INSTALL_BINDIR and CMAKE_INSTALL_LIBDIR; they are plainly visible (along with plenty more) in the cmake-gui interface when "Advanced" is checked.
I tried setting those two manually (to bin\i686 and lib\i686), and it works: the Protobuf INSTALL target copies the files where I wanted to have them, i.e. where the CMake script of my consumer project will find them in an architecture-safe manner.
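For reference, the configure step then looks something like this (a sketch: the prefix is illustrative, and the relative BINDIR/LIBDIR values are appended to the install prefix; the ^ is the cmd.exe line continuation):
cmake -DCMAKE_INSTALL_PREFIX=D:\libraries ^
      -DCMAKE_INSTALL_BINDIR=bin\i686 ^
      -DCMAKE_INSTALL_LIBDIR=lib\i686 ..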
I'm not sure how I feel about this - I would have preferred something like a CMAKE_INSTALL_ARCHITECTURE or CMAKE_ARCHITECTURE_SUBDIR variable that CMake would automatically append to relevant install paths. The solution above requires overriding defaults that I would prefer to leave untouched.
Under the circumstances, my fallback approach might still be the better option. That approach however requires that the choice of architecture be made very early on, typically when running the script that initializes the CMake-specific environment variables that will be passed to cmake when configuring build directories. And it's worse when using cmake-gui, which requires the user to set all directories manually.
In the end, I'm still undecided.

How do you make it so that cpack doesn't add required libraries to an RPM?

I'm trying to convert our build system at work over to cmake and have run into an interesting problem with the RPMs that it generates (via cpack): It automatically adds all of the dependencies that it thinks your RPM has to its list of required libraries.
In general, that's great, but in my case, it's catastrophic. Unfortunately, the development packages that we build end up getting installed with one of our home-grown tools, which uses rpm to install them into a separate RPM database from the system one. It's stupid, but I can't change it. What this means is that none of the system libraries that any normal library relies on (like libc or libpthread) are in the RPM database that is used with our development packages. So, if an RPM for one of our development packages lists system libraries as required, then we can't install it, as rpm will think that they're not installed (since they're listed in the normal database rather than in the one that it's told to use when installing our packages). Our current build setup handles this just fine, because it doesn't list any system libraries as dependencies in the RPMs, but cpack automatically populates the RPM's list of required libraries and puts the system libraries in there. I need a way to stop it from doing so.
I tried setting CPACK_RPM_PACKAGE_REQUIRES to "", but that has no effect. The RPM that cpack generates still ends up with the system libraries listed as required. All I can think of doing at this point is copying the RPM cpack generator, hacking it up to do what I want, and using that instead of the standard one, but I'd prefer to avoid that. Does anyone have any idea how I could get cpack to stop populating the RPM with required libraries?
See the bottom of
http://www.rpm.org/max-rpm/s1-rpm-depend-auto-depend.html
The autoreqprov Tag — Disable Automatic Dependency Processing
There may be times when RPM's automatic dependency processing is not desired. In these cases, the autoreqprov tag may be used to disable it. This tag takes a yes/no or 0/1 value. For example, to disable automatic dependency processing, the following line may be used:
AutoReqProv: no
EDIT:
In order to set this in cmake, you need to do set(CPACK_RPM_PACKAGE_AUTOREQPROV " no"). The extra space in front of (or behind) the no seems to be required for it to work. The RPM module for cpack appears to have a bug that won't let you set some of its variables to anything shorter than 3 characters.
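In context, that looks something like this in a CMakeLists.txt (a minimal sketch; CPack variables must be set before include(CPack) to take effect):
set(CPACK_GENERATOR "RPM")
set(CPACK_RPM_PACKAGE_AUTOREQPROV " no")  # leading space works around the length bug
include(CPack)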
To add to Mark Lakata's answer above, there's a snapshot of the "Maximum RPM" doc
http://www.rpm.org/max-rpm-snapshot/s1-rpm-depend-auto-depend.html
that also adds:
The autoreq and autoprov tags can be used to disable automatic processing of requirements or "provides" only, respectively.
And at least with my version of CPackRPM, there seem to be similar variables you can set, e.g.
set(CPACK_RPM_PACKAGE_AUTOREQ " no")
to only disable the automatic dependency processing of 'Requires'.