Produce static libs from tensorflow_cc and tensorflow_framework - tensorflow

As far as I understand, using Bazel I can only produce libtensorflow_cc.so and libtensorflow_framework.so.
I need to produce static libs that are position independent (-fPIC) because I'll link them into a dynamic lib of my own later.
I found this answer, which suggests using a Makefile included in the project.
I successfully used it to replace libtensorflow_cc.so, but what can I do to replace libtensorflow_framework.so?

Not an actual answer, but too long for a comment.
I managed to do something like what you mention using Bazel on Windows. In particular, I wanted to make a single wrapper DLL with one or two headers (limited in functionality) that I could move around easily. I'll write a summary of the things that I did; it's rather convoluted and customized for our needs, but maybe you'll find something useful.
I pass --config=monolithic to the bazel build command (besides any other options that you need). That will avoid modularizing the library and thus remove the dependency on libtensorflow_framework.so (see tools/bazel.rc).
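For illustration, a minimal sketch of that invocation, assuming the stock shared-library target from the TensorFlow repository (the exact target name may differ between releases):
# --config=monolithic folds everything into one library, so no separate
# libtensorflow_framework.so is produced.
bazel build --config=monolithic //tensorflow:libtensorflow_cc.so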
The goal that I build is not any of the ones in the TensorFlow repository. Instead, I add a very small program that uses my wrapper as a new Bazel target (a C++ file plus my headers and a BUILD file). So all of TensorFlow has to be compiled beforehand in order to compile this final dummy program.
When I get that done, I take advantage of the fact that Bazel already compiles every subgoal as a static library. I check a file under the bazel-bin directory generated for my dummy program goal, with a name ending in .params; there I find the paths of all the static libraries that were used to compile it.
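As a hedged sketch of that step on Linux (the file pattern is illustrative; on Windows the archives end in .lib rather than .a):
# List the intermediate static archives Bazel passed to the final link of the
# dummy target by inspecting its generated *.params file(s).
find bazel-bin -name "*.params" -exec grep -Eh '\.(a|lo)$' {} + | sort -u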
I copy all of these intermediate static libraries somewhere else. Also, I copy a bunch of headers I need to compile my final wrapper (TensorFlow's own, but also Eigen, Protobuf and Nsync now too). I put all of this in a build area I have prepared beforehand.
I use an NMake Makefile to produce my custom DLL, using the static libraries, the copied headers and my own thin wrapper.
And that's about it, I think. I have an ugly Bash script I run on MSYS2 that does everything for me. Usually with every new release I need to tweak one or two things (some option in the configure script, some additional headers I need to copy, etc.), but I do get it to work in the end. It's quite a lot of fiddling though, so I'm not necessarily saying you should use the same approach (but feel free to ask for details about any step if you want).

Using the -2.params files @jdehesa mentioned and Bazel's verbose output (the -s switch), you can even create a link command to statically link these intermediate static libraries yourself. I automated this process for Windows/Linux/macOS and included it in the vcpkg package manager. To use it, just run vcpkg install tensorflow:x64-windows-static. If you're interested in the sources, you'll find them here.

Related

Find out why cmake adds specific link flags

I have a big project built with CMake. It mostly works.
But recently some combination of compilation server vs. test server broke. Investigation found that the final compile/link command calls gcc (...) -licudata -licui18n -licuuc (...); this introduces a dependency on shared libraries that are not present on the test server.
How do I find out what in my project (my library, an imported library, a found library, whatever) adds those three flags to the compile command?
I don't add them explicitly, so something is being done automagically and I want to find it. compile_commands.json doesn't have them because linking flags don't belong in it. CMakeCache.txt has those flags in some obscure variable PC_LIBXML_STATIC_LIBRARIES:INTERNAL, but removing them there doesn't affect the compile/link command.
Note that this question is not about dealing with libicu specifically but about a method of investigation in general (though comments about known problems with libicu would be appreciated too).
I found out that the dependency graphs created by CMake can include more detail than was configured for our project. All the options are documented here: https://cmake.org/cmake/help/latest/module/CMakeGraphVizOptions.html. I expect GRAPHVIZ_EXTERNAL_LIBS and GRAPHVIZ_SHARED_LIBS are the most important ones to set to true.
We enabled everything that could be enabled and filtered out nothing; the resulting graph was massive (too big for xdot, but luckily .dot files are human-readable), and it showed that Boost::regex uses those three libraries.
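A minimal sketch of how that can look on the command line (the options file name is fixed by CMake; build directory and library names here are assumptions):
# CMake reads CMakeGraphVizOptions.cmake from the source or binary directory.
cat > CMakeGraphVizOptions.cmake <<'EOF'
set(GRAPHVIZ_EXTERNAL_LIBS TRUE)
set(GRAPHVIZ_SHARED_LIBS TRUE)
EOF
cmake -S . -B build --graphviz=deps.dot   # writes deps.dot plus per-target .dot files
grep -i icu deps.dot                      # the edges show which target drags ICU in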

Get build command or all compiler flags that will be used to build a target

Is there a sensible way to get a CMake variable containing the build command or all the compiler flags that CMake will associate with a target?
It doesn't seem practical to try to gather and maintain a list of all properties that could add flags. Besides, CMake must have this info somewhere, since it has to eventually generate a build system.
From the CMake docs it looks like this feature once existed and was provided by calling build_command() but this was replaced:
Note: In CMake versions prior to 3.0 this command returned a command line that directly invokes the native build tool for the current generator.
Is there a new command that gives the old behavior of build_command()?
Is there a sensible way to get a CMake variable containing the build command or all the compiler flags that CMake will associate with a target?
The answer is no (CMake 3.23 is the latest at the time of writing), at least not during the CMake configure step.
In general, such a thing is ill-defined, which is likely why it was removed from CMake and will likely not be re-added. The complications arising from generator expressions, multi-config generators, generators that don't construct command lines (like VS/msbuild), source-file-specific properties, and the simple fact that after the command is called, relevant state might change, all make such efforts quixotic.
Honestly, this is such an odd thing to want at configure time that I wonder if this isn't an XY problem. It's unlikely that one target depends on another in such a way that the entire eventual command line is needed (rather than a particular property) to create it.
I know this is many years later now, but what were you trying to do?
CMake provides many ways post-generation to get information about the compiler command lines.
There's the CMake File API, meant for IDE integration;
the CMAKE_EXPORT_COMPILE_COMMANDS option, which creates a Clang-compatible compile_commands.json; and
the CMAKE_<LANG>_COMPILER_LAUNCHER variables, which let you instrument the full command line with a custom script while the build is running.
One of these might be useful. The latter is commonly used with ccache, but can be (ab)used with any arbitrary program as long as the output file is eventually generated.
Note that the latter two only work with the Makefile and Ninja generators; a short sketch of both follows below.
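A hedged sketch of the latter two (the launcher script log_cc.sh is a hypothetical wrapper, not something CMake ships):
# 1) Emit a Clang-compatible compile_commands.json in the build directory.
cmake -S . -B build -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON

# 2) Log every real compiler invocation through a launcher while building.
cat > log_cc.sh <<'EOF'
#!/bin/sh
echo "$@" >> /tmp/compile_lines.txt   # record the full compiler command line
exec "$@"                             # then run it unchanged
EOF
chmod +x log_cc.sh
cmake -S . -B build -G Ninja -DCMAKE_CXX_COMPILER_LAUNCHER=$PWD/log_cc.sh
cmake --build build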
If you want to see the final output of how the source files will actually be compiled, you will want to look at the generated files. I don't really know a better way currently:
Example:
Here is an example output from the Ninja Multi-Config generator:
build\CMakeFiles\impl-Release.ninja
This file will list all of the compile definitions, compiler flags, include directories, object directory, etc.
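A related trick, not mentioned in the answer above but worth knowing: with the Ninja and Makefile generators you can also ask the build tool itself to print the full command lines instead of reading the generated files (the target name here is illustrative):
ninja -C build -t commands my_target   # print the commands needed to build my_target
cmake --build build --verbose          # let CMake pass the verbose flag through
make -C build VERBOSE=1                # Makefile generator equivalent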
Under the path "cmake-build-debug/CMakeFiles/" you'll find a folder named "TopFolderOfYourProject.dir", where CMake generates all of its build-system files, including a file "build.make". In this file you can see something like this:
CMakeFiles/somepath/somesourcefile.c
#$(CMAKE_COMMAND) -E cmake_echo_color --switch=$(COLOR) --green --progress-dir=xxx\cmake-build-debug\CMakeFiles --progress-num=$(CMAKE_PROGRESS_1) "Building C object CMakeFiles/somepath/somesourcefile.c.obj"
Besides this, you can find extra info about the flags in the file "flags.make"; it contains all the extra compiler flags specified by the developers.
And in "includes_C.rsp/includes_CXX.rsp" you can see the including path.
Build flags are actually associated with source files, because you can have different flags for different files. In most cases, though, these flags are equivalent.
Anyway, to get all build flags for a source file you can use the COMPILE_FLAGS property:
get_source_file_property(RESULT file.cpp COMPILE_FLAGS)

Getting imported targets through `find_package`?

The CMake manual of Qt 5 uses find_package and says:
Imported targets are created for each Qt module. Imported target names should be preferred instead of using a variable like Qt5<Module>_LIBRARIES in CMake commands such as target_link_libraries.
Is it special for Qt or does find_package generate imported targets for all libraries? The documentation of find_package in CMake 3.0 says:
When the package is found package-specific information is provided through variables and Imported Targets documented by the package itself.
And the manual for cmake-packages says:
The result of using find_package is either a set of IMPORTED targets, or a set of variables corresponding to build-relevant information.
But I have not seen another FindXXX.cmake script whose documentation says that an imported target is created.
find_package is a two-headed beast these days:
CMake provides direct support for two forms of packages, Config-file Packages
and Find-module Packages
Source
Now, what does that actually mean?
Find-module packages are the ones you are probably most familiar with. They execute a script of CMake code (such as this one) that does a bunch of calls to functions like find_library and find_path to figure out where to locate a library.
The big advantage of this approach is that it is extremely generic. As long as there is something on the filesystem, we can find it. The big downside is that it often provides little more information than the physical location of that something. That is, the result of a find-module operation is typically just a bunch of filesystem paths. This means that modelling stuff like transitive dependencies or multiple build configurations is rather difficult.
This becomes especially painful if the thing you are trying to find has itself been built with CMake. In that case, you already have a bunch of stuff modeled in your build scripts, which you now need to painstakingly reconstruct for the find script, so that it becomes available to downstream projects.
This is where config-file packages shine. Unlike find-modules, the result of running the script is not just a bunch of paths, but it instead creates fully functional CMake targets. To the dependent project it looks like the dependencies have been built as part of that same project.
This allows much more information to be conveyed in a very convenient way. The obvious downside is that config-file scripts are much more complex than find scripts. Hence you do not want to write them yourself, but have CMake generate them for you, or rather have the dependency provide a config file as part of its deployment, which you can then simply load with a find_package call. And that is exactly what Qt5 does.
This also means, if your own project is a library, consider generating a config file as part of the build process. It's not the most straightforward feature of CMake, but the results are pretty powerful.
Here is a quick comparison of how the two approaches typically look in CMake code:
Find-module style
find_package(foo)
target_link_libraries(bar ${FOO_LIBRARIES})
target_include_directories(bar PRIVATE ${FOO_INCLUDE_DIR})
# [...] potentially lots of other stuff that has to be set manually
Config-file style
find_package(foo)
target_link_libraries(bar foo)
# magic!
tl;dr: Always prefer config-file packages if the dependency provides them. If not, use a find-script instead.
Actually, there is no "magic" in the results of find_package: this command just searches for the appropriate FindXXX.cmake script and executes it.
If the Find script sets an XXX_LIBRARY variable, then the caller can use this variable.
If the Find script creates imported targets, then the caller can use these targets.
If the Find script neither sets an XXX_LIBRARY variable nor creates imported targets ... well, then the script is used in some other way.
The documentation for find_package describes the usual usage of Find scripts. But in any case you need to consult the documentation for the specific script (it is normally contained in the script itself).

How to reuse Fortran modules without copying source or creating libraries

I'm having trouble understanding if/how to share code among several Fortran projects without building libraries or duplicating source code.
I am using Eclipse/Photran with the Intel compiler (ifort) on a linux system, but I believe I'm having a bigger conceptual problem with modules than with the specific tools.
Here's a simple example: In ~/workspace/cow I have a source directory (src) containing cow.f90 (the PROGRAM) and two modules m_graze and m_moo in m_graze.f90 and m_moo.f90, respectively. This project builds and links properly to create the executable 'cow'. The executable and modules (m_graze.mod and m_moo.mod) are stored in ~/workspace/cow/Debug and object files are stored under ~/workspace/cow/Debug/src
Later, I create ~/workspace/sheep and have src/sheep.f90 as the program and src/m_baa.f90 as the module m_baa. I want to 'use m_graze, only: ruminate' in sheep.f90 to get access to the ruminate() subroutine. I could just copy m_graze.f90, but that could lead to code getting out of sync and doesn't take into account any dependencies m_graze might have. For these reasons, I'd rather leave m_graze in the cow project and compile and link sheep.f90 against it.
If I try to compile the sheep project, I'll get an error like:
error #7002: Error in opening the compiled module file. Check INCLUDE paths. [M_GRAZE]
Under Properties:Project References for sheep, I can select the cow project. Under Properties:Fortran Build:Settings:Intel Compiler:Preprocessor I can add ~/workspace/cow/Debug (location of the module files) to the list of include directories so the compiler now finds the cow modules and compiles sheep.f90. However the linker dies with something like:
Building target: sheep
Invoking: Intel(R) Fortran Linker
ifort -L/home/me/workspace/cow/Debug -o "sheep" ./src/sheep.o
./src/sheep.o: In function `sheep':
/home/me/workspace/sheep/src/sheep.f90:11: undefined reference to `m_graze_mp_ruminate_'
This would normally be solved by adding libraries and library paths to the linker settings except there are no appropriate libraries to link to (this is Fortran, not C.)
The cow project was perfectly capable of compiling and linking together cow.f90, m_graze.f90 and m_moo.f90 into an executable. Yet while the sheep project can compile sheep.f90 and m_baa.f90 and can find the module m_graze.mod, it can't seem to find the symbols for m_graze even though all the requisite information is present on the system for it to do so.
It would seem to be an easy matter of configuration to get the linker portion of ifort to find the missing pieces and put them together but I have no idea what magic words need to be entered where in the Photran UI to make this happen.
I confess an utter lack of interest and competence in C and the C build process and I'd rather avoid the diversion of creating libraries (.a or .so) unless that's the only way to make this work.
Ultimately, I'm looking for a pure Fortran solution to this problem so I can keep a single copy of the source code and don't have to manually maintain a pile of custom Makefiles.
So can this be done?
Apologies if this has already been documented somewhere; Google is only showing me simple build examples, how to create modules, and how to link with existing libraries. There don't seem to be (m)any examples of code reuse with modules that don't involve duplicating source code.
Edit
As respondents have pointed out, the .mod files are necessary but not sufficient; either object code (in the form of m_graze.o) or static or shared libraries must be specified during the linking phase. The .mod files describe the interface to the object code/library but both are necessary to build the final executable.
For an oversimplified toy problem such as this, that's sufficient to answer the question as posed.
In a larger project with more complex dependencies (in my case, 80+KLOC of F90 linking to the MKL version of LAPACK95), the IDE or toolchain may lack sufficient automatic or user-interface facilities to make sharing a single canonical set of source files a viable strategy. The choice seems to be between risking duplicate source files getting out of sync, giving up many of the benefits of an IDE (i.e. avoiding manual creation of make/CMake/SCons files), or, in all likelihood, both. While a revision control system and good code organization can help, it's clear that sharing a single canonical set of source files among projects is far from easy given the current state of Eclipse.
Some background which I suspect you already know: compiling the source code for a Fortran module (with ifort and most other compilers) typically produces two outputs - a "mod" file containing a description of the Fortran entities the module defines, which the compiler needs to find whenever it sees a USE statement for the module, and object code for the linker, which implements the procedures, variable storage, etc. that the module defines.
Your first error (the one you solved) is because the compiler couldn't find the mod file.
The second error is because the linker hasn't been told about the object code that implements the stuff that was in the source file with the module. I'm not an Eclipse user by any means, but a brute-force way of specifying that is to just add the object file (xxxxx/Debug/m_graze.o) as an additional linker option (Fortran Build > Settings, under Intel Fortran Linker > Command Line). (Other toolchains have explicit "additional object file" properties for their link stage - there may well be a better way of doing this for the Intel chain.)
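For reference, a sketch of what that amounts to on the command line outside of Eclipse, using the paths from the question:
# Same link line as before, plus the object file that implements m_graze.
ifort -L/home/me/workspace/cow/Debug -o sheep ./src/sheep.o \
      /home/me/workspace/cow/Debug/src/m_graze.o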
For more involved examples you would typically create a library out of the shared code. That's not really C-specific; the only Fortran aspect is that the library's archive of object code needs to be provided alongside the mod files that the Fortran compiler generates.
Yes, the object code must be provided. E.g., when you install libnetcdf-dev on Debian (apt-get install libnetcdf-dev), a /usr/include/netcdf.mod file is included.
You can now use all netcdf routines in your Fortran code. E.g.,
program main
use netcdf
...
end
but you'll have to link to the netcdf shared (or static) library, i.e.,
gfortran -I/usr/include/ main.f90 -lnetcdff
However, as user MSB mentioned, the mod file can only be used by the gfortran that comes with the distribution (apt-get install gfortran). If you want to use any other compiler (even a different version that you may have installed yourself), then you'll have to build netcdf yourself using that particular compiler.
So creating a library is not a bad solution.
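If you do go the library route, a minimal sketch using the names from the question (libfarm.a is a made-up archive name):
# Archive the shared module objects once...
ar rcs ~/workspace/cow/Debug/libfarm.a \
       ~/workspace/cow/Debug/src/m_graze.o ~/workspace/cow/Debug/src/m_moo.o
# ...then any other project only needs the .mod directory and the archive.
ifort -I ~/workspace/cow/Debug -o sheep src/sheep.f90 src/m_baa.f90 \
      -L ~/workspace/cow/Debug -lfarm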

.h generated from .h.in?

There are struct definitions in the .h file that my library creates after I build it, but I cannot find them in the corresponding .h.in. Can somebody tell me how all this works and where it gets the extra info from?
To be specific: I am building pth, the userspace threading library. It has pth_p.h.in, which doesn't contain the struct definition I am looking for, yet when I build the library, a pth_p.h appears and it has the definition I need.
In fact, I have searched every single file in the library before it is built and cannot find where it is generating the struct definition.
Pth uses GNU Autoconf, Automake, and Libtool. Running ./configure executes a shell script (itself generated by autoconf/m4) that detects a whole bunch of different system attributes and makes changes to a number of files.
It looks like it boils down to ./configure generating Makefile from Makefile.in and then running something via make that triggers the shtool subcommand scpp:
pth_p.h: $(S)pth_p.h.in
$(SHTOOL) scpp -o pth_p.h -t $(S)pth_p.h.in -Dcpp -Cintern -M '==#==' $(HSRCS)
Obscure link, but here's an shtool-scpp manpage, which describes it as:
This command is an additional ANSI C source file pre-processor for sharing cpp(1) code segments, internal variables and internal functions. The intention for this comes from writing libraries in ANSI C. Here a common shared internal header file is usually used for sharing information between the library source files.
The operation is to parse special constructs in files, generate a few things out of these constructs and insert them at position mark in tfile by writing the output to ofile. Additionally the files are never touched or modified. Instead the constructs are removed later by the cpp(1) phase of the build process. The only prerequisite is that every file has a `#include "ofile"` at the top.
The .h.in is probably processed within the configure script (generated from configure.ac); look out for
AC_CONFIG_FILES([thatfile.h])
It replaces variables of the form @VAR@ in the .in file with their values.
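A self-contained sketch of that mechanism (assumes autoconf is installed; file and variable names are made up for illustration):
cat > configure.ac <<'EOF'
AC_INIT([demo], [1.2.3])
AC_CONFIG_FILES([version.h])
AC_OUTPUT
EOF
printf '#define DEMO_VERSION "@PACKAGE_VERSION@"\n' > version.h.in
autoconf        # generates ./configure from configure.ac
./configure     # config.status turns version.h.in into version.h
cat version.h   # -> #define DEMO_VERSION "1.2.3"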
Edit: just noticed that, if I'm right, you should retag your question.