How to load customized dynamic libs (*.so) in tensorflow serving (gpu)? - tensorflow

I wrote my own cudaMelloc as follows, which I plan to apply in tensorflow serving (GPU) to trace the cudaMelloc calls via the LD_PRELOAD mechanism (could be used to limit the GPU usage for each tf serving container with proper modification as well).
typedef cudaError_t (*cu_malloc)(void **, size_t);
/* cudaMalloc wrapper function */
cudaError_t cudaMalloc(void **devPtr, size_t size)
{
//cudaError_t (*cu_malloc)(void **devPtr, size_t size);
cu_malloc real_cu_malloc = NULL;
char *error;
real_cu_malloc = (cu_malloc)dlsym(RTLD_NEXT, "cudaMalloc");
if ((error = dlerror()) != NULL) {
fputs(error, stderr);
exit(1);
}
cudaError_t res = real_cu_malloc(devPtr, size);
printf("cudaMalloc(%d) = %p\n", (int)size, devPtr);
return res;
}
I compile the above code into a dynamic lib file using the following command:
nvcc --compiler-options "-DRUNTIME -shared -fpic" --cudart=shared -o libmycudaMalloc.so mycudaMalloc.cu -ldl
When applied to a vector_add program compiled with command nvcc -g --cudart=shared -o vector_add_dynamic vector_add.cu, it works well:
root#ubuntu:~# LD_PRELOAD=./libmycudaMalloc.so ./vector_add_dynamic
cudaMalloc(800000) = 0x7ffe22ce1580
cudaMalloc(800000) = 0x7ffe22ce1588
cudaMalloc(800000) = 0x7ffe22ce1590
But when I apply it to tensorflow serving using the following command, the cudaMelloc calls do not refer to the dynamic lib I wrote.
root#ubuntu:~# LD_PRELOAD=/root/libmycudaMalloc.so ./tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=resnet --model_base_path=/models/resnet
So here's my questions:
Is it because that tensorflow-serving is built in a fully static manner, such that tf-serving refers to the libcudart_static.a instead of libcudart.so?
If so, how could I build tf-serving to enable dynamic linking?

Is it because that tensorflow-serving is built in a fully static manner, such that tf-serving refers to the libcudart_static.a instead of libcudart.so?
It probably isn't built fully-static. You can see whether it is or not by running:
readelf -d tensorflow_model_server | grep NEEDED
But it probably is linked with libcudart_static.a. You can see whether it is or not with:
readelf -Ws tensorflow_model_server | grep ' cudaMalloc$'
If you see unresolved (U) symbol (as you would for the vector_add_dynamic binary), then LD_PRELOAD should work. But you'll probably see a defined (T or t) symbol instead.
If so, how could I build tf-serving to enable dynamic linking?
Sure: it's open-source. All you have to do is figure out how to build it, then how to build it without libcudart_static.a, and then figure out what (if anything) breaks when you do so.

Related

DLL silently ignored even though the library links

I'm using a third part DLL that I've used successfully for ages. Now the linker links the dll lib without complaint but the exe doesn't load the dll.
I recently upgraded from the 32 bit to 64 bit cygwin.
I'm doing a mingw cross compile to 32 bits.
I'm trying to use the FTDI USB interface FTD2XX dll.
I have the version 2.04.06 FTD2XX lib, .h, and dll.
I had been using that dll successfully for ages but with older versions of cygwin and mingw.
Recently upgraded to cygwin64.
The app appears to link with the FTD2XX.lib without complaint.
But when I run the app it doesn't seem to look for or load the FTD2XX.dll.
The app runs but crashes as soon as it tries to call something in FTD2XX dll.
I created a simple hello_dll.dll for side by side test. That works.
The app.c does calls on both hello_dll.dll and ftd2xx.dll.
Is starts without complain, successfully calls function in hello_dll, and then it crashes on a call to ft2xx.dll.
(I renamed the lib to ftd2xx_2.04.06 to distinguish them from other versions I have. Newer versions don't work any better.)
Link with -verbose gives:
i686-w64-mingw32-gcc -Wall -m32 -g -O2 -c -I . -o app.o app.c
i686-w64-mingw32-gcc -Wall -m32 -o app.exe app.o -Wl,-verbose -L. -lhello_dll -lftd2xx_2.04.06
GNU ld (GNU Binutils) 2.34.50.20200227
Supported emulations:
i386pe
using internal linker script:
<snip>
/usr/lib/gcc/i686-w64-mingw32/9.2.0/../../../../i686-w64-mingw32/bin/ld: mode i386pe
attempt to open /usr/i686-w64-mingw32/sys-root/mingw/lib/../lib/crt2.o succeeded
/usr/i686-w64-mingw32/sys-root/mingw/lib/../lib/crt2.o
attempt to open /usr/lib/gcc/i686-w64-mingw32/9.2.0/crtbegin.o succeeded
/usr/lib/gcc/i686-w64-mingw32/9.2.0/crtbegin.o
attempt to open app.o succeeded
app.o
<snip>
attempt to open ./hello_dll.lib succeeded
./hello_dll.lib
(./hello_dll.lib)d000001.o
(./hello_dll.lib)d000000.o
(./hello_dll.lib)d000002.o
<snip>
attempt to open ./ftd2xx_2.04.06.lib succeeded
./ftd2xx_2.04.06.lib
(./ftd2xx_2.04.06.lib)FTD2XX.dll
(./ftd2xx_2.04.06.lib)FTD2XX.dll
(./ftd2xx_2.04.06.lib)FTD2XX.dll
(./ftd2xx_2.04.06.lib)FTD2XX.dll
::::::::::::::::::::::::::::
I obtained a 32 bit compatible version of gdb. When I run gdb:
GNU gdb (GDB) 7.7.50.20140303-cvs
<snip>
This GDB was configured as "i686-pc-mingw32".
<snip>
(gdb) break main
(gdb) Breakpoint 1 at 0x40267b: file app.c, line 28.
(gdb) run
(gdb) Starting program: C:\_d\aaa\pd\src\dll\pathological\app.exe
[New Thread 1428.0x2528]
Breakpoint 1, main (argc=1, argv=0x9b2f70) at app.c:28
28 dostuff();
(gdb) info share
(gdb) From To Syms Read Shared Object Library
0x774e0000 0x77644ccc Yes (*) C:\Windows\SysWOW64\ntdll.dll
0x753d0000 0x754cadec Yes (*) C:\Windows\syswow64\kernel32.dll
0x75ea1000 0x75ee6a3a Yes (*) C:\Windows\syswow64\KernelBase.dll
0x64081000 0x6408a1d8 Yes C:\_d\aaa\pd\src\dll\pathological\hello_dll.dll
0x75041000 0x750eb2c4 Yes (*) C:\Windows\syswow64\msvcrt.dll
(*): Shared library is missing debugging information.
(gdb) A debugging session is active.
(gdb) c
Continuing.
Hello dll. <--- The function in hello_dll.dll prints this.
Program received signal SIGSEGV, Segmentation fault.
0x8000004c in ?? () <----- call to FT_GetLibraryVersion()
(gdb) bt
#0 0x8000004c in ?? ()
#1 0x0040158e in dostuff () at app.c:49
#2 0x00402680 in main (argc=1, argv=0x8e2f70) at app.c:28
(gdb)
It links with the lib without complaint but when I run the exe it (silently) doesn't load the dll.
Anybody have any ideas? Is there some linker control that I am missing? Are there other diagnostic or debug tools to dig into this further?
:::::::::::::::::::::::
edit 7/11/20
I'll post some code. (If I know how. I'm new here.)
It should be shown in the "info share", but it isn't, as you can see above.
I'm suspecting name decoration. Objdump -x of the .exe shows an entry for FTD2XX.dll in the Import Tables. But it doesn't show any vma or bound name under it. I suspect that at program load the loader sees no vma/name and decides it doesn't really need to load the dll.
There is an import table in .idata at 0x406000
<snip>
The Import Tables (interpreted .idata section contents)
vma: Hint Time Forward DLL First
Table Stamp Chain Name Thunk
00006000 0000607c 00000000 00000000 00006218 0000614c
DLL Name: FTD2XX.dll
vma: Hint/Ord Member-Name Bound-To
<----- empty?
00006014 00006080 00000000 00000000 000064f8 00006150
DLL Name: hello_dll.dll
vma: Hint/Ord Member-Name Bound-To
6224 1 hello_dll
00006028 00006088 00000000 00000000 00006554 00006158
DLL Name: KERNEL32.dll
vma: Hint/Ord Member-Name Bound-To
6230 277 DeleteCriticalSection
6248 310 EnterCriticalSection
<snip>
:::::::::::::::::::::::::::::::::::::::::::::
edit 2, 7/11/20
This is the program that calls functions in the DLLs.
/* app.c
Demonstrates using the function imported from the DLL.
*/
// 200708 pathological case. Based on the simple hello_dll.
//#include <stdlib.h>
// for sleep
#include <unistd.h>
#include <stdio.h>
// for dword
#include <windef.h>
// for lpoverlapped
#include <minwinbase.h>
#include "hello_dll.h"
// My legacy app, and really all others too, use 2.04.06.h
#include "ftd2xx_2.04.06.h"
//#include "ftd2xx_2.02.04.h"
///////////////////////////
void dostuff( void );
void call_ft_listdevices( void );
///////////////////////////
int main(int argc, char** argv)
{
FT_STATUS status;
DWORD libver;
//dostuff();
printf( "Calling hello_dll():\n" );
fflush( stdout );
hello_dll();
fflush( stdout );
printf( "Back from hello_dll()\n" );
fflush( stdout );
sleep( 1 );
printf( "Calling FT_GetLibraryVersion().\n" );
fflush( stdout );
status = FT_GetLibraryVersion( &libver );
if( status == FT_OK ){
printf( "FTD2XX library version 0x%lx\n", libver );
fflush( stdout );
}
else{
printf( "Error reading FTD2XX library version.\n" );
fflush( stdout );
}
// 200710 Adding call to different ft function did
// not result in entries in the import table.
//call_ft_listdevices( );
return 0;
}
I don't think there is a need to include the code for my hello_dll. It works.
I have three versions of the FTD2XX. I'm pretty careful about tracking versions. Plus, when one is beating one's head against the wall, double checking the versions appeals early on as a way to end the pain.
I found a surprise copy of FTD2XX.dll. It's in c:/Windows/SysWOW64. It is the oldest of the three versions I have. Versions of my app that were compiled before this problem started run correctly using that dll in that place.
Solved.
There's a bug in the 2.34.50.20200227 i686-w64-mingw32-ld.exe. It won't work with ftd2xx.lib, regardless of ftd2xx version as far as I can tell.
2.25.51.20150320 and 2.29.1.20171006 work with ftd2xx.lib. I've reverted back to 2.29 mingw64-i686-binutils. I'm running again.

How to resolve the issue "Can't find import Effects" in Idris?

I was trying to compile the following "hello world" source file after some time away from Idris using a nix-shell environment (direction):
module Main
import Effects
import Effect.StdIO
hello : Eff () [STDIO]
hello = putStrLn "Hello world!"
main : IO ()
main = run hello
My output was as follows during the experimentation; despite specifying what seems to be the right library, I still get the same error at the end:
[nix-shell:~/workspace/idris_tmp]$ idris --listlibs
00prelude-idx.ibc
prelude
00base-idx.ibc
base
[nix-shell:~/workspace/idris_tmp]$ exit
exit
brandon#brandon-750-170se-DevContainer:~/workspace/idris_tmp
$ nix-shell -p 'idrisPackages.with-packages (with idrisPackages; [ contrib effects ])' -p closurecompiler
these derivations will be built:
/nix/store/j44rkb0fpxqd4qkamrg1ysf9kbx1q2qa-idris-1.3.0.drv
these paths will be fetched (1.55 MiB download, 4.37 MiB unpacked):
/nix/store/1fcldlyp5n0ry22dsqfam68mpra6jky9-idris-effects-1.3.0
/nix/store/a9rjm84pbmvg6dmkdzhl9q1wliyi0q4b-idris-contrib-1.3.0
copying path '/nix/store/a9rjm84pbmvg6dmkdzhl9q1wliyi0q4b-idris-contrib-1.3.0' from 'https://cache.nixos.org'...
copying path '/nix/store/1fcldlyp5n0ry22dsqfam68mpra6jky9-idris-effects-1.3.0' from 'https://cache.nixos.org'...
building '/nix/store/j44rkb0fpxqd4qkamrg1ysf9kbx1q2qa-idris-1.3.0.drv'...
/nix/store/5dny6qnjfq9liya5z1sxvr2g64bqypwl-idris-base-1.3.0/nix-support:
propagated-build-inputs: /nix/store/1fcldlyp5n0ry22dsqfam68mpra6jky9-idris-effects-1.3.0/nix-support/propagated-build-inputs
/nix/store/a9rjm84pbmvg6dmkdzhl9q1wliyi0q4b-idris-contrib-1.3.0/nix-support:
propagated-build-inputs: /nix/store/1fcldlyp5n0ry22dsqfam68mpra6jky9-idris-effects-1.3.0/nix-support/propagated-build-inputs
/nix/store/a5x52wi84jgjiimpnkfpcl3mbpbkf1r4-idris-1.3.0/nix-support:
propagated-build-inputs: /nix/store/1fcldlyp5n0ry22dsqfam68mpra6jky9-idris-effects-1.3.0/nix-support/propagated-build-inputs
[nix-shell:~/workspace/idris_tmp]$ idris --listlibs
00effects-idx.ibc
effects
00base-idx.ibc
base
00prelude-idx.ibc
prelude
00contrib-idx.ibc
contrib
[nix-shell:~/workspace/idris_tmp]$ idris --codegen javascript hello.idr -o hello.js
Can't find import Effects
Try to add -p effects to your commandline as in
idris -p effects --codegen javascript hello.idr -o hello.js
this will load the additional library/package. You can avoid doing this by defining an ipkg file, which describes your build. See here: http://docs.idris-lang.org/en/latest/reference/packages.html

error using yaml-cpp to get integer nested in key hierarchy

Initial Question
I have a config.yaml that has structure similar to
some_other_key: 34
a:
b:
c:
d: 3
I thought I could do
YAML::Node config = YAML::LoadFile(config_filename.c_str());
int x = config["a"]["b"]["c"]["d"].as<int>();
but I get
terminate called after throwing an instance of
'YAML::TypedBadConversion<int>'
what(): bad conversion
How do I descend through my config.yaml to extract a value like this? I also get that same exception if I mistype one of the keys in the path, so I can't tell from the error if I am accidentally working with a null node or if there is a problem converting a valid node's value to int
Follow up After First Replies
Thank you for replying! Maybe it is an issue with what is in the config.yaml? Here is a small example to reproduce,
yaml file: config2.yaml
daq_writer:
num: 3
num_per_host: 3
hosts:
- local
datasets:
small:
chunksize: 600
Python can read it:
Incidentally, I am on linux on rhel7, but from a python 3.6 environment, everything looks good:
$ python -c "import yaml; print(yaml.load(open('config2.yaml','r')))"
{'daq_writer': {'num_per_host': 3, 'num': 3, 'datasets': {'small': {'chunksize': 600}}, 'hosts': ['local']}}
C++ yaml-cpp code
The file yamlex.cpp:
#include <string>
#include <iostream>
#include "yaml-cpp/yaml.h"
int main() {
YAML::Node config = YAML::LoadFile("config2.yaml");
int small_chunksize = config["daq_writer"]["datasets"]["smal"]["chunksize"].as<int>();
}
When I compile and run this, I get:
(lc2) psanagpu101: ~/rel/lc2-hdf5-110 $ c++ --std=c++11 -Iinclude -Llib -lyaml-cpp yamlex.cpp
(lc2) psanagpu101: ~/rel/lc2-hdf5-110 $ LD_LIBRARY_PATH=lib ./a.out
terminate called after throwing an instance of 'YAML::TypedBadConversion<int>'
what(): bad conversion
Aborted (core dumped)
(lc2) psanagpu101: ~/rel/lc2-hdf5-110 $ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
I have been able to read top level keys, like the some_other_key that I referenced above, but I got an error when I went after this nested key. Good to know that syntax works!
You have a typo in your keys: instead of "small", you wrote "smal".

Generating .gcda coverage files via QEMU/GDB

Executive summary: I want to use GDB to extract the coverage execution counts stored in memory in my embedded target, and use them to create .gcda files (for feeding to gcov/lcov).
The setup:
I can successfully cross-compile my binary, targeting my specific embedded target - and then execute it under QEMU.
I can also use QEMU's GDB support to debug the binary (i.e. use tar extended-remote localhost:... to attach to the running QEMU GDB server, and fully control the execution of my binary).
Coverage:
Now, to perform "on-target" coverage analysis, I cross-compile with
-fprofile-arcs -ftest-coverage. GCC then emits 64-bit counters to keep track of execution counts of specific code blocks.
Under normal (i.e. host-based, not cross-compiled) execution, when the app finishes __gcov_exit is called - and gathers all these execution counts into .gcdafiles (that gcov then uses to report coverage details).
In my embedded target however, there's no filesystem to speak of - and libgcov basically contains empty stubs for all __gcov_... functions.
Workaround via QEMU/GDB: To address this, and do it in a GCC-version-agnostic way, I could list the coverage-related symbols in my binary via MYPLATFORM-readelf, and grep-out the relevant ones (e.g. __gcov0.Task1_EntryPoint, __gcov0.worker, etc):
$ MYPLATFORM-readelf -s binary | grep __gcov
...
46: 40021498 48 OBJECT LOCAL DEFAULT 4 __gcov0.Task1_EntryPoint
...
I could then use the offsets/sizes reported to automatically create a GDB script - a script that extracts the counters' data via simple memory dumps (from offset, dump length bytes to a local file).
What I don't know (and failed to find any relevant info/tool), is how to convert the resulting pairs of (memory offset,memory data) into .gcda files. If such a tool/script exists, I'd have a portable (platform-agnostic) way to do coverage on any QEMU-supported platform.
Is there such a tool/script?
Any suggestions/pointers would be most appreciated.
UPDATE: I solved this myself, as you can read below - and wrote a blog post about it.
Turned out there was a (much) better way to do what I wanted.
The Linux kernel includes portable GCOV related functionality, that abstracts away the GCC version-specific details by providing this endpoint:
size_t convert_to_gcda(char *buffer, struct gcov_info *info)
So basically, I was able to do on-target coverage via the following steps:
Step 1
I added three slightly modified versions of the linux gcov files to my project: base.c, gcc_4_7.c and gcov.h. I had to replace some linux-isms inside them - like vmalloc,kfree, etc - to make the code portable (and thus, compileable on my embedded platform, which has nothing to do with Linux).
Step 2
I then provided my own __gcov_init...
typedef struct tagGcovInfo {
struct gcov_info *info;
struct tagGcovInfo *next;
} GcovInfo;
GcovInfo *headGcov = NULL;
void __gcov_init(struct gcov_info *info)
{
printf(
"__gcov_init called for %s!\n",
gcov_info_filename(info));
fflush(stdout);
GcovInfo *newHead = malloc(sizeof(GcovInfo));
if (!newHead) {
puts("Out of memory!");
exit(1);
}
newHead->info = info;
newHead->next = headGcov;
headGcov = newHead;
}
...and __gcov_exit:
void __gcov_exit()
{
GcovInfo *tmp = headGcov;
while(tmp) {
char *buffer;
int bytesNeeded = convert_to_gcda(NULL, tmp->info);
buffer = malloc(bytesNeeded);
if (!buffer) {
puts("Out of memory!");
exit(1);
}
convert_to_gcda(buffer, tmp->info);
printf("Emitting %6d bytes for %s\n", bytesNeeded, gcov_info_filename(tmp->info));
free(buffer);
tmp = tmp->next;
}
}
Step 3
Finally, I scripted my GDB (driving QEMU remotely) via this:
$ cat coverage.gdb
tar extended-remote :9976
file bin.debug/fputest
b base.c:88 <================= This breaks on the "Emitting" printf in __gcov_exit
commands 1
silent
set $filename = tmp->info->filename
set $dataBegin = buffer
set $dataEnd = buffer + bytesNeeded
eval "dump binary memory %s 0x%lx 0x%lx", $filename, $dataBegin, $dataEnd
c
end
c
quit
And finally, executed both QEMU and GDB - like this:
$ # In terminal 1:
qemu-system-MYPLATFORM ... -kernel bin.debug/fputest -gdb tcp::9976 -S
$ # In terminal 2:
MYPLATFORM-gdb -x coverage.gdb
...and that's it - I was able to generate the .gcda files in my local filesystem, and then see coverage results over gcov and lcov.
UPDATE: I wrote a blog post showing the process in detail.

How to compile libgit2 without HTTPS

I'm trying to compile a no or limited libgit2 static build but haven't yet succeeded into compiling it without openssl.
So far, my best attempt has been following this suite of commands:
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/opt/ -DBUILD_SHARED_LIBS=OFF -DCMAKE_DISABLE_FIND_PACKAGE_OpenSSL=TRUE
make
But I obtain the following:
[ 11%] Building C object CMakeFiles/git2.dir/src/openssl_stream.c.o
/Users/raphael/src/github.com/libgit2/libgit2/src/openssl_stream.c:369:41: warning: unused parameter 'out' [-Wunused-parameter]
int git_openssl_stream_new(git_stream **out, const char *host, const char *port)
^
/Users/raphael/src/github.com/libgit2/libgit2/src/openssl_stream.c:369:58: warning: unused parameter 'host' [-Wunused-parameter]
int git_openssl_stream_new(git_stream **out, const char *host, const char *port)
^
/Users/raphael/src/github.com/libgit2/libgit2/src/openssl_stream.c:369:76: warning: unused parameter 'port' [-Wunused-parameter]
int git_openssl_stream_new(git_stream **out, const char *host, const char *port)
Then:
[ 23%] Building C object CMakeFiles/git2.dir/src/hash/hash_generic.c.o
In file included from /Users/raphael/src/github.com/libgit2/libgit2/src/hash/hash_generic.c:10:
/Users/raphael/src/github.com/libgit2/libgit2/src/hash/hash_generic.h:13:8: error: redefinition of 'git_hash_ctx'
struct git_hash_ctx {
^
/Users/raphael/src/github.com/libgit2/libgit2/src/hash/hash_common_crypto.h:15:8: note: previous definition is here
struct git_hash_ctx {
^
And many others obviously following.
Environment details:
I'm on MacOS X Yosemite, using either Clang or GCC 4.9 and I'm building statically, I tried with tag v0.22.1 and master from Jan 24, 2015.
I'm looking for a process that would be portable to Linux / FreeBSD as well.
The warnings about the openssl stream are irrelevant; the constructor simply returns an error, so it doesn't use any of the parameters passed. It'd be nice to clean the up, but they don't do anything.
As for the redefinition issue, you can find a workaround in PR 2820.