CuDNN crashing under valgrind

My program works fine on my standard Ubuntu x64 box, but if I run under valgrind I see the following error:
==22246== Conditional jump or move depends on uninitialised value(s)
==22246== at 0x9854DCBC: ??? (in /usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21)
==22246== by 0x98182940: cudnnGetConvolutionBackwardFilterWorkspaceSize (in /usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21)
==22246== by 0x982C3787: ??? (in /usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21)
==22246== by 0x9817FC0F: cudnnGetConvolutionBackwardFilterAlgorithm (in /usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21)
==22246== by 0x903F4908: caffe::CuDNNConvolutionLayer<float>::Reshape(std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&, std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&) (cudnn_conv_layer.cpp:149)
==22246== by 0x904F35D8: SetUp (layer.hpp:72)
==22246== by 0x904F35D8: caffe::Net<float>::Init(caffe::NetParameter const&) (net.cpp:148)
==22246== by 0x904F523F: caffe::Net<float>::Net(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, caffe::Phase, int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const*, caffe::Net<float> const*) (net.cpp:45)
--from here is my own code
Unfortunately, although my code is pretty much a bog-standard interface to caffe, the model is a complex proprietary model I cannot share here. Furthermore, CuDNN is closed-source, so I cannot debug that to see if this is a problem worth bothering about.
Googling cudnnGetConvolutionBackwardFilterWorkspaceSize valgrind and cudnnGetConvolutionBackwardFilterAlgorithm valgrind turns up nothing useful except a hint to add --track-origins=yes, but when I add that the error goes away...
The problem I am actually trying to solve is that the Deep Learning module crashes inside the standard library on the target platform with a double free (an attempt to free already-freed memory). However, the target is an ARM-based device that I cannot get access to for further investigation.

Related

C++ standard library symbols bound at runtime to other libraries

I have a C++ application built with GCC 8.3 on RHEL 6 and linking with a bunch of internal and external shared libraries.
I am trying to understand how the loader binds my application symbols at runtime.
What I have observed and cannot understand is why some symbols from libstdc++.so get bound to my application and shared libraries:
LD_DEBUG=bindings ldd -r main
[...]
binding file /usr/lib64/libstdc++.so.6 [0] to ./libshared01.so [0]: normal symbol `std::__detail::_Prime_rehash_policy::_M_next_bkt(unsigned long) const' [GLIBCXX_3.4.18]
binding file /usr/lib64/libstdc++.so.6 [0] to ./libshared02.so [0]: normal symbol `std::_Hash_bytes(void const*, unsigned long, unsigned long)' [CXXABI_1.3.5]
binding file /usr/lib64/libstdc++.so.6 [0] to ./libshared03.so [0]: normal symbol `char* std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_construct<__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >(__gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<char> const&, std::forward_iterator_tag)' [GLIBCXX_3.4.14]
binding file /usr/lib64/libstdc++.so.6 [0] to ./libshared03.so [0]: normal symbol `char* std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag)' [GLIBCXX_3.4.14]
binding file /usr/lib64/libstdc++.so.6 [0] to ./libshared04.so [0]: normal symbol `std::__future_base::_Async_state_common::~_Async_state_common()' [GLIBCXX_3.4.17]
binding file /usr/lib64/libstdc++.so.6 [0] to ./main [0]: normal symbol `std::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >::~basic_stringbuf()' [GLIBCXX_3.4]
Only a few standard symbols are bound outside of libstdc++.so; all the others are mapped as I would expect:
binding file /usr/lib64/libstdc++.so.6 [0] to /usr/lib64/libstdc++.so.6 [0]: normal symbol `std::terminate()' [GLIBCXX_3.4]
binding file /usr/lib64/libstdc++.so.6 [0] to /usr/lib64/libstdc++.so.6 [0]: normal symbol `std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<double>(double)' [GLIBCXX_3.4.9]
binding file /usr/lib64/libstdc++.so.6 [0] to /usr/lib64/libstdc++.so.6 [0]: normal symbol `std::basic_ostream<wchar_t, std::char_traits<wchar_t> >::sentry::~sentry()' [GLIBCXX_3.4]
I am not using any visibility flags from GCC or visibility attributes in my code.
However I was assuming that all these symbols clearly identified as standard ones would be mapped to the libstdc++.so by default.
My underlying issue is that my application's behavior and performance therefore seem to depend randomly on a symbol-binding process that I don't control. If one of my external dependencies is highly optimized and all the standard string symbols of my application suddenly get picked from this external library, that feels like a problem.
Can someone shed some light on this behavior? Is it expected and documented?
"However I was assuming that all these symbols clearly identified as standard ones would be mapped to the libstdc++.so by default."
Why? Many of the symbols that you are looking at are templates, and are instantiated by the compiler when you build. They may not exist in libstdc++ at all. (Some specializations, such as std::basic_string<char, std::char_traits<char>, std::allocator<char>>, may be instantiated in the shared library to save space in your app, but certainly not all of them.)
You can get a list of symbols exported from libstdc++.so using nm -g (if I remember correctly; for a stripped shared library you need nm -D, which reads the dynamic symbol table). Some versions of nm have a --demangle (-C) flag as well. You will probably be surprised by the contents of that list.
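To illustrate the point about templates, here is a minimal sketch. The names twice and shout are hypothetical and only stand in for standard-library templates; nothing here is taken from libstdc++ itself. The compiler generates the code for an instantiation in whichever object file uses it, which is why such a symbol can legitimately be defined in ./main or in one of your own shared libraries.

```cpp
#include <string>

// Hypothetical template, standing in for a standard-library template.
template <typename T>
T twice(const T& value) {
    return value + value;
}

// Explicit instantiation definition: twice<int> is emitted into this
// translation unit even if nothing here calls it. A library can do the
// same for common specializations so that clients need not carry a copy.
template int twice<int>(const int&);

// Implicit instantiation: calling twice with a std::string makes the
// compiler generate twice<std::string> in this object file. With GCC on
// ELF platforms such instantiations are typically emitted as weak symbols.
std::string shout(const std::string& word) {
    return twice(word);
}
```

If you compile this into an object file and run nm -C on it, you should see both instantiations defined there rather than referenced from a library.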

ROOT Library error

So I installed ROOT 6.12.04 after I (unthinkingly) updated my OS to macOS High Sierra. I cloned the git repository and followed all the steps on the "Quick Start" build page. However, something went wrong with my build, and this is the error I get when I try to start a new instance of root and run Detector test:
dyld: lazy symbol binding failed: Symbol not found: __ZN5TROOT14RegisterModuleEPKcPS1_S2_S1_S1_PFvvERKNSt3__16vectorINS5_4pairINS5_12basic_stringIcNS5_11char_traitsIcEENS5_9allocatorIcEEEEiEENSB_ISE_EEEES2_
Referenced from: /Users/MM/repos/nsd-rootscripts/compile/NSDRootScriptsLib.so
Expected in: /Users/MM/cern/root-build/lib/libCore.so
dyld: Symbol not found: __ZN5TROOT14RegisterModuleEPKcPS1_S2_S1_S1_PFvvERKNSt3__16vectorINS5_4pairINS5_12basic_stringIcNS5_11char_traitsIcEENS5_9allocatorIcEEEEiEENSB_ISE_EEEES2_
Referenced from: /Users/MM/repos/nsd-rootscripts/compile/NSDRootScriptsLib.so
Expected in: /Users/MM/cern/root-build/lib/libCore.so
Does anyone know where the build went wrong?
This means that ROOT's dictionary interface has changed, but your dictionary sources (here: those in NSDRootScriptsLib.so) didn't get updated. Please re-generate the dictionaries.
This is not a solution, but may start you on the right track.
If you demangle the symbol name you posted with c++filt, you get
TROOT::RegisterModule(
    char const*,
    char const**,
    char const**,
    char const*,
    char const*,
    void (*)(),
    std::__1::vector<
        std::__1::pair<
            std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >,
            int
        >,
        std::__1::allocator<
            std::__1::pair<
                std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >,
                int
            >
        >
    > const&,
    char const**
)
So, the problem is that your linker can't find the definition of this TROOT::RegisterModule() function.
You said the problem was with libCore.so. Looking at my libCore.so with
objdump -x libCore.so | grep RegisterModule | c++filt
I can see that I have an identical symbol defined. If this command doesn't print anything for you, then your compilation must have gone wrong and didn't compile this function into the library.
Unfortunately, this is as far as I can get you.
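If c++filt is not at hand, the same demangling can be done from code. This is a minimal sketch, assuming a GCC or Clang toolchain that provides abi::__cxa_demangle in <cxxabi.h>; the helper name demangle is my own and is not part of ROOT.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang)
#include <cstdlib>    // std::free
#include <string>

// Demangle an Itanium-ABI symbol name. Returns the input unchanged if it
// cannot be demangled.
std::string demangle(const std::string& symbol) {
    std::string mangled = symbol;
    // macOS tools print symbols with an extra leading underscore
    // ("__ZN..." instead of "_ZN..."); drop it before demangling.
    if (mangled.rfind("__Z", 0) == 0) {
        mangled.erase(0, 1);
    }
    int status = 0;
    char* readable =
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return symbol;
    }
    std::string result(readable);
    std::free(readable);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}
```

Calling demangle on the symbol from the dyld error should give the same TROOT::RegisterModule(...) signature as the c++filt output above.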

Can't link to static library in Xcode workspace

I have an Xcode workspace containing two projects, JDPlayer and JDComposer. JDComposer compiles to a static library, which JDPlayer needs to link to.
JDComposer builds fine, and libJDComposer shows up in the build products directory. JDPlayer then compiles fine, too - but when it comes to trying to link to libJDComposer.a, there are a bunch of problems, all along these lines:
Undefined symbols for architecture x86_64:
"JDComposer::getSyncTypes(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >&)", referenced from:
JDPlayer::getSyncTypes(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >&) in JDPlayer.o
I have added libJDComposer.a to JDPlayer->Target->Build Phases->Link Binary With Libraries. I have also added the path of libJDComposer.a to JDPlayer->Build Settings->Library Search Paths.
It's worth noting that libJDComposer.a has 2 targets - libJDComposeriOS and libJDComposerOSX. I am trying to link to libJDComposerOSX, since JDPlayer is an OSX project.
Does anyone have any idea what might be going wrong?

Cmake flags for debugging don't seem to be useful in valgrind?

Ok, so I have this Qt application I'm attempting to debug;
upon running valgrind on it and redirecting output to a file, I see many 'definitely lost' blocks that look something like this, which make me sad:
==24357== 24 bytes in 1 blocks are definitely lost in loss record 150 of 508
==24357== at 0x4C2C56F: malloc (vg_replace_malloc.c:267)
==24357== by 0x76ED3CA: FcPatternCreate (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.4.4)
==24357== by 0x76EB3CD: FcFontRenderPrepare (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.4.4)
==24357== by 0x76EB66C: FcFontMatch (in /usr/lib/x86_64-linux-gnu/libfontconfig.so.1.4.4)
==24357== by 0x57163D7: QFontDatabase::load(QFontPrivate const*, int) (in /usr/lib/x86_64-linux-gnu/libQtGui.so.4.8.1)
==24357== by 0x56F3586: QFontPrivate::engineForScript(int) const (in /usr/lib/x86_64-linux-gnu/libQtGui.so.4.8.1)
==24357== by 0x5728482: ??? (in /usr/lib/x86_64-linux-gnu/libQtGui.so.4.8.1)
==24357== by 0x573B73D: QTextLine::layout_helper(int) (in /usr/lib/x86_64-linux-gnu/libQtGui.so.4.8.1)
==24357== by 0x573D5A4: QTextLayout::endLayout() (in /usr/lib/x86_64-linux-gnu/libQtGui.so.4.8.1)
==24357== by 0x58F33CE: QLineControl::updateDisplayText(bool) (in /usr/lib/x86_64-linux-gnu/libQtGui.so.4.8.1)
==24357== by 0x58F36C6: QLineControl::init(QString const&) (in /usr/lib/x86_64-linux-gnu/libQtGui.so.4.8.1)
==24357== by 0x58EC720: ??? (in /usr/lib/x86_64-linux-gnu/libQtGui.so.4.8.1)
I'm not very good with valgrind, but as far as I can tell, this trace doesn't come back to my source files, right? In fact, nowhere in the full valgrind report (with the -v switch) do my source files appear, except for in main() where I declare the QApplication.
Then can I assume I'm not compiling my project with CMake correctly? Hopefully that's the problem, because the valgrind report doesn't seem too helpful to me right now..
Now then, in my CMakeLists.txt, I'm attempting to compile the project with debug flags like so:
set(CMAKE_CXX_FLAGS_DEBUG "-g3 -ggdb -O0")
Is this a proper way of doing this?
Am I doing something wrong here?
Thanks, and sorry for such a long question! :/
The usual procedure is to set the CMAKE_BUILD_TYPE variable to Debug, Release, etc. during the configuration stage. This can be achieved with the -D flag of the command-line cmake tool (for example, cmake -DCMAKE_BUILD_TYPE=Debug followed by the path to your source directory), or by modifying the appropriate field in the GUI. Flags in CMAKE_CXX_FLAGS_DEBUG are only applied when the build type is Debug.
If you wish to pass extra flags to the compiler, just set CMAKE_CXX_FLAGS the same way as you set CMAKE_BUILD_TYPE.
As you can see, this doesn't involve modifying any CMakeLists.txt file, only CMakeCache.txt in your build directory.

g++ compile multiple files

I've got a problem compiling and linking a program with multiple files using g++ (I usually use Visual Studio, but...).
If I use only main.cpp (and include appropriate header files for openCV), everything is ok with
g++ main.cpp -o main -I"C:\OpenCV2.1\include\opencv" -L"C:\OpenCV2.1\lib" -lcxcore210 -lcv210 -lhighgui210
If I have main.cpp and some otherfile.cpp (both need openCV) and use
g++ main.cpp otherfile.cpp -o main -I"C:\OpenCV2.1\include\opencv" -L"C:\OpenCV2.1\lib" -lcxcore210 -lcv210 -lhighgui210
it simply doesn't work and I got
c:/mingw/bin/../lib/gcc/mingw32/4.5.0/../../../../mingw32/bin/ld.exe: warning: auto-importing has been activated without --enable-auto-import specified on the command line.
This should work unless it involves constant data structures referencing symbols from auto-imported DLLs.
C:\Users\ONDEJM~1\AppData\Local\Temp\ccNisCoC.o:main.cpp:(.text+0x16d0): undefined reference to `cv::Mat::Mat(_IplImage const*, bool)'
C:\Users\ONDEJM~1\AppData\Local\Temp\ccNisCoC.o:main.cpp:(.text+0x16f1): undefined reference to `cv::FAST(cv::Mat const&, std::vector<cv::KeyPoint, std::allocator<cv::KeyPoint> >&, int, bool)'
C:\Users\ONDEJM~1\AppData\Local\Temp\ccNisCoC.o:main.cpp:(.text$_ZN2cv3Mat7releaseEv[cv::Mat::release()]+0x3f): undefined reference to `cv::fastFree(void*)'
collect2: ld returned 1 exit status
What am I doing wrong?
Pass "-Wl,--enable-auto-import" to g++. Read ld's documentation about this.
Hm, it seems that I made a silly mistake... the solution is simple: just recompile all the OpenCV binaries with g++ and everything will be ok!