caffe.set_mode_cpu() error in Caffe - cmake

I have built Caffe with CPU support only. Is the command caffe.set_mode_cpu() only used when Caffe has been built with GPU support, so that we can switch to CPU when needed? I thought I might need it just to make sure that Caffe is using my CPU, but I guess the build takes care of that. Also, is this command required even when I have built with CPU support only?
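For reference, a typical CPU-only pycaffe call looks like the sketch below (the prototxt and caffemodel file names are placeholders):
import caffe

caffe.set_mode_cpu()  # available in a CPU_ONLY build; in a GPU build it switches execution to the CPU
net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)  # placeholder model files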
Error I get:
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1220 14:26:00.833413 17923 common.cpp:117] Cannot create Cublas handle. Cublas won't be available.
E1220 14:26:00.833684 17923 common.cpp:124] Cannot create Curand generator. Curand won't be available.
E1220 14:26:00.833871 17923 common.cpp:128] Cannot create cuDNN handle. cuDNN won't be available.
F1220 14:26:00.834089 17923 _caffe.cpp:61] Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version
*** Check failure stack trace: ***
Aborted (core dumped)
Problem posted on caffe users group
This is my output of 'ccmake ..'. It says that CPU_ONLY is OFF even after uncommenting the CPU_ONLY flag. How do I make sure it builds with CPU only?
To build Caffe, I used cmake .. instead of make because I got a convert_imageset.bin error. I followed the instructions in the link and got it to build properly.
Now I was looking at my cmake output and realised that the CPU_ONLY option was set to OFF. So I followed this link, where I used cmake -DCPU_ONLY=ON to set it to ON.
But I'm still getting the CUDA error even with the cmake option CPU_ONLY=ON. I am not sure why it is still being built with GPU support.
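For reference, this is roughly what I understand a clean CPU-only reconfigure should look like (a sketch; the build directory is just the one created earlier):
cd build
rm -f CMakeCache.txt          # a stale cache can keep CPU_ONLY at OFF
cmake -DCPU_ONLY=ON ..
grep CPU_ONLY CMakeCache.txt  # should now show CPU_ONLY:BOOL=ON
make -j"$(nproc)"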
Looking at my cmake output again, I found this error:
CMake Error at CMakeLists.txt:85 (add_dependencies): The dependency target "pycaffe" of target "pytest" does not exist.
Is this fine, since we have to run make pycaffe anyway to build the Python bindings?

Related

Cannot compile PDDL 2.1 Temporal Planner POPF on Ubuntu 20.04.1 LTS

I need a temporal planner that supports durative actions in PDDL. I was following this YouTube guide, but I can't make the POPF planner work.
I'm getting this error when making popf:
/home/virginia/Scaricati/popf/src/VALfiles/TimSupport.cpp:1392:36: required from here
/usr/include/c++/9/bits/stl_tree.h:1117:16: error: no type named ‘value_type’ in ‘struct std::iterator_traits<TIM::getConditionally<std::_Rb_tree_const_iterator<TIM::Property*> > >’
1117 | __enable_if_t<!__same_value_type<_InputIterator>::value>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make[2]: *** [src/VALfiles/CMakeFiles/Inst.dir/build.make:154: src/VALfiles/CMakeFiles/Inst.dir/TimSupport.o] Errore 1
make[1]: *** [CMakeFiles/Makefile2:213: src/VALfiles/CMakeFiles/Inst.dir/all] Errore 2
I used these commands:
mkdir build
cd build
cmake path_to_src_folder
make
After the installation process I expected to have the file 'build/popf/popf-clp' as a binary of popf.
Obviously, since I have an error, I don't have it.
I am using Ubuntu 20.04.1 LTS.
I think this is related to the fact that the VAL code is quite old and incompatible with newer C++ libraries. Try putting it inside a Singularity image (very similar to Docker, but for performance reasons the AI Planning community prefers Singularity) that uses Ubuntu 16.04 as its base image. Then change your planner invocation scripts to run the Singularity image instead (which you could then set in VSCode).
Refer to this very similar issue (SMTPlan and POPF use the same VAL code which is giving you problems):
https://github.com/KCL-Planning/SMTPlan/issues/10#issuecomment-660515454
Further down there is a reference to the Singularity file I used, but you would need to change it to include your POPF compilation steps instead of SMTPlan's.
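As a rough outline only (this is not the exact file from that issue; the package list and repository URL below are assumptions you would need to adjust for POPF), a definition file could look like this:
Bootstrap: docker
From: ubuntu:16.04

%post
    # assumed build dependencies; adjust to whatever POPF actually needs
    apt-get update && apt-get install -y build-essential cmake git flex bison perl libboost-dev coinor-libcbc-dev
    # assumed repository URL; replace with the fork or branch you are building
    git clone https://github.com/KCL-Planning/popf /popf
    mkdir /popf/build && cd /popf/build
    cmake ../src && make

%runscript
    # adjust if the binary ends up somewhere other than build/popf/popf-clp
    exec /popf/build/popf/popf-clp "$@"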
I think I had the exact same error. Finally, I tried this fork, which compiled on the first try on my Ubuntu 22.04 (after installing the dependencies with apt):
https://github.com/DaniGarciaLopez/popf

Tensorflow Lite and edgetpu_compiler: Compiling for version 10 gives "Internal compiler error. Aborting!"

I am attempting to compile the code from this Coral example on Colab to run on runtime version 10, since I have a Coral USB Accelerator connected to a customized Raspberry Pi Zero W build.
The command I'd like to get working is
edgetpu_compiler --min_runtime_version 10 [.TFLITE file]
It always ends with an internal error, and it is unknown to me why that would be. The error is:
Edge TPU Compiler version 2.1.302470888
Internal compiler error. Aborting!
To reproduce this, you should do the import, preparation, build, and first training steps. No need to fine-tune: results are the same.
I understand that certain operations are not available for lower runtimes, but I am at a loss as to what exactly would need to change in the demo for it to compile successfully.
Does anyone know what might be missing, or otherwise provide guidance?
I just got a chance to check this out, and it looks like there is actually a bug preventing compilation at older runtime versions...
This is fixed, as I'm able to compile this model with -m 10 from the code base; it'll be fixed for you by the next release. For now, here is a workaround (essentially checking out an older compiler version to compile the model):
$ git clone https://github.com/google-coral/edgetpu.git && cd edgetpu
$ git checkout 657d2b6
$ ./compiler/x86_64/edgetpu_compiler -s -m 10 /path/to/model
This should work, although many ops weren't supported in that older runtime, so you may not see the performance increase that you would with the current runtime version!

nv-nsight-cu-cli caused Tensorflow to fail

I've downloaded the newest Nsight Compute profiling tool and I want to use it to benchmark Tensorflow applications. The code I'm using is here. It runs perfectly fine when I execute it, and when I benchmark it with nvprof ./mnist.py there is no problem at all. However, when I try to run it with the command sudo ./nv-nsight-cu-cli [path to the file] I get the following error:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
I suspect that nv-nsight-cu-cli somehow didn't recognize the environment variables at all. Is there any fix for this?
You need to search for differences in both environments:
env variables
LD_LIBRARY_PATH
/etc/ld.so.conf
/etc/ld.so.conf.d/*
cuBLAS
Is installation complete/not broken?
Is it installed at the same location on both machines?
Versions
...
You can start with locate libcublas.so on both machines to see if there's a difference. Alternatively, you can strace -f -e open the program to check where it tries to load libcublas.so from.
Your error has (for now) nothing to do with GPUs: libcublas.so.9.0 simply cannot be found. Find it, find out why Tensorflow cannot find it, and your problem will be solved.
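For example, something along these lines (a sketch; the CUDA library path is an assumption for a typical CUDA 9.0 install, and the two dump files are just scratch files):
# compare the two environments
env | sort > env_user.txt
sudo env | sort > env_root.txt
diff env_user.txt env_root.txt

# sudo drops most variables; passing LD_LIBRARY_PATH through explicitly rules that out
sudo env LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH ./nv-nsight-cu-cli ./mnist.py

# trace which paths the dynamic loader actually tries
strace -f -e trace=open,openat ./mnist.py 2>&1 | grep libcublas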
It appears that GP100 is not supported by the tool at this moment.
The answer is found here:
Nsight Compute only supports Pascal (other than GP100) and later GPUs.

Blas GEMM launch failed: what does this error mean?

I am having problems executing a simple Tensorflow model that worked well yesterday. I suspect the problem in its entirety relates to the error given:
Blas GEMM launch failed
In the console it says,
tensorflow/core/common_runtime/gpu/gpu_util.cc:343] CPU->GPU Memcpy failed
My impression is that this may relate to my CUDA installation, based on this question:
TensorFlow: Blas GEMM launch failed
However, I can't see how to run the simpleCUBLAS examples. I am completely new to CUDA.
I have four 1080 Ti GPUs (Ubuntu 16.04, TensorFlow 1.3.0) and I have not identified any zombie processes taking up GPU memory. Any help is greatly appreciated.
So I found the answer after days of going mad. First, I did this:
cd /usr/local/cuda/samples/7_CUDALibraries/simpleCUBLAS
make
./simpleCUBLAS
to check my CUBLAS installation. It returned CUBLAS INITIALIZATION FAILED!!!
So next I did this (based on advice):
sudo rm -rf ~/.nv
And it worked. Hope this saves someone else. Seems easy when you see it.
The other thing that is worth mentioning is that this problem also threw this error occasionally:
tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
This was cryptic; everybody suggested it was a memory issue and, sure enough, my GPUs were hogged by Python during the initialization of my TF model. But it was the CUBLAS error that led me to the solution.
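If the memory hogging itself is what breaks CUBLAS initialization, one mitigation worth trying (not part of the original fix; this uses the standard TF 1.x session-config options, and the commented-out fraction is just an example value) is to stop TensorFlow from grabbing all GPU memory up front:
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all at once
# config.gpu_options.per_process_gpu_memory_fraction = 0.4  # or cap the fraction explicitly

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())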

Enabling jemalloc in Tensorflow 1.0.1 causes XLA tests to fail

We are using Tensorflow 1.0.1 on Ubuntu 16.04 Linux ppc64le. We've enabled jemalloc and experimental XLA support. While running bazel test //tensorflow/compiler/... to check XLA support on ppc64le, we found that all the tests are being skipped with "NO STATUS" and the linker error below:
/usr/bin/ld: bazel-out/local-opt/bin/tensorflow/compiler/aot/codegen_test: hidden symbol 'pthread_atfork' in
/usr/lib/powerpc64le-linux-gnu/libpthread_nonshared.a(pthread_atfork.oS) is referenced by DSO
I even tried running a single test, bazel test //tensorflow/compiler/aot:codegen_test, and got the same linker error. Only tfcompile_util_test passes.
Setting this error aside, one odd observation I have is that disabling jemalloc makes most of the XLA tests pass: approximately 70-80% of the XLA tests passed for me with jemalloc disabled. The rest of the tests still fail with a segfault.
I am not sure whether jemalloc and XLA are related. Could anyone please confirm whether they are related and whether my observation makes sense?
For the above linker error, I read that it's a glibc bug on ppc64le: it does not export a dynamic version of pthread_atfork, which x86's glibc does, though only by fluke. The solution is to add -lpthread to the linking options. However, adding -lpthread as linkopts in any .bzl or BUILD file in tensorflow/compiler is not working; -lpthread does not even appear in the link command. Any pointers on this error would also be helpful. Kindly help us with this problem.
Thanks,
Nishidha
-lpthread should be added to the linking options of jemalloc in the jemalloc.BUILD file.
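For example, something along these lines (a sketch; the srcs/hdrs globs are placeholders for whatever the existing jemalloc.BUILD already declares, and only the linkopts line is the actual change):
cc_library(
    name = "jemalloc",
    srcs = glob(["src/*.c"]),               # placeholder for the existing sources
    hdrs = glob(["include/jemalloc/*.h"]),  # placeholder for the existing headers
    includes = ["include"],
    linkopts = ["-lpthread"],               # pulls in the dynamic pthread_atfork on ppc64le
    visibility = ["//visibility:public"],
)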