Tensorflow: device CUDA:0 not supported by XLA service while setting up XLA_GPU_JIT device number 0

I got this when using Keras with the TensorFlow backend:
tensorflow.python.framework.errors_impl.InvalidArgumentError: device CUDA:0 not supported by XLA service
while setting up XLA_GPU_JIT device number 0
Relevant code:
tfconfig = tf.ConfigProto()
tfconfig.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
tfconfig.gpu_options.allow_growth = True
K.tensorflow_backend.set_session(tf.Session(config=tfconfig))
tensorflow version: 1.14.0

Chairman Guo's code:
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
solved my problem of the Jupyter notebook kernel crashing at:
tf.keras.models.load_model(path/to/my/model)
The fatal message was:
2020-01-26 11:31:58.727326: F
tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch
value instead of handling error Internal: failed initializing
StreamExecutor for CUDA device ordinal 0: Internal: failed call to
cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error
My TF version is 2.2.0-dev20200123. There are 2 GPUs on this system.
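For reference, a minimal sketch of that fix (the model path is a placeholder; the environment variable has to be set before the first TensorFlow import):
import os

# Hide GPU 0 so TensorFlow never tries to initialize the failing device
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf

# "my_model.h5" is a placeholder; substitute your own saved model path
model = tf.keras.models.load_model("my_model.h5")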

This could be because your TF-default (i.e., first) GPU is running out of memory. If you have multiple GPUs, divert your Python program to run on one of the other GPUs. In TF (say, TF-2.0-rc1), set the following:
import os
import tensorflow as tf

# Specify which GPU(s) to use
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # Or 2, 3, etc. other than 0

# Control CPU/GPU placement
config = tf.compat.v1.ConfigProto(allow_soft_placement=True, log_device_placement=True)
config.gpu_options.allow_growth = True
tf.compat.v1.Session(config=config)
# Note that the plain ConfigProto/Session API disappeared in TF-2.0, hence compat.v1
If, however, your environment has only one GPU, then perhaps you have no choice but to ask your buddy to stop his program, and then treat him to a cup of coffee.
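In TF 2.x you can get the same effect without compat.v1. A minimal sketch (GPU index 1 is an assumption about your machine, and the calls must run before any GPU has been initialized):
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if len(gpus) > 1:
    # Expose only the second physical GPU to TensorFlow ...
    tf.config.set_visible_devices(gpus[1], 'GPU')
    # ... and let it grow memory on demand instead of grabbing it all up front
    tf.config.experimental.set_memory_growth(gpus[1], True)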

Related

Time-consuming TensorFlow CUDA driver check in AWS Lambda

I'm running an AWS Lambda with an EFS mounted, where I've installed TensorFlow 2.4. When I try to run the Lambda (and every Lambda that uses TensorFlow 2.4), it wastes a lot of time (about 4 minutes, sometimes more) on some TensorFlow startup checks. So I need to set a very long timeout to work around this issue.
These are the prints that the Lambda produces:
2022-05-17 06:33:21.917336: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-17 06:33:21.921992: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib
2022-05-17 06:33:21.922025: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2022-05-17 06:33:21.922048: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (169.254.137.137): /proc/driver/nvidia/version does not exist
2022-05-17 06:33:21.922460: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-17 06:33:22.339905: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2022-05-17 06:33:22.340468: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2500010000 Hz
[WARNING] 2022-05-17T06:33:22.436Z c4500036-5b77-4808-a062-f8ae820b0317 AutoGraph could not transform <function Model.make_predict_function..predict_function at 0x7f65bfb37280> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output.
Cause: unsupported operand type(s) for -: 'NoneType' and 'int'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
What I need is to avoid this wasted time and get a clean run.
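One possible mitigation for a CPU-only Lambda (a sketch, assuming the model runs fine on CPU; whether it removes all of the delay is not confirmed here) is to hide CUDA devices and quiet the startup logs before importing TensorFlow:
import os

# Both variables must be set before the first TensorFlow import
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"   # no GPUs: skip CUDA device enumeration
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"    # hide INFO and WARNING startup messages

import tensorflow as tf  # CPU-only from here on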

CUDA Error when training YOLOv4-tiny on Colab: no kernel image is available for execution on the device

I was following this tutorial to train a YOLOv4-tiny model to detect custom objects: https://www.youtube.com/watch?v=NTnZgLsk_DA
However, when I attempt to train the model, I get this error message:
CUDA status Error: file: ./src/blas_kernels.cu : () : line: 841 : build time: Jan 7 2022 - 12:01:41
CUDA Error: no kernel image is available for execution on the device
CUDA Error: no kernel image is available for execution on the device: File exists
I was running the code on Colab, not locally. The GPU used for training is Tesla K80.
A common answer is to set the ARCH values to compute_37, code_37, but I've already set them this way and keep getting the same error! So what should I do to get this code running?
Link to my Colab notebook: https://colab.research.google.com/drive/16EQ6I67OOs1I7rF6PHgBHp1eVHMXXvyO#scrollTo=QyMBDkaL-Aep
Any help would be appreciated!

Tensorflow2 Unknown device in Tensorboard

I cannot display the device placement in the TensorBoard graph; the only device shown is "unknown device".
In the python script, I activated the following option:
tf.debugging.set_log_device_placement(True)
And indeed, the log device placement information is logged correctly in the python output:
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarIsInitializedOp in device /job:localhost/replica:0/task:0/device:GPU:0
...
The tracing is activated as follows:
tf.summary.trace_on(graph=True, profiler=True)
And the graph is displayed correctly in TensorBoard, but the device name is simply wrong.
OS: Windows 10 x64
Python: 3.8
Tensorflow: 2.2.0
GPU
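For context, a minimal sketch of the full trace round trip as I understand it (the log directory and the traced function are illustrative; in TF 2.2 trace_export still accepts profiler_outdir):
import tensorflow as tf

tf.debugging.set_log_device_placement(True)

logdir = "logs/device_trace"  # placeholder log directory
writer = tf.summary.create_file_writer(logdir)

@tf.function
def step(x):
    return tf.matmul(x, x)

tf.summary.trace_on(graph=True, profiler=True)
step(tf.random.normal((4, 4)))  # run once so the graph gets traced
with writer.as_default():
    tf.summary.trace_export(name="device_trace", step=0, profiler_outdir=logdir)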

Operation was explicitly assigned to /GPU:1 but available devices are

My system has 2 Nvidia GPUs: a GTX 750 Ti, which is used by the OS for graphics, and a GTX 1080 Ti, which is free to be used for Tensorflow. I use the call tensorflow::graph::SetDefaultDevice("/GPU:1", &graph); to enable GPU 1.
I am running TF 1.10, hand-compiled and configured for C++/CMake, with CUDA compilation tools release 9.1, V9.1.85. When I assign GPU 1 to execute my graph, I get the following error:
"Invalid argument: Cannot assign a device for operation 'h1_w/read':
Operation was explicitly assigned to /GPU:1 but available devices are
[ /job:localhost/replica:0/task:0/device:CPU:0,
/job:localhost/replica:0/task:0/device:GPU:0 ]. Make sure the device
specification refers to a valid device. [[{{node h1_w/read}} =
IdentityT=DT_FLOAT, _class=["loc:#h1_w"], _device="/GPU:1"]]"
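A Python-API sketch of the usual workaround (not the asker's C++ setup): expose only CUDA device 1 to the session, so the 1080 Ti is enumerated as /GPU:0 and the graph can keep targeting device 0:
import tensorflow as tf

# TF 1.x: only physical GPU 1 is made visible, and TF renumbers it as /GPU:0
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.visible_device_list = "1"
sess = tf.Session(config=config)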

How can I run Tensorflow on one single core?

I'm using TensorFlow on a cluster and I want to tell TensorFlow to run on only a single core (even though there are more available).
Does someone know if this is possible?
To run TensorFlow on a single CPU thread, I use:
session_conf = tf.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1)
sess = tf.Session(config=session_conf)
device_count limits the number of CPUs being used, not the number of cores or threads.
tensorflow/tensorflow/core/protobuf/config.proto says:
message ConfigProto {
  // Map from device type name (e.g., "CPU" or "GPU") to maximum
  // number of devices of that type to use. If a particular device
  // type is not found in the map, the system picks an appropriate
  // number.
  map<string, int32> device_count = 1;
On Linux you can run sudo dmidecode -t 4 | egrep -i "Designation|Intel|core|thread" to see how many CPUs/cores/threads you have; e.g. the following machine has 2 CPUs, each with 8 cores and 2 threads per core, which gives a total of 2*8*2=32 threads:
fra@s:~$ sudo dmidecode -t 4 | egrep -i "Designation|Intel|core|thread"
Socket Designation: CPU1
Manufacturer: Intel
HTT (Multi-threading)
Version: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Core Count: 8
Core Enabled: 8
Thread Count: 16
Multi-Core
Hardware Thread
Socket Designation: CPU2
Manufacturer: Intel
HTT (Multi-threading)
Version: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Core Count: 8
Core Enabled: 8
Thread Count: 16
Multi-Core
Hardware Thread
Tested with Tensorflow 0.12.1 and 1.0.0 with Ubuntu 14.04.5 LTS x64 and Ubuntu 16.04 LTS x64.
Yes, it is possible via thread affinity. Thread affinity lets you decide which specific core of the CPU executes a specific thread. For thread affinity you can use "taskset" or "numactl" on Linux. You can also use https://man7.org/linux/man-pages/man2/sched_setaffinity.2.html and https://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html
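A minimal Python sketch of the affinity approach (Linux-only; pinning to core 0 is an arbitrary choice for illustration):
import os

# Pin the current process (pid 0) to core 0 before TensorFlow creates its
# thread pools; the worker threads inherit this affinity mask.
os.sched_setaffinity(0, {0})

import tensorflow as tf
print(tf.reduce_sum(tf.range(10)))  # still works, but only core 0 is used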
The following code, by itself, will not instruct/direct TensorFlow to run on only a single core.
TensorFlow 1
session_conf = tf.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1)
sess = tf.Session(config=session_conf)
TensorFlow 2
import os
# reduce number of threads
os.environ['TF_NUM_INTEROP_THREADS'] = '1'
os.environ['TF_NUM_INTRAOP_THREADS'] = '1'
import tensorflow
This will still create at least N threads in total, where N is the number of CPU cores; most of the time only one thread will be running while the others sleep.
Sources:
https://github.com/tensorflow/tensorflow/issues/42510
https://github.com/tensorflow/tensorflow/issues/33627
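A TF2 programmatic alternative to the environment variables above (a sketch; these calls raise a RuntimeError if any op has already executed):
import tensorflow as tf

# Must run before any op executes
tf.config.threading.set_inter_op_parallelism_threads(1)
tf.config.threading.set_intra_op_parallelism_threads(1)

print(tf.constant(42))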
You can restrict the number of devices of a certain type that TensorFlow uses by passing the appropriate device_count in a ConfigProto as the config argument when creating your session. For instance, you can restrict the number of CPU devices as follows:
config = tf.ConfigProto(device_count={'CPU': 1})
sess = tf.Session(config=config)
with sess.as_default():
    print(tf.constant(42).eval())