My system has two Nvidia GPUs: a GTX 750 Ti assigned to the OS for graphics, and a GTX 1080 Ti that is free to be used for TensorFlow. I use the call tensorflow::graph::SetDefaultDevice("/GPU:1", &graph); to select GPU 1.
I am running TF 1.10, hand-compiled and configured for C++/CMake, with CUDA compilation tools release 9.1, V9.1.85. When I assign GPU 1 to execute my graph, I get the following error:
"Invalid argument: Cannot assign a device for operation 'h1_w/read':
Operation was explicitly assigned to /GPU:1 but available devices are
[ /job:localhost/replica:0/task:0/device:CPU:0,
/job:localhost/replica:0/task:0/device:GPU:0 ]. Make sure the device
specification refers to a valid device. [[{{node h1_w/read}} =
Identity[T=DT_FLOAT, _class=["loc:@h1_w"], _device="/GPU:1"]]]"
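For reference, here is how I list the devices this TF build can actually see, using the Python client of the same installation (a minimal diagnostic sketch; device_lib is TensorFlow's device-listing helper):
from tensorflow.python.client import device_lib
# Print every device TensorFlow enumerates; device strings such as "/GPU:1"
# must refer to one of these names.
for d in device_lib.list_local_devices():
    print(d.name, d.device_type)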
I was following this tutorial to train a YOLOv4-tiny model to detect custom objects: https://www.youtube.com/watch?v=NTnZgLsk_DA
However, when I attempt to train the model, I get this error message:
CUDA status Error: file: ./src/blas_kernels.cu : () : line: 841 : build time: Jan 7 2022 - 12:01:41
CUDA Error: no kernel image is available for execution on the device
CUDA Error: no kernel image is available for execution on the device: File exists
I was running the code on Colab, not locally. The GPU used for training is a Tesla K80.
A common answer is to set the ARCH values to compute_37, code_37, but I've already set them this way and keep getting the same error! So what should I do to get this code running?
Link to my Colab notebook: https://colab.research.google.com/drive/16EQ6I67OOs1I7rF6PHgBHp1eVHMXXvyO#scrollTo=QyMBDkaL-Aep
Any help would be appreciated!
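In case it helps with diagnosis, here is a quick check of what compute capability the Colab runtime's GPU actually reports; a minimal sketch, assuming a TF 2.4+ runtime where get_device_details is available:
import tensorflow as tf
# A K80 should report (3, 7), matching the compute_37/code_37 ARCH setting.
for gpu in tf.config.list_physical_devices('GPU'):
    details = tf.config.experimental.get_device_details(gpu)
    print(gpu.name, details.get('compute_capability'))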
I cannot display the device placement in the TensorBoard graph. The only device shown is "unknown device" (see screenshot below).
In the Python script, I activated the following option:
tf.debugging.set_log_device_placement(True)
And indeed, the device placement information is logged correctly in the Python output:
Executing op Fill in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarHandleOp in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op VarIsInitializedOp in device /job:localhost/replica:0/task:0/device:GPU:0
...
The tracing is activated as follows:
tf.summary.trace_on(graph=True, profiler=True)
And the graph is displayed correctly in TensorBoard, but the device name is simply wrong. My setup:
OS: Windows 10 x64
Python: 3.8
Tensorflow: 2.2.0
GPU
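For completeness, here is the full tracing flow I use, reduced to a minimal sketch (the log directory and the traced step function are placeholders):
import tensorflow as tf
tf.debugging.set_log_device_placement(True)
writer = tf.summary.create_file_writer('logs/graph')  # placeholder log dir
tf.summary.trace_on(graph=True, profiler=True)
@tf.function
def step(x):
    return tf.matmul(x, x)
step(tf.random.normal((4, 4)))  # run once so the graph gets traced
with writer.as_default():
    tf.summary.trace_export(name='trace', step=0, profiler_outdir='logs/graph')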
I got this error when using Keras with the TensorFlow backend:
tensorflow.python.framework.errors_impl.InvalidArgumentError: device CUDA:0 not supported by XLA service
while setting up XLA_GPU_JIT device number 0
Relevant code:
tfconfig = tf.ConfigProto()
tfconfig.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
tfconfig.gpu_options.allow_growth = True
K.tensorflow_backend.set_session(tf.Session(config=tfconfig))
tensorflow version: 1.14.0
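For what it's worth, I understand XLA auto-clustering can also be requested through the TF_XLA_FLAGS environment variable instead of ConfigProto; a minimal sketch (the flag must be set before TensorFlow initializes):
import os
# Ask TensorFlow to auto-cluster ops for XLA compilation.
os.environ['TF_XLA_FLAGS'] = '--tf_xla_auto_jit=2'
import tensorflow as tf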
Chairman Guo's code:
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
solved my problem of the Jupyter notebook kernel crashing at:
tf.keras.models.load_model(path/to/my/model)
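Note that the ordering matters here: CUDA_VISIBLE_DEVICES is read when the CUDA runtime initializes, so it must be set before TensorFlow is imported. A minimal sketch (the model path is a placeholder):
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # set before importing TensorFlow
import tensorflow as tf
model = tf.keras.models.load_model("path/to/my/model")  # placeholder path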
The fatal message was:
2020-01-26 11:31:58.727326: F
tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch
value instead of handling error Internal: failed initializing
StreamExecutor for CUDA device ordinal 0: Internal: failed call to
cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error
My TF's version is: 2.2.0-dev20200123. There are 2 GPUs on this system.
This could be because your TF-default (i.e. first) GPU is running out of memory. If you have multiple GPUs, divert your Python program to run on one of the others. In TF (say, TF 2.0-rc1), set the following:
# Specify which GPU(s) to use
os.environ["CUDA_VISIBLE_DEVICES"] = "1" # Or 2, 3, etc. other than 0
# On CPU/GPU placement
config = tf.compat.v1.ConfigProto(allow_soft_placement=True, log_device_placement=True)
config.gpu_options.allow_growth = True
tf.compat.v1.Session(config=config)
# Note that ConfigProto disappeared in TF-2.0
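In TF 2.x without the compat shim, the same allow-growth behavior can be requested per GPU; a minimal sketch:
import tensorflow as tf
# Allocate GPU memory on demand instead of grabbing it all upfront.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)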
If, however, your environment has only one GPU, then perhaps you have no choice but to ask your buddy to stop his program, then treat him to a cup of coffee.
I have installed all the necessary software (opencv, tensorflow-gpu, matplotlib, scikit-learn, pandas, keras 2) to run my code and validated each package. I am using Spyder as an IDE and am going to train a CNN in Keras with the TensorFlow backend. I could run my code snippets until I reached the training stage:
hist = model.fit(X_train, y_train, batch_size=32, nb_epoch=num_epoch, verbose=1, validation_data=(X_test, y_test))
When I run this line, training starts, but instead of displaying the epochs and other metrics (val_acc, training_acc, etc.) the kernel suddenly dies, reconnects, dies again, and so on.
At the end I get this error:
2018 16:25:49.961500: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018 16:25:50.664501: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1212] Found device 0 with properties:
name: GeForce GT 740 major: 3 minor: 0 memoryClockRate(GHz): 1.0715
pciBusID: 0000:01:00.0
totalMemory: 1.00GiB freeMemory: 756.79MiB
2018 16:25:50.664501: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1312] Adding visible gpu devices: 0
2018 16:25:51.148102: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:993] Creating TensorFlow device (/device:GPU:0 with 501 MB memory) -> physical GPU (device: 0, name: GeForce GT 740, pci bus id: 0000:01:00.0, compute capability: 3.0)
2018 16:27:22.549779: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1312] Adding visible gpu devices: 0
2018 16:27:22.549779: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 224 MB memory) -> physical GPU (device: 0, name: GeForce GT 740, pci bus id: 0000:01:00.0, compute capability: 3.0)
2018 16:27:43.118021: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_dnn.cc:378] Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source was compiled with 7003 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018 16:27:43.164821: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\kernels\conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
I thought it was a Spyder problem and opened an issue on GitHub, but received a reply that it is not Spyder-related but a compatibility problem.
I searched the web hoping to find a solution, but it seems there is no exactly matching issue (at least among those I came across).
If someone has had the same problem, please help me.
What am I supposed to do?
I had the same issue when I was using the Jupyter notebook; the fix for me was changing the browser I was running the code with.
If using a different IDE (e.g. Jupyter notebook, PyCharm) does not work, I would recommend running the script from the terminal/command prompt.
I'm using Tensorflow on a cluster and I want to tell Tensorflow to run only on one single core (even though there are more available).
Does someone know if this is possible?
To run TensorFlow on a single CPU thread, I use:
session_conf = tf.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1)
sess = tf.Session(config=session_conf)
device_count limits the number of CPUs being used, not the number of cores or threads.
tensorflow/tensorflow/core/protobuf/config.proto says:
message ConfigProto {
  // Map from device type name (e.g., "CPU" or "GPU") to maximum
  // number of devices of that type to use. If a particular device
  // type is not found in the map, the system picks an appropriate
  // number.
  map<string, int32> device_count = 1;
  // ...
}
On Linux you can run sudo dmidecode -t 4 | egrep -i "Designation|Intel|core|thread" to see how many CPUs/cores/threads you have. For example, the following machine has 2 CPUs, each with 8 cores, each core with 2 threads, giving a total of 2*8*2 = 32 hardware threads:
fra@s:~$ sudo dmidecode -t 4 | egrep -i "Designation|Intel|core|thread"
Socket Designation: CPU1
Manufacturer: Intel
HTT (Multi-threading)
Version: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Core Count: 8
Core Enabled: 8
Thread Count: 16
Multi-Core
Hardware Thread
Socket Designation: CPU2
Manufacturer: Intel
HTT (Multi-threading)
Version: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
Core Count: 8
Core Enabled: 8
Thread Count: 16
Multi-Core
Hardware Thread
Tested with TensorFlow 0.12.1 and 1.0.0 on Ubuntu 14.04.5 LTS x64 and Ubuntu 16.04 LTS x64.
Yes, it is possible via thread affinity. Thread affinity lets you decide which specific thread is executed by which specific core of the CPU. On Linux you can use taskset or numactl for this, or the sched_setaffinity (https://man7.org/linux/man-pages/man2/sched_setaffinity.2.html) and pthread_setaffinity_np (https://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html) system interfaces.
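The same pinning can also be done from inside the Python process, which is sometimes more convenient than wrapping the command in taskset; a minimal sketch, assuming Linux:
import os
# Pin the calling process to CPU core 0; threads created afterwards inherit
# this affinity mask. Equivalent to: taskset -c 0 python script.py
os.sched_setaffinity(0, {0})
import tensorflow as tf  # import after pinning so TF's thread pools inherit it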
The following code will not instruct/direct Tensorflow to run only on one single core.
TensorFlow 1
session_conf = tf.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1)
sess = tf.Session(config=session_conf)
TensorFlow 2
import os
# reduce number of threads
os.environ['TF_NUM_INTEROP_THREADS'] = '1'
os.environ['TF_NUM_INTRAOP_THREADS'] = '1'
import tensorflow
This will generate in total at least N threads, where N is the number of CPU cores. Most of the time only one thread will be running while the others sleep.
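The environment variables map onto the tf.config threading API; an equivalent sketch (the calls must run before TensorFlow creates its thread pools):
import tensorflow as tf
# Same effect as TF_NUM_INTEROP_THREADS / TF_NUM_INTRAOP_THREADS above.
tf.config.threading.set_inter_op_parallelism_threads(1)
tf.config.threading.set_intra_op_parallelism_threads(1)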
Sources:
https://github.com/tensorflow/tensorflow/issues/42510
https://github.com/tensorflow/tensorflow/issues/33627
You can restrict the number of devices of a certain type that TensorFlow uses by passing the appropriate device_count in a ConfigProto as the config argument when creating your session. For instance, you can restrict the number of CPU devices as follows:
config = tf.ConfigProto(device_count={'CPU': 1})
sess = tf.Session(config=config)
with sess.as_default():
    print(tf.constant(42).eval())