How can I know which operations cannot be placed on GPU in tensorflow?

How can I know which operations cannot be placed on a GPU in TensorFlow? Is there a place where I can check?
Thanks

You can check the kernels (i.e. the per-device implementations of ops), which are located in this directory: https://github.com/tensorflow/tensorflow/tree/r0.11/tensorflow/core/kernels/
For example, suppose you would like to know whether softmax can be placed on a GPU. You can navigate to the softmax kernel: https://github.com/tensorflow/tensorflow/blob/r0.11/tensorflow/core/kernels/softmax_op.cc. There you will find the following code:
REGISTER_KERNEL_BUILDER(
    Name("Softmax").Device(DEVICE_GPU).TypeConstraint<Eigen::half>("T"),
    SoftmaxOp<GPUDevice, Eigen::half>);
This means there is a GPU kernel for softmax with type float16 (Eigen::half). The prerequisite is that you build your TensorFlow with GPU support enabled.
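You can also check placement at runtime. Below is a minimal sketch using the TF1-style session config from that era (log_device_placement prints the device each op is assigned to, so ops without a GPU kernel show up on the CPU):

import tensorflow as tf

a = tf.constant([1.0, 2.0, 3.0], name='a')
b = tf.nn.softmax(a, name='softmax_b')

# Log the device each op is placed on; ops lacking a GPU kernel
# will be reported on the CPU.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(b))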

Related

How do I train a deep learning neural network that contains an embedding layer using a GPU?

I'm getting an InvalidArgumentError on my embedding layer:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
GatherV2: GPU CPU
Cast: GPU CPU
Const: GPU CPU
ResourceSparseApplyAdagradV2: CPU
_Arg: GPU CPU
ReadVariableOp: GPU CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
model_6_user_embedding_embedding_lookup_readvariableop_resource (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
adagrad_adagrad_update_1_update_0_resourcesparseapplyadagradv2_accum (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
model_6/User-Embedding/embedding_lookup/ReadVariableOp (ReadVariableOp)
model_6/User-Embedding/embedding_lookup/axis (Const)
model_6/User-Embedding/embedding_lookup (GatherV2)
gradient_tape/model_6/User-Embedding/embedding_lookup/Shape (Const)
gradient_tape/model_6/User-Embedding/embedding_lookup/Cast (Cast)
Adagrad/Adagrad/update_1/update_0/ResourceSparseApplyAdagradV2 (ResourceSparseApplyAdagradV2) /job:localhost/replica:0/task:0/device:GPU:0
[[{{node model_6/User-Embedding/embedding_lookup/ReadVariableOp}}]] [Op:__inference_train_function_2997]
Link to google colab:
https://colab.research.google.com/drive/1ZN1HzSTTfvA_zstuI-EsKjw7Max1f73v?usp=sharing
It's a really simple neural network, and the data is available to download from Kaggle; you could just drag and drop it into Colab to get it working.
I've also tried to enable soft device placement with
tf.config.set_soft_device_placement(True), but that doesn't seem to have worked.
From the error log, it looks like MirroredStrategy has assigned the embedding lookup operation to the GPU even though it is GPU-incompatible (and I can see why). I was hoping that tf.config.set_soft_device_placement(True) would ask TensorFlow to use the CPU instead, but it feels like that's being ignored.
Has anyone seen this problem before and know of a workaround?
Found a similar issue for TF1.14:
https://github.com/tensorflow/tensorflow/issues/31318
It looks like MirroredStrategy can't train embedding layers with momentum-based optimisers.
Cloning the above notebook and using RMSprop (with momentum=0) seemed to work:
https://colab.research.google.com/drive/13MXa8Q96M6uzlkK3K_M7vmQfclL59eRj?usp=sharing
I'll use RMSprop with no momentum for now until this issue is fixed. The error message certainly hasn't helped!
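For reference, a minimal sketch of that workaround under MirroredStrategy; the model shape and hyperparameters here are illustrative, not taken from the notebook:

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        # Illustrative sizes; replace with your vocabulary/embedding dims.
        tf.keras.layers.Embedding(input_dim=10000, output_dim=32),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(1),
    ])
    # RMSprop with momentum=0 sidesteps the CPU-only sparse-update
    # kernel (ResourceSparseApplyAdagradV2) that Adagrad hits above.
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001, momentum=0.0),
        loss='mse')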

Is it possible to use CuDNNLSTM with Google Colab's TPU?

I am able to do this with their GPU, but with their TPU it gives me an error...
Does anybody around here know what I'm missing, please?
Does it make sense to actually use the TPU with CuDNNLSTM? Or is CuDNNLSTM just tailored for GPU?
Thanks a lot in advance.
keras.layers.CuDNNLSTM is only supported on GPUs. But in TensorFlow 2, the built-in LSTM and GRU layers have been updated to leverage CuDNN kernels by default when a GPU is available.
Below are the details from Performance optimization and CuDNN kernels:
In TensorFlow 2.0, the built-in LSTM and GRU layers have been updated to leverage CuDNN kernels by default when a GPU is available.
With this change, the prior keras.layers.CuDNNLSTM/CuDNNGRU layers have been deprecated, and you can build your model without worrying about the hardware it will run on.
You can just use the built-in LSTM layer: tf.keras.layers.LSTM and it will work on both TPUs and GPUs.
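A minimal sketch of the TF2 approach; the layer sizes are illustrative. Note that the CuDNN fast path on GPU is only taken with the layer's default arguments (e.g. activation='tanh', recurrent_activation='sigmoid', recurrent_dropout=0):

import tensorflow as tf

model = tf.keras.Sequential([
    # With default arguments, LSTM dispatches to the CuDNN kernel on a
    # GPU and to a generic implementation on CPU/TPU.
    tf.keras.layers.LSTM(64, input_shape=(None, 32)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')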

tensorflow.nn.conv2d - input / kernel matmul

I am new to making TensorFlow source code modifications.
I would like to try a variation of the conv2d algorithm, and I want to know where exactly the input/kernel multiplication is implemented in the source code, and how to rebuild TensorFlow with the modifications.
You can read this tutorial on how to add a new operation to TensorFlow; from there you will learn how to find the implementations of ops (i.e. kernels). There are clear instructions for compiling TensorFlow from source on its GitHub page.
For Conv2D, the operation is defined in core/ops/nn_ops.cc. It has many kernels (for CPU and GPU, XLA-based, MKL-based, Eigen-based). You can see them if you run: tensorflow/core/kernels$ ls | grep conv

How to choose to use GPU with tf.device() in tensorflow

I ran the TensorFlow example from models/cifar10/cifar10_train.py. In the example script, they say it will use one GPU, but I didn't find where they select the GPU, e.g. with with tf.device('/gpu:0'):,
and they only place some intermediate variables on the CPU. So I wonder: how does TensorFlow handle the GPU and the CPU? Is using the GPU the default choice?
Here is the link for the example:
https://www.tensorflow.org/versions/r0.10/tutorials/deep_cnn/index.html
Any advice will be appreciated.
Thanks in advance
Good day
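For illustration, a minimal TF1-era sketch of explicit placement with tf.device; the shapes and ops are assumptions, not from the CIFAR-10 script. When an op has a GPU kernel and a GPU is visible, TensorFlow places it on /gpu:0 by default, so a script only needs tf.device to override that:

import tensorflow as tf

# Variables pinned to the CPU, as the CIFAR-10 script does for
# shared state; the matmul is explicitly placed on the GPU.
with tf.device('/cpu:0'):
    w = tf.Variable(tf.random_normal([784, 10]), name='w')

with tf.device('/gpu:0'):
    x = tf.placeholder(tf.float32, [None, 784], name='x')
    logits = tf.matmul(x, w)

# Ops outside any tf.device block that have a GPU kernel also
# land on /gpu:0 when a GPU is available.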

no supported kernel for GPU devices is available for SparseTensorDenseMatMul_grad

I hit an issue when building a model with the tf.sparse_tensor_dense_matmul op in my graph. Part of the error info is pasted below.
Does that mean there is no GPU kernel to compute the gradient of "SparseTensorDenseMatMul_grad"? I can build the model successfully with "allow_soft_placement=True" in the session config. However, I need all the computation to stay on the GPU for a particular reason. Does anyone know how to fix this issue, or do I need to implement the CUDA kernel for this op myself? Thanks a lot.
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'gradients/softmax_linear/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/Slice_1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: gradients/softmax_linear/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/Slice_1 = Slice[Index=DT_INT32, T=DT_INT64, _device="/device:GPU:0"](Placeholder_2, gradients/softmax_linear/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/Slice_1/begin, gradients/softmax_linear/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/Slice_1/size)]]
Caused by op u'gradients/softmax_linear/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/Slice_1', defined at:
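For context, the allow_soft_placement workaround mentioned in the question looks roughly like this in a TF1-style session config (a sketch with made-up shapes, not the asker's model); it moves only the kernels that have no GPU implementation, such as the int64 Slice in this gradient, back to the CPU:

import tensorflow as tf

sp = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1.0, 2.0],
                     dense_shape=[2, 3])
dense = tf.random_normal([3, 4])

with tf.device('/gpu:0'):
    y = tf.sparse_tensor_dense_matmul(sp, dense)
    loss = tf.reduce_sum(y)
    grads = tf.gradients(loss, [dense])

# Ops with no GPU kernel fall back to the CPU instead of raising
# InvalidArgumentError; everything else stays on the GPU.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(grads))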