When I follow the vulkan-tutorial example code, the validation layer doesn't work

This is my first time learning Vulkan. I am following https://vulkan-tutorial.com/Drawing_a_triangle/Setup/Validation_layers, but the validation layer doesn't work. I copied and pasted this code (https://vulkan-tutorial.com/code/02_validation_layers.cpp), but it throws this runtime error:
validation layers requested, but not available!
The Vulkan-related environment variables (VK_SDK_PATH, VULKAN_SDK, and Path) are set correctly. Could anyone give me some help?
P.S. I tested the checkValidationLayerSupport() function to see what the layerCount value is; it turned out to be 1. Is that right?
Environment
OS: Windows (64-bit)
Vulkan SDK: 1.3.216.0 (LunarG SDK)
GPU: GeForce GTX TITAN X (Discrete GPU) with Vulkan 1.2.142

Related

How do I train a deep learning neural network that contains an embedding layer using a GPU?

I'm getting an InvalidArgumentError on my embedding layer:
Colocation Debug Info:
Colocation group had the following types and supported devices:
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
GatherV2: GPU CPU
Cast: GPU CPU
Const: GPU CPU
ResourceSparseApplyAdagradV2: CPU
_Arg: GPU CPU
ReadVariableOp: GPU CPU
Colocation members, user-requested devices, and framework assigned devices, if any:
model_6_user_embedding_embedding_lookup_readvariableop_resource (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
adagrad_adagrad_update_1_update_0_resourcesparseapplyadagradv2_accum (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
model_6/User-Embedding/embedding_lookup/ReadVariableOp (ReadVariableOp)
model_6/User-Embedding/embedding_lookup/axis (Const)
model_6/User-Embedding/embedding_lookup (GatherV2)
gradient_tape/model_6/User-Embedding/embedding_lookup/Shape (Const)
gradient_tape/model_6/User-Embedding/embedding_lookup/Cast (Cast)
Adagrad/Adagrad/update_1/update_0/ResourceSparseApplyAdagradV2 (ResourceSparseApplyAdagradV2) /job:localhost/replica:0/task:0/device:GPU:0
[[{{node model_6/User-Embedding/embedding_lookup/ReadVariableOp}}]] [Op:__inference_train_function_2997]
Link to Google Colab:
https://colab.research.google.com/drive/1ZN1HzSTTfvA_zstuI-EsKjw7Max1f73v?usp=sharing
It's a really simple neural network, and the data is available to download from Kaggle - you can just drag and drop it into Colab to get it working.
I've also tried to enable soft device placement with tf.config.set_soft_device_placement(True), but that doesn't seem to have worked.
From the error log, it looks like MirroredStrategy has assigned the embedding lookup operation to the GPU (which is GPU-incompatible, and I can see why). I was hoping that tf.config.set_soft_device_placement(True) would have asked TensorFlow to use the CPU instead, but it feels like that's being ignored.
Has anyone seen this problem before and know of a workaround?
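For reference, this is roughly where that call would go; a minimal sketch assuming a setup like the notebook's (the strategy and placeholder comments are illustrative, not the actual Colab code):
import tensorflow as tf

# Soft device placement is a process-wide switch; it is typically enabled
# early, before MirroredStrategy and the model create any ops.
tf.config.set_soft_device_placement(True)

strategy = tf.distribute.MirroredStrategy()
# ... build and compile the Keras model inside strategy.scope(), as in the notebook ...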
Found a similar issue for TF1.14:
https://github.com/tensorflow/tensorflow/issues/31318
Looks like MirroredStrategy can't support training embedding layers using momentum-based optimisers.
Cloning the above notebook and using RMSprop (with momentum=0) seemed to work:
https://colab.research.google.com/drive/13MXa8Q96M6uzlkK3K_M7vmQfclL59eRj?usp=sharing
I'll use RMSProp with no momentum for now until this issue is fixed. The error message certainly hasn't helped!
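A minimal sketch of that workaround, assuming a Keras model compiled inside the MirroredStrategy scope (the model below is an illustrative stand-in for the one in the notebook):
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    # Illustrative stand-in for the notebook's model with an embedding layer.
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=10000, output_dim=32),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(1),
    ])
    # Per the workaround above: RMSprop with momentum=0.0 instead of a
    # momentum-based optimiser trains the embedding layer without the
    # device-placement error.
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001, momentum=0.0),
        loss="mse",
    )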

I am sometimes getting out of memory while training a model

TensorFlow version: TF 1.13
Using Anaconda
I read on Stack Overflow that I should set
TF_CUDNN_WORKSPACE_LIMIT_IN_MB = 100
to reduce the scratch space for TensorFlow, which is 4 GB by default.
GPU: NVIDIA GTX 1660 Ti 6 GB card
CUDA version: 11.0
But I do not know how to set the environment variable.
I couldn't find any tutorial on it. Since I am a beginner with this, could anyone give any links or explain how to set this variable? It would be really helpful for me.
You can use the os module in Python as follows:
import os

# Set the cuDNN workspace limit (in MiB).
os.environ['TF_CUDNN_WORKSPACE_LIMIT_IN_MB'] = '100'
And by the way, 100 MiB is far too little workspace for training a model. Allocate at least 1 GiB.
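For example, a minimal sketch following that advice (this assumes the variable should be set before TensorFlow is imported, so the new limit is picked up when cuDNN is initialised):
import os

# Roughly 1 GiB of cuDNN workspace instead of the 100 MiB from the question.
os.environ['TF_CUDNN_WORKSPACE_LIMIT_IN_MB'] = '1024'

import tensorflow as tf  # imported only after the variable is set
# ... build and train the model as usual ...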

How to get the exact GPU memory usage for Keras

I recently started learning Keras and TensorFlow. I am currently testing out a few models on the MNIST dataset (pretty basic stuff). I wanted to know exactly how much memory my model consumes during training and inference. I tried googling but did not find much info.
I came across nvidia-smi. I tried the config.gpu_options.allow_growth = True option, but I am still not able to see the exact memory python.exe is consuming, due to some issues with nvidia-smi. I know that I could run separate passes of training and inference, but this is too cumbersome. It would be much easier if I could just find the right API to do the job.
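For context, allow_growth in TF 1.x is usually wired in through the session config; a rough sketch (graph-mode style, and handing the session to the Keras backend is an assumption about how the model is run):
import tensorflow as tf

config = tf.ConfigProto()
# Allocate GPU memory on demand instead of grabbing almost all of it up front,
# so tools like nvidia-smi report memory that is actually in use.
config.gpu_options.allow_growth = True

sess = tf.Session(config=config)
tf.keras.backend.set_session(sess)  # make Keras use this configured session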
TensorFlow being such a well-known and widely used library, I am hoping there is a better and faster way to get these numbers.
Finally, once again, my question is:
How do I get the exact memory usage for a Keras model during training and inference?
Relevant specs:
OS: Windows 10
GPU: GTX 1050
TensorFlow version: 1.14
Please let me know if any other details are required.
Thanks!

Tensorflow behaves differently between GPU and CPU

I modified the TensorFlow tutorial example for beginners by adding a hidden layer. The recognition rate obtained by running on the GPU is ~95%, but when running in CPU-only mode I only get ~40%. Does anybody know why the same Python code behaves so differently? Thanks.
Weiguang
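The modified code isn't shown in the post; a rough sketch of what "adding a hidden layer" to the TF1 MNIST beginners example could look like (sizes, initialisers, and learning rate are illustrative, not the poster's actual code):
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

# Hidden ReLU layer added on top of the tutorial's single softmax layer.
W1 = tf.Variable(tf.truncated_normal([784, 128], stddev=0.1))
b1 = tf.Variable(tf.zeros([128]))
h1 = tf.nn.relu(tf.matmul(x, W1) + b1)

W2 = tf.Variable(tf.truncated_normal([128, 10], stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(h1, W2) + b2

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)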

no supported kernel for GPU devices is available for SparseTensorDenseMatMul_grad

I ran into an issue when building a model with the tf.sparse_tensor_dense_matmul op in my graph. Part of the error info is pasted below.
Does that mean there is no GPU kernel available to compute "SparseTensorDenseMatMul_grad"? I can build the model successfully with allow_soft_placement=True in the session config. However, I need all the computation to stay on the GPU for a specific reason. Does anyone know how to fix this issue, or do I need to implement the CUDA kernel for this op myself? Thanks a lot.
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'gradients/softmax_linear/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/Slice_1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
[[Node: gradients/softmax_linear/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/Slice_1 = Slice[Index=DT_INT32, T=DT_INT64, _device="/device:GPU:0"](Placeholder_2, gradients/softmax_linear/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/Slice_1/begin, gradients/softmax_linear/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/Slice_1/size)]]
Caused by op u'gradients/softmax_linear/SparseTensorDenseMatMul/SparseTensorDenseMatMul_grad/Slice_1', defined at:
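For reference, the allow_soft_placement setting mentioned above goes into the session config; a rough TF1-style sketch (names illustrative):
import tensorflow as tf

# Soft placement lets TensorFlow fall back to the CPU for ops, such as the
# Slice on int64 inputs in this gradient, that have no registered GPU kernel.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # ... run the training loop; only kernels without GPU support fall back to the CPU ...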