Unsupported operation workaround - TensorFlow 2.0

I have a custom layer which uses tf.py_function with some Python code. Since this isn't supported by Cloud TPU, is it possible to place the computation for this layer on a CPU or GPU device? Or is the only solution to rewrite the function?
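For concreteness, a minimal sketch of the kind of layer in question (the Python body here is a hypothetical stand-in):

```python
import numpy as np
import tensorflow as tf

def numpy_op(x):
    # Arbitrary Python/NumPy logic with no native TF-op equivalent.
    return np.clip(x, 0.0, 1.0)

class PyFuncLayer(tf.keras.layers.Layer):
    def call(self, inputs):
        # tf.py_function executes the Python body on the host, which is
        # why the op cannot be compiled into a Cloud TPU program.
        out = tf.py_function(numpy_op, [inputs], Tout=inputs.dtype)
        out.set_shape(inputs.shape)  # py_function drops static shape info
        return out
```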

This use case is not currently supported.

Related

How to use TPU in TensorFlow custom training loop?

I don't use Keras, and I want to use TPUs on Google Colab. Questions:

1. Can tf.Session automatically use TPUs?
2. What do tf.contrib.tpu.TPUDistributionStrategy, tf.contrib.tpu.rewrite, and tf.contrib.cluster_resolver.TPUClusterResolver do in TPU computing? Are they all necessary?
Which TensorFlow version are you running? Currently the firmware for TPUs on Google Colab only supports 1.14 (I may be wrong about the exact version, but it's definitely 1.x); however, if you are using TF 2.0, there is TPU support for nightly-2.x on GCP, so perhaps you can give that a try!
Note that in 2.0 you will want to get rid of any sessions, because sessions are no longer a thing; eager execution and tf.function replace them. Check out the TPUStrategy docs here for more information: https://www.tensorflow.org/guide/distributed_training#tpustrategy
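To make that concrete, here is a minimal custom-training-loop sketch against the TF 2.x API (names as of TF 2.2; on earlier 2.x builds `tf.distribute.TPUStrategy` was `tf.distribute.experimental.TPUStrategy` and `strategy.run` was `strategy.experimental_run_v2`). The model and data are hypothetical, and `tf.keras.optimizers` is used only for the update step, not the Keras Model/fit API:

```python
import tensorflow as tf

# Connect to and initialize the TPU. On Colab the resolver picks up the
# runtime's TPU address automatically; on GCP, pass the TPU name explicitly.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Variables must be created under the strategy's scope.
with strategy.scope():
    w = tf.Variable(tf.random.normal([32, 10]))
    b = tf.Variable(tf.zeros([10]))
    optimizer = tf.keras.optimizers.SGD(0.01)

@tf.function
def train_step(dist_inputs):
    def step_fn(inputs):
        x, y = inputs
        with tf.GradientTape() as tape:
            logits = tf.matmul(x, w) + b
            loss = tf.reduce_mean(
                tf.nn.sparse_softmax_cross_entropy_with_logits(
                    labels=y, logits=logits))
        grads = tape.gradient(loss, [w, b])
        optimizer.apply_gradients(zip(grads, [w, b]))
        return loss
    # Replicate step_fn across the TPU cores, then average the losses.
    per_replica_loss = strategy.run(step_fn, args=(dist_inputs,))
    return strategy.reduce(tf.distribute.ReduceOp.MEAN,
                           per_replica_loss, axis=None)

dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([128, 32]),
     tf.random.uniform([128], maxval=10, dtype=tf.int32))).batch(64)

for batch in strategy.experimental_distribute_dataset(dataset):
    print(train_step(batch).numpy())
```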

Is there a way to set per_process_gpu_memory_fraction in tensorflow.js (.JS!)?

Knowing that you can set the maximum GPU memory amount for TensorFlow, I wonder how I can prevent tensorflow**.js** from using all my GPU RAM. I didn't find anything in the API documentation.
Not yet. Currently you can only select which backend to register, using tf.setBackend, and that backend will use all of the available GPU memory.

Use the tensorflow backend (Node.js):

```js
tf.setBackend('tensorflow')
```

Use the webgl or cpu backend in the browser:

```js
tf.setBackend('cpu')
tf.setBackend('webgl') // if using tfjs in the browser
```

Specify Keras GPU without using CUDA_VISIBLE_DEVICES

I have a system with two GPUs and am using Keras with the TensorFlow backend. gpu:0 is allocated to PyCUDA, which performs a unique operation that is fed forward to Keras and changes with each batch. As such, I would like to run a Keras model on gpu:1 while leaving gpu:0 allocated to PyCUDA.
Is there any way to do this? Looking through prior threads, I've only found deprecated solutions.
So I don't think this feature is meaningfully implemented in Keras currently. I found a workaround that I recommend, whereby you just create multiple processes using Python's standard multiprocessing library.
Note: currently, for this setup you need to spawn the new process rather than fork it, to avoid a weird interaction with one of the PyCUDA backend libraries.
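A minimal sketch of that workaround (the worker body is a placeholder; it uses tf.config.set_visible_devices, available as tf.config.experimental.set_visible_devices on TF < 2.1, so the child never touches CUDA_VISIBLE_DEVICES or gpu:0):

```python
import multiprocessing as mp
import numpy as np

def keras_worker():
    # Import TF inside the child so the parent process, which holds the
    # PyCUDA context on gpu:0, never initializes TensorFlow at all.
    import tensorflow as tf
    # Hide gpu:0 from this process; TF will only create a context on gpu:1.
    gpus = tf.config.list_physical_devices('GPU')
    tf.config.set_visible_devices(gpus[1], 'GPU')
    # Hypothetical stand-in for the real Keras model.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer='sgd', loss='mse')
    print(model.predict(np.zeros((1, 4))))

if __name__ == '__main__':
    # Spawn rather than fork, per the note above about PyCUDA.
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=keras_worker)
    p.start()
    p.join()
```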

Is there any way to fuse a fully connected layer (GEMM) and an activation layer (ReLU/sigmoid) on GPU in a DNN?

Usually one layer in a DNN consists of MatMul, BiasAdd, and Relu. cuBLAS provides GEMM for the MatMul, and we can do the BiasAdd and Relu in another kernel on the GPU. That makes two kernel launch calls; is there any way to fuse them all together into just one? I looked into cuBLAS and cuDNN but did not find anything. I think it should not be difficult, because BiasAdd and Relu are just element-wise operations, and fusing them would be more efficient.
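For reference, this is the pattern in question as it appears in TensorFlow with XLA enabled, the setup described below (a minimal sketch; `jit_compile` was named `experimental_compile` on earlier TF 2.x releases):

```python
import tensorflow as tf

@tf.function(jit_compile=True)  # ask XLA to compile and fuse the graph
def dense_relu(x, w, b):
    # MatMul -> BiasAdd -> Relu. As described below, XLA emits a cuBLAS
    # dot plus one fused elementwise kernel: the bias and relu are fused
    # with each other, but not into the GEMM itself.
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([64, 256])
w = tf.random.normal([256, 128])
b = tf.random.normal([128])
y = dense_relu(x, w, b)
```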
Here is the background:
I am working on an online prediction service that ensembles multiple DNN models. Profiling my program, I found that neither my CPU nor my GPU is fully utilized, yet requests block on GPU-related function calls (like launchKernel). It seems like there is a big lock in libcuda. I am using TensorFlow with XLA enabled, so I used nvprof and TensorFlow's HLO output to visualize the GPU calls, and there are only dot and fused (BiasAdd plus Relu) operations. Although that kernel fusion is done, there are still too many launchKernel calls, and GPU utilization is only 60%. I tried multiple CUDA contexts in one process, but the improvement was trivial.
By the way, I am using a single GPU, a Tesla P100.

Is There a List of TensorFlow Ops that Are Supported for CPU?

I am trying to run a model that was written for GPU on a CPU, and have discovered that tf.nn.bias_add does not support a data_format attribute of "NCHW" when executing on the CPU; it only supports "NHWC".
Is there a list of which operations, like this one, are restricted to the GPU? I haven't been able to find one yet.
No, there is no such list, and sadly the documentation does not give these details either.
There was an attempt to ask for a documentation improvement here, but it does not look like it was ever implemented.
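For this particular op, the usual workaround is to transpose into the layout the CPU kernel does support and transpose back (a minimal sketch; the shapes are hypothetical):

```python
import tensorflow as tf

x = tf.random.normal([1, 3, 4, 4])  # NCHW: batch, channels, height, width
b = tf.random.normal([3])           # one bias per channel

with tf.device('/CPU:0'):
    # tf.nn.bias_add(x, b, data_format='NCHW') has no CPU kernel, so go
    # through NHWC instead and transpose on either side of the op.
    x_nhwc = tf.transpose(x, [0, 2, 3, 1])                  # NCHW -> NHWC
    y_nhwc = tf.nn.bias_add(x_nhwc, b, data_format='NHWC')
    y = tf.transpose(y_nhwc, [0, 3, 1, 2])                  # NHWC -> NCHW
```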