Load image array from GPU memory into Keras/TensorFlow without NumPy

I have a trained Keras model for image classification. I am now using my own encoder-decoder (not OpenCV), written in C++, for image processing and streaming on the GPU. I can transfer the image array produced by the encoder-decoder to Python using NumPy and ctypes, and use that NumPy array for classification. This introduces significant overhead, because the data is copied from the GPU to a NumPy array on the CPU and then copied back to the GPU for classification.
How can I feed the GPU image array directly into Keras for inference? The C++ encoder-decoder code gives me the device address of the GPU image array.
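One hedged sketch, assuming the C++ side can expose a raw CUDA device pointer along with the image shape and dtype: wrap the existing allocation with CuPy and hand it to TensorFlow through DLPack, so no host copy is involved. The pointer value, shape, and dtype below are placeholders, and `model` stands for the trained Keras classifier.

```python
import math
import cupy as cp
import tensorflow as tf

# Placeholder values supplied by the C++ encoder-decoder (assumptions)
dev_ptr = 0x7F0000000000            # raw CUDA device address of the decoded frame
shape = (1, 224, 224, 3)            # NHWC layout expected by the Keras model
nbytes = math.prod(shape) * 4       # float32 bytes

# Wrap the existing GPU allocation without copying it
mem = cp.cuda.UnownedMemory(dev_ptr, nbytes, owner=None)
gpu_image = cp.ndarray(shape, dtype=cp.float32,
                       memptr=cp.cuda.MemoryPointer(mem, 0))

# Cross into TensorFlow via DLPack; the data stays on the GPU
image_tensor = tf.experimental.dlpack.from_dlpack(gpu_image.toDlpack())

predictions = model(image_tensor)   # `model` is the trained Keras classifier
```

The caveats are that TensorFlow and the decoder must agree on the CUDA device/context, and the original buffer has to stay alive until TensorFlow has finished reading it.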

Related

Image decoded/encoded/decoded in Tensorflow is not the same as the original

I store images in TFRecord files for training an image classification model with TensorFlow 2.10. The TFRecords are read into a dataset on which I call fit(). After training I run inference with
1.) the image taken from the dataset,
2.) the same image read from disk.
I notice that the predictions are not the same, because in the first case the image has gone through a decode/encode/decode transformation (in the process of writing and then reading the TFRecord file to build the dataset, using tf.io.decode_jpeg and tf.io.encode_jpeg) that is not symmetrical: the image after the transformation is not identical to the original, even when I encode with quality=100.
This can make a real difference: in the first case the correct class is not in the top 3, in the second case it is.
Is there any way to avoid this asymmetric behavior?
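One way around it, sketched below under the assumption that the images start out as JPEG bytes: store a lossless representation in the TFRecord, either by re-encoding to PNG or by serializing the decoded tensor directly, so the read path reproduces the training pixels exactly. The helper names are illustrative, not from the question.

```python
import tensorflow as tf

# Option 1: re-encode losslessly to PNG before writing the TFRecord,
# so decode(encode(decode(x))) returns exactly the same pixels.
def jpeg_to_png_bytes(jpeg_bytes):
    image = tf.io.decode_jpeg(jpeg_bytes, channels=3)
    return tf.io.encode_png(image)

# Option 2: skip image codecs entirely and serialize the decoded tensor.
def image_to_feature(image_uint8):
    raw = tf.io.serialize_tensor(image_uint8)
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[raw.numpy()]))

# At read time the raw tensor round-trips bit-exactly:
# image = tf.io.parse_tensor(raw_bytes, out_type=tf.uint8)
```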

Default Data + Tensor location with Tensorflow2

By default, are tensors, as well as Keras datasets (once loaded), placed on the GPU when using TensorFlow 2? Or on the CPU?
If you are referring to tf.data, it basically loads data into RAM and, depending on the tf.data pipeline, uses both the CPU and the GPU.
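A small sketch for checking placement yourself; the logging flag and the .device attribute are standard TensorFlow 2 APIs, the rest is illustrative.

```python
import tensorflow as tf

# Log where every op runs, then inspect individual tensors
tf.debugging.set_log_device_placement(True)

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(x.device)          # e.g. .../device:GPU:0 when a GPU is visible, else CPU:0

# tf.data transformations run on the CPU; the model pulls batches onto the GPU
ds = tf.data.Dataset.from_tensor_slices(tf.range(8)).batch(4)
for batch in ds:
    print(batch.device)  # typically a CPU device until the model consumes it
```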

parsing rendered image from OpenGL directly to tensorflow to train / analyse

Is it possible to pass images from OpenGL directly to TensorFlow without transferring them from the GPU (framebuffer or any other OpenGL buffer) to the CPU and back to TensorFlow (on the GPU) again?
We would like to train a network on generated/rendered data and get a decision back from the neural network. This decision changes the position/orientation of the next rendered image, which in turn becomes the next input to the network.
This loop should first be used to train the TensorFlow network and later to control a vehicle.
How can we make the hand-off to TensorFlow as fast as possible? Share GPU memory? Or chain OpenGL to OpenCL, OpenCL to CUDA, and CUDA to TensorFlow?
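A hedged sketch of the CUDA-OpenGL interop route, assuming the rendered frame already sits in a GL pixel-buffer object and that the CUDA context shares the GL context (e.g. created via pycuda.gl.make_context). `pbo_id`, `height`, and `width` are placeholders; the DLPack hand-off at the end is the same idea as in the first question above.

```python
import cupy as cp
import pycuda.gl as cudagl
import tensorflow as tf

registered = cudagl.RegisteredBuffer(int(pbo_id))   # register the GL PBO with CUDA
mapping = registered.map()
dev_ptr, size = mapping.device_ptr_and_size()

# Wrap the mapped device pointer without copying, then cross into TF via DLPack
mem = cp.cuda.UnownedMemory(dev_ptr, size, owner=None)
frame = cp.ndarray((height, width, 4), dtype=cp.uint8,
                   memptr=cp.cuda.MemoryPointer(mem, 0))
tensor = tf.identity(tf.experimental.dlpack.from_dlpack(frame.toDlpack()))

mapping.unmap()   # safe once tf.identity has materialized a TF-owned copy on the GPU
```

Everything stays on the device; whether this beats a well-overlapped glReadPixels path depends on frame size and how tightly the render/infer loop is coupled.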

Do I need to switch between tensorflow and numpy?

My data set is a NumPy array. Some tutorials say that, to take advantage of the GPU, we should convert the NumPy array to a TensorFlow tensor and then use a TensorFlow model.
But after training, some code uses NumPy functions for testing and interaction, while the official TensorFlow tutorials still use the same TensorFlow model and tf.data datasets for testing.
I want to know:
When testing or running in real time, should I use NumPy or TensorFlow tensors and functions with the model?
In other words, is there any downside to using TensorFlow tensors and functions when not training?
e.g. we use
selected_words = tf.argsort(o_j)
instead of
selected_words = np.argsort(o_j)
Since TF tensors live on the GPU and NumPy arrays live on the CPU, converting from GPU to CPU requires a memory allocation and a content copy through the CUDA API (see the PyCUDA documentation), which adds a small delay. That delay can matter during training because of the high-throughput data stream, but it can usually be ignored for inference. In any case, if selected_words is the desired output, it is generally preferable to use tf.argsort and keep an elegant end-to-end model. However, if the output (such as logits) is reused in several places, using np.argsort in those specific situations is fine.
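A toy comparison of the two paths; `o_j` just stands for some model output, as in the question.

```python
import numpy as np
import tensorflow as tf

o_j = tf.random.uniform((1, 1000))          # stand-in for the model's output scores

# Staying in TensorFlow keeps the op on the GPU and inside the graph/model:
selected_words_tf = tf.argsort(o_j, direction="DESCENDING")

# Dropping to NumPy forces a device-to-host copy of o_j first:
selected_words_np = np.argsort(-o_j.numpy())
```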

Large input image limitations for VGG19 transfer learning

I'm using TensorFlow (via the Keras API) in Python 3. I'm using the VGG19 pre-trained network to perform style transfer on an Nvidia RTX 2070.
The largest input image I have is 4500x4500 pixels (I have removed the fully-connected layers from VGG19 so that the fully-convolutional network can handle arbitrary image sizes). If it helps, my batch size is currently just one image at a time.
1.) Is there an option for parallelizing the evaluation of the model on the image input given that I am not training the model, but just passing data through the pre-trained model?
2.) Is there any increase in capacity for handling larger images in going from 1 GPU to 2 GPUs? Is there a way for the memory to be shared across the GPUs?
I'm unsure whether larger images make my GPU compute-bound or memory-bound. I suspect it's a compute issue, which is what started my search for discussions of parallel CNN evaluation. I've seen some papers on tiling methods that seem to allow for larger images.
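A brief sketch relevant to questions 1.) and 2.), using tf.distribute.MirroredStrategy. Note that this replicates the model on each GPU rather than pooling memory, so the activation footprint of a single 4500x4500 image still has to fit on one card; the devices list and dummy batch below are illustrative.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])
with strategy.scope():
    # Fully-convolutional VGG19 (no dense head), accepting arbitrary spatial sizes
    model = tf.keras.applications.VGG19(include_top=False, weights="imagenet",
                                        input_shape=(None, None, 3))

# With batch size 1 there is nothing to shard across replicas; a second GPU
# mainly helps once several images (or tiles of one image) run in parallel.
images = tf.random.uniform((2, 1024, 1024, 3))   # placeholder batch of two tiles
features = model.predict(images, batch_size=1)
```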