Storing pre-processed images - object-detection

I am evaluating a couple of object detection models on a data set and was planning to pre-process the data by standardizing it to zero mean and unit variance. But I don't know how to store the images once they have been pre-processed. Currently they are in JPEG format, but what format can be used after pre-processing? Some of the models I am evaluating are YOLOv4, YOLOv5, and SSD.
If I instead scaled the pixel values from 0-255 to 0-1, what image format could I then use?
Also, if I train the object detector on pre-processed images and then want to apply it to a video, I assume I need to somehow pre-process the video to get decent results. How would I go about doing that?
I have calculated the mean and std of my data set using the Python module cv2. I read the images with imread, which returns a numpy array, then I subtract the mean and divide by the std. This gives me a numpy array with both negative and positive floating-point values. But when I try to save this numpy array as an image using imwrite(filename, array), it doesn't work. I assume that is because the numpy array isn't allowed to contain negative values.
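A minimal sketch of the workflow described above (the file names and mean/std values are illustrative assumptions): since imwrite clips and casts floats back to 8-bit, destroying the standardization, the standardized floats can instead be stored losslessly in numpy's .npy format:

import cv2
import numpy as np

# Illustrative per-channel statistics computed over the whole data set
# (BGR order, since cv2.imread returns BGR)
mean = np.array([103.5, 116.3, 123.7], dtype=np.float32)
std = np.array([57.4, 57.1, 58.4], dtype=np.float32)

img = cv2.imread("example.jpg").astype(np.float32)  # uint8 -> float32
standardized = (img - mean) / std                   # now contains negatives

np.save("example.npy", standardized)                # lossless float storage
restored = np.load("example.npy")                   # ready to feed a model

For video, the natural counterpart would be to apply the same transform per frame at inference time (e.g. read frames with cv2.VideoCapture and standardize each frame with the training-set mean/std) rather than storing pre-processed frames.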

Related

In MediaPipe, is it possible to see augmented landmarks rendered in real time?

So I am using MediaPipe Holistic Solutions to extract keypoints from the body, hands, and face, and I am using the data from this extraction for my calculations just fine. The problem is that I want to see whether my data augmentation works, but I am unable to see it in real time. An example of how the keypoints are extracted:
lh_arr = np.array([[result.x, result.y, result.z] for result in results.left_hand_landmarks.landmark]).flatten()
If I then do, let's say, lh_arr[10:15] * 2, I can't use this new data in the draw_landmarks function, as lh_arr is not of class 'mediapipe.python.solution_base.SolutionOutputs'. Is there a way to get draw_landmarks() to use an np array instead, or can I convert the np array back into the correct format? I have tried to get the flattened array back into a dictionary of the same format as results, but it did not work. I can't augment the results directly either, as they are unsupported operand types.
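One possible approach (a sketch, not verified against every MediaPipe version): draw_landmarks() accepts a NormalizedLandmarkList protobuf, so the flattened array can be packed back into that message type via mediapipe.framework.formats.landmark_pb2. The helper name and the image variable here are assumptions:

import mediapipe as mp
from mediapipe.framework.formats import landmark_pb2

mp_drawing = mp.solutions.drawing_utils
mp_holistic = mp.solutions.holistic

def array_to_landmark_list(arr):
    # Rebuild a NormalizedLandmarkList from a flattened (x, y, z) array
    pts = arr.reshape(-1, 3)
    return landmark_pb2.NormalizedLandmarkList(
        landmark=[landmark_pb2.NormalizedLandmark(x=float(x), y=float(y), z=float(z))
                  for x, y, z in pts])

lh_arr[10:15] *= 2  # the augmentation from the question
# image is assumed to be the current video frame
mp_drawing.draw_landmarks(image, array_to_landmark_list(lh_arr), mp_holistic.HAND_CONNECTIONS)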

How does PIL handle a numpy matrix with negative values?

I am trying to build a machine learning model, and as a first step I plan to convert my data matrix (consisting of real numbers, both positive and negative, all smaller than 255) into RGB images. I know we can do that with the PIL package, but I wonder whether the original negative values can still be retained if we make them into images, or whether they will all be rounded to zero.
I went through many Google examples but am still confused, so I am asking to be certain.
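A small experiment (a sketch of the behaviour, worth verifying on your own setup) illustrates the trade-offs; the file name is a placeholder:

import numpy as np
from PIL import Image

data = np.array([[-4.0, 0.0], [7.5, 254.9]], dtype=np.float32)

# Casting to 8-bit does not round negatives to zero -- integer casts wrap
# modulo 256 (e.g. -4 -> 252), so the sign information is silently lost:
as_uint8 = data.astype(np.int16).astype(np.uint8)

# A 32-bit float image (mode 'F') keeps negative values exactly, but only
# formats such as TIFF can store it; JPEG cannot:
Image.fromarray(data, mode="F").save("data.tiff")

# To use 8-bit RGB and still recover the originals, shift/scale into
# [0, 255] and keep (lo, hi) so the transform can be inverted later:
lo, hi = data.min(), data.max()
scaled = np.uint8(np.round((data - lo) / (hi - lo) * 255))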

Fitting Large Matrix Calculations into Memory when using Tensorflow

I am attempting to build a model which has two phases.
The first takes an input image and passes it through a conv-deconv network. The resulting Tensor has entries corresponding to pixels in a desired output image (same size as the input image).
To calculate the final output image, I want to take the value generated at each pixel location in the first phase and use it as an additional input to a reduction function that is applied over the entire input image. This second step has no trainable variables, but its computation/memory cost grows quadratically with the number of pixels (each output pixel is a function of all input pixels).
I'm currently using tf.map_fn to calculate the output image, mapping the per-pixel calculation function onto the results of the first phase. My hope was that TensorFlow would allocate the memory to store the intermediate tensors needed for each pixel calculation and then free that memory before moving on to the next pixel. But instead it seems to never free the intermediate calculations, causing OOM errors.
Is there some way to tell TensorFlow (either explicitly or implicitly) that it should free the memory allocated to hold the data of a tensor that is no longer needed in the calculation?
TensorFlow deallocates memory for the tensor as soon as the tensor is no longer needed for any future calculations. You can verify this by looking at memory deallocation messages as shown in this notebook.
It's possible you are running out of memory because TensorFlow executes nodes in a memory inefficient order.
As an example, consider the following computation:
import tensorflow as tf

k = 2000
n = 10  # chain length; n was left undefined in the original snippet
a = tf.random_uniform(shape=(k, k))
for i in range(n):
    a = tf.matmul(a, tf.random_uniform(shape=(k, k)))
One order in which the graph can be evaluated is: all of the circle nodes (the tf.random_uniform ops) first, followed by all of the square nodes (the tf.matmul ops). That order has an O(n) memory requirement, compared to O(1) for the optimal order that interleaves generation and multiplication.
You can use control dependencies to force a specific execution order, i.e., using a helper function like the one below:
import tensorflow.contrib.graph_editor as ge

def run_after(a_tensor, b_tensor):
    """Force a_tensor to run after b_tensor."""
    ge.reroute.add_control_inputs(a_tensor.op, [b_tensor.op])
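Applied to the matmul chain above, a hedged sketch (an illustration of the idea, not code from the original answer) would chain each new random matrix after the previous product, so that only one random matrix is live at a time:

a = tf.random_uniform(shape=(k, k))
for i in range(n):
    r = tf.random_uniform(shape=(k, k))
    run_after(r, a)     # generate r only after the previous result a exists
    a = tf.matmul(a, r)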

Incorporating very large constants in Tensorflow

For example, the comments for the TensorFlow image captioning example model state:
NOTE: This script will consume around 100GB of disk space because each image
in the MSCOCO dataset is replicated ~5 times (once per caption) in the output.
This is done for two reasons:
1. In order to better shuffle the training data.
2. It makes it easier to perform asynchronous preprocessing of each image in
TensorFlow.
The primary goal of this question is to see if there is an alternative to this type of duplication. In my use case, storing the data in this way would require each image to be duplicated in the TFRecord files many more times, on the order of 20 - 50 times.
I should note first that I have already fed the images through VGGnet to extract 4096 dim features, and I have these stored as a mapping between filename and the vectors.
Before switching over to TensorFlow, I had been feeding batches containing filename strings and then looking up the corresponding vector on a per-batch basis. This allows me to store all of the image data in ~15 GB without needing to duplicate the data on disk.
My first attempt to do this in TensorFlow involved storing indices in the TFExample buffers and then doing a "preprocessing" step to slice into the corresponding matrix:
import numpy as np
import pandas as pd
import tensorflow as tf

img_feat = pd.read_pickle("img_feats.pkl")
img_matrix = np.stack(img_feat)
preloaded_images = tf.Variable(img_matrix)
first_image = tf.slice(preloaded_images, [0, 0], [1, 4096])
However, in this case, TensorFlow disallows a variable larger than 2 GB. So my next thought was to partition this across several variables:
img_tensors = []
for i in range(NUM_SPLITS):
    with tf.Graph().as_default():
        img_tensors.append(tf.Variable(img_matrices[i], name="preloaded_images_%i" % i))
first_image = tf.concat(1, [tf.slice(t, [0, 0], [1, 4096 // NUM_SPLITS]) for t in img_tensors])
In this case, I'm forced to store each partition on a separate graph, because it seems any one graph cannot be this large either. However, now the concat fails because each tensor I am concatenating is on a separate graph.
Any advice on incorporating a large amount (~15 GB) of preloaded data into the TensorFlow graph would be appreciated.
Potentially related is this question; however in this case I'd like to override the decoding of the actual JPEG file with the preprocessed value in a tensor op.
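For reference, one commonly used TF1 workaround for the 2 GB limit (an addition here, not part of the original question) is to initialize a single variable from a placeholder, so the array is fed in at session start instead of being embedded as a constant in the GraphDef:

import numpy as np
import pandas as pd
import tensorflow as tf

img_matrix = np.stack(pd.read_pickle("img_feats.pkl"))  # the ~15 GB feature matrix

images_init = tf.placeholder(tf.float32, shape=img_matrix.shape)
# collections=[] keeps the variable out of the global-variables initializer
preloaded_images = tf.Variable(images_init, trainable=False, collections=[])
first_image = tf.slice(preloaded_images, [0, 0], [1, 4096])

with tf.Session() as sess:
    # The array crosses into the runtime via the feed, bypassing the GraphDef size limit
    sess.run(preloaded_images.initializer, feed_dict={images_init: img_matrix})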

Changing numpy array using dpi value

I have a numpy array which I save to an image using savefig(). When I then read it back in my code, the image is larger than my original array because the dpi while saving is 100.
Is it possible to use dpi to make the image size larger and get it in a numpy array without saving and loading it again?
Sounds like you want to take an array of size (a, b) and scale it by an arbitrary factor s so that the resulting array has shape (a*s, b*s)?
There are several ways of doing this as far as I am aware, but perhaps the best resource is the cookbook page on rebinning: http://www.scipy.org/Cookbook/Rebinning
HTH
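A minimal sketch of one such approach (nearest-neighbour rebinning with plain numpy, assuming an integer scale factor; scipy.ndimage.zoom offers interpolated variants):

import numpy as np

def rebin_nearest(arr, s):
    # Upscale a 2-D array by integer factor s, repeating each element s times per axis
    return np.repeat(np.repeat(arr, s, axis=0), s, axis=1)

a = np.arange(6).reshape(2, 3)
print(rebin_nearest(a, 2).shape)  # (4, 6)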