Why do I get ValueError('\'image\' must be fully defined.') when transforming image in Tensorflow? - tensorflow

I want to do real time data augmentation by chaining different image transformation operators in tensorflow. My code begins with image decoding and then runs different transformations but it throw a ValueError('\'image\' must be fully defined.'). Here is an example to reproduce this error :
def decode_and_augment(image_raw):
decoded = tf.image.decode_jpeg(image_raw)
flipped = tf.image.random_flip_left_right(decoded)
return flipped

This error arises because the tf.image.random_flip_left_right() op checks the static shape of its input when you build the graph, and tf.image.decode_jpeg() produces tensors that have a data dependency on the contents of image_raw so it the shape isn't statically known. Currently the only way to work around this is to set the static shape of the decoded tensor using Tensor.set_shape(), as follows:
decoded = tf.image.decode_jpeg(image_raw)
decoded.set_shape([IMAGE_HEIGHT, IMAGE_WIDTH, NUM_CHANNELS])
flipped = tf.image.random_flip_left_right(decoded)
The downside of this is that all images must now have the same size (and number of channels).
Many of the image ops don't follow the same gradual and dynamic shape inference as the rest of TensorFlow (which allows you to have unknown shapes or dimensions, assumes that the program is correct as you build the graph, and checks the real shapes at runtime). This is considered a bug at the present time, and we'll figure out a way to fix it.

Related

Keras Upsampling2d -> tflite conversion results in failing shape inference and undefined output shape

Keras Upsampling2d operation is converted into this with additional operations and undefined shape
Tensorflow however converts without this operations with correct shape
This leads to undefined overall model output shape and leads to errors on device. How can this be fixed?
This behavior is described here https://github.com/tensorflow/tensorflow/issues/45090
Keras by default sets dynamic batch size to true.
That means that the model input shape is [*,28,28] not [1,28,28].
The old(deprecated) converter used to ignore the dynamic batch and override this to 1 - which is wrong since this is not what the original model has - you can imagine how bad it will be when you try to resize the inputs at runtime.
The current converter instead handles the dynamic batch size correct, and the model generated can be resized at runtime correct.
That's why the sequence of "Shape, StridedSlice, Pack" wasn't constant folded, since the shape is dependent on the shape defined at runtime.
For single input model this can be fixed by setting constant shape for keras model before saving
model.input.set_shape(1 + model.input.shape[1:])

Resnet50 image preprocessing

I am using https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3 to extract image feature vectors. However, I'm confused when it comes to how to preprocess the images prior to passing them through the module.
Based on the related Github explanation, it's said that the following should be done:
image_path = "path/to/the/jpg/image"
image_string = tf.read_file(image_path)
image = tf.image.decode_jpeg(image_string, channels=3)
image = tf.image.convert_image_dtype(image, tf.float32)
# All other transformations (during training), in my case:
image = tf.random_crop(image, [224, 224, 3])
image = tf.image.random_flip_left_right(image)
# During testing:
image = tf.image.resize_image_with_crop_or_pad(image, 224, 224)
However, using the aforementioned transformation, the results I am getting suggest that something might be wrong. Moreover, the Resnet paper is saying that the images should be preprocessed by:
A 224×224 crop is randomly sampled from an image or its
horizontal flip, with the per-pixel mean subtracted...
which I can't quite understand what is means. Can someone point me in the right direction?
Looking forward to you answers!
The image modules on TensorFlow Hub all expect pixel values in range [0,1], like you get in your code snippet above. This makes it easy and safe to switch between modules.
Inside the module, the input values are scaled to the range that the network was trained for. The module https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/3 has been published from a TF-Slim checkpoint (see documentation), which uses yet another convention for normalizing inputs than He&al. -- but all this is taken care of.
To demystify the language in He&al.: it refers to the mean R, G and B values aggregated over all pixels of the dataset they studied, following the old wisdom that normalizing inputs to zero mean helps neural networks train better. However, later papers on image classification no longer expended this degree of attention to dataset-specific preprocessing.
The citation from the Resnet paper you mentioned is based on the following explanation from the Alexnet paper:
ImageNet consists of variable-resolution images, while our system requires a constant input dimensionality. Therefore, we down-sampled the images to a fixed resolution of256×256. Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and thencropped out the central 256×256patch from the resulting image. We did not pre-process the images in any other way, except for subtracting the mean activity over the training set from each pixel.
So in the Resnet paper, a similar process consist in taking a of 224x224 pixels part of the image (or of its horizontally flipped version) to ensure the network is given constant-sized images, and then center it by substracting the mean.

Tensorflow object detection:ValueError: cannot reshape array

I have trained "SSD with Mobilenet" model from Tensorflow. And training went fine.
Now when I try to test the performance of inference graph by running object_detection_tutorial.ipynb on an image, I get following error:
ValueError: cannot reshape array of size X into shape (a,b,c)
X,a,b,c are different values for different test images.
I don't think image size is causing the issue as model must perform independent of input image size. Infact, I get this error even with an image I used for training.
Please help here.
As suggested by #Mandroid, programmatically changing the input image to 3 channel might be the way to go, but this is how I ended up solving my issue.
Note: I am not sure that removing alpha from the images might have some consequences. This is some kind of information loss however.
Replacing image = Image.open(<image_path>) with image = Image.open(<image_path>).convert('RGB') did the job for me.

tensorflow: batches of variable-sized images

When one passes to tf.train.batch, it looks like the shape of the element has to be strictly defined, else it would complain that All shapes must be fully defined if there exist Tensors with shape Dimension(None). How, then, does one train on images of different sizes?
You could set dynamic_pad=True in the argument of tf.train.batch.
dynamic_pad: Boolean. Allow variable dimensions in input shapes. The given dimensions are padded upon dequeue so that tensors within a batch have the same shapes.
Usually, images are resized to a certain number of pixels.
Depending on your task you might be able to use other techniques in order to process images of varying sizes. For example, for face recognition and OCR, a fix sized window is used, that is then moved over the image. On other tasks, convolutional neural networks with pooling layers or recurrent neural networks can be helpful.
I see that this is quite old question, but in case someone will be searching how variable-size images can be still used in batches, I can tell what I did for Image-to-Image convolutional network (inference), which was trained for variable image size and batch 1. Why: when I tried to process images in batches using padding, the results become much worse, because signal was "spreading" inside of the network and started to influence its convolution pyramids.
So what I did is possible when you have source code and can load weights manually into convolutional layers. I modified the network in the following way: along with a batch of zero-padded images, I added additional placeholder which received a batch of binary masks with 1 where actual data was on the patch, and 0 where padding was applied. Then I multiplied signal by these masks after each convolutional layer inside the network, fighting "spreading". Multiplication isn't expensive operation, so it did not affect performance much.
The result was not deformed already, but still had some border artifacts, so I modified this approach further by adding small (2px) symmetric padding around input images (kernel size of all the layers of my CNN was 3), and kept it during propagation by using slightly bigger (+[2px,2px]) mask.
One can apply the same approach for training as well. Then some sort of "masked" loss is needed, where only the ROI on each patch is used to calculate loss. For example, for L1/L2 loss you can calculate the difference image between generated and label images and apply masks before summing up. More complicated losses might involve unstacking or iterating batch, and extracting ROI using tf.where or tf.boolean_mask.
Such training can be indeed beneficial in some cases, because you can combine small and big inputs for the network without small inputs being affected by the loss of big padded surroundings.

Feeding both jpeg and png images to the pre-trained inception3 model?

I gather from this question and its answer [ feeding image data in tensorflow for transfer learning ] that adding a new op to the imported graph will help, but it isn't clear to me if the resulting graph will handle both png and jpeg inputs automatically, and at the same time.
The answer to the above question suggests the following:
png_data = tf.placeholder(tf.string, shape=[])
decoded_png = tf.image.decode_png(png_data, channels=3)
# ...
graph_def = ...
softmax_tensor = tf.import_graph_def(
graph_def,
input_map={'DecodeJpeg:0': decoded_png},
return_elements=['softmax:0'])
sess.run(softmax_tensor, {png_data: ...})
Does this mean that a PNG input must be passed in as
sess.run(softmax_tensor, {png_data: image_array})
And a JPEG input must be given to the graph as
sess.run(softmax_tensor, {'DecodeJpeg:0': image_array})
Would the second statement work after the graph has been modified and an op added at the bottom?
The answers in the previous question center around switching the graph from taking JPEGs to PNGs. With the network as specified, there's no way for it to handle both.
You have a few options if you need to deal with both types.
Handle the decoding yourself, either with PIL, or TensorFlow, and feed the decoded image bytes into the graph at the output of the existing decode node.
If you're happy feeding the network, then do a two-step operation where you re-plumb the input to read from a variable, and create two new nodes that write decoded output to that variable.
sess.run(feed_jpeg, feed_dict={in_jpg: my_jpg})
sess.run(the_network)
or
sess.run(feed_png, feed_dict={in_png: my_png})
sess.run(the_network)
Create a more complex conditional input path where you can feed a flag value that tells it what data type it is, and uses TF conditionals to only pull on the specified decode node.
Write a new op that dispatches to either decode_png or decode_jpeg as necessary, based upon the format string at the start of the data.
I'm hoping we'll expose some string comparison ops so that you could write (4) in pure TensorFlow, but I don't have a timeline for any of that.