Tensorflow avoid shape information with crop - tensorflow

I have another issue with TensorFlow. I am using an FCN model and need to apply a random crop to keep memory usage down.
tf.random_crop(combined, size=[512, 512, 4])
Unfortunately, the new size now "sticks" to the tensor and I cannot get rid of it.
As a result, the model only accepts inputs of size 512x512, and as far as I know there is no clean way to work around that.
Is there any way either to remove the shape information introduced by random_crop or to easily adapt the input size once the model is trained?
Thank you in advance.

I don't know if it will completely suit your use-case, but the size parameter of tf.random_crop() can be a tensor, so you can for instance use a placeholder as shown in the example below.
import tensorflow as tf
import numpy as np
image = tf.placeholder(tf.float64, [None, None, 4])
cropped_size = tf.placeholder(tf.int32, [2])
cropped_image = tf.random_crop(image, size=[cropped_size[0], cropped_size[1], 4])
print(cropped_image.get_shape().as_list())
# [None, None, 4]
with tf.Session() as sess:
    res = sess.run(cropped_image,
                   feed_dict={image: np.random.rand(900, 600, 4), cropped_size: [512, 512]})
    print(res.shape)
    # (512, 512, 4)
EDIT:
There may be other ways to assign the value of cropped_size without a feed_dict, depending on how the crop dimensions are stored, e.g. using TF file readers (the values would stay unknown until read).
Another simple hack: take advantage of tf.placeholder_with_default(default_val, shape), passing the crop dimensions, however you obtained them, as default_val. Because the value of tf.placeholder_with_default() isn't actually assigned until runtime (in case you want to feed the placeholder a different value), your dimensions stay None in the graph:
import tensorflow as tf
image = tf.random_uniform((900, 600, 4)) # image tensor, acquired anyhow e.g. from tf.data
cropped_size_for_this_run = [512, 512] # crop dimensions, acquired anyhow
cropped_size = tf.placeholder_with_default(cropped_size_for_this_run, shape=[2])
cropped_image = tf.random_crop(image, size=[cropped_size[0], cropped_size[1], 4])
print(cropped_image.get_shape().as_list())
# [None, None, 4]
with tf.Session() as sess:
    # You can leave cropped_size with its default value assigned at runtime:
    res = sess.run(cropped_image)
    print(res.shape)
    # (512, 512, 4)
    # ... or you can specify a new one if you wish:
    res = sess.run(cropped_image, feed_dict={cropped_size: [256, 256]})
    print(res.shape)
    # (256, 256, 4)
    # ... It switches back to the default value if you don't feed one:
    res = sess.run(cropped_image)
    print(res.shape)
    # (512, 512, 4)
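For intuition, here is a rough NumPy sketch of what a random crop does once the size is only known at run time. It is an illustration under my own assumptions, not how tf.random_crop is implemented; the helper name random_crop_np is hypothetical.

```python
import numpy as np

def random_crop_np(image, crop_h, crop_w, rng):
    """Crop a random (crop_h, crop_w) window from an (H, W, C) array."""
    h, w, _ = image.shape
    if crop_h > h or crop_w > w:
        raise ValueError("crop size must not exceed the image size")
    # Pick the top-left corner so that the window stays inside the image.
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w, :]

rng = np.random.default_rng(0)
image = rng.random((900, 600, 4))

# The crop size is an ordinary runtime value, so nothing about it is baked in.
crop = random_crop_np(image, 512, 512, rng)
print(crop.shape)  # (512, 512, 4)
small = random_crop_np(image, 256, 256, rng)
print(small.shape)  # (256, 256, 4)
```

The same function serves any crop size, which mirrors why feeding the size as a tensor keeps the spatial dimensions unknown in the graph.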

Related

Input 0 is incompatible with layer model_1: expected shape=(None, 244, 720, 3), found shape=(None, 720, 3)

I wanted to test my model by uploading an image but I got this error. And I think I got the error somewhere in these lines, I'm just not sure how to fix.
IMAGE_SIZE = [244,720]
inception = InceptionV3(input_shape=IMAGE_SIZE + [3], weights='imagenet',include_top=False)
Also here's the code of uploading my test image
picture = image.load_img('/content/DSC_0365.JPG', target_size=(244,720))
img = img_to_array(picture)
prediction = model.predict(img)
print (prediction)
I'm still a newbie in Machine learning so my knowledge right now is not yet that deep.
This happens because the input was not given the dimensions the Inception model expects: the model wants a 4-D batch of images, but a single 3-D image was passed. Here is one possible solution.
Model
from tensorflow.keras.applications import *
IMAGE_SIZE = [244,720]
inception = InceptionV3(input_shape=IMAGE_SIZE + [3],
                        weights='imagenet', include_top=False)
# check its input shape
inception.input_shape
(None, 244, 720, 3)
Inference
Let's test a sample by passing it to the model.
from PIL import Image
a = Image.open('/content/1.png').convert('RGB')
display(a)
Check its basic properties.
a.mode, a.size, a.format
('RGB', (297, 308), None)
Its size is (297, 308); PIL reports (width, height), so as an array its shape is (308, 297, 3). But to be able to pass it to the model, we need an extra axis, the batch axis. To add it, we can do:
import tensorflow as tf
import numpy as np
a = tf.expand_dims(np.array(a), axis=0)
a.shape
TensorShape([1, 308, 297, 3])
Much better. Now, we may want to normalize our data and resize it according to the model input shape. To do that, we can do:
a = tf.divide(a, 255)
a = tf.image.resize(a, [244,720])
a.shape
TensorShape([1, 244, 720, 3])
And lastly, pass it to the model.
inception(a).shape
TensorShape([1, 6, 21, 2048])
# or, keep the prediction for later analysis
y_pred = inception(a)
Updated
If you're using the tf.keras image preprocessing functions, which load the image in PIL format, you can simply do:
image = tf.keras.preprocessing.image.load_img('/content/1.png',
                                              target_size=(244,720))
input_arr = tf.keras.preprocessing.image.img_to_array(image)
input_arr = np.array([input_arr]) # Convert single image to a batch.
inception(input_arr).shape
TensorShape([1, 6, 21, 2048])
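Applied to the snippet in the question, the fix comes down to one extra line before model.predict. The sketch below uses a zero-filled array as a stand-in for the result of img_to_array(picture), which is an assumption made only to keep the example runnable without the image file.

```python
import numpy as np

# Stand-in for img_to_array(picture): one RGB image, height 244, width 720.
img = np.zeros((244, 720, 3), dtype=np.float32)

# Add the batch axis in front; the model expects (batch, height, width, channels).
batch = np.expand_dims(img, axis=0)
print(batch.shape)  # (1, 244, 720, 3)
```

Passing batch instead of img to model.predict gives the model the 4-D input it asks for in the error message.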

Random 3d image slicing tensorflow data, depth of NoneType shape

What I need to do is randomly cut fixed-size blocks of slices out of 3D binary masks.
The data is stored in a TensorFlow dataset (tf.data). It has to stay in this format so that caching can be used for speed.
My source code so far:
import tensorflow as tf #version 2.2.0
mask.shape # (512,512,None,1), where (width, height, depth, channel), depth is NOT FIXED and depends on the image and therefore unknown
slice_number = 10
positive = tf.where(tf.equal(mask[:, :, :-slice_number, :], 1))[:, 2]  # slices with non-zero values
# now we need to select a slice id from the positive mask slices randomly,
# which fails since the shape is always None because the image depth is unknown.
pos_id = random.randint(0, positive.shape[0])
mask = mask[:, :, positive[pos_id]:positive[pos_id] + slice_number]
How do I get the shape? Any ideas are highly appreciated
Thanks in advance!
Assuming that you want to randomly slice a fixed slice_size from a Tensor dimension with unknown depth, the following demonstrates how it can be done:
import tensorflow as tf
@tf.function
def random_slice(slice_size):
    # For demonstration purposes, generate your mask with random depth
    random_depth = tf.random.uniform(shape=[], dtype=tf.int32,
                                     minval=20, maxval=50)
    mask = tf.ones([512, 512, random_depth, 1], dtype=tf.int32)
    print(mask)  # Mask with unknown depth: Tensor("ones:0", shape=(512, 512, None, 1), dtype=int32)
    # tf.shape returns the dynamic shape, which is known when the graph runs
    depth = tf.shape(mask)[2]
    print(depth)  # Unknown depth: Tensor("strided_slice:0", shape=(), dtype=int32)
    depth_begin = tf.random.uniform(shape=[], dtype=tf.int32,
                                    minval=0, maxval=depth-slice_size)
    print(depth_begin)  # Random begin of slice based on unknown depth: Tensor("random_uniform_1:0", shape=(), dtype=int32)
    mask_sliced = tf.slice(mask,
                           begin=[0, 0, depth_begin, 0],
                           size=[512, 512, slice_size, 1])
    print(mask_sliced)  # Random slice with known dimensions: Tensor("Slice:0", shape=(512, 512, 10, 1), dtype=int32)
    return mask_sliced

mask_sliced = random_slice(slice_size=10)
print(mask_sliced)  # Resolved random slice
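One detail worth checking is the upper bound of the random start index. For integer dtypes, tf.random.uniform excludes maxval, so a bound of depth - slice_size never selects the last possible start; depth - slice_size + 1 would include it. The NumPy sketch below shows the inclusive version of the same idea; the helper name random_depth_slice and the small mask size are assumptions chosen for illustration.

```python
import numpy as np

def random_depth_slice(mask, slice_size, rng):
    """Return `slice_size` consecutive slices along axis 2, starting at random."""
    depth = mask.shape[2]
    if slice_size > depth:
        raise ValueError("slice_size must not exceed the depth")
    # Valid starts are 0 .. depth - slice_size inclusive; `integers` excludes
    # its upper bound, hence the + 1.
    begin = rng.integers(0, depth - slice_size + 1)
    return mask[:, :, begin:begin + slice_size, :]

rng = np.random.default_rng(0)
mask = np.ones((8, 8, 23, 1), dtype=np.int32)
sliced = random_depth_slice(mask, 10, rng)
print(sliced.shape)  # (8, 8, 10, 1)
```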

TensorFlow network is receiving wrong tensor shape after using `dataset.map()`

Following the example at https://www.tensorflow.org/guide/datasets#preprocessing_data_with_datasetmap, I want to create a tf.Dataset which takes in paths to images, and maps these to image tensors.
My first attempt was the following, which is very similar to the example in the above link:
def input_parser(image_path):
    image_data_string = tf.read_file(image_path)
    image_decoded = tf.image.decode_png(image_data_string, channels=3)
    image_float = tf.image.convert_image_dtype(image_decoded, dtype=tf.float32)
    return image_float

def train_model():
    image_paths = ['test_image1.png', 'test_image2.png', 'test_image3.png']
    dataset = tf.data.Dataset.from_tensor_slices(image_paths)
    dataset = dataset.map(map_func=input_parser)
    iterator = dataset.make_initializable_iterator()
    input_images = iterator.get_next()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(iterator.initializer)
        for i in range(3):
            x = sess.run(input_images)
            print(x.shape)
This seemed to work ok, and printed out:
(64, 64, 3)
(64, 64, 3)
(64, 64, 3)
Which are indeed the dimensions of my images.
So then I tried to actually feed this data into a network to train, and modified the code accordingly:
def input_parser(image_path):
    image_data_string = tf.read_file(image_path)
    image_decoded = tf.image.decode_png(image_data_string, channels=3)
    image_float = tf.image.convert_image_dtype(image_decoded, dtype=tf.float32)
    return image_float

def train_model():
    image_paths = ['test_image1.png', 'test_image2.png', 'test_image3.png']
    dataset = tf.data.Dataset.from_tensor_slices(image_paths)
    dataset = dataset.map(map_func=input_parser)
    iterator = dataset.make_initializable_iterator()
    input_images = iterator.get_next()
    x = tf.layers.conv2d(inputs=input_images, filters=50, kernel_size=[5, 5], name='layer1')
    x = tf.layers.flatten(x, name='layer2')
    prediction = tf.layers.dense(inputs=x, units=4, name='layer3')
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(iterator.initializer)
        for i in range(3):
            p = sess.run(prediction)
            print(p)
This then gave me the following error message:
ValueError: Input 0 of layer layer1 is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [None, None, 3]
I have two questions about this:
1) Why is my network receiving an input of shape [None, None, 3], when as we have seen, the data read by the iterator is of shape [64, 64, 3].
2) Why isn't the shape of the input actually [1, 64, 64, 3], i.e. with 4 dimensions? I thought that the first dimension would be 1 because this is the batch size (I am not batching the data, so effectively this is a batch size of 1).
Thanks!
The shape is None in the spatial dimensions because in principle you could be loading images of any size. There is no guarantee that they will be 64x64, so TensorFlow uses None shapes to allow for inputs of any size. Since you know that the images will always be the same size, you can use a Tensor's set_shape method to provide this information. Just include a line image_float.set_shape((64, 64, 3)) in your parse function. Note that set_shape updates the tensor's static shape in place and returns None, so do not assign its result. There is even an example using images here.
You are not batching the data, so no batch axis is added at all. The elements of the dataset are simply images of shape (64, 64, 3) and these elements are returned one by one by the iterator. If you want batches of size 1 you should use dataset = dataset.batch(1). Now the elements of the dataset are image "batches" of shape (1, 64, 64, 3). Of course you could also use any other method to add an axis in front, such as tf.expand_dims.

Logits representation in TensorFlow’s sparse_softmax_cross_entropy

I’ve a question regarding to the sparse_softmax_cross_entropy cost function in TensorFlow.
I want to use it in a semantic segmentation context where I use an autoencoder architecture that uses typical convolution operations to downsample images into a feature vector. This vector is then upsampled (using conv2d_transpose and one-by-one convolutions) to create an output image.
Hence, my input consists of single-channel images with shape (1,128,128,1), where the first index represents the batch size and the last one the number of channels. The pixels of the image are currently either 0 or 1, so each pixel is mapped to a class. The output image of the autoencoder follows the same rules. Hence, I can’t use any predefined cost function other than MSE or the previously mentioned one.
The network works fine with MSE, but I can’t get it working with sparse_softmax_cross_entropy. It seems that this is the correct cost function in this context, but I’m a bit confused about the representation of the logits. The official doc says that the logits should have the shape (d_i,...,d_n,num_classes). I tried to ignore the num_classes part, but this causes an error which says that only the interval [0,1) is allowed. Of course, I need to specify the number of classes, which would turn the allowed interval into [0,2), because the exclusive upper bound is obviously num_classes.
Could someone please explain how to turn my output image into the required logits?
The current code for the cost function is:
self._loss_op = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.squeeze(self._target_placeholder, [3]), logits=self._model, name="Loss")))
The squeeze removes the last dimension of the label input to create a shape for the labels of [1 128 128]. This causes the following exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 1 which is outside the valid range of [0, 1).
Edit:
As requested, here's a minimal example to verify the behavior of the cost function in the context of fully-convolutional nets:
Constructor snippet:
def __init__(self, img_channels=1, img_width=128, img_height=128):
    ...
    self._loss_op = None
    self._learning_rate_placeholder = tf.placeholder(tf.float32, [], 'lr')
    self._input_placeholder = tf.placeholder(tf.float32, [None, img_width, img_height, img_channels], 'x')
    self._target_placeholder = tf.placeholder(tf.float32, [None, img_width, img_height, img_channels], 'y')
    self._model = self.build_model()
    self.init_optimizer()
build_model() snippet:
def build_model(self):
    with tf.variable_scope('conv1', reuse=tf.AUTO_REUSE):
        # not necessary
        x = tf.reshape(self._input_placeholder, [-1, self._img_width, self._img_height, self._img_channels])
        conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
    with tf.variable_scope('conv2', reuse=tf.AUTO_REUSE):
        conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
        conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
    with tf.variable_scope('conv3_red', reuse=tf.AUTO_REUSE):
        conv3 = tf.layers.conv2d(conv2, 1024, 30, strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv4_red', reuse=tf.AUTO_REUSE):
        conv4 = tf.layers.conv2d(conv3, 64, 1, strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv5_up', reuse=tf.AUTO_REUSE):
        conv5 = tf.layers.conv2d_transpose(conv4, 32, (128, 128), strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv6_1x1', reuse=tf.AUTO_REUSE):
        conv6 = tf.layers.conv2d(conv5, 1, 1, strides=1, activation=tf.nn.relu)
    return conv6
init_optimizer() snippet:
def init_optimizer(self):
    self._loss_op = tf.reduce_mean((tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.squeeze(self._target_placeholder, [3]), logits=self._model, name="Loss")))
    optimizer = tf.train.AdamOptimizer(learning_rate=self._learning_rate_placeholder)
    self._train_op = optimizer.minimize(self._loss_op)
By definition, a logit is an unnormalized log-probability (strictly speaking, log-odds), or simply put, any real number. A sequence of logits of length num_classes can be interpreted as an unnormalized probability distribution. For example, in your case num_classes=2, so logits=[125.0, -10.0] is an unnormalized distribution for one pixel (which clearly favors 0 over 1). This array can be squashed into a valid distribution by a softmax, and this is what tf.nn.sparse_softmax_cross_entropy_with_logits does internally. For [125.0, -10.0] the squashed distribution will be very close to [1.0, 0.0].
Once again, this array of shape [2] is for a single pixel.
If you want to compute the cross-entropy over the entire image, the network has to output the binary distribution for all pixels and all images in a batch, i.e. output a [batch_size, 128, 128, 2] tensor. In the code above, that means the final 1x1 convolution needs 2 filters rather than 1. The term sparse in the name of the loss refers to the fact that the labels are not one-hot encoded. It's most useful when the number of classes is large, i.e. when one-hot encoding becomes too inefficient in terms of memory, but in your case the difference is insignificant. If you decide to use the tf.nn.sparse_softmax_cross_entropy_with_logits loss, the labels must have shape [batch_size, 128, 128], must be tf.int32 or tf.int64, and must contain valid class indices, zero or one. That's it: TensorFlow can compute the cross-entropy between these two arrays.
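To make the shapes concrete, here is a NumPy sketch of the same computation: per-pixel logits with a trailing num_classes axis, integer labels without it. It is a hand-written illustration, not TensorFlow's implementation, and the random logits and labels are assumptions standing in for the network output and the target masks.

```python
import numpy as np

def sparse_softmax_cross_entropy(labels, logits):
    """Cross-entropy per element; labels: int [...], logits: float [..., num_classes]."""
    # Subtract the per-pixel max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true class for every pixel.
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return -picked[..., 0]

rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 128, 128, 2))        # two logits per pixel
labels = rng.integers(0, 2, size=(1, 128, 128))   # class index 0 or 1 per pixel

loss = sparse_softmax_cross_entropy(labels, logits)
print(loss.shape)  # (1, 128, 128)
mean_loss = loss.mean()  # the scalar that tf.reduce_mean would produce

# The single-pixel example from the text: label 0 is almost free, label 1 is costly.
pixel_logits = np.array([[125.0, -10.0]])
loss_if_0 = sparse_softmax_cross_entropy(np.array([0]), pixel_logits)[0]
loss_if_1 = sparse_softmax_cross_entropy(np.array([1]), pixel_logits)[0]
```

Note that a label of 1 is only valid because the last axis has length 2; with a single output channel the valid range collapses to [0, 1), which is the error reported in the question.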

Minimal RNN example in tensorflow

Trying to implement a minimal toy RNN example in tensorflow.
The goal is to learn a mapping from the input data to the target data, similar to this wonderful concise example in theanets.
Update: We're getting there. The only part remaining is to make it converge (and less convoluted). Could someone help to turn the following into running code or provide a simple example?
import tensorflow as tf
from tensorflow.python.ops import rnn_cell
init_scale = 0.1
num_steps = 7
num_units = 7
input_data = [1, 2, 3, 4, 5, 6, 7]
target = [2, 3, 4, 5, 6, 7, 7]
#target = [1,1,1,1,1,1,1] #converges, but not what we want
batch_size = 1
with tf.Graph().as_default(), tf.Session() as session:
    # Placeholder for the inputs and target of the net
    # inputs = tf.placeholder(tf.int32, [batch_size, num_steps])
    input1 = tf.placeholder(tf.float32, [batch_size, 1])
    inputs = [input1 for _ in range(num_steps)]
    outputs = tf.placeholder(tf.float32, [batch_size, num_steps])
    gru = rnn_cell.GRUCell(num_units)
    initial_state = state = tf.zeros([batch_size, num_units])
    loss = tf.constant(0.0)
    # setup model: unroll
    for time_step in range(num_steps):
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        step_ = inputs[time_step]
        output, state = gru(step_, state)
        loss += tf.reduce_sum(abs(output - target))  # all norms work equally well? NO!
    final_state = state
    optimizer = tf.train.AdamOptimizer(0.1)  # CONVERGEs sooo much better
    train = optimizer.minimize(loss)  # let the optimizer train
    numpy_state = initial_state.eval()
    session.run(tf.initialize_all_variables())
    for epoch in range(10):  # now
        for i in range(7):  # feed fake 2D matrix of 1 byte at a time ;)
            feed_dict = {initial_state: numpy_state, input1: [[input_data[i]]]}  # no
            numpy_state, current_loss,_ = session.run([final_state, loss,train], feed_dict=feed_dict)
            print(current_loss)  # hopefully going down, always stuck at 189, why!?
I think there are a few problems with your code, but the idea is right.
The main issue is that you're using a single tensor for inputs and outputs, as in:
inputs = tf.placeholder(tf.int32, [batch_size, num_steps]).
In TensorFlow the RNN functions take a list of tensors (because num_steps can vary in some models). So you should construct inputs like this:
inputs = [tf.placeholder(tf.int32, [batch_size, 1]) for _ in range(num_steps)]
Then you need to take care of the fact that your inputs are int32s, but an RNN cell works on float vectors - that's what embedding_lookup is for.
And finally you'll need to adapt your feed to put in the input list.
I think the ptb tutorial is a reasonable place to look, but if you want an even more minimal example of an out-of-the-box RNN you can take a look at some of the rnn unit tests, e.g., here.
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/kernel_tests/rnn_test.py#L164
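As a rough sketch of what "adapting the feed" means, the sequence can be split into one [batch_size, 1] array per time step, so that each step has its own value to feed. This uses NumPy only; the variable names are assumptions, and float values are used here for simplicity rather than the int32 plus embedding_lookup route described above.

```python
import numpy as np

# The whole sequence as one array of shape [batch_size, num_steps].
input_data = np.array([[1, 2, 3, 4, 5, 6, 7]], dtype=np.float32)
num_steps = input_data.shape[1]

# One array of shape [batch_size, 1] per time step.
steps = np.split(input_data, num_steps, axis=1)
print(len(steps), steps[0].shape)  # 7 (1, 1)
```

With a list of per-step placeholders, the feed would then pair the placeholder for step t with steps[t].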