How to Feed Batched Sequences of Images through Tensorflow conv2d - tensorflow

This seems like a trivial question, but I've been unable to find the answer.
I have batched sequences of images of shape:
[batch_size, number_of_frames, frame_height, frame_width, number_of_channels]
and I would like to pass each frame through a few convolutional and pooling layers. However, TensorFlow's conv2d layer accepts 4D inputs of shape:
[batch_size, frame_height, frame_width, number_of_channels]
My first attempt was to use tf.map_fn over axis=1, but I discovered that this function does not propagate gradients.
My second attempt was to use tf.unstack over the first dimension and then use tf.while_loop. However, my batch_size and number_of_frames are dynamically determined (i.e. both are None), and tf.unstack raises {ValueError} Cannot infer num from shape (?, ?, 30, 30, 3) if num is unspecified. I tried specifying num=tf.shape(self.observations)[1], but this raises {TypeError} Expected int for argument 'num' not <tf.Tensor 'A2C/infer/strided_slice:0' shape=() dtype=int32>.

Since all the images (num_of_frames) are passed to the same convolutional model, you can merge the batch and frame axes and apply a normal convolution. This can be achieved with tf.reshape, as shown below:
import numpy as np
import tensorflow as tf  # TF 1.x API

# input with size [batch_size, number_of_frames, frame_height, frame_width, number_of_channels]
x = tf.placeholder(tf.float32, [None, None, 32, 32, 3])
# merge the batch and frame axes for the conv input
x_reshaped = tf.reshape(x, [-1, 32, 32, 3])
# for a feed of shape (10, 5, 32, 32, 3), x_reshaped has shape (50, 32, 32, 3)
# define your conv network
y = tf.layers.conv2d(x_reshaped, 5, kernel_size=(3, 3), padding='SAME')
# y has shape (50, 32, 32, 5): 5 filters, and 'SAME' padding keeps height and width
# get back the batch and frame axes (reshape the conv output y, not the input x)
out = tf.reshape(y, [-1, tf.shape(x)[1], 32, 32, 5])
# the output keeps the input's batch, frame, and spatial sizes, with 5 channels
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(out, {x: np.random.normal(size=(10, 5, 32, 32, 3))}).shape)
# (10, 5, 32, 32, 5)
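The reshape round trip can also be checked without TensorFlow. Below is a minimal NumPy sketch, not part of the original answer: per_frame_mean is a hypothetical stand-in for the convolutional stack, and the sizes are arbitrary. It suggests that merging the batch and frame axes, applying a per-frame operation, and splitting the axes again gives the same result as processing each sequence separately.

```python
import numpy as np

def per_frame_mean(frames):
    # Hypothetical stand-in for a conv stack: maps [N, H, W, C] -> [N, C]
    return frames.mean(axis=(1, 2))

batch_size, num_frames = 10, 5
x = np.random.normal(size=(batch_size, num_frames, 32, 32, 3))

# Merge batch and frame axes, apply the per-frame op, then split them again.
merged = x.reshape(-1, 32, 32, 3)                    # (50, 32, 32, 3)
features = per_frame_mean(merged)                    # (50, 3)
out = features.reshape(batch_size, num_frames, -1)   # (10, 5, 3)

# Reference: apply the op to each sequence separately.
looped = np.stack([per_frame_mean(x[b]) for b in range(batch_size)])
print(out.shape, np.allclose(out, looped))  # (10, 5, 3) True
```

Because reshape uses row-major order, row b * num_frames + f of the merged tensor is frame f of sequence b, so the frames come back in their original order.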

Related

Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 250, 3), found shape=(None, 3) in trained transformer model

I have a Keras transformer model trained with TensorFlow 2.7.0 and Python 3.7 with input shape (None, 250, 3), and a 2D array input with shape (250, 3) (not an image).
When making a prediction with:
prediction = model.predict(state)
I get ValueError: Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 250, 3), found shape=(None, 3)
project code: https://github.com/MikeSifanele/TT
This is what state looks like:
state = np.array([[-0.07714844,-0.06640625,-0.140625],[-0.140625,-0.1650391,-0.2265625]...[0.6376953,0.6005859,0.6083984],[0.7714844,0.7441406,0.7578125]], np.float32)
Some explanation:
In the model's input shape (None, 250, 3), the first axis (None) is the "sample" axis, and the remaining axes (250, 3) are the dimensions of a single input. When you pass an array of shape (250, 3), the first axis is treated as the "sample" axis, so the model sees 250 samples whose input dimension is just 3. To make the shapes consistent, add a dimension at the beginning:
state = np.expand_dims(state, axis=0)
The shape of state then becomes (1, 250, 3), which is compatible with (None, 250, 3).
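A minimal sketch of the shape change, using a zero-filled array as a placeholder for the real state (the values are an assumption; only the shape matters here):

```python
import numpy as np

state = np.zeros((250, 3), dtype=np.float32)   # one sample, no batch axis
batched = np.expand_dims(state, axis=0)        # add the sample axis in front

print(state.shape)    # (250, 3)
print(batched.shape)  # (1, 250, 3)

# Equivalent ways to add the leading axis
assert state[np.newaxis, ...].shape == batched.shape
assert state.reshape(1, 250, 3).shape == batched.shape
```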

How does transfer learning on EfficientNets work for grayscale images?

My question is more about how the algorithm works. I have successfully integrated EfficientNet and built a model for grayscale images, and now I want to understand why it works.
The most important aspect here is that grayscale images have 1 channel. When I set channels=1, the algorithm doesn't work because, if I understood correctly, it was built for 3-channel images. When I set channels=3, it works perfectly.
So my question is: when I set channels=3 and feed the model preprocessed images with 1 channel, why does it continue to work?
Code for EfficientNetB5
import tensorflow as tf
from tensorflow.keras.layers import Flatten, Input
from tensorflow.keras.models import Sequential

# Variable assignments
num_classes = 9
img_height = 84
img_width = 112
channels = 3
batch_size = 32
# Make the input layer
new_input = Input(shape=(img_height, img_width, channels),
                  name='image_input')
# Download and use EfficientNetB5
tmp = tf.keras.applications.EfficientNetB5(include_top=False,
                                           weights='imagenet',
                                           input_tensor=new_input,
                                           pooling='max')
model = Sequential()
model.add(tmp)  # adding EfficientNetB5
model.add(Flatten())
...
Code of preprocessing into grayscale
from tensorflow.keras.preprocessing.image import ImageDataGenerator

data_generator = ImageDataGenerator(
    validation_split=0.2)
train_generator = data_generator.flow_from_directory(
    train_path,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    color_mode="grayscale",  # images are loaded with a single channel
    class_mode="categorical",
    subset="training")
I dug into what happens when you give grayscale images to efficient net models with three-channel inputs.
Here are the first layers of Efficient Net B5 whose input_shape is (128,128,3)
Layer (type)                      Output Shape            Param #  Connected to
==================================================================================================
input_7 (InputLayer)              [(None, 128, 128, 3)]   0        []
rescaling_7 (Rescaling)           (None, 128, 128, 3)     0        ['input_7[0][0]']
normalization_13 (Normalization)  (None, 128, 128, 3)     7        ['rescaling_7[0][0]']
tf.math.truediv_4 (TFOpLambda)    (None, 128, 128, 3)     0        ['normalization_13[0][0]']
stem_conv_pad (ZeroPadding2D)     (None, 129, 129, 3)     0        ['tf.math.truediv_4[0][0]']
And here is the shape of the output of each of these layers when the model has as input a grayscale image:
input_7 (128, 128, 1)
rescaling_7 (128, 128, 1)
normalization_13 (128, 128, 3)
tf.math.truediv_4 (128, 128, 3)
stem_conv_pad (129, 129, 3)
As you can see, the number of channels of the output tensor switches from 1 to 3 when proceeding to the normalization_13 layer, so let's see what this layer is actually doing.
The Normalization layer is performing this operation on the input tensor:
(input_tensor - self.mean) / sqrt(self.var) // see https://www.tensorflow.org/api_docs/python/tf/keras/layers/Normalization
The number of channels changes at the subtraction, because self.mean looks like this:
<tf.Tensor: shape=(1, 1, 1, 3), dtype=float32, numpy=array([[[[0.485, 0.456, 0.406]]]], dtype=float32)>
So self.mean has three channels and when performing the subtraction between a tensor with one channel and a tensor with three channels, the output looks like this: [firstTensor - secondTensorFirstChannel, firstTensor - secondTensorSecondChannel, firstTensor - secondTensorThirdChannel]
And this is how the magic happens and this is why the model can take as input grayscale images!
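The broadcasting step can be reproduced in NumPy, which follows the same broadcasting rules. This is only a sketch: the mean values are the ones printed above, while the variance of ones and the constant gray image are placeholders chosen for illustration, not values taken from the model.

```python
import numpy as np

# Stand-in for a batch of one grayscale image: (batch, height, width, 1)
gray = np.full((1, 128, 128, 1), 0.5, dtype=np.float32)

# Per-channel mean shaped like self.mean above: (1, 1, 1, 3)
mean = np.array([[[[0.485, 0.456, 0.406]]]], dtype=np.float32)
# Placeholder variance (assumption, not the model's actual values)
var = np.ones((1, 1, 1, 3), dtype=np.float32)

# The size-1 channel axis of gray is broadcast against the 3 channels of mean.
normalized = (gray - mean) / np.sqrt(var)
print(normalized.shape)  # (1, 128, 128, 3)

# Each output channel k is the single gray channel minus mean[k].
for k in range(3):
    assert np.allclose(normalized[..., k], gray[..., 0] - mean[0, 0, 0, k])
```

In effect, the single gray channel is copied into three channels, each shifted by a different mean.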
I have checked this with efficient net B5 and with efficient net B2V2. Even if they have differences in the way the Normalization layer is declared, the process is the same. I suppose that is also the case for the other efficient net models.
I hope it was clear enough!
This is interesting. If training still works with channels = 3 even though the input is grayscale, I would check the batch shape of the train_generator (maybe print a couple of batches to get a feel for it). Here is a code snippet to quickly check the batch shape (plotImages() is available in the TensorFlow docs):
imgs,labels = next(train_generator)
print('Batch shape: ',imgs.shape)
plotImages(imgs,labels)

TensorFlow network is receiving wrong tensor shape after using `dataset.map()`

Following the example at https://www.tensorflow.org/guide/datasets#preprocessing_data_with_datasetmap, I want to create a tf.Dataset which takes in paths to images, and maps these to image tensors.
My first attempt was the following, which is very similar to the example in the above link:
import tensorflow as tf  # TF 1.x API

def input_parser(image_path):
    image_data_string = tf.read_file(image_path)
    image_decoded = tf.image.decode_png(image_data_string, channels=3)
    image_float = tf.image.convert_image_dtype(image_decoded, dtype=tf.float32)
    return image_float

def train_model():
    image_paths = ['test_image1.png', 'test_image2.png', 'test_image3.png']
    dataset = tf.data.Dataset.from_tensor_slices(image_paths)
    dataset = dataset.map(map_func=input_parser)
    iterator = dataset.make_initializable_iterator()
    input_images = iterator.get_next()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(iterator.initializer)
        for i in range(3):
            x = sess.run(input_images)
            print(x.shape)
This seemed to work ok, and printed out:
(64, 64, 3)
(64, 64, 3)
(64, 64, 3)
Which are indeed the dimensions of my images.
So then I tried to actually feed this data into a network to train, and modified the code accordingly:
import tensorflow as tf  # TF 1.x API

def input_parser(image_path):
    image_data_string = tf.read_file(image_path)
    image_decoded = tf.image.decode_png(image_data_string, channels=3)
    image_float = tf.image.convert_image_dtype(image_decoded, dtype=tf.float32)
    return image_float

def train_model():
    image_paths = ['test_image1.png', 'test_image2.png', 'test_image3.png']
    dataset = tf.data.Dataset.from_tensor_slices(image_paths)
    dataset = dataset.map(map_func=input_parser)
    iterator = dataset.make_initializable_iterator()
    input_images = iterator.get_next()
    x = tf.layers.conv2d(inputs=input_images, filters=50, kernel_size=[5, 5], name='layer1')
    x = tf.layers.flatten(x, name='layer2')
    prediction = tf.layers.dense(inputs=x, units=4, name='layer3')
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(iterator.initializer)
        for i in range(3):
            p = sess.run(prediction)
            print(p)
This then gave me the following error message:
ValueError: Input 0 of layer layer1 is incompatible with the layer: expected ndim=4, found ndim=3. Full shape received: [None, None, 3]
I have two questions about this:
1) Why is my network receiving an input of shape [None, None, 3], when as we have seen, the data read by the iterator is of shape [64, 64, 3].
2) Why isn't the shape of the input actually [1, 64, 64, 3], i.e. with 4 dimensions? I thought that the first dimension would be 1 because this is the batch size (I am not batching the data, so effectively this is a batch size of 1).
Thanks!
The spatial dimensions are None because, in principle, the decoded images could be any size. Nothing guarantees that they will be 64x64, so TensorFlow leaves those dimensions unknown in the static shape. Since you know that the images will always be the same size, you can use the tensor's set_shape method to supply this information. Just include a line image_float.set_shape((64, 64, 3)) in your parse function. Note that set_shape updates the tensor's static shape in place rather than returning a new tensor. There is even an example using images here.
You are not batching the data, so no batch axis is added at all. The elements of the dataset are simply images of shape (64, 64, 3) and these elements are returned one by one by the iterator. If you want batches of size 1 you should use dataset = dataset.batch(1). Now the elements of the dataset are image "batches" of shape (1, 64, 64, 3). Of course you could also use any other method to add an axis in front, such as tf.expand_dims.
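Both points can be seen by inspecting the static shapes of the dataset elements. The sketch below assumes TensorFlow 2.x (tf.io.decode_png, Dataset.element_spec) rather than the TF 1.x API used in the question, and it encodes a blank 64x64 PNG in memory instead of reading the question's files; the function names parse_unknown and parse_fixed are hypothetical.

```python
import tensorflow as tf  # assumes TensorFlow 2.x

# A blank 64x64 RGB image encoded as PNG bytes, standing in for the image files.
png = tf.io.encode_png(tf.zeros([64, 64, 3], dtype=tf.uint8))
dataset = tf.data.Dataset.from_tensors(png).repeat(3)

def parse_unknown(png_bytes):
    # decode_png cannot know the image size when the graph is built
    image = tf.io.decode_png(png_bytes, channels=3)
    return tf.image.convert_image_dtype(image, tf.float32)

def parse_fixed(png_bytes):
    image = parse_unknown(png_bytes)
    image.set_shape((64, 64, 3))  # updates the static shape in place
    return image

unknown = dataset.map(parse_unknown)
fixed = dataset.map(parse_fixed)
batched = fixed.batch(1)  # adds the batch axis in front

print(unknown.element_spec.shape)  # (None, None, 3)
print(fixed.element_spec.shape)    # (64, 64, 3)
print(batched.element_spec.shape)  # (None, 64, 64, 3)
```

The batch axis is still None in the static shape because the last batch of a dataset may be smaller; the tensors actually produced here have shape (1, 64, 64, 3).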

TFLearn LSTM Time Series Classification

I am trying to build an LSTM network which takes a sequence and classifies the last time step in each sequence.
This is what I have so far:
import tflearn

# build
net = tflearn.input_data(shape=[None, 64, 17])
net = tflearn.lstm(net, 128, dropout=[.2, .8], return_seq=True)
net = tflearn.lstm(net, 128, dropout=[.2, .8], return_seq=True)
net = tflearn.lstm(net, 128, dropout=[.2, .8])
net = tflearn.fully_connected(net, 3, activation='softmax')
net = tflearn.regression(net, optimizer='adam', learning_rate=0.01, loss='categorical_crossentropy')
# train
model = tflearn.DNN(net, tensorboard_verbose=0)
model.fit(trainX, trainY, validation_set=(testX, testY), show_metric=True, batch_size=None)
My data has been shaped into a large number of sequences, each 64 timesteps long. Each timestep has 17 features. The first sequence is timesteps 0 to 63, the second is timesteps 1 to 64, etc.
The network builds just fine, but in the fit method I get this error:
ValueError: Cannot feed value of shape (64, 17) for Tensor 'InputData/X:0', which has shape '(?, 64, 17)'
Does anyone have a suggestion as to my problem?
It's not in your snippet, but it looks like trainX has the shape (64, 17). If so, you should reshape it to a batch of size 1:
trainX = np.expand_dims(trainX, 0) # now it's [1, 64, 17]
The same for testX.
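For comparison, here is a sketch of how the overlapping windows described in the question could be stacked into a 3D array that already has a batch axis. The series length of 200 and the random data are assumptions made for illustration.

```python
import numpy as np

num_timesteps, window, num_features = 200, 64, 17  # 200 is an arbitrary length
series = np.random.rand(num_timesteps, num_features).astype(np.float32)

# Window i covers timesteps i .. i + 63, as described in the question.
trainX = np.stack([series[i:i + window]
                   for i in range(num_timesteps - window + 1)])
print(trainX.shape)  # (137, 64, 17)

# A single window has no batch axis and needs one before it can be fed.
single = trainX[0]                          # (64, 17)
single_batch = np.expand_dims(single, 0)    # (1, 64, 17)
print(single.shape, single_batch.shape)
```

If trainX is built this way, it matches the (?, 64, 17) input directly; the expand_dims step is only needed when a lone (64, 17) window is passed.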

Using Estimator for building an LSTM network

I am trying to build an LSTM network using an Estimator. My data looks like
X = [[1,2,3], [2,3,4], ... , [98,99,100]]
y = [2, 3, ... , 99]
I am using an Estimator:
regressor = learn.Estimator(model_fn=lstm_model,
                            params=model_params,
                            )
where the lstm_model function is
def lstm_model(features, targets, mode, params):
    def lstm_cells(layers):
        if isinstance(layers[0], dict):
            return [tf.nn.rnn_cell.BasicLSTMCell(layer['steps'], state_is_tuple=True) for layer in layers]
        return [tf.nn.rnn_cell.BasicLSTMCell(steps, state_is_tuple=True) for steps in layers]
    stacked_lstm = tf.nn.rnn_cell.MultiRNNCell(lstm_cells(params['rnn_layers']), state_is_tuple=True)
    output, layers = tf.nn.rnn(stacked_lstm, [features], dtype=tf.float32)
    return learn.models.linear_regression(output, targets)
and params are
model_params = {
    'steps': 1000,
    'learning_rate': 0.03,
    'batch_size': 24,
    'time_steps': 3,
    'rnn_layers': [{'steps': 3}],
    'dense_layers': [10, 10]
}
and then I do the fitting
regressor.fit(X, y)
The issue I am facing is
output, layers = tf.nn.rnn(stacked_lstm, [features], dtype=tf.float32)
requires a sequence, but I am not sure how to split my features into a list of tensors. The shape of features inside the lstm_model function is (?, 3).
I have two questions: how do I do the training in batches, and how do I split 'features' so that
output, layers = tf.nn.rnn(stacked_lstm, [features], dtype=tf.float32)
doesn't throw an error? The error I am getting is
raise TypeError("%s that don't all match." % prefix)
TypeError: Tensors in list passed to 'values' of 'Concat' Op have types [float64, float32] that don't all match.
I am using tensorflow 0.12
I had to set the shape of features to (batch_size, time_step, 1) or (None, time_step, 1) and then unstack the features before passing them to the RNN. Unstacking along the time_step axis gives a list with one tensor per time step, and each tensor should have shape (None, 1) or (batch_size, 1).
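The shapes involved can be sketched in NumPy with the data from the question. This is an illustration of the reshaping only, not the author's code; in the graph itself, tf.unstack(features, axis=1) should be the counterpart of the list comprehension. Building X as float32 is an assumption aimed at the float64/float32 mismatch in the reported TypeError.

```python
import numpy as np

time_steps = 3
# X = [[1, 2, 3], [2, 3, 4], ..., [98, 99, 100]] as in the question.
# float32 matches the dtype=tf.float32 passed to the RNN (assumed fix for the dtype error).
X = np.array([[i, i + 1, i + 2] for i in range(1, 99)], dtype=np.float32)
print(X.shape)  # (98, 3)

# Add a trailing feature axis: (batch, time_step, 1)
features = X.reshape(-1, time_steps, 1)

# "Unstack" along the time axis: one (batch, 1) array per time step
per_step = [features[:, t, :] for t in range(time_steps)]
print(len(per_step), per_step[0].shape)  # 3 (98, 1)
```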