The input dimension of the LSTM layer in Keras - tensorflow

I'm trying keras.layers.LSTM.
The following code works.
import tensorflow as tf
import numpy as np
from tensorflow import keras
data = np.array([1, 2, 3]).reshape((1, 3, 1))
x = keras.layers.Input(shape=(3, 1))
y = keras.layers.LSTM(10)(x)
model = keras.Model(inputs=x, outputs=y)
print(model.predict(data))
As shown above, the input data shape is (1, 3, 1), but the shape passed to the Input layer is (3, 1). I'm a little confused about this apparent inconsistency in the dimensions.
If I use the following shape in the Input layer, it doesn't work:
x = keras.layers.Input(shape=(1, 3, 1))
The error message is as follows:
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 1, 3, 1]
It seems that the rank of the input must be 3, but why should we use a rank-2 shape in the Input layer?

Keras works with "batches" of "samples". Since most models use a variable batch size that you only define when fitting, for convenience you don't need to care about the batch dimension, only about the sample dimension.
That said, when you use shape = (3,1), this is the same as defining batch_shape = (None, 3, 1) or batch_input_shape = (None, 3, 1).
All three options mean:
a variable batch size (None),
with samples of shape (3, 1).
It's important to know this distinction, especially when you create custom layers, losses or metrics. The actual tensors all have the batch dimension, and you should take that into account when performing operations on them.
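The distinction can be seen with plain NumPy arrays (NumPy is used here only for illustration; the extra samples are arbitrary). The per-sample shape is what goes into Input(shape=...), while the arrays you pass to predict() or fit() carry an extra leading batch axis.

```python
import numpy as np

# One sample, matching keras.layers.Input(shape=(3, 1))
sample = np.array([1.0, 2.0, 3.0]).reshape((3, 1))

# Add a leading batch axis of size 1: this is what model.predict() receives
batch_of_one = sample[np.newaxis, ...]

# Stacking several samples gives a batch of any size; this leading axis
# is the dimension Keras leaves open as None
batch_of_four = np.stack([sample, 2 * sample, 3 * sample, 4 * sample])

print(sample.shape)         # (3, 1)
print(batch_of_one.shape)   # (1, 3, 1)
print(batch_of_four.shape)  # (4, 3, 1)
```

The same model accepts both batches, because only the trailing (3, 1) part is fixed by the Input layer.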

Check out the documentation for tf.keras.Input. In short:
shape: defines the shape of a single sample, with variable batch size.
Notice that shape does not include the batch dimension. If you need a fixed batch size, pass it explicitly with the batch_size parameter instead of adding it to shape.
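The relationship between shape, batch_size and the tensor's full shape fits in a tiny helper. `full_input_shape` is a hypothetical function written for this illustration (it is not a Keras API); it prepends the batch dimension as described above, and it also shows why shape=(1, 3, 1) led to the "expected ndim=3, found ndim=4" error.

```python
def full_input_shape(shape, batch_size=None):
    """Prepend the batch dimension to a per-sample shape.

    Hypothetical helper modelling how keras.Input treats `shape`: the batch
    dimension is never part of `shape`, and it stays None (variable)
    unless batch_size is given explicitly.
    """
    return (batch_size,) + tuple(shape)


# Input(shape=(3, 1)) -> rank-3 tensor, which is what LSTM expects
print(full_input_shape((3, 1)))                # (None, 3, 1)
# Input(shape=(3, 1), batch_size=1) -> fixed batch size of 1
print(full_input_shape((3, 1), batch_size=1))  # (1, 3, 1)
# Input(shape=(1, 3, 1)) -> rank 4, hence the ndim=4 error
print(len(full_input_shape((1, 3, 1))))        # 4
```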


How can I multiply a tensor with an unknown dimension to a tensorflow variable?

I'm working in Keras (Tensorflow 2). I'd like to multiply each element of a tensor with its own trainable weight. Let's say that my input tensor is 1D, with 10 elements; so I try to define the input as a Keras input tensor, the weights as a tf.Variable, and I try to use the Keras Multiply layer, thus:
import tensorflow as tf
inputs = tf.keras.layers.Input(shape=(10,), name='inputs')
weights = tf.Variable(tf.random.normal([10]), name='weights')
outputs = tf.keras.layers.Multiply()([inputs, weights])
Now when I inspect the dimensions they are:
inputs: shape=(None, 10)
weights: shape=(10,)
outputs: shape=(10, 10)
The input dimension has a None dimension, for the batch size, which is what I expect and want. However I expected outputs to have shape=(None, 10). Instead, the initial dimension for the batch size seems to have taken a fixed size of 10. How should I correct this?
You need weights to broadcast along dimension 0 (the batch dimension), so that dimension of weights must be a constant 1.
That is, weights must have the shape (1, 10), not (10,).
This can be done either by creating the variable with that shape directly:
weights = tf.Variable(tf.random.normal([1, 10]), name='weights')
or by adding the axis after creating it:
weights = tf.Variable(tf.random.normal([10]), name='weights')
weights = tf.expand_dims(weights, axis=0)
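The shape arithmetic behind the fix can be checked with NumPy, used here as a stand-in for the Keras tensors (the batch size of 4 and the random values are arbitrary). A weight array of shape (1, 10) broadcasts along the batch axis, so every sample is multiplied element-wise by the same 10 weights and the batch dimension is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 10))   # stands in for inputs: (None, 10)
w = rng.normal(size=10)            # weights as originally defined: (10,)

w_row = np.expand_dims(w, axis=0)  # same idea as tf.expand_dims(weights, axis=0)
out = batch * w_row                # (4, 10) * (1, 10) -> (4, 10)

# Each row is scaled feature-by-feature by the same weights
expected = np.stack([row * w for row in batch])
print(w_row.shape, out.shape)      # (1, 10) (4, 10)
print(np.allclose(out, expected))  # True
```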

Is it possible to change the input shape of a tensorflow pretrained model?

I have a TensorFlow pre-trained model for image segmentation that receives 6 bands as input. I would like to change the input size of the model to receive 4 bands so I can retrain it with my own dataset, but I have not been able to do it, and I'm not sure if this is even possible.
I tried getting the input node by name and changing it using import_graph_def, with no success; it seems to require the new input to have the same dimensions as the one it replaces:
graph = tf.get_default_graph()
tf_new_input = tf.placeholder(shape=(4, 256, 256), dtype='float32', name='new_input')
tf.import_graph_def(graph_def, input_map={"ImageInputLayer": tf_new_input})
But I am getting the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 4 and 6 for 'import/ImageInputLayer_Sub' (op: 'Sub') with input shapes: [4,256,256], [6,256,256]
You have to convert your 4-channel placeholder input to a 6-channel input, and the input image shape should also be the same as what your 6-channel model expects. You may use any operation, but conv2d is an easy one to apply before feeding the result to your existing model. This is how you do it:
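with tf.Graph().as_default() as old_graph:
    # You have to load your 6 channel input graph here.
    # Assumes `saver` (e.g. from tf.train.import_meta_graph) and a default session exist.
    saver.restore(tf.get_default_session(), <<save_path>>)
    # Assuming that input node is named as 'input_node' and
    # final node is named as 'softmax_node'
with tf.Graph().as_default() as new_graph:
    tf_new_input = tf.placeholder(shape=(None, 256, 256, 4), dtype='float32')
    # Map 4 channeled input to 6 channel and
    # image input shape should be same as expected by old model.
    # The filter must be a rank-4 tensor [height, width, in_channels, out_channels],
    # not a tuple of its dimensions.
    conv_filter = tf.Variable(tf.random_normal([3, 3, 4, 6]))
    new_node = tf.nn.conv2d(tf_new_input, conv_filter, strides=1, padding='SAME')
    # If you want to obtain output node so that you can further perform operations.
    # import_graph_def expects a GraphDef and returns a list of the requested elements.
    softmax_node = tf.import_graph_def(old_graph.as_graph_def(),
                                       input_map={'input_node:0': new_node},
                                       return_elements=['softmax_node:0'])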
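The core idea, learning a mapping from 4 input channels to the 6 the old model expects, can be illustrated in NumPy. For simplicity this sketch swaps in a 1x1 convolution instead of the answer's 3x3 kernel, because a 1x1 convolution is just a per-pixel matrix multiply over the channel axis. The random kernel is a placeholder for weights you would train; the batch of 2 images is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
images = rng.random((2, 256, 256, 4)).astype(np.float32)  # NHWC, 4 bands
kernel = rng.normal(size=(1, 1, 4, 6)).astype(np.float32)  # [h, w, in, out]: 4 -> 6

# A 1x1 convolution with stride 1 and 'SAME' padding reduces to a
# matrix multiply of each pixel's channel vector with kernel[0, 0]
six_band = np.einsum('nhwc,cd->nhwd', images, kernel[0, 0])
print(six_band.shape)  # (2, 256, 256, 6)

# Spot-check one pixel against an explicit dot product
pixel = images[1, 10, 20] @ kernel[0, 0]
print(np.allclose(six_band[1, 10, 20], pixel))  # True
```

A 3x3 kernel, as in the answer, additionally mixes neighbouring pixels, but the channel bookkeeping (4 in, 6 out, spatial size unchanged) is the same.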
user1190882 answered this question very well. I'm just using this section to post the code for future reference. I had to make a small change, creating the filter in a separate variable, because I was getting the error: Shape must be rank 4 but is rank 1 for 'Conv2D'. I also added the data_format flag, since the input format of my model is "channels first".
with tf.Graph().as_default() as new_graph:
    tf_new_input = tf.placeholder(shape=(None, 4, 256, 256), dtype='float32')
    # Creating separate variable for filter
    filterc = tf.Variable(tf.random_normal([3, 3, 4, 6]))
    # data_format='NCHW' because the model expects channels-first input
    new_node = tf.nn.conv2d(tf_new_input, filterc, strides=1, padding='SAME', data_format='NCHW')
    # import_graph_def expects a GraphDef, not a Graph
    tf.import_graph_def(old_graph.as_graph_def(), input_map={'ImageInputLayer': new_node})
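The data_format flag matters because the same batch can be laid out two ways. This NumPy sketch (sizes chosen to match the question, values arbitrary) shows the transpose between channels-first (NCHW) and channels-last (NHWC). If you would rather keep the default NHWC convolution, transposing with the same permutations (for example via tf.transpose) before and after it is one possible alternative; note that many TensorFlow 1.x CPU builds only support NHWC for Conv2D.

```python
import numpy as np

nchw = np.arange(2 * 4 * 256 * 256, dtype=np.float32).reshape((2, 4, 256, 256))

# channels-first (N, C, H, W) -> channels-last (N, H, W, C)
nhwc = np.transpose(nchw, (0, 2, 3, 1))
# and back again
roundtrip = np.transpose(nhwc, (0, 3, 1, 2))

print(nhwc.shape)                                # (2, 256, 256, 4)
print(np.array_equal(nchw, roundtrip))           # True
# The same value is just addressed with a different index order
print(nchw[1, 3, 10, 20] == nhwc[1, 10, 20, 3])  # True
```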

TensorFlow Batch Normalization Dimension

I'm trying to use batch normalization in a conv2d_transpose as follows:
h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME',
h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME',
And I am receiving the following error:
ValueError: Dimension 1 in both shapes must be equal, but are 32 and 64
From merging shape 2 with other shapes. for 'tower0/AddN' (op: 'AddN') with input shapes: [?,32,32,64], [?,64,64,3].
I've seen that other people have had this error in Keras because of the difference in dimension ordering between TensorFlow and Theano. However, I'm using pure TensorFlow, all of my variables are in TensorFlow dimension format (batch_size, height, width, channels), and the data_format of the conv2d_transpose layer should be the default 'channels_last'. What am I missing here?
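tf.layers.batch_normalization should be added as a layer (e.g. h1 = tf.layers.batch_normalization(h1) after the convolution), not as a regularizer. activity_regularizer is a function that takes the activity (the layer's output) and produces an extra loss term that is added to the overall loss of the whole network. For example, you might want to penalize networks that produce high activations. Batch normalization, by contrast, returns a tensor with the same shape as its input, so using it as activity_regularizer adds the full [?,32,32,64] and [?,64,64,3] output tensors to the losses, and the AddN op that sums them fails because their shapes differ. You can see in the TensorFlow Layer source code how activity_regularizer is called on the outputs and how its result is added to the loss.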
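The difference between a layer and an activity regularizer comes down to what each returns. In this NumPy sketch, `batch_norm` is a simplified training-mode batch normalization (per-channel statistics over batch, height and width; no learned scale/offset or moving averages) and `l2_activity` is a hypothetical L2 activity penalty; both names are illustrative, and the batch size of 8 is arbitrary. The layer keeps the input's shape, while the regularizer collapses activity into one scalar loss term.

```python
import numpy as np


def batch_norm(x, eps=1e-3):
    # Normalize each channel over the batch, height and width axes (NHWC).
    # Simplified: no learned gamma/beta and no moving averages.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def l2_activity(x, scale=0.01):
    # What an activity regularizer is supposed to return: one scalar.
    return scale * np.sum(np.square(x))


rng = np.random.default_rng(0)
h1 = rng.normal(size=(8, 32, 32, 64))  # shaped like the first layer's output
h2 = rng.normal(size=(8, 64, 64, 3))   # shaped like the second layer's output

print(batch_norm(h1).shape)      # (8, 32, 32, 64): a layer output
print(np.ndim(l2_activity(h1)))  # 0: a scalar loss term

# Scalar penalties from both layers add up fine ...
total = l2_activity(h1) + l2_activity(h2)
# ... but "losses" that are batch-norm outputs have incompatible shapes
try:
    batch_norm(h1) + batch_norm(h2)
except ValueError as err:
    print("cannot add:", err)
```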

Simple ML Algo not working: ValueError: Error when checking input: expected dense_4_input to have shape (None, 5) but got array with shape (5, 1)

I have an incredibly simple algorithm that fails with "ValueError: Error when checking input: expected dense_4_input to have shape (None, 5) but got array with shape (5, 1)".
Here is the code I am running.
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense
x = np.array([[1],[2],[3],[4],[5]])
y = np.array([[1],[2],[3],[4],[5]])
x_val = np.array([[6],[7]])
model = Sequential()
model.add(Dense(1, input_dim=5))
model.compile(optimizer='rmsprop', loss='mse')
model.fit(x, y, epochs=2, validation_data=(x_val, y_val))
There are two problems:
First: as the error message already says, "ValueError: Error when checking input: expected dense_4_input to have shape (None, 5) but got array with shape (5, 1)". This means that the neural network expects an array of shape (*, 5). The asterisk indicates that this dimension is free to choose. Say you have lots of data and every example is a vector of 5 values: you can stack the examples on top of each other and pass one big chunk of data to the network, and it will know how to handle it. Therefore you have to make x a row vector as follows:
x = np.array([[1,2,3,4,5]])
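If the data already exists as a column, NumPy can reshape it instead of retyping it. A short sketch using the question's x; the second sample in `batch` is made up purely to show stacking along the batch axis.

```python
import numpy as np

x = np.array([[1], [2], [3], [4], [5]])  # shape (5, 1): five samples of 1 feature
x_row = x.reshape(1, -1)                 # shape (1, 5): one sample of 5 features

# Several 5-feature examples stack along the first (batch) axis
batch = np.vstack([x_row, x_row * 10])   # shape (2, 5)

print(x.shape, x_row.shape, batch.shape)  # (5, 1) (1, 5) (2, 5)
```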
See also "Specifying the input shape" in the Keras docs.
Second: you set the output size of the first layer to one. This means the 5-dimensional input will be connected to only one neuron. Your output vector y, however, has 5 values, so your target dimension and the network's output don't fit together.
So you have to go with a scalar y:
y = np.array([1])
Furthermore, your validation data and training data should have the same dimensions. Additionally, there is a typo in your code: y_val is never defined.
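Why the shapes have to line up this way can be seen from what a Dense layer computes, output = input @ kernel + bias. This NumPy sketch uses random placeholder weights with the shapes Dense(1, input_dim=5) would create: a (5, 1) kernel and a bias of length 1.

```python
import numpy as np

rng = np.random.default_rng(0)
kernel = rng.normal(size=(5, 1))  # input_dim=5 -> 1 unit
bias = np.zeros(1)

x = np.array([[1, 2, 3, 4, 5]])   # (1, 5): one sample
y = np.array([1])                 # one target for that sample

pred = x @ kernel + bias          # (1, 5) @ (5, 1) -> (1, 1)
print(pred.shape)                 # (1, 1): one prediction per sample, like y

# The original (5, 1) input cannot be fed to a 5-input layer
x_col = np.array([[1], [2], [3], [4], [5]])
try:
    x_col @ kernel
except ValueError as err:
    print("shape mismatch:", err)
```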

Tensorflow reshape tensor gives None dimension

I have used the model described here on the 0.6.0 branch. The code can be found here. I have made some minor changes to the linked code.
In my code I create two models, one for training and one for validation, very similar to how it is done in the TensorFlow tutorial.
with tf.variable_scope("model", reuse=None, initializer=initializer):
    m = PTBModel_User(is_training=True, config=config, name='Training model')
with tf.variable_scope("model", reuse=True, initializer=initializer):
    mtest = PTBModel_User(is_training=False, config=config_valid, name='Validation model')
The first model, the one for training, seems to be created just fine, but the second, used for validation, does not: its output gets a None dimension! The line I'm referring to is line 134 in the linked code:
output = tf.reshape(tf.concat(1, outputs), [-1, size])
I've added these lines right after the reshape of the output:
output_shape = output.get_shape()
print("Model num_steps:", num_steps)
print("Model batch_size:", batch_size)
print("Output dims", output_shape[0], output_shape[1])
and that gives me this:
Model num_steps: 400
Model batch_size: 1
Output dims Dimension(None) Dimension(650)
This problem only happens with the 'validation model', not with the 'training model'. For the 'training model' I get expected output:
Model num_steps: 400
Model batch_size: 2
Output dims Dimension(800) Dimension(650)
(Note that with the 'validation model' I use a batch_size=1 instead of batch_size=2 that I use for the training model)
From what I understand, using -1 as input to the reshape function, will figure the output shape out automagically! But then why do I get None? Nothing in my config fed to the model has a None value.
Thank you for all the help and tips!
TL;DR: A dimension being None simply means that shape inference could not determine an exact shape for the output tensor, at graph-building time. When you run the graph, the tensor will have the appropriate run-time shape.
If you're not interested in how shape inference works, you can stop reading now.
Shape inference applies local rules, based on a "shape function" that takes the shapes of the inputs to an operation and computes (possibly incomplete) shapes for the outputs of an operation. To figure out why tf.reshape() gives an incomplete shape, we have to look at its inputs, and work backwards:
The shape argument to tf.reshape() includes a -1, which means "figure this dimension out automagically" based on the shape of the tensor input.
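At run time, the -1 is resolved from the actual number of elements. This NumPy sketch mimics the question's line with zero-filled stand-ins for the LSTM outputs, assuming (as in the PTB model) that outputs is a list of num_steps arrays of shape [batch_size, size]. It reproduces the 800 rows seen for the training model and shows that the validation model gets 400 rows at run time, even though its static shape reported None.

```python
import numpy as np


def reshape_outputs(batch_size, num_steps, size):
    # One [batch_size, size] array per time step, like the LSTM outputs list
    outputs = [np.zeros((batch_size, size), dtype=np.float32) for _ in range(num_steps)]
    # tf.concat(1, outputs) in the old API == concatenation along axis 1
    concatenated = np.concatenate(outputs, axis=1)  # [batch_size, num_steps * size]
    return concatenated.reshape(-1, size)           # [batch_size * num_steps, size]


print(reshape_outputs(2, 400, 650).shape)  # (800, 650): training model
print(reshape_outputs(1, 400, 650).shape)  # (400, 650): validation model
```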
The tensor input is the output of tf.concat() on the same line.
The inputs to tf.concat() are computed by a tf.mul() in BasicLSTMCell.__call__(). The tf.mul() op multiplies the result of a tf.tanh() and a tf.sigmoid() op.
The tf.tanh() op produces an output of size [?, hidden_size], and the tf.sigmoid() op produces an output of size [batch_size, hidden_size].
The tf.mul() op performs NumPy-style broadcasting. A dimension will only be broadcast if it has size 1. Consider three cases where we compute tf.mul(x, y):
If x has shape [1, 10], and y has shape [5, 10], then broadcasting will happen, and the output shape will be [5, 10].
If x has shape [1, 10], and y has shape [1, 10], then there will be no broadcasting, and the output shape will be [1, 10].
However, if x has shape [1, 10], and y has shape [?, 10], there is insufficient static information to tell whether broadcasting will happen (even though we happen to know that case 2 applies at runtime).
Therefore, when batch_size is 1, the tf.mul() op produces an output with the shape [?, hidden_size]; but when batch_size is greater than 1, the output shape is [batch_size, hidden_size].
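The three cases can be reproduced with a small pure-Python model of static broadcasting, where None stands for an unknown dimension. `infer_broadcast_shape` is a hypothetical helper written for this illustration (TensorFlow performs the equivalent inside each op's shape function); it follows the rule that a size-1 dimension takes the other operand's size, which may itself be unknown.

```python
def infer_broadcast_shape(x, y):
    """Static NumPy-style broadcast of two shapes; None means unknown."""
    # Align shapes on the right, padding the shorter one with 1s
    n = max(len(x), len(y))
    x = [1] * (n - len(x)) + list(x)
    y = [1] * (n - len(y)) + list(y)
    result = []
    for a, b in zip(x, y):
        if a == 1:
            result.append(b)  # a broadcasts to b, whatever b turns out to be
        elif b == 1:
            result.append(a)
        elif a is None:
            result.append(b)  # a must be 1 or equal to b; either way the result is b
        elif b is None:
            result.append(a)
        elif a == b:
            result.append(a)
        else:
            raise ValueError(f"incompatible dimensions {a} and {b}")
    return result


print(infer_broadcast_shape([1, 10], [5, 10]))     # [5, 10]
print(infer_broadcast_shape([1, 10], [1, 10]))     # [1, 10]
print(infer_broadcast_shape([1, 10], [None, 10]))  # [None, 10]
print(infer_broadcast_shape([2, 10], [None, 10]))  # [2, 10]
```

The last two lines correspond to batch_size = 1 and batch_size = 2: only a known size greater than 1 lets the unknown dimension be resolved statically.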
Where shape inference breaks down, it can be appropriate to use the Tensor.set_shape() method to add information. This would potentially be useful in the BasicLSTMCell implementation, where we know more than it is possible to infer about the shapes of the outputs.
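Tensor.set_shape() refines a partially known static shape by merging it with the shape you supply: unknown dimensions take the supplied value, and known ones must agree. The pure-Python `merge_shapes` below is a hypothetical model of that merge, not TensorFlow code. Applied to the question (a suggestion following from this answer, not something from the original post), calling output.set_shape([batch_size * num_steps, size]) after the reshape would turn [None, 650] into [400, 650] for the validation model.

```python
def merge_shapes(static, supplied):
    """Model of merging a partially known shape with a supplied one."""
    if len(static) != len(supplied):
        raise ValueError("ranks differ")
    merged = []
    for known, given in zip(static, supplied):
        if known is None:
            merged.append(given)  # fill in the missing information
        elif given is None or known == given:
            merged.append(known)
        else:
            raise ValueError(f"shape mismatch: {known} vs {given}")
    return merged


batch_size, num_steps, size = 1, 400, 650
print(merge_shapes([None, 650], [batch_size * num_steps, size]))  # [400, 650]
```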