TensorFlow cannot initialize tf.Variable for dynamic batch size

I tried creating a tf.Variable with a dynamic shape. The following outlines the problem.
Doing this works.
init_bias = tf.random_uniform(shape=[self.config.hidden_layer_size, tf.shape(self.question_inputs)[0]])
However, when I try to do this:
init_bias = tf.Variable(init_bias)
It throws the error ValueError: initial_value must have a shape specified: Tensor("random_uniform:0", shape=(?, ?), dtype=float32)
Some context (question_inputs is a placeholder with a dynamic batch size):
self.question_inputs = tf.placeholder(tf.int32, shape=[None, self.config.qmax])
Passing a dynamic value into tf.random_uniform seems to produce shape=(?, ?), which tf.Variable then rejects.
Thanks, I appreciate any help!

This should work:
init_bias = tf.Variable(init_bias, validate_shape=False)
If validate_shape is False, TensorFlow allows the variable to be initialized with a value of unknown shape.
However, what you're doing seems a little strange to me. In TensorFlow, Variables are generally used to store the weights of a neural net, whose shape remains fixed irrespective of the batch size. A variable batch size is handled by passing a variable-length tensor into the graph (and multiplying/adding it with a fixed-shape bias Variable).
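A minimal sketch of that pattern, written against the TF1-style graph API through tf.compat.v1 (assuming TensorFlow 1.14+ or 2.x). The bias Variable has the fixed shape [hidden_layer_size]; only the placeholder carries the unknown batch dimension, and broadcasting adds the same bias to every row. hidden_layer_size and run_with_batches are hypothetical names chosen for illustration.

```python
import numpy as np
import tensorflow as tf

# Use TF1-style graphs and sessions, as in the question.
tf.compat.v1.disable_eager_execution()

hidden_layer_size = 4  # hypothetical stand-in for self.config.hidden_layer_size

# Only the input has an unknown (batch) dimension.
inputs = tf.compat.v1.placeholder(tf.float32, shape=[None, hidden_layer_size])

# The bias shape does not depend on the batch size, so it is fully static
# and tf.Variable accepts it without validate_shape=False.
bias = tf.Variable(tf.random.uniform([hidden_layer_size]))

# Broadcasting adds the same bias vector to every row of the batch.
outputs = inputs + bias


def run_with_batches(batch_sizes):
    """Hypothetical helper: feed batches of different sizes, return output shapes."""
    shapes = []
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        for batch in batch_sizes:
            feed = np.zeros((batch, hidden_layer_size), dtype=np.float32)
            shapes.append(sess.run(outputs, feed_dict={inputs: feed}).shape)
    return shapes


print(run_with_batches([1, 5]))  # the same graph serves both batch sizes
```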

Related

How to batch CsvDataset correctly in Tensorflow 2.0?

I'm using tf.data.experimental.make_csv_dataset to create a dataset from a .csv file. I'm also using tf.keras.layers.DenseFeatures as an input layer of my model.
I'm struggling to create a DenseFeatures layer that is compatible with my dataset when the batch_size parameter of make_csv_dataset is not equal to 1 (with batch_size=1 my setup works as expected).
I create the DenseFeatures layer from a list of tf.feature_column.numeric_column elements with shape=(my_batch_size,), but for some reason the input layer then expects a [my_batch_size, my_batch_size] shape instead of [my_batch_size, 1].
With my_batch_size=19 I'm getting the following error when trying to fit the model:
ValueError: Cannot reshape a tensor with 19 elements to shape [19,19] (361 elements) for 'MyModel/Input/MyColumn1/Reshape' (op: 'Reshape') with input shapes: [19,1], [2] and with input tensors computed as partial shapes: input[1] = [19,19].
If I don't specify shape when creating numeric_column it doesn't work either. I'm getting the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: The second input must be a scalar, but it has shape [19]
which suggests that numeric_column expects a scalar but receives the whole batch in one Tensor.
How do I create an input layer of DenseFeatures so that it accepts the dataset produced by make_csv_dataset(batch_size=my_batch_size)?
From the tf.feature_column.numeric_column documentation:
shape: An iterable of integers specifies the shape of the Tensor. An integer can be given which means a single dimension Tensor with given width. The Tensor representing the column will have the shape of [batch_size] + shape.
This means you must not include the batch size in the shape argument; for one scalar value per example, use shape=().
Currently, with a batch size of 1, you pass shape=(1,), so the expected shape is [1] + [1] = [1, 1]. That still holds exactly one element, and TF handles a size-1 dimension easily (it can be added or dropped as needed), which is why that case works.
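The documented rule can be checked with plain arithmetic. The sketch below illustrates that rule only (it is not TensorFlow code), assuming, as the error message indicates, that each column arrives as one value per example; column_tensor_shape and fits_batch are hypothetical helpers. With shape=(19,), a batch of 19 values would have to be reshaped to [19, 19], i.e. 361 elements, which is exactly the mismatch in the error, while shape=() keeps it at [19].

```python
from math import prod


def column_tensor_shape(batch_size, column_shape):
    """Shape of a numeric column per the docs: [batch_size] + shape."""
    return [batch_size] + list(column_shape)


def fits_batch(batch_size, column_shape):
    """True if batch_size values (one per example) fill the expected shape exactly."""
    target = column_tensor_shape(batch_size, column_shape)
    return prod(target) == batch_size


for batch_size, column_shape in [(19, (19,)), (19, ()), (1, (1,))]:
    target = column_tensor_shape(batch_size, column_shape)
    print(batch_size, column_shape, target, prod(target), fits_batch(batch_size, column_shape))
```

The last case, batch size 1 with shape=(1,), is the configuration from the question that happens to work.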
Hope this can help. Provide more code if you want more help.

Custom loss function works even though dimensions mismatch

I'm using Keras/TF with the following model:
import keras
from keras.layers import Conv2D
from keras.models import Model
from keras.optimizers import Adam
conv = Conv2D(4, 3, activation=None, use_bias=True)(inputs)
conv = Conv2D(2, 1, activation=None, use_bias=True)(conv)
model = Model(inputs=inputs, outputs=conv)
model.compile(optimizer=Adam(lr=1e-4), loss=keras.losses.mean_absolute_error)
In model.fit, I get an error saying:
ValueError: Error when checking target: expected conv2d_2 to have shape (300, 320, 2) but got array with shape (300, 320, 1)
This is as expected because the targets are single channel images whereas the last layer in the model has 2 channels.
What I don't understand is why when I use a custom loss function:
def my_loss2(y_true, y_pred):
    return keras.losses.mean_absolute_error(y_true, y_pred)
and compile the model:
model.compile(optimizer = Adam(lr=1e-4), loss=my_loss2)
it does work (or at least it does not raise the error). Is there some kind of automatic conversion or truncation going on?
I'm using TF (CPU) 1.12.0, and Keras 2.2.2
Sincerely,
Elad
Why is the behavior different for built-in and custom losses?
It turns out that Keras is performing an upfront shape check for built-in functions that are defined in the losses module.
In the source code of Model._standardize_user_data, which is called by fit, I found this comment:
# If `loss_fn` is not a function (e.g. callable class)
# or if it not in the `losses` module, then
# it is a user-defined loss and we make no assumptions
# about it.
In the code around that comment you can see that, depending on the type of loss function (built-in or custom), the output shape is either passed to an inner call of standardize_input_data or not. If the output shape is passed, standardize_input_data raises the error message you are getting.
And I think this behavior makes some sense: Without knowing the implementation of a loss function, you cannot know its shape requirements. Someone may invent some loss function that needs different shapes. On the other hand, the docs clearly say that the loss function's parameters must have the same shape:
y_true: True labels. TensorFlow/Theano tensor.
y_pred: Predictions. TensorFlow/Theano tensor of the same shape as y_true.
So I find this a little inconsistent...
Why does your custom loss function work with incompatible shapes?
If you provide a custom loss, it may still work even if the shapes do not match exactly. In your case only the last dimension differs, and the targets have size 1 there, so I'm quite sure broadcasting is what is happening: the single target channel is effectively duplicated to match the 2 predicted channels.
In many cases broadcasting is quite useful. Here, however, it is likely not, since it hides a logical error.
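To see the broadcasting concretely, the sketch below mirrors the Keras formula mean(abs(y_pred - y_true), axis=-1) in NumPy, which follows the same broadcasting rules as TensorFlow. It uses tiny hypothetical shapes (2, 3, 1) and (2, 3, 2) in place of (300, 320, 1) and (300, 320, 2); strict_mae is a hypothetical wrapper showing one way to make a custom loss fail loudly instead of broadcasting.

```python
import numpy as np


def mae(y_true, y_pred):
    """NumPy mirror of Keras' mean_absolute_error: mean over the last axis."""
    return np.mean(np.abs(y_pred - y_true), axis=-1)


def strict_mae(y_true, y_pred):
    """Hypothetical variant that refuses to broadcast mismatched shapes."""
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    return mae(y_true, y_pred)


y_true = np.ones((2, 3, 1))   # single-channel target
y_pred = np.zeros((2, 3, 2))  # two-channel prediction
y_pred[..., 1] = 3.0

# The size-1 target channel is broadcast against both predicted channels.
loss = mae(y_true, y_pred)
print(loss.shape)  # (2, 3)
print(loss[0, 0])  # (|0 - 1| + |3 - 1|) / 2 = 1.5
```

Calling strict_mae(y_true, y_pred) with these arrays raises ValueError, surfacing the mismatch that the broadcasting version silently accepts.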

TensorFlow: Initial value without shape

I tried to implement the following code.
import tensorflow as tf
a = tf.placeholder(tf.int32)
b = tf.placeholder(tf.int32)
def initw(a, b):
    # Raises the ValueError below: the shape [a, b] is unknown when the graph is built
    return tf.Variable(tf.sign(tf.random_uniform(shape=[a, b], minval=-1.0, maxval=1.0)))
bla = initw(a, b)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([bla], feed_dict={a: 2, b: 2}))
But I keep getting an error which states:
ValueError: initial_value must have a shape specified: Tensor("Sign:0",shape=(?, ?), dtype=float32)
Can someone tell me what I am doing wrong here? I really don't see what causes the error.
EDIT:
I want to use initw(a,b) to initialize the weights of a network. I want to be able to do something like:
weights = {
"h1": tf.get_variable("h1", initializer=initw(a,b).initialized_value())
}
Where a and b are the height and width of a matrix.
In my eyes the error message is actually quite precise, but I understand your confusion. You probably do not really understand how TensorFlow works under the hood. You might want to start reading here.
The shape of a variable must be known when the graph is built (unless you pass validate_shape=False). Placeholders may leave dimensions unspecified, most commonly the batch dimension, whose size is then only determined at runtime.
In your case you are trying to use placeholders to specify the dimensions of a variable, which fails because the variable's shape cannot be inferred when the graph is built.
I don't know what you are trying to do with this but I would guess there is a way to achieve what you need. You can actually use the length of the batch dimension dynamically to draw a uniform vector of that size.
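As a sketch of that last point (assuming TensorFlow 1.14+ or 2.x with the tf.compat.v1 graph API): the batch length can be read at run time with tf.shape and passed straight to tf.random.uniform. The result is an ordinary tensor recomputed on every run, not a Variable, so no static shape is required. The placeholder x and its width of 3 are hypothetical.

```python
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, shape=[None, 3])  # hypothetical input

# tf.shape(x)[:1] is a 1-D int32 tensor holding the runtime batch length.
noise = tf.random.uniform(tf.shape(x)[:1], minval=-1.0, maxval=1.0)

with tf.compat.v1.Session() as sess:
    sample = sess.run(noise, feed_dict={x: np.zeros((4, 3), dtype=np.float32)})

print(sample.shape)  # (4,)
```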
Edit: After you updated the question, I feel I was right about my suspicion. There is no need for a and b to be placeholders; just make them plain Python integers, like this:
import tensorflow as tf
# The matrix shape must be known in advance, but it can of course still be
# specified in a settings file or at the beginning of the Python script
A = 2
B = 2
W = tf.Variable(tf.sign(tf.random_uniform(shape=(A, B), minval=-1.0,
                                          maxval=1.0)))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(W))
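For the get_variable use case from the question's EDIT, a minimal sketch (assuming the tf.compat.v1 API of TensorFlow 1.14+ or 2.x): get_variable accepts a Tensor as initializer, in which case shape is omitted and taken from the tensor, so that tensor's shape must be fully known. Using Python integers for the dimensions satisfies this. sign_uniform_weights and the sizes 2 and 3 are hypothetical.

```python
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()


def sign_uniform_weights(name, rows, cols):
    """Hypothetical helper: a rows x cols variable of signs of uniform draws."""
    # rows and cols are Python ints, so the initial value has a static shape.
    init = tf.sign(tf.random.uniform((rows, cols), minval=-1.0, maxval=1.0))
    # With a Tensor initializer, get_variable infers the shape from it.
    return tf.compat.v1.get_variable(name, initializer=init)


weights = {
    "h1": sign_uniform_weights("h1", 2, 3),
}

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    h1 = sess.run(weights["h1"])

print(h1.shape)  # (2, 3)
```

The entries are -1 or 1 (0 only if a draw is exactly 0.0).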

How do you create a dynamic_rnn with dynamic "zero_state" (Fails with Inference)

I have been working with the "dynamic_rnn" to create a model.
The model is based on an 80-time-step signal, and I want to zero the "initial_state" before each run, so I have set up the following code fragment to accomplish this:
state = cell_L1.zero_state(self.BatchSize,Xinputs.dtype)
outputs, outState = rnn.dynamic_rnn(cell_L1,Xinputs,initial_state=state, dtype=tf.float32)
This works great for training. The problem is at inference, where BatchSize = 1: I get an error because the RNN "state" doesn't match the new Xinputs shape. So I figured I need to derive "self.BatchSize" from the input batch size rather than hard-code it. I tried many different approaches, and none of them worked. I would rather not pass a bunch of zeros through the feed_dict, as it is a constant based on the batch size.
Here are some of my attempts. They all generally fail since the input size is unknown upon building the graph:
state = cell_L1.zero_state(Xinputs.get_shape()[0],Xinputs.dtype)
.....
state = tf.zeros([Xinputs.get_shape()[0], self.state_size], Xinputs.dtype, name="RnnInitializer")
Another approach, assuming the initializer might not be called until run time; it still failed at graph build:
init = lambda shape, dtype: np.zeros(*shape)
state = tf.get_variable("state", shape=[Xinputs.get_shape()[0], self.state_size],initializer=init)
Is there a way to get this constant initial state to be created dynamically, or do I need to reset it through the feed_dict with tensor-serving code? Is there a clever way to do this only once within the graph, maybe with a tf.Variable.assign?
The solution was to obtain the "batch_size" in a way that is not hard-coded.
This was the failing approach from the example above:
Xinputs = tf.placeholder(tf.float32, (None, self.sequence_size, self.num_params), name="input")
state = cell_L1.zero_state(Xinputs.get_shape()[0], Xinputs.dtype)
The problem is the use of "get_shape()[0]". get_shape() returns the static shape of the tensor, which is evaluated only once, when the graph is built, and [0] takes its batch dimension. That entry therefore cannot follow the batch size fed in later: for this placeholder it is simply unknown (None), and any concrete batch size used in its place stays hard-coded when the graph is loaded for inference.
The "tf.shape()" function does the trick. It returns not a static shape but a tensor, which is evaluated at run time and therefore reflects the actual batch size. This code fragment solved the problem of training with a batch of 128 and then loading the graph into TensorFlow Serving for inference with a batch of just 1:
Xinputs = tf.placeholder(tf.float32, (None, self.sequence_size, self.num_params), name="input")
batch_size = tf.shape(Xinputs)[0]
state = self.cell_L1.zero_state(batch_size, Xinputs.dtype)
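The difference between get_shape()[0] and tf.shape(...)[0] can be checked directly. The sketch below (assuming TensorFlow 1.14+ or 2.x via tf.compat.v1) builds the initial state with tf.zeros, as in one of the question's own attempts, as a stand-in for cell.zero_state, which takes the same kind of scalar batch-size tensor. SEQUENCE_SIZE, NUM_PARAMS, STATE_SIZE and state_shape are hypothetical. The same graph yields a 128-row state for a training batch and a 1-row state for inference.

```python
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

SEQUENCE_SIZE, NUM_PARAMS, STATE_SIZE = 80, 3, 16  # hypothetical sizes

xinputs = tf.compat.v1.placeholder(
    tf.float32, (None, SEQUENCE_SIZE, NUM_PARAMS), name="input")

# Static shape: fixed when the graph is built; the batch axis is unknown here.
static_batch = tf.compat.dimension_value(xinputs.get_shape()[0])

# Dynamic shape: a scalar int32 tensor evaluated on every sess.run call.
dynamic_batch = tf.shape(xinputs)[0]

# Stand-in for cell.zero_state(dynamic_batch, xinputs.dtype).
initial_state = tf.zeros([dynamic_batch, STATE_SIZE], dtype=xinputs.dtype)


def state_shape(batch):
    """Hypothetical helper: run the same graph with a batch of the given size."""
    feed = np.zeros((batch, SEQUENCE_SIZE, NUM_PARAMS), dtype=np.float32)
    with tf.compat.v1.Session() as sess:
        return sess.run(initial_state, feed_dict={xinputs: feed}).shape


print(static_batch)      # None
print(state_shape(128))  # (128, 16)
print(state_shape(1))    # (1, 16)
```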
Here is a good link to the TensorFlow FAQ, which describes this approach under 'How do I build a graph that works with variable batch sizes?':
https://www.tensorflow.org/resources/faq

Why can't tensorflow determine the shape of this expression?

I have the following expression which is giving me problems. I have defined batch_size as batch_size = tf.shape(input_tensor)[0], which dynamically determines the size of the batch from the input tensor of the model. I have used it elsewhere in the code without issue. What confuses me is that when I run the following line of code, it reports the shape as (?, ?). I would expect (?, 128), because the second dimension is known.
print(tf.zeros((batch_size, 128)).get_shape())
I want to know the shape since I am trying to do the following and I am getting an error.
rnn_input = tf.reduce_sum(w * decoder_input, 1)
last_out = decoder_outputs[t - 1] if t else tf.zeros((batch_size, 128))
rnn_input = tf.concat([rnn_input, last_out], 1)
This code needs to set last_out to zero on the first time step.
Here is the error: ValueError: Linear expects shape[1] of arguments: [[None, None], [None, 1024]]
I am doing something similar when I determine my initial state vector for the RNNs.
state = tf.zeros((batch_size, decoder_multi_rnn.state_size), tf.float32)
I also get (?, ?) when I print the shape of state, but it does not throw any exceptions when I use it.
You are mixing static shapes and dynamic shapes. The static shape is what you get from tensor.get_shape(), which is a best-effort attempt to infer the shape, while the dynamic shape comes from sess.run(tf.shape(tensor)) and is always defined.
To be more precise, tf.shape(tensor) creates an op in the graph that will produce the shape tensor on a run call. If you do aop = tf.shape(tensor)[0], some magic in _SliceHelper adds extra ops that extract the first element of the shape tensor on a run call.
This means that myval = tf.zeros((aop, 128)) has to run aop to obtain its dimensions, so the first dimension of myval is undefined until you issue the run call. For example, your run call could look like sess.run(myval, feed_dict={aop: 2}), where feed_dict overrides aop with 2. Hence static shape inference reports ? for that dimension.
(EDIT: I rewrote this answer because my earlier version missed the point.)
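A small sketch of the static/dynamic distinction described in this answer (assuming TensorFlow 1.14+ or 2.x through tf.compat.v1): the static shape of myval has an unknown first dimension, tf.shape gives the concrete shape at run time, and aop, like any feedable tensor in a TF1 session, can be overridden through feed_dict. The placeholder width 32 and the fed values are hypothetical. Whether the second static dimension shows as 128 or as unknown depends on the TensorFlow version.

```python
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

tensor = tf.compat.v1.placeholder(tf.float32, shape=[None, 32])
aop = tf.shape(tensor)[0]      # slice of the dynamic shape tensor
myval = tf.zeros((aop, 128))
myval_shape = tf.shape(myval)  # op that computes the dynamic shape

# Static shape: best-effort inference at graph-construction time.
static_shape = myval.get_shape().as_list()

with tf.compat.v1.Session() as sess:
    feed = np.zeros((5, 32), dtype=np.float32)
    dynamic_shape = sess.run(myval_shape, feed_dict={tensor: feed}).tolist()
    # Feeding aop directly means the placeholder is never evaluated.
    overridden = sess.run(myval, feed_dict={aop: 2})

print(static_shape)      # first entry is None
print(dynamic_shape)     # [5, 128]
print(overridden.shape)  # (2, 128)
```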
The quick fix to your issue is to use set_shape() to update the static (inferred) shape of the Tensor:
import tensorflow as tf
input_tensor = tf.placeholder(tf.float32, [None, 32])
batch_size = tf.shape(input_tensor)[0]
res = tf.zeros((batch_size, 128))
print(res.get_shape())  # prints (?, ?) WHEREAS one could expect (?, 128)
res.set_shape([None, 128])
print(res.get_shape())  # prints (?, 128)
As for why TensorFlow loses the information about the second dimension being 128, I don't really know.
Maybe @Yaroslav will be able to answer.
EDIT:
The incorrect behavior was corrected following this issue.
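Note that set_shape() only refines the static shape: it does not reshape any data, and it rejects a shape that contradicts what is already known. A sketch assuming TensorFlow 1.14+ or 2.x via tf.compat.v1; the placeholder x and the widths 128 and 64 are hypothetical.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, shape=[None, None])
print(x.get_shape().as_list())  # [None, None]

# Refine: tell TensorFlow that the second dimension is 128.
x.set_shape([None, 128])
refined = x.get_shape().as_list()
print(refined)  # [None, 128]

# Contradicting an already known dimension fails at graph-construction time.
try:
    x.set_shape([None, 64])
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
print(conflict_rejected)  # True
```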