ValueError: Trying to share variable rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel - tensorflow

This is the code:
X = tf.placeholder(tf.float32, [batch_size, seq_len_1, 1], name='X')
labels = tf.placeholder(tf.float32, [None, alpha_size], name='labels')
rnn_cell = tf.contrib.rnn.BasicLSTMCell(512)
m_rnn_cell = tf.contrib.rnn.MultiRNNCell([rnn_cell] * 3, state_is_tuple=True)
pre_prediction, state = tf.nn.dynamic_rnn(m_rnn_cell, X, dtype=tf.float32)
This is the full error:
ValueError: Trying to share variable rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel, but specified shape (1024, 2048) and found shape (513, 2048).
I'm using a GPU version of tensorflow.

I encountered a similar problem when I upgraded to v1.2 (tensorflow-gpu).
Instead of using [rnn_cell] * 3, I created 3 rnn_cells (stacked_rnn) in a loop (so that they don't share variables) and fed MultiRNNCell with stacked_rnn, and the problem went away. I'm not sure it is the right way to do it.
stacked_rnn = []
for iiLyr in range(3):
    stacked_rnn.append(tf.nn.rnn_cell.LSTMCell(num_units=512, state_is_tuple=True))
MultiLyr_cell = tf.nn.rnn_cell.MultiRNNCell(cells=stacked_rnn, state_is_tuple=True)

An official TensorFlow tutorial recommends this way of multiple LSTM network definition:
def lstm_cell():
    return tf.contrib.rnn.BasicLSTMCell(lstm_size)

stacked_lstm = tf.contrib.rnn.MultiRNNCell(
    [lstm_cell() for _ in range(number_of_layers)])
You can find it here: https://www.tensorflow.org/tutorials/recurrent
Actually it is almost the same approach that Wasi Ahmad and Maosi Chen suggested above, just in a slightly more elegant form.
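Applied to the code from the question, the fix would look roughly like this (a sketch, keeping the original 512-unit cells and 3 layers):
def lstm_cell():
    return tf.contrib.rnn.BasicLSTMCell(512)

# each call to lstm_cell() creates a fresh cell with its own variables
m_rnn_cell = tf.contrib.rnn.MultiRNNCell(
    [lstm_cell() for _ in range(3)], state_is_tuple=True)
pre_prediction, state = tf.nn.dynamic_rnn(m_rnn_cell, X, dtype=tf.float32)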

I guess it's because your RNN cells on each of your 3 layers share the same input and output shape.
On layer 1, the input dimension is 513 = 1 (your x dimension) + 512 (dimension of the hidden state) for each timestep per batch.
On layers 2 and 3, the input dimension is 1024 = 512 (output from the previous layer) + 512 (output from the previous timestep).
The way you stack up your MultiRNNCell probably implies that 3 cells share the same input and output shape.
I stack up the MultiRNNCell by declaring two separate cells in order to prevent them from sharing the input shape:
rnn_cell1 = tf.contrib.rnn.BasicLSTMCell(512)
rnn_cell2 = tf.contrib.rnn.BasicLSTMCell(512)
stack_rnn = [rnn_cell1]
for i in range(1, 3):
    stack_rnn.append(rnn_cell2)
m_rnn_cell = tf.contrib.rnn.MultiRNNCell(stack_rnn, state_is_tuple=True)
Then I am able to train my data without this bug.
I'm not sure whether my guess is correct, but it works for me. Hope it works for you.

I encountered the same issue using Google Colab's Jupyter notebook. I resolved the issue by restarting the kernel and then rerunning the code.

Related

What would be the output from tensorflow dense layer if we assign itself as input and output while making a neural network?

I have been going through the implementation of the neural network in OpenAI's code for Vanilla Policy Gradient (as a matter of fact, this part is used nearly everywhere). The code looks something like this:
def mlp_categorical_policy(x, a, hidden_sizes, activation, output_activation, action_space):
    act_dim = action_space.n
    logits = mlp(x, list(hidden_sizes) + [act_dim], activation, None)
    logp_all = tf.nn.log_softmax(logits)
    pi = tf.squeeze(tf.random.categorical(logits, 1), axis=1)
    logp = tf.reduce_sum(tf.one_hot(a, depth=act_dim) * logp_all, axis=1)
    logp_pi = tf.reduce_sum(tf.one_hot(pi, depth=act_dim) * logp_all, axis=1)
    return pi, logp, logp_pi
and this multi-layer perceptron network is defined as follows:
def mlp(x, hidden_sizes=(32,), activation=tf.tanh, output_activation=None):
    for h in hidden_sizes[:-1]:
        x = tf.layers.dense(inputs=x, units=h, activation=activation)
    return tf.layers.dense(inputs=x, units=hidden_sizes[-1], activation=output_activation)
My question is: what is the return from this mlp function? I mean the structure or shape. Is it an N-dimensional tensor? If so, how is it given as an input to tf.random.categorical? If not, and it just has the shape [hidden_layer2, output], then what happened to the other layers? As per the website description of random.categorical, it only takes a 2-D input. The complete code of OpenAI's VPG algorithm can be found here. The mlp is implemented here. I would be highly grateful if someone could just tell me what this mlp_categorical_policy() is doing.
Note: The hidden size is [64, 64], the action dimension is 3
Thanks and cheers
Note that this is a discrete action space - there are action_space.n different possible actions at every step, and the agent chooses one.
To do this, the MLP returns the logits (which are a function of the probabilities) of the different actions. This is specified in the code by + [act_dim], which appends the action count (action_space.n) as the size of the final MLP layer. Note that the last layer of an MLP is the output layer. The input layer is not specified in TensorFlow; it is inferred from the inputs.
tf.random.categorical takes the logits and samples a policy action pi from them, which is returned as a number.
mlp_categorical_policy also returns logp, the log probability of the action a (used to assign credit), and logp_pi, the log probability of the policy action pi.
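For intuition, a small sketch (with made-up logits and actions, not from the original code) of how the one_hot/reduce_sum combination picks out the log-probability of the taken action:
import tensorflow as tf

# hypothetical batch of 2 observations, act_dim = 3
logits = tf.constant([[2.0, 0.5, -1.0],
                      [0.1, 0.1, 3.0]])
a = tf.constant([0, 2])                # actions that were actually taken
logp_all = tf.nn.log_softmax(logits)   # log-probabilities of every action
# one_hot zeroes out all but the taken action; reduce_sum collapses to shape (2,)
logp = tf.reduce_sum(tf.one_hot(a, depth=3) * logp_all, axis=1)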
It seems your question is more about the return from the mlp.
The mlp creates a series of fully connected layers in a loop. In each iteration of the loop, the mlp creates a new layer using the previous layer x as an input and assigns its output to overwrite x, with this line: x = tf.layers.dense(inputs=x, units=h, activation=activation).
So the output is not the same as the input; on each iteration x is overwritten with the value of the new layer. This is the same kind of coding trick as x = x + 1, which increments x by 1. This effectively chains the layers together.
The output of tf.layers.dense is a tensor of size [:, h] where : is the batch dimension (and can usually be ignored). The creation of the last layer happens outside the loop; it can be seen that the number of nodes in this layer is act_dim (so its shape is [:, 3]). You can check the shape by doing this:
import tensorflow.compat.v1 as tf
import numpy as np

def mlp(x, hidden_sizes=(32,), activation=tf.tanh, output_activation=None):
    for h in hidden_sizes[:-1]:
        x = tf.layers.dense(x, units=h, activation=activation)
    return tf.layers.dense(x, units=hidden_sizes[-1], activation=output_activation)

obs = np.array([[1.0, 2.0]])
logits = mlp(obs, [64, 64, 3], tf.nn.relu, None)
print(logits.shape)
result: TensorShape([1, 3])
Note that the observation in this case is [1., 2.]; it is nested inside a batch of size 1.

autocorrelation of the input in tensorflow/keras

I have a 1D input signal. I want to compute the autocorrelation as part of the neural net for further use inside the network.
I need to perform convolution of the input with the input itself.
To perform convolution in a Keras custom layer / TensorFlow, we need the following parameters:
data shape is "[batch, in_height, in_width, in_channels]",
filter shape is "[filter_height, filter_width, in_channels, out_channels]"
There is no batch dimension in the filter shape, but in my case the filter needs to be the input itself.
TensorFlow now has an auto_correlation function. It should be in release 1.6. If you build from source you can use it right now (see e.g. the github code).
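A rough usage sketch, assuming TF 1.6+ where the function lives under tf.contrib.distributions (parameter names here are from memory, so check the docs):
import tensorflow as tf

signal = tf.random_normal([5, 100])   # batch of 5 signals of length 100
# autocorrelation along the last axis, keeping only the first 10 lags
acf = tf.contrib.distributions.auto_correlation(signal, axis=-1, max_lags=10)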
Here is a possible solution.
By self convolution, I understood a regular convolution where the filter is exactly the same as the input (if it's not that, sorry for my misunderstanding).
We need a custom function for that, and a Lambda layer.
At first I used padding='same', which gives outputs with the same length as the inputs. I'm not sure what output length you want exactly, but if you want more, you should add padding yourself before doing the convolution. (In the example with length 7, for a complete convolution from one end to the other, this manual padding would include 6 zeros before and 6 zeros after the input length, and use padding='valid'. Find the backend functions here.)
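For reference, the manual padding mentioned above could be done with the Keras backend padding helper, roughly like this (the (6, 6) values assume the length-7 example; x stands for the layer's input tensor):
import keras.backend as K

# pad 6 zeros before and 6 after the time dimension, then convolve with padding='valid'
x_padded = K.temporal_padding(x, padding=(6, 6))  # (batch, 7 + 12, channels)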
Working example - Input (5,7,2)
from keras.models import Model
from keras.layers import *
import keras.backend as K
import numpy as np

batch_size = 5
length = 7
channels = 2
channels_batch = batch_size * channels

def selfConv1D(x):
    #this function unfortunately needs to know the shapes beforehand,
    #mainly because of the for loop; for other lines there are workarounds,
    #but those workarounds are not necessary since we'll have this limitation anyway

    #original x: (batch_size, length, channels)
    #bring channels to the batch position:
    x = K.permute_dimensions(x, [2, 0, 1])  #(channels, batch_size, length)

    #suppose channels are just individual samples (since we don't mix channels)
    x = K.reshape(x, (channels_batch, length, 1))

    #here, we get a copy of x reshaped to match filter shapes:
    filters = K.permute_dimensions(x, [1, 2, 0])  #(length, 1, channels_batch)

    #now, in the lack of a suitable available conv function, we make a loop
    allChannels = []
    for i in range(channels_batch):
        f = filters[:, :, i:i+1]
        allChannels.append(
            K.conv1d(
                x[i:i+1],
                f,
                padding='same',
                data_format='channels_last'))
        #although channels_last is my default config, I found this bug:
        #https://github.com/fchollet/keras/issues/8183
        #convolution output: (1, length, 1)

    #concatenate all results as samples
    x = K.concatenate(allChannels, axis=0)  #(channels_batch, length, 1)

    #restore the original form (passing channels to the end)
    x = K.reshape(x, (channels, batch_size, length))
    return K.permute_dimensions(x, [1, 2, 0])  #(batch_size, length, channels)

#input data for the test:
x = np.array(range(70)).reshape((5, 7, 2))

#little model that just performs the convolution
inp = Input((7, 2))
out = Lambda(selfConv1D)(inp)
model = Model(inp, out)

#checking results
p = model.predict(x)
for i in range(5):
    print("x", x[i])
    print("p", p[i])
You can just use tf.nn.conv3d by treating the "batch size" as "depth":
# treat the batch size as depth
data = tf.reshape(input_data, [1, batch, in_height, in_width, in_channels])
# the filter passed to conv3d must be an actual tensor with this shape, not just the shape list
kernel = tf.Variable(tf.random_normal(
    [filter_depth, filter_height, filter_width, in_channels, out_channels]))
out = tf.nn.conv3d(data, kernel, [1, 1, 1, 1, 1], padding='SAME')

MulticellRNN Error with Tensorflow

I have TensorFlow 1.1.0 and am following a basic tutorial from https://github.com/ageron/handson-ml/blob/master/14_recurrent_neural_networks.ipynb on multicell (stacked) RNNs.
The following code generates an odd error, and I can't figure out why from searching.
import tensorflow as tf
n_inputs = 2
n_neurons = 100
n_layers = 3
n_steps = 5
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
multi_layer_cell = tf.contrib.rnn.MultiRNNCell([basic_cell for _ in range(n_layers)])
outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)
init = tf.global_variables_initializer()
This seems to be the correct code but it gets the error:
ValueError: Attempt to reuse RNNCell with a different variable scope than its first use. First use of cell was with scope 'rnn/multi_rnn_cell/cell_0/basic_rnn_cell', this attempt is with scope 'rnn/multi_rnn_cell/cell_1/basic_rnn_cell'. Please create a new instance of the cell if you would like it to use a different set of weights. If before you were using: MultiRNNCell([BasicRNNCell(...)] * num_layers), change to: MultiRNNCell([BasicRNNCell(...) for _ in range(num_layers)]). If before you were using the same cell instance as both the forward and reverse cell of a bidirectional RNN, simply create two instances (one for forward, one for reverse). In May 2017, we will start transitioning this cell's behavior to use existing stored weights, if any, when it is called with scope=None (which can lead to silent model degradation, so this error will remain until then.)
I bolded the part where I made the switch from the code in the GitHub notebook.
Any idea on why this is still throwing an error?
Help is much appreciated!
In this line of code:
multi_layer_cell = tf.contrib.rnn.MultiRNNCell([basic_cell for _ in range(n_layers)])
The basic_cell variable needs to be replaced with a function. When you use a variable, each layer of the n_layers network will be using the same set of weights. Instead, try using a function:
def create_rnn_layer(n_neurons):
    return tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
Then replace basic_cell with create_rnn_layer(n_neurons). This way, when the network is created, the function will be called n_layers times, so each layer will use different weights.
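Putting it together with the code from the question, a sketch of the corrected construction:
def create_rnn_layer(n_neurons):
    return tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)

# each call creates a new cell, so every layer gets its own variables
multi_layer_cell = tf.contrib.rnn.MultiRNNCell(
    [create_rnn_layer(n_neurons) for _ in range(n_layers)])
outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)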

Dynamic shape and # of layers based on placeholders in TensorfFlow

Is it possible to create a shape based on a placeholder?
I have 2 use cases:
size = tf.placeholder(tf.int32, name="size")
x = tf.placeholder(tf.float32, [None, size], name="x")
And:
shapes = tf.placeholder(tf.int32, [None], name="shapes")
tf.contrib.layers.fully_connected(
    inputs=x,
    num_outputs=shapes[-1]
)
The first one I guess I can "fix" with [None, None] (not sure about performance penalties in such a case).
For the second one, I have no idea.
The why: I want to build and export the graph using Python and then read it for training/predictions using the Java API. I don't want to prepare a file for every single hidden-layer combination, so I wanted to export only a single "template" graph, something like this:
def fc(x, shape):
    return tf.contrib.layers.fully_connected(inputs=x, num_outputs=shape[-1])

def body(x, hidden_layers, i):
    # create a FC layer with shape=[hidden_layer[i], hidden_layer[i+1]]
    out = fc(x, [tf.slice(hidden_layers, [i], [1]), tf.slice(hidden_layers, [i+1], [1])])
    out = tf.tanh(out)
    return out, hidden_layers, i + 1

def condition(x, hidden_layers, i):
    return i < (len(hidden_layers) - 1)

# i.e. [200,200] or [50,50,50] etc.
hidden_layers = tf.placeholder(tf.int32, [None], "hidden_layers")

# loop counter
i = tf.constant(0, dtype='int32')

# loop creating the network
out = tf.while_loop(condition, body, [x, hidden_layers, i])
And then the Java user would simply feed an array with the hidden network config. But I am getting ValueError: ('num_outputs should be int or long, got %s.', <tf.Tensor 'while/fc/Slice_1:0' shape=(1,) dtype=int32>) when trying to generate the graph.
Unfortunately I don't think what you want is possible. What would happen with a dynamic number of hidden layers? Randomly spawning extra (untrained) layers is not something that is usual with neural networks.
Maybe clarifying your question would help. For example, let's say you want either two or three hidden layers; there are some possibilities. You could have two different sets of weights that go either from layer two to layer X, or from layer two to layer three to layer X. If you want something like this, check tf.cond on this page: https://www.tensorflow.org/api_guides/python/control_flow_ops
Good luck!
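To illustrate the tf.cond idea, a rough sketch (with hypothetical layer sizes, not taken from the question) that switches between a two-layer and a three-layer path:
import tensorflow as tf

use_three_layers = tf.placeholder(tf.bool, name="use_three_layers")
x = tf.placeholder(tf.float32, [None, 10], name="x")

def two_layer_path():
    h = tf.contrib.layers.fully_connected(x, 50)
    return tf.contrib.layers.fully_connected(h, 5, activation_fn=None)

def three_layer_path():
    h = tf.contrib.layers.fully_connected(x, 50)
    h = tf.contrib.layers.fully_connected(h, 50)
    return tf.contrib.layers.fully_connected(h, 5, activation_fn=None)

# tf.cond builds both branches into the graph and picks one at run time
out = tf.cond(use_three_layers, three_layer_path, two_layer_path)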

TensorFlow error: logits and labels must be same size

I've been trying to learn TensorFlow by implementing ApproximatelyAlexNet based on various examples on the internet. Basically extending the AlexNet example here to take 224x224 RGB images (rather than 28x28 grayscale images), and adding a couple more layers, changing kernel sizes, strides, etc, per other AlexNet implementations I've found online.
I've worked through a number of mismatched shape and type errors, but this one has me stumped:
tensorflow.python.framework.errors.InvalidArgumentError: logits and labels must be same size: logits_size=dim { size: 49 } dim { size: 10 } labels_size=dim { size: 1 } dim { size: 10 }
[[Node: SoftmaxCrossEntropyWithLogits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](Softmax, _recv_Placeholder_1_0/_13)]]
[[Node: gradients/Mean_grad/range_1/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_457_gradients/Mean_grad/range_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
The 49 dimension is particularly puzzling. For debugging, my batch size is currently 1, if I increase it to 2 then the 49 becomes 98.
If I log the shape of the x and y that I pass to
sess.run(optimizer, feed_dict={x: batchImages, y: batchLabels, keepProb: P_DROPOUT})
I get
x shape: (1, 150528)
y shape: (1, 10)
Which is as expected: 150528 = 224 * 224 RGB pixels, and a one-hot vector representing my labels.
Would appreciate any help in figuring this out!
Update: code exhibiting the fault here:
https://gist.github.com/j4m3z0r/e70096d0f7bd4bd24c42
Thanks for sharing your code as a Gist. There are two changes that are necessary to make the shapes agree:
The line:
fc1 = tf.reshape(pool5, [-1, wd1Shape[0]])
...is responsible for the erroneous 49 in the batch dimension. The input is 1 x 7 x 7 x 256, and it is reshaped to be 49 x 256, because wd1Shape[0] is 256. One possible replacement is the following:
pool5Shape = pool5.get_shape().as_list()
fc1 = tf.reshape(pool5, [-1, pool5Shape[1] * pool5Shape[2] * pool5Shape[3]])
...which will give fc1 the shape 1 x 12544.
After making this change, the size of the 'wd1' weight matrix (256 x 4096) doesn't match the number of nodes in fc1. You could change the definition of this matrix as follows:
'wd1': tf.Variable(tf.random_normal([12544, 4096])),
...although you may want to modify the other weights, or perform additional pooling to reduce the size of this matrix.
I had a similar issue when using model.fit(..).
Turns out my output_size was defined as 2 while using "binary_crossentropy" as the loss function, when it should have been defined as 1.
Given that you didn't provide the actual code you are using, it's hard to say exactly what's wrong.
Here are some general tips for debugging such problems:
Add print(tensor.get_shape()) to the places related to the problem (in your case dense2, out, _weights['out'], _biases['out'] are suspects).
Make sure your matrix multiplications are in the right order (e.g. dense2 times _weights['out'] should result in a batch_size x 10 matrix).
If you modified the code in AlexNet you linked, you probably have changed next lines:
dense1 = tf.reshape(norm3, [-1, _weights['wd1'].get_shape().as_list()[0]]) # Reshape conv3 output to fit dense layer input
dense1 = tf.nn.relu(tf.matmul(dense1, _weights['wd1']) + _biases['bd1'], name='fc1') # Relu activation
dense2 = tf.nn.relu(tf.matmul(dense1, _weights['wd2']) + _biases['bd2'], name='fc2') # Relu activation
out = tf.matmul(dense2, _weights['out']) + _biases['out']
Probably the shape of dense2 is [49, 1024] in your case. You can check by adding print(dense2.get_shape()). You should print the shapes of all the tensors until you find the one that ends up with 49. I can only guess what you changed, but it is probably one of the reshapes.
This issue occurs because the number of classes and the label dimension do not match.
For example, in your code you declared the number of classes as 10, but your labels might not have dimension 10.
Once the number of classes and the label dimension are the same, the issue will be resolved.
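As a rough illustration (names and sizes assumed, loosely following the AlexNet example above), the label placeholder and the output layer must agree on the number of classes:
import tensorflow as tf

n_classes = 10  # assumed number of classes

# one-hot labels: the second dimension must equal the number of classes
y = tf.placeholder(tf.float32, [None, n_classes], name='labels')

# the layer feeding the output is assumed to have 1024 units here
dense2 = tf.placeholder(tf.float32, [None, 1024])
w_out = tf.Variable(tf.random_normal([1024, n_classes]))
b_out = tf.Variable(tf.random_normal([n_classes]))

# logits must come out as (batch_size, n_classes) to match the labels
out = tf.matmul(dense2, w_out) + b_out
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=out))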