Why does Keras.backend.flatten not show proper dimension? I have the following:
x is <tf.Tensor 'concat_8:0' shape=(?, 4, 8, 62) dtype=float32>
After:
Keras.backend.flatten(x)
x becomes: <tf.Tensor 'Reshape_22:0' shape=(?,) dtype=float32>
Why is x not of shape (?, 4*8*62)?
EDIT-1
I get (?, ?) if I use batch_flatten (branch3x3 & branch5x5 below are tensors from previous convolutions):
x = Lambda(lambda v: K.concatenate([v[0], v[1]], axis=3))([branch3x3, branch5x5])
x = Lambda(lambda v: K.batch_flatten(v))(x)
Result of first Lambda is <tf.Tensor 'lambda_144/concat:0' shape=(?, 4, 8, 62) dtype=float32>
Result of second Lambda is <tf.Tensor 'lambda_157/Reshape:0' shape=(?, ?) dtype=float32>
EDIT-2
Tried batch_flatten but get an error downstream when I build the model output (using reshape instead of batch_flatten seems to work). branch3x3 is <tf.Tensor 'conv2d_202/Elu:0' shape=(?, 4, 8, 30) dtype=float32>, and branch5x5 is <tf.Tensor 'conv2d_203/Elu:0' shape=(?, 4, 8, 32) dtype=float32>:
from keras import backend as K
x = Lambda(lambda v: K.concatenate([v[0], v[1]], axis=3))([branch3x3, branch5x5])
x = Lambda(lambda v: K.batch_flatten(v))(x)
y = Conv1D(filters=2, kernel_size=4)(Input(shape=(4, 1)))
y = Lambda(lambda v: K.batch_flatten(v))(y)
z = Lambda(lambda v: K.concatenate([v[0], v[1]], axis=1))([x, y])
output = Dense(32, kernel_initializer=TruncatedNormal(), activation='linear')(z)
cnn = Model(inputs=[m1, m2], outputs=output)
The output statement results in the following error for the kernel_initializer: TypeError: Failed to convert object of type <class 'tuple'> to Tensor. Contents: (None, 32). Consider casting elements to a supported type.
From the docstring of flatten:
def flatten(x):
    """Flatten a tensor.

    # Arguments
        x: A tensor or variable.

    # Returns
        A tensor, reshaped into 1-D
    """
So it turns a tensor with shape (batch_size, 4, 8, 62) into a 1-D tensor with shape (batch_size * 4 * 8 * 62,). That's why your new tensor has a 1-D shape (?,).
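You can confirm this on a symbolic tensor directly (a small sketch, using the same Keras backend as in the question):

from keras import backend as K
from keras.layers import Input

t = Input(shape=(4, 8, 62))   # static shape (?, 4, 8, 62)
print(K.flatten(t))           # Tensor with shape (?,): batch and feature axes are folded into one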
If you want to keep the first dimension, use batch_flatten:
def batch_flatten(x):
    """Turn a nD tensor into a 2D tensor with same 0th dimension.

    In other words, it flattens each data samples of a batch.

    # Arguments
        x: A tensor or variable.

    # Returns
        A tensor.
    """
EDIT: You see the shape being (?, ?) because the shape is determined dynamically at runtime. If you feed in a numpy array, you can easily verify that the shape is correct.
input_tensor = Input(shape=(4, 8, 62))
x = Lambda(lambda v: K.batch_flatten(v))(input_tensor)
print(x)
Tensor("lambda_1/Reshape:0", shape=(?, ?), dtype=float32)
model = Model(input_tensor, x)
out = model.predict(np.random.rand(32, 4, 8, 62))
print(out.shape)
(32, 1984)
EDIT-2:
From the error message, it seems that TruncatedNormal requires a fixed output shape from the previous layer. So the dynamic shape (None, None) from batch_flatten won't work.
I can think of two options:
Provide manually computed output_shape to the Lambda layers:
x = Lambda(lambda v: K.concatenate([v[0], v[1]], axis=3))([branch3x3, branch5x5])
x_shape = (np.prod(K.int_shape(x)[1:]),)
x = Lambda(lambda v: K.batch_flatten(v), output_shape=x_shape)(x)
input_y = Input(shape=(4, 1))
y = Conv1D(filters=2, kernel_size=4)(input_y)
y_shape = (np.prod(K.int_shape(y)[1:]),)
y = Lambda(lambda v: K.batch_flatten(v), output_shape=y_shape)(y)
z = Lambda(lambda v: K.concatenate([v[0], v[1]], axis=1))([x, y])
output = Dense(32, kernel_initializer=TruncatedNormal(), activation='linear')(z)
cnn = Model(inputs=[m1, m2, input_y], outputs=output)
Use the Flatten layer (which calls batch_flatten and computes the output shape inside of it):
x = Concatenate(axis=3)([branch3x3, branch5x5])
x = Flatten()(x)
input_y = Input(shape=(4, 1))
y = Conv1D(filters=2, kernel_size=4)(input_y)
y = Flatten()(y)
z = Concatenate(axis=1)([x, y])
output = Dense(32, kernel_initializer=TruncatedNormal(), activation='linear')(z)
cnn = Model(inputs=[m1, m2, input_y], outputs=output)
I'd prefer the latter as it makes the code less cluttered. Also, note that you can replace the Lambda layer wrapping K.concatenate() with a Concatenate layer, and remember to move the Input(shape=(4, 1)) out of the Conv1D call and provide it in your Model(inputs=...) call.
Related
I am trying to make a spatio-temporal graph convolutional network where a GCN layer is sandwiched between two temporal CNN layers. The code is the following:
inputs = Input(shape=(train_x.shape[1],train_x.shape[2],train_x.shape[3]), batch_size=None)
# temporal convolution
y = tf.keras.layers.Conv1D(128, 9, activation='relu')(inputs)
#graph convolution
y = tf.keras.layers.Conv2D(32, (1,1), activation='relu')(y)
n, v, t, kc = y.shape
y = tf.reshape(y,(n, 1, kc//1, t, v))
y = tf.einsum('nkctv,kvw->nwtc', y, AD_tensor)
#temporal convolution
y = tf.keras.layers.Conv1D(16, 9, activation='relu')(y)
concat = Flatten()(y)
fc = Dense(units=80, activation='relu')(concat)
fc1 = Dense(units=40, activation='relu')(fc)
fc2 = Dense(units=40, activation='relu')(fc1)
fc3 = Dense(units=80, activation='relu')(fc2)
out = Dense(1, activation = 'sigmoid')(fc3)
model = Model(inputs, out)
model.compile(loss='mse', optimizer= Adam(lr=0.0001))
model.fit(train_x, train_y, validation_data = (valid_x,valid_y), epochs=300, batch_size=2)
When I run this code it shows me this type error:
TypeError: Failed to convert object of type <class 'tuple'> to Tensor.
Contents: (None, 1, 32, 72, 25). Consider casting elements to a supported type.
To use TensorFlow operations with Keras layers, you should wrap them in a Lambda layer, which takes a function as its argument:
y = tf.keras.layers.Lambda(lambda x: tf.reshape(x, (-1, v, t, kc)))(y)
However, for reshaping, Keras already provides a layer for this operation, so you could do
y = tf.keras.layers.Reshape((v, t, kc))(y)
The layer version of reshaping already takes into account the batch dimension, so you only need to specify the other dimensions.
For the einsum operation, you can use
y = tf.keras.layers.Lambda(lambda x: tf.einsum('nkctv,kvw->nwtc', x[0], x[1]))([y, AD_tensor])
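Putting the two together for the snippet in your question might look roughly like this (a sketch only; y.shape.as_list() is one way to get plain Python ints for the Reshape layer, and AD_tensor is assumed to be available as a constant tensor, as in your code):

_, v, t, kc = y.shape.as_list()                 # static dims as plain ints (batch dim discarded)
y = tf.keras.layers.Reshape((1, kc, t, v))(y)   # batch dimension is handled by the layer
y = tf.keras.layers.Lambda(
    lambda args: tf.einsum('nkctv,kvw->nwtc', args[0], args[1]))([y, AD_tensor])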
When implementing lambda-opt (an algorithm published at KDD '19) in TensorFlow, I came across a problem computing gradients with tf.scatter_sub.
θ refers to an embedding matrix for docid.
The formulation is
θ(t+1) = θ(t) - α * (grad + 2 * λ * θ)
delta = theta_grad_no_reg.values * lr + 2 * lr * cur_scale * cur_theta
next_theta_tensor = tf.scatter_sub(theta,theta_grad_no_reg.indices,delta)
then I use θ(t+1) for some computation. Finally, I want to compute gradients with respect to λ, not θ.
But the gradient is None.
I wrote a demo like this:
import tensorflow as tf
w = tf.constant([[1.0], [2.0], [3.0]], dtype=tf.float32)
y = tf.constant([5.0], dtype=tf.float32)
# θ
emb_matrix = tf.get_variable("embedding_name", shape=(10, 3),
initializer=tf.random_normal_initializer(),dtype=tf.float32)
# get one line emb
cur_emb=tf.nn.embedding_lookup(emb_matrix,[0])
# The λ matrix
doc_lambda = tf.get_variable(name='docid_lambda', shape=(10, 3),
initializer=tf.random_normal_initializer(), dtype=tf.float32)
# get one line λ
cur_lambda=tf.nn.embedding_lookup(doc_lambda, [0])
# θ(t+1) Tensor("ScatterSub:0", shape=(10, 3), dtype=float32_ref)
next_emb_matrix=tf.scatter_sub(emb_matrix, [0], (cur_emb *cur_lambda))
# do some compute with θ(t+1) Tensor ,not Variable
next_cur_emb=tf.nn.embedding_lookup(next_emb_matrix,[0])
y_ = tf.matmul(next_cur_emb, w)
loss = tf.reduce_mean((y - y_) ** 2)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
grad_var_list=optimizer.compute_gradients(loss)
print(grad_var_list)
# [(None, <tf.Variable 'embedding_name:0' shape=(10, 3) dtype=float32_ref>), (None, <tf.Variable 'docid_lambda:0' shape=(10, 3) dtype=float32_ref>)]
The gradient is None, too. It seems that the tf.scatter_sub op doesn't provide a gradient?
Thanks for your help!
If you are interested in this algorithm you can look it up, but it is not essential to this question.
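For reference, one possible workaround (a sketch of my own, not from the original post) is to express the row update functionally with tf.scatter_nd, which does have a registered gradient, rather than updating the variable in place with tf.scatter_sub:

# Sketch of a differentiable alternative (an assumption, not from the original post):
# build a dense update tensor with tf.scatter_nd and subtract it, so gradients
# can flow back to delta (and therefore to doc_lambda).
delta = cur_emb * cur_lambda                                     # shape (1, 3)
update = tf.scatter_nd(indices=[[0]], updates=delta, shape=[10, 3])
next_emb_matrix = emb_matrix - update                            # same values as scatter_sub on row 0, but differentiable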
I am trying to implement a special DNN architecture to be used for physics-informed machine learning. As you may know, in this architecture partial differential equations are integrated into the loss function. The architecture of interest is plotted below:
As you may see, this special architecture allows us to evaluate differential operations such as K.gradients(model.outputs, model.inputs[0]), which is the partial derivative of Txy with respect to x, and have it as part of the loss function.
Now, I would like to have the following architecture:
As you can see, this network has (x, y) as the input and Txy as an output, followed by gradient operations on Txy and then (Uxy, Vxy) as the final outputs. This architecture, however, results in the following error:
raise ValueError('An operation has `None` for gradient. ')
When I check the gradients of the loss with respect to the weights, I find that they are None right at the layers I define (Grad_Txy_x, Grad_Txy_y).
Does anyone know the source of this error? How can I have intermediate layers that are derivatives of another layer with respect to the inputs?
Edited:
You can try the following code:
import numpy as np
import keras as k
import tensorflow as tf
def custom_gradient(y, x):
    return tf.gradients(y, x, unconnected_gradients='zero')
x = k.layers.Input(shape=(1,), name='x')
y = k.layers.Input(shape=(1,), name='y')
lay = k.layers.Dense(50, name='lay1')(k.layers.concatenate([x,y]))
lay = k.layers.Activation('tanh', name='tanh')(lay)
lay = k.layers.Dense(50, name='lay2')(lay)
Txy = k.layers.Dense(1, name='Txy')(lay)
dT_dx = k.layers.Lambda(lambda F: custom_gradient(F, x)[0], name='dTxy_dx')
dT_dx = dT_dx(Txy)
dT_dy = k.layers.Lambda(lambda F: custom_gradient(F, y)[0], name='dTxy_dy')
dT_dy = dT_dy(Txy)
lay = k.layers.Dense(50, name='lay3')(k.layers.concatenate([dT_dx, dT_dy]))
Uxy = k.layers.Dense(1, name='Uxy')(lay)
Vxy = k.layers.Dense(1, name='Vxy')(lay)
model = k.models.Model([x,y], [Uxy, Vxy])
model.compile(optimizer='adam', loss='mse')
k.utils.plot_model(model, show_shapes=False, to_file='output.png')
for lay in model.layers:
    print(k.backend.gradients(model.total_loss, lay.output))
model.fit([np.ones((10,1)), np.ones((10,1))],
[np.ones((10,1)), np.ones((10,1))])
The gradient evaluations show that the gradients at the Lambda layers are None:
[<tf.Tensor 'gradients/concatenate_1/concat_grad/Slice:0' shape=(?, 1) dtype=float32>]
[<tf.Tensor 'gradients_1/concatenate_1/concat_grad/Slice_1:0' shape=(?, 1) dtype=float32>]
[<tf.Tensor 'gradients_2/lay1/MatMul_grad/MatMul:0' shape=(?, 2) dtype=float32>]
[<tf.Tensor 'gradients_3/tanh/Tanh_grad/TanhGrad:0' shape=(?, 50) dtype=float32>]
[<tf.Tensor 'gradients_4/AddN_1:0' shape=(?, 50) dtype=float32>]
[None]
[None]
[<tf.Tensor 'gradients_7/concatenate_2/concat_grad/Slice:0' shape=(?, 1) dtype=float32>]
[<tf.Tensor 'gradients_8/concatenate_2/concat_grad/Slice_1:0' shape=(?, 1) dtype=float32>]
[<tf.Tensor 'gradients_9/lay3/MatMul_grad/MatMul:0' shape=(?, 2) dtype=float32>]
[<tf.Tensor 'gradients_10/AddN:0' shape=(?, 50) dtype=float32>]
[<tf.Tensor 'gradients_11/loss/Uxy_loss/sub_grad/Reshape:0' shape=(?, 1) dtype=float32>]
[<tf.Tensor 'gradients_12/loss/Vxy_loss/sub_grad/Reshape:0' shape=(?, 1) dtype=float32>]
I added two non-linear activation functions right before the Lambda layers (I left them commented out in the code below so you can spot where they go). It doesn't have to be a Tanh; any non-linearity (e.g. Sigmoid) will do.
import numpy as np
import keras as k
import tensorflow as tf
def custom_gradient(y, x):
    return tf.gradients(y, x, unconnected_gradients='zero')
x = k.layers.Input(shape=(1,), name='x')
y = k.layers.Input(shape=(1,), name='y')
lay = k.layers.Dense(50, name='lay1')(k.layers.concatenate([x,y]))
lay = k.layers.Activation('tanh', name='tanh')(lay)
lay = k.layers.Dense(50, name='lay2')(lay)
#lay = k.layers.Activation('tanh')(lay)
Txy = k.layers.Dense(1, name='Txy')(lay)
#Txy = k.layers.Activation('tanh')(Txy)
dT_dx = k.layers.Lambda(lambda F: custom_gradient(F, x)[0], name='dTxy_dx')
dT_dx = dT_dx(Txy)
dT_dy = k.layers.Lambda(lambda F: custom_gradient(F, y)[0], name='dTxy_dy')
dT_dy = dT_dy(Txy)
lay = k.layers.Dense(50, name='lay3')(k.layers.concatenate([dT_dx, dT_dy]))
Uxy = k.layers.Dense(1, name='Uxy')(lay)
Vxy = k.layers.Dense(1, name='Vxy')(lay)
model = k.models.Model([x,y], [Uxy, Vxy])
model.compile(optimizer='adam', loss='mse')
k.utils.plot_model(model, show_shapes=False, to_file='output.png')
for lay in model.layers:
    print(k.backend.gradients(model.total_loss, lay.output))
model.fit([np.ones((10,1)), np.ones((10,1))],
[np.ones((10,1)), np.ones((10,1))])
I don't know whether this is some kind of bug or an error on my side.
I have also reported this issue here.
What I am trying to do is make my custom LSTM stateful.
This code runs fine without adding return_state=True. Once I add it, it raises this error: The two structures don't have the same nested structure.
This is a reproducible code:
from keras.layers import Lambda
import keras
import numpy as np
import tensorflow as tf
SEQUENCE_LEN = 45
LATENT_SIZE = 20
EMBED_SIZE = 50
VOCAB_SIZE = 100
BATCH_SIZE = 10
def rev_entropy(x):
    def row_entropy(row):
        _, _, count = tf.unique_with_counts(row)
        count = tf.cast(count, tf.float32)
        prob = count / tf.reduce_sum(count)
        prob = tf.cast(prob, tf.float32)
        rev = -tf.reduce_sum(prob * tf.log(prob))
        return rev

    nw = tf.reduce_sum(x, axis=1)
    rev = tf.map_fn(row_entropy, x)
    rev = tf.where(tf.is_nan(rev), tf.zeros_like(rev), rev)
    rev = tf.cast(rev, tf.float32)
    max_entropy = tf.log(tf.clip_by_value(nw, 2, LATENT_SIZE))
    concentration = (max_entropy / (1 + rev))
    new_x = x * (tf.reshape(concentration, [BATCH_SIZE, 1]))
    return new_x
inputs = keras.layers.Input(shape=(SEQUENCE_LEN,), name="input")
embedding = keras.layers.Embedding(output_dim=EMBED_SIZE, input_dim=VOCAB_SIZE, input_length=SEQUENCE_LEN, trainable=True)(inputs)
encoded = keras.layers.Bidirectional(keras.layers.LSTM(LATENT_SIZE,return_state=True), merge_mode="sum", name="encoder_lstm")(embedding)
encoded = Lambda(rev_entropy)(encoded)
decoded = keras.layers.RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = keras.layers.Bidirectional(keras.layers.LSTM(EMBED_SIZE, return_sequences=True,return_state=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = keras.models.Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()
x = np.random.randint(0, 90, size=(10, 45))
print(x.shape)
y = np.random.normal(size=(10, 45, 50))
print(y.shape)
history = autoencoder.fit(x, y, epochs=1)
Update1
After applying the idea from the comments, tf.map_fn(row_entropy, encoded, dtype=tf.float32), I received a new error:
ValueError: Layer repeater expects 1 inputs, but it received 5 input tensors. Input received: [<tf.Tensor 'encoder_lstm/add_16:0' shape=(?, 20) dtype=float32>, <tf.Tensor 'encoder_lstm/while/Exit_3:0' shape=(?, 20) dtype=float32>, <tf.Tensor 'encoder_lstm/while/Exit_4:0' shape=(?, 20) dtype=float32>, <tf.Tensor 'encoder_lstm/while_1/Exit_3:0' shape=(?, 20) dtype=float32>, <tf.Tensor 'encoder_lstm/while_1/Exit_4:0' shape=(?, 20) dtype=float32>]
Also, note that this error is raised even without that Lambda layer, so it seems something else is wrong.
If I check encoded.shape, it says encoded is a list of length 5, whereas it should be a tensor of shape (batch_size, latent_size)!
Everything is fine without adding return_state=True.
Any help is appreciated!
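For reference, a minimal sketch (not from the original post) of what Bidirectional(LSTM(..., return_state=True)) hands back; it matches the five tensors in the error message, i.e. the merged output followed by the forward and backward (h, c) states, so the list needs to be unpacked before it reaches the next layer:

outputs = keras.layers.Bidirectional(
    keras.layers.LSTM(LATENT_SIZE, return_state=True),
    merge_mode="sum", name="encoder_lstm")(embedding)
encoded, fw_h, fw_c, bw_h, bw_c = outputs   # output + forward (h, c) + backward (h, c) states
encoded = Lambda(rev_entropy)(encoded)      # downstream layers now receive a single tensor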
I was trying to implement various GANs in TensorFlow (after doing it successfully in PyTorch), and I am having some problems while coding the discriminator part.
The code of the discriminator (very similar to the MNIST CNN tutorial) is:
def discriminator(x):
    """Compute discriminator score for a batch of input images.

    Inputs:
    - x: TensorFlow Tensor of flattened input images, shape [batch_size, 784]

    Returns:
    TensorFlow Tensor with shape [batch_size, 1], containing the score
    for an image being real for each input image.
    """
    with tf.variable_scope("discriminator"):
        x = tf.reshape(x, [tf.shape(x)[0], 28, 28, 1])
        h_1 = leaky_relu(tf.layers.conv2d(x, 32, 5))
        m_1 = tf.layers.max_pooling2d(h_1, 2, 2)
        h_2 = leaky_relu(tf.layers.conv2d(m_1, 64, 5))
        m_2 = tf.layers.max_pooling2d(h_2, 2, 2)
        m_2 = tf.contrib.layers.flatten(m_2)
        h_3 = leaky_relu(tf.layers.dense(m_2, 4*4*64))
        logits = tf.layers.dense(h_3, 1)
        return logits
while the code for the generator (architecture of InfoGAN paper) is:
def generator(z):
    """Generate images from a random noise vector.

    Inputs:
    - z: TensorFlow Tensor of random noise with shape [batch_size, noise_dim]

    Returns:
    TensorFlow Tensor of generated images, with shape [batch_size, 784].
    """
    with tf.variable_scope("generator"):
        batch_size = tf.shape(z)[0]
        fc = tf.nn.relu(tf.layers.dense(z, 1024))
        bn_1 = tf.layers.batch_normalization(fc)
        fc_2 = tf.nn.relu(tf.layers.dense(bn_1, 7*7*128))
        bn_2 = tf.layers.batch_normalization(fc_2)
        bn_2 = tf.reshape(bn_2, [batch_size, 7, 7, 128])
        c_1 = tf.nn.relu(tf.contrib.layers.convolution2d_transpose(bn_2, 64, 4, 2, padding='valid'))
        bn_3 = tf.layers.batch_normalization(c_1)
        c_2 = tf.tanh(tf.contrib.layers.convolution2d_transpose(bn_3, 1, 4, 2, padding='valid'))
So far, so good. The number of parameters is correct (checked it). However, I am having some problems in the next block of code:
tf.reset_default_graph()
# number of images for each batch
batch_size = 128
# our noise dimension
noise_dim = 96
# placeholder for images from the training dataset
x = tf.placeholder(tf.float32, [None, 784])
# random noise fed into our generator
z = sample_noise(batch_size, noise_dim)
# generated images
G_sample = generator(z)
with tf.variable_scope("") as scope:
#scale images to be -1 to 1
logits_real = discriminator(preprocess_img(x))
# Re-use discriminator weights on new inputs
scope.reuse_variables()
logits_fake = discriminator(G_sample)
# Get the list of variables for the discriminator and generator
D_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'discriminator')
G_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'generator')
# get our solver
D_solver, G_solver = get_solvers()
# get our loss
D_loss, G_loss = gan_loss(logits_real, logits_fake)
# setup training steps
D_train_step = D_solver.minimize(D_loss, var_list=D_vars)
G_train_step = G_solver.minimize(G_loss, var_list=G_vars)
D_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS, 'discriminator')
G_extra_step = tf.get_collection(tf.GraphKeys.UPDATE_OPS, 'generator')
The problem occurs where I do the reshape in the discriminator, and the error says:
ValueError: None values not supported.
Sure, the value of batch_size is None (by the way, I get the same error even when I change it to a fixed number), but the shape function (as far as I understand) should return the dynamic shape, not the static one. I think I am a bit lost here.
For what it's worth, here is the link to the entire notebook I am working on, in case someone wants to look at it: https://github.com/TheRevanchist/GANs/blob/master/GANs-TensorFlow.ipynb
NB: The code here is part of the Stanford CS231n assignment. I have no affiliation with Stanford though, so it isn't homework cheating (proof: the course finished months ago).
The generator seems to be the problem: its output size should match what the discriminator expects. The other issue is that batch norm should be applied before the activation unit. I have modified the code:
with tf.variable_scope("generator"):
fc = tf.layers.dense(z, 4*4*128)
bn_1 = leaky_relu(tf.layers.batch_normalization(fc))
bn_1 = tf.reshape(bn_1, [-1, 4, 4, 128])
c_1 = tf.layers.conv2d_transpose(bn_1, 64, 5, strides=2, padding='same')
bn_2 = leaky_relu(tf.layers.batch_normalization(c_1))
c_2 = tf.layers.conv2d_transpose(bn_2, 32, 5, strides=2, padding='same')
bn_3 = leaky_relu(tf.layers.batch_normalization(c_2))
c_3 = tf.layers.conv2d_transpose(bn_3, 1, 5, strides=2, padding='same')
c_3 = tf.layers.batch_normalization(c_3)
c_3 = tf.image.resize_images(c_3, (28, 28))
c_3 = tf.contrib.layers.flatten(c_3)
c_3 = tf.tanh(c_3)
return c_3
Your code runs and produces the expected output with the above changes.
Instead of passing None to reshape you must pass -1.
So this:
x = tf.reshape(x, [tf.shape(x)[0], 28, 28, 1])
becomes
x = tf.reshape(x, [-1, 28, 28, 1])
and this:
bn_2 = tf.reshape(bn_2, [batch_size, 7, 7, 128])
becomes:
bn_2 = tf.reshape(bn_2, [-1, 7, 7, 128])
It will infer the batch size from the rest of the shape you provided.
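As a quick standalone check of the -1 behaviour (a sketch, not part of the original answer):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
reshaped = tf.reshape(x, [-1, 28, 28, 1])   # -1 lets TensorFlow infer the batch dimension

with tf.Session() as sess:
    out = sess.run(reshaped, feed_dict={x: np.zeros((128, 784), dtype=np.float32)})
    print(out.shape)                        # (128, 28, 28, 1)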