ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (1188, 12, 2)

When I was trying to build an LSTM network, it kept telling me: "ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (1188, 12, 2)".
My dataset has more than 1000 samples and 2 features, and I set the time step to 12.
I have already reshaped my dataset to 3-D; however, the error says that my last layer, the Dense layer (which I use as the output), expected a 2-dimensional array. What shall I do?
My code is as follows:
# read train set
readColsPro = (7, 20)
filename = 'train_set.txt'
xProTrain_1 = readCsv.csvMat(filename, 1, cols=readColsPro, rows=[0, 1200])
yProTrain_1 = readCsv.csvMat(filename, 1, cols=readColsPro, rows=[1, 1201])
xProTrain_1 = xProTrain_1.reshape(xProTrain_1.shape[0], 2)
yProTrain_1 = yProTrain_1.reshape(yProTrain_1.shape[0], 2)

# erase 'nan' data
for i in xProTrain_1:
    if np.isnan(i[1]):
        i[1] = 0
for i in yProTrain_1:
    if np.isnan(i[1]):
        i[1] = 0

# read test set
xProTest_1 = readCsv.csvMat(filename, 1, cols=readColsPro, rows=[1, 1201])
yProTest_1 = readCsv.csvMat(filename, 1, cols=readColsPro, rows=[2, 1202])
xProTest_1 = np.reshape(xProTest_1, (xProTest_1.shape[0], xProTest_1.shape[1]))
yProTest_1 = np.reshape(yProTest_1, (yProTest_1.shape[0], yProTest_1.shape[1]))
for i in xProTest_1:
    if np.isnan(i[1]):
        i[1] = 0
for i in yProTest_1:
    if np.isnan(i[1]):
        i[1] = 0
# parameters
timeStepPro = 12
epoch = 24
batch_size = 24
trainNumPro = xProTrain_1.shape[0]
testNumPro = yProTrain_1.shape[0]

# reshape data to 3D
xProTrain_2 = []
for i in range(timeStepPro, trainNumPro):
    xProTrain_2.append(xProTrain_1[i - timeStepPro:i])
xProTrain_2 = np.array(xProTrain_2)
yProTrain_2 = []
for i in range(timeStepPro, trainNumPro):
    yProTrain_2.append(yProTrain_1[i - timeStepPro:i])
yProTrain_2 = np.array(yProTrain_2)
print(xProTrain_2.shape)
print(yProTrain_2.shape)

# reshape data to 3D
xProTest_2 = []
for i in range(timeStepPro, trainNumPro):
    xProTest_2.append(xProTest_1[i - timeStepPro:i])
xProTest_2 = np.array(xProTest_2)
yProTest_2 = []
for i in range(timeStepPro, trainNumPro):
    yProTest_2.append(yProTest_1[i - timeStepPro:i])
yProTest_2 = np.array(yProTest_2)
# define network
modelA = Sequential()
modelA.add(LSTM(units=64, return_sequences=True,
                input_shape=[xProTrain_2.shape[1], 2]))
modelA.add(BatchNormalization())
modelA.add(LSTM(units=128, return_sequences=True))
modelA.add(LSTM(units=128, return_sequences=True))
modelA.add(LSTM(units=256, return_sequences=True))
modelA.add(LSTM(units=64, return_sequences=False))
modelA.add(Dense(units=2, activation='relu'))
modelA.compile(optimizer='adam',
               loss='mean_squared_error',
               metrics=['accuracy'])
modelA.fit(x=xProTrain_2, y=yProTrain_2, epochs=epoch, batch_size=batch_size)
The error message is as follows:
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (1188, 12, 2)
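The shapes in the error are consistent with the model: the last LSTM layer has return_sequences=False, so the network emits one 2-D (batch, 2) prediction per sample, while yProTrain_2 is built as 3-D (1188, 12, 2) windows. A minimal sketch of one way to reconcile them, assuming the goal is to predict the single step that follows each 12-step window (the same change would apply to the test targets):

# Sketch, not the asker's confirmed fix: make each target the one row that
# follows its 12-step input window, so y has shape (samples, 2) and matches
# the Dense(units=2) output of the final LSTM with return_sequences=False.
yProTrain_2 = []
for i in range(timeStepPro, trainNumPro):
    yProTrain_2.append(yProTrain_1[i])  # a single row, not a 12-row slice
yProTrain_2 = np.array(yProTrain_2)     # shape: (1188, 2)
# Alternatively, slice the last timestep of the existing 3-D targets
# (note this picks row i-1 rather than row i):
# yProTrain_2 = yProTrain_2[:, -1, :]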

Related

Keras Functional API Multiple Input Shape Errors

My goal is to use a CNN to go through a picture, then add an array of extra data before the dense layers.
picIn = keras.Input(shape=x[0].shape)
conv1 = layers.Conv2D(32,kernel_size=3,padding='same',use_bias=False)(picIn)
batch1 = layers.BatchNormalization()(conv1)
leaky1 = layers.LeakyReLU(alpha=.3)(batch1)
conv2 = layers.Conv2D(32,kernel_size=3,padding='same',use_bias=False)(leaky1)
batch2 = layers.BatchNormalization()(conv2)
leaky2 = layers.LeakyReLU(alpha=.3)(batch2)
cdrop1 = layers.Dropout(.20)(leaky2)
conv3= layers.Conv2D(64,kernel_size=3,padding='same',use_bias=False)(cdrop1)
batch3 = layers.BatchNormalization()(conv3)
leaky3 = layers.LeakyReLU(alpha=.3)(batch3)
conv4 = layers.Conv2D(64,kernel_size=3,padding='same',use_bias=False)(leaky3)
batch4 = layers.BatchNormalization()(conv4)
leaky4 = layers.LeakyReLU(alpha=.3)(batch4)
cdrop2 = layers.Dropout(.20)(leaky4)
flat1 = layers.Flatten()(cdrop2)
rtheta1 = rtheta[trainCut]
rtheta1 = rtheta1.reshape(467526,1)
rtheta2 = rtheta[testCut]
rtheta2 = rtheta2.reshape(82247,1)
ip2 = keras.Input(shape=rtheta1.shape)
flat2 = layers.Flatten()(ip2)
merge = layers.Concatenate()([flat1,flat2])
hidden1 = layers.Dense(512,use_bias=False)(merge)
batch5 = layers.BatchNormalization()(hidden1)
leaky5 = layers.LeakyReLU(alpha=.3)(batch5)
ddrop1 = layers.Dropout(.20)(leaky5)
hidden2 = layers.Dense(512,use_bias=False)(ddrop1)
batch6 = layers.BatchNormalization()(hidden2)
leaky6 = layers.LeakyReLU(alpha=.3)(batch6)
ddrop2 = layers.Dropout(.20)(leaky6)
hidden3 = layers.Dense(512,use_bias=False)(merge)
batch7 = layers.BatchNormalization()(hidden1)
leaky7 = layers.LeakyReLU(alpha=.3)(batch5)
ddrop3 = layers.Dropout(.20)(leaky5)
output = layers.Dense(1)(ddrop3)
model = keras.Model(inputs = [picIn,ip2], outputs = output)
H = model.fit(x=[x[trainCut], rtheta[trainCut]],
              y=y[trainCut],
              batch_size=args.bsize,
              validation_data=([x[testCut], rtheta[testCut]], y[testCut]),
              epochs=args.epochs)
I always get an error related to the shape of the inputs
Input 0 of layer dense is incompatible with the layer: expected axis -1 of input shape to have value 473926 but received input with shape [None, 6401]
Model was constructed with shape (None, 467526, 1) for input Tensor("input_2:0", shape=(None, 467526, 1), dtype=float32), but it was called on an input with incompatible shape (None, 1, 1).
I'm confused about what exactly to do here.
x[trainCut] is a matrix of size (467526, 10, 10, 2).
rtheta1 is (467526, 1), and so is y[trainCut].
The validation data is the same, except with 82247 samples instead of 467526.
I have tried it without flattening after ip2 and I get a different error, but I think the core issue is still the same.
Any help would be appreciated. Thanks!
Edit: The data was not the right shape, obviously, but I figured out how to fix it.
Are you ensuring that all of your training data's shape is uniform before you put it through and into the first tensor?
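Both error messages point at the second input: keras.Input(shape=rtheta1.shape) builds an input of shape (None, 467526, 1) because it includes the sample count, whereas Keras expects the per-sample shape without the batch dimension. A sketch of the likely fix, under the assumption that each sample contributes a single rtheta scalar:

# Per-sample shape only: each sample carries one extra scalar feature, so
# the input is (None, 1) rather than (None, 467526, 1).
ip2 = keras.Input(shape=(1,))
flat2 = layers.Flatten()(ip2)  # stays (None, 1), matching flat1's rank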

Error on tensorflow: Shape must be rank 2 but is rank 1 for 'MatMul_25'

I'm trying to create a conditional GAN. However, I'm stuck as to why, no matter what I do, the same error appears over and over again.
Here's the code:
image_dim = 784  # 28 * 28
Y_dimension = 10
gen_hidd_dim = 256
disc_hidd_dim = 256
z_noise_dim = 100  # input noise datapoint

def xavier_init(shape):
    return tf.random_normal(shape=shape, stddev=1/tf.sqrt(shape[0]/2.0))

weights = {
    'disc_H': tf.Variable(xavier_init([image_dim + Y_dimension, disc_hidd_dim])),
    'disc_final': tf.Variable(xavier_init([disc_hidd_dim, 1])),
    'gen_H': tf.Variable([z_noise_dim + Y_dimension, gen_hidd_dim]),
    'gen_final': tf.Variable(xavier_init([gen_hidd_dim, image_dim]))
}
bias = {
    'disc_H': tf.Variable(xavier_init([disc_hidd_dim])),
    'disc_final': tf.Variable(xavier_init([1])),
    'gen_H': tf.Variable(xavier_init([gen_hidd_dim])),
    'gen_final': tf.Variable(xavier_init([image_dim]))
}

Z_input = tf.placeholder(tf.float32, shape=[None, z_noise_dim], name='input_noise')
Y_input = tf.placeholder(tf.float32, shape=[None, Y_dimension], name='Labels')
X_input = tf.placeholder(tf.float32, shape=[None, image_dim], name='real_input')

def Discriminator(x, y):
    inputs = tf.concat(axis=1, values=[x, y])
    hidden_layer = tf.nn.relu(tf.add(tf.matmul(inputs, weights['disc_H']), bias['disc_H']))
    final_layer = tf.add(tf.matmul(hidden_layer, weights['disc_final']), bias['disc_final'])
    disc_output = tf.nn.sigmoid(final_layer)
    return final_layer, disc_output

def Generator(x, y):
    inputs = tf.concat(axis=1, values=[x, y])
    hidden_layer = tf.nn.relu(tf.add(
        tf.matmul(tf.cast(inputs, tf.float32), tf.cast(weights['gen_H'], tf.float32)),
        tf.cast(bias['gen_H'], tf.float32)))
    final_layer = tf.add(tf.matmul(hidden_layer, weights['gen_final']), bias['gen_final'])
    gen_output = tf.nn.sigmoid(final_layer)
    return gen_output

output_Gen = Generator(Z_input, Y_input)
Right after executing the Generator I get the following error:
ValueError: Shape must be rank 2 but is rank 1 for 'MatMul_25' (op: 'MatMul') with input shapes: [?,110], [2].
What to do?
I think you just missed one call to xavier_init() when initialising your weights.
You have this:
weights = {
    'disc_H': tf.Variable(xavier_init([image_dim + Y_dimension, disc_hidd_dim])),
    'disc_final': tf.Variable(xavier_init([disc_hidd_dim, 1])),
    'gen_H': tf.Variable([z_noise_dim + Y_dimension, gen_hidd_dim]),
    'gen_final': tf.Variable(xavier_init([gen_hidd_dim, image_dim]))
}
but I think you want this:
weights = {
    'disc_H': tf.Variable(xavier_init([image_dim + Y_dimension, disc_hidd_dim])),
    'disc_final': tf.Variable(xavier_init([disc_hidd_dim, 1])),
    'gen_H': tf.Variable(xavier_init([z_noise_dim + Y_dimension, gen_hidd_dim])),
    'gen_final': tf.Variable(xavier_init([gen_hidd_dim, image_dim]))
}
The error arose because weights['gen_H'] had shape [2], whereas you expected it to have shape [110, 256]: without xavier_init(), tf.Variable([z_noise_dim + Y_dimension, gen_hidd_dim]) creates a variable initialized to the 1-D constant [110, 256], i.e. a tensor of shape [2]. This meant the call to tf.matmul() failed, because it's impossible to matrix-multiply a matrix of shape [m, 110] by a tensor of shape [2].
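A quick sanity check along those lines (a sketch, not part of the answer):

# With xavier_init() restored, the generator weight is a rank-2 [110, 256]
# matrix instead of the rank-1 constant [110, 256] (shape [2]) that
# tf.Variable([...]) creates.
gen_H_fixed = tf.Variable(xavier_init([z_noise_dim + Y_dimension, gen_hidd_dim]))
print(gen_H_fixed.shape)  # (110, 256), so tf.matmul is now well-defined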

How can I use mxnet warpctc with the right dimensions?

FC = mx.sym.FullyConnected(data=x_3, flatten=False, num_hidden=n_class)
x = mx.sym.softmax(data=FC)
sm_label = mx.sym.Reshape(data=label, shape=(-1,))
sm_label = mx.sym.Cast(data=sm_label, dtype='int32')
sm = mx.sym.WarpCTC(data=x, label=sm_label, label_length=n_len,
                    input_length=rnn_length)
My x layer's shape is [(32L, 35L, 27L)] (batchsize, input_length, n_class), and the label's shape is [(32L, 21L)] (batchsize, label_length).
warpctc
simple_bind error.
Arguments:
data: (32, 1L, 32L, 286L)
label: (32, 21L)
Error in operator warpctc48: Shape inconsistent, Provided = [672], inferred shape=[0,1]
What can I do?
The MXNet repo has a WarpCTC example here. You can run the training using python lstm_ocr_train.py --gpu 1 --num_proc 4 --loss warpctc font/Ubuntu-M.ttf. In the example, here are the shapes of the prediction and label used with the WarpCTC operator:
Prediction is (10240, 11)
Label is (512,)
label_length: 4
input_length: 80
batch_size = 128
seq_length = 80
In the above case,
Prediction is (batch_size*seq_length, n_class).
Label is (batch_size*label_length,).
Following along the lines of the example, I would suggest calling WarpCTC with prediction shape = (1120, 27), label shape = (672,), label_length = 21, input_length = 35 in your case.
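In symbols, that suggestion might look like the following sketch (assuming x is the (32, 35, 27) softmax output and label the (32, 21) label batch from the question):

# Flatten batch and time into one axis for the prediction, and flatten the
# labels, matching the (batch*seq, n_class) / (batch*label_len,) convention
# of the example.
pred = mx.sym.Reshape(data=x, shape=(-1, n_class))    # (32*35, 27) = (1120, 27)
flat_label = mx.sym.Reshape(data=label, shape=(-1,))  # (32*21,)    = (672,)
sm = mx.sym.WarpCTC(data=pred, label=flat_label,
                    label_length=21, input_length=35)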

Forward pass in LSTM network learned by Keras

I have the following code, from which I am hoping to get a forward pass through a 2-layer LSTM:
"""
this is a simple numerical example of LSTM forward pass to allow deep understanding
the LSTM is trying to learn the sin function by learning to predict the next value after a sequence of 3 inputs
example 1: {0.583, 0.633, 0.681} --> {0.725}, these values correspond to
{sin(35.66), sin(39.27), sin(42.92)} --> {sin(46.47)}
example 2: {0.725, 0.767, 0.801} --> {0.849}, these values correspond to
{sin(46.47), sin(50.09), sin(53.23)} --> {sin(58.10)}
example tested: [[['0.725323664']
['0.7671179']
['0.805884672']]]
predicted_instance: [ 0.83467698]
training example pair: [['0.680666907']
['0.725323664']
['0.7671179']] 0.805884672
"""
import numpy as np

# linear activation matrix-wise (works also element-wise)
def linear(x):
    return x

# sigmoid function matrix-wise (works also element-wise)
def sigmoid(x):
    return 1/(1 + np.exp(-x))

# hard sigmoid function element-wise
def hard_sig(x):
    # in Keras for both tensorflow and theano backend
    return np.max(np.array([0.0, np.min(np.array([1.0, x * 0.2 + 0.5]))]))
    # Courbariaux et al. 2016 (Binarized Neural Networks)
    # return np.max(np.array([0.0, np.min(np.array([1.0, (x + 1.0)/2.0]))]))

# hard sigmoid function matrix-wise
def hard_sigmoid(x, fun=hard_sig):
    return np.vectorize(fun)(x)

# hyperbolic tangent function matrix-wise (works also element-wise)
def hyperbolic_tangent(x):
    return (np.exp(x) - np.exp(-x))/(np.exp(x) + np.exp(-x))

print(sigmoid(np.array([-100, 0, 100])))
print(hard_sigmoid(np.array([-100, 0, 0.1, 100])))
print(hyperbolic_tangent(np.array([-100, 0, 100])))
parameter_names = ['lstm_1_kernel_0.npy',
                   'lstm_1_recurrent_kernel_0.npy',
                   'lstm_1_bias_0.npy',
                   'lstm_2_kernel_0.npy',
                   'lstm_2_recurrent_kernel_0.npy',
                   'lstm_2_bias_0.npy',
                   'dense_1_kernel_0.npy',
                   'dense_1_bias_0.npy']
# LSTM 1 Weights
lstm_1_kernel_0 = np.load('lstm_1_kernel_0.npy')
print('lstm_1_kernel_0: ', lstm_1_kernel_0.shape)
lstm_1_recurrent_kernel_0 = np.load('lstm_1_recurrent_kernel_0.npy')
print('lstm_1_recurrent_kernel_0: ', lstm_1_recurrent_kernel_0.shape)
lstm_1_bias_0 = np.load('lstm_1_bias_0.npy')
print('lstm_1_bias_0: ', lstm_1_bias_0.shape)
# LSTM 2 Weights
lstm_2_kernel_0 = np.load('lstm_2_kernel_0.npy')
print('lstm_2_kernel_0: ', lstm_2_kernel_0.shape)
lstm_2_recurrent_kernel_0 = np.load('lstm_2_recurrent_kernel_0.npy')
print('lstm_2_recurrent_kernel_0: ', lstm_2_recurrent_kernel_0.shape)
lstm_2_bias_0 = np.load('lstm_2_bias_0.npy')
print('lstm_2_bias_0: ', lstm_2_bias_0.shape)
# Dense layer
dense_1_kernel_0 = np.load('dense_1_kernel_0.npy')
print('dense_1_kernel_0: ', dense_1_kernel_0.shape)
dense_1_bias_0 = np.load('dense_1_bias_0.npy')
print('dense_1_bias_0: ', dense_1_bias_0.shape)
time_seq = [0, 1, 2]
"""
input_seq = np.array([[[0.725323664],
                       [0.7671179],
                       [0.805884672]]])
"""
input_seq = np.array([[[0.680666907],
                       [0.725323664],
                       [0.7671179]]])
print('input_seq: ', input_seq.shape)
for time in time_seq:
    print('input t', time, ':', input_seq[0, time, 0])
"""
# z0 = z[:, :self.units]
# z1 = z[:, self.units: 2 * self.units]
# z2 = z[:, 2 * self.units: 3 * self.units]
# z3 = z[:, 3 * self.units:]
# i = self.recurrent_activation(z0)
# f = self.recurrent_activation(z1)
# c = f * c_tm1 + i * self.activation(z2)
# o = self.recurrent_activation(z3)
# activation =' tanh'
# recurrent_activation = 'hard_sigmoid'
"""
# LSTM 1
x_1_lstm_1 = input_seq[0, 0, 0]
print('x_1: ', x_1_lstm_1)
x_2_lstm_1 = input_seq[0, 1, 0]
print('x_2: ', x_2_lstm_1)
x_3_lstm_1 = input_seq[0, 2, 0]
print('x_3: ', x_3_lstm_1)
c_0_lstm_1 = np.zeros((1, 3))
h_0_lstm_1 = np.zeros((1, 3))
z_1_lstm_1 = np.dot(x_1_lstm_1, lstm_1_kernel_0) + np.dot(h_0_lstm_1, lstm_1_recurrent_kernel_0) + lstm_1_bias_0
print(z_1_lstm_1.shape)
i_1_lstm_1 = sigmoid(z_1_lstm_1[:, 0:3])
f_1_lstm_1 = sigmoid(z_1_lstm_1[:, 3:6])
input_to_c_1_lstm_1 = z_1_lstm_1[:, 6:9]
o_1_lstm_1 = sigmoid(z_1_lstm_1[:, 9:12])
c_1_lstm_1 = np.multiply(f_1_lstm_1, c_0_lstm_1) + np.multiply(i_1_lstm_1, hyperbolic_tangent(input_to_c_1_lstm_1))
h_1_lstm_1 = np.multiply(o_1_lstm_1, hyperbolic_tangent(c_1_lstm_1))
print('h_1_lstm_1: ', h_1_lstm_1.shape, h_1_lstm_1)
z_2_lstm_1 = np.dot(x_2_lstm_1, lstm_1_kernel_0) + np.dot(h_1_lstm_1, lstm_1_recurrent_kernel_0) + lstm_1_bias_0
print(z_2_lstm_1.shape)
i_2_lstm_1 = sigmoid(z_2_lstm_1[:, 0:3])
f_2_lstm_1 = sigmoid(z_2_lstm_1[:, 3:6])
input_to_c_2_lstm_1 = z_2_lstm_1[:, 6:9]
o_2_lstm_1 = sigmoid(z_2_lstm_1[:, 9:12])
c_2_lstm_1 = np.multiply(f_2_lstm_1, c_1_lstm_1) + np.multiply(i_2_lstm_1, hyperbolic_tangent(input_to_c_2_lstm_1))
h_2_lstm_1 = np.multiply(o_2_lstm_1, hyperbolic_tangent(c_2_lstm_1))
print('h_2_lstm_1: ', h_2_lstm_1.shape, h_2_lstm_1)
z_3_lstm_1 = np.dot(x_3_lstm_1, lstm_1_kernel_0) + np.dot(h_2_lstm_1, lstm_1_recurrent_kernel_0) + lstm_1_bias_0
print(z_3_lstm_1.shape)
i_3_lstm_1 = sigmoid(z_3_lstm_1[:, 0:3])
f_3_lstm_1 = sigmoid(z_3_lstm_1[:, 3:6])
input_to_c_3_lstm_1 = z_3_lstm_1[:, 6:9]
o_3_lstm_1 = sigmoid(z_3_lstm_1[:, 9:12])
c_3_lstm_1 = np.multiply(f_3_lstm_1, c_2_lstm_1) + np.multiply(i_3_lstm_1, hyperbolic_tangent(input_to_c_3_lstm_1))
h_3_lstm_1 = np.multiply(o_3_lstm_1, hyperbolic_tangent(c_3_lstm_1))
print('h_3_lstm_1: ', h_3_lstm_1.shape, h_3_lstm_1)
# LSTM 2
x_1_lstm_2 = h_1_lstm_1
x_2_lstm_2 = h_2_lstm_1
x_3_lstm_2 = h_3_lstm_1
c_0_lstm_2 = np.zeros((1, 1))
h_0_lstm_2 = np.zeros((1, 1))
z_1_lstm_2 = np.dot(x_1_lstm_2, lstm_2_kernel_0) + np.dot(h_0_lstm_2, lstm_2_recurrent_kernel_0) + lstm_2_bias_0
print(z_1_lstm_2.shape)
i_1_lstm_2 = sigmoid(z_1_lstm_2[:, 0])
f_1_lstm_2 = sigmoid(z_1_lstm_2[:, 1])
input_to_c_1_lstm_2 = z_1_lstm_2[:, 2]
o_1_lstm_2 = sigmoid(z_1_lstm_2[:, 3])
c_1_lstm_2 = np.multiply(f_1_lstm_2, c_0_lstm_2) + np.multiply(i_1_lstm_2, hyperbolic_tangent(input_to_c_1_lstm_2))
h_1_lstm_2 = np.multiply(o_1_lstm_2, hyperbolic_tangent(c_1_lstm_2))
print('h_1_lstm_2: ', h_1_lstm_2.shape, h_1_lstm_2)
z_2_lstm_2 = np.dot(x_2_lstm_2, lstm_2_kernel_0) + np.dot(h_1_lstm_2, lstm_2_recurrent_kernel_0) + lstm_2_bias_0
print(z_2_lstm_2.shape)
i_2_lstm_2 = sigmoid(z_2_lstm_2[:, 0])
f_2_lstm_2 = sigmoid(z_2_lstm_2[:, 1])
input_to_c_2_lstm_2 = z_2_lstm_2[:, 2]
o_2_lstm_2 = sigmoid(z_2_lstm_2[:, 3])
c_2_lstm_2 = np.multiply(f_2_lstm_2, c_1_lstm_2) + np.multiply(i_2_lstm_2, hyperbolic_tangent(input_to_c_2_lstm_2))
h_2_lstm_2 = np.multiply(o_2_lstm_2, hyperbolic_tangent(c_2_lstm_2))
print('h_2_lstm_2: ', h_2_lstm_2.shape, h_2_lstm_2)
z_3_lstm_2 = np.dot(x_3_lstm_2, lstm_2_kernel_0) + np.dot(h_2_lstm_2, lstm_2_recurrent_kernel_0) + lstm_2_bias_0
print(z_3_lstm_2.shape)
i_3_lstm_2 = sigmoid(z_3_lstm_2[:, 0])
f_3_lstm_2 = sigmoid(z_3_lstm_2[:, 1])
input_to_c_3_lstm_2 = z_3_lstm_2[:, 2]
o_3_lstm_2 = sigmoid(z_3_lstm_2[:, 3])
c_3_lstm_2 = np.multiply(f_3_lstm_2, c_2_lstm_2) + np.multiply(i_3_lstm_2, hyperbolic_tangent(input_to_c_3_lstm_2))
h_3_lstm_2 = np.multiply(o_3_lstm_2, hyperbolic_tangent(c_3_lstm_2))
print('h_3_lstm_2: ', h_3_lstm_2.shape, h_3_lstm_2)
output = np.dot(h_3_lstm_2, dense_1_kernel_0) + dense_1_bias_0
print('output: ', output)
The weights were saved to file at train time and can be retrieved from the following location:
LSTM weights
In order to create the LSTM that fits a sine-wave signal, I used the following code in Keras:
def build_simple_model(layers):
    model = Sequential()
    model.add(LSTM(input_shape=(layers[1], layers[0]),
                   output_dim=layers[1],
                   return_sequences=True,
                   activation='tanh',
                   recurrent_activation='sigmoid'))  # 'hard_sigmoid'
    # model.add(Dropout(0.2))
    model.add(LSTM(layers[2],
                   return_sequences=False,
                   activation='tanh',
                   recurrent_activation='sigmoid'))  # 'hard_sigmoid'
    # model.add(Dropout(0.2))
    model.add(Dense(output_dim=layers[3]))
    model.add(Activation("linear"))
    start = time.time()
    model.compile(loss="mse", optimizer="rmsprop")
    print("> Compilation Time : ", time.time() - start)
    plot_model(model, to_file='lstm_model.png', show_shapes=True, show_layer_names=True)
    print(model.summary())
    return model
This resulted in the following model:
I used the following training procedure:
seq_len = 3
model = lstm.build_simple_model([1, seq_len, 1, 1])
model.fit(X_train,
          y_train,
          batch_size=512,
          nb_epoch=epochs,
          validation_split=0.05)
Would it be possible to understand why my forward pass does not produce the desired output, i.e. predicting a future sin() signal value based on three previous consecutive ones?
The original example on which I am trying to base my forward-pass exercise originates here. The weights uploaded in .npy format are from a network that is able to perfectly predict the next sin() value in a series.
I realised what the problem was. I was trying to extract my model weights using a TensorFlow session (after model fitting), rather than via Keras methods directly. This resulted in weight matrices that made perfect sense dimension-wise, but contained the values from the initialization step.
model.fit(X_train,
          y_train,
          batch_size=batch_size,
          nb_epoch=epochs,
          validation_split=0.05,
          callbacks=callbacks_list)

print('n_parameters: ', len(model.weights))
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

parameter_names = ['lstm_1_kernel_0',
                   'lstm_1_recurrent_kernel_0',
                   'lstm_1_bias_0',
                   'lstm_2_kernel_0',
                   'lstm_2_recurrent_kernel_0',
                   'lstm_2_bias_0',
                   'dense_1_kernel_0',
                   'dense_1_bias_0']

weights = model.get_weights()
trainable_weights = model.trainable_weights
for parameter in range(len(model.weights)):
    print('')
    # using Keras methods is the correct way
    print('parameter: ', trainable_weights[parameter])
    print('parameter Keras: ', weights[parameter])
    # using a session with TF is the wrong way
    print('parameter TF: ', model.weights[parameter].eval(session=sess))
    # np.save(parameter_names[parameter], model.weights[parameter].eval(session=sess))
    # np.save(parameter_names[parameter], weights[parameter])
This prints the following to screen:
parameter: <tf.Variable 'lstm_1/kernel:0' shape=(1, 12) dtype=float32_ref>
parameter Keras: [[ 0.02005039 0.59627813 -0.77670902 -0.17643917 0.64905447 -0.49418128
0.01204901 0.79791737 -1.58887422 -0.3566488 0.67758918 0.77245694]]
parameter TF: [[-0.20346385 -0.07166874 -0.58842945 0.03744811 0.46911311 -0.0469712
-0.07291448 0.27316415 -0.53298378 0.08367682 0.10194337 0.20933461]]
parameter: <tf.Variable 'lstm_1/recurrent_kernel:0' shape=(3, 12) dtype=float32_ref>
parameter Keras: [[ 0.01916649 -0.30881727 -0.07018201 0.28770521 -0.45713434 -0.33738521
0.53091544 -0.78456688 0.50647908 0.12326431 -0.18517831 -0.28752103]
[ 0.44490865 -0.09020164 1.00983524 0.43070397 -0.14646551 -0.53908533
1.33833826 0.76106179 -1.28808987 0.71029669 -0.19338571 -0.30499896]
[ 0.76727188 -0.10291406 0.53285897 0.31021088 0.46876401 0.04961515
0.0573149 1.17765784 -0.45716232 0.26181531 0.60458028 -0.6042906 ]]
parameter TF: [[-0.044281 -0.42013288 -0.06702472 0.16710882 0.07229936 0.20263752
0.01935999 -0.65925431 0.21676332 0.02481769 0.50321299 -0.08369029]
[-0.17725646 -0.14031938 -0.07758044 -0.39292315 0.36675838 -0.20198873
0.59491426 -0.12469263 0.14705807 0.39603388 -0.25511321 -0.01221756]
[ 0.51603764 0.34401873 0.36002275 0.05344227 -0.00293417 -0.36086732
0.1636388 -0.24916036 0.09064917 -0.04246153 0.05563453 -0.5006755 ]]
parameter: <tf.Variable 'lstm_1/bias:0' shape=(12,) dtype=float32_ref>
parameter Keras: [ 3.91339064e-01 -2.09703773e-01 -4.88098420e-04 1.15376031e+00
6.24452651e-01 2.24053934e-01 4.06851530e-01 4.78419960e-01
1.77846551e-01 3.19107175e-01 5.16630232e-01 -2.22970009e-01]
parameter TF: [ 0. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0.]
parameter: <tf.Variable 'lstm_2/kernel:0' shape=(3, 4) dtype=float32_ref>
parameter Keras: [[ 2.01334882 1.9168334 1.77633524 -0.90856379]
[ 1.17618477 1.02978265 -0.06435115 0.66180402]
[-1.33014703 -0.71629387 -0.87376142 1.35648465]]
parameter TF: [[ 0.83115911 0.72150767 0.51600969 -0.52725452]
[ 0.53043616 0.59162521 -0.59219611 0.0951736 ]
[-0.8030411 -0.00424314 -0.06715947 0.67533839]]
parameter: <tf.Variable 'lstm_2/recurrent_kernel:0' shape=(1, 4) dtype=float32_ref>
parameter Keras: [[-0.09348518 -0.7667768 0.24031806 -0.39155772]]
parameter TF: [[-0.085137 -0.59010917 0.61000961 -0.52193022]]
parameter: <tf.Variable 'lstm_2/bias:0' shape=(4,) dtype=float32_ref>
parameter Keras: [ 1.21466994 2.22224903 1.34946632 0.19186479]
parameter TF: [ 0. 1. 0. 0.]
parameter: <tf.Variable 'dense_1/kernel:0' shape=(1, 1) dtype=float32_ref>
parameter Keras: [[ 2.69569159]]
parameter TF: [[ 1.5422312]]
parameter: <tf.Variable 'dense_1/bias:0' shape=(1,) dtype=float32_ref>
parameter Keras: [ 0.20767514]
parameter TF: [ 0.]
The forward pass code was therefore correct. The weights were wrong. The correct weights .npy files have also been updated at the link mentioned in the question. This forward pass can be used to illustrate sequence generation with LSTM by recycling the output.
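To illustrate that last point, a minimal sketch of the recycling loop (assuming a trained model that takes windows of shape (1, 3, 1); the seed window is the example from the question):

# Generate future sin() values by feeding each prediction back in as the
# newest element of the 3-step window.
window = np.array([[[0.680666907], [0.725323664], [0.7671179]]])
generated = []
for _ in range(10):
    next_val = model.predict(window)[0, 0]
    generated.append(next_val)
    # drop the oldest step, append the prediction as the newest step
    window = np.concatenate([window[:, 1:, :],
                             np.array(next_val).reshape(1, 1, 1)], axis=1)
print(generated)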

ctc_loss error "No valid path found."

Training a model with tf.nn.ctc_loss produces an error every time the train op is run:
tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
Unlike in previous questions about this function, this is not due to divergence. I have a low learning rate, and the error occurs on even the first train op.
The model is a CNN -> LSTM -> CTC. Here is the model creation code:
# Build Graph
self.videoInput = tf.placeholder(shape=(None, self.maxVidLen, 50, 100, 3), dtype=tf.float32)
self.videoLengths = tf.placeholder(shape=(None), dtype=tf.int32)
self.keep_prob = tf.placeholder(dtype=tf.float32)
self.targets = tf.sparse_placeholder(tf.int32)
self.targetLengths = tf.placeholder(shape=(None), dtype=tf.int32)
conv1 = tf.layers.conv3d(self.videoInput ...)
pool1 = tf.layers.max_pooling3d(conv1 ...)
conv2 = ...
pool2 = ...
conv3 = ...
pool3 = ...
cnn_out = tf.reshape(pool3, shape=(-1, self.maxVidLen, 4*7*96))
fw_cell = tf.nn.rnn_cell.MultiRNNCell([self.cell() for _ in range(3)])
bw_cell = tf.nn.rnn_cell.MultiRNNCell([self.cell() for _ in range(3)])
outputs, _ = tf.nn.bidirectional_dynamic_rnn(
    fw_cell, bw_cell, cnn_out, sequence_length=self.videoLengths, dtype=tf.float32)
outputs = tf.concat(outputs, 2)
outputs = tf.reshape(outputs, [-1, self.hidden_size * 2])
w = tf.Variable(tf.random_normal((self.hidden_size * 2, len(self.char2index) + 1), stddev=0.2))
b = tf.Variable(tf.zeros(len(self.char2index) + 1))
out = tf.matmul(outputs, w) + b
out = tf.reshape(out, [-1, self.maxVidLen, len(self.char2index) + 1])
out = tf.transpose(out, [1, 0, 2])
cost = tf.reduce_mean(tf.nn.ctc_loss(self.targets, out, self.targetLengths))
self.train_op = tf.train.AdamOptimizer(0.0001).minimize(cost)
And here is the feed dict creation code:
indices = []
values = []
shape = [len(vids) * 2, self.maxLabelLen]
vidInput = np.zeros((len(vids) * 2, self.maxVidLen, 50, 100, 3), dtype=np.float32)

# Actual video, then left-right flip
for j in range(len(vids) * 2):
    # k is the video index
    k = j if j < len(vids) else j - len(vids)
    # convert video and label to input format
    vidInput[j, 0:len(vids[k])] = vids[k] if k == j else vids[k][:, ::-1, :]
    indices.extend([j, i] for i in range(len(labelList[k])))
    values.extend(self.char2index[c] for c in labelList[k])

fd[self.targets] = (indices, values, shape)
fd[self.videoInput] = vidInput

# Collect video lengths and label lengths
vidLengths = [len(j) for j in vids] + [len(j) for j in vids]
labelLens = [len(l) for l in labelList] + [len(l) for l in labelList]
fd[self.videoLengths] = vidLengths
fd[self.targetLengths] = labelLens
It turns out that ctc_loss requires the label lengths to be shorter than the input lengths. If the label lengths are too long, the loss calculator cannot unroll completely and therefore cannot compute the loss.
For example, the label BIFI would require an input length of at least 4, while the label BIIF would require an input length of at least 5, due to a blank being inserted between the repeated symbols.
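A pre-flight check based on this constraint might look like the following sketch (min_input_length is an illustrative helper, not from the question; labelList and vidLengths are the question's variables):

def min_input_length(label):
    # each pair of repeated adjacent symbols needs one extra frame for the blank
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

for label, in_len in zip(labelList, vidLengths):
    assert min_input_length(label) <= in_len, (label, in_len)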
I had the same issue, but I soon realized it was just because I was using glob and my label was taken from the filename, so the label length was exceeding the input length.
You can fix this issue by using:
os.path.join(*(filename.split(os.path.sep)[noOfDir:]))
For me the problem was fixed by setting preprocess_collapse_repeated=True.
FWIW: my target sequence length was already shorter than the inputs, and the RNN outputs are those of a softmax.
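For reference, a sketch of where that flag goes in the question's graph (other arguments unchanged):

cost = tf.reduce_mean(tf.nn.ctc_loss(self.targets, out, self.targetLengths,
                                     preprocess_collapse_repeated=True))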
Another possible reason, which I found in my case, is that the input data range was not normalized to 0~1. Because of that, the LSTM activation function became saturated at the beginning of training, which somehow caused the "No valid path found" log.
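A minimal sketch of that normalization, under the assumption that the inputs are raw 0-255 pixel frames like the question's vidInput:

# scale inputs into [0, 1] before building the feed dict
vidInput = vidInput.astype(np.float32) / 255.0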