Stateful LSTM Tensorflow Invalid Input_h Shape Error - tensorflow

I am experimenting with stateful LSTM on a time-series regression problem by using TensorFlow. I apologize that I cannot share the dataset.
Below is my code.
train_feature = train_feature.reshape((train_feature.shape[0], 1, train_feature.shape[1]))
val_feature = val_feature.reshape((val_feature.shape[0], 1, val_feature.shape[1]))
batch_size = 64
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(50, batch_input_shape=(batch_size, train_feature.shape[1], train_feature.shape[2]), stateful=True))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer='adam',
loss='mse',
metrics=[tf.keras.metrics.RootMeanSquaredError()])
model.fit(train_feature, train_label,
epochs=10,
batch_size=batch_size)
When I run the above code, after the end of the first epoch, I will get an error as follows.
InvalidArgumentError: [_Derived_] Invalid input_h shape: [1,64,50] [1,49,50]
[[{{node CudnnRNN}}]]
[[sequential_1/lstm_1/StatefulPartitionedCall]] [Op:__inference_train_function_1152847]
Function call stack:
train_function -> train_function -> train_function
However, the model will be successfully trained if I change the batch_size to 1, and change the code for model training to the following.
total_epochs = 10
for i in range(total_epochs):
model.fit(train_feature, train_label,
epochs=1,
validation_data=(val_feature, val_label),
batch_size=batch_size,
shuffle=False)
model.reset_states()
Nevertheless, with a very large data (1 million rows), the model training will take a very long time since the batch_size is 1.
So, I wonder, how to train a stateful LSTM with a batch size larger than 1 (e.g. 64), without getting the invalid input_h shape error?
Thanks for your answers.

The fix is to ensure batch size never changes between batches. They must all be the same size.
Method 1
One way is to use a batch size that perfectly divides your dataset into equal-sized batches. For example, if total size of data is 1500 examples, then use a batch size of 50 or 100 or some other proper divisor of 1500.
batch_size = len(data)/proper_divisor
Method 2
The other way is to ignore any batch that is less than the specified size, and this can be done using the TensorFlow Dataset API and setting the drop_remainder to True.
batch_size = 64
train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)
steps_per_epoch = len(train_feature) // batch_size
model.fit(train_data,
epochs=10, steps_per_epoch = steps_per_epoch)
When using the Dataset API like above, you will need to also specify how many rounds of training count as an epoch (essentially how many batches to count as 1 epoch). A tf.data.Dataset instance (the result from tf.data.Dataset.from_tensor_slices) doesn't know the size of the data that it's streaming to the model, so what constitutes as one epoch has to be manually specified with steps_per_epoch.
Your new code will look like this:
train_feature = train_feature.reshape((train_feature.shape[0], 1, train_feature.shape[1]))
val_feature = val_feature.reshape((val_feature.shape[0], 1, val_feature.shape[1]))
batch_size = 64
train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(50, batch_input_shape=(batch_size, train_feature.shape[1], train_feature.shape[2]), stateful=True))
model.add(tf.keras.layers.Dense(1))
model.compile(optimizer='adam',
loss='mse',
metrics=[tf.keras.metrics.RootMeanSquaredError()])
steps_per_epoch = len(train_feature) // batch_size
model.fit(train_data,
epochs=10, steps_per_epoch = steps_per_epoch)
You can also include the validation set as well, like this (not showing other code):
batch_size = 64
val_data = tf.data.Dataset.from_tensor_slices((val_feature, val_label))
val_data = val_data.repeat().batch(batch_size, drop_remainder=True)
validation_steps = len(val_feature) // batch_size
model.fit(train_data, epochs=10,
steps_per_epoch=steps_per_epoch,
validation_steps=validation_steps)
Caveat: This means a few datapoints will never be seen by the model. To get around that, you can shuffle the dataset each round of training, so that the datapoints left behind each epoch changes, giving everyone a chance to be seen by the model.
buffer_size = 1000 # the bigger the slower but more effective shuffling.
train_data = tf.data.Dataset.from_tensor_slices((train_feature, train_label))
train_data = train_data.shuffle(buffer_size=buffer_size, reshuffle_each_iteration=True)
train_data = train_data.repeat().batch(batch_size, drop_remainder=True)
Why the error occurs
Stateful RNNs and their variants (LSTM, GRU, etc.) require fixed batch size. The reason is simply because statefulness is one way to realize Truncated Backprop Through Time, by passing the final hidden state for a batch as the initial hidden state of the next batch. The final hidden state for the first batch has to have exactly the same shape as the initial hidden state of the next batch, which requires that batch size stay the same across batches.
When you set the batch size to 64, model.fit will use the remaining data at the end of an epoch as a batch, and this may not have up to 64 datapoints. So, you get such an error because the batch size is different from what the stateful LSTM expects. You don't have the problem with batch size of 1 because any remaining data at the end of an epoch will always contain exactly 1 datapoint, so no errors. More generally, 1 is always a divisor of any integer. So, if you picked any other divisor of your data size, you should not get the error.
In the error message you posted, it appears the last batch has size of 49 instead of 64. On a side note: The reason the shapes look different from the input is because, under the hood, keras works with the tensors in time_major (i.e. the first axis is for steps of sequence). When you pass a tensor of shape (10, 15, 2) that represents (batch_size, steps_per_sequence, num_features), keras reshapes it to (15, 10, 2) under the hood.

Related

Prevent exploding loss function multi-step multi-variate/output forecast ConvLSTM

I have a plausible problem that currently fails to solve. During the training, my loss function explodes becomes inf or NaN, because the MSE of all errors becomes huge if the predictions (at the beginning of the training) are worse. And that is the normal intended behavior and correct. But, how do I train a ConvLSTM to which loss function to be able to learn a multi-step multi-variate output?
E.g. i try a (32, None, 200, 12) to predict (32, None, 30, 12). 32 is the batch size, None is the number of samples (>3000). 200 is the number of time steps, 12 features wide. 30 output time steps, 12 features wide.
My ConvLSTM model:
input = Input(shape=(None, input_shape[1]))
conv1d_1 = Conv1D(filters=64, kernel_size=3, activation=LeakyReLU())(input)
conv1d_2 = Conv1D(filters=32, kernel_size=3, activation=LeakyReLU())(conv1d_1)
dropout = Dropout(0.3)(conv1d_2)
lstm_1 = LSTM(32, activation=LeakyReLU())(dropout)
dense_1 = Dense(forecasting_horizon * input_shape[1], activation=LeakyReLU())(lstm_1)
output = Reshape((forecasting_horizon, input_shape[1]))(dense_1)
model = Model(inputs=input, outputs=output)
My ds generation:
ds_inputs = tf.keras.utils.timeseries_dataset_from_array(df[:-forecast_horizon], None, sequence_length=window_size, sequence_stride=1,
shuffle=False, batch_size=None)
ds_targets = tf.keras.utils.timeseries_dataset_from_array(df[forecast_horizon:], None, sequence_length=forecast_horizon, sequence_stride=1,
shuffle=False, batch_size=None)
ds_inputs = ds_inputs.batch(batch_size, drop_remainder=True)
ds_targets = ds_targets.batch(batch_size, drop_remainder=True)
ds = tf.data.Dataset.zip((ds_inputs, ds_targets))
ds = ds.shuffle(buffer_size=(len(ds)))
Besides MSE, I already tried MeanAbsoluteError, MeanSquaredLogarithmicError, MeanAbsolutePercentageError, CosineSimilarity. Where the last, produce non-sense. MSLE works best but does not favor large errors and therefore the MSE (used as metric has an incredible variation during training). Additionally, after a while, the Network becomes stale and gets no better loss (my explanation is that the difference in loss becomes too minor on the logarithmic scale and therefore the weights cannot be well adjusted).
I can partially answer my own question. One issue is that I used ReLu/LeakyReLu which will lead to exploding gradient problem because the RNN/LSTM Layer applies the same weights over time leading to exploding values as the values add up. Weights will not be reduced by any chance (ReLu min == 0). With Tanh as activation, it is possible to have negative values which also allow a reduction of the internal weights and really minimize the chance of exploding weights/predictions within the network. After some tests, the LSTM layer stays numerical stable.

Tf.Dataset with Keras returning a ValueError

Getting a ValueError related to shape when passing Tensorflow Dataset into a Keras's model.fit function.
My dataset's X_train has shape (100 samples x 62 features) and Y_train is (100 samples x 1 label
Reproducible code below:
import numpy as np
from tensorflow.keras import layers, Sequential, optimizers
from tensorflow.data import Dataset
num_samples = 100
num_features = 62
num_labels = 1
batch_size = 32
steps_per_epoch = int(num_samples/batch_size)
X_train = np.random.rand(num_samples,num_features)
Y_train = np.random.rand(num_samples, num_labels)
final_dataset = Dataset.from_tensor_slices((X_train, Y_train))
model = Sequential()
model.add(layers.Dense(256, activation='relu',input_shape=(num_features,)))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(num_labels, activation='softmax'))
model.compile(optimizer=optimizers.Adam(0.001), loss='categorical_crossentropy',metrics=['accuracy'])
history = model.fit(final_dataset,epochs=10,batch_size=batch_size,steps_per_epoch = steps_per_epoch)
The error is:
ValueError: Error when checking input: expected dense_input to have shape (62,) but got array with shape (1,)
Why is the dense_input getting an array with shape (1,)? I am clearly passing it an X_train of shape (n_samples, n_features).
Interestingly the error goes away if I were to apply a batch(some number) function to the dataset, but seems like I am missing something.
It's an intended behavior.
When you use Tensorflow Dataset , you shouldn't specify the batch_size in the fit method of 'Model'. Instead as you mentioned you have to generate the batches using the function with tensorflow dataset.
As mentioned here in the documentation
batch_size: Integer or None. Number of samples per gradient update. If unspecified, batch_size will default to 32. Do not specify the batch_size if your data is in the form of symbolic tensors, dataset, dataset iterators, generators, or keras.utils.Sequence instances (since they generate batches).
The classical behavior is therefore to do as you did: generate the batches with the dataset.
Also use repeat if you want to perform multiple epochs. On the .fit side you'll have to specify the steps_per_epoch to indicate how many batch is one epoch and epochs for your number of epochs.

Validation loss is inconsistent if I predict same results and calculate loss afterwards

I have an LSTM model that predicts weather. When I run the model with model.fit, it gives around %20 MAPE.
When I try to predict the same data that given to model.fit, when I calculate the loss, it results %60 MAPE. What might be causing this difference? I would have ignored it but the difference is too much.
Here is my code in main:
#preparing the data and building the model first
regressor.fit(x_train, y_train, epochs = 100, batch_size = 32,
validation_data = (x_test, y_test))
results = regressor.predict(x_test)
print(bm.mean_absolute_percentage_error(y_test, results))
in bm:
def mean_absolute_percentage_error(real, est):
"""Calculates the mean absolute precentage error.
"""
sess = Session()
with sess.as_default():
tensor = losses.mean_absolute_percentage_error(real, est)
return tensor.eval()[-1]
I used the same function that keras uses for calculating MAPE. Even if I made a mistake when preparing test data, they both should be consistently wrong because they take the same set as argument.

Keras LSTM always underfits

I am trying to train an LSTM with Keras and Tensorflow backend but it seems to always underfit; the loss and validation loss curves have an initial drop and then flatten out very fast (see image). I have tried adding more layers, more neurons, no dropout, etc., but can't get it even anywhere near an overfit and I do have a good bit of data (almost 4 hours with 100 samples per second, and I have tried downsampling to 50/sec).
My problem is multidimensional time series prediction with continuous values.
Any ideas would be appreciated!
Here is my basic keras architecture:
data_dim = 30 #input dimensions => each timestep has 30 features
timesteps = 200
out_dim = 30 #output dimensions => each predicted output timestep
# has 30 dimensions
batch_size = 50
num_epochs = 300
learning_rate = 0.0005 #tried values between around 0.001 and 0.0003
decay=0.9
#hidden layers size
h1 = 120
h2 = 340
h3 = 340
h4 = 120
model = Sequential()
model.add(LSTM(h1, return_sequences=True,input_shape=(timesteps, data_dim)))
model.add(LSTM(h2, return_sequences=True))
model.add(LSTM(h3, return_sequences=True))
model.add(LSTM(h4, return_sequences=True))
model.add(Dense(out_dim, activation='linear'))
rmsprop_otim = keras.optimizers.RMSprop(lr=learning_rate, rho=0.9, epsilon=1e-08, decay=decay)
model.compile(loss='mean_squared_error', optimizer=rmsprop_otim,metrics=['mse'])
#data preparation
[x_train, y_train] = readData()
x_train = x_train.reshape((int(num_samples/timesteps),timesteps,data_dim))
y_train = y_train.reshape((int(num_samples/timesteps),timesteps,num_classes))
history_callback = model.fit(x_train, y_train, validation_split=0.1,
batch_size=batch_size, epochs=num_epochs,shuffle=False,callbacks=[checkpointer, losses])
When you say 0.06 mse is underfit, this depends lot on data distribution. mse is relative term, so if the data is not normalizaed, 0.06 might even be overfit. In such case, pre-processing might help. Also, check if there is significant noise in the data.
Using 4 LSTM layers with large sizes means a lot of parameters to learn. Lesser number of layers might be enough.
Try non-linear activation in the final layer.
I suspect that your model only learns the weights of the Dense Layer properly, but not those of the LSTM layers below. As a quick check, what kind of performance do you get when you get rid of all LSTM layers and replace the Dense with a TimeDistributed(Dense...) layer? If your graphs look the same, training doesn't work, i.e. error gradients with respect to the lower layer weights may be too small. Another way of checking this is to inspect the gradients directly and/or to compare the final weighs after training with the initial weights. If this is indeed the problem you can try the following 1) Standardize your inputs, 2) use a smaller learning rate (decrease logarithmically), 3) add skip layer connections, 4) use Adam instead of RMSprop, and 5) train for more epochs.

Stateful LSTM fails to predict due to batch_size issue

I am able to successfully train my stateful LSTM using keras. My batch size is 60 and every input I am sending in the network is divisible by batch_size
Following is my snippet :
model = Sequential()
model.add(LSTM(80,input_shape = trainx.shape[1:],batch_input_shape=(60,
trainx.shape[1], trainx.shape[2]),stateful=True,return_sequences=True))
model.add(Dropout(0.15))
model.add(LSTM(40,return_sequences=False))
model.add(Dense(40))
model.add(Dropout(0.3))
model.add(Dense(output_dim=1))
model.add(Activation("linear"))
keras.optimizers.RMSprop(lr=0.005, rho=0.9, epsilon=1e-08, decay=0.0)
model.compile(loss="mse", optimizer="rmsprop")
My training line which runs successfully:
model.fit(trainx[:3000,:],trainy[:3000],validation_split=0.1,shuffle=False,nb_epoch=9,batch_size=60)
Now I try to predict on test set which is again divisible by 60 , but I get error :
ValueError: In a stateful network, you should only pass inputs with a
number of samples that can be divided by the batch size. Found: 240
samples. Batch size: 32.
Can anyone tell me what is wrong above ? I am confused , tried so many things but nothing helps.
I suspect that the reason for the error is that you did not specify the batch size in model.predict. As you can see in the documentation in the "predict" section, the default parameters are
model.predict(self, x, batch_size=32, verbose=0)
which is why 32 appears in your error message. So you need to specify batch_size=60 in model.predict.