Explain the outputs of a Bidirectional LSTM. The output length is 5; which of them are the hidden states and cell states? - tensorflow

I am trying to use a bidirectional LSTM as an encoder. I set return_sequences=True and return_state=True, and I am getting an output list with a length of 5.
# imports
from numpy import array
from tensorflow.keras.layers import Input, LSTM, Bidirectional
from tensorflow.keras.models import Model

# define model
inputs1 = Input(shape=(3, 1))
lstm1 = Bidirectional(merge_mode='ave', layer=LSTM(3, return_sequences=True, return_state=True))(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)
# define input data
data = array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
# make and show prediction
pred = model.predict(data)
The length of the output of the Bidirectional LSTM is 5:
len(pred) # 5
The shapes of all outputs from the Bidirectional LSTM are:
for num, i in enumerate(pred):
    print(num, ': ', i.shape)
Output:
0 : (1, 3, 3)
1 : (1, 3)
2 : (1, 3)
3 : (1, 3)
4 : (1, 3)
Since it is bidirectional, I am assuming two of them are hidden states and two are cell states. What is the order? Thank you.
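For reference, Keras returns the merged sequence output first, followed by the forward layer's states and then the backward layer's states, so the five outputs can be unpacked like this (a minimal sketch based on the documented Bidirectional/LSTM behavior):
# order: [merged sequence, forward hidden, forward cell, backward hidden, backward cell]
seq, fw_h, fw_c, bw_h, bw_c = pred
print(seq.shape)   # (1, 3, 3): per-timestep outputs, averaged by merge_mode='ave'
print(fw_h.shape)  # (1, 3): forward hidden state
print(fw_c.shape)  # (1, 3): forward cell state
print(bw_h.shape)  # (1, 3): backward hidden state
print(bw_c.shape)  # (1, 3): backward cell state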

Related

Forecast multiple outputs for N days using LSTM

I am trying to create an LSTM model that can take multiple input features and forecast those multiple features for N days.
E.g., I give 4 features as input for 103 days; now I want to forecast those 4 features for the next 5 days.
input features = [A1,B1,C1,D1],[A2,B2,C2,D2],...., [A103,B103,C103,D103]
output = [A104,B104,C104,D104],...,[A109,B109,C109,D109]
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)
train = scaled_data[:13813,:]
test = scaled_data[13813:,:]
def create_dataset(dataset, size):
    X = []
    y = []
    for i in range(len(dataset) - size - 1):
        a = dataset[i:(i + size), :]
        X.append(a)
        y.append(dataset[i + size])
    return np.array(X), np.array(y)
## 535 is the number of past records I want to use to forecast the features
X_test, y_test = create_dataset(test,535)
X_train,y_train = create_dataset(train,535)
##X_train and y_train shape ((13277, 535, 17), (13277, 17))
##X_test and y_test shape ((2917, 535, 17), (2917, 17))
model_2 = Sequential()
model_2.add(tf.keras.layers.LSTM(units=128,return_sequences=True,input_shape = (X_train.shape[1],17)))
model_2.add(tf.keras.layers.LSTM(units=64,return_sequences=True))
model_2.add(tf.keras.layers.LSTM(units=50,return_sequences=True))
model_2.add(tf.keras.layers.LSTM(units=32))
model_2.add(tf.keras.layers.Dense(units=17,activation='linear'))
model_2.summary()
model_2.compile(loss='MSE',optimizer='adam')
model_history_2 = model_2.fit(X_train,y_train,validation_data=(X_test,y_test),callbacks=[early_stopping],epochs = 5,batch_size=150)
## But after this I am unable to understand how to create a sliding window to predict the next 103 records.
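As a starting point, here is a minimal sketch of the kind of sliding-window loop the question asks about, reusing the names from the code above (n_future is an illustrative name): predict one step, append the prediction to the window, and slide forward.
import numpy as np

n_future = 5                    # number of future steps to forecast (illustrative)
window = scaled_data[-535:, :]  # the last 535 scaled records, shape (535, 17)
predictions = []
for _ in range(n_future):
    # the model expects shape (batch, timesteps, features)
    next_step = model_2.predict(window[np.newaxis, :, :])  # shape (1, 17)
    predictions.append(next_step[0])
    # slide the window: drop the oldest record, append the prediction
    window = np.vstack([window[1:], next_step])
# map the forecasts back to the original scale
predictions = scaler.inverse_transform(np.array(predictions))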

Multiple parameters for 2 models

I have 2 models which are merged into a single model. This single model is created as follows:
fm = layers.Concatenate()([m1.output, m2.output])
# create dense layer for fusion
t = layers.Dense(143, activation='relu')(fm)
d = layers.Dropout(0.5)(t)
# create softmax layer for predictions
output = layers.Dense(len(classes), activation='softmax')(d)
# create model from all those layers :)
model = keras.models.Model(inputs=[m1.input, m2.input], outputs=output)
This single model accepts the inputs of the two models, which are:
First model: 2 input parameters, tx[0] and tx[1]
Second model: 1 input parameter, vx
and a common y, vy
Here's how I try to use them:
history = model.fit(
    [
        [np.array(tx[0]), np.array(tx[1])],
        np.array(vx)
    ],
    np.array(vy), verbose=1,
    validation_data=(
        [
            [np.array(txv[0]), np.array(txv[1])],
            np.array(tvv)
        ],
        np.array(vy)),
    epochs=1200,
    batch_size=128,
    callbacks=[es, mcp_save])
So my first model has two input parameters and my second one has one. They share a common y. But for some reason the first model's parameters end up being fed to the second model. How do I resolve that?
If I understand your model correctly, it has three input layers, but you only passed two entries to your final model.
Use the variable names of the two input layers of the first model instead of m1.input:
# the two input layers of the first Model m1:
inputA = tf.keras.Input(shape=(32,))
inputB = tf.keras.Input(shape=(128,))
# the input layer of the second Model m2:
inputC = tf.keras.Input(shape=(128,))
# after merging the two models m1 and m2 into one Model:
model = keras.models.Model(inputs=[inputA, inputB, inputC], outputs=output)
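With three input layers, model.fit then also needs a flat list with one array per input rather than nested lists; a minimal sketch using the variable names from the question:
history = model.fit(
    [np.array(tx[0]), np.array(tx[1]), np.array(vx)],  # one array per Input layer
    np.array(vy),
    validation_data=(
        [np.array(txv[0]), np.array(txv[1]), np.array(tvv)],
        np.array(vy)),
    epochs=1200,
    batch_size=128,
    verbose=1,
    callbacks=[es, mcp_save])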

ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (1188, 12, 2)

So when I was trying to build an LSTM network, it kept telling me: "ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (1188, 12, 2)".
My dataset has more than 1000 samples and 2 features, and I set time_step to 12.
I have already reshaped my dataset to 3 dimensions; however, the error says that my last layer, the Dense layer (which I use as the output), expected a 2-dimensional array. What shall I do?
My code is as follows:
# read train set
readColsPro = (7, 20)
filename = 'train_set.txt'
xProTrain_1 = readCsv.csvMat(filename, 1, cols=readColsPro, rows=[0, 1200])
yProTrain_1 = readCsv.csvMat(filename, 1, cols=readColsPro, rows=[1, 1201])
xProTrain_1 = xProTrain_1.reshape(xProTrain_1.shape[0], 2)
yProTrain_1 = yProTrain_1.reshape(yProTrain_1.shape[0], 2)
# replace 'nan' values with 0
for i in xProTrain_1:
    if np.isnan(i[1]):
        i[1] = 0
for i in yProTrain_1:
    if np.isnan(i[1]):
        i[1] = 0
# read test set
xProTest_1 = readCsv.csvMat(filename, 1, cols=readColsPro, rows=[1, 1201])
yProTest_1 = readCsv.csvMat(filename, 1, cols=readColsPro, rows=[2, 1202])
xProTest_1 = np.reshape(xProTest_1, (xProTest_1.shape[0], xProTest_1.shape[1]))
yProTest_1 = np.reshape(yProTest_1, (yProTest_1.shape[0], yProTest_1.shape[1]))
for i in xProTest_1:
    if np.isnan(i[1]):
        i[1] = 0
for i in yProTest_1:
    if np.isnan(i[1]):
        i[1] = 0
# parameters
timeStepPro = 12
epoch = 24
batch_size = 24
trainNumPro = xProTrain_1.shape[0]
testNumPro = yProTrain_1.shape[0]
# reshape data to 3D
xProTrain_2 = []
for i in range(timeStepPro, trainNumPro):
    xProTrain_2.append(xProTrain_1[i - timeStepPro:i])
xProTrain_2 = np.array(xProTrain_2)
yProTrain_2 = []
for i in range(timeStepPro, trainNumPro):
    yProTrain_2.append(yProTrain_1[i - timeStepPro:i])
yProTrain_2 = np.array(yProTrain_2)
print(xProTrain_2.shape)
print(yProTrain_2.shape)
# reshape data to 3D
xProTest_2 = []
for i in range(timeStepPro, trainNumPro):
    xProTest_2.append(xProTest_1[i - timeStepPro:i])
xProTest_2 = np.array(xProTest_2)
yProTest_2 = []
for i in range(timeStepPro, trainNumPro):
    yProTest_2.append(yProTest_1[i - timeStepPro:i])
yProTest_2 = np.array(yProTest_2)
# define network
modelA = Sequential()
modelA.add(LSTM(units=64, return_sequences=True,
                input_shape=[xProTrain_2.shape[1], 2]))
modelA.add(BatchNormalization())
modelA.add(LSTM(units=128, return_sequences=True))
modelA.add(LSTM(units=128, return_sequences=True))
modelA.add(LSTM(units=256, return_sequences=True))
modelA.add(LSTM(units=64, return_sequences=False))
modelA.add(Dense(units=2, activation='relu'))
modelA.compile(optimizer='adam',
               loss='mean_squared_error',
               metrics=['accuracy'])
modelA.fit(x=xProTrain_2, y=yProTrain_2, epochs=epoch, batch_size=batch_size)
The error message is as follows:
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (1188, 12, 2)
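For reference, the mismatch comes from the last LSTM using return_sequences=False: the Dense output then has shape (batch, 2), while the targets were built as 12-step windows of shape (batch, 12, 2). A minimal sketch of one way to make the targets 2-dimensional, reusing the names from the question, is to predict only the single step that follows each input window:
# build y as the single step following each 12-step window -> shape (N, 2)
yProTrain_2 = []
for i in range(timeStepPro, trainNumPro):
    yProTrain_2.append(yProTrain_1[i])  # one row of 2 features
yProTrain_2 = np.array(yProTrain_2)     # (1188, 2) instead of (1188, 12, 2)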

How do I discover the values of the variables of an equation with keras/tensorflow?

I have an equation that describes a curve in two dimensions. This equation has 5 variables. How do I discover their values with keras/tensorflow for a set of data? Is it possible? Does anyone know a tutorial on something similar?
I generated some data to train the network that has the format:
samples => [150, 66, 2]: 150 sets of 66x2 data, something like "time" x "acceleration"
targets => [150, 5]: 150 sets of the 5 variable values.
Obs: I know the range of the variables. I also know that 150 sets are too few samples, but after the code works I need to train a new network with experimental data, which is limited too. Visually, the curve is simple: it has a descending linear part at the beginning, and at the end it falls off like an exponential.
My code is as follows:
def build_model():
    model = models.Sequential()
    model.add(layers.Dense(512, activation='relu', input_shape=(66 * 2,)))
    model.add(layers.Dense(5, activation='softmax'))
    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['mae'])
    return model

def smooth_curve(points, factor=0.9):
    [...]
    return smoothed_points
#load the generated data
train_data = np.load('samples00.npy')
test_data = np.load('samples00.npy')
train_targets = np.load('labels00.npy')
test_targets = np.load('labels00.npy')
#normalizing the data
mean = train_data.mean()
train_data -= mean
std = train_data.std()
train_data /= std
test_data -= mean
test_data /= std
# k-fold validation:
k = 3
num_val_samples = len(train_data) // k
num_epochs = 100
all_mae_histories = []
for i in range(k):
    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples],
         train_data[(i + 1) * num_val_samples:]],
        axis=0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples],
         train_targets[(i + 1) * num_val_samples:]],
        axis=0)
    model = build_model()
    # reshape the data to get the format (100, 66*2)
    partial_train_data = partial_train_data.reshape(100, 66 * 2)
    val_data = val_data.reshape(50, 66 * 2)
    history = model.fit(partial_train_data,
                        partial_train_targets,
                        validation_data=(val_data, val_targets),
                        epochs=num_epochs,
                        batch_size=1,
                        verbose=1)
    mae_history = history.history['val_mean_absolute_error']
    all_mae_histories.append(mae_history)
average_mae_history = [
    np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs)]
smooth_mae_history = smooth_curve(average_mae_history[10:])
plt.plot(range(1, len(smooth_mae_history) + 1), smooth_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
Obviously, as it is, I need to get the best accuracy possible, but I am getting a mean absolute error (MAE) of around 96%, and this is unacceptable.
I see some basic bugs in this methodology. Your network's final layer is a softmax layer. That means it outputs 5 values which sum to 1 and behave like a probability distribution, whereas what you actually want to predict are real numbers, or rather floating-point values (under some fixed-precision arithmetic).
If you have a range, then using a sigmoid in the final layer and rescaling to match the range (just multiply by the max value) would probably help. By default, a sigmoid ensures you get 5 numbers between 0 and 1.
The other thing is to remove the cross-entropy loss and use a regression loss such as RMS, so that you predict your numbers well. You could also use 1D convolutions instead of fully connected layers.
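A minimal sketch of those suggestions (the param_max bounds are hypothetical placeholders, not values from the question):
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# hypothetical known upper bounds for the 5 variables
param_max = np.array([1.0, 2.0, 5.0, 0.5, 10.0])

def build_regression_model():
    model = keras.Sequential([
        layers.Dense(512, activation='relu', input_shape=(66 * 2,)),
        # sigmoid keeps each output in (0, 1); rescale by the known max afterwards
        layers.Dense(5, activation='sigmoid'),
    ])
    # regression loss instead of categorical cross-entropy
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return model

# after training, rescale predictions to the physical range of each variable:
# preds = model.predict(x) * param_max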
There has been some work here: https://julialang.org/blog/2017/10/gsoc-NeuralNetDiffEq which tries to solve DEs and might be relevant to your work.

ctc_loss error "No valid path found."

Training a model with tf.nn.ctc_loss produces an error every time the train op is run:
tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
Unlike in previous questions about this function, this is not due to divergence. I have a low learning rate, and the error occurs on even the first train op.
The model is a CNN -> LSTM -> CTC. Here is the model creation code:
# Build Graph
self.videoInput = tf.placeholder(shape=(None, self.maxVidLen, 50, 100, 3), dtype=tf.float32)
self.videoLengths = tf.placeholder(shape=(None), dtype=tf.int32)
self.keep_prob = tf.placeholder(dtype=tf.float32)
self.targets = tf.sparse_placeholder(tf.int32)
self.targetLengths = tf.placeholder(shape=(None), dtype=tf.int32)
conv1 = tf.layers.conv3d(self.videoInput ...)
pool1 = tf.layers.max_pooling3d(conv1 ...)
conv2 = ...
pool2 = ...
conv3 = ...
pool3 = ...
cnn_out = tf.reshape(pool3, shape=(-1, self.maxVidLen, 4 * 7 * 96))
fw_cell = tf.nn.rnn_cell.MultiRNNCell([self.cell() for _ in range(3)])
bw_cell = tf.nn.rnn_cell.MultiRNNCell([self.cell() for _ in range(3)])
outputs, _ = tf.nn.bidirectional_dynamic_rnn(
    fw_cell, bw_cell, cnn_out, sequence_length=self.videoLengths, dtype=tf.float32)
outputs = tf.concat(outputs, 2)
outputs = tf.reshape(outputs, [-1, self.hidden_size * 2])
w = tf.Variable(tf.random_normal((self.hidden_size * 2, len(self.char2index) + 1), stddev=0.2))
b = tf.Variable(tf.zeros(len(self.char2index) + 1))
out = tf.matmul(outputs, w) + b
out = tf.reshape(out, [-1, self.maxVidLen, len(self.char2index) + 1])
out = tf.transpose(out, [1, 0, 2])
cost = tf.reduce_mean(tf.nn.ctc_loss(self.targets, out, self.targetLengths))
self.train_op = tf.train.AdamOptimizer(0.0001).minimize(cost)
And here is the feed dict creation code:
indices = []
values = []
shape = [len(vids) * 2, self.maxLabelLen]
vidInput = np.zeros((len(vids) * 2, self.maxVidLen, 50, 100, 3), dtype=np.float32)
# Actual video, then left-right flip
for j in range(len(vids) * 2):
    # k is the video index
    k = j if j < len(vids) else j - len(vids)
    # convert video and label to input format
    vidInput[j, 0:len(vids[k])] = vids[k] if k == j else vids[k][:, ::-1, :]
    indices.extend([j, i] for i in range(len(labelList[k])))
    values.extend(self.char2index[c] for c in labelList[k])
fd[self.targets] = (indices, values, shape)
fd[self.videoInput] = vidInput
# Collect video lengths and label lengths
vidLengths = [len(j) for j in vids] + [len(j) for j in vids]
labelLens = [len(l) for l in labelList] + [len(l) for l in labelList]
fd[self.videoLengths] = vidLengths
fd[self.targetLengths] = labelLens
It turns out that ctc_loss requires the label lengths to be shorter than the input lengths. If the label lengths are too long, the loss calculator cannot unroll completely and therefore cannot compute the loss.
For example, the label BIFI would require input length of at least 4 while the label BIIF would require input length of at least 5 due to a blank being inserted between the repeated symbols.
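That rule is easy to check programmatically; a minimal sketch (a plain-Python helper, not part of TensorFlow):
def min_ctc_input_length(label):
    # one frame per symbol, plus one blank frame between each pair of repeated symbols
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

print(min_ctc_input_length('BIFI'))  # 4: no repeated symbols
print(min_ctc_input_length('BIIF'))  # 5: the repeated 'I' needs a blank in between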
I had the same issue, but I soon realized it was just because I was using glob and my label came from the filename, so it included the directory path and exceeded the input length.
You can fix this issue by using:
os.path.join(*(filename.split(os.path.sep)[noOfDir:]))
For me, the problem was fixed by setting preprocess_collapse_repeated=True.
FWIW: my target sequence length was already shorter than the inputs, and the RNN outputs were already softmax outputs.
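Applied to the model code above, that would look like this (a minimal sketch; whether collapsing repeated labels is appropriate depends on your label set):
cost = tf.reduce_mean(tf.nn.ctc_loss(self.targets, out, self.targetLengths,
                                     preprocess_collapse_repeated=True))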
Another possible cause, which I found in my case, is that the input data range was not normalized to 0-1. Because of that, the LSTM activation function saturates at the beginning of training, which somehow produces the "No valid path found" log.
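For that case, a minimal sketch of the usual fix, assuming scikit-learn is available (x_train_raw and x_test_raw are illustrative names):
from sklearn.preprocessing import MinMaxScaler

# scale inputs to the 0-1 range so the LSTM activations do not saturate early
scaler = MinMaxScaler(feature_range=(0, 1))
x_train = scaler.fit_transform(x_train_raw)  # fit on the training data only
x_test = scaler.transform(x_test_raw)        # reuse the same scaling for test data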