Keras LSTM with stateful in Reinforcement Learning - tensorflow

I'm implementing a simple DQN reinforcement learning algorithm with Keras, but using an LSTM in the network. The idea is that a stateful LSTM will remember the relevant information from all prior states and thus predict the rewards for different actions better. This is more of a Keras problem than an RL one: I suspect I am not handling the stateful LSTM correctly.
MODEL CODE (functional API):
from keras.layers import Input, TimeDistributed, Conv2D, Flatten, Dense, LSTM
from keras.optimizers import RMSprop
import keras

state_input = Input(batch_shape=(batch_size, look_back, 1, resolution[0], resolution[1]))
conv1 = TimeDistributed(Conv2D(8, 6, strides=3, activation='relu',
                               data_format="channels_first"))(state_input)  # filters, kernel_size, strides
conv2 = TimeDistributed(Conv2D(8, 3, strides=2, activation='relu',
                               data_format="channels_first"))(conv1)  # filters, kernel_size, strides
flatten = TimeDistributed(Flatten())(conv2)
fc1 = TimeDistributed(Dense(128, activation='relu'))(flatten)
fc2 = TimeDistributed(Dense(64, activation='relu'))(fc1)
lstm_layer = LSTM(4, stateful=True)(fc2)
fc3 = Dense(128, activation='relu')(lstm_layer)
fc4 = Dense(available_actions_count)(fc3)
model = keras.models.Model(inputs=state_input, outputs=fc4)
optimizer = RMSprop(lr=learning_rate)  # learning_rate = 0.001
model.compile(loss="mse", optimizer=optimizer)
print(model.summary())
This is the model summary:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (1, 1, 1, 30, 45) 0
_________________________________________________________________
time_distributed_1 (TimeDist (1, 1, 8, 9, 14) 296
_________________________________________________________________
time_distributed_2 (TimeDist (1, 1, 8, 4, 6) 584
_________________________________________________________________
time_distributed_3 (TimeDist (1, 1, 192) 0
_________________________________________________________________
time_distributed_4 (TimeDist (1, 1, 128) 24704
_________________________________________________________________
time_distributed_5 (TimeDist (1, 1, 64) 8256
_________________________________________________________________
lstm_1 (LSTM) (1, 4) 1104
_________________________________________________________________
dense_3 (Dense) (1, 128) 640
_________________________________________________________________
dense_4 (Dense) (1, 8) 1032
=================================================================
Total params: 36,616
Trainable params: 36,616
Non-trainable params: 0
=================================================================
I feed in one frame at a time to fit the model. When I need to predict to act, I make sure I save the model state and restore it as follows.
CODE TO FIT/TRAIN THE MODEL:
# Save the LSTM state so it can be restored before fitting.
prev_state = get_model_states(model)
target_q = model.predict(s1, batch_size=batch_size)
# Each predict() call advances the internal state of the stateful LSTM.
q_next = model.predict(s2, batch_size=batch_size)
max_q_next = np.max(q_next, axis=1)
target_q[np.arange(target_q.shape[0]), a] = r + discount_factor * (1 - isterminal) * max_q_next
# Restore the state to what it was before the s1 prediction, so fit() sees the right context.
set_model_states(model, prev_state)
model.fit(s1, target_q, batch_size=batch_size, verbose=0)
# After fitting, both the weights and the LSTM state have been updated,
# so the LSTM has already moved forward in the sequence.
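(get_model_states / set_model_states are not Keras built-ins but my own helpers; roughly, they snapshot and restore the state variables of every stateful recurrent layer, something like this sketch:)

import keras.backend as K

def get_model_states(model):
    # snapshot the state variables (h, c) of every stateful recurrent layer
    return [[K.get_value(s) for s in layer.states]
            for layer in model.layers if getattr(layer, 'stateful', False)]

def set_model_states(model, saved_states):
    # write the saved values back into the matching state variables
    stateful_layers = [l for l in model.layers if getattr(l, 'stateful', False)]
    for layer, layer_states in zip(stateful_layers, saved_states):
        for state, value in zip(layer.states, layer_states):
            K.set_value(state, value)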
The model does not seem to be working at all: the variance remains very high across different epochs. I am resetting the model states after every episode, as one would expect, so statefulness does not leak between episodes. Each episode is fed in one frame at a time, which is why I need stateful=True.
I've tried different discount factors and learning rates. In theory, this should be a superior model to the vanilla DQN (a CNN over stacks of 4 frames).
What am I doing wrong? Any help would be appreciated.

Related

LSTM output unexpected predict shape

I want to build an LSTM model to predict a category label, based on 60 days of data.
Basically:
Input - a 60-day time window, 1 feature
- train data: x (2571, 60, 1), y (2571, 1)
- test data: x (60, 1), y (1)
Output - 1 label, either 0 or 1
One thing I am not sure about: should I shape the train/test x as (60, 1) or (1, 60)?
I made a LSTM network like:
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_15 (LSTM) (None, 60, 128) 66560
dropout_10 (Dropout) (None, 60, 128) 0
lstm_16 (LSTM) (None, 60, 64) 49408
dropout_11 (Dropout) (None, 60, 64) 0
lstm_17 (LSTM) (None, 16) 5184
dense_5 (Dense) (None, 1) 17
=================================================================
Total params: 121,169
Trainable params: 121,169
Non-trainable params: 0
_________________________________________________________________
Here is my code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

time_window_size = 60  # lookback window
num_features = 1
model = Sequential()
model.add(LSTM(128, input_shape=(time_window_size, num_features), return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(units=64, return_sequences=True))
model.add(Dropout(0.1))
# no need to return sequences from the last LSTM layer
model.add(LSTM(units=16))
# the output layer
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
but after training, I call model.predict like:
y = model.predict(x_test)
and instead of my expected 0 or 1, I get a y with shape (60, 1).
After some debugging, I suspect the root cause was that my x shape was wrong.
Originally, my test x shape was (60, 1); after I reshaped it to (1, 60), I got a single output y every time, with shape (1). If I shape my test x as (60, 1), I get a predicted y of shape (60, 1).
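To make the expected shapes concrete, here is a minimal sketch (with random data) of what predict wants: a Keras LSTM takes (batch, timesteps, features), so one 60-step window goes in with the full 3-D shape (1, 60, 1), including the trailing feature axis:

import numpy as np

x_single = np.random.rand(60, 1).astype("float32")  # one window: (timesteps, features)
x_batch = x_single[np.newaxis, ...]                 # add a batch axis -> (1, 60, 1)
y = model.predict(x_batch)                          # shape (1, 1): one sigmoid probability
label = int(y[0, 0] > 0.5)                          # threshold to a 0/1 class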
But I have a new problem...
If I plot it together with my y_test, the y_predict just sits in the middle.
My y_predict makes no sense at all: the values fall in a very narrow range, from 0.447 to 0.45.
If I take @Frightera's advice and use np.where(y_predicted_result > 0.454, 1, 0) to convert them into 0 or 1, it does not look right when compared with the ground truth, and I have no idea why.

Why does TensorFlow add a dimension to my input & output?

Here is my code:
from tensorflow.keras import layers
import tensorflow as tf
from tensorflow import keras
TFDataType = tf.float16
XTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
YTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
model = tf.keras.models.Sequential()
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))
print(model.summary())
I am feeding it a 2-dimensional matrix, but when I look at the model summary, I see:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 10, 1) 11
_________________________________________________________________
dense_1 (Dense) (None, 10, 2) 4
=================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________
Why is the model asking for a 3-dimensional (None, 10, 1) array?
How do I pass an array that meets the dimensionality of (None, 10, 1)?
I cannot call numpy.ones((None, 10, 1)), and I cannot reshape the array with -1 in the first dimension.
In your first layer, input_shape=(10, 10) describes the shape of one sample; the extra None dimension in the summary is added to account for the batch size of the data. Note you only need input_shape for the FIRST layer in your model, so remove input_shape=(10, 10) from your second layer.
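In other words, None is the batch axis that Keras fills in itself: you pass an array of shape (num_samples, 10, 10). A minimal sketch (sample count is made up):

import numpy as np

x = np.ones((5, 10, 10), dtype=np.float16)  # 5 samples, each a 10x10 matrix
y = model.predict(x)
print(y.shape)  # the leading None is filled by the number of samples, here 5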

Dimension of output in Dense layer Keras

I have the following sample model
import numpy as np
from tensorflow.keras import models
from tensorflow.keras import layers

sample_model = models.Sequential()
sample_model.add(layers.Dense(32, input_shape=(4,)))
sample_model.add(layers.Dense(16, input_shape=(44,)))
sample_model.compile(loss="binary_crossentropy",
                     optimizer="adam", metrics=["accuracy"])
Input for the model:
sam_x = np.random.rand(10, 4)
sam_y = np.array([0, 1, 1, 0, 1, 0, 0, 1, 0, 1])
sample_model.fit(sam_x, sam_y)
The confusion: fit should have thrown a shape-mismatch error, since the expected input shape for the 2nd Dense layer is given as (None, 44) but the output of the 1st Dense layer (which is the input of the 2nd) has shape (None, 32). Yet it ran successfully.
I don't understand why there was no error. Any clarification would be helpful.
The input_shape keyword argument has an effect only on the first layer of a Sequential model. The input shape of every other layer is derived from its previous layer.
That behaviour is hinted at in the doc of tf.keras.layers.InputLayer:
When using InputLayer with a Keras Sequential model, it can be skipped by moving the input_shape parameter to the first layer after the InputLayer.
And in the Sequential model guide.
The behaviour can be confirmed by looking at the source of the Sequential.add method:
if not self._layers:
    if isinstance(layer, input_layer.InputLayer):
        # Case where the user passes an Input or InputLayer layer via `add`.
        set_inputs = True
    else:
        batch_shape, dtype = training_utils.get_input_shape_and_dtype(layer)
        if batch_shape:
            # Instantiate an input layer.
            x = input_layer.Input(
                batch_shape=batch_shape, dtype=dtype, name=layer.name + '_input')
            # This will build the current layer
            # and create the node connecting the current layer
            # to the input layer we just created.
            layer(x)
            set_inputs = True
If there are no layers in the model yet, an Input layer is added with the shape derived from the first layer added. This is done only when no layer is present yet.
That shape is either fully known (if input_shape has been passed to the first layer of the model) or becomes fully known once the model is built (for example, with a call to model.build(input_shape)).
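For example (a small illustrative sketch, names are mine, not from the answer), the shape can be supplied after the fact via build:

deferred = models.Sequential()
deferred.add(layers.Dense(32))         # no input_shape given yet
deferred.build(input_shape=(None, 4))  # shapes become fully known here
deferred.summary()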
The point is that after taking the input shape from the first layer, Keras does not check or use the input_shape declared on any later layer of the same model. For example, if you write your model the following way:
sample_model.add(layers.Dense(32, input_shape=(4,)))
sample_model.add(layers.Dense(16, input_shape = (44,)))
sample_model.add(layers.Dense(8, input_shape = (32,)))
the program will only honour the first declared input_shape and discard the rest. So if you start your first layer with input_shape=(44,), you need to pass input with exactly that many features, such as:
sam_x = np.random.rand(10,44)
sam_y = np.array([0,1,1,0,1,0,0,1,0,1,])
sample_model.fit(sam_x,sam_y)
Additionally, if you look at the functional API: unlike the Sequential model, you must create and define a standalone Input layer that specifies the shape of the input data. It is not learnable but simply a spec layer, a kind of gateway for the input data into the model. That means even if we define input_shape inside the other layers, it will be discarded. For example:
inputs = keras.Input(shape=(4,))
dense = layers.Dense(64, input_shape=(8,))  # input_shape is discarded
x = dense(inputs)
x = layers.Dense(64, input_shape=(16,))(x)  # input_shape is discarded
outputs = layers.Dense(10)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
Here is a more complex example with Conv2D and MNIST.
encoder_input = keras.Input(shape=(28, 28, 1),)
x = layers.Conv2D(16, 3, activation="relu", input_shape=[32,32,3])(encoder_input)
x = layers.Conv2D(32, 3, activation="relu", input_shape=[64,64,3])(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu", input_shape=[224,321,3])(x)
x = layers.Conv2D(16, 3, activation="relu", input_shape=[420,32,3])(x)
x = layers.GlobalMaxPooling2D()(x)
out = layers.Dense(10, activation='softmax')(x)
encoder = keras.Model(encoder_input, out, name="encoder")
encoder.summary()
Model: "encoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_15 (InputLayer) [(None, 28, 28, 1)] 0
_________________________________________________________________
conv2d_8 (Conv2D) (None, 26, 26, 16) 160
_________________________________________________________________
conv2d_9 (Conv2D) (None, 24, 24, 32) 4640
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 32) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 6, 6, 32) 9248
_________________________________________________________________
conv2d_11 (Conv2D) (None, 4, 4, 16) 4624
_________________________________________________________________
global_max_pooling2d_2 (Glob (None, 16) 0
_________________________________________________________________
dense_56 (Dense) (None, 10) 170
=================================================================
Total params: 18,842
Trainable params: 18,842
Non-trainable params: 0
def pre_process(image, label):
    # scale pixels to [0, 1), add a channel axis, one-hot encode the label
    return ((image / 256)[..., None].astype('float32'),
            tf.keras.utils.to_categorical(label, num_classes=10))

(x, y), (_, _) = tf.keras.datasets.mnist.load_data('mnist')
x, y = pre_process(x, y)  # without this, fit would see raw uint8 images and integer labels

encoder.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=tf.keras.metrics.CategoricalAccuracy(),
    optimizer=tf.keras.optimizers.Adam())
encoder.fit(x, y, batch_size=256)
4s 14ms/step - loss: 1.4303 - categorical_accuracy: 0.5279
I think Keras creates (or prepares to create) an additional Input layer, but because the second Dense layer is added via model.add(), it is automatically connected to the layer before it, so the extra input layer stays unconnected and is not part of the model.
(I agree it would be nice if Keras hinted at unconnected layers. I have sometimes created unconnected layers when using the functional API and then changed the inputs; Keras didn't remind me that I had jumped several layers, and I just wondered why summary() was so short...)

How is it possible to encode an input with one 2D convolution and apply the opposite 2D DeConv / transposed convolution to get the same dimension back?

I am working on an autoencoder and I have an issue with reproducing the input at the same size. If I use a transposed convolution / deconvolution operation with the same parameters, I get a different output size than the original input. To illustrate the problem, assume our model consists of just one convolution (to encode the input) and one deconvolution (to decode it). However, I do not get the same size as my input: the second and third dimensions (axes 1 and 2) come out as 16, not 15 as one would expect. Here is the code:
import tensorflow as tf

input = tf.keras.Input(shape=(15, 15, 3), name="Input0")
conv2d_layer2 = tf.keras.layers.Conv2D(filters=32, strides=[2, 2], kernel_size=[3, 3],
                                       padding='same',
                                       activation='selu', name="Conv1")
conv2d_trans_layer2 = tf.keras.layers.Conv2DTranspose(filters=32, strides=[2, 2],
                                                      kernel_size=[3, 3], padding='same',
                                                      activation='selu', name="DeConv1")
x_endcoded_1 = conv2d_layer2(input)
x_reconstructed = conv2d_trans_layer2(x_endcoded_1)
model = tf.keras.Model(inputs=input, outputs=x_reconstructed)
Results in the following model:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Input0 (InputLayer) [(None, 15, 15, 3)] 0
_________________________________________________________________
Conv1 (Conv2D) (None, 8, 8, 32) 896
_________________________________________________________________
DeConv1 (Conv2DTranspose) (None, 16, 16, 32) 9248
=================================================================
Total params: 10,144
Trainable params: 10,144
How can I reproduce my original input using just this transposed convolution? Is this possible?
By deleting the padding from both layers you can reproduce the mapping:
input = Input(shape=(15, 15, 3), name="Input0")
conv2d_layer2 = Conv2D(filters=32, strides=[2, 2], kernel_size=[3, 3],
                       activation='selu', name="Conv1")(input)
conv2d_trans_layer2 = Conv2DTranspose(filters=32, strides=[2, 2],
                                      kernel_size=[3, 3],
                                      activation='selu', name="DeConv1")(conv2d_layer2)
model = Model(inputs=input, outputs=conv2d_trans_layer2)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
Input0 (InputLayer) [(None, 15, 15, 3)] 0
_________________________________________________________________
Conv1 (Conv2D) (None, 7, 7, 32) 896
_________________________________________________________________
DeConv1 (Conv2DTranspose) (None, 15, 15, 32) 9248
=================================================================
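The shape arithmetic shows why: with padding='same' a stride-2 Conv2D rounds the size up and Conv2DTranspose exactly doubles it, so an odd size like 15 cannot round-trip, while with padding='valid' the two formulas invert exactly. A quick check (standard output-size formulas, not code from the answer):

import math

h_in, k, s = 15, 3, 2  # input size, kernel, stride from the question

h_same = math.ceil(h_in / s)           # Conv2D, padding='same'   -> 8
h_same_back = h_same * s               # Conv2DTranspose, 'same'  -> 16 (15 is lost)

h_valid = (h_in - k) // s + 1          # Conv2D, padding='valid'  -> 7
h_valid_back = (h_valid - 1) * s + k   # Conv2DTranspose, 'valid' -> 15 (recovered)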
In general, to do this in deeper architectures you have to play with padding, strides and pooling.
Online there are a lot of good resources that explain how these operations work and their application in Keras:
Padding and Stride for Convolutional Neural Networks
Pooling Layers for Convolutional Neural Networks
How to use the UpSampling2D and Conv2DTranspose

Variable Number of channels

I need a convolutional layer which outputs a variable number of channels, depending on the input.
conv2d(filters = variable_number)
A model cannot have a varying number of filters depending on the input. The following properties need to be fixed for the model to be trained:
Name and type of all layers in the model.
Output shape of each layer.
Number of weight parameters of each layer.
The inputs each layer receives.
The total number of trainable and non-trainable parameters of the model.
If you have a varying number of channels, the model architecture changes for different inputs, and all the points listed above are impacted.
You can build a model with all the parameters fixed and later apply dropout to a layer based on the input. But keep in mind that dropout is a regularization technique: simply put, it refers to ignoring a randomly chosen set of units (i.e. neurons) during the training phase, meaning these units are not considered during a particular forward or backward pass.
OR
The most appropriate solution would be:
Build multiple input layers for the different inputs.
Concatenate all these layers, making sure their output shapes match (for convolution layers, concatenate throws an error otherwise).
Add the remaining layers of the model.
Below is an example for this -
from keras.models import Model
from keras.layers import Input, concatenate, Conv2D, ZeroPadding2D, Dense

input_img1 = Input(shape=(44, 44, 3))
x1 = Conv2D(3, (3, 3), activation='relu', padding='same')(input_img1)

input_img2 = Input(shape=(34, 34, 3))
x2 = Conv2D(3, (3, 3), activation='relu', padding='same')(input_img2)
# Zero padding of 5 at the top, bottom, left and right of the image tensor
x3 = ZeroPadding2D(padding=(5, 5))(x2)

# Concatenate works because both branches now have the same spatial size
x4 = concatenate([x1, x3])
output = Dense(18, activation='relu')(x4)
model = Model(inputs=[input_img1, input_img2], outputs=output)
model.summary()
Output -
Model: "model_22"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_91 (InputLayer) (None, 34, 34, 3) 0
__________________________________________________________________________________________________
input_90 (InputLayer) (None, 44, 44, 3) 0
__________________________________________________________________________________________________
conv2d_73 (Conv2D) (None, 34, 34, 3) 84 input_91[0][0]
__________________________________________________________________________________________________
conv2d_72 (Conv2D) (None, 44, 44, 3) 84 input_90[0][0]
__________________________________________________________________________________________________
zero_padding2d_14 (ZeroPadding2 (None, 44, 44, 3) 0 conv2d_73[0][0]
__________________________________________________________________________________________________
concatenate_30 (Concatenate) (None, 44, 44, 6) 0 conv2d_72[0][0]
zero_padding2d_14[0][0]
__________________________________________________________________________________________________
dense_47 (Dense) (None, 44, 44, 18) 126 concatenate_30[0][0]
==================================================================================================
Total params: 294
Trainable params: 294
Non-trainable params: 0
__________________________________________________________________________________________________
Hope this answers your question. Happy learning.