LSTM output unexpected predict shape - tensorflow

I want to build an LSTM model to predict a category label, based on 60 days of data.
Basically:
Input - a 60-day time window, 1 feature
- train data: x (2571, 60, 1), y (2571, 1)
- test data: x (60, 1), y (1)
Output - 1 label, either 0 or 1
One thing I am not sure about: should I shape the train/test x as (60, 1) or (1, 60)?
I made an LSTM network like:
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_15 (LSTM) (None, 60, 128) 66560
dropout_10 (Dropout) (None, 60, 128) 0
lstm_16 (LSTM) (None, 60, 64) 49408
dropout_11 (Dropout) (None, 60, 64) 0
lstm_17 (LSTM) (None, 16) 5184
dense_5 (Dense) (None, 1) 17
=================================================================
Total params: 121,169
Trainable params: 121,169
Non-trainable params: 0
_________________________________________________________________
Here is my code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

lookback_time_win = 60
num_features = 1

model = Sequential()
model.add(LSTM(128, input_shape=(lookback_time_win, num_features), return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(units=64, return_sequences=True))
model.add(Dropout(0.1))
# no need to return sequences from the last LSTM layer
model.add(LSTM(units=16))
# output layer
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
But after training, when I call model.predict like:
y = model.predict(x_test)
instead of the expected 0 or 1, I get y with a shape of (60, 1).

After some debugging, I suspect the root cause was that my x shape was wrong.
Originally my test x shape was (60, 1); after I reshape it to (1, 60), I get a single output y every time, with shape (1). If I shape my test x as (60, 1), I get a predicted y of shape (60, 1).
But I have a new problem...
If I plot it together with my y_test, the y_predict just sits in the middle.
My y_predict makes no sense at all: the values fall in a very narrow range, from 0.447 to 0.45.
If I take @Frightera's advice and use np.where(y_predicted_result > 0.454, 1, 0) to convert them into 0 or 1, it does not look right when compared with the ground truth, and I have no idea why.
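For reference, a minimal sketch (based on the shapes described above, not the original poster's code) of feeding one 60-day window to a model built with input_shape=(60, 1) and turning the sigmoid output into a label:
import numpy as np

x_single = x_test.reshape(1, lookback_time_win, num_features)  # (1, 60, 1): a batch of one window
prob = model.predict(x_single)                                 # shape (1, 1), a sigmoid probability
label = np.where(prob > 0.5, 1, 0)                             # 0.5 is the usual cut-off; the post tries 0.454
print(prob.shape, label)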

Related

Why does TensorFlow add a dimension to my input & output?

Here is my code:
from tensorflow.keras import layers
import tensorflow as tf
from tensorflow import keras
TFDataType = tf.float16
XTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
YTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
model = tf.keras.models.Sequential()
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))
print(model.summary())
I am feeding it a 2-dimensional matrix. But when I look at the model summary, I see:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 10, 1) 11
_________________________________________________________________
dense_1 (Dense) (None, 10, 2) 4
=================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________
Why is the model asking for a 3-dimensional (None, 10, 1) array?
How do I pass an array that meets the dimensionality of (None, 10, 1)?
I cannot call numpy.ones(None, 10, 1), and I cannot reshape the array with -1 in the first dimension.
In your first layer, input_shape=(10, 10) describes the shape of a single sample; Keras adds the extra (None) dimension to account for the batch size of the data. Note that you only need input_shape for the FIRST layer in your model, so remove input_shape=(10, 10) from your second layer.
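To make this concrete, a hedged sketch (not part of the original answer): input_shape describes one sample, and Keras prepends the None batch dimension. If each sample is a flat vector of 10 values, the extra dimension goes away:
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.models.Sequential()
model.add(layers.Dense(1, input_shape=(10,)))  # one sample = a vector of 10 values
model.summary()                                # dense: (None, 1), 11 params

x = tf.ones((10, 10))                          # a batch of 10 such samples
print(model(x).shape)                          # (10, 1)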

Understanding the values of summary() (Output Shape, Param#)?

I'm checking the output of the summary() function and don't understand all the printed values.
For example, look at this simple code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

x = [1, 2, 3, 4, 5]
y = [1.2, 1.8, 3.5, 3.7, 5.3]
model = Sequential()
model.add(Dense(10, input_dim=1, activation='relu'))
model.add(Dense(30, input_dim=1, activation='relu'))
model.add(Dense(10, input_dim=1, activation='relu'))
model.add(Dense(1))
model.summary()
The output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 10) 20
_________________________________________________________________
dense_1 (Dense) (None, 30) 330
_________________________________________________________________
dense_2 (Dense) (None, 10) 310
_________________________________________________________________
dense_3 (Dense) (None, 1) 11
=================================================================
Total params: 671
Trainable params: 671
Non-trainable params: 0
_________________________________________________________________
What is the meaning of the None value under the Output Shape column? What does None mean here?
Which networks will not show None in the summary?
What is the meaning of the Param # column? How is this value calculated?
The None is just a placeholder saying that the network can take more than one sample at a time. None means this dimension is variable. The first dimension in a Keras model is always the batch size. ... That's why this dimension is often ignored when you define your model. For instance, when you define input_shape=(100, 200), you are actually ignoring the batch size and defining the shape of each individual sample.
None won't show if you set a fixed batch size. As an example, if you send in a batch of 10 images your shape would be (10, 64, 64, 3), and if you changed it to 25 you would have (25, 64, 64, 3).
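A small hedged sketch (not part of the original answer) showing how fixing the batch size makes the first dimension concrete instead of None:
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(64, 64, 3), batch_size=10)  # fixed batch of 10 images
outputs = layers.Conv2D(8, 3)(inputs)
model = keras.Model(inputs, outputs)
model.summary()  # input: (10, 64, 64, 3), conv2d: (10, 62, 62, 8) -- no None anywhere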
For the first Dense layer, the number of params is 20. This is obtained as: 1 (input value) * 10 (neurons in the first layer) + 10 (bias values).
For the second Dense layer, the number of params is 330. This is obtained as: 10 (input values) * 30 (neurons in the second layer) + 30 (bias values for the neurons in the second layer).
For the third Dense layer, the number of params is 310. This is obtained as: 30 (input values) * 10 (neurons in the third layer) + 10 (bias values for the neurons in the third layer).
For the final layer, the number of params is 11. This is obtained as: 10 (input values) * 1 (neuron in the final layer) + 1 (bias value for the neuron in the final layer).
Total params = 20 + 330 + 310 + 11 = 671.
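The same arithmetic can be checked in a couple of lines (a quick sketch, not part of the original answer): a Dense layer has (inputs + 1) * units parameters, i.e. one weight per input per unit plus one bias per unit.
layer_dims = [(1, 10), (10, 30), (30, 10), (10, 1)]  # (inputs, units) for each Dense layer above
params = [(n_in + 1) * n_out for n_in, n_out in layer_dims]
print(params)       # [20, 330, 310, 11]
print(sum(params))  # 671, matching Total params in the summary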

Using dilated convolution in Keras

In WaveNet, dilated convolution is used to increase receptive field of the layers above.
From the WaveNet illustration, you can see that layers of dilated convolution with a kernel size of 2 and dilation rates that are powers of 2 create a tree-like structure of receptive fields. I tried to (very simply) replicate this in Keras.
import tensorflow.keras as keras
nn = input_layer = keras.layers.Input(shape=(200, 2))
nn = keras.layers.Conv1D(5, 5, padding='causal', dilation_rate=2)(nn)
nn = keras.layers.Conv1D(5, 5, padding='causal', dilation_rate=4)(nn)
nn = keras.layers.Dense(1)(nn)
model = keras.Model(input_layer, nn)
opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='mse', optimizer=opt)
model.summary()
And the output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) [(None, 200, 2)] 0
_________________________________________________________________
conv1d_5 (Conv1D) (None, 200, 5) 55
_________________________________________________________________
conv1d_6 (Conv1D) (None, 200, 5) 130
_________________________________________________________________
dense_2 (Dense) (None, 200, 1) 6
=================================================================
Total params: 191
Trainable params: 191
Non-trainable params: 0
_________________________________________________________________
I was expecting axis=1 to shrink after each Conv1D layer, similar to the illustration. Why is this not the case?
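For context, a hedged side-by-side (not from the original post): 'causal' padding pads the input on the left so the temporal length is preserved, while 'valid' padding lets the axis shrink by (kernel_size - 1) * dilation_rate:
import tensorflow.keras as keras

inp = keras.layers.Input(shape=(200, 2))
causal = keras.layers.Conv1D(5, 5, padding='causal', dilation_rate=2)(inp)
valid = keras.layers.Conv1D(5, 5, padding='valid', dilation_rate=2)(inp)
print(causal.shape)  # (None, 200, 5) - length preserved by left padding
print(valid.shape)   # (None, 192, 5) - length shrinks by (5 - 1) * 2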

Keras LSTM with stateful in Reinforcement Learning

I'm doing a simple DQN RL algorithm with Keras, but using an LSTM in the network. The idea is that a stateful LSTM will remember the relevant information from all prior states and thus predict rewards for different actions better. This is more of a Keras problem than an RL one; I think I am not handling the stateful LSTM correctly.
MODEL CODE - functional API used:
from tensorflow import keras
from tensorflow.keras.layers import Input, Dense, Flatten, Conv2D, LSTM, TimeDistributed
from tensorflow.keras.optimizers import RMSprop

state_input = Input(batch_shape=(batch_size, look_back, 1, resolution[0], resolution[1]))
conv1 = TimeDistributed(Conv2D(8, 6, strides=3, activation='relu',
                               data_format="channels_first"))(state_input)  # filters, kernel_size, stride
conv2 = TimeDistributed(Conv2D(8, 3, strides=2, activation='relu',
                               data_format="channels_first"))(conv1)  # filters, kernel_size, stride
flatten = TimeDistributed(Flatten())(conv2)
fc1 = TimeDistributed(Dense(128, activation='relu'))(flatten)
fc2 = TimeDistributed(Dense(64, activation='relu'))(fc1)
lstm_layer = LSTM(4, stateful=True)(fc2)
fc3 = Dense(128, activation='relu')(lstm_layer)
fc4 = Dense(available_actions_count)(fc3)
model = keras.models.Model(inputs=state_input, outputs=fc4)
opt = RMSprop(learning_rate=learning_rate)  # 0.001
model.compile(loss="mse", optimizer=opt)
print(model.summary())
This is the model summary:
Layer (type) Output Shape Param
=================================================================
input_1 (InputLayer) (1, 1, 1, 30, 45) 0
_________________________________________________________________
time_distributed_1 (TimeDist (1, 1, 8, 9, 14) 296
_________________________________________________________________
time_distributed_2 (TimeDist (1, 1, 8, 4, 6) 584
_________________________________________________________________
time_distributed_3 (TimeDist (1, 1, 192) 0
_________________________________________________________________
time_distributed_4 (TimeDist (1, 1, 128) 24704
_________________________________________________________________
time_distributed_5 (TimeDist (1, 1, 64) 8256
_________________________________________________________________
lstm_1 (LSTM) (1, 4) 1104
_________________________________________________________________
dense_3 (Dense) (1, 128) 640
_________________________________________________________________
dense_4 (Dense) (1, 8) 1032
=================================================================
Total params: 36,616
Trainable params: 36,616
Non-trainable params: 0
================================================================
I feed in one frame at a time to fit the model. When I need to predict to act, I make sure I save the model state and restore it as follows.
CODE TO FIT/TRAIN THE MODEL:
# save the state (LSTM memory) for recovery before fitting
prev_state = get_model_states(model)
target_q = model.predict(s1, batch_size=batch_size)
# lstm predict updates the state of the lstm modules
q_next = model.predict(s2, batch_size=batch_size)
max_q_next = np.max(q_next, axis=1)
target_q[np.arange(target_q.shape[0]), a] = r + discount_factor * (1 - isterminal) * max_q_next
# now recover the states so the model is fitted from the correct LSTM state
set_model_states(model, prev_state)  # back to before the s1 prediction
model.fit(s1, target_q, batch_size=batch_size, verbose=0)
# after fitting, the state and weights both get updated!
# so the lstm has already moved forward in the sequence
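The helpers get_model_states and set_model_states referenced above are not shown in the post; a minimal sketch of what they might look like (an assumption on my part, covering only the stateful recurrent layers):
from tensorflow.keras import backend as K

def get_model_states(model):
    # snapshot the internal [h, c] state variables of every stateful recurrent layer
    return {layer.name: [K.get_value(s) for s in layer.states]
            for layer in model.layers if getattr(layer, 'stateful', False)}

def set_model_states(model, states):
    # restore a previously captured snapshot
    for layer in model.layers:
        if layer.name in states:
            layer.reset_states(states=states[layer.name])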
The model does not seem to be working at all; the variance remains very high across different epochs. I am resetting the model after every episode, as one would expect, so statefulness does not carry over between episodes. Each episode is fed in one frame at a time, which is why I need stateful=True.
I've tried different discount factors and learning rates. In theory, this should be a superior model to the vanilla DQN (a CNN fed 4 frames at a time).
What am I doing wrong? Any help would be appreciated.

Understanding Keras model architecture (tensor index)

This script, which defines a dummy model using the functional API,
from keras.layers import Input, Dense
from keras.models import Model
import keras
inputs = Input(shape=(100,), name='A_input')
x = Dense(20, activation='relu', name='B_dense')(inputs)
shared_l = Dense(20, activation='relu', name='C_dense_shared')
x = keras.layers.concatenate([shared_l(x), shared_l(x)], name='D_concat')
model = Model(inputs=inputs, outputs=x)
print(model.summary())
yields the following output
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
A_input (InputLayer) (None, 100) 0
____________________________________________________________________________________________________
B_dense (Dense) (None, 20) 2020 A_input[0][0]
____________________________________________________________________________________________________
C_dense_shared (Dense) (None, 20) 420 B_dense[0][0]
B_dense[0][0]
____________________________________________________________________________________________________
D_concat (Concatenate) (None, 40) 0 C_dense_shared[0][0]
C_dense_shared[1][0]
====================================================================================================
My question concerns the content of the Connected to column.
I understand that a layer can have multiple nodes.
In this case C_dense_shared has two nodes, and D_concat is connected to both of them (C_dense_shared[0][0] and C_dense_shared[1][0]). So the first index (the node_index) is clear to me. But what does the second index mean? From the source code I read that this is the tensor_index:
layer_name[node_index][tensor_index]
But what does the tensor_index mean? And in what situations can it have a value different from 0?
I think the docstring of the Node class makes it quite clear:
tensor_indices: a list of integers,
the same length as `inbound_layers`.
`tensor_indices[i]` is the index of `input_tensors[i]` within the
output of the inbound layer
(necessary since each inbound layer might
have multiple tensor outputs, with each one being
independently manipulable).
tensor_index will be nonzero if a layer has multiple output tensors. It's different from the situation of multiple "datastreams" (e.g. layer sharing), where layers have multiple outbound nodes. For example, an LSTM layer will return 3 tensors if given return_state=True:
- The layer output: the hidden state of the last time step, or all hidden states if return_sequences=True
- The hidden state of the last time step
- The memory cell state of the last time step
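A small illustration of those three outputs (a hedged sketch, not from the original answer):
from keras.layers import Input, LSTM, Dense, Concatenate
from keras.models import Model

inp = Input(shape=(20, 8))
# with return_state=True, LSTM returns [output, last hidden state, last cell state]
seq_out, state_h, state_c = LSTM(16, return_state=True)(inp)
out = Dense(1)(Concatenate()([seq_out, state_h, state_c]))
model = Model(inp, out)
model.summary()  # the Concatenate row's Connected to column lists lstm_N[0][0], [0][1] and [0][2]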
As another example, feature transformation can be implemented as a Lambda layer:
from keras import backend as K
from keras.layers import Input, Dense, Lambda, Concatenate
from keras.models import Model

def generate_powers(x):
    return [x, K.sqrt(x), K.square(x)]

model_input = Input(shape=(10,))
powers = Lambda(generate_powers)(model_input)
x = Concatenate()(powers)
x = Dense(10, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(model_input, x)
From model.summary(), you can see that concatenate_5 is connected to lambda_7[0][0], lambda_7[0][1] and lambda_7[0][2]:
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_7 (InputLayer) (None, 10) 0
____________________________________________________________________________________________________
lambda_7 (Lambda) [(None, 10), (None, 1 0 input_7[0][0]
____________________________________________________________________________________________________
concatenate_5 (Concatenate) (None, 30) 0 lambda_7[0][0]
lambda_7[0][1]
lambda_7[0][2]
____________________________________________________________________________________________________
dense_8 (Dense) (None, 10) 310 concatenate_5[0][0]
____________________________________________________________________________________________________
dense_9 (Dense) (None, 1) 11 dense_8[0][0]
====================================================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
____________________________________________________________________________________________________