I want to build an LSTM model to predict a category label based on 60 days of data.
Basically:
Input: 60-day time window, 1 feature
- train data x (2571, 60, 1), y (2571, 1)
- test data x (60, 1), y (1)
Output: 1 label, either 0 or 1
One thing I am not sure about: should I shape the train/test x as (60, 1) or (1, 60)?
I made an LSTM network like this:
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_15 (LSTM) (None, 60, 128) 66560
dropout_10 (Dropout) (None, 60, 128) 0
lstm_16 (LSTM) (None, 60, 64) 49408
dropout_11 (Dropout) (None, 60, 64) 0
lstm_17 (LSTM) (None, 16) 5184
dense_5 (Dense) (None, 1) 17
=================================================================
Total params: 121,169
Trainable params: 121,169
Non-trainable params: 0
_________________________________________________________________
Here is my code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
time_window_size = 60  # 60-day lookback window
num_features = 1
model = Sequential()
model.add(LSTM(128, input_shape=(time_window_size, num_features), return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(units=64, return_sequences=True))
model.add(Dropout(0.1))
# the last LSTM layer returns only its final output, not the whole sequence
model.add(LSTM(units=16))
# output layer: one sigmoid unit for the 0/1 label
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
But after training, when I call model.predict like this:
y = model.predict(x_test)
instead of the expected 0 or 1, I get y with shape (60, 1).
After some debugging, I suspect the root cause is that my x shape was wrong.
Originally, my x_test shape was (60, 1). After I reshape it to (1, 60), I get a single output y every time, with shape (1). If I shape x_test as (60, 1), the predicted y has shape (60, 1).
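A minimal NumPy sketch of the shapes involved, assuming x_test is a single 60-step window stored as an array of shape (60, 1); the random arrays are placeholders, not the real series. Keras expects a leading batch axis, so one sample for a model built with input_shape=(60, 1) would be passed as (1, 60, 1), the same per-sample layout as the (2571, 60, 1) training array:

```python
import numpy as np

time_window_size = 60
num_features = 1

# Placeholder data standing in for the real series (assumption, not the asker's data).
x_train = np.random.rand(2571, time_window_size, num_features)
x_test = np.random.rand(time_window_size, num_features)  # one window: (60, 1)

# Add the batch axis in front: (60, 1) -> (1, 60, 1).
x_test_batch = np.expand_dims(x_test, axis=0)
# Equivalent explicit reshape.
x_test_batch_alt = x_test.reshape(1, time_window_size, num_features)

print(x_train.shape)       # (2571, 60, 1)
print(x_test_batch.shape)  # (1, 60, 1)

# With the model above, model.predict(x_test_batch) returns shape (1, 1):
# one sigmoid probability for the single window.
```

The (1, 60) shape drops the feature axis instead; (1, 60, 1) keeps both the batch axis and the single feature.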
But now I have a new problem.
If I plot it together with my y_test, y_predict just sits in the middle.
My y_predict makes no sense at all: the values fall in a very narrow range, from 0.447 to 0.45.
If I take @Frightera's advice and use np.where(y_predicted_result > 0.454, 1, 0) to convert them into 0 or 1, it still does not seem to work when I compare the result with the ground truth, and I have no idea why.
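Two quick checks may help here. First, taking the quoted range at face value, a threshold of 0.454 lies above every predicted value, so np.where maps all predictions to 0. Second, a common sanity check for a binary classifier is to compare its accuracy with a majority-class baseline; with binary cross-entropy, the best constant prediction is the fraction of positive labels, so outputs clustered around one value can indicate the model has learned little beyond the class ratio. The hypothetical helper `threshold_report` and the arrays below are made-up illustrations, not the asker's data, and 0.5 is only the usual default cut-off for a sigmoid output:

```python
import numpy as np

def threshold_report(y_prob, y_true, threshold=0.5):
    """Binarize sigmoid outputs and compare them with a majority-class baseline."""
    y_prob = np.asarray(y_prob, dtype=float).ravel()
    y_true = np.asarray(y_true).ravel().astype(int)

    y_pred = np.where(y_prob > threshold, 1, 0)
    accuracy = float(np.mean(y_pred == y_true))

    # Baseline: always predict the most frequent class in y_true.
    majority = int(np.mean(y_true) >= 0.5)
    baseline = float(np.mean(y_true == majority))

    return {
        "prob_range": (float(y_prob.min()), float(y_prob.max())),
        "positive_rate": float(np.mean(y_true)),
        "predicted_positive_rate": float(np.mean(y_pred)),
        "accuracy": accuracy,
        "majority_baseline": baseline,
    }

# Hypothetical values in the narrow range described above.
y_prob = np.array([0.447, 0.448, 0.449, 0.450, 0.448, 0.449])
y_true = np.array([0, 1, 0, 1, 0, 0])
print(threshold_report(y_prob, y_true, threshold=0.454))
```

If `accuracy` does not beat `majority_baseline`, moving the threshold alone is unlikely to make the predictions useful.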
Here is my code:
from tensorflow.keras import layers
import tensorflow as tf
from tensorflow import keras
TFDataType = tf.float16
XTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
YTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
model = tf.keras.models.Sequential()
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))
model.add(layers.Dense(2, dtype=TFDataType, input_shape=(10, 10)))
print(model.summary())
I am feeding it a 2-dimensional matrix. But when I look at the model summary, I see:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 10, 1) 11
_________________________________________________________________
dense_1 (Dense) (None, 10, 2) 4
=================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________
Why is the model asking for a 3-dimensional (None, 10, 1) array?
How do I pass an array that meets the dimensionality of (None, 10, 1)?
I cannot call numpy.ones(None, 10, 1). I cannot reshape the array with -1 in the first dimension.
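In your first layer, input_shape=(10, 10) describes the shape of a single sample; Keras adds an extra leading dimension (None) to account for the batch size of the data, and Dense then operates on the last axis, which is why the output is (None, 10, 1). Note that you only need input_shape for the FIRST layer in your model, so remove input_shape=(10, 10) from your second layer.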
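To see where (None, 10, 1) comes from, here is a NumPy sketch of what a Dense(1) layer computes; `dense` is a hypothetical stand-in with random weights, not Keras internals. Keras Dense contracts the last axis of its input with the kernel, so a batch of N samples shaped (10, 10) comes out as (N, 10, 1). The None is that N: an input for this model is an array of shape (N, 10, 10), for example np.ones((5, 10, 10)). If instead each row of XTrain is meant to be one sample, input_shape=(10,) would give an output of (None, 1):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, kernel, bias):
    """Hypothetical NumPy stand-in for Keras Dense (no activation).

    The matmul contracts the last axis of x with the first axis of kernel;
    all leading axes (including the batch axis) pass through unchanged.
    """
    return x @ kernel + bias

# Dense(1) on inputs whose last axis has 10 values: kernel (10, 1), bias (1,).
kernel = rng.normal(size=(10, 1))
bias = np.zeros(1)
print(kernel.size + bias.size)  # 11 parameters, matching the summary

# input_shape=(10, 10): every sample is 10x10, and N fills the None slot.
print(dense(np.ones((5, 10, 10)), kernel, bias).shape)   # (5, 10, 1)
print(dense(np.ones((32, 10, 10)), kernel, bias).shape)  # (32, 10, 1)

# input_shape=(10,): every sample is a vector, so a (10, 10) array is 10 samples.
print(dense(np.ones((10, 10)), kernel, bias).shape)      # (10, 1)
```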
I'm checking the output of the summary() function and don't understand all the printed values.
For example, look at this simple code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
x = [1, 2, 3, 4, 5]
y = [1.2, 1.8, 3.5, 3.7, 5.3]
model = Sequential()
model.add(Dense(10, input_dim=1, activation='relu'))
model.add(Dense(30, activation='relu'))  # input_dim is only needed on the first layer
model.add(Dense(10, activation='relu'))
model.add(Dense(1))
model.summary()
The output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 10) 20
_________________________________________________________________
dense_1 (Dense) (None, 30) 330
_________________________________________________________________
dense_2 (Dense) (None, 10) 310
_________________________________________________________________
dense_3 (Dense) (None, 1) 11
=================================================================
Total params: 671
Trainable params: 671
Non-trainable params: 0
_________________________________________________________________
What is the meaning of the None value under the Output Shape column? What does None mean here?
Which network will not show None in the summary?
What is the meaning of the Param # column? How is this value calculated?
The None is just a placeholder saying that the network can accept more than one sample at a time. None means this dimension is variable. The first dimension in a Keras model is always the batch size. ... That's why this dimension is often ignored when you define your model. For instance, when you define input_shape=(100, 200), you are actually ignoring the batch size and defining the shape of "each sample".
None won't show if you set a fixed batch size (for example with the batch_shape argument of Input). If you sent in a batch of 10 images, your shape would be (10, 64, 64, 3); if you changed it to 25, you would have (25, 64, 64, 3).
- For the first Dense layer, the number of params is 20. This is obtained as: 1 (input value) * 10 (neurons in the first layer) + 10 (bias values for neurons in the first layer).
- For the second Dense layer, the number of params is 330. This is obtained as: 10 (input values) * 30 (neurons in the second layer) + 30 (bias values for neurons in the second layer).
- For the third Dense layer, the number of params is 310. This is obtained as: 30 (input values) * 10 (neurons in the third layer) + 10 (bias values for neurons in the third layer).
- For the final layer, the number of params is 11. This is obtained as: 10 (input values) * 1 (neuron in the final layer) + 1 (bias value for the neuron in the final layer).
Total params = 20 + 330 + 310 + 11 = 671
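The same arithmetic as a short Python sketch: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases. The hypothetical helper `dense_params` is applied to the layer sizes of the example model (1 input feature, then 10, 30, 10 and 1 units):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per unit (n_out)."""
    return n_in * n_out + n_out

# Input dimension followed by the units of each Dense layer in the example model.
sizes = [1, 10, 30, 10, 1]
per_layer = [dense_params(n_in, n_out) for n_in, n_out in zip(sizes, sizes[1:])]

print(per_layer)       # [20, 330, 310, 11]
print(sum(per_layer))  # 671
```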
In WaveNet, dilated convolution is used to increase the receptive field of the layers above.
From the illustration, you can see that layers of dilated convolution with kernel size 2 and dilation rates that are powers of 2 create a tree-like structure of receptive fields. I tried to (very simply) replicate the above in Keras.
import tensorflow.keras as keras
nn = input_layer = keras.layers.Input(shape=(200, 2))
nn = keras.layers.Conv1D(5, 5, padding='causal', dilation_rate=2)(nn)
nn = keras.layers.Conv1D(5, 5, padding='causal', dilation_rate=4)(nn)
nn = keras.layers.Dense(1)(nn)
model = keras.Model(input_layer, nn)
opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='mse', optimizer=opt)
model.summary()
And the output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) [(None, 200, 2)] 0
_________________________________________________________________
conv1d_5 (Conv1D) (None, 200, 5) 55
_________________________________________________________________
conv1d_6 (Conv1D) (None, 200, 5) 130
_________________________________________________________________
dense_2 (Dense) (None, 200, 1) 6
=================================================================
Total params: 191
Trainable params: 191
Non-trainable params: 0
_________________________________________________________________
I was expecting axis=1 to shrink after each conv1d layer, similar to the gif. Why is this not the case?
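For reference, a short Python sketch of the length arithmetic, assuming stride 1 as in the code above. With padding='causal', Keras left-pads the time axis with dilation_rate * (kernel_size - 1) zeros, so each output step only sees current and past inputs and the length stays at 200; with padding='valid', the axis shrinks by that amount per layer. The tree-like pattern in the figure traces the receptive field of one output step rather than a shrinking tensor. The helper names are hypothetical:

```python
def conv1d_output_length(length, kernel_size, dilation_rate, padding):
    """Output length of a stride-1 Conv1D for 'causal' or 'valid' padding."""
    span = dilation_rate * (kernel_size - 1)  # extra steps covered by one dilated kernel
    if padding == "causal":
        return length          # input is left-padded by `span` zeros
    if padding == "valid":
        return length - span   # no padding, so `span` steps are lost
    raise ValueError(f"unsupported padding: {padding}")

def receptive_field(layers):
    """Number of input steps one output step depends on, for stacked stride-1 convs."""
    return 1 + sum(d * (k - 1) for k, d in layers)

# (kernel_size, dilation_rate) pairs from the model above.
layers = [(5, 2), (5, 4)]

for padding in ("causal", "valid"):
    length = 200
    lengths = []
    for k, d in layers:
        length = conv1d_output_length(length, k, d, padding)
        lengths.append(length)
    print(padding, lengths)  # causal [200, 200], then valid [192, 176]

print(receptive_field(layers))  # 25
```

With kernel size 2 and dilation rates 1, 2, 4, 8, `receptive_field([(2, 1), (2, 2), (2, 4), (2, 8)])` gives 16.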
I'm implementing a simple DQN RL algorithm with Keras, but using an LSTM in the network. The idea is that a stateful LSTM will remember the relevant information from all prior states and thus predict rewards for different actions better. This is more of a Keras problem than an RL one: I think I am not handling the stateful LSTM correctly.
MODEL CODE (functional API):
import keras
from keras.layers import Input, TimeDistributed, Conv2D, Flatten, Dense, LSTM
from keras.optimizers import RMSprop
# values inferred from the model summary below; learning rate from the original comment
batch_size, look_back = 1, 1
resolution = (30, 45)
available_actions_count = 8
learning_rate = 0.001
state_input = Input(batch_shape=(batch_size, look_back, 1, resolution[0], resolution[1]))
conv1 = TimeDistributed(Conv2D(8, 6, strides=3, activation='relu', data_format="channels_first"))(
    state_input)  # filters, kernel_size, stride
conv2 = TimeDistributed(Conv2D(8, 3, strides=2, activation='relu', data_format="channels_first"))(
    conv1)  # filters, kernel_size, stride
flatten = TimeDistributed(Flatten())(conv2)
fc1 = TimeDistributed(Dense(128, activation='relu'))(flatten)
fc2 = TimeDistributed(Dense(64, activation='relu'))(fc1)
lstm_layer = LSTM(4, stateful=True)(fc2)
fc3 = Dense(128, activation='relu')(lstm_layer)
fc4 = Dense(available_actions_count)(fc3)
model = keras.models.Model(inputs=state_input, outputs=fc4)
optimizer = RMSprop(learning_rate=learning_rate)
model.compile(loss="mse", optimizer=optimizer)
model.summary()
This is the model summary:
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (1, 1, 1, 30, 45) 0
_________________________________________________________________
time_distributed_1 (TimeDist (1, 1, 8, 9, 14) 296
_________________________________________________________________
time_distributed_2 (TimeDist (1, 1, 8, 4, 6) 584
_________________________________________________________________
time_distributed_3 (TimeDist (1, 1, 192) 0
_________________________________________________________________
time_distributed_4 (TimeDist (1, 1, 128) 24704
_________________________________________________________________
time_distributed_5 (TimeDist (1, 1, 64) 8256
_________________________________________________________________
lstm_1 (LSTM) (1, 4) 1104
_________________________________________________________________
dense_3 (Dense) (1, 128) 640
_________________________________________________________________
dense_4 (Dense) (1, 8) 1032
=================================================================
Total params: 36,616
Trainable params: 36,616
Non-trainable params: 0
================================================================
I feed in one frame at a time to fit the model. When I need to predict to act, I make sure I save the model state and restore it as follows.
CODE TO FIT/TRAIN THE MODEL:
# save the LSTM state (memory) so it can be restored before fitting
# (get_model_states / set_model_states are custom helpers, defined elsewhere)
prev_state = get_model_states(model)
target_q = model.predict(s1, batch_size=batch_size)
# predict() also advances the state of the LSTM layer
q_next = model.predict(s2, batch_size=batch_size)
max_q_next = np.max(q_next, axis=1)
target_q[np.arange(target_q.shape[0]), a] = r + discount_factor * (1 - isterminal) * max_q_next
# restore the state from before the s1 prediction, so fitting starts at the right point
set_model_states(model, prev_state)
model.fit(s1, target_q, batch_size=batch_size, verbose=0)
# after fitting, both the state and the weights get updated,
# so the LSTM has already moved forward in the sequence
The model does not seem to be working at all. The variance remains very high across different epochs. I reset the model after every episode, as one would expect, so the stateful LSTM does not carry anything over between episodes. Each episode is fed in one frame at a time, which is why I need stateful.
I've tried different discount factors and learning rates. In theory, this should be a better model than the vanilla DQN (a CNN with 4 frames).
What am I doing wrong ? Any help would be appreciated.
This script defines a dummy model using the functional API:
from keras.layers import Input, Dense
from keras.models import Model
import keras
inputs = Input(shape=(100,), name='A_input')
x = Dense(20, activation='relu', name='B_dense')(inputs)
shared_l = Dense(20, activation='relu', name='C_dense_shared')
x = keras.layers.concatenate([shared_l(x), shared_l(x)], name='D_concat')
model = Model(inputs=inputs, outputs=x)
print(model.summary())
yields the following output:
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
A_input (InputLayer) (None, 100) 0
____________________________________________________________________________________________________
B_dense (Dense) (None, 20) 2020 A_input[0][0]
____________________________________________________________________________________________________
C_dense_shared (Dense) (None, 20) 420 B_dense[0][0]
B_dense[0][0]
____________________________________________________________________________________________________
D_concat (Concatenate) (None, 40) 0 C_dense_shared[0][0]
C_dense_shared[1][0]
====================================================================================================
My question concerns the content of the Connected to column.
I understand that a layer can have multiple nodes.
In this case C_dense_shared has two nodes, and D_concat is connected to both of them (C_dense_shared[0][0] and C_dense_shared[1][0]). So the first index (the node_index) is clear to me. But what does the second index mean? From the source code I read that this is the tensor_index:
layer_name[node_index][tensor_index]
But what does the tensor_index mean? And in what situations can it have a value different from 0?
I think the docstring of the Node class makes it quite clear:
tensor_indices: a list of integers,
the same length as `inbound_layers`.
`tensor_indices[i]` is the index of `input_tensors[i]` within the
output of the inbound layer
(necessary since each inbound layer might
have multiple tensor outputs, with each one being
independently manipulable).
tensor_index will be nonzero if a layer has multiple output tensors. This is different from the situation of multiple "datastreams" (e.g. layer sharing), where a layer has multiple nodes, one per call. For example, an LSTM layer returns 3 tensors when given return_state=True:
- the hidden state of the last time step, or all hidden states if return_sequences=True
- the hidden state of the last time step
- the memory cell of the last time step
As another example, feature transformation can be implemented as a Lambda layer:
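from keras import backend as K
from keras.layers import Input, Lambda, Concatenate, Dense
from keras.models import Model
def generate_powers(x):
    # one input tensor in, three output tensors out (tensor_index 0, 1, 2)
    return [x, K.sqrt(x), K.square(x)]
model_input = Input(shape=(10,))
powers = Lambda(generate_powers)(model_input)
x = Concatenate()(powers)
x = Dense(10, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(model_input, x)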
From model.summary(), you can see that concatenate_5 is connected to lambda_7[0][0], lambda_7[0][1] and lambda_7[0][2]:
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_7 (InputLayer) (None, 10) 0
____________________________________________________________________________________________________
lambda_7 (Lambda) [(None, 10), (None, 1 0 input_7[0][0]
____________________________________________________________________________________________________
concatenate_5 (Concatenate) (None, 30) 0 lambda_7[0][0]
lambda_7[0][1]
lambda_7[0][2]
____________________________________________________________________________________________________
dense_8 (Dense) (None, 10) 310 concatenate_5[0][0]
____________________________________________________________________________________________________
dense_9 (Dense) (None, 1) 11 dense_8[0][0]
====================================================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
____________________________________________________________________________________________________