How to interpret get_weights for Keras GRU? - tensorflow

I am unable to interpret the results of get_weights from a GRU layer. Here's my code -
#Modified from - https://machinelearningmastery.com/understanding-simple-recurrent-neural-networks-in-keras/
from pandas import read_csv
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, GRU
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import math
import matplotlib.pyplot as plt
model = Sequential()
model.add(GRU(units = 2, input_shape = (3,1), activation = 'linear'))
model.add(Dense(units = 1, activation = 'linear'))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
initial_weights = model.layers[0].get_weights()
print("Shape = ",initial_weights)
I am familiar with GRU concepts. In addition, I understand how get_weights works for the Keras SimpleRNN layer, where the first array represents the input weights, the second the recurrent weights and the third the bias. However, I am lost with the output of the GRU, which is given below:
Shape = [array([[-0.64266175, -0.0870676 , -0.25356603, -0.03685969, 0.22260845,
-0.04923642]], dtype=float32), array([[ 0.01929092, -0.4932567 , 0.3723044 , -0.6559699 , -0.33790302,
0.27062896],
[-0.4214194 , 0.46456426, 0.27233726, -0.00461334, -0.6533575 ,
-0.32483965]], dtype=float32), array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]], dtype=float32)]
I am assuming it has something to do with GRU gates.
Update (7/4): This page says that the Keras GRU has 3 gates: update, reset and output. However, based on this, a GRU shouldn't have an output gate.

The best way I know is to trace the add_weight() calls in the build() function of GRUCell.
Let's take an example model:
import tensorflow as tf
model = tf.keras.models.Sequential(
[
tf.keras.layers.GRU(32, input_shape=(5, 10), name='gru'),
tf.keras.layers.Dense(10)
]
)
Now we'll print some metadata about what's returned by weights = model.get_layer('gru').get_weights(), which gives:
Number of arrays in weights: 3
Shape of each array in weights: [(10, 96), (32, 96), (2, 96)]
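The print statements behind that metadata were not shown; a minimal sketch that produces it, assuming TF 2.x `tf.keras` and the example model above:

```python
import tensorflow as tf

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.GRU(32, input_shape=(5, 10), name='gru'),
        tf.keras.layers.Dense(10)
    ]
)

# get_weights() returns NumPy arrays: kernel, recurrent_kernel, bias
weights = model.get_layer('gru').get_weights()
print("Number of arrays in weights:", len(weights))
print("Shape of each array in weights:", [w.shape for w in weights])
```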
Now let's go back to the weights defined by GRUCell. We have:
self.kernel = self.add_weight(
shape=(input_dim, self.units * 3),
...
)
self.recurrent_kernel = self.add_weight(
shape=(self.units, self.units * 3),
...
)
...
bias_shape = (2, 3 * self.units)
self.bias = self.add_weight(
shape=bias_shape,
...
)
This is what you're seeing as weights (in that order). Here's why they are shaped like this. GRU computations are outlined here.
The first matrix in weights (of shape [10, 96]) is a concatenation of Wz|Wr|Wh (in that order). Each of these is a [10, 32] sized tensor. Concatenation gives a [10, 32*3=96] sized tensor.
Similarly, the second matrix is a concatenation of Uz|Ur|Uh. Each of these is a [32, 32] sized tensor which becomes [32, 96] after concatenation.
You can see how they break this combined weight matrix to each of z, r and h components here.
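Finally, the bias. It contains 2 biases, i.e. a [2, 96] sized tensor: input_bias (row 0) and recurrent_bias (row 1). Again, the biases for all gates are combined into a single tensor. The two-row bias exists because reset_after (which decides whether the reset gate is applied before or after the recurrent matrix multiplication) is True, which is the default in TF 2.x; in that case both input_bias and recurrent_bias are used. With reset_after=False, the bias is a single [96] vector and only an input bias is used. It's an implementation detail.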
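To check this interpretation, the sketch below splits the three arrays into their z | r | h blocks and recomputes the GRU by hand in NumPy. It assumes TF 2.x `tf.keras` with the GRU defaults (activation='tanh', recurrent_activation='sigmoid', reset_after=True); the sizes, the random input and the random bias initializer are arbitrary choices so the biases are non-zero. If the layout described above is right, the hand-computed final state should match the layer's output.

```python
import numpy as np
import tensorflow as tf

units, input_dim, timesteps = 4, 3, 5
rng = np.random.default_rng(0)
x = rng.normal(size=(1, timesteps, input_dim)).astype(np.float32)

# Defaults: activation='tanh', recurrent_activation='sigmoid', reset_after=True
gru = tf.keras.layers.GRU(units, bias_initializer='random_normal')
keras_out = gru(x).numpy()  # final hidden state, shape (1, units)

kernel, recurrent_kernel, bias = gru.get_weights()
# Split the concatenated matrices into their z | r | h blocks
W_z, W_r, W_h = np.split(kernel, 3, axis=1)            # each (input_dim, units)
U_z, U_r, U_h = np.split(recurrent_kernel, 3, axis=1)  # each (units, units)
input_bias, recurrent_bias = bias                      # each (3 * units,)
bi_z, bi_r, bi_h = np.split(input_bias, 3)
br_z, br_r, br_h = np.split(recurrent_bias, 3)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

h = np.zeros((1, units), dtype=np.float32)
for t in range(timesteps):
    x_t = x[:, t, :]
    z = sigmoid(x_t @ W_z + bi_z + h @ U_z + br_z)  # update gate
    r = sigmoid(x_t @ W_r + bi_r + h @ U_r + br_r)  # reset gate
    # reset_after=True: r scales the recurrent term after the matmul and bias
    hh = np.tanh(x_t @ W_h + bi_h + r * (h @ U_h + br_h))  # candidate state
    h = z * h + (1.0 - z) * hh  # Keras keeps z * previous state

print("max abs difference:", np.abs(h - keras_out).max())
```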

Related

keras: add constant number to all input values

I want to create a simple toy model in keras. The model should take an input, then add a 1 to every element and produce an output.
I found an example using keras, but it requires 2 inputs
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# create model
input1 = layers.Input(shape=(2,))
input2 = layers.Input(shape=(2,))
added = layers.Add()([input1, input2])
model = keras.models.Model(inputs=[input1, input2], outputs=added)
# run inference
input_shape = (2,)
x1 = tf.ones(input_shape)
x2 = tf.ones(input_shape)
y = model([x1, x2])
However, I need the model to only have a single input and simply increase every input value by 1, for example.
You can replace the second input of your toy model with a call to tf.ones_like:
input1 = layers.Input(shape=())
added = layers.Add()([input1, tf.ones_like(input1)])
model = keras.models.Model(inputs=input1, outputs=added)
tf.ones_like creates a tensor full of ones of the shape of the tensor passed as an argument. As this op depends only on the shape of the input tensor, you can technically create your network without a specified input shape, and it will accept any shape as input:
>>> model(3)
<tf.Tensor: shape=(), dtype=float32, numpy=4.0>
>>> model(tf.ones((1,2,3)))
<tf.Tensor: shape=(1, 2, 3), dtype=float32, numpy=
array([[[2., 2., 2.],
[2., 2., 2.]]], dtype=float32)>
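Newer Keras versions may reject raw TensorFlow ops such as tf.ones_like applied to symbolic Keras inputs. A Lambda layer is an alternative that also keeps a single input; a minimal sketch, assuming TF 2.x with `tf.keras` and a fixed input shape of (2,):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

inp = layers.Input(shape=(2,))
# The Lambda layer adds the constant 1.0 to every element of its input
added = layers.Lambda(lambda x: x + 1.0)(inp)
model = keras.models.Model(inputs=inp, outputs=added)

y = model(np.array([[1.0, 2.0]], dtype=np.float32))
print(y)
```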

tensorflow:Model was constructed with shape (None, 4, 1), but it was called on an input with incompatible shape (4, 1, 1)

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
train_data = np.array(
[[ 0.045964252, 0.08585282, 0.056468535, 0.087974496],
[ 0.06128449, 0.027692182, 0.01929527, 0.027361592],
[ 0.076604135, 0., 0., 0. ],
[-0.15014096, -0.6869674, -0.6869674, 0. ]], np.float32)
train_label= np.array(
[[0.08585282 ],
[0.027692182],
[0. ],
[0.036714412]], np.float32)
mydataset = tf.data.Dataset.from_tensor_slices((train_data, train_label))
myinput = tf.keras.layers.Input(shape=(4, 1), ragged=True)
output = tf.keras.layers.Dense(1)(myinput)
model = tf.keras.models.Model(inputs=myinput, outputs=output)
model.compile(
optimizer='sgd',
loss='mse',
metrics=[tf.keras.metrics.MeanSquaredError()])
print("model.fit mydatasetelement_spec:\n", mydataset.element_spec)
# (TensorSpec(shape=(4,), dtype=tf.float32, name=None), TensorSpec(shape=(1,), dtype=tf.float32, name=None))
history = model.fit(
mydataset,
epochs=4,
steps_per_epoch=4,
verbose=0)
How can I eliminate the warning by correcting the model input layer?
WARNING:tensorflow:Model was constructed with shape (None, 4, 1) for
input Tensor("Placeholder_1:0", shape=(None, 4, 1), dtype=float32),
but it was called on an input with incompatible shape (4, 1, 1)
I cannot seem to get tf.keras.layers.Input to accept the input from model.fit without throwing the warning. I don't want to change my data (reshape, squeeze etc.). I want to keep the input as a dataset with features and labels. I want to adapt the model to accept the input of my data.
You can fix it by doing:
myinput = tf.keras.layers.Input(shape=(1,), ragged=True)
Note that a Dense layer's input should have the shape (batch_size, input_size).
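If each row of train_data is instead meant to be one sample with four features, the dataset can be batched (the arrays themselves are not reshaped) and the Input shape set to the feature count. A minimal sketch, assuming TF 2.x; the batch size of 2 is an arbitrary choice:

```python
import numpy as np
import tensorflow as tf

train_data = np.array(
    [[ 0.045964252, 0.08585282, 0.056468535, 0.087974496],
     [ 0.06128449, 0.027692182, 0.01929527, 0.027361592],
     [ 0.076604135, 0., 0., 0. ],
     [-0.15014096, -0.6869674, -0.6869674, 0. ]], np.float32)
train_label = np.array(
    [[0.08585282], [0.027692182], [0.], [0.036714412]], np.float32)

# Each element is one sample (features (4,), label (1,)); batch() adds the batch axis
mydataset = tf.data.Dataset.from_tensor_slices((train_data, train_label)).batch(2)

myinput = tf.keras.layers.Input(shape=(4,))
output = tf.keras.layers.Dense(1)(myinput)
model = tf.keras.models.Model(inputs=myinput, outputs=output)
model.compile(optimizer='sgd', loss='mse')

history = model.fit(mydataset, epochs=4, verbose=0)
predictions = model.predict(train_data, verbose=0)  # one prediction per row
```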

Custom Keras binary_crossentropy loss function not working

I’m trying to re-define keras’s binary_crossentropy loss function so that I can customize it but it’s not giving me the same results as the existing one.
I'm using TF 1.13.1 with Keras 2.2.4.
I went through Keras’s GitHub code. My understanding is that the loss in model.compile(optimizer='adam', loss='binary_crossentropy', metrics =['accuracy']) is defined in losses.py, using binary_crossentropy defined in tensorflow_backend.py.
I ran some dummy data and a dummy model to test it. Here are my findings:
The custom loss function outputs the same results as keras’s one
Using the custom loss in a keras model gives different accuracy results
from numpy.random import seed
seed(1)
from tensorflow import set_random_seed
set_random_seed(2)
import tensorflow as tf
from keras import losses
import keras.backend as K
import keras.backend.tensorflow_backend as tfb
from keras.layers import Dense
from keras import Sequential
#Dummy check of loss output
def binary_crossentropy_custom(y_true, y_pred):
    return K.mean(binary_crossentropy_custom_tf(y_true, y_pred), axis=-1)
def binary_crossentropy_custom_tf(target, output, from_logits=False):
    """Binary crossentropy between an output tensor and a target tensor.
    # Arguments
        target: A tensor with the same shape as `output`.
        output: A tensor.
        from_logits: Whether `output` is expected to be a logits tensor.
            By default, we consider that `output`
            encodes a probability distribution.
    # Returns
        A tensor.
    """
    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)
logits = tf.constant([[-3., -2.11, -1.22],
[-0.33, 0.55, 1.44],
[2.33, 3.22, 4.11]])
labels = tf.constant([[1., 1., 1.],
[1., 1., 0.],
[0., 0., 0.]])
custom_sigmoid_cross_entropy_with_logits = binary_crossentropy_custom(labels, logits)
keras_binary_crossentropy = losses.binary_crossentropy(y_true=labels, y_pred=logits)
with tf.Session() as sess:
    print('CUSTOM sigmoid_cross_entropy_with_logits: ', sess.run(custom_sigmoid_cross_entropy_with_logits), '\n')
    print('KERAS keras_binary_crossentropy: ', sess.run(keras_binary_crossentropy), '\n')
#CUSTOM sigmoid_cross_entropy_with_logits: [16.118095 10.886106 15.942386]
#KERAS keras_binary_crossentropy: [16.118095 10.886106 15.942386]
#Dummy check of model accuracy
X_train = tf.random.uniform((3, 5), minval=0, maxval=1, dtype=tf.dtypes.float32)
labels = tf.constant([[1., 0., 0.],
[0., 0., 1.],
[1., 0., 0.]])
model = Sequential()
#First Hidden Layer
model.add(Dense(5, activation='relu', kernel_initializer='random_normal', input_dim=5))
#Output Layer
model.add(Dense(3, activation='sigmoid', kernel_initializer='random_normal'))
#I ran model.fit for each model.compile below 10 times using the same X_train and provide the range of accuracy measurement
# model.compile(optimizer='adam', loss='binary_crossentropy', metrics =['accuracy']) #0.748 < acc < 0.779
# model.compile(optimizer='adam', loss=losses.binary_crossentropy, metrics =['accuracy']) #0.761 < acc < 0.778
model.compile(optimizer='adam', loss=binary_crossentropy_custom, metrics =['accuracy']) #0.617 < acc < 0.663
history = model.fit(X_train, labels, steps_per_epoch=100, epochs=1)
I'd expect the custom loss function to give similar model accuracy output but it does not. Any idea? Thanks!
Keras automatically selects which accuracy implementation to use according to the loss, and this won't work if you use a custom loss: with a custom loss function and a 3-unit output, 'accuracy' falls back to categorical_accuracy. In this case you can just explicitly use the right accuracy, which is binary_accuracy:
model.compile(optimizer='adam', loss=binary_crossentropy_custom, metrics =['binary_accuracy'])
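To see why the two metrics disagree, here is a NumPy sketch of the two accuracy definitions (re-implemented for illustration, not calling Keras) on made-up sigmoid outputs for three samples with three labels each. binary_accuracy scores every label independently, while categorical_accuracy only checks whether the arg-max matches:

```python
import numpy as np

# Made-up sigmoid outputs for 3 samples x 3 independent labels
y_true = np.array([[1., 0., 0.],
                   [0., 0., 1.],
                   [1., 0., 0.]])
y_pred = np.array([[0.6, 0.4, 0.7],
                   [0.2, 0.3, 0.9],
                   [0.4, 0.1, 0.3]])

def binary_accuracy(y_true, y_pred, threshold=0.5):
    # Threshold each probability, compare element-wise, average per sample
    return np.mean(np.equal(y_true, (y_pred > threshold).astype(y_true.dtype)), axis=-1)

def categorical_accuracy(y_true, y_pred):
    # Per sample: does the arg-max of the prediction match the arg-max of the label?
    return np.equal(np.argmax(y_true, axis=-1), np.argmax(y_pred, axis=-1)).astype(float)

bin_acc = binary_accuracy(y_true, y_pred)
cat_acc = categorical_accuracy(y_true, y_pred)
print(bin_acc.mean())  # about 0.778
print(cat_acc.mean())  # about 0.667
```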

Keras TimeDistributed Not Masking CNN Model

For the sake of example, I have an input consisting of 2 images, of total shape (2,299,299,3). I'm trying to apply InceptionV3 to each image, and then process the output with an LSTM. I'm using a masking layer to exclude a blank image from being processed (specified below).
The code is:
import numpy as np
from keras import backend as K
from keras.models import Sequential,Model
from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D, BatchNormalization, \
Input, GlobalAveragePooling2D, Masking,TimeDistributed, LSTM,Dense,Flatten,Reshape,Lambda, Concatenate
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.applications import inception_v3
IMG_SIZE=(299,299,3)
def create_base():
    base_model = inception_v3.InceptionV3(weights='imagenet', include_top=False)
    x = GlobalAveragePooling2D()(base_model.output)
    base_model=Model(base_model.input,x)
    return base_model
base_model=create_base()
#Image mask to ignore images with pixel values of -2
IMAGE_MASK = -2*np.expand_dims(np.ones(IMG_SIZE),0)
final_input=Input((2,IMG_SIZE[0],IMG_SIZE[1],IMG_SIZE[2]))
final_model = Masking(mask_value = -2.)(final_input)
final_model = TimeDistributed(base_model)(final_model)
final_model = Lambda(lambda x: x, output_shape=lambda s:s)(final_model)
#final_model = Reshape(target_shape=(2, 2048))(final_model)
#final_model = Masking(mask_value = 0.)(final_model)
final_model = LSTM(5,return_sequences=False)(final_model)
final_model = Model(final_input,final_model)
#Create a sample test image
TEST_IMAGE = np.ones(IMG_SIZE)
#Create a test sample input, consisting of a normal image and a masked image
TEST_SAMPLE = np.concatenate((np.expand_dims(TEST_IMAGE,axis=0),IMAGE_MASK))
inp = final_model.input # input placeholder
outputs = [layer.output for layer in final_model.layers] # all layer outputs
functors = [K.function([inp]+ [K.learning_phase()], [out]) for out in outputs]
layer_outs = [func([np.expand_dims(TEST_SAMPLE,0), 1.]) for func in functors]
This does not work correctly. Specifically, the model should mask the IMAGE_MASK part of the input, but it instead processes it with Inception (giving a nonzero output). Here are the details:
layer_outs[-1], the LSTM output, is fine:
[array([[-0.15324114, -0.09620268, -0.01668587, 0.07938149, -0.00757846]], dtype=float32)]
layer_outs[-2] and layer_outs[-3], the LSTM input, are wrong; the second array should be all zeros:
[array([[[ 0.37713543, 0.36381325, 0.36197218, ..., 0.23298527,
0.43247852, 0.34844452],
[ 0.24972123, 0.2378867 , 0.11810347, ..., 0.51930511,
0.33289322, 0.33403745]]], dtype=float32)]
layer_outs[-4], the input to the CNN, is correctly masked:
[[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
...,
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]]],
[[[-0., -0., -0.],
[-0., -0., -0.],
[-0., -0., -0.],
...,
[-0., -0., -0.],
[-0., -0., -0.],
[-0., -0., -0.]],
Note that the code seems to work correctly with a simpler base_model such as:
def create_base():
    input_layer=Input(IMG_SIZE)
    base_model=Flatten()(input_layer)
    base_model=Dense(2048)(base_model)
    base_model=Model(input_layer,base_model)
    return base_model
I have exhausted most online resources on this. Permutations of this question have been asked on Keras's GitHub, such as here, here and here, but I can't find any concrete resolution.
The links suggest that the issue stems from TimeDistributed being applied to BatchNormalization, and that the hacky fixes (either a Lambda identity layer or Reshape layers) remove the errors but don't seem to produce the correct model.
I've tried to force the base model to support masking via:
base_model.__setattr__('supports_masking',True)
and I've also tried applying an identity layer via:
TimeDistributed(Lambda(lambda x: base_model(x), output_shape=lambda s:s))(final_model)
but none of these seem to work. Note that I would like the final model to be trainable, in particular the CNN part of it should remain trainable.
Not entirely sure this will work, but based on the comment made here, with a newer version of tensorflow + keras it should work:
final_model = TimeDistributed(Flatten())(final_input)
final_model = Masking(mask_value = -2.)(final_model)
final_model = TimeDistributed(Reshape(IMG_SIZE))(final_model)
final_model = TimeDistributed(base_model)(final_model)
final_model = Model(final_input,final_model)
I took a look at the source code of masking, and I noticed Keras creates a mask tensor that only reduces the last axis. As long as you're dealing with 5D tensors, it will cause no problem, but when you reduce the dimensions for the LSTM, this masking tensor becomes incompatible.
Doing the first flatten step, before masking, will assure that the masking tensor works properly for 3D tensors. Then you expand the image again to its original size.
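A NumPy sketch of that shape problem, assuming Masking computes its mask as any(inputs != mask_value) over the last axis only (as Keras's Masking.compute_mask does) and using a tiny 4x4x3 stand-in for the 299x299x3 images:

```python
import numpy as np

mask_value = -2.0
img_size = (4, 4, 3)  # small stand-in for (299, 299, 3)
normal = np.ones(img_size)
blank = np.full(img_size, mask_value)
x = np.stack([normal, blank])[np.newaxis]  # (batch=1, timesteps=2, 4, 4, 3)

# Reducing only the last axis of a 5D tensor leaves a per-pixel mask
mask_5d = np.any(x != mask_value, axis=-1)
print(mask_5d.shape)  # (1, 2, 4, 4): not usable as a per-timestep LSTM mask

# Flattening each image first leaves exactly one mask value per timestep
x_flat = x.reshape(x.shape[0], x.shape[1], -1)  # (1, 2, 48)
mask_3d = np.any(x_flat != mask_value, axis=-1)
print(mask_3d.shape)  # (1, 2): only the first timestep is unmasked
```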
I'll probably try to install newer versions soon to test it myself, but these installing procedures have caused too much trouble and I'm in the middle of something important here.
On my machine, this code compiles, but that strange error appears at prediction time (see the link at the first line of this answer).
Creating a model for predicting the intermediate layers
From the code I've seen, I'm not sure the mask is kept internally in the tensors. I don't know exactly how it works, but it seems to be managed separately from the building of the functions inside the layers.
So, try using a standard Keras model to make the predictions:
inp = final_model.input # input placeholder
outputs = [layer.output for layer in final_model.layers] # all layer outputs
fullModel = Model(inp,outputs)
layerPredictions = fullModel.predict(np.expand_dims(TEST_SAMPLE,0))
print(layerPredictions[-2])
It seems to be working as intended. Masking in Keras doesn't produce zeros, as you might expect; instead, downstream layers that support masking, such as LSTM and the loss calculation, skip the masked timesteps. In the case of RNNs, Keras (at least with the TensorFlow backend, see tensorflow_backend.py) carries the states from the previous step over the masked timesteps. This is done in part to preserve the shapes of tensors when dynamic input is given.
If you really want zeros you will have to implement your own layer with a similar logic to Masking and return zeros explicitly. To solve your problem, you need a mask before the final LSTM layer using the final_input:
class MyMask(Masking):
    """Layer that adds a mask based on initial input."""
    def compute_mask(self, inputs, mask=None):
        # inputs[0] is final_input, shape (batch, timesteps, height, width, channels);
        # reduce over height, width and channels so the mask is (batch, timesteps)
        return K.any(K.not_equal(inputs[0], self.mask_value), axis=[2, 3, 4])
    def call(self, inputs):
        # We just return the second input (the CNN features) unchanged
        return inputs[1]
    def compute_output_shape(self, input_shape):
        return input_shape[1]
final_model = MyMask(mask_value=-2.)([final_input, final_model])
You can probably attach the mask in a simpler manner, but this custom class essentially adds a mask based on your initial inputs and outputs a Keras tensor that now carries a mask.
In your example, the LSTM will ignore the second image. To confirm, you can set return_sequences=True and check that the outputs for the two images are identical.
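A minimal sketch of a related check, assuming TF 2.x `tf.keras` and a tiny 3-feature stand-in for the CNN features: if the LSTM really skips the masked second timestep, its final output for [features, blank] should equal its final output for [features] alone.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
features = 3
first = np.random.default_rng(0).normal(size=(1, 1, features)).astype(np.float32)
blank = np.full((1, 1, features), -2.0, dtype=np.float32)
two_steps = np.concatenate([first, blank], axis=1)  # second timestep is all mask_value

inp = tf.keras.layers.Input(shape=(None, features))
masked = tf.keras.layers.Masking(mask_value=-2.0)(inp)
out = tf.keras.layers.LSTM(5)(masked)
model = tf.keras.Model(inp, out)

out_two = model.predict(two_steps, verbose=0)  # [features, masked blank]
out_one = model.predict(first, verbose=0)      # [features] only
print("max abs difference:", np.abs(out_two - out_one).max())
```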
I'm trying to implement the same thing: I want my LSTM sequences to have variable sizes. However, I can't even implement your original model. I get the following error: TypeError: Layer input_1 does not support masking, but was passed an input_mask: Tensor("time_distributed_1/Reshape_1:0", shape=(?, 100, 100), dtype=bool). I'm using TensorFlow 1.10 and Keras 2.2.2.
I solved the problem by adding a second input, a mask to specify which timesteps to take into account for the LSTM. That way the image sequence always has the same number of timesteps, the CNN always generates an output, but some of them are ignored for the LSTM input. However, the missing images need to be chosen carefully so that the batch normalization is not affected.
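from keras.applications import ResNet50
from keras.layers import Input, TimeDistributed, Dropout, multiply, Masking, LSTM, Dense
from keras.models import Model
def LSTM_CNN(params):
    # Pretrained ResNet50 without its classifier; 'avg' pooling gives one vector per frame
    resnet = ResNet50(include_top=False, weights='imagenet', pooling='avg')
    input_layer = Input(shape=(params.numFrames, params.height, params.width, 3))
    # Second input: 1.0 for frames the LSTM should use, 0.0 for frames to ignore
    input_mask = Input(shape=(params.numFrames, 1))
    curr_layer = TimeDistributed(resnet)(input_layer)
    resnetOutput = Dropout(0.5)(curr_layer)
    # Zero out the features of ignored frames (broadcast over the feature axis)
    curr_layer = multiply([resnetOutput, input_mask])
    cnn_output = curr_layer
    # Frames whose features are all 0.0 are skipped by the LSTM
    curr_layer = Masking(mask_value=0.0)(curr_layer)
    lstm_out = LSTM(256, dropout=0.5)(curr_layer)
    output = Dense(units=params.numClasses, activation='sigmoid')(lstm_out)
    model = Model([input_layer, input_mask], output)
    return model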
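A NumPy sketch of the idea behind multiply followed by Masking(mask_value=0.0), with made-up feature values: multiplying by the 0/1 mask zeroes whole timesteps, and the Masking layer then marks exactly those timesteps as skipped. This assumes a real CNN feature vector is never exactly all zeros, otherwise that frame would be masked too.

```python
import numpy as np

# Made-up CNN features: batch=1, 3 timesteps, 4 features per timestep
cnn_out = np.arange(1, 13, dtype=np.float32).reshape(1, 3, 4)
# Second model input: 1 = use this timestep, 0 = ignore it
input_mask = np.array([[[1.], [0.], [1.]]], dtype=np.float32)  # (1, 3, 1)

masked_features = cnn_out * input_mask  # broadcasts over the feature axis
# What Masking(mask_value=0.0) computes: keep timesteps with any non-zero feature
lstm_mask = np.any(masked_features != 0.0, axis=-1)
print(lstm_mask)  # the middle timestep is masked out
```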

Initializing LSTM hidden state Tensorflow/Keras

Can someone explain how I can initialize the hidden state of an LSTM in TensorFlow? I am trying to build an LSTM recurrent autoencoder, so after I have trained that model, I want to transfer the learned hidden state of the unsupervised model to the hidden state of a supervised model.
Is that even possible with current API?
This is the paper I am trying to recreate:
http://papers.nips.cc/paper/5949-semi-supervised-sequence-learning.pdf
Yes - this is possible but truly cumbersome. Let's go through an example.
Defining a model:
from keras.layers import LSTM, Input
from keras.models import Model
input = Input(batch_shape=(32, 10, 1))
lstm_layer = LSTM(10, stateful=True)(input)
model = Model(input, lstm_layer)
model.compile(optimizer="adam", loss="mse")
It's important to build and compile the model first, because compilation resets the initial states. Moreover, you need to specify a batch_shape (which fixes the batch size), because in this scenario the network has to be stateful, which is done by setting stateful=True.
Now we could set the values of initial states:
import numpy
import keras.backend as K
hidden_states = K.variable(value=numpy.random.normal(size=(32, 10)))
cell_states = K.variable(value=numpy.random.normal(size=(32, 10)))
model.layers[1].states[0] = hidden_states
model.layers[1].states[1] = cell_states
Note that you need to provide the states as Keras variables. states[0] holds the hidden states and states[1] holds the cell states.
Hope that helps.
As stated in the Keras API documentation for recurrent layers (https://keras.io/layers/recurrent/):
Note on specifying the initial state of RNNs
You can specify the initial state of RNN layers symbolically by calling them with the keyword argument initial_state. The value of initial_state should be a tensor or list of tensors representing the initial state of the RNN layer.
You can specify the initial state of RNN layers numerically by calling reset_states with the keyword argument states. The value of states should be a numpy array or list of numpy arrays representing the initial state of the RNN layer.
Since the LSTM layer has two states (hidden state and cell state), the value of initial_state and states is a list of two tensors.
Examples
Stateless LSTM
Input shape: (batch, timesteps, features) = (1, 10, 1)
Number of units in the LSTM layer = 8 (i.e. dimensionality of hidden and cell state)
import tensorflow as tf
import numpy as np
inputs = np.random.random([1, 10, 1]).astype(np.float32)
lstm = tf.keras.layers.LSTM(8)
c_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
h_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
outputs = lstm(inputs, initial_state=[h_0, c_0])
Stateful LSTM
Input shape: (batch, timesteps, features) = (1, 10, 1)
Number of units in the LSTM layer = 8 (i.e. dimensionality of hidden and cell state)
Note that for a stateful LSTM you also need to specify the batch size.
import tensorflow as tf
import numpy as np
from pprint import pprint
inputs = np.random.random([1, 10, 1]).astype(np.float32)
lstm = tf.keras.layers.LSTM(8, stateful=True, batch_input_shape=(1, 10, 1))
c_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
h_0 = tf.convert_to_tensor(np.random.random([1, 8]).astype(np.float32))
outputs = lstm(inputs, initial_state=[h_0, c_0])
With a stateful LSTM, the states are not reset at the end of each sequence, and we can see that the output of the layer corresponds to the hidden state (i.e. lstm.states[0]) at the last timestep:
>>> pprint(outputs)
<tf.Tensor: id=821, shape=(1, 8), dtype=float32, numpy=
array([[ 0.07119043, 0.07012419, -0.06118739, -0.11008392, 0.00573938,
-0.05663438, 0.11196419, 0.02663924]], dtype=float32)>
>>>
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[ 0.07119043, 0.07012419, -0.06118739, -0.11008392, 0.00573938,
-0.05663438, 0.11196419, 0.02663924]], dtype=float32)>,
<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[ 0.14726108, 0.13584498, -0.12986949, -0.22309153, 0.0125412 ,
-0.11446435, 0.22290672, 0.05397629]], dtype=float32)>]
By calling reset_states(), it is possible to reset the states:
>>> lstm.reset_states()
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=array([[0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>,
<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=array([[0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>]
>>>
or to set them to a specific value:
>>> lstm.reset_states(states=[h_0, c_0])
>>> pprint(lstm.states)
[<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.59103394, 0.68249655, 0.04518601, 0.7800545 , 0.3799634 ,
0.27347744, 0.54415804, 0.9889024 ]], dtype=float32)>,
<tf.Variable 'lstm_1/Variable:0' shape=(1, 8) dtype=float32, numpy=
array([[0.43390197, 0.28252542, 0.27139077, 0.19655049, 0.7568088 ,
0.05909375, 0.68569875, 0.19087408]], dtype=float32)>]
>>>
>>> pprint(h_0)
<tf.Tensor: id=422, shape=(1, 8), dtype=float32, numpy=
array([[0.59103394, 0.68249655, 0.04518601, 0.7800545 , 0.3799634 ,
0.27347744, 0.54415804, 0.9889024 ]], dtype=float32)>
>>>
>>> pprint(c_0)
<tf.Tensor: id=421, shape=(1, 8), dtype=float32, numpy=
array([[0.43390197, 0.28252542, 0.27139077, 0.19655049, 0.7568088 ,
0.05909375, 0.68569875, 0.19087408]], dtype=float32)>
>>>
I used this approach, and it worked for me:
lstm_cell = LSTM(cell_num, return_state=True)
output, h, c = lstm_cell(input, initial_state=[h_prev, c_prev])
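For the encoder-to-decoder transfer asked about in the question, the same return_state / initial_state pattern can be wired into one functional model. A minimal sketch, assuming TF 2.x `tf.keras`; the shapes and unit count are arbitrary choices for illustration:

```python
import numpy as np
import tensorflow as tf

units = 8
encoder_in = tf.keras.layers.Input(shape=(10, 1))
# return_state=True also returns the final hidden state h and cell state c
_, h, c = tf.keras.layers.LSTM(units, return_state=True)(encoder_in)

decoder_in = tf.keras.layers.Input(shape=(5, 1))
# The second LSTM starts from the first LSTM's final states instead of zeros
decoder_out = tf.keras.layers.LSTM(units)(decoder_in, initial_state=[h, c])

model = tf.keras.Model([encoder_in, decoder_in], decoder_out)
y = model.predict([np.random.random((2, 10, 1)), np.random.random((2, 5, 1))], verbose=0)
print(y.shape)  # (2, 8)
```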
Assuming the RNN is layer 1 and the hidden/cell states are NumPy arrays, you can do this:
from keras import backend as K
K.set_value(model.layers[1].states[0], hidden_states)
K.set_value(model.layers[1].states[1], cell_states)
States can also be set using
model.layers[1].states[0] = hidden_states
model.layers[1].states[1] = cell_states
but when I did it this way my state values stayed constant even after stepping the RNN.