Order of LSTM weights in Keras - tensorflow

I am trying to do a simple evaluation (i.e. a forward pass) of a learned LSTM model, and I cannot figure out in what order f_t, i_t, o_t, c_in can be extracted from z. My understanding is that they are computed in bulk.
Here is the model architecture obtained using Keras:
My input sequence is:
input_seq = np.array([[[0.725323664],
[0.7671179],
[0.805884672]]])
The output should be:
[ 0.83467698]
Using Keras, I have obtained the following parameters for the first LSTM layer:
lstm_1_kernel_0 = np.array([[-0.40927699, -0.53539848, 0.40065038, -0.07722378, 0.30405849, 0.54959822, -0.23097005, 0.4720422, 0.05197877, -0.52746099, -0.5856396, -0.43691438]])
lstm_1_recurrent_kernel_0 = np.array([[-0.25504839, -0.0823682, 0.11609183, 0.41123426, 0.03409858, -0.0647027, -0.59183347, -0.15359771, 0.21647622, 0.24863823, 0.46169096, -0.21100986],
[0.29160395, 0.46513283, 0.33996364, -0.31195125, -0.24458826, -0.09762905, 0.16202784, -0.01602131, 0.34460208, 0.39724654, 0.31806156, 0.1102117],
[-0.15919448, -0.33053166, -0.22857222, -0.04912394, -0.21862955, 0.55346996, 0.38505834, 0.18110731, 0.270677, -0.02759281, 0.42814475, -0.13496138]])
lstm_1_bias_0 = np.array([0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0.])
# LSTM 1
z_1_lstm_1 = np.dot(x_1_lstm_1, lstm_1_kernel_0) + np.dot(h_0_lstm_1, lstm_1_recurrent_kernel_0) + lstm_1_bias_0
i_1_lstm_1 = z_1_lstm_1[0, 0:3]
f_1_lstm_1 = z_1_lstm_1[0, 3:6]
input_to_c_1_lstm_1 = z_1_lstm_1[0, 6:9]
o_1_lstm_1 = z_1_lstm_1[0, 9:12]
So the question is: what is the correct order for i_1_lstm_1, f_1_lstm_1, input_to_c_1_lstm_1 and o_1_lstm_1?

It's (i, f, c, o). In recurrent.py, in LSTMCell, the weights are constructed by:
self.kernel_i = self.kernel[:, :self.units]
self.kernel_f = self.kernel[:, self.units: self.units * 2]
self.kernel_c = self.kernel[:, self.units * 2: self.units * 3]
self.kernel_o = self.kernel[:, self.units * 3:]
self.recurrent_kernel_i = self.recurrent_kernel[:, :self.units]
self.recurrent_kernel_f = self.recurrent_kernel[:, self.units: self.units * 2]
self.recurrent_kernel_c = self.recurrent_kernel[:, self.units * 2: self.units * 3]
self.recurrent_kernel_o = self.recurrent_kernel[:, self.units * 3:]
if self.use_bias:
    self.bias_i = self.bias[:self.units]
    self.bias_f = self.bias[self.units: self.units * 2]
    self.bias_c = self.bias[self.units * 2: self.units * 3]
    self.bias_o = self.bias[self.units * 3:]
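Applying that (i, f, c, o) order to the numbers in the question, a minimal NumPy sketch of a single time step could look like this (assuming units = 3, activation='tanh' and recurrent_activation='sigmoid'; if the model was built with 'hard_sigmoid', substitute that function instead):
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

units = 3
x_t = np.array([[0.725323664]])   # current input, shape (1, 1)
h_tm1 = np.zeros((1, units))      # previous hidden state
c_tm1 = np.zeros((1, units))      # previous cell state

# lstm_1_kernel_0, lstm_1_recurrent_kernel_0, lstm_1_bias_0 as given above
z = np.dot(x_t, lstm_1_kernel_0) + np.dot(h_tm1, lstm_1_recurrent_kernel_0) + lstm_1_bias_0

i_t = sigmoid(z[:, :units])                                    # input gate
f_t = sigmoid(z[:, units:2 * units])                           # forget gate
c_t = f_t * c_tm1 + i_t * np.tanh(z[:, 2 * units:3 * units])   # new cell state
o_t = sigmoid(z[:, 3 * units:])                                # output gate
h_t = o_t * np.tanh(c_t)                                       # new hidden state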

Related

emcee generates the same samples twice

I'm using emcee to generate samples with a given ln_prob twice, but both runs yield exactly the same samples.
I am using the same initial state for both samplers, but I don't see why that should matter.
Am I wrong in thinking that the two runs should yield different results?
import emcee
import numpy as np
NWALKERS = 32
NDIM = 2
NSAMPLES = 1000
def ln_gaussian(x):
    # mu = 0, cov = 1
    a = (2 * np.pi) ** -0.5
    return np.log(a * np.exp(-0.5 * np.dot(x, x)))
p0 = np.random.rand(NWALKERS, NDIM)
sampler1 = emcee.EnsembleSampler(NWALKERS, NDIM, ln_gaussian)
sampler2 = emcee.EnsembleSampler(NWALKERS, NDIM, ln_gaussian)
state1 = sampler1.run_mcmc(p0, 100) # burn in
state2 = sampler2.run_mcmc(p0, 100) # burn in
sampler1.reset()
sampler2.reset()
# run sampler 1k times (x32 walkers)
sampler1.run_mcmc(state1, NSAMPLES)
sampler2.run_mcmc(state2, NSAMPLES)
s1 = sampler1.get_chain(flat=True)
s2 = sampler2.get_chain(flat=True)
s1 - s2
The output is
array([[0., 0.],
[0., 0.],
[0., 0.],
...,
[0., 0.],
[0., 0.],
[0., 0.]])
If I use different initial states
p0 = np.random.rand(NWALKERS, NDIM)
p1 = np.random.rand(NWALKERS, NDIM)
it yields different samples
array([[-0.70474519, -0.09671908],
[-0.31555036, -0.33661664],
[ 0.75735537, 0.01540277],
...,
[ 2.84810783, -2.11736446],
[-0.55164227, -0.26478868],
[ 0.01301593, -1.76233017]])
But why should that matter? I thought the sampling was random.

Equivalent of np.isin for TensorFlow

I have categories as a list of lists of integers, as shown below:
categories = [
[0,2,4,6,8],
[1,3,5,7,9]
]
I have a label tensor y with num_batches integers (as classes):
y = tf.constant([0, 1, 1, 2, 5, 4, 7, 9, 3, 3])
I want to replace each value in y with the index of the category that contains it (here 0 for even, 1 for odd), so that the final result would be:
cat_labels = tf.constant([0, 1, 1, 0, 1, 0, 1, 1, 1, 1])
I can get it by iterating through each value in y like below:
cat_labels = tf.Variable(tf.identity(y))
for idx in range(len(categories)):
    for i, _y in enumerate(y):
        if _y in categories[idx]:      # if _y value is in categories[idx]
            cat_labels[i].assign(idx)  # replace all of them with idx
But apparently iterating like this is not allowed when the block is wrapped in a @tf.function-decorated parent function.
Is there a way to apply the logic without iterating, or converting to numpy and applying np.isin, while getting speedups of tf.function?
Edit: There seem to be workarounds on this like here, but any help on explaining in the context of this use case would be appreciated.
You can try this:
y = tf.constant([0., 1., 1., 2., 5., 4., 7., 9., 3., 3.], dtype=tf.float32)
categories = [[0,2,4,6,8],[1,3,5,7,9]]
c = tf.convert_to_tensor(categories, dtype=tf.float32)
cat_labels = tf.map_fn(               # apply an operation to all of the elements of y
    lambda x: tf.gather_nd(           # get the index of the category: 0, 1, or anything else
        tf.cast(                      # cast the dtype of the result of the inner function
            tf.where(                 # get the index of the element of y in categories
                tf.equal(c, x)),      # search for an element of y within categories
            dtype=tf.float32),
        [0, 0]),
    y)
tf.print(cat_labels, summarize=-1)
# [0 1 1 0 1 0 1 1 1 1]
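For reference, a fully vectorized alternative sketch (not part of the original answer) that avoids tf.map_fn by broadcasting the comparison and taking the index of the matching row; it also works inside a @tf.function, since it contains no Python-level loops:
import tensorflow as tf

y = tf.constant([0, 1, 1, 2, 5, 4, 7, 9, 3, 3])
categories = tf.constant([[0, 2, 4, 6, 8],
                          [1, 3, 5, 7, 9]])

# compare every label with every category member: shape (num_labels, num_categories, category_size)
matches = tf.equal(y[:, None, None], categories[None, :, :])
# a label belongs to a category if it matches any member of that row
in_category = tf.reduce_any(matches, axis=-1)                   # shape (num_labels, num_categories)
cat_labels = tf.argmax(tf.cast(in_category, tf.int32), axis=-1)
tf.print(cat_labels, summarize=-1)
# [0 1 1 0 1 0 1 1 1 1]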

Tensorflow to PyTorch

I'm transferring a TensorFlow code to a PyTorch code.
The lines below are the ones I haven't been able to translate yet.
I'm not familiar with PyTorch, so it's not easy for me to find the matching methods in the PyTorch library.
Can anyone help me?
p.s. The shape of alpha is (batch, N).
alpha_cumsum = tf.cumsum(alpha, axis = 1)
len_batch = tf.shape(alpha_cumsum)[0]
rand_prob = tf.random_uniform(shape = [len_batch, 1], minval = 0., maxval = 1.)
alpha_relu = tf.nn.relu(rand_prob - alpha_cumsum)
alpha_index = tf.count_nonzero(alpha_relu, 1)
alpha_hard = tf.one_hot(alpha_index, len(a))
I've listed each of your functions followed by the corresponding PyTorch function. Most have the same name and can be found in the PyTorch docs (https://pytorch.org/docs/stable/index.html).
tf.cumsum(alpha, axis = 1)
torch.cumsum(alpha, dim=1)
tf.shape(alpha_cumsum)[0]
alpha_cumsum.shape[0]
tf.random_uniform(shape = [len_batch, 1], minval = 0., maxval = 1.)
torch.rand([len_batch,1])
tf.nn.relu(rand_prob - alpha_cumsum)
torch.nn.functional.relu(rand_prob - alpha_cumsum)
tf.count_nonzero(alpha_relu, 1)
torch.count_nonzero(alpha_relu, dim=1)
tf.one_hot(alpha_index, len(a))
torch.nn.functional.one_hot(alpha_index, len(a)) # assuming len(a) is number of classes
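Putting those together, a rough end-to-end translation of the snippet might look like this (a sketch only; it assumes alpha is a tensor of shape (batch, N) and that a is the same length-N object as in the original code):
import torch
import torch.nn.functional as F

# alpha and a are assumed to be defined as in the question
alpha_cumsum = torch.cumsum(alpha, dim=1)                # (batch, N)
len_batch = alpha_cumsum.shape[0]
rand_prob = torch.rand(len_batch, 1)                     # uniform in [0, 1)
alpha_relu = F.relu(rand_prob - alpha_cumsum)
alpha_index = torch.count_nonzero(alpha_relu, dim=1)     # int64, as required by one_hot
alpha_hard = F.one_hot(alpha_index, num_classes=len(a))  # (batch, len(a))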

Forward pass in LSTM network learned by Keras

I have the following code that I am hoping to get a forward pass from a 2 layer LSTM:
"""
this is a simple numerical example of LSTM forward pass to allow deep understanding
the LSTM is trying to learn the sin function by learning to predict the next value after a sequence of 3 inputs
example 1: {0.583, 0.633, 0.681} --> {0.725}, these values correspond to
{sin(35.66), sin(39.27), sin(42.92)} --> {sin(46.47)}
example 2: {0.725, 0.767, 0.801} --> {0.849}, these values correspond to
{sin(46.47), sin(50.09), sin(53.23)} --> {sin(58.10)}
example tested: [[['0.725323664']
['0.7671179']
['0.805884672']]]
predicted_instance: [ 0.83467698]
training example pair: [['0.680666907']
['0.725323664']
['0.7671179']] 0.805884672
"""
import numpy as np
# linear activation, matrix-wise (works also element-wise)
def linear(x):
    return x

# sigmoid function, matrix-wise (works also element-wise)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# hard sigmoid function, element-wise
def hard_sig(x):
    # as in Keras, for both the TensorFlow and Theano backends
    return np.max(np.array([0.0, np.min(np.array([1.0, x * 0.2 + 0.5]))]))
    # Courbariaux et al. 2016 (Binarized Neural Networks):
    # return np.max(np.array([0.0, np.min(np.array([1.0, (x + 1.0) / 2.0]))]))

# hard sigmoid function, matrix-wise
def hard_sigmoid(x, fun=hard_sig):
    return np.vectorize(fun)(x)

# hyperbolic tangent function, matrix-wise (works also element-wise)
def hyperbolic_tangent(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
print(sigmoid(np.array([-100, 0, 100])))
print(hard_sigmoid(np.array([-100, 0, 0.1, 100])))
print(hyperbolic_tangent(np.array([-100, 0, 100])))
parameter_names = ['lstm_1_kernel_0.npy',
'lstm_1_recurrent_kernel_0.npy',
'lstm_1_bias_0.npy',
'lstm_2_kernel_0.npy',
'lstm_2_recurrent_kernel_0.npy',
'lstm_2_bias_0.npy',
'dense_1_kernel_0.npy',
'dense_1_bias_0.npy']
# LSTM 1 Weights
lstm_1_kernel_0 = np.load('lstm_1_kernel_0.npy')
print('lstm_1_kernel_0: ', lstm_1_kernel_0.shape)
lstm_1_recurrent_kernel_0 = np.load('lstm_1_recurrent_kernel_0.npy')
print('lstm_1_recurrent_kernel_0: ', lstm_1_recurrent_kernel_0.shape)
lstm_1_bias_0 = np.load('lstm_1_bias_0.npy')
print('lstm_1_bias_0: ', lstm_1_bias_0.shape)
# LSTM 2 Wights
lstm_2_kernel_0 = np.load('lstm_2_kernel_0.npy')
print('lstm_2_kernel_0: ', lstm_2_kernel_0.shape)
lstm_2_recurrent_kernel_0 = np.load('lstm_2_recurrent_kernel_0.npy')
print('lstm_2_recurrent_kernel_0: ', lstm_2_recurrent_kernel_0.shape)
lstm_2_bias_0 = np.load('lstm_2_bias_0.npy')
print('lstm_2_bias_0: ', lstm_2_bias_0.shape)
# Dense layer
dense_1_kernel_0 = np.load('dense_1_kernel_0.npy')
print('dense_1_kernel_0: ', dense_1_kernel_0.shape)
dense_1_bias_0 = np.load('dense_1_bias_0.npy')
print('dense_1_bias_0: ', dense_1_bias_0.shape)
time_seq = [0, 1, 2]
"""
input_seq = np.array([[[0.725323664],
[0.7671179],
[0.805884672]]])
"""
input_seq = np.array([[[0.680666907],
[0.725323664],
[0.7671179]]])
print('input_seq: ', input_seq.shape)
for time in time_seq:
    print('input t', time, ':', input_seq[0, time, 0])
"""
# z0 = z[:, :self.units]
# z1 = z[:, self.units: 2 * self.units]
# z2 = z[:, 2 * self.units: 3 * self.units]
# z3 = z[:, 3 * self.units:]
# i = self.recurrent_activation(z0)
# f = self.recurrent_activation(z1)
# c = f * c_tm1 + i * self.activation(z2)
# o = self.recurrent_activation(z3)
# activation =' tanh'
# recurrent_activation = 'hard_sigmoid'
"""
# LSTM 1
x_1_lstm_1 = input_seq[0, 0, 0]
print('x_1: ', x_1_lstm_1)
x_2_lstm_1 = input_seq[0, 1, 0]
print('x_2: ', x_2_lstm_1)
x_3_lstm_1 = input_seq[0, 2, 0]
print('x_3: ', x_3_lstm_1)
c_0_lstm_1 = np.zeros((1, 3))
h_0_lstm_1 = np.zeros((1, 3))
z_1_lstm_1 = np.dot(x_1_lstm_1, lstm_1_kernel_0) + np.dot(h_0_lstm_1, lstm_1_recurrent_kernel_0) + lstm_1_bias_0
print(z_1_lstm_1.shape)
i_1_lstm_1 = sigmoid(z_1_lstm_1[:, 0:3])
f_1_lstm_1 = sigmoid(z_1_lstm_1[:, 3:6])
input_to_c_1_lstm_1 = z_1_lstm_1[:, 6:9]
o_1_lstm_1 = sigmoid(z_1_lstm_1[:, 9:12])
c_1_lstm_1 = np.multiply(f_1_lstm_1, c_0_lstm_1) + np.multiply(i_1_lstm_1, hyperbolic_tangent(input_to_c_1_lstm_1))
h_1_lstm_1 = np.multiply(o_1_lstm_1, hyperbolic_tangent(c_1_lstm_1))
print('h_1_lstm_1: ', h_1_lstm_1.shape, h_1_lstm_1)
z_2_lstm_1 = np.dot(x_2_lstm_1, lstm_1_kernel_0) + np.dot(h_1_lstm_1, lstm_1_recurrent_kernel_0) + lstm_1_bias_0
print(z_2_lstm_1.shape)
i_2_lstm_1 = sigmoid(z_2_lstm_1[:, 0:3])
f_2_lstm_1 = sigmoid(z_2_lstm_1[:, 3:6])
input_to_c_2_lstm_1 = z_2_lstm_1[:, 6:9]
o_2_lstm_1 = sigmoid(z_2_lstm_1[:, 9:12])
c_2_lstm_1 = np.multiply(f_2_lstm_1, c_1_lstm_1) + np.multiply(i_2_lstm_1, hyperbolic_tangent(input_to_c_2_lstm_1))
h_2_lstm_1 = np.multiply(o_2_lstm_1, hyperbolic_tangent(c_2_lstm_1))
print('h_2_lstm_1: ', h_2_lstm_1.shape, h_2_lstm_1)
z_3_lstm_1 = np.dot(x_3_lstm_1, lstm_1_kernel_0) + np.dot(h_2_lstm_1, lstm_1_recurrent_kernel_0) + lstm_1_bias_0
print(z_3_lstm_1.shape)
i_3_lstm_1 = sigmoid(z_3_lstm_1[:, 0:3])
f_3_lstm_1 = sigmoid(z_3_lstm_1[:, 3:6])
input_to_c_3_lstm_1 = z_3_lstm_1[:, 6:9]
o_3_lstm_1 = sigmoid(z_3_lstm_1[:, 9:12])
c_3_lstm_1 = np.multiply(f_3_lstm_1, c_2_lstm_1) + np.multiply(i_3_lstm_1, hyperbolic_tangent(input_to_c_3_lstm_1))
h_3_lstm_1 = np.multiply(o_3_lstm_1, hyperbolic_tangent(c_3_lstm_1))
print('h_3_lstm_1: ', h_3_lstm_1.shape, h_3_lstm_1)
# LSTM 2
x_1_lstm_2 = h_1_lstm_1
x_2_lstm_2 = h_2_lstm_1
x_3_lstm_2 = h_3_lstm_1
c_0_lstm_2 = np.zeros((1, 1))
h_0_lstm_2 = np.zeros((1, 1))
z_1_lstm_2 = np.dot(x_1_lstm_2, lstm_2_kernel_0) + np.dot(h_0_lstm_2, lstm_2_recurrent_kernel_0) + lstm_2_bias_0
print(z_1_lstm_2.shape)
i_1_lstm_2 = sigmoid(z_1_lstm_2[:, 0])
f_1_lstm_2 = sigmoid(z_1_lstm_2[:, 1])
input_to_c_1_lstm_2 = z_1_lstm_2[:, 2]
o_1_lstm_2 = sigmoid(z_1_lstm_2[:, 3])
c_1_lstm_2 = np.multiply(f_1_lstm_2, c_0_lstm_2) + np.multiply(i_1_lstm_2, hyperbolic_tangent(input_to_c_1_lstm_2))
h_1_lstm_2 = np.multiply(o_1_lstm_2, hyperbolic_tangent(c_1_lstm_2))
print('h_1_lstm_2: ', h_1_lstm_2.shape, h_1_lstm_2)
z_2_lstm_2 = np.dot(x_2_lstm_2, lstm_2_kernel_0) + np.dot(h_1_lstm_2, lstm_2_recurrent_kernel_0) + lstm_2_bias_0
print(z_2_lstm_2.shape)
i_2_lstm_2 = sigmoid(z_2_lstm_2[:, 0])
f_2_lstm_2 = sigmoid(z_2_lstm_2[:, 1])
input_to_c_2_lstm_2 = z_2_lstm_2[:, 2]
o_2_lstm_2 = sigmoid(z_2_lstm_2[:, 3])
c_2_lstm_2 = np.multiply(f_2_lstm_2, c_1_lstm_2) + np.multiply(i_2_lstm_2, hyperbolic_tangent(input_to_c_2_lstm_2))
h_2_lstm_2 = np.multiply(o_2_lstm_2, hyperbolic_tangent(c_2_lstm_2))
print('h_2_lstm_2: ', h_2_lstm_2.shape, h_2_lstm_2)
z_3_lstm_2 = np.dot(x_3_lstm_2, lstm_2_kernel_0) + np.dot(h_2_lstm_2, lstm_2_recurrent_kernel_0) + lstm_2_bias_0
print(z_3_lstm_2.shape)
i_3_lstm_2 = sigmoid(z_3_lstm_2[:, 0])
f_3_lstm_2 = sigmoid(z_3_lstm_2[:, 1])
input_to_c_3_lstm_2 = z_3_lstm_2[:, 2]
o_3_lstm_2 = sigmoid(z_3_lstm_2[:, 3])
c_3_lstm_2 = np.multiply(f_3_lstm_2, c_2_lstm_2) + np.multiply(i_3_lstm_2, hyperbolic_tangent(input_to_c_3_lstm_2))
h_3_lstm_2 = np.multiply(o_3_lstm_2, hyperbolic_tangent(c_3_lstm_2))
print('h_3_lstm_2: ', h_3_lstm_2.shape, h_3_lstm_2)
output = np.dot(h_3_lstm_2, dense_1_kernel_0) + dense_1_bias_0
print('output: ', output)
The weights were saved to file at training time and can be retrieved from the following location:
LSTM weights
In order to create the LSTM that fits a sine-wave signal, I used the following code in Keras:
def build_simple_model(layers):
    model = Sequential()
    model.add(LSTM(input_shape=(layers[1], layers[0]),
                   output_dim=layers[1],
                   return_sequences=True,
                   activation='tanh',
                   recurrent_activation='sigmoid'))  # 'hard_sigmoid'
    # model.add(Dropout(0.2))
    model.add(LSTM(layers[2],
                   return_sequences=False,
                   activation='tanh',
                   recurrent_activation='sigmoid'))  # 'hard_sigmoid'
    # model.add(Dropout(0.2))
    model.add(Dense(output_dim=layers[3]))
    model.add(Activation("linear"))
    start = time.time()
    model.compile(loss="mse", optimizer="rmsprop")
    print("> Compilation Time : ", time.time() - start)
    plot_model(model, to_file='lstm_model.png', show_shapes=True, show_layer_names=True)
    print(model.summary())
    return model
This resulted in the following model:
I have used the training procedure as follows:
seq_len = 3
model = lstm.build_simple_model([1, seq_len, 1, 1])
model.fit(X_train,
y_train,
batch_size=512,
nb_epoch=epochs,
validation_split=0.05)
Would it be possible to understand why my forward pass does not produce the desired output when predicting a future sin() value from the three previous consecutive ones?
The example on which I am basing my forward pass exercise originates here. The weights uploaded in .npy format are from a network that is able to perfectly predict the next sin() value in a series.
I realised what the problem was. I was trying to extract my model weights using a TensorFlow session (after model fitting), rather than via the Keras methods directly. This resulted in weight matrices that made perfect sense dimension-wise but contained the values from the initialization step.
model.fit(X_train,
y_train,
batch_size=batch_size,
nb_epoch=epochs,
validation_split=0.05,
callbacks=callbacks_list)
print('n_parameters: ', len(model.weights))
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
parameter_names = ['lstm_1_kernel_0',
'lstm_1_recurrent_kernel_0',
'lstm_1_bias_0',
'lstm_2_kernel_0',
'lstm_2_recurrent_kernel_0',
'lstm_2_bias_0',
'dense_1_kernel_0',
'dense_1_bias_0']
weights = model.get_weights()
trainable_weights = model.trainable_weights
for parameter in range(len(model.weights)):
    print('')
    # using Keras methods is the correct way
    print('parameter: ', trainable_weights[parameter])
    print('parameter Keras: ', weights[parameter])
    # using a session with TF is the wrong way
    print('parameter TF: ', model.weights[parameter].eval(session=sess))
    # np.save(parameter_names[parameter], model.weights[parameter].eval(session=sess))
    # np.save(parameter_names[parameter], weights[parameter])
This prints the following to screen:
parameter: <tf.Variable 'lstm_1/kernel:0' shape=(1, 12) dtype=float32_ref>
parameter Keras: [[ 0.02005039 0.59627813 -0.77670902 -0.17643917 0.64905447 -0.49418128
0.01204901 0.79791737 -1.58887422 -0.3566488 0.67758918 0.77245694]]
parameter TF: [[-0.20346385 -0.07166874 -0.58842945 0.03744811 0.46911311 -0.0469712
-0.07291448 0.27316415 -0.53298378 0.08367682 0.10194337 0.20933461]]
parameter: <tf.Variable 'lstm_1/recurrent_kernel:0' shape=(3, 12) dtype=float32_ref>
parameter Keras: [[ 0.01916649 -0.30881727 -0.07018201 0.28770521 -0.45713434 -0.33738521
0.53091544 -0.78456688 0.50647908 0.12326431 -0.18517831 -0.28752103]
[ 0.44490865 -0.09020164 1.00983524 0.43070397 -0.14646551 -0.53908533
1.33833826 0.76106179 -1.28808987 0.71029669 -0.19338571 -0.30499896]
[ 0.76727188 -0.10291406 0.53285897 0.31021088 0.46876401 0.04961515
0.0573149 1.17765784 -0.45716232 0.26181531 0.60458028 -0.6042906 ]]
parameter TF: [[-0.044281 -0.42013288 -0.06702472 0.16710882 0.07229936 0.20263752
0.01935999 -0.65925431 0.21676332 0.02481769 0.50321299 -0.08369029]
[-0.17725646 -0.14031938 -0.07758044 -0.39292315 0.36675838 -0.20198873
0.59491426 -0.12469263 0.14705807 0.39603388 -0.25511321 -0.01221756]
[ 0.51603764 0.34401873 0.36002275 0.05344227 -0.00293417 -0.36086732
0.1636388 -0.24916036 0.09064917 -0.04246153 0.05563453 -0.5006755 ]]
parameter: <tf.Variable 'lstm_1/bias:0' shape=(12,) dtype=float32_ref>
parameter Keras: [ 3.91339064e-01 -2.09703773e-01 -4.88098420e-04 1.15376031e+00
6.24452651e-01 2.24053934e-01 4.06851530e-01 4.78419960e-01
1.77846551e-01 3.19107175e-01 5.16630232e-01 -2.22970009e-01]
parameter TF: [ 0. 0. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0.]
parameter: <tf.Variable 'lstm_2/kernel:0' shape=(3, 4) dtype=float32_ref>
parameter Keras: [[ 2.01334882 1.9168334 1.77633524 -0.90856379]
[ 1.17618477 1.02978265 -0.06435115 0.66180402]
[-1.33014703 -0.71629387 -0.87376142 1.35648465]]
parameter TF: [[ 0.83115911 0.72150767 0.51600969 -0.52725452]
[ 0.53043616 0.59162521 -0.59219611 0.0951736 ]
[-0.8030411 -0.00424314 -0.06715947 0.67533839]]
parameter: <tf.Variable 'lstm_2/recurrent_kernel:0' shape=(1, 4) dtype=float32_ref>
parameter Keras: [[-0.09348518 -0.7667768 0.24031806 -0.39155772]]
parameter TF: [[-0.085137 -0.59010917 0.61000961 -0.52193022]]
parameter: <tf.Variable 'lstm_2/bias:0' shape=(4,) dtype=float32_ref>
parameter Keras: [ 1.21466994 2.22224903 1.34946632 0.19186479]
parameter TF: [ 0. 1. 0. 0.]
parameter: <tf.Variable 'dense_1/kernel:0' shape=(1, 1) dtype=float32_ref>
parameter Keras: [[ 2.69569159]]
parameter TF: [[ 1.5422312]]
parameter: <tf.Variable 'dense_1/bias:0' shape=(1,) dtype=float32_ref>
parameter Keras: [ 0.20767514]
parameter TF: [ 0.]
The forward pass code was therefore correct. The weights were wrong. The correct weights (.npy files) have also been updated at the link mentioned in the question. This forward pass can be used to illustrate sequence generation with an LSTM by recycling the output as the next input.
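For completeness, a minimal sketch of the fix, pulling the trained weights through Keras instead of a raw TensorFlow session (reusing parameter_names from the snippet above):
# correct: the trained values come directly from Keras
weights = model.get_weights()             # list of numpy arrays, in layer order
for name, w in zip(parameter_names, weights):
    np.save(name, w)                      # e.g. lstm_1_kernel_0.npy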

numpy vectorization of interdependent arrays

I need to populate two interdependent arrays simultaneously, based on their previous element, like so:
import numpy as np
a = np.zeros(100)
b = np.zeros(100)
c = np.random.random(100)
for num in range(1, len(a)):
    a[num] = b[num-1] + c[num]
    b[num] = b[num-1] + a[num]
Is there a way to truly vectorize this (i.e. not using numpy.vectorize) using numpy? Note that these are arbitrary arrays, not looking for a solution for these specific values.
As mentioned in @Praveen's post, we can write out those expressions for a few iterations to find the closed form, which for c is of course a triangular matrix. Then we just need to add in the iteratively scaled b[0] to get the full b. To get a, we simply add shifted versions of b and c.
So, implementation-wise, here's a different take on it that uses NumPy broadcasting and a dot product for efficiency:
p = 2**np.arange(a.size-1)
scale1 = p[:,None]//p
b_out = np.append(b[0],scale1.dot(c[1:]) + 2*p*b[0])
a_out = np.append(a[0],b_out[:-1] + c[1:])
If a and b are always meant to start at 0, the code for the last two steps simplifies to:
b_out = np.append(0,scale1.dot(c[1:]))
a_out = np.append(0,b_out[:-1] + c[1:])
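As a quick sanity check (a sketch, not part of the original answer), the broadcasting/dot-product version can be compared against the reference loop for the case where a and b start at 0; a small N keeps 2**np.arange within integer range:
import numpy as np

N = 20
np.random.seed(0)
c = np.random.random(N)

# reference: the original loop
a_ref = np.zeros(N)
b_ref = np.zeros(N)
for num in range(1, N):
    a_ref[num] = b_ref[num - 1] + c[num]
    b_ref[num] = b_ref[num - 1] + a_ref[num]

# vectorized version from above
p = 2 ** np.arange(N - 1)
scale1 = p[:, None] // p
b_out = np.append(0, scale1.dot(c[1:]))
a_out = np.append(0, b_out[:-1] + c[1:])

print(np.allclose(a_ref, a_out), np.allclose(b_ref, b_out))   # True True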
Yes there is:
c = np.arange(100)
a = 2 ** c - 1
b = np.cumsum(a)
Clearly, the updates are:
a_i = b_{i-1} + c_i
b_i = 2*b_{i-1} + c_i
Writing out the recursion,
b_0 = c_0 # I'm not sure if c_0 is to be used
b_1 = 2*b_0 + c_1
= 2*c_0 + c_1
b_2 = 2*b_1 + c_2
= 2*(2*c_0 + c_1) + c_2
= 4*c_0 + 2*c_1 + c_2
b_3 = 2*b_2 + c_3
= 2*(4*c_0 + 2*c_1 + c_2) + c_3
= 8*c_0 + 4*c_1 + 2*c_2 + c_3
So it would seem that
b_i = np.sum((2**np.arange(i+1))[::-1] * c[:i+1])
a_i = b_{i-1} + c_i
It's not possible to do a cumulative sum here, because the coefficient of c_i keeps changing.
The easiest way to fully vectorize this is probably to just use a giant matrix. If c has size N:
t = np.zeros((N, N))
x, y = np.tril_indices(N)
t[x, y] = 2 ** (x - y)
This gives us:
>>> t
array([[ 1., 0., 0., 0.],
[ 2., 1., 0., 0.],
[ 4., 2., 1., 0.],
[ 8., 4., 2., 1.]])
So now you can do:
b = np.sum(t * c, axis=1)
a = np.zeros(N)
a[1:] = b[:-1] + c[1:]
I probably wouldn't recommend this solution. From what little I know of computational methods, this doesn't seem numerically stable for large N. But I have the feeling that this would be true of any vectorized solution which performs the summation at the end. Maybe you should try both the for-loop and this piece of code out and see if your errors keep blowing up with the vectorized solution.