LSTM calculation with Vanilla Numpy - tensorflow

I've compared LSTM result with Keras/Tensorflow calculation and Numpy calculation. However, the result is slightly different:
Numpy: [[ 0.16315128 -0.04277606 0.26504123 0.08014129 0.38561829]]
Keras: [[ 0.16836338 -0.04930305 0.25080156 0.08938988 0.3537751 ]]
Keras' LSTM implementation does not use tf.contrib.rnn but Keras directly manages the parameters, and tf.matmul is used to calculate. I found the corresponding implementation of Keras and tried the same calculation with Numpy, but the values are slightly different as shown above.
I have checked the formula several times and it seems like the same. The only difference is the differences between tf.matmul or np.dot. Maybe there are some differences about decimal point calculation method. Even so, I think the results are too much different. The biggest difference is about 10%. I'd like to match the Numpy calculation with the tensorflow calculation. If someone could give me some hint or point me to the right implementation, I'd really appreciate it.
Keras implementation and the Numpy code implemented myself:
Keras: https://github.com/keras-team/keras/blob/master/keras/layers/recurrent.py#L1921-L1948
Numpy: https://github.com/likejazz/jupyter-notebooks/blob/master/deep-learning/lstm-keras-inspect.py

The default value of recurrent_activation is 'hard_sigmoid' for Keras LSTM layer. However, the original sigmoid function is used in your NumPy implementation.
So you can either change the recurrent_activation argument to 'sigmoid',
model.add(LSTM(5, input_shape=(8, 3), recurrent_activation='sigmoid'))
or use the "hard" sigmoid function in your NumPy code.
def hard_sigmoid(x):
return np.clip(0.2 * x + 0.5, 0, 1)

Related

Implement BatchNormalization layer from get_weights using numpy

I need to re-implement a model (only for inferencing) that has been trained using Keras in numpy. I have gotten all the weights/biases using model.get_weights() and stored them as a pickle file, to use during inference. While I can implement all operations like matrix multiplication, bias addition, sigmoid etc. I am not able to implement a batch normalization layer.
When I save the weights I get the following shapes for my BatchNormalization layer.
(1000,)
(1000,)
(1000,)
(1000,)
I researched and found that these are in the following order, gamma, beta, moving mean and std. I used the following numpy operation for this layer,
output = gamma * (input - mean) / std + beta
But its not giving me the same results as when I do model.predict(). Am I doing something wrong? FYI I also tried np.sqrt(std), that also gives me the incorrect results. Are these four not enough to get our outputs, do we need anything more?

Implementing backprop in numpy

I a trying to implement backprop in numpy by defining a function that performs some kind operation given an input, weight matrix and bias, and returns the output with the backward function, which can be used to update weights.
Currently this is my code , however I think there are some bugs in the derivation, as the gradients for the W1 matrix are too large. Here is a pytorch implementation for the same thing as a reference torch.
Any help is appreciated.

how to convert pytorch adaptive_avg_pool2d method to keras or tensorflow

I don't know how to convert the PyTorch method adaptive_avg_pool2d to Keras or TensorFlow. Anyone can help?
PyTorch mehod is
adaptive_avg_pool2d(14,[14])
I tried to use the average pooling, the reshape the tensor in Keras, but got the error:
ValueError: total size of new array must be unchanged
I'm not sure if I understood your question, but in PyTorch, you pass the spatial dimensions to AdaptiveAvgPool2d. For instance, if you want to have an output sized 5x7, you can use nn.AdaptiveAvgPool2d((5,7)).
If you want a global average pooling layer, you can use nn.AdaptiveAvgPool2d(1). In Keras you can just use GlobalAveragePooling2D.
For other output sizes in Keras, you need to use AveragePooling2D, but you can't specify the output shape directly. You need to calculate/define the pool_size, stride, and padding parameters depending on how you want the output shape. If you need help with the calculations, check this page of CS231n course.

Get values of tensors in loss function

I would like to get the values of the y_pred and y_true tensors of this keras backend function. I need this to be able to perform some custom calculations and change the loss, these calculations are just possible with the real array values.
def mean_squared_error(y_true, y_pred):
#some code here
return K.mean(K.square(y_pred - y_true), axis=-1)
There is a way to do this in keras? Or in any other ML framework (tf, pytorch, theano)?
No, in general you can't compute the loss that way, because Keras is based on frameworks that do automatic differentiation (like Theano, TensorFlow) and they need to know which operations you are doing in between in order to compute the gradients of the loss.
You need to implement your loss computations using keras.backend functions, else there is no way to compute gradients and optimization won't be possible.
Try including this within the loss function:
y_true = keras.backend.print_tensor(y_true, message='y_true')
Following is an excerpt from the Keras documentation (https://keras.io/backend/):
print_tensor
keras.backend.print_tensor(x, message='')
Prints message and the tensor value when evaluated.
Note that print_tensor returns a new tensor identical to x which should be used in the later parts of the code. Otherwise, the print operation is not taken into account during evaluation.

How to wrap a custom TensorFlow loss function in Keras?

This is my third attempt to get a deep learning project off the ground. I'm working with protein sequences. First I tried TFLearn, then raw TensorFlow, and now I'm trying Keras.
The previous two attempts taught me a lot, and gave me some code and concepts that I can re-use. However there has always been an obstacle, and I've asked questions that the developers can't answer (in the case of TFLearn), or I've simply gotten bogged down (TensorFlow object introspection is tedious).
I have written this TensorFlow loss function, and I know it works:
def l2_angle_distance(pred, tgt):
with tf.name_scope("L2AngleDistance"):
# Scaling factor
count = tgt[...,0,0]
scale = tf.to_float(tf.count_nonzero(tf.is_finite(count)))
# Mask NaN in tgt
tgt = tf.where(tf.is_nan(tgt), pred, tgt)
# Calculate L1 losses
losses = tf.losses.cosine_distance(pred, tgt, -1, reduction=tf.losses.Reduction.NONE)
# Square the losses, then sum, to get L2 scalar loss.
# Divide the loss result by the scaling factor.
return tf.reduce_sum(losses * losses) / scale
My target values (tgt) can include NaN, because my protein sequences are passed in a 4D Tensor, despite the fact that the individual sequences differ in length. Before you ask, the data can't be resampled like an image. So I use NaN in the tgt Tensor to indicate "no prediction needed here." Before I calculate the L2 cosine loss, I replace every NaN with the matching values in the prediction (pred) so the loss for every NaN is always zero.
Now, how can I re-use this function in Keras? It appears that the Keras Lambda core layer is not a good choice, because a Lambda only takes a single argument, and a loss function needs two arguments.
Alternately, can I rewrite this function in Keras? I shouldn't ever need to use the Theano or CNTK backend, so it isn't necessary for me to rewrite my function in Keras. I'll use whatever works.
I just looked at the Keras losses.py file to get some clues. I imported keras.backend and had a look around. I also found https://keras.io/backend/. I don't seem to find wrappers for ANY of the TensorFlow function calls I happen to use: to_float(), count_nonzero(), is_finite(), where(), is_nan(), cosine_distance(), or reduce_sum().
Thanks for your suggestions!
I answered my own question. I'm posting the solution for anyone who may come across this same problem.
I tried using my TF loss function directly in Keras, as was independently suggested by Matias Valdenegro. I did not provoke any errors from Keras by doing so, however, the loss value went immediately to NaN.
Eventually I identified the problem. The calling convention for a Keras loss function is first y_true (which I called tgt), then y_pred (my pred). But the calling convention for a TensorFlow loss function is pred first, then tgt. So if you want to keep a Tensorflow-native version of the loss function around, this fix works:
def keras_l2_angle_distance(tgt, pred):
return l2_angle_distance(pred, tgt)
<snip>
model.compile(loss = keras_l2_angle_distance, optimizer = "something")
Maybe Theano or CNTK uses the same parameter order as Keras, I don't know. But I'm back in business.
You don't need to use keras.backend, as your loss is directly written in TensorFlow, then you can use it directly in Keras. The backend functions are an abstraction layer so you can code a loss/layer that will work with the multiple available backends in Keras.
You just have to put your loss in the model.compile call:
model.compile(loss = l2_angle_distance, optimizer = "something")