How do I flip a Tensor in Keras? - tensorflow

For example, I have a tensor with shape (5,10) and I want back a tensor with shape (5,10) in which the first element is now the last. So [1,2,3,4,5] becomes [5,4,3,2,1] and [[1,2,3,4,5],[2,3,4,5,6]] becomes [[2,3,4,5,6],[1,2,3,4,5]].
If it matters, I am using the TensorFlow backend.

The Keras backend provides a reverse function for this:
import keras.backend as K
flipped = K.reverse(x, axes=0)
To use it inside a model, wrap it in a Lambda layer:
from keras.layers import Lambda
# With the TensorFlow backend the output shape is inferred, so output_shape can be omitted.
# Inside a layer, axis 0 is the batch axis; use axes=1 to flip the first non-batch axis instead.
layer = Lambda(lambda x: K.reverse(x, axes=0))
(For a Sequential model, call model.add(layer); for a functional API model, use output = layer(input).)
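To make the effect of the axes argument concrete, here is a small pure-Python sketch of what reversing along an axis means for nested lists. The reverse helper below is a hypothetical stand-in written for illustration, not the Keras function itself.

```python
def reverse(nested, axis=0):
    """Reverse a nested list along the given axis (0 = outermost)."""
    if axis == 0:
        return nested[::-1]
    # Recurse one level down until the requested axis is reached.
    return [reverse(item, axis - 1) for item in nested]

vector = [1, 2, 3, 4, 5]
matrix = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]]

print(reverse(vector))          # [5, 4, 3, 2, 1]
print(reverse(matrix, axis=0))  # [[2, 3, 4, 5, 6], [1, 2, 3, 4, 5]]
print(reverse(matrix, axis=1))  # [[5, 4, 3, 2, 1], [6, 5, 4, 3, 2]]
```

Reversing along axis 0 swaps the rows, which matches the two examples in the question; reversing along axis 1 flips the contents of each row instead.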

Related

Learning a Categorical Variable with TensorFlow Probability

I would like to use TFP to write a neural network whose outputs are the probabilities of a categorical variable with 3 classes, and train it using the negative log-likelihood.
As I'm taking my first steps with TF and TFP, I started with a toy model where the input layer has only 1 unit receiving a null input, and the output layer has 3 units with a softmax activation function. The idea is that the biases should learn (up to an additive constant) the log of the probabilities.
My code is below. true_p holds the true parameters that I use to generate the data and would like to learn, while learned_p is what I get from the NN.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from functions import nll
from tensorflow.keras.optimizers import SGD
import tensorflow.keras.layers as layers
import tensorflow_probability as tfp
tfd = tfp.distributions
# params
true_p = np.array([0.1, 0.7, 0.2])
n_train = 1000
# training data
x_train = np.array(np.zeros(n_train)).reshape((n_train,))
y_train = np.array(np.random.choice(len(true_p), size=n_train, p=true_p)).reshape((n_train,))
# model
input_layer = layers.Input(shape=(1,))
p_layer = layers.Dense(len(true_p), activation=tf.nn.softmax)(input_layer)
p_y = tfp.layers.DistributionLambda(tfd.Categorical)(p_layer)
model_p = keras.models.Model(inputs=input_layer, outputs=p_y)
model_p.compile(SGD(), loss=nll)
# training
hist_p = model_p.fit(x=x_train, y=y_train, batch_size=100, epochs=3000, verbose=0)
# check result
learned_p = np.round(model_p.layers[1].call(tf.constant([0], shape=(1, 1))).numpy(), 3)
learned_p
With this setup, I get the result:
>>> learned_p
array([[0.005, 0.989, 0.006]], dtype=float32)
I overestimate the second category and can't really distinguish between the first and the third. What's worse, if I plot the probabilities at the end of each epoch, they appear to converge monotonically to the vector [0,1,0], which doesn't make sense (it seems to me the gradient should push in the opposite direction once I start to overestimate).
I really can't figure out what's going on here, but have the feeling I'm doing something plain wrong. Any idea? Thank you for your help!
For the record, I also tried using other optimizers like Adam or Adagrad playing with the hyper-params, but with no luck.
I'm using Python 3.7.9, TensorFlow 2.3.1 and TensorFlow probability 0.11.1
I believe the default argument to Categorical is not the vector of probabilities, but the vector of logits (values you'd take softmax of to get probabilities). This is to help maintain precision in internal Categorical computations like log_prob. I think you can simply eliminate the softmax activation function and it should work. Please update if it doesn't!
EDIT: alternatively you can replace the tfd.Categorical with
lambda p: tfd.Categorical(probs=p)
but you'll lose the aforementioned precision gains. Just wanted to clarify that passing probs is an option, just not the default.
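As a rough illustration of why the extra softmax hurts, here is a pure-Python sketch (no TF or TFP involved, and the softmax helper is written here for illustration). It assumes, as described above, that Categorical applies a softmax to whatever it receives as logits. If the layer output is already a softmax, the values fed in are confined to [0, 1], so the effective distribution can never be very peaked.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

true_p = [0.1, 0.7, 0.2]

# Case 1: the layer output is used as logits directly (no softmax activation).
# Logits equal to log(true_p) reproduce the target distribution.
recovered = softmax([math.log(p) for p in true_p])

# Case 2: a softmax output is passed where logits are expected, so a second
# softmax is applied implicitly. Even the most extreme softmax output,
# [0, 1, 0], yields only a mildly peaked effective distribution.
effective = softmax([0.0, 1.0, 0.0])

print([round(p, 3) for p in recovered])  # [0.1, 0.7, 0.2]
print([round(p, 3) for p in effective])  # [0.212, 0.576, 0.212]
```

In the second case the largest reachable probability is e / (e + 2), about 0.576, which is below the target of 0.7. That is consistent with the drift toward [0, 1, 0] reported in the question: the optimizer keeps pushing the softmax output to the extreme because the effective probability of the second class is still too low.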

Unable to track record by record processing in LSTM algorithm for text classification?

We are working on multi-class text classification, and this is the process we have used:
1) We trained 300-dimensional word2vec embeddings on our own data and passed them as weights to the embedding layer in front of the LSTM.
2) We then used one LSTM layer and one dense layer.
Here is my code:
input_layer = layers.Input((train_seq_x.shape[1], ))
embedding_layer = layers.Embedding(len(word_index)+1, 300, weights=[embedding_matrix], trainable=False)(input_layer)
embedding_layer = layers.SpatialDropout1D(0.3)(embedding_layer)
lstm_layer1 = layers.LSTM(300,return_sequences=True,activation="relu")(embedding_layer)
lstm_layer1 = layers.Dropout(0.5)(lstm_layer1)
flat_layer = layers.Flatten()(lstm_layer1)
output_layer = layers.Dense(33, activation="sigmoid")(flat_layer)
model = models.Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer=optimizers.Adam(), loss='categorical_crossentropy',metrics=['accuracy'])
Please help me with the following questions:
Q1) Why did we pass the word embedding vectors (300 dimensions) as weights to the embedding layer?
Q2) How can we find the optimal number of neurons in the LSTM layer?
Q3) Can you please explain how a single record is processed by the LSTM?
Please let me know if you require more information.
Q1) Why did we pass the word embedding vectors (300 dimensions) as weights to the embedding layer?
Put very simply, an embedding layer is a lookup table that converts a word (represented by its index in a dictionary) to a vector. It is a trainable layer. Since you have already trained word embeddings, you initialize the embedding layer with the vectors you have learned instead of with random weights.
Embedding(len(word_index)+1, 300, weights=[embedding_matrix], trainable=False)(input_layer)
So here you are creating an embedding layer, i.e. a lookup table, which can look up word indices 0 to len(word_index).
Each looked-up word maps to a vector of size 300.
This lookup table is loaded with the vectors from "embedding_matrix" (which is a pretrained model).
trainable=False freezes the weights in this layer.
You passed 300 because that is the vector size of your pretrained model (embedding_matrix).
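The lookup-table idea can be sketched in plain Python. Everything below (the toy vocabulary, the 3-dimensional vectors, the embed helper) is hypothetical and only meant to show how the rows of the matrix line up with word indices; it is not the Keras implementation.

```python
# Hypothetical toy vocabulary; index 0 is reserved for padding.
word_index = {"it": 1, "was": 2, "good": 3}

# Hypothetical pretrained vectors (3 dimensions here instead of 300).
pretrained = {
    "it": [0.1, 0.2, 0.3],
    "was": [0.4, 0.5, 0.6],
    "good": [0.7, 0.8, 0.9],
}
dim = 3

# Row i of the matrix holds the vector of the word whose index is i,
# so the matrix needs len(word_index) + 1 rows.
embedding_matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
for word, idx in word_index.items():
    embedding_matrix[idx] = pretrained[word]

def embed(indices):
    """Look up one vector per index, as a frozen embedding layer would."""
    return [embedding_matrix[i] for i in indices]

print(embed([1, 2, 3, 0]))
```

This is also why the first argument in the question's code is len(word_index)+1: one extra row is needed for index 0.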
Q2) How can we find the optimal number of neurons in the LSTM layer?
You have created an LSTM layer which takes a vector of size 300 as input at each timestep and returns a vector of size 300. The output size and the number of stacked LSTMs are hyperparameters, which are tuned manually (usually using k-fold cross-validation).
Q3) Can you please explain how a single record is processed by the LSTM?
A single record (one or more sentences) is converted into indices of the vocabulary, so for every record you have an array of indices.
A batch of these arrays is created and fed as input to the model.
The LSTM is unrolled over the sequence: at each timestep it receives the embedding vector of one index.
Finally, the output of the LSTM is forward propagated through a final dense layer of size 33. So it looks like each input is mapped to one of 33 classes in your case.
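To show what "one timestep at a time" means, here is a minimal sketch of a single-unit LSTM cell in plain Python, using the standard gate equations with tanh (the question's model uses relu instead). The weights are hypothetical hand-picked numbers and the input is a scalar per timestep rather than a 300-dimensional vector, so this is an illustration of the loop, not of the actual model.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h, c, w):
    """One step of a single-unit LSTM cell with scalar input x."""
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate cell value
    c = f * c + i * g        # new cell state
    h = o * math.tanh(c)     # new hidden state (the output at this step)
    return h, c

# Hypothetical weights, all set to 0.5; a trained model learns these.
names = ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")
weights = {name: 0.5 for name in names}

sequence = [0.1, -0.3, 0.8]  # one record: one scalar "embedding" per timestep

h, c = 0.0, 0.0              # the state starts from zero for every record
outputs = []
for x in sequence:           # the same cell is applied once per timestep
    h, c = lstm_step(x, h, c, weights)
    outputs.append(h)

print(len(outputs))  # 3
```

With return_sequences=True, as in the question, the layer returns every entry of outputs (one per timestep); otherwise it returns only the last one.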
Simple example:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
from keras.utils import to_categorical
training_data = ["it was a good movie".split(), "it was a bad movie".split()]
training_target = [1, 0]
# map each distinct word to an integer index, reserving 0 for padding
words = sorted({word for s in training_data for word in s})
word_index = {word: i + 1 for i, word in enumerate(words)}
model = Sequential()
# lookup table: one 50-dimensional vector per index
model.add(Embedding(len(word_index) + 1, 50))
model.add(LSTM(10, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# each sentence becomes an array of word indices
x = np.array([[word_index[w] for w in s] for s in training_data])
y = to_categorical(training_target)
model.fit(x, y)
model.summary()

Flatten alongside with batch axis in TensorFlow / Keras

In a Sequential model, I'm trying to go from a layer output shape of (None, 300) to something like (1,1,None*300) to apply an AveragePooling layer. In fact I would like to flatten everything (even the batch axis), while both Flatten and Reshape layers always skip the batch axis. Any idea?
You can use a Lambda layer with K.reshape from the backend, like this:
from keras import backend as K
from keras.layers import Lambda
# inp is the (None, 300) output of the previous layer
out = Lambda(lambda x: K.reshape(x, (1, 1, -1)))(inp)

Custom linear transformation in keras

I want to build a custom layer in Keras that applies a linear transformation to the output of the last layer.
For example, given an output X from the last layer, my new layer should output X.dot(W)+b.
The shape of W is (49,10), the shape of X should be (64,49), and the shape of b is (10,).
However, the shape of X is (?, 7, 7, 64), and when I try to reshape it, its shape becomes (64, ?). What does the question mark mean? Could you tell me a proper way to apply a linear transformation to the output of the last layer?
The question mark generally represents the batch size, which has no effect on the model architecture.
You should be able to reshape your X with keras.layers.Reshape((64,49))(X).
You can wrap arbitrary tensorflow operations such as tf.matmul in a Lambda layer to include custom layers in your Keras model. Minimal working example that does the trick:
import tensorflow as tf
from keras.layers import Dense, Lambda, Input
from keras.models import Model
W = tf.random_normal(shape=(128,20))
b = tf.random_normal(shape=(20,))
inp = Input(shape=(10,))
x = Dense(128)(inp)
y = Lambda(lambda x: tf.matmul(x, W) + b)(x)
model = Model(inp, y)
Finally: refer to the Keras documentation on how to write custom layers with trainable weights.
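One caveat about the reshape step, sketched in plain Python with a tiny 2 x 2 x 3 feature map instead of 7 x 7 x 64. It assumes the data is stored channels-last and that each row of X is meant to hold one channel's spatial values, which is a guess about the intent of the question. A direct reshape to (channels, height * width) gives the right shape but mixes channels within a row; reshaping to (height * width, channels) and then transposing keeps each channel in its own row. The helper functions are written here for illustration.

```python
def flatten(nested):
    """Flatten a nested list in row-major order."""
    if not isinstance(nested, list):
        return [nested]
    return [v for item in nested for v in flatten(item)]

def reshape2d(flat, rows, cols):
    """Regroup a flat list into rows x cols, row-major."""
    assert len(flat) == rows * cols
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

# Toy feature map: 2 x 2 spatial positions, 3 channels, channels last.
# Each value is 10 * position + channel, so its origin stays visible.
height, width, channels = 2, 2, 3
feature_map = [[[10 * (row * width + col) + ch for ch in range(channels)]
                for col in range(width)] for row in range(height)]

# Direct reshape to (channels, height * width): right shape, mixed rows.
direct = reshape2d(flatten(feature_map), channels, height * width)

# Reshape to (height * width, channels), then transpose: row k is channel k.
by_position = reshape2d(flatten(feature_map), height * width, channels)
by_channel = [list(column) for column in zip(*by_position)]

print(direct[0])      # [0, 1, 2, 10]
print(by_channel[0])  # [0, 10, 20, 30]
```

In Keras terms, the second variant would correspond to Reshape((49, 64)) followed by Permute((2, 1)) rather than Reshape((64, 49)) alone.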

Keras fails to set dynamic shape of layer properly

I am using keras==2.0.8 with tensorflow==1.3.0 backend.
Here is the example which I am confused with:
from keras.layers import Input, Reshape, Conv2DTranspose
x = Input((5000,))
y = Reshape((25, 25, 8))(x)
y = Conv2DTranspose(10, 5, padding='same', strides=2)(y)
print(y)
It's just part of my model and after these lines I use y in some tensorflow operations, but code above prints node of shape (?, ?, ?, 10). I have no idea why TF cannot deduce height and width of resulting tensor statically. (I know that keras can, but I want TF node with proper shape)
If you intend to use these tensorflow operations in a keras model, you have to use them inside Lambda layers.
In the function you create for the Lambda layer, you can use the given tensor normally. Unless you have a very specific reason to need the static shape in TensorFlow, there won't be any problem. Is there a particular need for the TensorFlow tensor to have an explicit shape?
In Keras, you can always call K.shape() on a Keras tensor to get its shape as a tensor. Many Keras backend functions can take this shape as input (mostly with TensorFlow). If you can use the Keras backend functions instead of pure TensorFlow functions, your code may be portable to other backends later.
Example of function:
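from keras import backend as K
def tensorflowPart(x):
    # do tensorflow operations with the tensor x
    shape = K.shape(x)  # use the shape of the tensor, as a tensor
    # more tensorflow operations
    result = x  # placeholder: replace with the output of the operations above
    return result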
Use the lambda layer in your model:
y = Lambda(tensorflowPart)(y)