tfidf weighted average of word embeddings with Keras - tensorflow

I don't know if this is possible, but I want to calculate a weighted average of the word embeddings in a sentence, e.g. weighted by tfidf scores. Is it essentially the same as this, just with weights added?
averaging a sentence's word vectors in Keras - Pre-trained Word Embedding
import keras
from keras.layers import Embedding
from keras.models import Sequential
import numpy as np
# Set parameters
vocab_size=1000
max_length=10
# Generate random embedding matrix for sake of illustration
embedding_matrix = np.random.rand(vocab_size,300)
model = Sequential()
model.add(Embedding(vocab_size, 300, weights=[embedding_matrix],
input_length=max_length, trainable=False))
# Average the output of the Embedding layer over the word dimension
model.add(keras.layers.Lambda(lambda x: keras.backend.mean(x, axis=1)))
model.summary()
How could you get, with a custom layer or a Lambda layer, the proper weights belonging to a specific word? You would somehow need access to the embedding layer to get the index and then look up the proper weight.
Or is there a simple way I'm not seeing?

You can read the embedding matrix back out of the trained model:
embeddings = model.layers[0].get_weights()[0] # embedding matrix, shape (vocab_size, embedding_dim)
Alternatively, if you define the layer object:
embedding_layer = Embedding(vocab_size, 300, weights=[embedding_matrix], input_length=max_length, trainable=False)
embeddings = embedding_layer.get_weights()[0]
From here, you can directly address individual word vectors by indexing with the word indices from your integer inputs.
If you want to, you can additionally access the word vectors by their string words, though that shouldn't be necessary for simply accumulating the word vectors of each sentence:
# `word_to_index` is a mapping (i.e. dict) from words to their index that you need to provide (from your original input data which should be ints)
word_embeddings = {w:embeddings[idx] for w, idx in word_to_index.items()}
print(word_embeddings['chair']) # gives you the word vector
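To answer the original question directly, here is a minimal sketch of a tfidf-weighted average, assuming you have precomputed one tfidf score per vocabulary index (the tfidf_weights array below is a hypothetical stand-in): a second, frozen Embedding layer looks up each token's score, and a Lambda layer computes the weighted mean.
import numpy as np
import keras.backend as K
from keras.layers import Input, Embedding, Lambda
from keras.models import Model
vocab_size, emb_dim, max_length = 1000, 300, 10
embedding_matrix = np.random.rand(vocab_size, emb_dim)
tfidf_weights = np.random.rand(vocab_size, 1)  # assumed: one precomputed tfidf score per word index
tokens = Input(shape=(max_length,), dtype='int32')
# (batch, max_length, 300): the word vectors
vectors = Embedding(vocab_size, emb_dim, weights=[embedding_matrix], trainable=False)(tokens)
# (batch, max_length, 1): each token's tfidf score, looked up by the same index
scores = Embedding(vocab_size, 1, weights=[tfidf_weights], trainable=False)(tokens)
def weighted_average(inputs):
    vec, w = inputs
    return K.sum(vec * w, axis=1) / (K.sum(w, axis=1) + K.epsilon())
sentence_vector = Lambda(weighted_average)([vectors, scores])
model = Model(tokens, sentence_vector)  # output shape: (batch, 300)
model.summary()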

Related

How to get token index map from tf hub pre-trained embedding?

I'm trying to use a tfhub pre-trained word embedding in a text generation project. The setting is that there is a corpus of English text. I want to convert each word to a dense vector (embedding) and then feed the sequence to an LSTM model to learn how to generate the next word given a sequence.
Initially I was trying to load the embedding as a KerasLayer.
embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
dtype=tf.string, trainable=True, name='embedding')
However, the KerasLayer doesn't seem to accept sequences (2D input). It looks like I have to preprocess the text, tokenize it, convert each token to a vector, and then feed the vectors directly to an LSTM layer.
In this case, I will need the token-to-int mapping from the model. I located the tokens.txt file in the assets directory of the local cache:
./tf_cache/510580b203329a4a95dfdfefd838bdcd202f0d13/assets/tokens.txt
But I don't want to manually copy the file out and load it into memory. Is there an API in TensorFlow that I can call to get the token mapping instead of reading the file manually?
You should be able to manipulate the tensors so that you can pass them into the KerasLayer.
If you are using ragged tensors, tf.ragged.map_flat_values is your friend, e.g.:
sentences = ["sentence 1", "sentence number 2"]
words = tf.strings.split(sentences)
word_embeddings = tf.ragged.map_flat_values(hub_layer, words)
word_embeddings.to_tensor() # Convert to dense now to feed into next layers.
If you already have a dense tensor of shape [num_sentences, num_words], you could reshape it to [num_sentences * num_words], embed it (transforming it into [num_sentences * num_words, embedding_size]), and then reshape back into [num_sentences, num_words, embedding_size]. In this case tf.reshape is your friend.
Something like:
dense_features = tf.constant([["sentence", "with", "four", "words"], ["hello", "world", "", ""]])
# Reshape to 1-d tensor.
flatten_words = tf.reshape(dense_features, [-1])
# Embed each element as if it was a single batch of words.
flatten_word_embeddings = hub_layer(flatten_words)
# Reshape back to 3-d tensor.
num_sentences = tf.shape(dense_features)[0]
max_num_words = tf.shape(dense_features)[1]
embedded_features = tf.reshape(flatten_word_embeddings, [num_sentences, max_num_words, -1])
These examples differ in how they treat the nonexistent words: the ragged version never embeds them, while the dense version embeds the "" padding strings.
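Either way, the resulting dense (num_sentences, num_words, 20) tensor can feed the LSTM from the question. A minimal sketch, where the 128 units are an arbitrary choice and word_embeddings is the ragged result from above:
import tensorflow as tf
# Densify the ragged embeddings (padding positions become zero vectors)
# and encode each sentence with an LSTM.
lstm = tf.keras.layers.LSTM(128)
sentence_encoding = lstm(word_embeddings.to_tensor())  # shape: (num_sentences, 128)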

tf.keras loss from two images in serial

I want to use the stability training approach of the paper and apply it to a very simple CNN.
The principal architecture is given by the figure in the paper.
As shown in the figure, you compute the loss based on the output f(I) for the input image I and on
the output f(I') for the perturbed image I'.
My question would be how to do this in a valid way without having two instances of the DNN,
as I'm training on large 3D images. In other words: how can I process two images in serial and compute the loss based on those two images?
I'm using tf2 with keras.
You can first write your DNN as a tf.keras Model.
After that, you can write another model which takes two image inputs, applies some Gaussian noise to one, and passes both to the DNN.
Then design a custom loss function which computes the proper loss from the two outputs.
Here's a demo code:
from tensorflow.keras.layers import Input, Dense, Flatten, GaussianNoise
from tensorflow.keras.models import Model
import tensorflow as tf
# shared DNN: the base model with a feature-space output; there is only one instance of this model
ip = Input(shape=(32,32,1)) # same as original inputs
f0 = Flatten()(ip)
d0 = Dense(10)(f0) # 10 dimensional feature embedding
dnn = Model(ip, d0)
# final model with two versions of the image and the loss
input_1 = Input(shape=(32,32,1))
input_2 = Input(shape=(32,32,1))
g0 = GaussianNoise(0.5)(input_2) # only input_2 passes through gaussian noise layer, you can design your own custom layer too
# passing the two images to same DNN
path1 = dnn(input_1) # no noise
path2 = dnn(g0) # noise
model = Model([input_1, input_2], [path1, path2])
def my_loss(y_true, y_pred):
# calculate your loss based on your two outputs path1, path2
pass
model.compile('adam', my_loss)
model.summary()
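The my_loss stub above still needs a body, and with two separate outputs Keras calls the loss once per output, so neither call sees both embeddings. One hedged way around this (a sketch, not the paper's exact recipe) is to concatenate both embeddings into a single output; the 0.01 stability weight and the squared-error task term are placeholder assumptions:
from tensorflow.keras.layers import Concatenate
# Merge both 10-d embeddings into one (batch, 20) output so a single loss sees both.
merged = Concatenate(axis=-1)([path1, path2])
stability_model = Model([input_1, input_2], merged)
def stability_loss(y_true, y_pred):
    # y_true is assumed to be the clean-image target, tiled/padded to width 20
    # so that its shape matches the concatenated output.
    f_clean, f_noisy = y_pred[:, :10], y_pred[:, 10:]
    task = tf.reduce_mean(tf.square(f_clean - y_true[:, :10]))  # placeholder task loss
    stability = tf.reduce_mean(tf.square(f_clean - f_noisy))    # ||f(I) - f(I')||^2
    return task + 0.01 * stability                              # 0.01: assumed weight
stability_model.compile('adam', stability_loss)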

Unable to track record by record processing in LSTM algorithm for text classification?

We are working on multi-class text classification, and the following is the process we have used.
1) We created 300-dimensional vectors with word2vec word embeddings using our own data and then passed those vectors as weights to the LSTM embedding layer.
2) Then we used one LSTM layer and one dense layer.
Here is my code:
input_layer = layers.Input((train_seq_x.shape[1], ))
embedding_layer = layers.Embedding(len(word_index)+1, 300, weights=[embedding_matrix], trainable=False)(input_layer)
embedding_layer = layers.SpatialDropout1D(0.3)(embedding_layer)
lstm_layer1 = layers.LSTM(300,return_sequences=True,activation="relu")(embedding_layer)
lstm_layer1 = layers.Dropout(0.5)(lstm_layer1)
flat_layer = layers.Flatten()(lstm_layer1)
output_layer = layers.Dense(33, activation="sigmoid")(flat_layer)
model = models.Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer=optimizers.Adam(), loss='categorical_crossentropy',metrics=['accuracy'])
Please help me out with the questions below:
Q1) Why did we pass the word embedding vectors (300 dims) as weights to the LSTM embedding layer?
Q2) How can we know the optimal number of neurons in the LSTM layer?
Q3) Can you please explain how a single record is processed in the LSTM algorithm?
Please let me know if you require more information.
Q1) Why did we pass the word embedding vectors (300 dims) as weights to the LSTM embedding layer?
In a very simplistic way, you can think of an embedding layer as a lookup table which converts a word (represented by its index in a dictionary) to a vector. It is a trainable layer. Since you have already trained word embeddings, instead of initializing the embedding layer with random weights you initialize it with the vectors you have learned.
Embedding(len(word_index)+1, 300, weights=[embedding_matrix], trainable=False)(input_layer)
So here you are:
- creating an embedding layer, i.e. a lookup table which can look up word indices from 0 to len(word_index);
- each looked-up word maps to a vector of size 300;
- this lookup table is loaded with the vectors from "embedding_matrix" (which come from a pretrained model);
- trainable=False freezes the weights in this layer.
You passed 300 because it is the vector size of your pretrained model (embedding_matrix).
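To make the lookup-table picture concrete, a tiny sketch (the 5-word vocabulary and random matrix are made up for illustration):
import numpy as np
embedding_matrix = np.random.rand(5, 300)  # one 300-d vector per word index
sentence = [3, 1, 4]                       # a sentence as word indices
vectors = embedding_matrix[sentence]       # the "lookup": shape (3, 300)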
Q2) How can we know the optimal number of neurons in the LSTM layer?
You have created an LSTM layer which takes a vector of size 300 as input and returns a vector of size 300. The output size and the number of stacked LSTMs are hyperparameters which are tuned manually (usually with K-fold CV).
Q3) Can you please explain how a single record is processed in the LSTM algorithm?
A single record/sentence is converted into indices of the vocabulary, so for every sentence you have an array of indices.
A batch of these sentences is created and fed as input to the model.
The LSTM is unrolled by passing in one index at a time as input at each timestep.
Finally, the output of the LSTM is forward propagated through a final dense layer of size 33. So it looks like each input is mapped to one of 33 classes in your case.
Simple example
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM, Embedding
from keras.utils import to_categorical
training_data = ["it was a good movie".split(), "it was a bad movie".split()]
training_target = [1, 0]
# Build a word -> index mapping; index 0 is reserved for padding
vocab = sorted({word for s in training_data for word in s})
word_to_index = {w: i + 1 for i, w in enumerate(vocab)}
model = Sequential()
model.add(Embedding(len(word_to_index) + 1, 50, input_length=5))
model.add(LSTM(10, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
x = np.array([[word_to_index[w] for w in s] for s in training_data])
y = to_categorical(training_target)
model.fit(x, y)

Add L2 regularization to specific embeddings in Tensorflow

I am building a wide & deep style model using Tensorflow. For discrete features I first embed them into a vector space, and I am wondering how to add L2 regularization on those embeddings.
The L2 regularization operator tf.nn.l2_loss accepts the embedding tensor as input, but I only want to regularize the specific embeddings whose ids appear in the current batch of data, not the whole matrix.
Just use the specific embeddings whose ids appear in the current batch of data to calculate the regularization loss:
import tensorflow as tf
# `sparse_tensor` holds the feature ids of the current batch and
# `large_embedding_variable` is the full embedding matrix
ids = sparse_tensor.values
uniq_ids, _ = tf.unique(ids)
embedding_slices = tf.gather(large_embedding_variable, uniq_ids)
regularization_loss = tf.nn.l2_loss(embedding_slices)
...
loss = train_loss + FLAGS.l2 * regularization_loss
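In tf.keras, an activity regularizer on the Embedding layer is an alternative worth knowing: it penalizes the layer's outputs, i.e. exactly the rows looked up in the current batch (with the caveat that a row used k times in a batch is penalized k times, unlike the tf.unique version above). A sketch, where the vocabulary size and weight are assumptions:
import tensorflow as tf
vocab_size = 10000  # assumed vocabulary size
emb = tf.keras.layers.Embedding(
    vocab_size, 64,
    activity_regularizer=tf.keras.regularizers.l2(0.01))  # 0.01: assumed L2 weight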

How to perform row wise or column wise max pooling in keras

I am trying to perform row-wise and column-wise max pooling over an attention layer, as described in the link below:
http://www.dfki.de/~neumann/ML4QAseminar2016/presentations/Attentive-Pooling-Network.pdf (slide 15)
I am using a text dataset where each sentence is fed to a CNN. Each word of the sentence has been embedded. The code for it is below:
model.add(Embedding(MAX_NB_WORDS, emb_dim, weights=[embedding_matrix], input_length=MAX_SEQUENCE_LENGTH, trainable=False))
model.add(Conv1D(k, FILTER_LENGTH, padding="valid", activation="relu"))
The output from the CNN is of shape (None, 256). This acts as the input to the attention layer.
Can anyone suggest how to implement row-wise or column-wise max pooling in Keras with TensorFlow as the backend?
If your model carries image-like tensors with shape (batch, width, height, channels), you can reshape the data to hide one of the spatial dimensions and use 1D pooling:
For the width:
model.add(Reshape((width, height*channels)))
model.add(MaxPooling1D())
model.add(Reshape((width//2, height, channels))) # if width is odd, adjust with +1 or -1 (one of them will work)
For the height:
#Here, the time distributed will consider that "width" is an extra time dimension,
#and will simply think of it as an extra "batch" dimension
model.add(TimeDistributed(MaxPooling1D()))
Working example, functional API model with two branches, one for each pooling:
import numpy as np
from keras.layers import *
from keras.models import *
inp = Input((30,50,4))
out1 = Reshape((30,200))(inp)
out1 = MaxPooling1D()(out1)
out1 = Reshape((15,50,4))(out1)
out2 = TimeDistributed(MaxPooling1D())(inp)
model = Model(inp,[out1,out2])
model.summary()
As an alternative to Reshape, in case you don't want to bother with the exact numbers:
#swap height and width
model.add(Permute((2,1,3)))
#apply the pooling to width
model.add(TimeDistributed(MaxPooling1D()))
#bring height and width to the correct order
model.add(Permute((2,1,3)))
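For completeness, a self-contained sketch of the Permute variant, using the same assumed (30, 50, 4) input as the functional example above:
from keras.models import Sequential
from keras.layers import Permute, TimeDistributed, MaxPooling1D
model = Sequential()
model.add(Permute((2, 1, 3), input_shape=(30, 50, 4)))  # swap width and height -> (50, 30, 4)
model.add(TimeDistributed(MaxPooling1D()))              # pool along width -> (50, 15, 4)
model.add(Permute((2, 1, 3)))                           # restore the order -> (15, 50, 4)
model.summary()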