How to do word embedding to provide input to RNN? - tensorflow

I am trying to do word prediction using a basic RNN. I need to provide input to the RNN cell, and I am trying the following code:
X_input = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)
x = tf.nn.embedding_lookup(tfWe, X_input)
x = tf.unstack(x, sequence_length, 1)
output, states = tf.nn.dynamic_rnn(rnn, x, dtype = tf.float32)
output = tf.transpose(output, (1,0,2))
output = tf.reshape(output, (sequence_length*num_samples,hidden_layer_size))
I am getting the error ValueError: Layer gru_cell_2 expects 1 inputs, but it received 39 input tensors. I think this error is due to the embedding, as it does not produce a tensor with dimensions that can be fed to the GRUCell. So, how do I provide the input to the GRU cell?

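The way you're declaring X_input is probably wrong; that extra dimension of size 1 is causing the problem. tf.nn.dynamic_rnn expects a single tensor of shape (batch, time, features), whereas tf.unstack hands it a Python list of sequence_length tensors (the 39 input tensors in the error message). If you remove the extra dimension, there's no need to use unstack. The following code would work: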
X_input = tf.placeholder(tf.int32, shape = (None, sequence_length))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length))
tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = tf.contrib.rnn.GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)
x = tf.nn.embedding_lookup(tfWe, X_input)
output, states = tf.nn.dynamic_rnn(rnn, x, dtype = tf.float32)
##shape of output here is (None,sequence_length,hidden_layer_size)
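To see why the trailing dimension matters, the lookup can be mimicked in NumPy, where indexing a table with an integer array follows the same shape rule as tf.nn.embedding_lookup (shape of the ids plus the embedding size). This is only a sketch: the sizes below are made-up values for illustration, not the asker's actual ones, and table merely stands in for tfWe.

```python
import numpy as np

# Hypothetical sizes chosen only for illustration.
V, embedding_dim, sequence_length, batch = 100, 8, 39, 4

rng = np.random.default_rng(0)
table = rng.uniform(size=(V, embedding_dim))    # stands in for tfWe

ids_3d = rng.integers(0, V, size=(batch, sequence_length, 1))
ids_2d = ids_3d[:, :, 0]                        # same ids without the trailing axis

# The result shape is always: shape of the ids + (embedding_dim,)
emb_4d = table[ids_3d]
emb_3d = table[ids_2d]

print(emb_4d.shape)  # (4, 39, 1, 8): rank 4, not a valid RNN input
print(emb_3d.shape)  # (4, 39, 8): (batch, time, features)
```

Dropping the size-1 axis from the ids is therefore enough to get a rank-3 tensor that an RNN can consume directly.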
But if you really need to keep that dimension, then you need to make a small modification to unstack. You're unstacking along axis=1 into sequence_length tensors, which again doesn't seem right. Unstack along the size-1 axis instead:
X_input = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = tf.contrib.rnn.GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)
x = tf.nn.embedding_lookup(tfWe, X_input)
x = tf.unstack(x, 1, 2)
output, states = tf.nn.dynamic_rnn(rnn, x[0], dtype = tf.float32)
##shape of output here is again same (None,sequence_length,hidden_layer_size)
Lastly, if you really, really need to unstack it into sequence_length tensors, then replace unstack with tf.map_fn() and do this:
X_input = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
Y_target = tf.placeholder(tf.int32, shape = (None, sequence_length, 1))
tfWe = tf.Variable(tf.random_uniform((V, embedding_dim)))
W1 = tf.Variable(np.random.randn(hidden_layer_size, label).astype(np.float32))
b = tf.Variable(np.zeros(label).astype(np.float32))
rnn = tf.contrib.rnn.GRUCell(num_units = hidden_layer_size, activation = tf.nn.relu)
x = tf.nn.embedding_lookup(tfWe, X_input)
x = tf.transpose(x,[1,0,2,3])
##tf.map_fn unstacks a tensor along the first dimension only so we need to make seq_len as first dimension by taking transpose
output,states = tf.map_fn(lambda x: tf.nn.dynamic_rnn(rnn,x,dtype=tf.float32),x,dtype=(tf.float32, tf.float32))
##shape of output here is (sequence_length,None,1,hidden_layer_size)
A warning: notice the shape of the output in each solution, and make sure it is the shape you actually want.
EDIT:
To answer your question about when to use what type of inputs:
Suppose you have 25 sentences, each with 15 words, and you divide them into 5 batches of size 5 each. Also, suppose you're using word embeddings of 50 dimensions (let's say you are using word2vec). Then your input shape would be (batch_size=5, time_step=15, features=50). In this case, you don't need to use unstacking or any kind of mapping.
Next, suppose you have 30 documents, each with 25 sentences, each sentence 15 words long, and you divide the documents into 6 batches of size 5 each. Again, suppose you're using word embeddings of 50 dimensions. Your input shape now has one extra dimension. Here batch_size=5, time_step=15 and features=50, but what about the number of sentences? Your input is now (batch_size=5, num_sentences=25, time_step=15, features=50), which is an invalid shape for any type of RNN. In that case, you need to unstack it along the sentence dimension to make 25 tensors, each with shape (5, 15, 50). That is the case I used tf.map_fn for.
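The document case can be checked with plain NumPy arrays. The sketch below only illustrates the shape bookkeeping (zeros instead of real embeddings, and a list comprehension in place of tf.map_fn); it is not the TensorFlow code itself.

```python
import numpy as np

batch_size, num_sentences, time_step, features = 5, 25, 15, 50
docs = np.zeros((batch_size, num_sentences, time_step, features), dtype=np.float32)

# Move the sentence axis to the front, mirroring the transpose done before tf.map_fn.
sentence_major = np.transpose(docs, (1, 0, 2, 3))

# Iterating over the first axis yields one (batch, time, features) array per sentence.
per_sentence = [s for s in sentence_major]

print(len(per_sentence), per_sentence[0].shape)  # 25 (5, 15, 50)
```

Each of the 25 arrays now has the rank-3 layout an RNN accepts.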

Related

Getting error while adding embedding layer to lstm autoencoder

I have a seq2seq model which is working fine. I want to add an embedding layer to this network, but I ran into an error.
This is my architecture using pretrained word embeddings, which works fine. (The code is almost the same as the code available here, but I want to include the Embedding layer in the model rather than use the pretrained embedding vectors.)
LATENT_SIZE = 20
inputs = Input(shape=(SEQUENCE_LEN, EMBED_SIZE), name="input")
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(inputs)
encoded = Lambda(rev_ent)(encoded)
decoded = RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = Bidirectional(LSTM(EMBED_SIZE, return_sequences=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()
NUM_EPOCHS = 1
num_train_steps = len(Xtrain) // BATCH_SIZE
num_test_steps = len(Xtest) // BATCH_SIZE
checkpoint = ModelCheckpoint(filepath=os.path.join('Data/', "simple_ae_to_compare"), save_best_only=True)
history = autoencoder.fit_generator(train_gen, steps_per_epoch=num_train_steps, epochs=NUM_EPOCHS, validation_data=test_gen, validation_steps=num_test_steps, callbacks=[checkpoint])
This is the summary:
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 45, 50)            0
_________________________________________________________________
encoder_lstm (Bidirectional) (None, 20)                11360
_________________________________________________________________
lambda_1 (Lambda)            (512, 20)                 0
_________________________________________________________________
repeater (RepeatVector)      (512, 45, 20)             0
_________________________________________________________________
decoder_lstm (Bidirectional) (512, 45, 50)             28400
When I change the code to add the embedding layer like this:
inputs = Input(shape=(SEQUENCE_LEN,), name="input")
embedding = Embedding(output_dim=EMBED_SIZE, input_dim=VOCAB_SIZE, input_length=SEQUENCE_LEN, trainable=True)(inputs)
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(embedding)
I received this error:
expected decoder_lstm to have 3 dimensions, but got array with shape (512, 45)
So my question is: what is wrong with my model?
Update
So, this error is raised in the training phase. I also checked the dimensions of the data being fed to the model: it is (61598, 45), which clearly lacks the feature dimension (here, Embed_dim).
But why is this error raised in the decoder part? In the encoder part I have included the Embedding layer, so it is fine there. But when the data reaches the decoder part, there is no embedding layer, so it cannot be correctly reshaped to three dimensions.
Now the question is why this does not happen in similar code.
This is my view; correct me if I'm wrong. Seq2seq code is usually used for translation or summarization, and in that code the decoder also has an input (in translation, the other language is the input to the decoder), so having an embedding in the decoder part makes sense.
Finally, here I do not have a separate input, which is why I do not need a separate embedding in the decoder part. However, I don't know how to fix the problem; I only know why it is happening.
Update2
This is the data being fed to the model:
sent_wids = np.zeros((len(parsed_sentences),SEQUENCE_LEN),'int32')
sample_seq_weights = np.zeros((len(parsed_sentences),SEQUENCE_LEN),'float')
for index_sentence in range(len(parsed_sentences)):
    temp_sentence = parsed_sentences[index_sentence]
    temp_words = nltk.word_tokenize(temp_sentence)
    for index_word in range(SEQUENCE_LEN):
        if index_word < sent_lens[index_sentence]:
            sent_wids[index_sentence,index_word] = lookup_word2id(temp_words[index_word])
        else:
            sent_wids[index_sentence, index_word] = lookup_word2id('PAD')
def sentence_generator(X,embeddings, batch_size, sample_weights):
    while True:
        # loop once per epoch
        num_recs = X.shape[0]
        indices = np.random.permutation(np.arange(num_recs))
        # print(embeddings.shape)
        num_batches = num_recs // batch_size
        for bid in range(num_batches):
            sids = indices[bid * batch_size : (bid + 1) * batch_size]
            temp_sents = X[sids, :]
            Xbatch = embeddings[temp_sents]
            weights = sample_weights[sids, :]
            yield Xbatch, Xbatch
LATENT_SIZE = 60
train_size = 0.95
split_index = int(math.ceil(len(sent_wids)*train_size))
Xtrain = sent_wids[0:split_index, :]
Xtest = sent_wids[split_index:, :]
train_w = sample_seq_weights[0: split_index, :]
test_w = sample_seq_weights[split_index:, :]
train_gen = sentence_generator(Xtrain, embeddings, BATCH_SIZE,train_w)
test_gen = sentence_generator(Xtest, embeddings , BATCH_SIZE,test_w)
Here parsed_sentences contains 61598 padded sentences.
Also, this is the function I use in the model's Lambda layer. I am including it here in case it has any effect:
def rev_entropy(x):
    def row_entropy(row):
        _, _, count = tf.unique_with_counts(row)
        count = tf.cast(count,tf.float32)
        prob = count / tf.reduce_sum(count)
        prob = tf.cast(prob,tf.float32)
        rev = -tf.reduce_sum(prob * tf.log(prob))
        return rev
    nw = tf.reduce_sum(x,axis=1)
    rev = tf.map_fn(row_entropy, x)
    rev = tf.where(tf.is_nan(rev), tf.zeros_like(rev), rev)
    rev = tf.cast(rev, tf.float32)
    max_entropy = tf.log(tf.clip_by_value(nw,2,LATENT_SIZE))
    concentration = (max_entropy/(1+rev))
    new_x = x * (tf.reshape(concentration, [BATCH_SIZE, 1]))
    return new_x
Any help is appreciated:)
I tried the following example on Google Colab (TensorFlow version 1.13.1):
from tensorflow.python import keras
import numpy as np
SEQUENCE_LEN = 45
LATENT_SIZE = 20
EMBED_SIZE = 50
VOCAB_SIZE = 100
NUM_EPOCHS = 1  # used by fit() below; same value as in the question
inputs = keras.layers.Input(shape=(SEQUENCE_LEN,), name="input")
embedding = keras.layers.Embedding(output_dim=EMBED_SIZE, input_dim=VOCAB_SIZE, input_length=SEQUENCE_LEN, trainable=True)(inputs)
encoded = keras.layers.Bidirectional(keras.layers.LSTM(LATENT_SIZE), merge_mode="sum", name="encoder_lstm")(embedding)
decoded = keras.layers.RepeatVector(SEQUENCE_LEN, name="repeater")(encoded)
decoded = keras.layers.Bidirectional(keras.layers.LSTM(EMBED_SIZE, return_sequences=True), merge_mode="sum", name="decoder_lstm")(decoded)
autoencoder = keras.models.Model(inputs, decoded)
autoencoder.compile(optimizer="sgd", loss='mse')
autoencoder.summary()
I then trained the model on some random data:
x = np.random.randint(0, 90, size=(10, 45))
y = np.random.normal(size=(10, 45, 50))
history = autoencoder.fit(x, y, epochs=NUM_EPOCHS)
This ran without errors, so I suspect the issue is in the way you are feeding the labels/outputs for the MSE calculation.
Update
Context
In the original problem, you are attempting to reconstruct word embeddings using a seq2seq model, where the embeddings are fixed and pre-trained. However, if you want to use a trainable embedding layer as part of the model, the problem becomes very difficult to model, because you no longer have fixed targets: the targets change at every iteration of the optimization as the embedding layer changes. This in turn leads to a very unstable optimization problem.
Fixing your code
If you do the following, you should be able to get the code working. Here, embeddings is the numpy.ndarray of pre-trained GloVe vectors.
def sentence_generator(X, embeddings, batch_size):
    while True:
        # loop once per epoch
        num_recs = X.shape[0]
        embed_size = embeddings.shape[1]
        indices = np.random.permutation(np.arange(num_recs))
        # print(embeddings.shape)
        num_batches = num_recs // batch_size
        for bid in range(num_batches):
            sids = indices[bid * batch_size : (bid + 1) * batch_size]
            # Xbatch is a [batch_size, seq_length] array
            Xbatch = X[sids, :]
            # Creating the Y targets
            Xembed = embeddings[Xbatch.reshape(-1),:]
            # Ybatch will be [batch_size, seq_length, embed_size] array
            Ybatch = Xembed.reshape(batch_size, -1, embed_size)
            yield Xbatch, Ybatch
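A quick way to sanity-check the shapes this generator produces is to run the target construction on toy arrays. The sketch below uses random stand-ins (not real GloVe vectors or the asker's data) and also shows that indexing the table directly with the 2-D id array gives the same targets as the flatten-then-reshape route.

```python
import numpy as np

# Toy stand-ins: a vocabulary of 100 words with 50-d vectors, batches of 4 x 45 ids.
batch_size, seq_length, vocab_size, embed_size = 4, 45, 100, 50
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(vocab_size, embed_size))
Xbatch = rng.integers(0, vocab_size, size=(batch_size, seq_length))

# Route used in the generator: flatten the ids, look them up, then reshape.
Ybatch = embeddings[Xbatch.reshape(-1), :].reshape(batch_size, -1, embed_size)

# Equivalent one-step lookup with the 2-D id array.
Ydirect = embeddings[Xbatch]

print(Xbatch.shape, Ybatch.shape)        # (4, 45) (4, 45, 50)
print(np.array_equal(Ybatch, Ydirect))   # True
```

The inputs stay 2-D integer ids for the Embedding layer, while the targets are 3-D and match the decoder output.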

Unpredicted tensor shape after dynamical slicing

I am relatively new to TF and am wondering how to take a tensor slice dynamically from a tensor of unknown shape.
I want to get the weights from the last layer (output_layer), apply softmax, and then look only at certain indices along the second dimension (from out_reshape). NumPy-style striding didn't work, so I am using tf.gather instead (after transposing so that the desired axis comes first).
And this works:
out_reshape = tf.gather(out_reshape, [1,2,3,4])
This outputs a tensor of shape [4, 3, ?] (as expected). But I want to choose the indices based on the data fed to T (instead of the [1,2,3,4] shown above).
This gives an unexpected result (as shown in the code below):
out_reshape = tf.gather(out_reshape, T)
and
out_reshape.shape
This gives TensorShape(None), but I was expecting [?, 3, ?], where the first dimension has the same length as T (the data fed into T is a 1-D ndarray, such as [100, 200, 300, 400]).
What is going on here? Why does the output shape collapse to None?
The entire code is something like this:
graph = tf.Graph()
tf.reset_default_graph()
with graph.as_default():
    y=tf.placeholder(tf.float32, shape =(31, 3, None), name = 'Y_observed') # (samples x) ind x seqlen
    T=tf.placeholder(tf.int32, shape =(None), name = "T_observed")
    x = tf.placeholder(tf.float32, shape = (None, None, 4) , name = 'X_observed')
    model = Conv1D(filters = 16,
                   kernel_size = filter_width,
                   padding = "same",
                   activation='relu')(x)
    model = Conv1D(filters = 16,
                   kernel_size = filter_width,
                   padding = "same",
                   activation='relu')(model)
    model = Conv1D(filters = n_output_channels,
                   kernel_size = 1,
                   padding = "same",
                   activation='relu')(model)
    model_output = tf.identity(model, name='last_layer')
    output_layer = tf.get_default_graph().get_tensor_by_name('last_layer:0')
    out = output_layer[:, 512:, :]
    out_norm = tf.nn.softmax( out, axis=1 )
    out_reshape = tf.transpose(out_norm, (1, 2, 0)) # this gives a [?,3,?] tensor
    out_reshape = tf.gather(out_reshape, T) # --> Problematic part !
    ...
    updates = tf.train.AdamOptimizer(1e-4).minimize....
    ...
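One thing worth checking here, offered as a likely cause rather than a confirmed diagnosis: in Python, (None) is just None, not a one-element tuple, so shape=(None) declares T with an unknown rank rather than as a 1-D tensor of unknown length. The static shape of a gather result is built from the shape of the indices, so if their rank is unknown, nothing about the result can be inferred. Declaring T with shape=(None,) or [None] should give the expected [?, 3, ?]. The sketch below is plain Python; gather_shape is a hypothetical helper that only imitates the shape rule and is not a TensorFlow function.

```python
# The parentheses alone do not make a tuple; the trailing comma does.
print((None) is None)            # True
print(type((None,)).__name__)    # tuple

def gather_shape(params_shape, indices_shape, axis=0):
    """Imitate the static-shape rule of a gather along `axis`.

    None as a dimension means "unknown size"; None as a whole shape means
    "unknown rank". Hypothetical helper for illustration only.
    """
    if indices_shape is None:    # unknown rank: nothing can be inferred
        return None
    return (tuple(params_shape[:axis]) + tuple(indices_shape)
            + tuple(params_shape[axis + 1:]))

print(gather_shape((None, 3, None), None))      # None
print(gather_shape((None, 3, None), (None,)))   # (None, 3, None)
print(gather_shape((None, 3, None), (4,)))      # (4, 3, None)
```

The last call corresponds to the working case with the fixed index list [1,2,3,4].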

FC Layer followed by LSTM - Tensorflow

I'm trying to work with LSTMs. My input data is 224*1 and my labels are 70*1.
Before connecting my inputs to the LSTM, I'm trying to match the input data to the label size.
So I'm trying to use an FC layer at the beginning, leave it to the FC layer to learn the non-linear scaling from input to label, and then connect the output of the FC layer to the LSTM.
I have tried flattening and reshaping with tf.reshape, but it does not work because the sizes are different.
Can anyone help me with this? Is this possible at all?
The output of the FC layer I am getting now is:
fc_layer:tf.Tensor 'Reshape:0' shape=(224, 70, 1) dtype=float32
Code
fc_layer = tf.contrib.layers.fully_connected(inputs = batchX_placeholder, num_outputs = 70, activation_fn = tf.nn.relu)
fc_layer = tf.reshape(fc_layer,[-1, 70 , 1])
#######RNN Layer
init_state = tf.placeholder(dtype = tf.float32, shape = [num_layers, 2, batch_size, state_size], name = 'init_state')
state_per_layer_list = tf.unstack(init_state, axis=0)
rnn_tuple_state = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(state_per_layer_list[idx][0],
                                   state_per_layer_list[idx][1])
     for idx in range(num_layers)]
)
I got this working by creating two FC layers back-to-back.
The inputs have shape [1,224]:
fc_layer1 = tf.contrib.layers.fully_connected(inputs, num_outputs = 224, activation_fn = tf.nn.relu)
fc_layer2 = tf.contrib.layers.fully_connected(inputs = fc_layer1, num_outputs = 70, activation_fn = tf.nn.relu)
So now I have fc_layer2 with shape (1,70), and my LSTM labels have shape (70). I think I can now proceed with the LSTM design.
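The shape flow through the two FC layers can be traced with a small NumPy sketch. The weights are random placeholders and dense_relu is a hypothetical helper, not tf.contrib.layers.fully_connected. The final reshape assumes, as in the question's tf.reshape call, that the 70 values are treated as 70 time steps with one feature each.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, w, b):
    """Fully connected layer followed by ReLU (illustrative helper)."""
    return np.maximum(x @ w + b, 0.0)

inputs = rng.normal(size=(1, 224))                    # one sample, 224 features
w1, b1 = rng.normal(size=(224, 224)), np.zeros(224)   # first FC layer: 224 -> 224
w2, b2 = rng.normal(size=(224, 70)), np.zeros(70)     # second FC layer: 224 -> 70

fc_layer1 = dense_relu(inputs, w1, b1)                # shape (1, 224)
fc_layer2 = dense_relu(fc_layer1, w2, b2)             # shape (1, 70)

# Assumption: feed the LSTM as (batch, time, features) with 70 steps of 1 feature.
lstm_input = fc_layer2.reshape(-1, 70, 1)

print(fc_layer2.shape, lstm_input.shape)              # (1, 70) (1, 70, 1)
```

With the batch on the first axis, the reshape to [-1, 70, 1] yields one sequence per sample rather than 224 sequences.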

TensorFlow sparse_softmax_cross_entropy rank error

I'm trying to build an RNN with LSTM on TensorFlow. Both the input and output are 5000 by 2 matrices, where the columns represent the features. These matrices are then fed to the batchX and batchY placeholders, which enable the backpropagation. The main definition of the code is at the bottom. I am getting the following error:
"Rank mismatch: Rank of labels (received 2) should equal rank of logits minus 1 (received 2)."
I have checked both logits_series and labels_series, and both appear to contain truncated_backprop_length tensors of shape [batch_size, num_features].
What confuses me is the following: since logits are predictions of the labels, shouldn't they have the same dimensions?
'''
RNN definitions
input_dimensions = [batch_size, truncated_backprop_length, num_features_input]
output_dimensions = [batch_size, truncated_backprop_length, num_features_output]
state_dimensions = [batch_size, state_size]
'''
batchX_placeholder = tf.placeholder(tf.float32, (batch_size, truncated_backprop_length, num_features_input))
batchY_placeholder = tf.placeholder(tf.int32, (batch_size, truncated_backprop_length, num_features_output))
init_state = tf.placeholder(tf.float32, (batch_size, state_size))
inputs_series = tf.unstack(batchX_placeholder, axis=1)
labels_series = tf.unstack(batchY_placeholder, axis=1)
w = tf.Variable(np.random.rand(num_features_input+state_size,state_size), dtype = tf.float32)
b = tf.Variable(np.zeros((batch_size, state_size)), dtype = tf.float32)
w2 = tf.Variable(np.random.rand(state_size, num_features_output), dtype = tf.float32)
b2 = tf.Variable(np.zeros((batch_size, num_features_output)), dtype=tf.float32)
#calculate state and output variables
state_series = []
output_series = []
current_state = init_state
#iterate over each truncated_backprop_length
for current_input in inputs_series:
    current_input = tf.reshape(current_input,[batch_size, num_features_input])
    input_and_state_concatenated = tf.concat([current_input,current_state], 1)
    next_state = tf.tanh(tf.matmul(input_and_state_concatenated, w) + b)
    state_series.append(next_state)
    current_state = next_state
    output = tf.matmul(current_state, w2)+b2
    output_series.append(output)
#calculate expected output for each state
logits_series = [tf.matmul(state, w2) + b2 for state in state_series]
#print(logits_series)
predictions_series = [tf.nn.softmax(logits) for logits in logits_series]
'''
batchY_placeholder = np.zeros((batch_size,truncated_backprop_length))
for i in range(batch_size):
    for j in range(truncated_backprop_length):
        batchY_placeholder[i,j] = batchY1_placeholder[j, i, 0]+batchY1_placeholder[j, i, 1]
'''
print("logits_series", logits_series)
print("labels_series", labels_series)
#calculate losses given each actual and calculated output
losses = [tf.nn.sparse_softmax_cross_entropy_with_logits(logits = logits, labels = labels) for logits, labels in zip(logits_series,labels_series)]
total_loss = tf.reduce_mean(losses)
Thanks to Maosi Chen, I found the issue. It was because
tf.nn.sparse_softmax_cross_entropy_with_logits
requires labels to have one less dimension than logits. Specifically, the labels argument takes values of shape [batch_size] with dtype int32 or int64.
I solved the issue by converting my one-hot encoded labels to integer class indices, which removes one dimension.
However, it was also possible to use
tf.nn.softmax_cross_entropy_with_logits
which does not have the dimension reduction requirement, as it takes labels of shape [batch_size, num_classes] with dtype float32 or float64.
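The difference between the two label formats can be illustrated in NumPy. The functions below are hand-written stand-ins for the two losses, not the TensorFlow ops, and the logits are made-up numbers. They show that integer class ids of shape [batch_size] and one-hot rows of shape [batch_size, num_classes] give the same per-example loss.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def sparse_xent(logits, labels):
    """labels: integer class ids, shape [batch_size]."""
    return -log_softmax(logits)[np.arange(len(labels)), labels]

def dense_xent(logits, labels):
    """labels: one-hot (or probability) rows, shape [batch_size, num_classes]."""
    return -(labels * log_softmax(logits)).sum(axis=1)

logits = np.array([[2.0, 0.5], [0.1, 1.2], [0.3, 0.3]])
one_hot = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
class_ids = one_hot.argmax(axis=1)      # drops the class dimension

print(class_ids.shape, one_hot.shape)   # (3,) (3, 2)
print(np.allclose(sparse_xent(logits, class_ids), dense_xent(logits, one_hot)))  # True
```

Taking argmax over the class axis is one way to go from one-hot labels to the rank-1 integer labels that the sparse variant expects.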

How shape Tensor array?

I have lately been vexed by the following error message:
ValueError: Cannot feed value of shape (2455040,) for Tensor 'Placeholder:0', which has shape '(2455040, ?)'
It is produced by running the following code:
NUMCLASSES=16
NUMPIXELS=959*640*4
# set up to feed an array of images [images, size_of_image]
x = tf.placeholder(tf.float32, [NUMPIXELS,None])
....deletia....
# Define loss and optimizer..why is this 2d?
y_ = tf.placeholder(tf.float32, [None,NUMCLASSES])
sess = tf.InteractiveSession()
tf.global_variables_initializer().run(session=sess)
tl = get_tensor_list()
for f, n in tl:
    str = '/users/me/downloads/train/' + f
    mm = Image.open(str)
    mm = mm.convert('F')
    mma=np.array(mm)
    i = mma.flatten() #now this is an array of floats of size NUMPIXELS
    sess.run(train_step, feed_dict={x: i, y_: n}) # <<DEATH
Somehow, that array is getting a shape that tf does not like [(x,) when it wants (x,?)]. How to satisfy the tensorgods in this case? The tensor must be what it must be for other mathematical reasons not discussed.
Reshaping the array might help:
i = mma.flatten().reshape((NUMPIXELS,1))
The error happens because the two tensors have different ranks: tensor with shape (2455040,) has rank 1, while tensor with shape (2455040,?) has rank 2.
You can do this:
x_flat = tf.placeholder(tf.float32, [None])
# Feed the flat array to x_flat; x is the rank-2 view used by the rest of the graph.
x = tf.reshape(x_flat, [NUMPIXELS,-1])
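The rank mismatch is easy to reproduce in NumPy with a tiny array. In the sketch below, NUMPIXELS is shrunk to 6 for readability, and fits is a hypothetical helper that imitates the compatibility check between a fed value and a placeholder shape (None standing for the unknown dimension shown as ? in the error). It is not TensorFlow's actual implementation.

```python
import numpy as np

NUMPIXELS = 6                                  # tiny stand-in for 959*640*4
flat = np.arange(NUMPIXELS, dtype=np.float32)  # what flatten() returns
column = flat.reshape((NUMPIXELS, 1))          # rank 2, one column

def fits(value_shape, placeholder_shape):
    """True if a concrete shape matches a placeholder shape (None = any size)."""
    return len(value_shape) == len(placeholder_shape) and all(
        p is None or p == v for v, p in zip(value_shape, placeholder_shape))

print(flat.shape, flat.ndim)                   # (6,) 1
print(column.shape, column.ndim)               # (6, 1) 2
print(fits(flat.shape, (NUMPIXELS, None)))     # False: rank 1 vs rank 2
print(fits(column.shape, (NUMPIXELS, None)))   # True
```

The element count is the same in both arrays; only the number of axes differs, and that is what the placeholder rejects.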