Random boolean mask sampled according to custom PDF in TensorFlow

I am trying to generate a random boolean mask sampled according to a predefined probability distribution. The probability distribution is stored in a tensor of the same shape as the resulting mask. Each entry contains the probability that the mask will be true at that particular location.
In short I am looking for a function that takes 4 inputs:
pdf: A tensor to use as a PDF
s: The number of samples per mask
n: The total number of masks to generate
replace: A boolean indicating if sampling should be done with replacement
and returns n boolean masks
A simplified way to do this using numpy would look like this:
import numpy as np
def sample_mask(pdf, s, replace):
    height, width = pdf.shape
    # Flatten to 1 dimension
    pdf = np.reshape(pdf, height*width)
    # Sample according to pdf, the result is an array of indices
    samples = np.random.choice(np.arange(height*width),
                               size=s, replace=replace, p=pdf)
    mask = np.zeros(height*width)
    # Apply indices to mask
    mask[samples] = 1
    # Reshape back to the original shape
    mask = np.reshape(mask, (height, width))
    return mask
I already figured out that the sampling part, without the replace parameter, can be done like this:
samples = tf.multinomial(tf.log(pdf_tensor), n)
But I am stuck when it comes to transforming the samples to a mask.

I must have been sleeping, here is how I solved it:
def sample_mask(pdf, s, n, replace):
    """Sample n masks with s samples each from a batch of PDFs.
    Args:
        pdf: A 4D Tensor of shape (batch_size, height, width, channels=1) to use as a PDF
        s: The number of samples per mask. This value should be less than height*width
        n: The total number of masks to generate
        replace: A boolean indicating if sampling should be done with replacement
    Returns:
        A Tensor of shape (batch_size, height, width, channels=1, n) containing
        values 1 or 0.
    """
    batch_size, height, width, channels = pdf.shape
    # Flatten pdf
    pdf = tf.reshape(pdf, (batch_size, height*width))
    if replace:
        # Sample with replacement. Output is a tensor of shape (batch_size, s)
        sample_fun = lambda: tf.multinomial(tf.log(pdf), s)
    else:
        # Sample without replacement. Output is a tensor of shape (batch_size, s).
        # Cast the output to 'int64' to match the type needed for SparseTensor's indices
        sample_fun = lambda: tf.cast(sample_without_replacement(tf.log(pdf), s), dtype='int64')
    # Create batch indices
    idx = tf.range(batch_size, dtype='int64')
    idx = tf.expand_dims(idx, 1)
    # Transform idx to a 2D tensor of shape (batch_size, samples_per_batch)
    # Example: [[0 0 0 0 0],[1 1 1 1 1],[2 2 2 2 2]]
    idx = tf.tile(idx, [1, s])
    mask_list = []
    for i in range(n):
        # Generate samples
        samples = sample_fun()
        # Combine batch indices and samples
        samples = tf.stack([idx, samples])
        # Transform samples to a list of indices: (batch_index, sample_index)
        sample_indices = tf.transpose(tf.reshape(samples, [2, -1]))
        # Create the mask as a sparse tensor and set sampled indices to 1
        mask = tf.SparseTensor(indices=sample_indices, values=tf.ones(s*batch_size), dense_shape=[batch_size, height*width])
        # Convert mask to a dense tensor. Non-sampled values are set to 0.
        # Don't validate the indices, since this requires indices to be ordered
        # and unique.
        mask = tf.sparse.to_dense(mask, default_value=0, validate_indices=False)
        # Reshape to input shape and append to list of tensors
        mask_list.append(tf.reshape(mask, [batch_size, height, width, channels]))
    # Combine all masks into a tensor of shape:
    # (batch_size, height, width, channels=1, number_of_masks)
    return tf.stack(mask_list, axis=-1)
The function for sampling without replacement was proposed here: https://github.com/tensorflow/tensorflow/issues/9260#issuecomment-437875125
It uses the Gumbel-max trick: https://timvieira.github.io/blog/post/2014/07/31/gumbel-max-trick/
def sample_without_replacement(logits, K):
    # Add Gumbel(0, 1) noise to the logits and keep the K largest entries
    z = -tf.log(-tf.log(tf.random_uniform(tf.shape(logits),0,1)))
    _, indices = tf.nn.top_k(logits + z, K)
    return indices
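For readers who want to check the Gumbel-max idea outside a TensorFlow graph, here is a minimal NumPy sketch of the same trick applied to a single 2D PDF. The function name sample_mask_without_replacement and the rng argument are my own choices for illustration, not part of the linked proposal.

```python
import numpy as np

def sample_mask_without_replacement(pdf, s, rng):
    """Return a boolean mask with exactly s True entries, drawn without replacement."""
    flat = pdf.reshape(-1)
    # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
    u = rng.uniform(low=1e-12, high=1.0, size=flat.shape)
    gumbel = -np.log(-np.log(u))
    # Zero-probability entries get log(0) = -inf and are never selected
    with np.errstate(divide="ignore"):
        perturbed = np.log(flat) + gumbel
    # Indices of the s largest perturbed log-probabilities
    top = np.argpartition(-perturbed, s - 1)[:s]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[top] = True
    return mask.reshape(pdf.shape)

rng = np.random.default_rng(0)
pdf = rng.random((4, 5))
pdf /= pdf.sum()
mask = sample_mask_without_replacement(pdf, s=6, rng=rng)
print(mask.sum())
```

Because the top-s indices are distinct, the mask always contains exactly s True entries, which is the property that sampling with replacement cannot guarantee.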

Related

Custom TensorFlow loss function with batch size > 1?

I have a neural network with the following code snippets. Note that batch_size == 1 and input_dim == output_dim:
net_in = tf.Variable(tf.zeros(shape=[batch_size, input_dim]), dtype=tf.float32)
input_placeholder = tf.compat.v1.placeholder(shape=[batch_size, input_dim], dtype=tf.float32)
assign_input = net_in.assign(input_placeholder)
# Some matmuls, activations, dropouts, normalizations...
net_out = tf.tanh(output_before_activation)
def loss_fn(output, input):
    # input.shape = output.shape = (batch_size, input_dim)
    output = tf.reshape(output, [input_dim,])  # shape them into 1d vectors
    input = tf.reshape(input, [input_dim,])
    return my_fn_that_only_takes_in_vectors(output, input)
# Build the training op once, outside the loop, so the graph does not grow
train_op = optimizer.minimize(loss_fn(net_out, net_in))
# Create session, preprocess data ...
for epoch in range(epoch_num):
    for batch in range(total_example_num // batch_size):
        sess.run(assign_input, feed_dict={input_placeholder: some_appropriate_numpy_array})
        sess.run(train_op)
Currently the neural network above works fine, but it is very slow because it updates the gradients after every sample (batch size = 1). I would like to set batch size > 1, but my_fn_that_only_takes_in_vectors cannot accommodate matrices whose first dimension is not 1. Due to the nature of my custom loss, flattening the batch input into a vector of length (batch_size * input_dim) does not seem to work.
How would I write my new custom loss_fn now that the input and output are N x input_dim where N > 1? In Keras this would not have been an issue because Keras somehow takes the average of the gradients of each example in the batch. For my TensorFlow function, should I take each row as a vector individually, pass them to my_fn_that_only_takes_in_vectors, then take the average of the results?
You can use a function that computes the loss on the whole batch and works independently of the batch size. Basically, the operations are applied along the whole first dimension of the input (the first dimension indexes the elements of the batch). Here is an example; I hope it helps to show how the operations are carried out:
def my_loss(y_true, y_pred):
    dx2 = tf.math.squared_difference(y_true[:, 0], y_true[:, 2])  # shape: (BatchSize,)
    dy2 = tf.math.squared_difference(y_true[:, 1], y_true[:, 3])  # shape: (BatchSize,)
    denominator = dx2 + dy2  # shape: (BatchSize,)
    dst_vec = tf.math.squared_difference(y_true, y_pred)  # shape: (BatchSize, n_labels)
    numerator = tf.reduce_sum(dst_vec, axis=-1)  # shape: (BatchSize,)
    # Vector containing the loss of each element of the batch, shape: (BatchSize,)
    loss_vector = tf.cast(numerator / denominator, dtype="float32")
    loss = tf.reduce_sum(loss_vector)  # if you want to sum the losses
    return loss
I am not sure whether you need to return the sum or the avg of the losses for the batch.
If you sum, make sure to use a validation dataset with same batch size, otherwise the loss is not comparable.
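To make the row-by-row question concrete, here is a small NumPy sketch. vector_loss is a hypothetical stand-in for my_fn_that_only_takes_in_vectors (the real function is not shown in the question), and batch_loss is the same formula written over the last axis and then averaged over the batch.

```python
import numpy as np

def vector_loss(output, target):
    """Hypothetical stand-in for a loss defined on two 1-D vectors."""
    return np.sum((output - target) ** 2) / (1.0 + np.sum(target ** 2))

def batch_loss(outputs, targets):
    """Same loss written over axis -1, then averaged over the batch."""
    per_example = np.sum((outputs - targets) ** 2, axis=-1) / (
        1.0 + np.sum(targets ** 2, axis=-1))
    return per_example.mean()

rng = np.random.default_rng(0)
outputs = rng.normal(size=(8, 3))
targets = rng.normal(size=(8, 3))

# Row-by-row evaluation followed by an average gives the same number
looped = np.mean([vector_loss(o, t) for o, t in zip(outputs, targets)])
print(np.isclose(batch_loss(outputs, targets), looped))
```

If the per-vector loss can be rewritten with reductions over the last axis like this, the loop over rows is unnecessary. Whether that is possible depends on what the real function computes.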

What's the effect of projection layer after n-grams convolution banks?

I'm studying the CBHG module, which Tacotron uses to extract representations from sequences.
CBHG consists of a 1-D convolution bank, a highway network, and a bidirectional GRU.
The input to the 1-D convolutions comes from a 'lookup_table' that embeds the characters a~z.
After the 1-D convolutions are computed, 'tf.concat' concatenates the results.
A projection is then applied to the concatenated result.
I think it is related to word2vec embeddings such as 'CBOW', but I find it very hard to understand.
What is the effect of the projection layer applied to the concatenated output of the n-gram convolutions? Does it extract the meaningful 'n' among the n-gram conv banks?
Please help me.
with tf.variable_scope('conv_bank'):
    # Convolution bank: concatenate on the last axis
    # to stack channels from all convolutions
    conv_fn = lambda k: \
        conv1d(inputs, k, bank_channel_size,
               tf.nn.relu, is_training, 'conv1d_%d' % k)
    conv_outputs = tf.concat(
        [conv_fn(k) for k in range(1, bank_size+1)], axis=-1,
    )
# Maxpooling:
maxpool_output = tf.layers.max_pooling1d(
    conv_outputs,
    pool_size=maxpool_width,
    strides=1,
    padding='same')
# Two projection layers:
proj_out = maxpool_output
for idx, proj_size in enumerate(proj_sizes):
    activation_fn = None if idx == len(proj_sizes) - 1 else tf.nn.relu
    proj_out = conv1d(
        proj_out, proj_width, proj_size, activation_fn,
        is_training, 'proj_{}'.format(idx + 1))
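As a shape check only, the following NumPy sketch follows the channel dimension through the concatenation and one projection. The sizes are illustrative, and the width-1 linear projection is a simplifying assumption of mine; the snippet above uses conv1d with proj_width, whose definition is not shown.

```python
import numpy as np

batch, time_steps = 2, 7
bank_size, bank_channel_size, proj_size = 4, 16, 8   # illustrative values

rng = np.random.default_rng(0)
# One output per kernel width k = 1..bank_size, each (batch, time, channels)
bank_outputs = [rng.normal(size=(batch, time_steps, bank_channel_size))
                for _ in range(bank_size)]
# Concatenating on the last axis stacks the channels of all banks
stacked = np.concatenate(bank_outputs, axis=-1)

# Width-1 projection (an assumption): a linear map over channels,
# applied independently at every time step
w = rng.normal(size=(bank_size * bank_channel_size, proj_size))
projected = stacked @ w

print(stacked.shape, projected.shape)
```

The concatenated tensor has bank_size * bank_channel_size channels (64 here), and the projection maps every time step back down to proj_size channels (8 here), so each projected channel is a learned mixture of features from all kernel widths.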

tensorflow cross entropy loss for sequence with different lengths

I'm building a seq2seq model with LSTM using TensorFlow. The loss function I'm using is the softmax cross entropy loss. The problem is that my input sequences have different lengths, so I padded them. The output of the model has the shape [max_length, batch_size, vocab_size]. How can I calculate the loss so that the zero-padded values don't affect it? tf.nn.softmax_cross_entropy_with_logits provides an axis parameter, so we can calculate the loss on a 3-dimensional input, but it doesn't provide weights. tf.losses.softmax_cross_entropy provides a weights parameter, but it receives input with shape [batch_size, nclass(vocab_size)]. Please help!
I think you'd have to write your own loss function. Check out https://danijar.com/variable-sequence-lengths-in-tensorflow/.
In this case you need to pad both logits and labels so that they have the same length. Suppose you have a tensor logits of size (batch_size, length, vocab_size) and a tensor labels of size (batch_size, length), where length is the size of your sequence. First, you have to pad them to the same length:
def _pad_tensors_to_same_length(logits, labels):
    """Pad logits and labels so that the results have the same length (second dimension)."""
    with tf.name_scope("pad_to_same_length"):
        logits_length = tf.shape(logits)[1]
        labels_length = tf.shape(labels)[1]
        max_length = tf.maximum(logits_length, labels_length)
        logits = tf.pad(logits, [[0, 0], [0, max_length - logits_length], [0, 0]])
        labels = tf.pad(labels, [[0, 0], [0, max_length - labels_length]])
        return logits, labels
Then you can do the padded cross entropy:
def padded_cross_entropy_loss(logits, labels, vocab_size):
    """Calculate cross entropy loss while ignoring padding.
    Args:
        logits: Tensor of size [batch_size, length_logits, vocab_size]
        labels: Tensor of size [batch_size, length_labels]
        vocab_size: int size of the vocabulary
    Returns:
        Tensor of size [batch_size, max_length] with the cross entropy at each
        position, set to 0 where the label is padding (id 0)
    """
    with tf.name_scope("loss", values=[logits, labels]):
        logits, labels = _pad_tensors_to_same_length(logits, labels)
        # Calculate cross entropy
        with tf.name_scope("cross_entropy", values=[logits, labels]):
            # One-hot encode the integer labels to match the shape of logits
            targets = tf.one_hot(tf.cast(labels, tf.int32), depth=vocab_size)
            xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(
                logits=logits, labels=targets)
        weights = tf.to_float(tf.not_equal(labels, 0))
        return xentropy * weights
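The weighting step can be checked in isolation with a small NumPy sketch. It assumes integer labels with 0 as the padding id and reduces to a mean over non-padding positions, which is one common choice rather than something the function above does; the name masked_cross_entropy is mine.

```python
import numpy as np

def masked_cross_entropy(logits, labels, pad_id=0):
    """Mean cross entropy over non-padding positions.

    logits: float array [batch_size, length, vocab_size]
    labels: int array [batch_size, length], pad_id marks padding
    """
    # Numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-probability of the target token at each position
    xentropy = -np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    weights = (labels != pad_id).astype(logits.dtype)
    return (xentropy * weights).sum() / weights.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 5))
labels = np.array([[3, 1, 0, 0],
                   [2, 4, 1, 0]])
loss = masked_cross_entropy(logits, labels)
```

Dividing by weights.sum() instead of the total number of positions keeps the loss comparable between batches that contain different amounts of padding.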
The function below takes two tensors of shape (batch_size, time_steps, vocab_len) and computes a mask that zeroes the time steps corresponding to padding, so that padding does not contribute to the categorical cross entropy.
# Padding labels are one-hot vectors with 1 as the first element
def mask_loss(y_true, y_pred):
    mask_value = np.zeros((vocab_len))
    mask_value[0] = 1
    # find out which timesteps in `y_true` are not the padding character
    mask = K.equal(y_true, mask_value)
    mask = 1 - K.cast(mask, K.floatx())
    mask = K.sum(mask, axis=2) / 2
    # multiplying the loss by the mask. the loss for padding will be zero
    loss = tf.keras.layers.multiply([K.categorical_crossentropy(y_true, y_pred), mask])
    return K.sum(loss) / K.sum(mask)
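The division by 2 in the mask can look arbitrary, so here is a NumPy sketch of just that step on a toy batch. The labels and vocab_len value are made up for illustration; class 0 is assumed to be the padding class, as in the function above.

```python
import numpy as np

vocab_len = 4
labels = np.array([[2, 3, 0, 0],
                   [1, 0, 0, 0]])          # 0 is the padding class
y_true = np.eye(vocab_len)[labels]         # one-hot, shape (2, 4, 4)

mask_value = np.zeros(vocab_len)
mask_value[0] = 1                          # one-hot vector of the padding class

# A real token differs from mask_value in exactly two slots (index 0 and
# its own class index); a padding step differs in none
mismatches = (y_true != mask_value).astype(float)
mask = mismatches.sum(axis=2) / 2
print(mask)
```

The resulting mask is 1 at real tokens and 0 at padding steps, which is why summing the mismatches and halving them works for one-hot targets.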

How to calculate distance between two vectors efficiently?

I need to calculate cosine_distance repeatedly, and tf.losses.cosine_distance returns a scalar Tensor, so I did it like this:
x # a tensor list
y # a tensor list
for i in x:
    for j in y:
        distance = tf.losses.cosine_distance(i, j, dim=0)
This approach makes the graph too big and the program too slow to load. How can I optimize it?
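Loops are a poor fit for TensorFlow graphs: every iteration adds its own ops to the graph.
I am assuming all the vectors in the tensor lists are of equal length.
Try this:
x_t = tf.stack(x)  # shape (n, d)
y_t = tf.stack(y)  # shape (m, d)
prod = tf.matmul(x_t, y_t, transpose_b=True)  # pairwise dot products, shape (n, m)
x_len = tf.sqrt(tf.reduce_sum(tf.square(x_t), axis=1, keepdims=True))  # shape (n, 1)
y_len = tf.sqrt(tf.reduce_sum(tf.square(y_t), axis=1, keepdims=True))  # shape (m, 1)
cosine_sim = prod / tf.matmul(x_len, y_len, transpose_b=True)
cosine_dist = 1 - cosine_sim  # entry [i, j] is the distance between x[i] and y[j]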
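The same computation can be sanity-checked on small inputs in NumPy. This is a sketch under the assumption that cosine distance means 1 minus cosine similarity and that no vector is all zeros; the function name is mine.

```python
import numpy as np

def pairwise_cosine_distance(x, y):
    """x: (n, d), y: (m, d) -> (n, m) matrix of 1 - cosine similarity."""
    x_unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    y_unit = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x_unit @ y_unit.T

x = np.array([[1.0, 0.0], [1.0, 1.0]])
y = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]])
dist = pairwise_cosine_distance(x, y)
```

For the first row of x the distances are 0 (same direction), 1 (orthogonal) and 2 (opposite direction), which covers the full range of the measure.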

How to iterate a variable length tensor in tensorflow?

I want to build a graph to accept variable size images. So after the last conv layer, I want to iterate through every pixel and feed that vector to softmax layer. And then calculate the mean of them.
It may look like this:
last_max_pool = tf.nn.max_pool(...)
output = tf.reshape(last_max_pool, [batch_size, None, channels])
results = []
for one_batch in output:
    result = []
    for pixel in one_batch:
        result.append(tf.nn.softmax(tf.matmul(pixel, W_sf) + B_sf))
    result = tf.reduce_mean(result)
    results.append(result)
But how to build the graph like this in tensorflow?
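I don't think you need a for loop. Instead, you can reshape last_max_pool to [-1, channels] (tf.reshape does not accept None; -1 lets it infer the size):
output = tf.reshape(last_max_pool, [-1, channels])
softmax = tf.nn.softmax(tf.matmul(output, W_sf) + B_sf)
# Average the per-pixel softmax outputs over all pixels (this also averages across the batch)
final_output = tf.reduce_mean(softmax, axis=0)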
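If a separate average per image is wanted, as in the question's loop, one option is to keep the batch axis and collapse only the spatial axes. The following NumPy sketch shows the idea; the function name and the weight shapes are assumptions for illustration.

```python
import numpy as np

def mean_pixel_softmax(feature_map, w, b):
    """feature_map: (batch, height, width, channels) -> (batch, num_classes)."""
    batch, _, _, channels = feature_map.shape
    # Collapse the spatial axes; -1 lets the pixel count vary with image size
    pixels = feature_map.reshape(batch, -1, channels)
    logits = pixels @ w + b
    # Softmax over classes, computed for every pixel at once
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    # Average the per-pixel distributions within each image
    return probs.mean(axis=1)

rng = np.random.default_rng(0)
w, b = rng.normal(size=(6, 3)), rng.normal(size=3)
small = mean_pixel_softmax(rng.normal(size=(2, 4, 4, 6)), w, b)
large = mean_pixel_softmax(rng.normal(size=(2, 9, 5, 6)), w, b)
print(small.shape, large.shape)
```

Both calls return one distribution per image regardless of the spatial size, so the same weights work for variable-size inputs.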