Is there a differentiable way to cast a tensor into int64? - tensorflow

I'm creating a layer which uses its input tensors to create a SparseTensor, i.e., the input tensors are the respective indices and values of the SparseTensor.
Since the documentation states:
indices: A 2-D int64 tensor of shape [N, ndims]
And because tf.cast(x, tf.int64) is not differentiable, I'm not sure this is achievable.
Another option would be a workaround based on tf.round(), but SparseTensor won't accept anything other than an int64 tensor as indices.
Is there a way to cast a tensor to integer without getting None for the gradient?
How can I create a SparseTensor from the outputs of previous layers, which are float tensors?
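One direction worth sketching (not a confirmed solution, and the variable names below are hypothetical stand-ins for the previous layers' outputs): gradients can never flow through integer indices, but they can still flow through the SparseTensor's values. If only the values need to be learned, the float indices can be rounded and cast, with the understanding that backpropagation treats them as constants.
import tensorflow as tf

# Hypothetical stand-ins for the previous layers' outputs.
float_indices = tf.constant([[0.0, 1.0], [2.0, 0.0]])   # float "coordinates"
float_values = tf.Variable([0.5, -1.2])                  # values that should stay trainable
dense_shape = [3, 2]

# The cast is non-differentiable by design; only `values` stays on the gradient path.
indices = tf.cast(tf.round(float_indices), tf.int64)
sparse = tf.SparseTensor(indices=indices,
                         values=float_values,
                         dense_shape=dense_shape)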

Related

tf.data.Dataset.reduce of SparseTensor elements

I have a tf.data.Dataset object d, where each element is an integer tf.sparse.SparseTensor, and I would like to sum them, returning a sparse tensor. One way I see is the following:
d.reduce(tf.sparse.SparseTensor(tf.zeros([0, 1], tf.int64),
                                tf.zeros([0], tf.int32),
                                dense_shape),
         tf.sparse.add)
Problem:
How do I construct the zero for the reduce operation if I do not know the dense_shape ahead of time? I know all the sparse tensors in the dataset will have the same shape, but it is not statically known; it may depend on the data from which the sparse tensors are constructed, and setting it interactively in eager mode is not a viable option.
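One possible direction (a sketch, not from the original post; it assumes that pulling a single element eagerly from the dataset is acceptable): read the dense_shape off the first element at runtime and build the zero element from it.
import tensorflow as tf

def sparse_sum(d):
    # Peek at one element to learn the runtime dense_shape and the value dtype.
    first = next(iter(d.take(1)))
    rank = tf.size(first.dense_shape)
    zero = tf.sparse.SparseTensor(
        indices=tf.zeros([0, rank], tf.int64),
        values=tf.zeros([0], first.values.dtype),
        dense_shape=first.dense_shape)
    return d.reduce(zero, tf.sparse.add)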

In tf.keras.layers.Embedding, why is it important to know the size of the dictionary?

Same as the title: in tf.keras.layers.Embedding, why is it important to know the size of the dictionary as the input dimension?
Because internally, the embedding layer is nothing but a matrix of size vocab_size x embedding_size. It is a simple lookup table: row n of that matrix stores the vector for word n.
So, if you have e.g. 1000 distinct words, your embedding layer needs to know this number in order to store 1000 vectors (as a matrix).
Don't confuse the internal storage of a layer with its input or output shape.
The input shape is (batch_size, sequence_length) where each entry is an integer in the range [0, vocab_size). For each of these integers the layer will return the corresponding row (which is a vector of size embedding_size) of the internal matrix, so that the output shape becomes (batch_size, sequence_length, embedding_size).
In such setting, the dimensions/shapes of the tensors are the following:
The input tensor has size [batch_size, max_time_steps] such that each element of that tensor can have a value in the range 0 to vocab_size-1.
Then, each value from the input tensor passes through an embedding layer that has shape [vocab_size, embedding_size]. The output of the embedding layer is of shape [batch_size, max_time_steps, embedding_size].
Then, in a typical seq2seq scenario, this 3D tensor is the input of a recurrent neural network.
...
Here's how this is implemented in TensorFlow so you can get a better idea:
inputs = tf.placeholder(shape=(batch_size, max_time_steps), ...)
embeddings = tf.Variable(shape=(vocab_size, embedding_size), ...)
inputs_embedded = tf.nn.embedding_lookup(embeddings, inputs)
Now, the output of the embedding lookup table has the [batch_size, max_time_steps, embedding_size] shape.
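For comparison, here is a minimal sketch of the same lookup using the Keras layer itself (the numbers are illustrative, not taken from the original answer):
import tensorflow as tf

vocab_size, embedding_size = 1000, 64
embedding = tf.keras.layers.Embedding(input_dim=vocab_size,
                                      output_dim=embedding_size)

# (batch_size=2, sequence_length=3); every entry must lie in [0, vocab_size).
token_ids = tf.constant([[3, 17, 255],
                         [42, 0, 999]])
vectors = embedding(token_ids)   # shape: (2, 3, 64)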

LSTM Followed by Mean Pooling (TensorFlow)

I am aware that there is a similar topic at LSTM Followed by Mean Pooling, but that is about Keras and I work in pure TensorFlow.
I have an LSTM network where the recurrence is handled by:
outputs, final_state = tf.nn.dynamic_rnn(cell,
                                         embed,
                                         sequence_length=seq_lengths,
                                         initial_state=initial_state)
where I pass the correct sequence length for each sample (padded with zeros). In any case, outputs contains irrelevant entries, since some samples produce longer outputs than others depending on their sequence lengths.
Right now I'm extracting the last relevant output by means of the following method:
def extract_axis_1(data, ind):
    """
    Get specified elements along the first axis of tensor.
    :param data: TensorFlow tensor that will be subsetted.
    :param ind: Indices to take (one for each element along axis 0 of data).
    :return: Subsetted tensor.
    """
    batch_range = tf.range(tf.shape(data)[0])
    indices = tf.stack([batch_range, ind], axis=1)
    res = tf.gather_nd(data, indices)
    return res
where I pass sequence_length - 1 as the indices. Following the linked topic, I would like to select all relevant outputs and then apply average pooling, instead of just the last one.
Now, I tried passing nested lists as indices to extract_axis_1, but tf.stack does not accept this.
Any solution directions for this?
You can exploit the weights parameter of the tf.contrib.seq2seq.sequence_loss function.
From the documentation:
weights: A Tensor of shape [batch_size, sequence_length] and dtype float. weights constitutes the weighting of each prediction in the sequence. When using weights as masking, set all valid timesteps to 1 and all padded timesteps to 0, e.g. a mask returned by tf.sequence_mask.
You need to compute a binary mask that distinguishes between your valid outputs and the invalid ones. Then you can just provide this mask to the weights parameter of the loss function (you will probably want to use a loss like this one); the function will not consider outputs with a weight of 0 when computing the loss.
If you can't or don't need to use a sequence loss, you can do exactly the same thing manually: compute a binary mask, multiply your outputs by this mask, and provide the result as input to your fully connected layer.
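For the mean-pooling case specifically, a rough sketch of that manual approach could look like the following, assuming outputs has shape [batch_size, max_time, hidden_size] and seq_lengths has shape [batch_size], as in the question:
import tensorflow as tf

# Binary mask: 1.0 for valid timesteps, 0.0 for padded ones.
mask = tf.sequence_mask(seq_lengths,
                        maxlen=tf.shape(outputs)[1],
                        dtype=tf.float32)              # [batch_size, max_time]
mask = tf.expand_dims(mask, axis=-1)                   # [batch_size, max_time, 1]

# Zero out the padded outputs, then average over the valid timesteps only.
summed = tf.reduce_sum(outputs * mask, axis=1)         # [batch_size, hidden_size]
lengths = tf.cast(tf.expand_dims(seq_lengths, -1), tf.float32)
mean_pooled = summed / lengths                         # [batch_size, hidden_size]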

Use coo_matrix in TensorFlow

I'm doing matrix factorization in TensorFlow, and I want to use coo_matrix from scipy.sparse because it uses less memory and makes it easy to put all my data into a matrix for the training data.
Is it possible to use coo_matrix to initialize a variable in TensorFlow?
Or do I have to create a session and feed the data into TensorFlow using sess.run() with a feed_dict?
I hope you understand my question and my problem; otherwise, comment and I will try to clarify.
The closest thing TensorFlow has to scipy.sparse.coo_matrix is tf.SparseTensor, which is the sparse equivalent of tf.Tensor. It will probably be easiest to feed a coo_matrix into your program.
A tf.SparseTensor is a slight generalization of COO matrices, where the tensor is represented as three dense tf.Tensor objects:
indices: An N x D matrix of tf.int64 values in which each row represents the coordinates of a non-zero value. N is the number of non-zeroes, and D is the rank of the equivalent dense tensor (2 in the case of a matrix).
values: A length-N vector of values, where element i is the value of the element whose coordinates are given on row i of indices.
dense_shape: A length-D vector of tf.int64, representing the shape of the equivalent dense tensor.
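For instance (a small illustration, not part of the original answer), the 2x3 matrix [[0, 7, 0], [0, 0, 5]] would be represented as:
import tensorflow as tf

st = tf.SparseTensor(indices=[[0, 1], [1, 2]],   # coordinates of the two non-zero entries
                     values=[7.0, 5.0],
                     dense_shape=[2, 3])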
For example, you could use the following code, which uses tf.sparse_placeholder() to define a tf.SparseTensor that you can feed, and a tf.SparseTensorValue that represents the actual value being fed:
sparse_input = tf.sparse_placeholder(dtype=tf.float32, shape=[100, 100])
# ...
train_op = ...
coo_matrix = scipy.sparse.coo_matrix(...)
# Wrap `coo_matrix` in the `tf.SparseTensorValue` form that TensorFlow expects.
# SciPy stores the row and column coordinates as separate vectors, so we must
# stack and transpose them to make an indices matrix of the appropriate shape.
tf_coo_matrix = tf.SparseTensorValue(
    indices=np.array([coo_matrix.row, coo_matrix.col]).T,
    values=coo_matrix.data,
    dense_shape=coo_matrix.shape)
Once you have converted your coo_matrix to a tf.SparseTensorValue, you can feed sparse_input with the tf.SparseTensorValue directly:
sess.run(train_op, feed_dict={sparse_input: tf_coo_matrix})

shape of a sparse tensor without invoking run()

The .shape attribute of a sparse tensor returns a Tensor object, which seems to be of no use for extracting the actual shape of the sparse tensor without resorting to the run function.
To clarify what I mean, first consider a sparse tensor:
a = tf.SparseTensor(indices=[[0, 0, 0], [1, 2, 1]], values=[1.0+2j, 2.0], shape=[3, 4, 2])
a.shape returns:
<tf.Tensor 'SparseTensor_1/shape:0' shape=(3,) dtype=int64>
This is of little use.
Now, consider a dense tensor:
a = tf.constant(np.random.normal(0.0, 1.0, (4, 4)).astype(dtype=np.complex128))
a.get_shape() returns:
TensorShape([Dimension(4), Dimension(4)])
I can use this output and cast it into a list or tuple of integers without ever invoking run(). However, I cannot do the same for a sparse tensor, unless I first convert it to a dense tensor (which is not implemented for complex sparse tensors yet) and then call the get_shape() method on it. That is redundant, defeats the purpose of using a sparse tensor in the first place, and also leads to errors down the road if the input sparse tensor is complex.
Is there a way to obtain the shape of a sparse tensor without invoking run() or converting it to a dense tensor first?
tf.SparseTensor is implemented as a triple of dense Tensors under the hood. The shape of a SparseTensor is just a Tensor; if you want to know its value, your best bet is to evaluate it using session.run:
print(sess.run(a.shape))
In general, TensorFlow does not promise to compute an exact shape even for dense tensors at graph construction time; shapes are best-effort and may not even have a fixed value. So even for a dense Tensor you may have to evaluate it using run to get a precise shape.
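As a side note (a sketch, not part of the original answer): when the shape tensor was built from Python constants, its static value can sometimes be recovered at graph-construction time with TensorFlow's internal constant-folding helper, without running a session.
from tensorflow.python.framework import tensor_util

# Returns a NumPy array if the value is statically known, otherwise None.
static_shape = tensor_util.constant_value(a.shape)   # `a.dense_shape` in newer APIs
if static_shape is not None:
    shape_as_list = [int(dim) for dim in static_shape]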