My question is: Are the tf.nn.dynamic_rnn and keras.layers.RNN(cell) truly identical as stated in docs?
I am planning on building an RNN, however, it seems that tf.nn.dynamic_rnn is depricated in favour of Keras.
In particular, it states that:
Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future
version. Instructions for updating: Please use keras.layers.RNN(cell),
which is equivalent to this API
But I don't see how the APIs are equivalent, in the case of variable sequence lengths!
In raw TF, we can specify a tensor of shape (batch_size, seq_lengths). This way, if our sequence is [0, 1, 2, 3, 4] and the longest sequence in the batch is of size 10, we can pad it with 0s and [0, 1, 2, 3, 4, 0, 0, 0, 0, 0], we can say seq_length=5 to process [0, 1, 2, 3, 4].
However, in Keras, this is not how it works! What we can do, is specify the mask_zero=True in previous Layers, e.g. the Embedding Layer. This will also mask the 1st zero!
I can go around it by adding ones to the whole vector, but then thats extra preprocessing that I need to do after processing using tft.compute_vocabulary(), which maps vocabulary words to 0 indexed vector.
No, but they are (or can be made to be) not so different either.
TL;DR
tf.nn.dynamic_rnn replaces elements after the sequence end with 0s. This cannot be replicated with tf.keras.layers.* as far as I know, but you can get a similar behaviour with RNN(Masking(...) approach: it simply stops the computation and carries the last outputs and states forward. You will get the same (non-padding) outputs as those obtained from tf.nn.dynamic_rnn.
Experiment
Here is a minimal working example demonstrating the differences between tf.nn.dynamic_rnn and tf.keras.layers.GRU with and without the use of tf.keras.layers.Masking layer.
import numpy as np
import tensorflow as tf
test_input = np.array([
[1, 2, 1, 0, 0],
[0, 1, 2, 1, 0]
], dtype=int)
seq_length = tf.constant(np.array([3, 4], dtype=int))
emb_weights = (np.ones(shape=(3, 2)) * np.transpose([[0.37, 1, 2]])).astype(np.float32)
emb = tf.keras.layers.Embedding(
*emb_weights.shape,
weights=[emb_weights],
trainable=False
)
mask = tf.keras.layers.Masking(mask_value=0.37)
rnn = tf.keras.layers.GRU(
1,
return_sequences=True,
activation=None,
recurrent_activation=None,
kernel_initializer='ones',
recurrent_initializer='zeros',
use_bias=True,
bias_initializer='ones'
)
def old_rnn(inputs):
rnn_outputs, rnn_states = tf.nn.dynamic_rnn(
rnn.cell,
inputs,
dtype=tf.float32,
sequence_length=seq_length
)
return rnn_outputs
x = tf.keras.layers.Input(shape=test_input.shape[1:])
m0 = tf.keras.Model(inputs=x, outputs=emb(x))
m1 = tf.keras.Model(inputs=x, outputs=rnn(emb(x)))
m2 = tf.keras.Model(inputs=x, outputs=rnn(mask(emb(x))))
print(m0.predict(test_input).squeeze())
print(m1.predict(test_input).squeeze())
print(m2.predict(test_input).squeeze())
sess = tf.keras.backend.get_session()
print(sess.run(old_rnn(mask(emb(x))), feed_dict={x: test_input}).squeeze())
The outputs from m0 are there to show the result of applying the embedding layer.
Note that there are no zero entries at all:
[[[1. 1. ] [[0.37 0.37]
[2. 2. ] [1. 1. ]
[1. 1. ] [2. 2. ]
[0.37 0.37] [1. 1. ]
[0.37 0.37]] [0.37 0.37]]]
Now here are the actual outputs from the m1, m2 and old_rnn architectures:
m1: [[ -6. -50. -156. -272.7276 -475.83362]
[ -1.2876 -9.862801 -69.314 -213.94202 -373.54672 ]]
m2: [[ -6. -50. -156. -156. -156.]
[ 0. -6. -50. -156. -156.]]
old [[ -6. -50. -156. 0. 0.]
[ 0. -6. -50. -156. 0.]]
Summary
The old tf.nn.dynamic_rnn used to mask padding elements with zeros.
The new RNN layers without masking run over the padding elements as if they were data.
The new rnn(mask(...)) approach simply stops the computation and carries the last outputs and states forward. Note that the (non-padding) outputs that I obtained for this approach are exactly the same as those from tf.nn.dynamic_rnn.
Anyway, I cannot cover all possible edge cases, but I hope that you can use this script to figure things out further.
Related
I'm trying to understand how tf.keras.metrics.AUC(multi_label=True) works. From the docs, I'm led to understand that when working with multi-label vectors, each class is computed individually, then averaged.
However, I can't seem to get the following trivial case to compute correctly. That is, if the prediction is the same as the expected vector, why is the output not 1.0?
y_true = [
[1, 0, 0, 0, 1],
]
acc = tf.keras.metrics.AUC(multi_label=True, num_labels=5)
acc.reset_state()
acc.update_state(tf.constant(y_true), tf.constant(y_true))
acc.result().numpy()
>>> 0.0
I'm trying to build a model which takes list of sparse tensors as input. (list length is equal to batch size)
The reason I use sparse tensor is that I have to pass adjacency matrix to my GNN model and it is very sparse. (~99%)
I'm familiar with using pytorch, and it is very easy to feed sparse tensor into the network.
However I found that I have to use tf.data.Dataset or keras.utils.Sequence for making dataset in tensorflow.
But those methods throw error to me when I use list of sparse tensors as input.
For example, code below makes TypeError
import tensorflow as tf
tf.data.Dataset.from_tensor_slices(sparse_lists)
TypeError: Neither a SparseTensor nor SparseTensorValue:
[<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2e25b5c0>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c22ada0>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c22a400>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed240>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed390>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed470>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed5c0>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed710>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed828>,
<tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7fbf2c1ed940>].
I know that it will work if I concat all sparse tensors in list as a huge tensor.
However it is not my option because I have to use indexing for sparse tensors later.
(If I concat 2D sparse tensors into 3D sparse tensors, I cannot use indexing like below)
Some3DSparseTensor[:10]
Also, it will take more time because I have to slice 3D tensors for matrix multiplication with other dense networks.
Furthermore, I know that it will be fine if I make sparse tensor by indices, values for every batch, but it would take too much time for each batch.
As a result, I want to make tf.data.Dataset to be able to generate batch from list of sparse tensors due to indexing, time issue.
Can anybody help me? :)
Long story short,
What I have: List of sparse tensors (e.g 1000000 length list)
What I need to do: Batch list of sparse tensors (e.g 1024 length list, not a sparse concat)
If the SparseTensors have the same dense_shape you can create a unique SparseTensor instead of a list and pass it to from_tensor_slices.
For example the following code produce separate SparseTensors from a large SparseTensor s splitting them along the first dimension
s = tf.sparse.SparseTensor(
indices=tf.constant([[0, 0, 0], [1, 0, 0], [1, 0, 1], [2, 1, 1]], dtype=tf.int64),
values=tf.range(4, dtype=tf.float32),
dense_shape=(3, 2, 2))
d = tf.data.Dataset.from_tensor_slices(s)
for t in d:
print(t)
>>> SparseTensor(indices=tf.Tensor([[0 0]], shape=(1, 2), dtype=int64), values=tf.Tensor([0.], shape=(1,), dtype=float32), dense_shape=tf.Tensor([2 2], shape=(2,), dtype=int64))
SparseTensor(indices=tf.Tensor(
[[0 0]
[0 1]], shape=(2, 2), dtype=int64), values=tf.Tensor([1. 2.], shape=(2,), dtype=float32), dense_shape=tf.Tensor([2 2], shape=(2,), dtype=int64))
SparseTensor(indices=tf.Tensor([[1 1]], shape=(1, 2), dtype=int64), values=tf.Tensor([3.], shape=(1,), dtype=float32), dense_shape=tf.Tensor([2 2], shape=(2,), dtype=int64))
To use from_tensor_slices in this way, you need a function to convert the list sparse_lists to a large SparseTensor s (reported below).
To recap, you can do
import tensorflow as tf
def sparse_list_to_sparse_tensor(sparse_lists):
n = len(sparse_lists)
shape = sparse_lists[0].dense_shape
out_shape = (n, *shape)
out_values = tf.concat([s.values for s in sparse_lists], axis=0)
out_indices = []
for i, s in enumerate(sparse_lists):
element_idx = tf.cast(tf.fill((s.indices.shape[0], 1), i), dtype=tf.int64)
out_indices.append(tf.concat([element_idx, s.indices], axis=1))
out_indices = tf.concat(out_indices, axis=0)
return tf.sparse.SparseTensor(out_indices, out_values, out_shape)
tf.data.Dataset.from_tensor_slices(sparse_list_to_sparse_tensor(sparse_lists))
An alternative solution uses from_tensor_slices on every sparse tensor (after the addition of a dummy batch dimension) to create many datasets with a single element that can be concatenated in a single dataset.
dataset = None
for sparse_tensor in sparse_list:
batched_sparse_tensor = tf.sparse.expand_dims(sparse_tensor, axis=0)
element_dataset = tf.data.Dataset.from_tensor_slices(batched_sparse_tensor)
if dataset is None:
dataset = element_dataset
else:
dataset = dataset.concatenate(element_dataset)
Notice that using this solution the sparse tensors can have different dense_shapes.
I was using Tensorflow 2.0 to build a super resolution model. During pre-processing, I wanted to crop both the low and high resolution images by a given patch size. In order to do so, I wanted to get the height and width of the low and high resolution images. But tf.shape(image) is returning None.
Is there a better approach?
Currently I am just resizing every image to some size before using tf.shape, but since not all images have equal size, it is affecting the quality of the imaged. Looking forward to your suggestions.
Edited part:
Here is some parts of the code
low_r = tf.io.decode_jpeg(lr_filename, channels=3)
low_r = tf.cast(low_r, dtype=tf.float32)
print(low_r.shape)
The print statement prints (None, None, 3)
What I wanted was to get the height and weight, like (240,360,3)
I'm not sure if this is also your case, but in my TensorFlow (v2.4.0rc2), my_tensor.shape() also returns TensorShape([None, None, None, None]). It is connected to the fact that the TensorShape tensor is generated during the build and not during the execution.
Using tf.shape() (mentioned in your question, but not used in your code snippet actually) solves it for me.
> my_tensor.shape()
TensorShape([None, None, None, None])
> tf.shape(my_tensor)
[10 512 512 8]
I'm unable to repeat your issue, but this should give you a way to test out your Tensorflow 2.0 install and compare with the results you're currently getting.
Create a tensor and check it's shape:
import tensorflow as tf
t = tf.constant([[[1, 1, 1], [2, 2, 2]], [[3, 3, 3], [4, 4, 4]]])
tf.shape(t) # [2, 2, 3]
Out[1]: <tf.Tensor: id=1, shape=(3,), dtype=int32, numpy=array([2, 2, 3])>
Next, checking what the function return when called:
tf_shape_var = tf.shape(t)
print(tf_shape_var)
Output:
tf.Tensor([2 2 3], shape=(3,), dtype=int32)
Finally, calling it on an int and string to get back a valid return:
tf.shape(1)
Out[10]: <tf.Tensor: id=12, shape=(0,), dtype=int32, numpy=array([], dtype=int32)>
tf.shape('asd')
Out[11]: <tf.Tensor: id=15, shape=(0,), dtype=int32, numpy=array([], dtype=int32)>
And the print statements:
print(tf.shape(1))
print(tf.shape('asd'))
Output:
tf.Tensor([], shape=(0,), dtype=int32)
tf.Tensor([], shape=(0,), dtype=int32)
Link for tf.shape() https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/shape
I'm trying to use the dynamic_rnn function in Tensorflow to speed up training. After doing some reading, my understanding is that one way to speed up training is to explicitly pass a value to the sequence_length parameter in this function. After a bit more reading, and finding this SO explanation, it seems like what I need to pass is a vector (maybe defined by a tf.placeholder) that contains the length of each sequence within a batch.
Here's where I'm confused: in order to take advantage of this, should I pad each of my batches to the longest-length sequence within the batch instead of the longest-length sequence in the training set? How does Tensorflow handle the remaining zeros/pad-tokens in any of the shorter sequences? Also, is the main advantage here really speed, or just extra assurance that we're masking pad-tokens during training? Any help/context would be appreciated.
should I pad each of my batches to the longest-length sequence within the batch instead of the longest-length sequence in the training set?
The sequences within a batch must be aligned, i.e., have to have the same length. So the general answer to your question is "yes". But different batches doesn't have to be of the same length, so you can stratify input sequences into groups that have roughly the same size and pad them accordingly. This technique is called bucketing and you can read about it in this tutorial.
How does Tensorflow handle the remaining zeros/pad-tokens in any of the shorter sequences?
Pretty much intuitive. tf.nn.dynamic_rnn returns two tensors: output and states. Suppose the actual sequence length is t and the padded sequence length is T.
Then the output will contain zeros after i > t and states will contain the t-th cell state, ignoring the states of trailing cells.
Here's an example:
import numpy as np
import tensorflow as tf
n_steps = 2
n_inputs = 3
n_neurons = 5
X = tf.placeholder(dtype=tf.float32, shape=[None, n_steps, n_inputs])
seq_length = tf.placeholder(tf.int32, [None])
basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X,
sequence_length=seq_length, dtype=tf.float32)
X_batch = np.array([
# t = 0 t = 1
[[0, 1, 2], [9, 8, 7]], # instance 0
[[3, 4, 5], [0, 0, 0]], # instance 1
[[6, 7, 8], [6, 5, 4]], # instance 2
])
seq_length_batch = np.array([2, 1, 2])
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
outputs_val, states_val = sess.run([outputs, states], feed_dict={
X: X_batch,
seq_length: seq_length_batch
})
print(outputs_val)
print()
print(states_val)
Note that instance 1 is padded, so outputs_val[1,1] is a zero vector and states_val[1] == outputs_val[1,0]:
[[[ 0.76686853 0.8707901 -0.79509073 0.7430128 0.63775384]
[ 1. 0.7427926 -0.9452815 -0.93113345 -0.94975543]]
[[ 0.9998851 0.98436266 -0.9620067 0.61259484 0.43135557]
[ 0. 0. 0. 0. 0. ]]
[[ 0.99999994 0.9982034 -0.9934515 0.43735617 0.1671598 ]
[ 0.99999785 -0.5612586 -0.57177305 -0.9255771 -0.83750355]]]
[[ 1. 0.7427926 -0.9452815 -0.93113345 -0.94975543]
[ 0.9998851 0.98436266 -0.9620067 0.61259484 0.43135557]
[ 0.99999785 -0.5612586 -0.57177305 -0.9255771 -0.83750355]]
Also, is the main advantage here really speed, or just extra assurance that we're masking pad-tokens during training?
Of course, batch processing is more efficient, than feeding the sequences one by one. But the main advantage of specifying the length is that you get the reasonable state out of RNN, i.e., padded items don't affect the result tensor. You will get exactly the same result (and the same speed) if you don't set the length, but select the right states manually.
does anyone know how to use map_fn or any other tensorflow-func to do a computation on every combination of two input-tensors?
So what i want is something like this:
Having two arrays ([1,2] and [4,5]) i want as a result a matrix with the output of the computation (e.g. add) on every possible combination of the two arrays. So the result would be:
[[5,6],
[6,7]]
I used map_fn but this only takes the elements index-wise:
[[5]
[7]]
Has anyone an idea how implement this?
Thanks
You can add new unit dimensions to each Tensor, then rely on broadcasting addition:
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()
first = tf.constant([1, 2])
second = tf.constant([4, 5])
print(first[None, :] + second[:, None])
Prints:
tf.Tensor(
[[5 6]
[6 7]], shape=(2, 2), dtype=int32)